FID3025 Architecting Efficient AI Hardware with technology and architectural design space exploration 7.5 credits

This course teaches attendees how to design AI hardware that is 3-4 orders of magnitude more efficient than GPUs and FPGAs. In essence, the course aims to teach how to achieve non-incremental improvements in computational, silicon and engineering efficiency. It focuses specifically on widely used artificial neural networks such as CNNs, LSTMs and SOMs, and shows how to analyze their requirements and which architectural and technology options can be traded off to find optimal solutions. The course also covers how to trade off implementation cost against the accuracy of neural networks.

Course offering missing for current semester as well as for previous and coming semesters
Headings with content from the Course syllabus FID3025 (Autumn 2020–) are denoted with an asterisk (*)

Content and learning outcomes

Course contents

The course consists of the following two modules:

Requirements Analysis

In this module, we study how to systematically extract requirements in terms of computational operations, their types, interconnect and storage. These requirements are logical and independent of the implementation style. Many real-life examples will be discussed in class, and students will be assigned problems for hands-on experience.

Being able to understand the energy requirements is the first step in creating low-energy and thus sustainable solutions.
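As an illustration of this kind of requirements extraction, the sketch below counts the multiply-accumulate operations and storage words implied by a single convolutional layer, independent of any implementation style. The `ConvLayer` class and the example layer dimensions are assumptions made for illustration; they are not taken from the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Logical description of a 2-D convolution (hypothetical helper)."""
    in_channels: int
    out_channels: int
    kernel: int        # side length of a square kernel
    out_height: int
    out_width: int

    def macs(self) -> int:
        # One MAC per kernel element, per input channel, per output value.
        return (self.out_height * self.out_width * self.out_channels
                * self.in_channels * self.kernel * self.kernel)

    def weights(self) -> int:
        # Number of weight words that must be stored (biases ignored).
        return self.out_channels * self.in_channels * self.kernel * self.kernel

    def output_activations(self) -> int:
        # Number of output words produced by the layer.
        return self.out_height * self.out_width * self.out_channels


# Assumed example: 3x3 kernel, 3 input channels, 64 output channels, 224x224 output.
layer = ConvLayer(in_channels=3, out_channels=64, kernel=3,
                  out_height=224, out_width=224)
print("MACs:", layer.macs())
print("weights:", layer.weights())
print("output activations:", layer.output_activations())
```

These counts say nothing yet about energy or area; they are the logical starting point from which such estimates can later be derived.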

Architecting AI Hardware and Understanding technology and architectural trade-offs

In this module, we study the architectural trade-offs involved in implementing AI hardware. We go into the details of the memory hierarchy and its technology options. Memory is the dominant cost component, and we study how to exploit temporal locality to minimize the cost of memory storage and memory access.
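The effect of temporal locality can be illustrated with a simple counting model. The sketch below assumes an N x N matrix multiplication in which a tile of each operand is loaded once into a local buffer and reused for a whole tile multiplication. It counts only operand fetches from the larger memory and ignores output traffic; the function name and the model itself are assumptions for illustration.

```python
def operand_fetches(n: int, tile: int = 1) -> int:
    """Operand words fetched from the large memory for an n x n matmul.

    Assumed model: the computation is split into (n / tile)^3 tile
    multiplications, and each one loads one tile of A and one tile of B
    (tile * tile words each) into a local buffer exactly once.
    """
    if tile <= 0 or n % tile != 0:
        raise ValueError("tile must be positive and divide n")
    tiles_per_dim = n // tile
    tile_multiplies = tiles_per_dim ** 3
    return tile_multiplies * 2 * tile * tile


no_reuse = operand_fetches(256, tile=1)     # every MAC fetches both operands
with_reuse = operand_fetches(256, tile=16)  # each loaded word is reused 16 times
print(no_reuse, with_reuse, no_reuse // with_reuse)
```

In this model the number of fetches falls in proportion to the tile size, at the cost of a local buffer large enough to hold the tiles.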

Next to memory, interconnect is the biggest challenge. Wires are the worst-scaling aspect of technology today. For instance, moving data by 1 mm on a chip is comparable in energy cost to a single-precision floating-point operation. Besides energy cost, interconnect also plays a strong role in architectural decisions. For instance, it is a common mistake to increase parallelism in computation without increasing the parallelism in access to memory. We show how to architect designs in which an increase in computation is matched by an increase in bandwidth to memory.
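The mistake of adding compute parallelism without matching memory parallelism can be sketched with a roofline-style bound. The model below is an assumption for illustration: throughput is limited either by the number of processing elements or by how many operations the memory bandwidth can feed. All parameter names and numbers are hypothetical.

```python
def attainable_ops_per_cycle(num_pes: int,
                             words_per_cycle: int,
                             ops_per_word: int) -> int:
    """Upper bound on sustained operations per cycle (assumed model).

    num_pes         -- processing elements, one operation per cycle each
    words_per_cycle -- words the memory system can deliver per cycle
    ops_per_word    -- operations performed on each fetched word
    """
    compute_bound = num_pes
    bandwidth_bound = words_per_cycle * ops_per_word
    return min(compute_bound, bandwidth_bound)


baseline = attainable_ops_per_cycle(num_pes=16, words_per_cycle=8, ops_per_word=2)
more_pes = attainable_ops_per_cycle(num_pes=64, words_per_cycle=8, ops_per_word=2)
matched = attainable_ops_per_cycle(num_pes=64, words_per_cycle=32, ops_per_word=2)
print(baseline, more_pes, matched)
```

Under these assumptions, quadrupling the processing elements alone leaves throughput unchanged, whereas scaling the memory bandwidth with them lets the extra computation be used.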

Finally, we study the options for implementing the arithmetic operations in neural networks. We also study how to trade off accuracy against implementation cost, with the help of a concrete case study from the field of bacterial genome recognition.
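The flavour of an accuracy versus cost trade-off can be sketched with fixed-point quantization. The example below is not the course's case study. It assumes values in [-1, 1) rounded to a signed fixed-point format, and uses the square of the bit width as a rough proxy for multiplier area, which is an assumption made only for illustration.

```python
def quantize(x: float, bits: int) -> float:
    """Round x in [-1, 1) to a signed fixed-point value with `bits` bits."""
    scale = 2 ** (bits - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))  # saturate to the representable range
    return q / scale


def max_error(values, bits: int) -> float:
    """Largest absolute quantization error over the given values."""
    return max(abs(v - quantize(v, bits)) for v in values)


def multiplier_cost_proxy(bits: int) -> int:
    # Assumption: multiplier area grows roughly with the square of the width.
    return bits * bits


# Hypothetical weight values spread evenly over [-0.5, 0.5).
values = [i / 1000 - 0.5 for i in range(1000)]
for bits in (4, 8, 16):
    print(bits, max_error(values, bits), multiplier_cost_proxy(bits))
```

Narrower formats reduce the cost proxy but increase the worst-case error; whether that error is acceptable depends on the accuracy the application requires.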

Knowing these architectural and technological options to reduce energy will contribute to sustainable AI solutions.

Intended learning outcomes

After passing this course, the students will be able to

  1. Analyze the requirements of a real-life machine learning problem in terms of storage, computation and power,
  2. Make informed decisions based on available technology, architectural options, accurate estimates of area, performance and energy that would best meet the targets for the machine learning problem,
  3. Create low-energy custom AI solutions that contribute to sustainable development,
  4. Evaluate major research trends and understand the open challenges that the community is focusing on.

Course disposition

The course is organized into the following three parts:

  1. Introductory lectures that introduce the challenges and the research reports that discuss these challenges and their solutions. This part of the course involves 4-5 lectures, spread over two weeks. After every lecture, students are assigned reading material and the specific questions they are expected to answer in their reports and presentations. N.B.: Students do not give any presentations or submit reports during the first two weeks, but they start working on them. These lectures cover a) Requirements Analysis, b) Storage and Interconnect technology and trade-offs and c) Arithmetic Implementation Options.
  2. Student presentations and group discussions on state-of-the-art solutions. Students present during the next 3 weeks, giving 4 x 30-minute presentations, each with an additional 15 minutes reserved for discussion. Each presentation involves one presenter and two active listeners. Students need to prepare a report based on their presentation and the feedback they get from the teacher and the active listeners. In the 3rd and 4th weeks, the projects are introduced and assigned, along with the format in which students submit their project reports.
  3. Student presentations of their projects. During the 5th and 6th weeks, students work on their projects and prepare the project presentation and report. The teacher is available on demand for discussions. In weeks 7 and 8, the project presentations are given and the reports submitted.

Literature and preparations

Specific prerequisites

Enrolled as a doctoral student.

Recommended prerequisites

The PhD students are expected to have a good understanding of neural networks and how they are used. Students are also expected to have a good understanding of computer architecture and digital design principles.




A selection of tutorials, survey and research papers will be provided.

Examination and completion

If the course is discontinued, students may request to be examined during the following two academic years.

Grading scale

P, F


  • EXA1 - Written examination, 7.5 credits, grading scale: P, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

Students will be assigned papers to read and present to the class. Their presentations will be used to judge how well they have grasped the contents of the paper and how well they can relate it to their own problems.

Small projects will be defined keeping in mind the research interests of the students or groups of students.

Other requirements for final grade

To get a passing grade, the students must attend at least 80% of classes, submit all assignments and give the final presentation on their project.

Opportunity to complete the requirements via supplementary examination

No information inserted

Opportunity to raise an approved grade via renewed examination

No information inserted


Ethical approach

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course web

Further information about the course can be found on the Course web at the link below. Information on the Course web will later be moved to this site.

Course web FID3025

Offered by

Main field of study

This course does not belong to any Main field of study.

Education cycle

Third cycle

Add-on studies

No information inserted


Ahmed Hemani (

Postgraduate course

Postgraduate courses at EECS/Electronics and Embedded Systems