ID2218 Design of Fault-tolerant Systems 7.5 credits

Course information

Content and learning outcomes

Course contents *

Fault tolerance is the ability of a system to continue performing its intended function despite faults. In a broad sense, fault tolerance is associated with reliability, with successful operation, and with the absence of breakdowns.

The ultimate goal of fault tolerance is the development of a dependable system. As society relies more and more on computer systems, the dependability of these systems becomes a critical issue. In airplanes, chemical plants, heart pacemakers and other safety-critical applications, a system failure can cost lives or cause an environmental disaster.

There are various approaches to achieving fault tolerance. Common to all of them is a certain amount of redundancy. This can be a replicated hardware component, an additional check bit attached to a string of digital data, or a few lines of program code verifying the correctness of the program's results. In this course, we study hardware as well as software fault tolerance. The rapid development of real-time computing applications that started around the mid-1990s, especially the demand for software-embedded intelligent devices, made software fault tolerance a pressing issue.
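The three forms of redundancy named above can be illustrated with a minimal Python sketch. The function names (`majority_vote`, `even_parity_bit`, `parity_ok`, `checked_sqrt`) are hypothetical and chosen for illustration only. The sketch assumes a majority voter over the outputs of replicated units, a single even-parity check bit, and a square-root routine guarded by an acceptance test; these are common textbook instances, not the specific designs used in the course.

```python
import math
from collections import Counter

def majority_vote(outputs):
    """Replicated hardware: return the value produced by a strict majority of replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: too many replicas disagree")
    return value

def even_parity_bit(bits):
    """Information redundancy: one check bit that makes the total number of 1s even."""
    return sum(bits) % 2

def parity_ok(codeword):
    """A received word (data bits plus parity bit) passes if its number of 1s is even."""
    return sum(codeword) % 2 == 0

def checked_sqrt(x):
    """Software redundancy: a few extra lines verify the result before it is returned."""
    r = math.sqrt(x)
    # Acceptance test: squaring the result must give back the input (within rounding).
    if not math.isclose(r * r, x, rel_tol=1e-9, abs_tol=1e-12):
        raise ArithmeticError("acceptance test failed")
    return r
```

For instance, flipping any one bit of `data + [even_parity_bit(data)]` makes `parity_ok` return `False`. A single parity bit cannot, however, detect an even number of flipped bits, which is one motivation for the stronger codes listed among the course topics.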

The following is a tentative list of topics to be covered:

  • Introduction
      • Definition of fault tolerance
      • Redundancy
      • Applications of fault tolerance
  • Fundamentals of dependability
      • Attributes: reliability, availability, safety
      • Impairments: faults, errors and failures
      • Means: fault prevention, removal and forecasting
  • Dependability evaluation
      • Common measures: failure rate, mean time to failure, mean time to repair, etc.
      • Reliability block diagrams
      • Markov processes
  • Hardware redundancy
      • Redundancy schemes
      • Evaluation and comparison
      • Applications
  • Information redundancy
      • Codes: linear, Hamming, cyclic, unordered, arithmetic, etc.
      • Encoding and decoding techniques
      • Applications
  • Time redundancy
  • Software fault tolerance
      • Specific features
      • Software fault tolerance techniques: N-version programming, recovery blocks, self-checking software, etc.
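As an illustration of the dependability-evaluation and hardware-redundancy topics, the following sketch computes the reliability of series and parallel reliability block diagrams and of triple modular redundancy (TMR). It assumes components with a constant failure rate λ, so that R(t) = e^(-λt), and a perfect (fault-free) voter; the function names and the failure rate in the demo are hypothetical.

```python
import math

def r_exp(failure_rate, t):
    """Reliability R(t) = exp(-lambda * t) of one component with a constant failure rate."""
    return math.exp(-failure_rate * t)

def r_series(*rs):
    """Series reliability block diagram: the system works only if every block works."""
    return math.prod(rs)

def r_parallel(*rs):
    """Parallel reliability block diagram: the system fails only if every block fails."""
    return 1 - math.prod(1 - r for r in rs)

def r_tmr(r):
    """TMR with a perfect voter: at least 2 of 3 identical modules must work.

    R_TMR = R^3 + 3 R^2 (1 - R) = 3 R^2 - 2 R^3
    """
    return 3 * r**2 - 2 * r**3

if __name__ == "__main__":
    lam = 1e-4  # assumed failure rate per hour, for illustration only
    for t in (1_000, 10_000):
        r = r_exp(lam, t)
        print(f"t={t} h  simplex={r:.4f}  TMR={r_tmr(r):.4f}")
    # TMR beats a single module only while R > 0.5, i.e. for t < ln 2 / lambda.
    print("crossover at t =", math.log(2) / lam, "hours")
```

With λ = 10^-4 per hour, TMR reaches about 0.975 at t = 1000 h against about 0.905 for a single module, but falls below the single module after t = ln 2 / λ. Integrating R_TMR(t) also shows that its MTTF, 5/(6λ), is lower than the simplex MTTF of 1/λ, which is why such schemes are usually compared over a mission time rather than by MTTF alone.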
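For the information-redundancy topics, the Hamming (7,4) code is a standard example of encoding and decoding. The sketch below assumes the common textbook layout with parity bits at positions 1, 2 and 4 and data bits at positions 3, 5, 6 and 7; the function names are hypothetical. The decoder computes the syndrome as the XOR of the positions of all 1 bits, which is zero for a valid codeword and equals the position of the flipped bit after a single-bit error.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword.

    Positions are numbered 1..7; parity bits occupy positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Correct at most one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos  # XOR of the positions holding a 1
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bit the syndrome points to
    return [c[2], c[4], c[5], c[6]], syndrome
```

Because the code's minimum distance is 3, this decoder corrects any single-bit error but miscorrects double-bit errors; detecting those as well requires an additional overall parity bit (the extended Hamming code).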

Intended learning outcomes *

The aims of this course are:

  • to create an understanding of the fundamental concepts of fault tolerance
  • to learn basic techniques for achieving fault tolerance in electronic, communication and software systems
  • to develop skills in modeling and evaluating fault-tolerant architectures in terms of reliability, availability and safety
  • to gain knowledge of sources of faults and means for their prevention and forecasting
  • to understand the merits and limitations of fault-tolerant design
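Modeling availability can be illustrated with the simplest Markov model: a single repairable unit that alternates between an up state and a down state, with constant failure rate λ = 1/MTTF and constant repair rate μ = 1/MTTR. This model and the function names below are assumptions chosen for illustration. Solving dP_up/dt = -λ·P_up + μ·(1 - P_up) with P_up(0) = 1 gives the point availability A(t) = μ/(λ+μ) + λ/(λ+μ)·e^(-(λ+μ)t), which tends to the steady-state value MTTF/(MTTF + MTTR).

```python
import math

def steady_state_availability(mttf, mttr):
    """Long-run fraction of time the unit is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

def availability_at(t, failure_rate, repair_rate):
    """Point availability A(t) of a two-state up/down Markov model that starts up."""
    s = failure_rate + repair_rate
    return repair_rate / s + failure_rate / s * math.exp(-s * t)

if __name__ == "__main__":
    mttf, mttr = 1000.0, 10.0  # assumed values in hours, for illustration only
    lam, mu = 1 / mttf, 1 / mttr
    for t in (0.0, 10.0, 100.0):
        print(f"A({t:g} h) = {availability_at(t, lam, mu):.5f}")
    print("steady state:", steady_state_availability(mttf, mttr))
```

With MTTF = 1000 h and MTTR = 10 h, for example, the steady-state availability is 1000/1010, roughly 0.990. Reliability, by contrast, ignores repair, which is why a system can have high availability yet low reliability.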

Course Disposition

No information inserted

Literature and preparations

Specific prerequisites *

No information inserted

Recommended prerequisites

Basic understanding of circuits and digital logic.

Equipment

No information inserted

Literature

Course notes: E. Dubrova, "Fault-Tolerant Design: An Introduction" (draft, to be distributed in class).

Examination and completion

Grading scale *

A, B, C, D, E, FX, F

Examination *

  • ANN1 - Assignment, 1.5 credits, Grading scale: A, B, C, D, E, FX, F
  • TEN1 - Examination, 1.5 credits, Grading scale: A, B, C, D, E, FX, F
  • TEN2 - Examination, 4.5 credits, Grading scale: A, B, C, D, E, FX, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

Other requirements for final grade *

The final grade is based on five homework assignments (20%), a midterm exam (20%) and a final exam (60%). For PhD students, an additional task is to read and present a paper approved by the instructor (a 20-minute talk).

Opportunity to complete the requirements via supplementary examination

No information inserted

Opportunity to raise an approved grade via renewed examination

No information inserted

Examiner

Elena Dubrova

Further information

Course web

Further information about the course can be found on the Course web at the link below. Information on the Course web will later be moved to this site.

Course web ID2218

Offered by

EECS/Electronics and Embedded Systems

Main field of study *

Electrical Engineering

Education cycle *

Second cycle

Add-on studies

No information inserted

Ethical approach *

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Supplementary information

In this course, the EECS code of honor applies, see: http://www.kth.se/en/eecs/utbildning/hederskodex.