Fault tolerance is the ability of a system to continue to carry out its intended function despite faults. In a broad sense, fault tolerance is associated with reliability, with successful operation, and with the absence of breakdowns. The ultimate goal of fault tolerance is to develop a reliable system. As society becomes increasingly dependent on computer systems, the reliability of these systems becomes a critical issue. In airplanes, chemical plants, heart pacemakers and other safety-critical applications, a system failure can cost lives or cause an environmental disaster.

There are different approaches to achieving fault tolerance. Common to all of them is a certain amount of redundancy. This can be a replicated hardware component, an additional check bit attached to a string of digital data, or a few lines of program code verifying the correctness of the program's results.

In this course, we will study fault tolerance in both hardware and software. The rapid development of real-time computing applications that started around the mid-1990s, especially the demand for software-embedded intelligent devices, made software fault tolerance a pressing issue.

The following is a tentative list of topics to be covered:
- Introduction
  - Definition of fault tolerance
  - Redundancy
  - Applications of fault tolerance
- Basic principles of dependability
  - Attributes: reliability, availability, safety
  - Impairments: faults, errors and failures
  - Means: fault prevention, removal and forecasting
- Reliability evaluation
  - Common measures: failure rate, mean time to failure, mean time to repair, etc.
  - Reliability block diagrams
  - Markov processes
- Hardware redundancy
  - Redundancy schemes
  - Evaluation, comparison and applications
- Information redundancy
  - Codes: linear, Hamming, cyclic, unordered, arithmetic, etc.
  - Encoding and decoding techniques, and applications
- Time redundancy
- Software fault tolerance
  - Specific features
  - Software fault tolerance techniques: N-version programming, recovery blocks, self-checking software, etc.
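
The three forms of redundancy mentioned in the course description (a replicated component, an additional check bit, and a few lines of code verifying a result) can be illustrated with a minimal Python sketch. The function names below are hypothetical and are not taken from the course material. The sketch assumes even parity, a simple strict-majority voter, and a square-root computation as the checked operation.

```python
import math
from collections import Counter


def add_parity_bit(bits):
    """Information redundancy: append an even-parity check bit to a list of 0/1 values."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """Return True if the word (data bits plus check bit) still has even parity."""
    return sum(word) % 2 == 0


def majority_vote(outputs):
    """Hardware-style redundancy: return the value agreed on by a strict majority of replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replica outputs")
    return value


def checked_sqrt(x, tol=1e-9):
    """Software redundancy: verify the result with a few extra lines before returning it."""
    result = math.sqrt(x)
    # Acceptance test: squaring the result should give back the input.
    if abs(result * result - x) > tol * max(1.0, x):
        raise ArithmeticError("square-root result failed its acceptance test")
    return result
```

A single flipped bit changes the parity of the word, so `parity_ok` detects it but cannot correct it. The voter, by contrast, masks one faulty replica out of three, at the cost of tripling the component.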
 
