Fault-Tolerant Architectures

The document discusses reliability engineering and fault-tolerant system architectures. It describes topics like fault tolerance, self-monitoring architectures, triple modular redundancy, and software diversity. It also provides examples of fault-tolerant systems like protection systems and the Airbus flight control system.

Uploaded by

shiva


Reliability Engineering

1
Topics covered
• Fault-tolerant architectures
• Programming for reliability

2
Reliability

• The probability of failure-free system operation over a
specified time in a given environment for a given
purpose

3
Reliability achievement
• Fault avoidance
– Development techniques are used that either minimise the possibility
of mistakes or trap mistakes before they result in the introduction of
system faults.
• Fault detection and removal
– Verification and validation techniques are used that increase the
probability of detecting and correcting errors before the system
goes into service.
• Fault tolerance
– Run-time techniques are used to ensure that system errors do not
lead to system failures.
4
ATM reliability specification
• Key concerns
– To ensure that ATMs carry out customer services as requested and
that they properly record customer transactions in the account
database.
– To ensure that these ATM systems are available for use when required.

5
Fault-tolerant architectures

6
Fault tolerance
• In critical situations, software systems must be fault tolerant.
• Fault tolerance is required where there are high availability
requirements or where system failure costs are very high.
• Fault tolerance means that the system can continue in
operation in spite of software failure.
• Even if the system has been proved to conform to its
specification, it must also be fault tolerant as there may be
specification errors or the validation may be incorrect.

7
Fault-tolerant system architectures
• Fault-tolerant system architectures are used in situations
where fault tolerance is essential. These architectures are
generally based on redundancy and diversity.
• Examples of situations where dependable architectures are
used:
– Flight control systems, where system failure could threaten the safety of
passengers
– Reactor systems, where failure of a control system could lead to a
chemical or nuclear emergency
– Telecommunication systems, where there is a need for 24/7 availability.
8
Protection systems
• A specialized system that is associated with some other control
system, which can take emergency action if a failure occurs.
– System to stop a train if it passes a red light
– System to shut down a reactor if temperature/pressure are too high
• Protection systems independently monitor the controlled
system and the environment.
• If a problem is detected, the protection system issues commands to
take emergency action to shut down the system and avoid a catastrophe.
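As a rough illustration of the behaviour described above, the following Python sketch shows a protection system that checks sensor readings against safe limits, independently of any control software, and issues a shutdown command when a limit is exceeded. The names (`Limits`, `ProtectionSystem`, `check`) and the limit values in the usage note are hypothetical and are not taken from any real system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Limits:
    """Hypothetical safe operating limits for the controlled system."""
    max_temperature: float
    max_pressure: float


@dataclass
class ProtectionSystem:
    """Monitors readings on its own and records any emergency commands issued."""
    limits: Limits
    commands: List[str] = field(default_factory=list)

    def check(self, temperature: float, pressure: float) -> bool:
        """Return True if the readings are safe; otherwise issue a shutdown."""
        unsafe = (
            temperature > self.limits.max_temperature
            or pressure > self.limits.max_pressure
        )
        if unsafe:
            # Emergency action: the command is only recorded in this sketch.
            self.commands.append("SHUTDOWN")
            return False
        return True
```

For example, with assumed limits `Limits(350.0, 15.0)`, a call to `check(400.0, 10.0)` would return `False` and append `"SHUTDOWN"` to `commands`.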

9
Protection system architecture

10
Protection system functionality
• Protection systems are redundant because they include
monitoring and control capabilities that replicate those in the
control software.
• Protection systems should be diverse and use different
technology from the control software.
• Aim is to ensure that there is a low probability of failure on
demand for the protection system.

11
Self-monitoring architectures
• Multi-channel architectures where the system monitors its own
operations and takes action if inconsistencies are detected.
• The same computation is carried out on each channel and the
results are compared. If the results are identical and are
produced at the same time, then it is assumed that the system
is operating correctly.
• If the results are different, then a failure is assumed and a
failure exception is raised.
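The comparison step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each channel is modelled as a plain function, the names `ChannelMismatch` and `self_monitoring_compute` are hypothetical, and the check that results arrive at the same time is omitted. A real system working with floating-point values may also need a tolerance rather than exact equality.

```python
from typing import Callable, Sequence


class ChannelMismatch(Exception):
    """Raised when the channels disagree, signalling an assumed failure."""


def self_monitoring_compute(
    channels: Sequence[Callable[[float], float]], x: float
) -> float:
    """Run the same computation on every channel and compare the results."""
    if not channels:
        raise ValueError("at least one channel is required")
    results = [channel(x) for channel in channels]
    # Any difference between channels is treated as a failure.
    if any(result != results[0] for result in results[1:]):
        raise ChannelMismatch(f"channel results differ: {results}")
    return results[0]
```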

12
Self-monitoring architecture

13
Self-monitoring systems
• Hardware in each channel has to be diverse so that
common mode hardware failure will not lead to each
channel producing the same results.
• Software in each channel must also be diverse, otherwise
the same software error would affect each channel.
• If high availability is required, you may use several
self-checking systems in parallel.
– This is the approach used in the Airbus family of aircraft for their
flight control systems.
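One way to picture several self-checking systems in parallel is a simple failover: if one unit signals that it has detected an inconsistency, the next unit is used instead. The Python sketch below illustrates only that idea. The names `UnitFailure` and `first_healthy_result` are hypothetical, and the sketch is not a description of how the Airbus flight control system actually switches between computers.

```python
from typing import Callable, Sequence


class UnitFailure(Exception):
    """Hypothetical signal that a self-checking unit detected an inconsistency."""


def first_healthy_result(
    units: Sequence[Callable[[float], float]], x: float
) -> float:
    """Return the result of the first unit that does not report a failure."""
    for unit in units:
        try:
            return unit(x)
        except UnitFailure:
            # This unit has taken itself out of service; try the next one.
            continue
    raise UnitFailure("all self-checking units reported a failure")
```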
14
Airbus flight control system architecture

15
Airbus architecture discussion
• The Airbus FCS has 5 separate computers, any one of which can
run the control software.
• Extensive use has been made of diversity
– Primary systems use a different processor from the secondary systems.
– Primary and secondary systems use chipsets from different manufacturers.
– Software in secondary systems is less complex than in primary systems and
provides only critical functionality.
– Software in each channel is developed in different programming languages
by different teams.
– Different programming languages used in primary and secondary systems.

16
N-version programming
• Multiple versions of a software system carry out
computations at the same time. There should be an
odd number of computers involved, typically 3.
• The results are compared using a voting system and
the majority result is taken to be the correct result.
• Approach derived from the notion of triple-modular
redundancy, as used in hardware systems.
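The voting step can be sketched in Python as below. This is a minimal sketch, assuming the outputs of the versions are hashable values that can be compared for exact equality; the function name `majority_vote` is hypothetical. If no value is produced by more than half of the versions, the sketch raises an error rather than guessing.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of the versions."""
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the votes.
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among version outputs")
    return value
```

With three versions, `majority_vote([42, 42, 41])` returns `42`, so a single faulty version is outvoted.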
17
Hardware fault tolerance
• Depends on triple-modular redundancy (TMR).
• There are three replicated identical components that receive
the same input and whose outputs are compared.
• If one output is different, it is ignored and component failure
is assumed.
• Based on the assumptions that most faults result from component
failures rather than design faults, and that there is a low
probability of simultaneous component failure.
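The effect of the "low probability of simultaneous failure" assumption can be made concrete with a short calculation. The Python sketch below assumes, beyond what the slide states, that the three components fail independently, that each works with the same probability `r`, and that the output comparison itself never fails. Under those assumptions the TMR unit works when at least two of the three components work, which gives 3r² − 2r³. The function name `tmr_reliability` is hypothetical.

```python
def tmr_reliability(r: float) -> float:
    """Probability that at least two of three independent components work."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability between 0 and 1")
    # P(exactly two work) + P(all three work) = 3*r^2*(1 - r) + r^3
    return 3 * r**2 - 2 * r**3
```

Under these assumptions, the arrangement is more reliable than a single component only when `r` is greater than 0.5; at `r = 0.5` the two are equal.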

18
Triple modular redundancy

19
N-version programming

20
Software diversity
• Approaches to software fault tolerance depend on software
diversity where it is assumed that different implementations of
the same software specification will fail in different ways.
• It is assumed that implementations are (a) independent and (b)
do not include common errors.
• Strategies to achieve diversity
– Different programming languages
– Different design methods and tools
– Explicit specification of different algorithms

21
Problems with design diversity
• Teams are not culturally diverse so they tend to tackle problems in
the same way.
• Characteristic errors
– Different teams make the same mistakes. Some parts of an implementation
are more difficult than others so all teams tend to make mistakes in the
same place.
• Specification errors
– If there is an error in the specification then this is reflected in all
implementations.
– This can be addressed to some extent by using multiple specification
representations.
22
Improvements in practice
• In principle, if diversity and independence can be
achieved, multi-version programming leads to very
significant improvements in reliability.
• In practice, observed improvements are much less
significant but the approach seems to lead to reliability
improvements of between 5 and 9 times.
• The key question is whether or not such improvements
are worth the considerable extra development costs for
multi-version programming.
23
