Rajib Mall Lecture Notes
Dr. R. Mall
Fault-tolerance concepts
Fail safe:
design the system so that when it sustains a specified fault
it fails in a safe mode
Fail soft:
when the system sustains specified faults,
it provides a subset of its required behavior
Fault avoidance
Applied at design time:
Precise (and preferably formal) specification
Adoption of quality principles
Adoption of a design strategy based on information hiding
Use of a strongly typed programming language
Restriction on error-prone programming constructs such as pointers.
Fault diagnosis
The process of determining what caused the fault:
i.e. exactly which subsystem or component is faulty.
Fault recovery
The system must restore its state to a known safe state. Two options available:
correct the damaged state (forward error recovery)
restore the system to a known safe state (backward error recovery)
Fault repair
Involves modifying the system so that the fault does not recur.
In many cases, software failures are transient:
they occur due to a peculiar combination of system inputs;
no repair is necessary, as normal processing can continue immediately after fault recovery.
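[Figure: redundant modules (Module 2, Module 3, ...) each produce a result (e.g. Result 3); the results are passed to a voting stage.]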
N-version programming
Different versions of the software
implemented by different teams
executed in parallel
[Figure: N-version programming. The input is fed to Version 1, Version 2, ..., Version n; their results (Result 1, ..., Result n) go to an output comparator, which produces the agreed result.]
At least three versions of the software should be available.
The basic assumption:
versions developed by different engineers would not have similar errors.
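The comparison step can be illustrated with a small sketch. The three version functions below are hypothetical stand-ins for independently developed implementations (one is deliberately faulty), and the majority rule in `vote` is one possible comparator, assumed here for illustration. The versions are called one after another rather than in parallel to keep the sketch short.

```python
from collections import Counter


def version_a(x):
    return x * x


def version_b(x):
    return x ** 2


def version_c(x):
    # Deliberately faulty version, to show the voter masking a single fault.
    return x * x + 1


def vote(results):
    """Return the result produced by a strict majority, or raise if none exists."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among versions")
    return value


def n_version_square(x, versions=(version_a, version_b, version_c)):
    # Run every version on the same input and compare the outputs.
    return vote([version(x) for version in versions])
```

With three versions, a single faulty version is outvoted; if all versions disagree, the comparator reports that no agreed result exists.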
Recovery blocks
A fine-grained approach.
Each program component includes
an acceptance test:
checks if it executed successfully.
Acceptance tests cannot determine what has gone wrong.
try blocks or recovery blocks.
[Figure: Recovery blocks. The result of Algorithm 1 is checked by the acceptance test; if the test fails, execution retries with the next algorithm (Algorithm 2, ...).]
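A minimal sketch of a recovery block follows, assuming a square-root computation as the protected component. The names `faulty_sqrt`, `newton_sqrt`, and `acceptance_test` are hypothetical; the acceptance test only checks that the result is plausible and says nothing about what went wrong.

```python
def acceptance_test(x, root):
    # Accept the result only if it is non-negative and squares back to x.
    return root >= 0 and abs(root * root - x) < 1e-6


def faulty_sqrt(x):
    # Hypothetical primary algorithm containing a bug.
    return x / 2


def newton_sqrt(x):
    # Alternate algorithm: Newton's iteration for the square root.
    guess = x if x > 1 else 1.0
    for _ in range(60):
        guess = 0.5 * (guess + x / guess)
    return guess


def recovery_block(x, alternates=(faulty_sqrt, newton_sqrt)):
    """Try each alternate in turn until one passes the acceptance test."""
    for algorithm in alternates:
        try:
            result = algorithm(x)
        except Exception:
            continue  # a crash counts as a failed attempt; retry with the next one
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")
```

For `recovery_block(9.0)` the primary algorithm returns 4.5, which fails the test, so the block retries with the Newton iteration and returns a value close to 3.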
Exception handling
An exception is an error or an unexpected event.
When an exception has not been anticipated:
control is transferred to the system exception handling mechanism
The sender sends the data with the checksum value.
The receiver computes the checksum again:
if the two checksum values differ, the data has been corrupted.
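The exchange can be sketched as follows. The notes do not name a checksum algorithm; CRC-32 from Python's standard `zlib` module is assumed here, and the `send` and `receive` functions are hypothetical names used only for illustration.

```python
import zlib


def send(data: bytes):
    # The sender transmits the data together with its checksum value.
    return data, zlib.crc32(data)


def receive(data: bytes, checksum: int) -> bytes:
    # The receiver recomputes the checksum and compares the two values.
    if zlib.crc32(data) != checksum:
        raise ValueError("checksum mismatch: data corrupted")
    return data
```

A checksum only detects corruption; recovering the original data needs retransmission or a code that carries enough redundancy to correct errors.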
Watchdog timers
Used when a function must complete within a specific time period.
A watchdog timer is a timer that:
must be reset by the function after it completes execution.
may be interrogated by a controller at regular intervals.
If for some reason the function does not terminate,
the watchdog timer is not reset, and the controller can detect the failure.
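The interplay between the monitored function and the controller can be sketched with simulated time. This is a simplified model under stated assumptions: time advances in discrete ticks, and `function_completes` is a hypothetical callback that says whether the monitored function finished during a given step. A real watchdog would rely on a hardware or operating-system timer.

```python
class WatchdogTimer:
    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks

    def reset(self):
        """Called by the monitored function after it completes execution."""
        self.remaining = self.timeout_ticks

    def tick(self):
        """Advance simulated time by one tick."""
        if self.remaining > 0:
            self.remaining -= 1

    def expired(self):
        """Interrogated by the controller at regular intervals."""
        return self.remaining == 0


def run(controller_checks, function_completes, timeout_ticks=3):
    watchdog = WatchdogTimer(timeout_ticks)
    for step in range(controller_checks):
        watchdog.tick()
        if function_completes(step):
            watchdog.reset()
        if watchdog.expired():
            return f"timeout detected at step {step}"
    return "ok"
```

If the function stops completing after its first two steps, the timer is no longer reset and the controller sees the expiry a few ticks later.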
Fault recovery
Forward recovery
Correct the damaged system state
using redundant information.
Data corruption: use coding techniques which add redundant information.
Corruption of linked structures: include redundant pointers, e.g. both forward and backward pointers.
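As an illustration of redundant pointers, the sketch below repairs corrupted backward pointers in a doubly linked list by walking the forward chain. It assumes the forward pointers are intact; `Node`, `build`, and `repair_prev_pointers` are hypothetical names introduced for this example.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None


def build(values):
    """Build a doubly linked list and return its head."""
    head = prev = None
    for value in values:
        node = Node(value)
        if prev is None:
            head = node
        else:
            prev.next, node.prev = node, prev
        prev = node
    return head


def repair_prev_pointers(head):
    """Rebuild backward pointers from the forward chain; return the number of repairs."""
    repairs, prev, node = 0, None, head
    while node is not None:
        if node.prev is not prev:
            node.prev = prev
            repairs += 1
        prev, node = node, node.next
    return repairs
```

The same idea works in the other direction: if a forward pointer is damaged, it can be rebuilt by walking the backward chain from the tail.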
Checkpointing
Transactions allow error recovery
because they do not commit changes to the database until they are completed.
However, they do not allow recovery from system states that are valid but incorrect.
Checkpointing can be used.
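A minimal sketch of checkpointing, assuming the whole system state fits in an in-memory object that can be deep-copied. The class name `CheckpointedState` and the `balance` field are hypothetical; a real system would write checkpoints to stable storage.

```python
import copy


class CheckpointedState:
    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self):
        # Save a copy of the current state as the known safe state.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Backward error recovery: restore the last saved state.
        self.state = copy.deepcopy(self._checkpoint)
```

After a rollback, the state equals the last checkpoint, so any incorrect updates made since then are discarded.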
Fault-tree analysis
For each identified hazard:
a detailed analysis is carried out to discover the conditions which might cause the hazard.
Fault-tree analysis involves identifying the undesired event
and working backwards from the event to discover the possible causes.
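Working backwards from the undesired event can be sketched on a small tree. The tree below is entirely hypothetical (the event names are invented for illustration), and it assumes only AND and OR gates. The function lists the combinations of basic events that are sufficient to cause the top event.

```python
from itertools import product

# Hypothetical fault tree: a gate is ("AND" | "OR", [children]); a leaf is a string.
TREE = ("OR", [
    "sensor failure",
    ("AND", ["primary pump failure", "backup pump failure"]),
])


def cut_sets(node):
    """Return the sets of basic events that are sufficient to cause `node`."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any one child's cut set is enough.
        return [s for sets in child_sets for s in sets]
    # AND: one cut set from every child must occur together.
    return [frozenset().union(*combo) for combo in product(*child_sets)]
```

For this tree, the top event is caused either by the sensor failure alone or by both pump failures together.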
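For a given program, let:
$\eta_1$ be the number of unique operators used in the program,
$\eta_2$ be the number of unique operands used in the program,
$N_1$ be the total number of operators used in the program,
$N_2$ be the total number of operands used in the program.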
Operators
Some general guidelines can be provided:
All assignment, arithmetic, and logical operators are operators.
A pair of parentheses, as well as a block begin --- block end pair, is each considered a single operator.
An if ... then ... else ... endif and a while ... do construct are single operators.
A sequence (statement termination) operator ';' is a single operator.
In a function call:
the function name is an operator; the I/O parameters are considered as operands.
The set of operators for the ANSI C language:
() [] . , -> * + - ~ ! ++ -- * / % + - << >> < > <= >= != == & ^ | && || = *= /= %= += -= <<= >>= &= ^= |= : ? { ; CASE DEFAULT IF ELSE SWITCH WHILE DO FOR GOTO CONTINUE BREAK RETURN
and a function name in a function call
Examples:
The function name in a function definition
is not counted as an operator.
In int func ( int a, int b ) { ... }
the operators are: {}, ( )
We do not consider func, a, and b as operands.
Program vocabulary:
number of unique operators and operands used in the program.
program vocabulary
$\eta = \eta_1 + \eta_2$.
Program Volume:
The length of a program:
total number of operators and operands used in the code, $N = N_1 + N_2$.
It depends on the choice of the operators and operands,
i.e. for the same program, the length depends on the style of programming.
We can have highly different measures of length
for essentially the same problem.
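To represent $\eta$ different identifiers,
at least $\log_2 \eta$ bits are needed ($\eta$ being the program vocabulary).
The program volume is therefore $V = N \log_2 \eta$, where N is the program length.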
Potential minimum volume:
the volume is bounded from below, because a program would have at least two operators and no less than the requisite number of operands (i.e. input/output data items).
$\eta_1 = 2$, $\eta_2 = n$
Experience shows that
E is well correlated with the effort needed for maintenance.
Length Estimation:
Halstead assumed that it is quite unlikely that a program has several identical parts (or substrings) of length greater than $\eta$ ($\eta$ being the program vocabulary).
In fact, once a piece of code occurs identically in several places,
it is usually made into a procedure or a function.
It is a standard combinatorial result that for any given alphabet of size K, there are exactly $K^r$ different strings of length r.
Thus, $N/\eta \le \eta^{\eta}$
Or, $N \le \eta^{\eta+1}$
$N = \log_2\left(\eta_1^{\eta_1}\,\eta_2^{\eta_2}\right)$ (approximately)
Or, $N = \log_2 \eta_1^{\eta_1} + \log_2 \eta_2^{\eta_2}$
$= \eta_1 \log_2 \eta_1 + \eta_2 \log_2 \eta_2$
Example:
    main()
    {
        int a, b, c, avg;
        scanf("%d %d %d", &a, &b, &c);
        avg = (a + b + c) / 3;
        printf("avg= %d", avg);
    }
The unique operators are: main, (), {}, int, scanf, &, ",", ";", =, +, /, printf
The unique operands are: a, b, c, &a, &b, &c, a+b+c, avg, 3, "%d %d %d", "avg= %d"
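The metrics for this example can be worked out from the two lists above. The sketch below counts the unique operators ($\eta_1$) and operands ($\eta_2$), then applies the length estimate $\eta_1 \log_2 \eta_1 + \eta_2 \log_2 \eta_2$. The volume line uses Halstead's $V = N \log_2 \eta$ with the estimated length standing in for N, which is an assumption made here for illustration.

```python
import math

# Unique operators and operands, copied from the lists above.
operators = ["main", "()", "{}", "int", "scanf", "&", ",", ";", "=", "+", "/", "printf"]
operands = ["a", "b", "c", "&a", "&b", "&c", "a+b+c", "avg", "3",
            '"%d %d %d"', '"avg= %d"']

eta1 = len(set(operators))   # number of unique operators
eta2 = len(set(operands))    # number of unique operands
eta = eta1 + eta2            # program vocabulary

# Estimated length: eta1 * log2(eta1) + eta2 * log2(eta2)
estimated_length = eta1 * math.log2(eta1) + eta2 * math.log2(eta2)

# Volume, using the estimated length in place of the actual length N (assumption).
estimated_volume = estimated_length * math.log2(eta)

print(eta1, eta2, eta, round(estimated_length), round(estimated_volume))
```

This gives 12 unique operators, 11 unique operands, a vocabulary of 23, and an estimated length of about 81.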
Summary
High reliability achieved through 3 complementary strategies:
fault avoidance
fault tolerance
fault detection
Fault tolerance:
Halstead's software science:
an analytical method. Lets us determine:
length
volume
effort
time