Computerized Approach for Matrixform Fmea 1979
217B (MIL 217B) are compared with semiconductor chip vendor data and
data from Carnegie Mellon University's multiprocessor systems. Based on
these comparisons a modified MIL 217B model is proposed. The modified
model is employed to calculate module failure rates for the three
multiprocessors designed, implemented, and currently operating at CMU.
Hard failure reliability models for these three systems are presented.
These models use the calculated module failure rates as a basis for a
consistent comparison of the three systems.

Practical applications of accelerated testing of electronic devices.
C. E. JOWETT. Microelectronics J. 9, (2) 19 (1978). The value of
accelerated tests lies in their ability to create in a short period of
time the same failure modes that would take much longer to occur under
normal use conditions. Accelerated testing can be used extensively for
the evaluation of new products, process and material changes and
long-term failure rate prediction. The effectiveness of equipment built
with microcircuits can be achieved only if the microcircuit reliability
is assured.

Variance and approximate confidence limits for probability and frequency
of system failure. GUSTAVO E. GONZALEZ-URDANETA and BRIAN J. CORY. IEEE
Trans. Reliab. R-27, (4) 289 (October 1978). Two different types of
approximation account for the effect of uncertain component failure and
repair rates on the probability and the frequency of failure of
repairable systems: 1. Use the orthodox statistical procedure of
characterizing distributions by their low order moments and then use the
conservative Chebyshev inequality to bound the probabilities so that the
random variables lie within a certain range. 2. Apply Monte Carlo
simulation based on common probability models, with which component test
data can be translated into approximate system reliability limits at any
s-confidence level. The two types are compared by an example. The
performance of both types improves with the sample size of component
data, but bounds from the Chebyshev inequality are wider than those
obtained from Monte Carlo simulation. Both approaches represent the
system by minimal cut-sets. The algorithms are intended for
digital-computer implementation: computational times are provided.

Tests for "New Different than Repaired". V. S. SRINIVASAN. IEEE Trans.
Reliab. R-27, (4) 280 (October 1978). Distribution-free methods are
evolved for evaluating whether "New Better than Repaired" (NBR) or "New
Worse than Repaired" (NWR) are good alternatives to New Same as
Repaired. The methods are simple and optimal.

Fault-tolerance: The survival attribute of digital systems. ALGIRDAS
AVIZIENIS. Proc. IEEE 66, (10) 1109 (October 1978). Fault-tolerance is
the architectural attribute of a digital system that keeps the logic
machine doing its specified tasks when its host, the physical system,
suffers various kinds of failures of its components. A more general
concept of fault-tolerance also includes human mistakes committed during
software and hardware implementation and during man/machine interaction
among the causes of faults that are to be tolerated by the logic
machine. This paper discusses the concept of fault-tolerance, the
reasons for its inclusion in digital system architecture, and the
methods of its implementation. A chronological view of the evolution of
fault-tolerant systems and an outline of some goals for its further
development conclude the presentation.

Reliability analysis for redundancy of industrial power distribution
systems. LUKE YU. Microelectron. Reliab. 18, 259 (1978). Redundancy in
design of industrial power distribution system reflects the goals of
service continuity and improvement of service reliability. However, this
results in higher initial cost. A possible means of analyzing costs vs
reliability may be reliability analysis which may offer a cost
reliability trade-off decision affecting design. In this paper, a set of
general equations developed at The Ralph M. Parsons Company is presented
for use in computing the failure rate of a redundant system with any
given number of identical units, some of which are necessary for
successful operation. The remainder are redundant. From the obtained
system failure rate, system reliability and system availability can be
determined.

Designing reliable computer systems--the fault-tolerant approach--1.
R. G. BENNETTS. Electron. Power p. 846 (November/December 1978). There
are two strategies for increasing the reliability of a computer system.
The first, called fault avoidance, is to try and remove the source of
every possible fault that could give rise to an error condition. In
practice this is not possible and the second strategy, called fault
tolerance, is to incorporate various forms of protective redundancy in
the form of additional hardware, additional software and time
replication. This first part of a two-part tutorial guide to the
fault-tolerant approach is a general survey of the many techniques for
improving fault tolerance, and comments also on the design of
self-checking logic circuits. The second part will look in some detail
at one particular implementation (the JPL-Star computer) and concludes
with a brief survey of other proposals and implementations.

Architectures for fault-tolerant spacecraft computers. DAVID A. RENNELS.
Proc. IEEE 66, (10) 1255 (October 1978). This paper summarizes the
results of a long-term research program in fault-tolerant computing for
spacecraft on-board processing. In response to changing device
technology this program has progressed from the design of a
fault-tolerant uniprocessor to the development of fault-tolerant
distributed computer systems. The unusual requirements of spacecraft
computing are described along with the resulting real-time computer
architectures. The following aspects of these designs are discussed:
(1) architectural features to minimize complexity in the distributed
computer system, (2) fault-detection and recovery, (3) techniques to
enhance reliability and testability, and (4) design approaches for LSI
implementation.

Computerized approach for matrix-form FMEA. JOHN M. LEGG. IEEE Trans.
Reliab. R-27, (4) 254 (October 1978). This paper discusses a
computerized technique for preparing a matrix-form of failure modes and
effects analysis which has previously been completed using manual
methods. The basic input to the computer is a definition of each element
of the system, applicable failure modes, and the resultant effect of
each failure mode. From this input, the computer develops the matrix,
locates the intersections on the matrix between elements of the system
and effects, and formulates the text of the failure effects. The
principal advantages in using the computer include lower cost, greater
accuracy, faster preparation, overall consistency in the final effect
statements, and elimination of many repetitious and tedious tasks.

Designing reliable computer systems--the fault-tolerant approach--2.
R. G. BENNETTS. Electron. Power p. 51 (January 1979). This article is
the second part of a two-part tutorial article on the design of
fault-tolerant digital systems. The first part, published in November,
presented the general principles of fault tolerance and discussed the
use of redundancy to increase system reliability and availability. The
article also commented on the design of self-testing logic circuits. In
this second part, we concentrate more on