
Human error analysis and reliability assessment

Michael Harrison


Purpose of Human Error Assessment


- Explore difficulties of use early in design, with the aim of improving the design
  - hence comparable with other usability and walkthrough techniques
- Assess the likelihood of human error in a developed design as part of an assessment process
  - hence comparable with other reliability assessment techniques
- The method focused on here is qualitative (descriptive) rather than quantitative, but the role of quantitative techniques will also be discussed


CAA Draft Requirements and Advisory Material


It must be shown by analysis, substantiated where necessary by test, that as far as reasonably practicable all design precautions have been taken to prevent human errors in production, maintenance and operation causing hazardous or catastrophic effects.

Assessing Dependability
- Analysing and measuring dependability without assessing human reliability is at best incomplete and at worst misleading.
- Human dependability is not only an issue at the sharp end: it also involves maintenance crews, operator support, management teams and organisational personnel.
- However, it is still not part of core standards such as IEC 61508 (it tends to appear in ancillary documentation).

The human in the loop?


- Treat those aspects of the system as totally unreliable (or ignore them, which has been the approach in some sectors)?
  - Problem: this either fails to recognise safety issues or produces an absolute worst-case design.
- Provide probabilistic safety arguments at the same level as for the rest of the system, plugging into a failure modes and effects analysis.
- Provide a structured qualitative assessment: systematic consideration of the design of the system.

Problems with probabilities

- Validity
  - Probabilities are collected in the laboratory, in simulators or in the field.
  - Data are rarely available for the same system and the same circumstances, particularly for a new design.
- Feasibility
  - Failure rates of 10^-9 are required for catastrophic events; trials could never produce data at these levels where humans are involved, and certainly not for a new design.
- Range
  - 5 x 10^-5 to 5 x 10^-3 for automatic acts
  - 5 x 10^-4 to 5 x 10^-2 for rule-based acts
  - 5 x 10^-3 to 5 x 10^-1 for knowledge-based acts
- However, there are good reasons for taking a numerical approach: numbers are better than no numbers all of the time.

Difficulties with probabilities


- Engineering approach: the human is treated as a component in a complex system.
- Quantitative decomposition:
  - the mechanistic assumption: the human mind as a fallible machine;
  - the atomistic assumption: human performance can be adequately described by considering individual elements of the performance, and total performance is an aggregate of the individual performance elements (the dominant approach in HRA).
- Cognitive approach: explicit use of models and theories of the cognitive functions which underlie human behaviour.
  - Cognitive psychology is still immature.
  - Problem: human cognition is not directly observable.

Human reliability assessment


- A generic process covering many methods
- Used extensively in the nuclear power industry
- Usually a back-end assessment exercise
- Some issues of system description will be discussed here

The HRA process (adapted from Kirwan, 1994): problem definition; system description; task description; error identification; representation; screening required?; quantification; impact assessment; reliability acceptable?; QA / documentation. Factors influencing performance and error reduction feed back into the process.

HRA History
- 35 years of history; the pace accelerated after Three Mile Island.
- Interdisciplinary: based on, and copies approaches used in, reliability engineering; attempts to provide the same level of quantification.
- 1st generation HRA: simple manifestations (omissions, commissions, extraneous actions) plus behavioural categories (detection / diagnosis / execution) ==> less complexity in determining the associated probabilities; not ideal, but no clear alternatives.
- Increasing recognition today of the fundamental role and importance of human cognition.

Human error identification


- Behavioural and/or causal guide words are systematically applied to what people do (using tasks or scenarios as a starting point), imaginatively finding situations where there are problems.

Human hazard identification

- Identifying areas in the design where problems might arise.

Behavioural guide words

- The traditional approach of HRA, used for example by THERP and HAZOP.

Causal guide words

- Based on some model of what causes error; the approach used by THEA.

Behavioural guide words


Traditional HRA guidewords for error analysis (Swain & Guttmann, 1983) take a so-called phenotypical perspective as a starting point:

- Errors of omission: omit actions / sub-goals
- Errors of commission: substitute actions / sub-goals; carry out an action incorrectly; insert an extraneous action
- Errors of sequence: actions in the wrong order
- Errors of repetition: actions repeated unnecessarily
- Qualitative error: too much / too little
- Time error: too early / too late / too long

Examples
- Omission: the operator fails to close the valve.
- Commission: the operator turns the valve clockwise, thereby opening it wider rather than closing it.
- Commission (extraneous): instead of closing the isolation valve, the operator switches off the pump because the pump on/off switch is close to the isolation valve (doing the wrong thing).

Some problems of definition


- Example task: entering an altitude (ALT) value into the altitude alert (ALT ALERT) window. A substitution error could be:
  - doing something other than entering data;
  - entering data into a different device;
  - entering a distance value instead of the altitude.
- A commission error is not very constraining as a guide, because of the large number of possible substitutions.
- What is needed is more cognitive analysis for attributing error causes.

Error analysis method


- Start at a goal (task or scenario).
- Apply guide words to the goal and its corresponding plan.
- Record the guide word and the error generated in a table.
- Are the risks associated with the errors significant? If so, consider the sub-goals and repeat the process (a sketch of this recursive procedure follows the example below).
- Issues: which questions are applied to what?

[Table extract (columns: Action, Action type, Human error): L FIRE alert | acknowledge | not read (pilot unaware of warning); misread (...)]

Extract of a pilot procedure for a left engine failure during take-off (pilot task: deal with the fire).
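To make the recursive structure of this method concrete, here is a minimal sketch in Python. It is illustrative only: the Goal class, the significance test and the example goals are assumptions for the example, not part of THEA or of any standard HRA tool; the guide words are the behavioural set listed earlier.

```python
# Illustrative sketch of the recursive guide-word procedure described above.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

GUIDE_WORDS = ["omission", "commission", "sequence",
               "repetition", "qualitative", "time"]   # behavioural guide words

@dataclass
class Goal:
    name: str
    plan: str = ""                                     # ordering of sub-goals / actions
    subgoals: List["Goal"] = field(default_factory=list)

def analyse(goal: Goal,
            significant: Callable[[Goal, str], bool]) -> List[Tuple[str, str]]:
    """Apply each guide word to the goal and its plan, record the rows,
    and recurse into sub-goals where the associated risk is judged significant."""
    table: List[Tuple[str, str]] = []
    expand = False
    for word in GUIDE_WORDS:
        table.append((goal.name, word))                # record guide word applied here
        expand = expand or significant(goal, word)     # analyst's judgement of risk
    if expand:
        for sub in goal.subgoals:
            table.extend(analyse(sub, significant))
    return table

# Hypothetical example: deal with the L FIRE warning, expanding "omission" findings.
fire = Goal("Deal with engine fire", plan="acknowledge alert then run shutdown drill",
            subgoals=[Goal("Acknowledge L FIRE alert"), Goal("Run shutdown drill")])
for goal_name, word in analyse(fire, significant=lambda g, w: w == "omission"):
    print(f"{goal_name}: {word}")
```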


Error analysis
[Table extract (columns: Action, Action type, Human error): L FIRE alert | acknowledge | not read (pilot unaware of warning); misread (...)]

[Table extract (columns: Human error, Applicable design features): not reading L FIRE | highlight (flashing red caption, head-down)]

Assign HEPs from lookup tables (a sketch of how such per-action figures might be combined follows below):

- 1 x 10^-4 per action
- 1 x 10^-3 per action
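To give a sense of how such per-action HEPs are typically combined, the sketch below multiplies independent per-action success probabilities; the action names and values are hypothetical, and the independence assumption is exactly the atomistic assumption questioned later in these slides.

```python
# Illustrative only: combine per-action HEPs assuming independent failures.
# The action list and values are hypothetical, using the lookup-table figures
# quoted above (1e-4 and 1e-3 per action).
action_heps = {
    "acknowledge L FIRE alert": 1e-4,
    "read shutdown drill item": 1e-3,
    "operate throttle": 1e-3,
}

# P(task fails) = 1 - product of per-action success probabilities,
# which only holds if the atomistic / independence assumption is accepted.
p_success = 1.0
for hep in action_heps.values():
    p_success *= (1.0 - hep)
p_task_failure = 1.0 - p_success

print(f"Estimated task failure probability: {p_task_failure:.2e}")
```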


Further developments
- How might we make the questions richer and easier for non-experts to apply?
- Use a questionnaire based on the human causes of error.
- Focus questions around some classification of the aspects of system use where error might occur.
- Take account of contextual factors when asking the questions.

Requirements for THEA


- Possible for system engineers to use without specific human factors expertise
- Easy and efficient to apply, and sensible to use iteratively to refine a design
- Relevant to display and action as well as to goals and plans
- Not solely goal- and plan-focussed: takes account of context
- Descriptive rather than quantitative, although it is possible to indicate significance or severity
- A traceable technique

[THEA overview diagram: scenario, task and analysis feed the THEA questions (Question 1, Question 2, Question 3, ...); each question raises a causal issue; argumentation about its consequence leads either to "no problem" or to a problem, with mitigation, advice and design issues recorded.]

Representing the work context


- Usage scenarios represent the system in context.
- Questions about the design are focused through the way the artefact functions in the scenario.
- Take the specific and concrete rather than the abstract and general: bottom-up analysis rather than top-down.
- Scenarios describe: agents, rationale, situation and environment, task context, system context, action, and exceptional circumstances.

What is in a scenario?

Agents
- Flown by two flight deck crew (in contrast to the three currently present on the flight deck); the ECAM is the agent being analysed.

Rationale
- Involves activities in which, in the old system, the flight engineer was heavily involved.

Situation and environment
- Aircraft at low level (200 feet) during daytime, over water, photographing a fishing vessel.

Task context
- The crew must take immediate action to keep the aircraft flying, then commence the drills in response to the engine fire/failure and any secondary warnings.

System context
- This scenario is particularly concerned with the way recovery procedures are displayed by the ECAM. Here the ECAM displays the selected recovery procedures one at a time.

Action
- How are the tasks carried out in context? How do the activities overlap? Which goals do actions correspond to?

Exceptional circumstances
- How might the scenario evolve differently, either as a result of uncertainty in the environment or because of variations in agents, situation, design options, system and task context?


HTA task description in THEA format


Hierarchical goal structuring of scenario actions


[Goal decomposition diagram: scenario progression runs horizontally, goal decomposition vertically. Labels in the diagram: Maintain safe flight; Maintain airframe integrity; Warnings; Shut down engine 3; Shut down engine 4; Maintain & gain altitude; Reduce drag; PILOT: Increase power; Close BB doors; Flaps 0; Throttle 1 idle; Throttle 1 max; CO-PILOT: Engine 3 shutdown; Engine 4 shutdown; Engine 3 cleanup; Throttle 3 close; LP cock 3 close; Ext 3 fire shot 1; Cancel warnings; Switch warnings.]

Causal guidewords
A more cognitive approach to error analysis, using Norman's cyclic model of human information processing.

[Cycle diagram: goal formation -> intention -> planning -> action -> effects in the world -> perception -> interpretation -> evaluation, feeding back into goal formation.]

- The cycle may start with goal formation or with perception.
- Processes in the model may be more or less significant as a result of different styles of interaction (for example, direct manipulation or plan following).
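One way of seeing how causal guide words fall out of the cyclic model is to group candidate questions by the stage of the cycle they probe. The grouping and most of the example questions below are illustrative assumptions; only G1 (triggers, task initiation) and G3 (goal conflicts) appear verbatim in the completed questionnaire example later in these slides.

```python
# Hypothetical grouping of causal guide words around stages of Norman's cycle.
causal_guide_words = {
    "goals": [
        "G1: Are goals triggered adequately (task initiation)?",
        "G3: Can goals conflict with one another?",
    ],
    "plans": [
        "Can the plan be carried out in the intended order?",
    ],
    "actions": [
        "Can an action be omitted, repeated or substituted?",
    ],
    "perception / interpretation / evaluation": [
        "Is feedback on the effect of an action visible and timely?",
    ],
}

for stage, questions in causal_guide_words.items():
    print(stage)
    for q in questions:
        print("  -", q)
```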

Cognitive Error Analysis


[THEA analysis diagram (detail): scenario and task feed the THEA questions (Question 1, Question 2, Question 3, ...); each question raises causal issues and consequences.]

THEA Error Analysis questions 1 - 4


THEA Questionnaire


THEA error questionnaire completion example

THEA question G1 (Triggers, task initiation)
- Causal issues: some goals are poorly triggered, especially if there are several goals with only a single trigger on the display, e.g. Engine 4 shutdown or Engine 3 cleanup.
- Consequences: it is also possible that Engine 4 shutdown or Engine 3 cleanup might be omitted or delayed.

THEA question G3 (Goal conflicts)
- Causal issues: the goals to increase power and to shut down engine 3 are in conflict (although inevitable here).
- Consequences: resolving the conflict satisfactorily requires negotiation between pilot and co-pilot; the time required for this may lead to a non-optimal (too late) decision.

THEA Error Analysis questions 5 - 9


THEA Error Analysis questions 10 - 13


THEA Error Analysis questions 14 - 17


THEA Error Analysis questions 18 - 21


Using the checklist


- Not all items on the checklist are applicable in all situations.
- The style of interaction will vary and is influenced by:
  - type of user;
  - type of interface;
  - type of task.
- The checklist is meant to raise questions, not to provide definitive answers.
- It guides the analyst in a structured way to consider areas of the design for potential interaction difficulties.

Consequences and design issues


- Consequences of failure may be in terms of:
  - performance and the successful outcome of the scenario;
  - workload of the participants;
  - the state of the systems involved, including hazardous states.
- Design issues:
  - the analyst is provided with a space for documenting ideas about design changes that could ameliorate or avoid the problems identified.

[THEA overview diagram (repeated): scenario, task and analysis feed the THEA questions; each question raises a causal issue; argumentation about its consequence leads either to "no problem" or to a problem, with mitigation, advice and design issues recorded.]

Bringing it all together


[The THEA process: INPUTS (a detailed system description, usage scenarios and a human error model) feed the ERROR ANALYSIS (structure the scenarios using HTA and plans, then apply the error identification questionnaire), producing as OUTPUT suggestions for new requirements and implications for design.]
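A simple way to picture the output of this process is as a set of analysis records. The structure below is a hypothetical sketch: THEA prescribes the content (scenario, question, causal issue, consequence, design issue) rather than any particular representation, and the field names and example values here are invented for illustration.

```python
# Hypothetical representation of one THEA analysis record.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TheaRecord:
    scenario: str                         # usage scenario being analysed
    goal: str                             # goal/task from the HTA structuring
    question: str                         # THEA questionnaire item
    causal_issue: str                     # why an error might occur here
    consequence: str                      # effect on performance, workload or system state
    design_issue: Optional[str] = None    # candidate design change, if any

record = TheaRecord(
    scenario="Engine fire at low level over water",
    goal="Engine 4 shutdown",
    question="G1: Triggers, task initiation",
    causal_issue="Several goals share a single trigger on the display",
    consequence="Engine 4 shutdown or Engine 3 cleanup may be omitted or delayed",
    design_issue="Provide a distinct trigger per pending goal",  # invented example
)
print(record)
```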


ProtoTHEA
- A tool is available that supports the use of THEA.
- It generates an appropriate database and provides triggers for the questionnaire.
- Summaries include information about the coverage of questions against the HTA and about where concentrations of errors lie.
- Available as an informally supported system.

Where has THEA been applied?


- NATS: software maintenance and configuration
  - method used without our help
- BAE SYSTEMS companies: operating and maintenance procedures
  - case studies, but some independent analysis
  - some quantitative analysis for difficult situations
- BAE SYSTEMS companies: flight deck assessment
  - case studies
  - used in preliminary hazard assessment

Where is THEA deficient?


- THEA only deals with single-user computer interaction.
- Most work is actually performed in groups or teams, which are susceptible to a different type of error.
- No error analysis technique deals with the specific problems associated with collaborative work.
- Other EA methods don't help lead to design improvements.

Type of Work CHLOE Analyses (work done by Angela Miguel and Peter Wright)
Two types of collaboration are under consideration:
- Social collaborative: human agents collaborate directly between themselves to achieve joint goals.
- Technology-mediated collaborative: human agents, as before, collaborate to achieve mutual goals, but either by means of, or through, the technology medium itself.

Different types of collaborative system are involved, for example ATC and hospital work. These involve collaboration (coordination, communication, cooperation), with errors and failures caused by, for example, lack of awareness, misunderstandings between participants, conflicts and failures of coordination.

Aims of CHLOE
- An analysis technique specifically for collaborative errors.
- Takes an HCI/CSCW evaluation approach to analysis, based on failures within a cognitive model of collaboration.
- Usable by non-human-factors experts.
- An analysis that helps lead to re-design.
- Applied to quite mature systems to check vulnerability to failure (consistent with the HEA process).

The CHLOE Process

1. Scenario description (sequence diagrams)
2. Task identification (goal decomposition), to identify the units of analysis within the scenario
3. Error identification (21 error analysis questions), informed by a model of collaboration (a causal model)
4. Suggestions to improve the design

CHLOE: a technique for analysing collaborative systems.

Quantification
- Select some actions or subtasks that have been identified during human error identification.
- Attach numbers to the action or subtask; the smaller the unit of analysis, the more time-consuming the process.
  - THERP uses actions; HEART uses generic tasks.
- There is an ISSUE about how the numbers should be used.

Atomistic assumption appropriate?


- Assigning numbers to events identified in the human error identification phase raises safety engineering issues:
  - Can clearly separable actions be seen as unique causes with no interaction between events (whether at a goal level or an action level)?
  - Can human reliability really be viewed as an aggregation of parts? Failures usually result from a cascade of actions, whereas probability assessments view each action in isolation.
  - No account is taken of cognitive functions: the person is treated, unrealistically, as a black box.

Quantitative or Qualitative?
- Reliability analyses often neglect the importance of qualitative aspects.
- Qualitative and quantitative predictions are really two aspects of the same thing.
- Quantitative descriptions (e.g. a probability measure) are based on qualitative descriptions: quantities must be quantities of something previously described.
- The purpose is to identify the potential for human erroneous actions, especially where they are likely.
- Numbers work best when serving as tokens for negotiation of concerns.

[Diagram (adapted from Hollnagel, 1993): meaningful numbers relate to an identifiable structure (a model instantiation), which requires a theory and model, which must in turn be based on qualitative descriptions (the conceptual basis).]

Numbers and context


- The data which do exist are derived from the military and nuclear industries, and are known as Human Error Probabilities (HEPs). They represent probability estimates of general or universal (failure) characteristics of human performance.
- To modify these from nominal to actual situations, Performance Shaping Factors (PSFs) were devised. They represent specific context or task characteristics, and serve to:
  - compensate for the lack of appropriate empirical data;
  - compensate for the lack of context in a decomposition-based analysis.
- BUT: PSFs are often treated in a very simplistic way, and are an artefact derived from the decomposition principle in HRA.

Quantification techniques
- HEART: a human performance model-based technique utilising some standard probabilities.
  - J.C. Williams (1988), "A data-based method for assessing and reducing human error to improve operational performance", IEEE Fourth Conference on Human Factors and Power Plants, pp. 436-450.
- SLIM: a utility-based technique using team-based judgements.
- THERP: the earliest method.
- (Many more, mostly based on engineering models.)

Example approach: HEART


- HEART is employed to assess significant sequences within a scenario.
- A pre-processed HRA technique, assisting with:
  - identification of Error Producing Conditions (EPCs);
  - assessment of their importance;
  - calculation of the predicted probabilities of task failure.
- Based on a long-term, sizeable human reliability database; weighting factors are based on the human factors literature.
- Assumes that human performance usually deteriorates when EPCs interact (e.g. a conflict of objectives plus a shortage of time).

HEART generic categories (after Williams, 1986)


Generic task: nominal human unreliability (5th-95th percentile bounds)

(A) Totally unfamiliar, performed at speed with no real idea of likely consequences: 0.55 (0.35-0.97)
(B) Shift or restore system to a new or original state on a single attempt without supervision or procedures: 0.26 (0.14-0.42)
(C) Complex task requiring a high level of comprehension and skill: 0.16 (0.12-0.28)
(D) Fairly simple task performed rapidly or given scant attention: 0.09 (0.06-0.13)
(E) Routine, highly practised, rapid task involving a relatively low level of skill: 0.02 (0.007-0.045)
(F) Restore or shift a system to original or new state following procedures, with some checking: 0.003 (0.0008-0.007)
(G) Completely familiar, well-designed, highly practised, routine task occurring several times per hour, performed to the highest possible standards by highly motivated, highly trained and experienced personnel, with time to correct potential error, but without the benefit of significant job aids: 0.0004 (0.00008-0.009)
(H) Respond correctly to system command even when there is an augmented or automated supervisory system providing accurate interpretation of system state: 0.00002 (0.000006-0.0009)
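For the calculations later in these slides it is convenient to keep the central values above in a small lookup; this representation is just one possible form (the percentile bounds are omitted).

```python
# Nominal human unreliability per HEART generic task category
# (central values from the table above; bounds omitted).
HEART_NOMINAL = {
    "A": 0.55, "B": 0.26, "C": 0.16, "D": 0.09,
    "E": 0.02, "F": 0.003, "G": 0.0004, "H": 0.00002,
}
print(HEART_NOMINAL["F"])  # generic task used in the lifeboat case study below
```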

Error producing Conditions (EPCs) [1]


Error producing condition: maximum predicted multiplier of nominal unreliability

1. Unfamiliarity with a situation which is potentially important but which only occurs infrequently or which is novel: x17
2. A shortage of time available for error detection and correction: x11
3. A low signal-noise ratio: x10
4. A means of suppressing or over-riding information or features which is too easily accessible: x9
5. No means of conveying spatial and functional information to operators in a form which they can readily assimilate: x8
6. A mismatch between an operator's model of the world and that imagined by the designer: x8
7. No obvious means of reversing an unintended action: x8
8. A channel capacity overload, particularly one caused by simultaneous presentation of non-redundant information: x6

Error producing Conditions (EPCs) [2]


9. A need to unlearn a technique and apply one which requires the application of an opposing philosophy: x6
10. The need to transfer specific knowledge from task to task without loss: x5.5
11. Ambiguity in the required performance standards: x5
12. A means of suppressing or over-riding information or features which is too easily accessible: x4
13. A mismatch between perceived and real risk: x4
14. No clear, direct and timely confirmation of an intended action from the portion of the system over which control is exerted: x4
15. Operator inexperience (e.g. a newly qualified tradesman, but not an expert): x3
16. An impoverished quality of information conveyed by procedures and person-person interaction: x3

Error producing Conditions (EPCs) [3]


17. Little or no independent checking or testing of output: x3
18. A conflict between immediate and long-term objectives: x2.5
19. Ambiguity in the required performance standards: x2.5
20. A mismatch between the educational achievement level of an individual and the requirements of the task: x2
21. An incentive to use other more dangerous procedures: x2
22. Little opportunity to exercise mind and body outside the immediate confines of the job: x1.8
23. Unreliable instrumentation (enough that it is noticed): x1.6
24. A need for absolute judgements which are beyond the capabilities or experience of an operator: x1.6

Error producing Conditions (EPCs) [4]


25. Unclear allocation of function and responsibility: x1.6
26. No obvious way to keep track of progress during an activity: x1.4
27. A danger that finite physical capabilities will be exceeded: x1.4
28. Little or no intrinsic meaning in a task: x1.4
29. High-level emotional stress: x1.3
30. Evidence of ill-health amongst operatives, especially fever: x1.2
31. Low workforce morale: x1.2
32. Inconsistency of meaning of displays and procedures: x1.2

Error producing Conditions (EPCs) [5]


33. A poor or hostile environment: x1.15
34. Prolonged inactivity or highly repetitious cycling of low mental workload tasks: x1.1 (first half hour), x1.05 thereafter
35. Disruption of normal work-sleep cycles: x1.1
36. Task pacing caused by the intervention of others: x1.06
37. Additional team members over and above those necessary to perform the task normally and satisfactorily: x1.03 per additional team member
38. Age of personnel performing perceptual tasks: x1.02


Error-producing conditions (EPCs)


Identify the task EPCs considered to have a negative influence on human performance, then add the amount by which each EPC modifies the unreliability. In the case study, for example:

(17) Little or no independent checking or testing of output: maximum predicted multiplier x3
(33) A hostile environment: maximum predicted multiplier x1.15

Case study example

Operator deckside tasks for a quick-release lifeboat:


1. Remove the lifeboat tarpaulin and the two safety anchoring bolts
2. Physically stow the bolts in a pre-designated central location
3. Check that the bolts are indeed stowed
4. Press the release-lifeboat detonator button

CONCERN: what is the probability that the operator will attempt to launch the lifeboat without first removing the safety anchoring bolts?

Application of HEART to case study


Generic task (F), shift the system to a new state using procedures: nominal unreliability 0.003.

Task 1: operator removes the lifeboat tarpaulin and safety anchoring bolts.

Error producing condition: total HEART effect (E), assessed proportion (P), assessed effect (>= 1) = ((E - 1) x P) + 1
- 2. Shortage of time: E = 11.00, P = 0.10, assessed effect = 2.00
- 13. Poor feedback: E = 4.00, P = 0.10, assessed effect = 1.30
- 17. No independent check: E = 3.00, P = 0.20, assessed effect = 1.40
- 27. Physical capabilities exceeded: E = 1.40, P = 0.05, assessed effect = 1.02
- 29. Emotional stress: E = 1.30, P = 0.40, assessed effect = 1.12
- 33. Hostile environment: E = 1.15, P = 0.50, assessed effect = 1.08
- 35. Disruption of sleep: E = 1.10, P = 0.10, assessed effect = 1.01

Assessed probability of failure (Task 1) = 0.003 x 2.00 x 1.30 x 1.40 x 1.02 x 1.12 x 1.08 x 1.01 = 0.014 (the arithmetic is reproduced in the sketch below).

Task 2: stow the bolts in a pre-designated central location; assessed probability of failure = 0.378.
Task 3: check the bolts are stowed prior to pressing the detonator button; assessed probability of failure = 0.397.
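As a check on the arithmetic above, the sketch below reproduces the Task 1 calculation: each EPC contributes an assessed effect ((E - 1) x P) + 1, and the effects multiply into the nominal unreliability for the generic task. The function names are illustrative; the formula and numbers are those shown in the table.

```python
# Reproduces the Task 1 HEART calculation above.
def assessed_effect(e, p):
    """HEART assessed effect for one EPC: ((E - 1) * P) + 1."""
    return (e - 1.0) * p + 1.0

def heart_probability(nominal, epcs):
    """Multiply the nominal unreliability by each EPC's assessed effect."""
    prob = nominal
    for e, p in epcs:
        prob *= assessed_effect(e, p)
    return prob

task1_epcs = [
    (11.00, 0.10),  # 2. shortage of time
    (4.00, 0.10),   # 13. poor feedback
    (3.00, 0.20),   # 17. no independent check
    (1.40, 0.05),   # 27. physical capabilities exceeded
    (1.30, 0.40),   # 29. emotional stress
    (1.15, 0.50),   # 33. hostile environment
    (1.10, 0.10),   # 35. disruption of sleep
]
print(f"Task 1 failure probability: {heart_probability(0.003, task1_epcs):.3f}")
# prints approximately 0.014
```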


SLIM: Success Likelihood Index Method


- Problems with HEART: a small database of numbers, highly susceptible to expert judgement; the proportions tend to dominate.
- SLIM is discussed here to indicate how the numbers issue might be dealt with; there is no suggestion that you should use it, but the process of generating the numbers might be valuable.
- SLIM addresses the numerical issue but is probably more susceptible to expert judgement.
- Based on a meeting involving an expert panel (for example, two operators with a minimum of 10 years' experience, one human factors analyst and one reliability analyst).
- Calculates a success likelihood index from performance shaping factor ratings.
- Converts SLIs into probabilities.

SLIM: Success Likelihood Index Method


- Not based on tables of human performance data, but rather on similarity with comparable situations.
- Based on a meeting involving an expert panel (for example, two operators with a minimum of 10 years' experience, one human factors analyst and one reliability analyst).
- Calculates a success likelihood index from performance shaping factor ratings.
- Converts SLIs into probabilities (see the sketch below).
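To make the last two steps concrete, here is a minimal sketch of one common SLIM-style calculation: a weighted sum of PSF ratings gives the success likelihood index (SLI), and a log-linear calibration against two tasks of agreed HEP converts SLIs into probabilities. The weights, ratings and calibration points are invented, and the log-linear form is an assumption about the variant of SLIM intended here.

```python
# Illustrative SLIM-style calculation; all numbers are invented for the example.
import math

def sli(weights, ratings):
    """Success likelihood index: weighted sum of PSF ratings (weights sum to 1)."""
    return sum(weights[psf] * ratings[psf] for psf in weights)

def calibrate(sli1, hep1, sli2, hep2):
    """Fit log10(HEP) = a * SLI + b from two tasks with agreed HEPs."""
    a = (math.log10(hep1) - math.log10(hep2)) / (sli1 - sli2)
    b = math.log10(hep1) - a * sli1
    return a, b

weights = {"time pressure": 0.4, "experience": 0.3, "feedback": 0.3}
ratings = {"time pressure": 0.2, "experience": 0.7, "feedback": 0.5}  # 0 = worst, 1 = best
task_sli = sli(weights, ratings)

# Two reference tasks with agreed HEPs anchor the scale.
a, b = calibrate(sli1=0.9, hep1=1e-4, sli2=0.1, hep2=1e-1)
hep = 10 ** (a * task_sli + b)
print(f"SLI = {task_sli:.2f}, estimated HEP = {hep:.2e}")
```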

Summary
- Introduced a technique for human error identification based on a cognitive model of human behaviour, including a checklist for assessing the complexity of the interface of the system.
- A brief introduction to issues of quantification:
  - quantification should be considered with extreme caution;
  - HEART is based on data which are highly dependent on contextual factors.
- CREAM combines attributes of THEA and HEART, but has an initial phase in which the control mode and generic reliability characteristics are derived.
