Setting A New Standard in Alarm Management
How to follow the ISA 18.2 alarm management standard to create a safer and more productive plant
www.sea.siemens.com
www.usa.siemens.com/process
Summary
A white paper issued by Siemens. ©2010 Siemens Industry, Inc. All rights reserved.
White Paper | Alarm Management | January 2010 2
Contents
2.0 What is the new ISA standard and why was it created
4.0 Overview of the standard and how to comply/follow with PCS 7
4.1 Philosophy
4.2 Identification and rationalization
4.3 Detailed design
4.4 Implementation
4.5 Operation and maintenance
4.6 Monitoring and assessment
4.7 Management of change
4.8 Audit
6.0 Conclusion
7.0 References
The global recession hurt the bottom line for many manufacturers in the process industries. Focusing on operational excellence
is a key to short-term survival and to future growth. Poor alarm management is a major barrier to reaching operational
excellence. It is one of the leading causes of unplanned downtime, which can cost $10K/hr to $1M/hr for facilities that run
24 x 7. It also impacts the safety of a plant and its personnel, having played a major part in the accidents at Three Mile Island
(PA), the Milford Haven Refinery (UK), Texas City Refinery (TX), and the Buncefield Oil Depot (UK), which all resulted in
significant costs: injury, loss of life, equipment and property damage, fines, and damage to company reputations.
At the Buncefield Oil Depot, a tank overflow and resultant fire caused a $1.6B loss. It could have been prevented if the tank’s
level gauge or high level safety switch had notified the operator of the high level condition [1]. The explosion and fire at the Texas
City refinery killed 15 people and injured 180 more. It might not have occurred if key level alarms had notified the
operators of the unsafe and abnormal conditions that existed within the tower and blowdown drum [2].
In June of 2009 the standard ANSI/ISA-18.2-2009, “Management of Alarm Systems for the Process Industries”, was released.
This paper reviews ISA-18.2 and describes how it impacts end users, suppliers, integrators, and consultants. It also provides
examples of the tools, practices, and procedures that make it easier to follow the standard and reap the rewards of improved
alarm management.
ISA-18.2 provides a framework for the successful design, implementation, operation, and management of alarm systems in
a process plant. It builds on the work of other standards and guidelines such as EEMUA 191, NAMUR NA 102, and ASM (the
Abnormal Situation Management Consortium). Alarm management is not a “once and done” activity; rather, it is a process that
requires continuous attention. Consequently, the standard is built around a life-cycle approach, as shown in Figure 1.
The connection between poor alarm management and process safety accidents was one of the motivations for the
development of ISA-18.2. Both OSHA and the HSE have identified the need for improved industry practices to prevent these
incidents. Consequently, ISA-18.2 is expected to be “recognized and generally accepted good engineering practice” (RAGAGEP)
by both insurance companies and regulatory agencies. As such, it becomes the expected minimum practice.
Figure 1 Alarm management life-cycle of ISA-18.2 (stages: A Philosophy, B Identification, C Rationalization, D Detailed Design, E Implementation, F Operation, G Maintenance, H Monitoring & Assessment, I Management of Change, J Audit)
Reviewing the definition of an alarm is helpful to understand its intended purpose and how misapplication can lead to
problems.
Alarm: An audible and/or visible means of indicating to the operator an equipment malfunction, process deviation, or
abnormal condition requiring a response.
One of the most important principles of alarm management is that an alarm requires a response. This means if the operator
does not need to respond to an alarm (because unacceptable consequences do not occur), then the point should not include an
alarm. Following this cardinal rule will help eliminate many potential alarm management issues. The recommendations in the
standard provide the “blueprint” for eliminating and preventing the most common alarm management problems, such as those
shown in Table 1.
Table 1 – Common alarm management problems that can be addressed by following the alarm management life-cycle of ISA-18.2
Next, these candidate alarms are rationalized, which means each one is evaluated with a critical eye to justify that it meets the
requirements of being an alarm.
Alarms that pass this screening are further analyzed to define their attributes (e.g. limit, priority, classification, and type).
Alarm priority should be set based on the severity of the consequences and the time to respond. Classification identifies
groups of alarms with similar characteristics (e.g. environmental or safety) and common requirements for training, testing,
documentation, or data retention. Safety alarms coming from a Safety Instrumented System (SIS) are typically classified as
“highly managed alarms”. These alarms should receive special treatment particularly when it comes to viewing their status in
the HMI.
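As an illustration of how a priority could be derived from those two inputs, the following Python sketch uses a simple lookup grid. The severity categories, response-time bands, and resulting priorities are hypothetical placeholders chosen for this example; they are not taken from ISA-18.2, and each site would define its own values in its alarm philosophy.

```python
# Hypothetical priority grid: rows are consequence severity, columns are
# how quickly the operator must respond. All values are illustrative only.
RESPONSE_BANDS = ("over_30_min", "10_to_30_min", "under_10_min")

PRIORITY_GRID = {
    "minor":  ("low", "low", "medium"),
    "major":  ("low", "medium", "high"),
    "severe": ("medium", "high", "high"),
}


def assign_priority(severity: str, response_band: str) -> str:
    """Look up the alarm priority for a severity / time-to-respond pair."""
    if severity not in PRIORITY_GRID:
        raise ValueError(f"unknown severity: {severity}")
    if response_band not in RESPONSE_BANDS:
        raise ValueError(f"unknown response band: {response_band}")
    return PRIORITY_GRID[severity][RESPONSE_BANDS.index(response_band)]
```

Keeping the grid in one table makes the rule easy to review during rationalization and easy to apply consistently to every alarm.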
Alarm attributes (i.e. settings) are documented in a Master Alarm Database, which also records important details discussed
during rationalization - the cause, consequence, recommended operator response, and the time to respond for each alarm. This
information is used during many phases of the life-cycle. For example, many plant operations and engineering teams are afraid
to eliminate an existing alarm because it was “obviously put there for a reason”. With the Master Alarm Database, one can look
back years afterward and see why a specific alarm was created (and evaluate whether it should remain).
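A minimal sketch of what one Master Alarm Database entry might hold is shown below in Python. The field names, the in-memory dictionary, and the helper functions are assumptions made for illustration; they do not represent the PCS 7 data model or a format required by the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRecord:
    """One rationalized alarm (hypothetical field names)."""
    tag: str                    # instrument tag, e.g. "LT-101"
    alarm_type: str             # e.g. "high", "low-low"
    limit: float
    priority: str
    classification: str         # e.g. "safety", "environmental"
    cause: str
    consequence: str
    operator_response: str
    time_to_respond_min: float


# In-memory stand-in for the Master Alarm Database, keyed by (tag, type).
master_alarm_db = {}


def add_record(record: AlarmRecord) -> None:
    master_alarm_db[(record.tag, record.alarm_type)] = record


def rationale(tag: str, alarm_type: str) -> str:
    """Answer the question 'why was this alarm created?'."""
    r = master_alarm_db[(tag, alarm_type)]
    return (f"{r.tag} {r.alarm_type}: cause={r.cause}; "
            f"consequence={r.consequence}; response={r.operator_response}")
```

Because the cause, consequence, and response travel with the settings, the same record can serve the engineer reviewing the alarm years later and the operator who needs guidance online.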
Documentation about an alarm’s cause and consequence can be invaluable to the operator who must diagnose the problem
and determine the best response. The system should allow the alarm rationalization information to be entered directly into the
configuration (e.g. as an alarm attribute) so that it is part of the control system database and so that it can be made available to
the operator online through the HMI.
Figure 3 Entering cause, corrective action information from rationalization directly into PCS 7
One of the major benefits of conducting a rationalization is determining the minimum set of alarm points that are needed to
keep the process safe and under control. Too many projects follow an approach where the practitioner enables all of the alarms
that are provided by the DCS, whether they are needed or not, and sets them to default limits of 10%, 20%, 80%, and 90%
of range. A typical analog indicator can have six or more different alarms configured (e.g. high-high, high, low, low-low, bad
quality, rate-of-change, etc.), making it easy to end up with significantly more alarm points than are needed. To prevent the
creation of nuisance alarms and alarm overload conditions, it is important to enable only those alarms that are called for after
completing a rationalization. Thus an analog indicator, for example, may have only a single alarm condition enabled (e.g. high).
During the detailed design phase, the information contained in the Master Alarm Database (such as alarm limit and priority) is
used to configure the system. Alarm settings should be copied and pasted or imported from the Master Alarm Database directly
into the control system configuration to prevent configuration errors. Spreadsheet style engineering tools can help speed the
process, especially if they allow editing attributes from multiple alarms simultaneously. If the control system configuration
supports the addition of user-defined fields, it may be capable of fulfilling the role of the Master Alarm Database itself.
Figure 4 Spreadsheet-style interface for bulk transfer of alarm settings from the Master Alarm Database
Following the recommendations for alarm deadbands and on-off delays from the standard (shown in Table 2) can help
prevent “nuisance” alarms during operation. A study by the ASM found that the use of on-off delays in combination with other
configuration changes was able to reduce the alarm load on the operator by 45-90% [4].
Table 2. Recommended starting points for alarm deadbands and delay timers [3]
Note: Proper engineering judgement should be used when setting deadbands and delay times.
Configuration of alarm deadband (hysteresis), which is the change in signal from the alarm setpoint necessary to clear the
alarm, can be optimized by a system that displays settings from multiple alarms at the same time, allowing them to be edited in
bulk. This capability also makes it easy to review and update the settings after the system has been operating as recommended
by the standard. Similar tools and procedures can be used to configure the on/off delay, which is the time that a process
measurement remains in the alarm/normal state before the alarm is annunciated/cleared.
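The interaction of deadband and on/off delay can be sketched in a few lines of Python. This is an illustrative model of a single high alarm, not the PCS 7 implementation; the class name, parameter names, and the use of seconds as the time unit are assumptions.

```python
class HighAlarm:
    """High alarm with deadband (hysteresis) and on/off delay timers."""

    def __init__(self, limit, deadband=0.0, on_delay=0.0, off_delay=0.0):
        self.limit = limit
        self.deadband = deadband
        self.on_delay = on_delay      # seconds above limit before annunciating
        self.off_delay = off_delay    # seconds below clear point before clearing
        self.active = False
        self._pending_since = None    # when the pending state change began

    def update(self, value, now):
        """Process one sample taken at time `now` (seconds); return alarm state."""
        if self.active:
            # To clear, the signal must fall below the limit minus the deadband.
            wants_change = value < self.limit - self.deadband
            delay = self.off_delay
        else:
            wants_change = value > self.limit
            delay = self.on_delay

        if not wants_change:
            self._pending_since = None            # condition broken: reset timer
        else:
            if self._pending_since is None:
                self._pending_since = now
            if now - self._pending_since >= delay:
                self.active = not self.active
                self._pending_since = None
        return self.active
```

With a limit of 80, a deadband of 2, and 5-second delays, a brief spike to 81 does not annunciate, and once the alarm is active a dip to 79 does not clear it; the signal must stay below 78 for the full off delay.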
The design of the human machine interface (HMI) is critical for enabling the operator to detect, diagnose, and respond to an
alarm within the appropriate timeframe. The proper use of color, text, and patterns directly affects the operator’s performance.
Since 8-12% of the male population is color blind, it is important to follow the design recommendations shown in Table 3 to
ensure that changes in alarm state (normal, acknowledged, unacknowledged, suppressed) are easily detected.
Alarm State | Audible Indication | Visual Indication: Color | Visual Indication: Symbol | Visual Indication: Blinking
Normal | No | No | No | No
Unacknowledged (New) Alarm | Yes | Yes | Yes | Yes
Acknowledged Alarm | No | Yes | Yes | No
Return to Normal State Indication | No | Optional | Optional | Optional
Unacknowledged Latched Alarm | Yes | Yes | Yes | Yes
Acknowledged Latched Alarm | No | Yes | Yes | No
Shelved Alarm | No | Optional | Optional | No
Designed Suppression Alarm | No | Optional | Optional | No
Out of Service Alarm | No | Optional | Optional | No
Table 3 ISA-18.2 Recommended alarm state indications [3]
Symbols and faceplates provided with the system should comply with ISA-18.2’s recommendations. Figure 5 shows an example
where the unacknowledged alarm state can be clearly distinguished from the normal state by using both color (yellow box)
and symbol (the letter “W”). This ensures that even a color blind operator can detect the alarm. The Out-of-Service state is also
clearly indicated.
The standard recommends that the HMI should make it easy for the operator to navigate to the source of an alarm (single click)
and provide powerful filtering capability within an alarm summary display.
Advanced alarming techniques can improve performance by ensuring that operators are presented with alarms only when they
are relevant. Additional layers of logic, programming, or modeling are configured to modify alarm attributes or suppression
state dynamically. One method described in ISA-18.2 is state-based alarming, wherein alarm attributes are modified based on
the operating state of the plant or a piece of equipment.
State-based alarming can be applied to many situations. It can suppress a low flow alarm from the operator when it is caused
by the trip of an associated pump. It can mask alarms coming from a unit or area that is shut down. In batch processes it can
change which alarms are presented to the operator based on the phase (e.g. running, hold, abort) or based on the recipe.
One of the most challenging situations for an operator is dealing with the flood of alarms that occurs during a major plant upset.
When a distillation column crashes, tens to hundreds of alarms may be generated. To help the operator respond quickly and
correctly, the system should be able to hide all but the most significant alarms during the upset. For example, logic in the
controller can determine the state of the column. The state parameter could then be used to determine which alarms should be
presented to the operator based on a pre-configured state matrix, such as that shown in Figure 6.
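The idea of a state matrix can be sketched in Python as follows. The column states, alarm names, and matrix contents are hypothetical and chosen only to show the mechanism; a real matrix would come out of rationalization and would run in the controller or alarm system rather than in a script.

```python
# Hypothetical state matrix for a distillation column: for each operating
# state, the set of alarms that should still be presented to the operator.
STATE_MATRIX = {
    "running":  {"FEED_FLOW_LO", "REFLUX_FLOW_LO", "TOP_PRESS_HI", "BOTTOM_LEVEL_HI"},
    "upset":    {"TOP_PRESS_HI", "BOTTOM_LEVEL_HI"},
    "shutdown": {"BOTTOM_LEVEL_HI"},
}


def split_by_state(active_alarms, column_state):
    """Return (presented, suppressed) alarm lists for the given column state."""
    allowed = STATE_MATRIX[column_state]
    presented = [a for a in active_alarms if a in allowed]
    suppressed = [a for a in active_alarms if a not in allowed]
    return presented, suppressed
```

The suppressed alarms are returned rather than discarded so that they remain available for logging and later review.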
The standard also documents what should be included in an alarm response procedure. The information fleshed out during
rationalization, such as an alarm’s cause, potential consequence, corrective action, and the time to respond, should be made
available to the operator. Ideally this information should be displayed online rather than in written form.
Effective transfer of alarm status information between shifts is important in many facilities. The operator coming on shift in
Texas City was provided with a three-line entry in the operator logbook, which left him ill-prepared to address the situation leading up to
the explosion. To improve shift transition, the system should allow operators to record comments for each alarm.
Maintenance is the stage where an alarm is taken out-of-service for repair, replacement, or testing. The standard describes the
procedures that must be followed, including documenting why an alarm was removed from service, the details concerning
interim alarms, special handling procedures, as well as what testing is required before it is put back into service. The standard
requires that the system be able to show a complete list of alarms that are currently out-of-service. As a safety precaution, this
list should be reviewed before putting a piece of equipment back into operation to ensure that all of the necessary alarms are
operational.
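The pre-start review described above amounts to intersecting two lists. The Python sketch below assumes that the alarms associated with a piece of equipment and the current out-of-service list are both available as collections of tag names; the function names and tags are illustrative.

```python
def blocking_alarms(equipment_alarms, out_of_service):
    """Alarms needed by the equipment that are still out of service."""
    return sorted(set(equipment_alarms) & set(out_of_service))


def ready_to_restart(equipment_alarms, out_of_service):
    """True only if none of the equipment's alarms are out of service."""
    return not blocking_alarms(equipment_alarms, out_of_service)
```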
The standard describes three possible methods for alarm suppression, which is any mechanism used to prevent the indication
of the alarm to the operator when the base alarm condition is present. All three methods have a place in helping to optimize
performance.
Suppression Method Per ISA-18.2 | Definition | Relevant Phase
Shelving | A mechanism, typically initiated by the operator, to temporarily suppress an alarm | Operations
Suppressed by Design | Any mechanism within the alarm system that prevents the transmission of the alarm indication to the operator based on plant state or other conditions | Advanced Alarm Design
Out-of-Service | The state of an alarm during which the alarm indication is suppressed, typically manually, for reasons such as maintenance | Maintenance
ISA-18.2 recommends using no more than three or four different alarm priorities in the system. To help operators know which
alarms are most important so they can respond correctly, it is recommended that no more than 5% of the alarms be configured
as high priority. The system should make it easy to review the configured alarm priority distribution, for example, by exporting
alarm information to a .csv file for analysis in MS Excel.
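The same check can be scripted instead of done in a spreadsheet. The Python sketch below assumes the exported .csv file has a column named `priority` containing values such as `high`, `medium`, and `low`; the column name and priority labels are assumptions, since the actual export format depends on the system.

```python
import csv
import io
from collections import Counter


def priority_distribution(csv_text):
    """Return {priority: percent of configured alarms} from exported CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["priority"].strip().lower() for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {priority: 100.0 * n / total for priority, n in counts.items()}


def high_priority_within_guideline(distribution, max_high_pct=5.0):
    """Check the 'no more than 5% high priority' recommendation."""
    return distribution.get("high", 0.0) <= max_high_pct
```

To analyze a file on disk, its contents could be read with `open(path, newline="").read()` and passed to `priority_distribution`.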
Analysis should also include identifying nuisance alarms, which are alarms that annunciate excessively or unnecessarily, or that do
not return to normal after the correct response is taken (e.g., chattering, fleeting, or stale alarms). The system should have the
capability of calculating and displaying statistics, such as alarm frequency, average time in alarm, time between alarms, and
time before acknowledgement. It is not uncommon for the majority of alarms (up to 80%) to originate from a small number of
tags (10 – 20). This frequency analysis makes it easy to identify these “bad actors” and fix them. The “average time in alarm”
metric can help identify chattering alarms, which are alarms that repeatedly transition between the alarm state and the normal
state in a short period of time.
Figure 9 Pinpointing nuisance alarms from an alarm frequency display in the HMI
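For sites that export the alarm journal, the frequency analysis can be approximated with a short Python script. The event format (a list of tag and activation-time pairs) is an assumption, and the chattering rule used here (three or more activations within 60 seconds) is an illustrative default rather than a threshold stated in this paper. It flags chattering by counting rapid repeat activations rather than by the average time in alarm.

```python
from collections import Counter


def bad_actors(events, top_n=10):
    """events: iterable of (tag, activation_time_s).
    Returns [(tag, count, percent_of_all_alarms)] for the most frequent tags."""
    counts = Counter(tag for tag, _ in events)
    total = sum(counts.values())
    return [(tag, n, 100.0 * n / total) for tag, n in counts.most_common(top_n)]


def chattering_tags(events, min_count=3, window_s=60.0):
    """Tags that activate at least `min_count` times within `window_s` seconds."""
    times_by_tag = {}
    for tag, t in events:
        times_by_tag.setdefault(tag, []).append(t)

    flagged = set()
    for tag, times in times_by_tag.items():
        times.sort()
        # Slide a window of `min_count` consecutive activations over the list.
        for i in range(len(times) - min_count + 1):
            if times[i + min_count - 1] - times[i] <= window_s:
                flagged.add(tag)
                break
    return flagged
```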
Another key objective of the Monitoring & Assessment phase is to identify stale alarms, which are those alarms that remain in
the alarm state for an extended period of time (> 24 hours). The system should allow the alarm display to be filtered, based on
time in alarm, in order to create a stale alarm list. Alarm display filters should be savable and reusable so that on-demand reports
can be easily created. All information contained in the alarm display should be exportable for ad-hoc analysis.
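A stale alarm list is essentially a filter on time in alarm. The Python sketch below assumes the currently active alarms are available as a mapping from tag to the time the alarm became active; that data layout and the tag names in any usage are hypothetical.

```python
from datetime import datetime, timedelta


def stale_alarms(active_alarms, now, threshold=timedelta(hours=24)):
    """active_alarms: dict of tag -> datetime the alarm became active.
    Returns [(tag, time_in_alarm)] for alarms active longer than `threshold`,
    longest-standing first."""
    stale = [(tag, now - since)
             for tag, since in active_alarms.items()
             if now - since > threshold]
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Exposing the threshold as a parameter mirrors the idea of a saved, reusable display filter: the same function can produce a 24-hour list, a 7-day list, and so on.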
All changes made through the HMI should be automatically recorded with the date / time stamp, “from” and “to” values, along
with who made the change. The system should provide the capability to set up access privileges (such as who can acknowledge
alarms, modify limits, or disable alarms) on an individual and a group basis. It is also important to prevent unauthorized
configuration changes from the engineering station.
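The combination of access privileges and an automatic change record can be illustrated with a small Python sketch. The role names, permission names, and record fields are assumptions for this example and do not describe how any particular control system stores its audit trail.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeRecord:
    timestamp: datetime
    user: str
    tag: str
    attribute: str
    old_value: float    # the "from" value
    new_value: float    # the "to" value


# Hypothetical group-level privileges.
PERMISSIONS = {
    "operator": {"acknowledge"},
    "engineer": {"acknowledge", "modify_limit", "disable"},
}


def change_limit(alarm_limits, tag, new_limit, user, role, log, now):
    """Change an alarm limit if the role allows it, and record the change."""
    if "modify_limit" not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{user} ({role}) may not modify alarm limits")
    log.append(ChangeRecord(now, user, tag, "limit", alarm_limits[tag], new_limit))
    alarm_limits[tag] = new_limit
```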
It is good practice to periodically compare the actual running alarm system configuration to the Master Alarm Database
to ensure that no unauthorized configuration changes have been made. The system should provide tools to facilitate this
comparison in order to make it easy to discover differences (e.g. alarm limit has been changed from 10.0 to 99.99). These
differences can then be corrected to ensure consistency and traceability.
Figure 10 Tools for comparing the online system to the Master Alarm Database
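Where no built-in comparison tool is available, the check can be approximated by exporting both sets of settings and comparing them. The Python sketch below assumes each side is a dictionary keyed by (tag, alarm type) whose values are dictionaries of attribute names and values; this layout is an assumption made for illustration.

```python
def find_discrepancies(master, running):
    """Compare Master Alarm Database settings with the running configuration.
    Returns a list of (key, attribute_or_issue, expected, actual) tuples."""
    diffs = []
    for key in sorted(set(master) | set(running)):
        if key not in running:
            diffs.append((key, "missing from running system", None, None))
        elif key not in master:
            diffs.append((key, "not in Master Alarm Database", None, None))
        else:
            for attr in sorted(set(master[key]) | set(running[key])):
                expected = master[key].get(attr)
                actual = running[key].get(attr)
                if expected != actual:
                    diffs.append((key, attr, expected, actual))
    return diffs
```

Alarms that exist on only one side are reported as well, since an alarm that was added or deleted without authorization is as significant as a changed limit.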
1) Develop an alarm philosophy document to establish the standards for how your organization will do alarm management.
2) Rationalize the alarms in the system to ensure that every alarm is necessary, has a purpose, and follows the cardinal rule –
that it requires an operator response.
3) Analyze and benchmark the performance of the system and compare it to the recommended metrics in ISA-18.2. Start by
identifying nuisance alarms, which can be addressed quickly and easily – this rapid return on investment may help justify
additional investment in other alarm management activities.
4) Implement Management of Change. Review access privileges and install tools to facilitate periodic comparisons of the actual
configuration vs. the Master Alarm Database.
5) Audit the performance of the alarm system. Talk with the operators about how well the system supports them. Do they
know what to do in the event of an alarm? Are they able to quickly diagnose the problem and determine the corrective
action? Also, analyze their ability to detect, diagnose, and respond correctly and in time.
6) Perform a gap analysis on your legacy control system. Identify gaps compared to the standard (e.g. lack of analysis tools)
and opportunities for improvement. Consider the cost vs. benefit of upgrading your system to improve its performance
and to comply with ISA-18.2. In many cases a modern HMI can be added on top of a legacy control system to provide
enhanced alarm management capability without replacing the controller and I/O.
6.0 Conclusion
Following the ISA-18.2 standard will become increasingly important as it is adopted by industry, insurance, and regulatory
bodies. The standard includes recommendations and requirements that can stop poor alarm management, which acts as a
barrier to operational excellence. Look for a system that provides a comprehensive set of tools that can help you to follow the
alarm management lifecycle and address the most common alarm issues – leading to a safer and more efficient plant.
Depending upon the capabilities of the native control system, additional third-party tools may be required to deliver the benefits
of ISA-18.2. Finding a control system which provides, out-of-the-box, the capabilities demanded by the standard can reduce
life-cycle costs and make it easier for personnel to support and maintain. A checklist of the most important alarm management
capabilities for compliance with ISA-18.2 is provided in Appendix A.
For more information, go to the ISA website (www.isa.org) to get a copy of the standard (free to all ISA members).
7.0 References
1. “The Buncefield Investigation” - www.buncefieldinvestigation.gov.uk/reports/index.htm
2. “BP America Refinery Explosion” U.S. CHEMICAL SAFETY BOARD www.chemsafety.gov/investigations
3. ANSI/ISA-18.2-2009 “Management of Alarm Systems for the Process Industries”. www.isa.org
4. Zapata, R. and Andow, P., “Reducing the Severity of Alarm Floods”, www.controlglobal.com
5. “The Explosion and Fires at the Texaco Refinery, Milford Haven, 24 July 1994”, HSE Books, Sudbury, U.K. (1995).
6. EEMUA 191 (2007), “Alarm Systems: A Guide to Design, Management and Procurement Edition 2”. The Engineering
Equipment and Materials Users Association. www.eemua.co.uk
7. Abnormal Situation Management Consortium, www.asmconsortium.net
8. NAMUR (Interessengemeinschaft Automatisierungstechnik der Prozessindustrie), www.namur.de
9. Podcast: “Saved by the Bell: A look at ISA’s New Standard on Alarm Management”, www.controlglobal.com/multimedia/2009/AlarmMgmtISA0907.html