Framework For A Smart Data Analytics Platform Towards Process Monitoring and Alarm Management
PII: S0098-1354(17)30355-1
DOI: https://doi.org/10.1016/j.compchemeng.2017.10.010
Reference: CACE 5916
Please cite this article as: Hu, Wenkai, Shah, Sirish L., & Chen,
Tongwen, Framework for a Smart Data Analytics Platform towards Process
Monitoring and Alarm Management. Computers and Chemical Engineering,
https://doi.org/10.1016/j.compchemeng.2017.10.010
Highlights:
A framework for a smart data analytics platform is presented.
The platform integrates information from process and alarm databases complemented with
process connectivity.
Main features include the alarm data analysis, alarm system design, process data analysis, and
causality inference.
Case studies involving real industrial data demonstrate the practical utility of the tools.
Abstract
The fusion of information from disparate sources of data is the key step in devising strategies for a smart analytics
platform. In the context of the application of analytics in the process industry, this paper provides a framework for
seamless integration of information from process and alarm databases complemented with process connectivity
information. The information discovered from such diverse data sources can subsequently be used for process
and performance monitoring, including alarm rationalization, root cause diagnosis of process faults, hazard and
operability analysis, and safe and optimal process operation. The utility of the proposed framework is illustrated by
several successful industrial case studies.
Keywords
Analytics; Big data; Performance monitoring; Process monitoring; Alarm systems; Process data analytics; Fault
detection and diagnosis.
1 Introduction
The operation of modern industrial facilities has become highly automated with the deployment of computerized
control systems, such as the Distributed Control System (DCS) and the Supervisory Control and Data Acquisition
(SCADA) system. However, due to the large scale and complex dynamics of these systems, abnormal situations do occur from
time to time, and such events easily propagate along interconnected pathways to cause significant and catastrophic disruptions
(Wang et al., 2016). This has been the main motivation for studies aimed at ensuring process safety, reliable operation, and
compliance with environmental requirements, via tools for better process monitoring and alarm management. The massive
amounts of data in DCS and SCADA systems contain rich information about process operations, making them
indispensable assets for decision-making. However, without effective analytical tools, such data would only be
compressed and archived for record keeping rather than being turned into valuable resources to facilitate decision-making
(Qin, 2014).
This work was supported by the Natural Sciences and Engineering Research Council of Canada.
Process data analytic methods rely on the notion of sensor fusion whereby data from many sensors and alarm tags are
combined with process information, such as the physical connectivity of process units, to give a holistic picture of the health of an
integrated plant. In this study, the term process data has a broader meaning than in many existing publications, which
mainly use it to refer to continuous-valued sensor data. In this paper, process data analytics includes the analytics of sensor
data, alarm and event data, as well as process connectivity information.
Discovery and learning from process and alarm data refers to a set of tools and techniques for modeling and
understanding complex data sets. Such data sets generally include normal numerical (or non-categorical) data but should
also take into account categorical (or non-numerical or qualitative) data from Alarm and Event (A&E) logs combined with
process connectivity or topology information. The latter refers to the capture of material flow streams in process units as
well as information flow paths in the process due to control loops. This is particularly useful when analyzing data
from highly integrated processes to understand the propagation of process faults, as would be required in HAZard and
OPerability (HAZOP) analysis for safe process operation. Highly interconnected process plants are now the norm, and the
analysis of root causes of process abnormality, including predictive risk analysis, is non-trivial. It is the extraction of
information from the seamless fusion of process data, alarm and event data and process connectivity that should form the
backbone of a viable process data analytics platform. This paper focuses on an attempt to create such a platform. This idea
of information fusion in the context of process data analytics is depicted in Fig. 1.
For efficient and informative analytics, data analysis is ideally carried out in the temporal as well as spectral domains,
on multiple rather than single sensor signals, to detect process abnormality, ideally in a predictive mode. With the
explosion of applications of analytics in diverse areas (such as aircraft engine prognosis, medicine, sports, finance,
insurance, social sciences and the advertising industry) statistical learning skills are in high demand. The emphasis in this
study is on tools and techniques that help in the process of understanding data and discovering information that would lead
to predictive monitoring and diagnosis of process faults, alarm rationalization and safe and optimal process operation.
Typical process data analytic methods require the execution of the following steps:
1) Data quality assessment, such as outlier detection, data normalization, and noise filtering;
2) Data visualization and segmentation;
3) Process and performance monitoring including root cause detection of faults;
4) Alarm data analysis;
5) Data-based process topology discovery and validation.
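Step 1 can be illustrated with a minimal sketch using only the Python standard library. The specific techniques are assumptions for illustration, not the platform's documented methods: a robust (median/MAD) modified z-score for outlier detection, z-score normalization, and a trailing moving average for noise filtering. All function names are hypothetical.

```python
import statistics


def robust_outliers(x, cutoff=3.5):
    """Flag outliers with the modified z-score 0.6745 * (x - median) / MAD."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    if mad == 0:
        return [False] * len(x)  # no spread: nothing can be flagged this way
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in x]


def zscore(x):
    """Normalize a signal to zero mean and unit (sample) standard deviation."""
    mu = statistics.mean(x)
    sd = statistics.stdev(x)
    return [(v - mu) / sd for v in x]


def moving_average(x, window):
    """Trailing moving average; the first window-1 outputs average fewer samples."""
    out, s = [], 0.0
    for i, v in enumerate(x):
        s += v
        if i >= window:
            s -= x[i - window]  # drop the sample that left the window
        out.append(s / min(i + 1, window))
    return out
```

The median/MAD form is used here because a single large outlier inflates the ordinary standard deviation and can hide itself from a plain 3-sigma rule.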
The focus of this paper is to introduce a framework for a smart analytics platform supported by industrial case studies
to demonstrate the practical utility of such a tool. This smart analytics platform has the following unique features: 1) rich
functionalities of alarm data analysis, including alarm data visualization, alarm similarity analysis, quantification of
chattering alarms, design of delay timers, oscillating alarm analysis, similarity analysis of alarm floods, and causality
inference for alarms; 2) powerful alarm configuration analysis for univariate alarm systems, including the design of alarm
limits, filters, delay timers, and deadbands based on continuous-valued sensor data; 3) new methods for process data
analysis, such as the spectral envelope and spectral principal component analysis; and 4) useful techniques for
connectivity and causality analysis, such as the spectral correlation and transfer entropy approaches.
The rest of this paper is organized as follows: First, the main features of this data analytics platform are introduced in
detail, including several functional modules, such as the alarm data analysis, alarm system design, process data analysis,
and causality inference. To demonstrate the utility of these functional modules, several case studies involving real
industrial data are presented. The concluding remarks are given in the last section.
2 Framework for a Data Analytics Platform
A comprehensive platform should integrate a variety of basic statistical functions as well as advanced analytical
features, which are powerful and insightful in analyzing either continuous-valued process data or binary-valued alarm data
as well as additional categorical data. The proposed analytics platform consists of a data loading section and four
functional modules as shown in Fig. 2. “Data Loading” imports, reorganizes, merges, or exports alarm data and/or process
data. The functional module “Alarm Data Analysis” provides analytic and reporting functions to analyze and visualize
alarm data. The remaining three functional modules are based on process data. The “Alarm Configuration Analysis”
module designs univariate alarm systems for specific process variables. The “Process Data Analysis” module visualizes
and analyzes process data from either a time-domain or a frequency-domain perspective. The “Connectivity & Causality Analysis” module
uncovers correlations and causal relations between process variables. In addition, a data summary section displays the
basic information of loaded alarm data and/or process data. Details and features of each functional module are presented in
the following subsections. The framework allows merging of process and alarm data, and gives exploratory as well as
analytical insights into the extraction of information from such data.
2.1 Data loading
“Data Loading” is the first highlighted part in Fig. 2. It provides six functions related to data loading, namely
alarm data loading, process data loading, data exporting, preprocessed data loading, data matching,
and data clearing. The descriptions of these functions are listed in Table 1.
To import alarm historian data to the platform, an Alarm & Event (A&E) log file in Microsoft Excel format is needed.
Databases from various vendor systems can be imported into this toolbox by first converting them into Excel files. Several
requirements on the log file should be satisfied: (1) Each row should reflect one event message; (2) each column should be
a certain field of event messages; (3) the same column in all the sheets of the log file should represent the same field. A
typical A&E log usually consists of configuration attributes, e.g., the tag name, alarm identifier, priority and location, and
real-time messages such as alarm occurrences (ALM), return-to-normal instants (RTN), and their time stamps (Izadi et al.,
2010; Kondaveeti et al., 2012). Among these attributes and messages, the following four are key and necessary: time
stamp, tag name, tag identifier, and message type. They may have different field headers in the data archived from different
vendor systems, but usually all of these four pieces of information are provided. In addition to this, the priority and unit
information are optional for the data loading depending on whether the two pieces of information are archived or not. An
example of A&E log in Excel format is shown in Fig. 3. The four mandatory attributes and two optional attributes are
highlighted by square-dotted red and fine dashed blue rectangles, respectively.
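The loading requirements above (one event per row, one field per column, four mandatory and two optional fields) can be sketched as follows. This is a hypothetical illustration, not the platform's loader: it assumes each sheet has already been read into a header list and rows of strings (reading the Excel file itself, e.g. with a spreadsheet library, is out of scope), and the vendor header names in `HEADER_MAP` are invented examples.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical vendor header names, mapped to the four mandatory fields
# (time stamp, tag name, tag identifier, message type) and the two optional ones.
HEADER_MAP = {
    "time": "timestamp", "event time": "timestamp",
    "tag": "tag", "tagname": "tag",
    "condition": "identifier", "alarm identifier": "identifier",
    "message type": "message", "action": "message",
    "priority": "priority",
    "unit": "unit", "area": "unit",
}
MANDATORY = ("timestamp", "tag", "identifier", "message")


@dataclass
class AlarmEvent:
    timestamp: datetime
    tag: str
    identifier: str
    message: str                   # e.g. "ALM" or "RTN"
    priority: Optional[str] = None
    unit: Optional[str] = None


def normalize_rows(header, rows, time_format="%Y-%m-%d %H:%M:%S"):
    """Turn one sheet of an A&E log (one event per row) into AlarmEvent objects."""
    fields = [HEADER_MAP.get(h.strip().lower()) for h in header]
    missing = [m for m in MANDATORY if m not in fields]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    events = []
    for row in rows:
        # Keep only recognised columns; unknown columns map to None and are dropped.
        rec = {f: v for f, v in zip(fields, row) if f is not None}
        events.append(AlarmEvent(
            timestamp=datetime.strptime(rec["timestamp"], time_format),
            tag=rec["tag"],
            identifier=rec["identifier"],
            message=rec["message"],
            priority=rec.get("priority"),  # optional: None if not archived
            unit=rec.get("unit"),
        ))
    events.sort(key=lambda e: e.timestamp)  # chronological order
    return events
```

Because the same column represents the same field in every sheet, rows from all sheets can be concatenated before a single call.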
To import historical process data to the platform, files that store historical measurements of process variables in
Microsoft Excel format are needed. Process data have a completely different format from alarm data. Several requirements
on the Excel files should be satisfied: (1) The first column should be time stamps of the sampling instants; (2) the
following columns store the historical values of process variables at these sampling instants. If the time stamp information
is not available, an Excel file without the time stamp column is also acceptable. The log file of process data has a much
simpler structure compared with the A&E log, and usually includes three parts, namely, tag names, measurements, and
time stamps. An example of a process data stream is presented in Fig. 4. The tag names, measurements, and time stamps
are highlighted by round-dotted green, fine dashed blue, and square-dotted red rectangles, respectively.
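The three-part structure of a process data file (tag names, measurements, optional time stamps) can be parsed as in the minimal sketch below. It assumes the sheet has already been read into a list of rows whose first row holds the tag names; the function name and the NaN convention for empty cells are assumptions, not the platform's behaviour.

```python
from datetime import datetime


def parse_process_table(rows, has_timestamps=True, time_format="%Y-%m-%d %H:%M:%S"):
    """Split a process-data table into time stamps, tag names and measurements.

    rows[0] is the header row (tag names); each later row is one sampling instant.
    If has_timestamps is True, column 0 holds the time stamps.
    """
    header, body = rows[0], rows[1:]
    start = 1 if has_timestamps else 0
    tags = list(header[start:])
    if has_timestamps:
        times = [datetime.strptime(r[0], time_format) for r in body]
    else:
        times = list(range(len(body)))  # fall back to sample indices
    data = {tag: [] for tag in tags}
    for r in body:
        for tag, value in zip(tags, r[start:]):
            # Empty cells become NaN so every column stays aligned with the time stamps.
            data[tag].append(float(value) if value not in ("", None) else float("nan"))
    return times, tags, data
```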
Once the alarm and/or process data are loaded, the platform will reorganize them in a format that can be easily
processed by the analytical functions. Exporting data before closing the platform is recommended, since it is much less
time-consuming to import the preprocessed data in .mat format than to import the raw data stored in Excel, especially for
subsequent analysis. If both the alarm data and process data are loaded, then all the process variables that have
observations during the time-window of the alarm dataset are listed. Macros can be created to associate alarm tags with a
listed process variable. Some basic information about the data set is shown in the information section at the bottom left
corner of Fig. 2. It shows the directories of the alarm data and process data files, respectively. Alarm historian duration,
average alarm rate, number of alarm tags, number of process tags, and number of process tags that have been matched to
alarm tags are also provided if available.
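The data summary and matching steps can be sketched as below. The summary quantities (alarm historian duration, average alarm rate, tag counts, process tags observed within the alarm time window) follow the text; the matching rule, which pairs an alarm tag with a process tag when both share the text before the first ".", is a hypothetical stand-in for the user-defined macros and is labelled as such in the code.

```python
from datetime import datetime, timedelta


def summarize(alarm_times, alarm_tags, process_samples):
    """Basic data summary plus a hypothetical prefix-based tag matching.

    alarm_times: datetimes of alarm occurrences (ALM messages)
    alarm_tags: alarm tag of each occurrence, aligned with alarm_times
    process_samples: {process tag: list of sample datetimes}
    """
    start, end = min(alarm_times), max(alarm_times)
    duration = end - start
    windows = duration / timedelta(minutes=10)  # timedelta / timedelta gives a float
    rate = len(alarm_times) / windows if windows > 0 else float("nan")

    # Process tags with at least one observation inside the alarm time window.
    overlapping = sorted(tag for tag, ts in process_samples.items()
                         if any(start <= t <= end for t in ts))

    # Hypothetical matching rule: tags share the text before the first ".".
    unique_alarm_tags = sorted(set(alarm_tags))
    matched = {}
    for ptag in overlapping:
        prefix = ptag.split(".")[0]
        hits = [a for a in unique_alarm_tags if a.split(".")[0] == prefix]
        if hits:
            matched[ptag] = hits

    return {
        "duration": duration,
        "alarms_per_10min": rate,
        "n_alarm_tags": len(unique_alarm_tags),
        "n_process_tags": len(process_samples),
        "overlapping_process_tags": overlapping,
        "matched": matched,
    }
```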
Fig. 6 shows an alarm similarity color map of the top 20 bad actors. Alarm tags are clustered based on their
correlations. The darker color of a block indicates a higher correlation between alarms. The diagonal of the map consists of
1’s (black squares), indicating the highest correlations of alarm tags with themselves. In this case study, only one pair of
alarms was found to be correlated, namely, “Tag64.COMM” and “Tag60.IOF”. The correlation value was 0.9, indicating a
strong relation.
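The correlation behind such a similarity map can be sketched by converting each tag's alarm occurrences into a binary sequence over fixed time bins and computing the Pearson correlation between sequences. This is an assumed formulation for illustration; the bin width, the reporting threshold, and the function names are hypothetical, and the clustering/ordering used to draw the color map is omitted.

```python
import math
from itertools import combinations


def binary_sequence(times, t_start, t_end, bin_width):
    """1 in every bin (of bin_width seconds) that contains at least one alarm occurrence."""
    n_bins = int(math.ceil((t_end - t_start) / bin_width))
    seq = [0] * n_bins
    for t in times:
        if t_start <= t < t_end:
            seq[int((t - t_start) // bin_width)] = 1
    return seq


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def similarity_pairs(alarm_times, t_start, t_end, bin_width=60.0, threshold=0.5):
    """Return (tag_a, tag_b, r) for tag pairs whose correlation r >= threshold."""
    seqs = {tag: binary_sequence(ts, t_start, t_end, bin_width)
            for tag, ts in alarm_times.items()}
    pairs = []
    for a, b in combinations(sorted(seqs), 2):
        r = pearson(seqs[a], seqs[b])
        if r >= threshold:
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda p: -p[2])
```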
Fig. 7 displays the chattering indices of the top 20 bad actors using red bars. The green line denotes the threshold of
chattering alarms based on the ANSI/ISA-18.2 (2009) standard (no more than 3 alarms over a 60-second period). Any
chattering index that exceeds this threshold indicates a chattering alarm. Among these bad actors, 7 alarms were
determined to have chattering problems. To design delay timers for these chattering alarms, the run length distribution is
used. Fig. 8 shows an example of designing an off-delay timer for the topmost bad actor “Tag102.CFN”. The red curve
indicates the fraction of alarm occurrences that can be removed by implementing an off-delay timer whose value is given
on the horizontal axis. For instance, an off-delay timer of 10 s removes 82% of the alarm occurrences. The off-delay timer turns
the chattering alarms into standing alarms. Accordingly, the alarm count was reduced from 4856 to 875 over 10 days.
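A chattering index based on the run length distribution can be sketched as follows. The formulation is an assumption for illustration: run lengths r are the times (rounded to whole seconds) between consecutive alarm occurrences of one tag, P_r is their empirical distribution, and the index is the sum of P_r / r, i.e. an expected alarm rate in alarms per second. Under that assumption the threshold of 3 alarms per 60 s becomes 3/60 = 0.05 alarms per second.

```python
from collections import Counter


def run_lengths(alarm_times):
    """Run lengths (s) between consecutive alarm occurrences of one tag, at least 1 s."""
    ts = sorted(alarm_times)
    return [max(1, round(b - a)) for a, b in zip(ts, ts[1:])]


def chattering_index(alarm_times):
    """Sum over run lengths r of P_r / r, where P_r is the fraction of runs of length r."""
    rl = run_lengths(alarm_times)
    if not rl:
        return 0.0  # fewer than two occurrences: no run lengths
    total = len(rl)
    return sum((count / total) / r for r, count in Counter(rl).items())


CHATTER_THRESHOLD = 3 / 60  # 3 alarms per 60 s, expressed in alarms per second


def is_chattering(alarm_times):
    return chattering_index(alarm_times) > CHATTER_THRESHOLD
```

For example, a tag that alarms every 2 s has all run lengths equal to 2, giving an index of 0.5, ten times the threshold.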
In the same manner, the types and values of delay timers were recommended for all the seven chattering alarm tags as
shown in Table 8. It is noteworthy that small off-delay timers were very effective in reducing most chattering instants for
these alarm tags. By applying the recommended off-delay timers, the average alarm rate can be reduced to 4.7 alarms per
10 minutes, which is 45% lower than the original alarm rate of 8.5 alarms per 10 minutes.
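The effect of an off-delay timer on alarm counts, as plotted against the delay value in Fig. 8, can be sketched on alarm intervals of one tag. The sketch assumes each alarm is given as an (ALM time, RTN time) pair in seconds: with an off-delay of T, clearing is deferred by T, so an alarm raised less than T after the previous RTN never becomes a new occurrence and instead extends the active alarm. The function names and the selection rule in `smallest_delay` are hypothetical.

```python
def apply_off_delay(intervals, delay):
    """Apply an off-delay timer of `delay` s to (alm, rtn) intervals of one tag."""
    merged = []
    for alm, rtn in sorted(intervals):
        if merged and alm - merged[-1][1] < delay:
            # Re-alarm while the off-delay is still running: extend the active alarm.
            merged[-1][1] = max(merged[-1][1], rtn)
        else:
            merged.append([alm, rtn])
    # The annunciated clearing instants are postponed by the delay.
    return [(alm, rtn + delay) for alm, rtn in merged]


def reduction_curve(intervals, delays):
    """Fraction of alarm occurrences removed for each candidate delay value."""
    n = len(intervals)
    if n == 0:
        return {d: 0.0 for d in delays}
    return {d: 1 - len(apply_off_delay(intervals, d)) / n for d in delays}


def smallest_delay(intervals, target, delays):
    """Smallest candidate delay that removes at least `target` of the occurrences."""
    for d, frac in sorted(reduction_curve(intervals, delays).items()):
        if frac >= target:
            return d
    return None
```

Choosing the smallest delay that reaches a target reduction mirrors the observation that small off-delay timers already remove most chattering occurrences.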
To analyze plant oscillations, the power spectral correlations are first calculated and shown as a power spectral
correlation color map in Fig. 26, where process variables with similar power spectra are clustered. The color bar on the
right side of the color map indicates the strength of correlation. The red and orange colors indicate strong correlations and
the green color indicates a weak correlation. The diagonal of the color map represents the correlation between one variable
and itself. Based on Fig. 26, it is easy to identify all process variables that share the same oscillating feature. These
include Process Variables (PVs) as well as the corresponding Manipulated Variables (MVs). If necessary, the MVs can be
omitted to obtain a short-list of all oscillating PVs.
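A power spectral correlation map of this kind can be sketched by computing the power spectrum of each mean-centred variable and then a cosine-type correlation between every pair of spectra, which lies between 0 and 1 and is insensitive to phase, so a PV and its MV oscillating at the same frequency score close to 1. This exact normalization is an assumption for illustration; a direct O(N^2) DFT is used to stay within the standard library, and the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(x):
    """One-sided periodogram of the mean-centred signal, via a direct O(N^2) DFT."""
    n = len(x)
    mu = sum(x) / n
    xc = [v - mu for v in x]
    spec = []
    for k in range(1, n // 2 + 1):  # skip k = 0, which only carries the removed mean
        s = sum(xc[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec


def spectral_correlation(px, py):
    """Cosine-type correlation between two power spectra, between 0 and 1."""
    num = sum(a * b for a, b in zip(px, py))
    den = math.sqrt(sum(a * a for a in px) * sum(b * b for b in py))
    return num / den if den > 0 else 0.0


def psc_matrix(signals):
    """Pairwise spectral correlations for {tag: samples}; signals must share one length."""
    spectra = {tag: power_spectrum(x) for tag, x in signals.items()}
    tags = sorted(spectra)
    return tags, [[spectral_correlation(spectra[a], spectra[b]) for b in tags]
                  for a in tags]
```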
In this case study the spectral envelope method is used to diagnose the plant oscillation. Fig. 27 shows the calculated
spectral envelope, where a clear peak is observed at the frequency of 0.003175 cycles per sample, which was the frequency
of concern for plant engineers.
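A simplified spectral envelope can be sketched as follows. In its general form the envelope at each frequency is the largest generalized eigenvalue of the spectral density matrix relative to the covariance matrix; as a stated simplification, this sketch standardizes each variable and takes that covariance matrix as the identity, so the envelope reduces to the largest eigenvalue of a Daniell-smoothed periodogram matrix, found by power iteration. All names and the smoothing width are assumptions.

```python
import cmath
import math


def standardize(x):
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / (n - 1))
    return [(v - mu) / sd for v in x]


def dft(x):
    """DFT coefficients X(k), k = 0..N/2, by direct summation (O(N^2))."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def largest_eigenvalue(m, iters=200):
    """Largest eigenvalue of a Hermitian positive semi-definite matrix (power iteration)."""
    p = len(m)
    v = [complex(1.0, 0.1 * (i + 1)) for i in range(p)]  # arbitrary non-zero start
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(abs(c) ** 2 for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
    mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum((v[i].conjugate() * mv[i]).real for i in range(p))  # Rayleigh quotient


def spectral_envelope(signals, half_width=1):
    """Return [(frequency in cycles/sample, envelope value)] for equally long signals.

    Simplification: variables are standardized and the covariance matrix is taken
    as the identity, so the envelope is the top eigenvalue of the spectral matrix.
    """
    zs = [standardize(x) for x in signals]
    n, p = len(zs[0]), len(zs)
    X = [dft(z) for z in zs]
    n_freq = len(X[0])
    env = []
    for k in range(1, n_freq):
        # Daniell smoothing: average the periodogram matrix over neighbouring bins.
        ks = [q for q in range(k - half_width, k + half_width + 1) if 1 <= q < n_freq]
        f = [[sum(X[i][q] * X[j][q].conjugate() for q in ks) / (n * len(ks))
              for j in range(p)] for i in range(p)]
        env.append((k / n, largest_eigenvalue(f)))
    return env
```

The frequency at the maximum of the returned envelope plays the role of the peak at 0.003175 cycles per sample in Fig. 27; a frequency shared by many variables produces a larger peak than one present in a single variable.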
The Chi-squared test statistics of the 48 process variables at the oscillation frequency of 0.003175 cycles per sample
are calculated and shown using a bar chart in Fig. 28. The dashed red line denotes the significance threshold of 13.82 at the
significance level of 0.001. As a result, the process variables with Chi-squared values larger than this threshold are
identified to be oscillating at the frequency of 0.003175 cycles per sample.
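The threshold of 13.82 equals the upper-tail critical value of a chi-squared distribution with two degrees of freedom at significance level 0.001; reading the test as having two degrees of freedom is our inference from that number, not something stated in the text. For two degrees of freedom the survival function is exp(-x/2), so the critical value is -2 ln(alpha), which the sketch below uses to screen variables. Function names are hypothetical.

```python
import math


def chi2_threshold_df2(alpha):
    """Upper-tail critical value of the chi-squared distribution with 2 degrees of freedom.

    P(X > x) = exp(-x / 2) for 2 d.o.f., so the critical value is -2 ln(alpha).
    """
    return -2.0 * math.log(alpha)


def oscillating_variables(stats, alpha=0.001):
    """Tags whose chi-squared statistic at the oscillation frequency exceeds the threshold."""
    thr = chi2_threshold_df2(alpha)
    return sorted(tag for tag, s in stats.items() if s > thr)
```

With alpha = 0.001 the threshold evaluates to about 13.8155, matching the value of 13.82 used above.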
Fig. 29 shows the Oscillation Contribution Indices (OCIs) of the 48 process variables at the oscillation frequency of
0.003175 cycles per sample. The dashed red line denotes the OCI threshold of 1. Variables that have OCIs larger than 1
are regarded as root cause candidates. Among these variables, “LC2.PV” and “LC2.OP” have the largest OCIs, indicating
that the loop associated with tag “LC2” contributes most to the spectral envelope at the frequency of 0.003175 cycles per sample.
Thus, this particular loop should be examined as the first root cause candidate.
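Once OCIs are available, the screening and ranking step described above can be sketched as follows. The sketch does not compute the OCIs themselves; it assumes they are given per tag and that tags follow a "LOOP.SUFFIX" naming convention (e.g., “LC2.PV” and “LC2.OP”), so candidates can be grouped by loop and ranked by their largest OCI. The function name is hypothetical.

```python
from collections import defaultdict


def rank_loops(oci, threshold=1.0):
    """Group root-cause candidates (OCI > threshold) by loop name and rank the loops.

    Assumes tags follow a 'LOOP.SUFFIX' convention such as 'LC2.PV' / 'LC2.OP'.
    """
    by_loop = defaultdict(float)
    for tag, value in oci.items():
        if value > threshold:
            loop = tag.split(".")[0]
            by_loop[loop] = max(by_loop[loop], value)  # keep the loop's largest OCI
    return sorted(by_loop.items(), key=lambda kv: kv[1], reverse=True)
```

The first loop in the returned list is the one to examine first, as was done for “LC2” in this case study.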
Fig. 30 shows the scatter plot between “LC2.PV” and “LC2.OP”. The elliptical pattern indicates valve stiction, which
caused limit cycles in the loop that then propagated to many other variables. A plant test confirmed 4% stiction in the
valve; this was the root cause of the plant-wide oscillations. This root cause was also
validated by the connectivity analysis (Jiang et al., 2009).
[Figure: off-delay timer design curve (x-axis: Run length (s); marked point X: 10, Y: 0.82) and alarm burst plots (y-axis: alarms/10min; time axis from 15-Aug 8 PM to 25-Aug 11 PM).]
Fig. 10. Alarm burst plot for alarm data with chattering alarms reduced.
[Figure: alarm state sequences (Normal/Alarm) for Tag2 to Tag6; x-axis: Time (s), ×10^5.]
Fig. 17. Normal (blue) and abnormal (red) parts of a process signal.
[Figure: time trends (x-axis: Samples) and power spectra (x-axis: Frequency (cycles/sample), log scale from 10^-3 to 10^-1) of the process tags, e.g., LC2.PV, LC2.OP, FC1.PV and TI1.PV.]
[Figure: Spectral Envelope versus frequency (log scale); peak marked at X: 0.003175, Y: 9706.]
[Figure: bar chart of Chi-Square Test Statistics versus Tag Index for the process tags, and tag labels of the information flow diagram.]
Fig. 38. Direct information flow paths based on NDTEs in Table 11.
Function                  Task
Load Alarm Data           Load alarm data from an Excel file with structured data format.
Load Process Data         Load process data from an Excel file with structured data format.
Load Preprocessed Data    Load alarm and/or process data from a MATLAB data file with a reorganized data format.
Merge Data                Associate the tags of process variables with their corresponding alarm tags.
Export                    Export the alarm data and/or process data as a MATLAB data file with a reorganized data format.
Clear                     Clear alarm and/or process data.
Function   Task
HDAP       Visualize alarm data over a selected time period using a high density color map.
RL&DTA     Design on/off delay timers based on run length distributions.
CI         Detect chattering alarms and calculate chattering indices.
ABP        Calculate and visualize the peak alarm rate.
OAA        Discover alarms caused by process oscillations.
ASCM       Detect correlated alarms and visualize their similarity indices using a color map.
AFA        Analyze alarm floods, including identification, comparison, and clustering of alarm floods.
OPAA       Analyze the operator acknowledgement, including the acknowledgement rate and response time.
CIA        Detect causal relations between alarm variables.
MDAA       Detect mode-dependent nuisance alarms.
Function                                                                            Task
Filter (Moving Average, Moving Variance, Moving Norm, Rank Order, Low Pass, EWMA)   Reduce noise, remove bad data, or modify statistical distributions of process signals.
Delay Timer (Off-Delay Timer, On-Delay Timer)                                       Reduce chattering or fleeting alarms by delaying alarm raising or clearing instants.
Deadband                                                                            Reduce false or missed alarms by applying different thresholds for alarm raising and clearing.
Alarm Limit Optimization                                                            Optimize high or low alarm limits automatically.
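The delay timer and deadband functions listed above can be combined in a univariate high alarm, sketched below on a sampled process variable. Delays are counted in samples, which is an assumption; the function names are hypothetical and are not the platform's API. An on-delay requires the variable to exceed the limit for several consecutive samples before raising, an off-delay requires it to stay clear for several consecutive samples before clearing, and a deadband moves the clearing threshold below the raising threshold.

```python
def high_alarm(x, limit, deadband=0.0, on_delay=1, off_delay=1):
    """Return a 0/1 alarm sequence for a high alarm on samples x.

    Raise: x > limit for `on_delay` consecutive samples.
    Clear: x <= limit - deadband for `off_delay` consecutive samples.
    With the defaults this is exactly the plain check x > limit.
    """
    alarm, active = [], False
    above = below = 0
    for v in x:
        if not active:
            above = above + 1 if v > limit else 0
            if above >= on_delay:
                active, below = True, 0
        else:
            below = below + 1 if v <= limit - deadband else 0
            if below >= off_delay:
                active, above = False, 0
        alarm.append(1 if active else 0)
    return alarm


def count_alarms(a):
    """Number of alarm occurrences, i.e. 0 -> 1 transitions."""
    return sum(1 for prev, cur in zip([0] + a[:-1], a) if cur == 1 and prev == 0)
```

For a signal hovering around the limit, the plain check raises an alarm on every upward crossing, while a deadband or an on-delay of even two samples can collapse these into one occurrence or suppress them entirely.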
      y1     y2     y3     y4     y5
y1    -      0.001  0.089  0.177  0.014
y2    0.131  -      0.117  0.154  0.010
y3    0.078  0.005  -      0.008  0.105
y4    0.128  0.005  0.095  -      0.019
y5    0.016  0.001  0.130  0.012  -
Table 11. NDTE between each pair of process variables with causal relations.