An Evaluation of Machine Learning Methods To Detect Malicious SCADA Communications
Abstract—Critical infrastructure Supervisory Control and Data Acquisition (SCADA) systems have been designed to operate on closed, proprietary networks where a malicious insider posed the greatest threat potential. The centralization of control and the movement towards open systems and standards has improved the efficiency of industrial control, but has also exposed legacy SCADA systems to security threats that they were not designed to mitigate. This work explores the viability of machine learning methods in detecting the new threat scenarios of command and data injection. Similar to network intrusion detection systems in the cyber security domain, the command and control communications in a critical infrastructure setting are monitored, and vetted against examples of benign and malicious command traffic, in order to identify potential attack events. Multiple learning methods are evaluated using a dataset of Remote Terminal Unit communications, which included both normal operations and instances of command and data injection attack scenarios.

Keywords—SCADA; machine learning; intrusion detection; critical infrastructure protection; network

I. INTRODUCTION

SCADA systems used for the command and control of critical infrastructure have primarily been implemented independent of cyber security considerations. These systems, also called Critical Infrastructure Systems (CIS), have security with respect to trusted communications in transactions at the control system layer, but lack a focus on traditional cyber security implementations such as authentication and intrusion detection at the network layer. Historically, CISs were insulated from many security vulnerabilities due to their proprietary implementations and in many cases physical isolation from the Internet, and so the independence of these two different security missions reflects these different levels of vulnerability.

However, modern CISs are moving towards more open systems and standards, improving the compatibility of these systems with each other and with commercially available management software [3]. While this evolution increases the efficiency of operating multiple physical CIS locations, the lack of designed-in security at the SCADA level has exposed the potential for vulnerabilities that bridge both Computer Network Defense (CND) and CIS applications. The prolific nature of remotely controlled substations and wireless networks has created the potential for malicious and intentional deception. Adversaries can leverage the wireless networks to obtain unauthorized access and then manipulate CIS SCADA communications in order to adversely affect the delivery of resources to a population.

Information security for CISs must account for these new threat scenarios and provide effective critical infrastructure protection, despite the lack of inherent security mechanisms in commercially available SCADA software. Since mission resilience, or maintaining the delivery of a mission-critical service or resource, is the primary quality indicator of a CIS, incorporating a security layer a posteriori presents a significant challenge. Any implemented security must have a high reliability so that the critical system does not interrupt service based on an incorrect security layer assessment. Also, to accommodate the uniqueness of CIS hardware and software configurations, the security layer must be adaptive and provide the same quality regardless of a system’s equipment or technologies.

In this work, the viability of machine learning applied to CIS communications is explored as an approach that can provide a reliable yet adaptive security layer. We focus specifically on the problem of intrusion detection in a CIS; where the control system transactions are monitored in real-time to detect, independent of the SCADA system, those transactions that have been manipulated to deceive operators or automated controls. The prevalence of wireless networks as a conduit for CIS communication makes this attack scenario very plausible for the interruption of a critical service or to damage mission-critical equipment. So, intrusion detection within a CIS, analogous to traffic monitoring in an enterprise network, is a necessary element in the broadened scope of security.

We evaluate a set of machine learning algorithms in terms of their ability to identify various attacks when analyzing remote terminal unit (RTU) serial communications in a gas pipeline system. The RTU data used in this set of experiments was developed by the Mississippi State University’s Critical Infrastructure Protection Center [7], and includes examples of benign RTU transactions and variants of command and data injection attack transactions generated specifically for critical infrastructure protection research. We analyze the accuracy of each machine learning algorithm at correctly identifying malicious traffic using a set of features (key-value pairs) that are derived from the RTU telemetry.
978-0-7695-5144-9 2013 54
U.S. Government Work Not Protected by U.S. Copyright
DOI 10.1109/ICMLA.2013.105
II. RELATED WORK

This work is focused on an evaluation of machine learning methods as discriminators for malicious SCADA communications. This is an extension to previous efforts in applied machine learning for malicious network traffic detection. In the cyber security domain, prior art exists where machine learning methods have been used to discriminate network traffic. In many of these, as in [9], machine learning is used to classify traffic for the purpose of discovering the type of service being used. Most other work is focused on discriminating malicious network traffic from non-malicious traffic. A review of these approaches is described in [5], including both supervised and unsupervised methods such as support vector machines, naive Bayes, clustering, decision trees and random forests. In addition, recent work has focused on more complex approaches to malicious network flow detection by applying semi-supervised machine learning models [2], and in formal experimentation of scaled machine learning intrusion detection system (IDS) prototypes [1].

There are several advantages to using machine learning to discriminate malicious network communications [11], including minimizing reliance on human analysis and adaptation to local network environments. Machine learning provides an insight that would be difficult for a human to explicitly describe as a rule or signature because it can potentially evaluate thousands of interdependent metrics simultaneously. Unlike signature-based systems, zero-day (previously unseen) attack detection is possible because communications are classified based on their similarity to known types rather than the explicit patterns specified in signatures. These systems are also particularly effective in detecting variants of attacks [12], which are small changes in a known attack vector created to bypass signature-based sensors.

In the CIS domain, there has been some prior work in the analysis of SCADA communications for intrusion detection. Signature-based intrusion detection approaches have been implemented on Modbus networks as described in [13]. Machine learning methods have also been applied previously to critical infrastructure, albeit not with an intrusion detection focus. Fukuda and Shibata [6] describe an application of neural networks to supervised control where the neural network learns the associations between control measurements and user actions. Fuzzy learning [10] has been employed to improve the performance of machine drives. More recently, Won et al. [8] used decision trees to perform fault and failure diagnosis in industrial control networks. These prior works seek to optimize the efficiency of the control system, and the use of machine learning as a tool for control system security does not appear to have been explored.

The benefits and qualities of machine learners applied to network traffic are also applicable in SCADA systems, in order to give a CIS a defense capability in recognizing when system operations are being manipulated. We believe this work to be original in terms of using machine learning in a CIS context for the purpose of control system security.

III. EVALUATION APPROACH

The focus of the evaluation is to quantify the ability of a set of machine learning methods to detect malicious (attack) transactions within a CIS data stream. We evaluate six of the most widely used supervised learning algorithms as described in [16]. Each method is applied in isolation to the RTU communication data set and then compared with competing methods, using 10-fold cross validation for the training/test data balance. The features selected for classification were extracted from the RTU telemetry. In this section, we describe our entire evaluation approach, including the machine learning algorithms applied, the data used in the experiment, the collection of features used in discriminating traffic types, the nature of the malicious traffic, and the overall approach to experimentation.

A. Machine Learning Methods

The methods selected are a mix of simple learners, industry-standard learners and traditionally good performers in terms of generalization. We intend for this work to establish a foundation for the application of machine learning to CIS, with the expectation that more complex algorithms will be necessary as the complexity of the CIS grows. The evaluations were performed using the Weka machine learning software [15]. The methods used were:

1. Naïve Bayes - A probabilistic classifier based on Bayes' theorem [20], adopted into the field of machine learning in 1992 [21].
2. Random Forests - A combination of tree predictors where each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [17].
3. OneR - Evaluates each feature’s optimum ruleset and chooses the best one [18].
4. J48 - An implementation of the C4.5 decision tree algorithm [19].
5. NNge - A nearest-neighbor-like algorithm using non-nested generalized exemplars [22].
6. SVM - Support vector machines [4] trained using sequential minimal optimization [23].

B. Experiment Data

The data used for this experiment is a collection of labeled RTU telemetry streams from a gas pipeline system in Mississippi State University’s Critical Infrastructure Protection Center [14]. The telemetry streams included examples of command injection attacks, data injection attacks, and those where no attacks occurred (normal). From these streams, we generated feature sets used as the basis for discrimination by the learning methods. The process for feature set generation was to ingest the RTU data frame-by-frame, break the frames into individual values of telemetry data, translate data items (such as floating point values) into variables that are programmatically represented, and group the variables into collections based on timing. Once the feature sets were developed from the RTU telemetry, we evaluated various
learners and their generalization performance on those labeled data sets.

Each telemetry stream consisted of both commands and responses. A command is used to set the value of an item under test. In the case of the gas pipeline system, a programmable logic controller (PLC) is used to maintain a specific pipeline pressure value. The RTU commands communicate the desired gas pipeline pressure, called a setpoint value, to the PLC that contains the logic and physical relays to control the pipeline to that pressure value. In some cases, the RTU command is simply a request for the latest pipeline pressure value. A response is the PLC reporting the current value of the gas pipeline pressure. Nominally, commands and responses will occur in pairs, where the SCADA system commands the PLC to a specific setpoint or requests the pressure value, and the PLC reports the current value. Examples of an RTU command and response from the raw data are shown in Figure 1 and Figure 2, respectively.

Figure 1 RTU Command Example

Figure 2 RTU Response Example

Each telemetry item has a PLC address and function code (FC) that defines the type of command or response it is. Raw floating-point values for the setpoint and pipeline pressure are embedded in the packet body in addition to bytes with flag values that define the state of the gas pipeline system in each command and response. Timestamps and Cyclic Redundancy Checks (CRCs) are additional components to the commands and responses.

C. Feature Set Design

Machine learning systems base both their learning and analysis on a data structure called a feature set, which is a collection of key-value pairs derived from the raw data that represent indicative elements of the data. Examples of different classes of data are stored as feature sets, and acquired raw data is converted into feature sets prior to classification. The feature sets in this application focus on the specific values associated with the RTU data and the results of simple tests that provide checks on the integrity of both the data and protocol of the transactions. The complete set of features used in this machine learning evaluation is described in Table 1.

Table 1 RTU Feature Descriptions
Feature Name | Description
Pipeline Pressure (PSI) | The gas pipeline pressure value, pounds per square inch (PSI), in the transaction response.
Invalid Function Code | A binary indicator of whether the function code is invalid.
Setpoint | The setpoint value included in the command.
Invalid Data Length | The transaction contains a command or response with a data element of an invalid size.
Command | A binary indicator of whether the transaction contains command data.
Command Data Length | The length of the command data for this transaction.
Response | A binary indicator of whether the transaction contains response data.
Response Data Length | The length of the response data for this transaction.
Control Mode | An indicator of whether the RTU is in auto control mode or off.
Control Scheme | An indicator of whether control is accomplished through the solenoid.
Solenoid State | A binary indicator of whether the solenoid is on or off.
Pump State | A binary indicator of whether the pump is on or off.

D. Attack Data Description

The data/response injection attacks in this experiment focus on manipulating PLC responses to deceive a human operator or the automated control as to the actual state of the gas pipeline system. There were 7 variants of the data injection attack replicated in the RTU telemetry, each manipulating the PLC response in a different way.

1. Negative Values - Injecting negative values as the pipeline pressure, which are invalid as the pressure should always be a positive or zero value.
2. Burst Values - Sending multiple successive pipeline pressure values, faster than the data display rate for the operator interface.
3. Fast Change - Sending successive and variant pipeline pressure values to create a lack of confidence in the correct operation of the system.
4. Single Data Injection - Following an actual response with an artificial one where the gas pipeline value is doctored in order to deceive the PLC control loop.
5. Slow Change - Sending delayed and variant pipeline pressure values to create a lack of confidence in the correct operation of the system.
6. Value Wave Injection - Multiple oscillating pressure values in order to deceive the PLC control loop and undermine the operator’s confidence in the system.
7. Setpoint Value Injection - The attacker sends false pipeline pressure values equal to the setpoint.

Command injection attacks manipulate outgoing commands to control the gas pipeline, or to acquire information about the pipeline system and PLC. The four types of command injection attacks are:

IV. RESULTS

This section contains the evaluation results for applying machine learning methods to SCADA system commands and responses. For each type of command/data injection attack we describe the base performance of each learner when using the features extracted from the RTU telemetry. We present the precision and recall for each class and each method, for both binary and multiclass classification problems.

A. Data/Response Injection Results

Figure 3 and Figure 4 show the recall and precision for each of the learning methods in discriminating the data injection attack types as well as normal RTU transactions.
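
As a minimal sketch of how per-class precision and recall of the kind reported in this section can be computed, the Python below tallies true positives for each class from paired true and predicted labels. The label names and the sample predictions are hypothetical and are not taken from the experiment, which was run in Weka [15].

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label that appears."""
    true_count = Counter(y_true)   # how often each class really occurs
    pred_count = Counter(y_pred)   # how often each class is predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # Precision: of the transactions predicted as `label`, how many were.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        # Recall: of the transactions that were `label`, how many were found.
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical labels for six transactions (illustration only).
y_true = ["normal", "normal", "normal", "negative", "burst", "burst"]
y_pred = ["normal", "normal", "burst", "negative", "burst", "normal"]

scores = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In the multiclass setting each attack variant gets its own row, so a variant that is often mistaken for another shows up as low recall for the first and low precision for the second.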
communications. The binary classification results are compelling: several learning algorithms, including the simple OneR method, approach perfect classification. Reducing the problem to binary classification appears to leverage the generalization performance of machine learning more effectively, albeit at the cost of fidelity in understanding the specific type of attack levied.
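
The effect of reducing the multiclass problem to a binary one can be illustrated with a small sketch. The Python below uses hypothetical labels and predictions, not the experimental data, and collapses every attack variant into a single attack class, so that confusions between variants no longer count as errors.

```python
def to_binary(label):
    """Collapse every attack variant into one 'attack' class."""
    return "normal" if label == "normal" else "attack"


def accuracy(y_true, y_pred):
    """Fraction of transactions whose predicted label matches the true one."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical multiclass labels: two attack variants are confused with
# each other, but every attack is still recognized as some kind of attack.
y_true = ["normal", "burst", "fast_change", "slow_change", "normal"]
y_pred = ["normal", "fast_change", "burst", "slow_change", "normal"]

multiclass_acc = accuracy(y_true, y_pred)
binary_acc = accuracy([to_binary(t) for t in y_true],
                      [to_binary(p) for p in y_pred])

print(f"multiclass accuracy: {multiclass_acc:.2f}")
print(f"binary accuracy:     {binary_acc:.2f}")
```

The binary view scores higher here only because it stops asking which variant was seen, which is the loss of fidelity noted above.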
V. CONCLUSION

The application of machine learning methods to SCADA data demonstrates their promise in addressing security concerns in the CIS domain. With a very basic set of features, and treating the detection of malicious RTU communications as a binary classification problem, this experiment was able to demonstrate the power of machine learning at detecting broad classes of attacks. The six selected learning methods generalized the RTU data well, and the precision/recall values for the binary classifiers were high enough to be operationally feasible as an a posteriori intrusion detection approach for CISs.

Despite the good performance of the various learners, this work identified several opportunities for improvement that center on considering additional features. As many of the injection attacks involve timing, format, and protocol violations, we intend to extend this work to explore features that consider these elements of the telemetry stream in making the intrusion detection decision. Improved performance will be necessary in order to maintain the same machine learning generalization performance with a more minimal training data set and in a more complex CIS comprised of multiple PLCs and multiple SCADA systems.

We applied multiple learning algorithms to RTU data in order to show their viability as an intrusion detection approach for CISs. We recognize that the learning methods employed in this work can be considered state-of-the-practice. The novelty is in the application of machine learning to solve the problem of the post-deployment application of security in the CIS domain. We see this work as a foundation, and encourage the future exploration of more complex SCADA systems, more difficult attack vectors, and more advanced machine learning methods to discriminate those attacks.

ACKNOWLEDGMENT

Research sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, Tennessee 37831-6285; managed by UT Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR2225. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 for the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

We would like to thank Dr. Thomas Morris and his staff at the Mississippi State University’s Critical Infrastructure Protection Center for providing the critical infrastructure data that made this study possible.

REFERENCES

[1] J.M. Beaver, C.T. Symons, and R.E. Gillen, “A learning system for discriminating variants of malicious network traffic,” Proc. 8th Annual Cyber Security and Information Intelligence Workshop, January 2013.
[2] C.T. Symons and J.M. Beaver, “Nonparametric semi-supervised learning for network intrusion detection: combining performance improvements with realistic in-situ training,” Proc. 5th ACM Workshop on Security and Artificial Intelligence, pp. 49-58, October 2012.
[3] A. Cardenas, S. Amin, and S. Sastry, “Research challenges for the security of control systems,” Proc. 3rd Conf. on Hot Topics in Security (HOTSEC ’08).
[4] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[5] S. Dua and X. Du, Data Mining and Machine Learning in Cybersecurity, Boca Raton, FL, Taylor and Francis Group, LLC, 2011.
[6] T. Fukuda and T. Shibata, “Theory and applications of neural networks for industrial control systems,” IEEE Trans. Industrial Electronics, vol. 39, no. 6, pp. 472-489, 1992.
[7] Mississippi State University, Critical Infrastructure Protection Center, http://www.security.cse.msstate.edu/cipc/.
[8] Y. Won, M. Choi, B. Park, and J.W. Hong, “An approach for failure recognition in IP-based industrial control networks and systems,” International Journal of Network Management, 2012.
[9] S. Zander, T.T.T. Nguyen, et al., “Automated traffic classification and application identification using machine learning,” IEEE Conference on Local Computer Networks 30th Anniversary (LCN '05), Sydney, Australia, 2005.
[10] L. Zhen and L. Xu, “Fuzzy learning enhanced speed control of an indirect field-oriented induction machine drive,” IEEE Transactions on Control Systems Technology, vol. 8, no. 2, pp. 270-278, 2000.
[11] R. Sommer and V. Paxson, “Outside the closed world: on using machine learning for network intrusion detection,” 2010 IEEE Symposium on Security and Privacy, 2010.
[12] O. Sharma, M. Girolami, et al., “Detecting worm variants using machine learning,” ACM Conference on emerging Network EXperiments and Technologies (CoNEXT), New York, NY, 2007.
[13] T. Morris, R. Vaughn, and Y. Dandass, “A Retrofit Network Intrusion Detection System for MODBUS RTU and ASCII Industrial Control Systems,” 45th Hawaii Intl. Conf. on System Sciences (HICSS), 2012.
[14] T. Morris, R. Vaughn, and Y.S. Dandass, “A testbed for SCADA control system cybersecurity research and pedagogy,” Proceedings of the Seventh Annual Workshop on Cyber Security and Information Intelligence Research, 2011.
[15] M. Hall, E. Frank, et al., “The WEKA Data Mining Software: An Update,” SIGKDD Explorations, vol. 11, no. 1, 2009.
[16] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” Proceedings of the 23rd Intl. Conf. on Machine Learning, pp. 161-168, 2006.
[17] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[18] R.C. Holte, “Very simple classification rules perform well on most commonly used datasets,” Machine Learning, vol. 11, no. 1, pp. 63-90, 1993.
[19] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
[20] T. Bayes. Phil. Trans. of the Royal Soc. of London, 1763.
[21] P. Langley, W. Iba, and K. Thompson, “An analysis of Bayesian classifiers,” AAAI, vol. 90, 1992.
[22] B. Martin, Instance-Based Learning: Nearest Neighbor with Generalization, University of Waikato, 1995.
[23] J. Platt, “Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines,” Advances in Kernel Methods – Support Vector Learning, 1998.