Anomaly Detection System in Secure Cloud Computing Environment
Abstract—The continuous growth in the use of information technologies in the modern world causes a gradual increase in the amount of data circulating in information and telecommunication systems. This creates an urgent need for large-scale data storage and accumulation facilities and generates many new threats that are not easy to detect. The task of accumulating and storing data is solved by data centers, tools that are able to support and automate any business process. At present, almost all service providers use a promising technology for building data centers, Cloud Computing, which has some advantages over its traditional counterparts. Nevertheless, the problem of protecting the provider's data is so serious that the risk of losing all data in the "cloud" is almost constant. This makes it necessary to process large amounts of data in real time and to give quick notification of possible threats. Therefore, it is reasonable to implement in the data center network an intelligent system that is able to process large datasets and detect possible breaches. Usual threat detection methods are based on signatures; their main idea is to compare incoming traffic with databases of known threats. However, such methods become ineffective when the threat is new and has not yet been added to the database. In that case, it is preferable to use intelligent methods that are capable of tracking any unusual activity in a specific system, that is, anomaly detection methods. However, a signature module detects known threats faster, so it is logical to include it in the system too. Big Data methods and tools (e.g. a distributed file system, parallel computing on many servers) provide the speed of such a system and allow data to be processed dynamically. This paper aims to present the developed anomaly detection system for a secure cloud computing environment, give its theoretical description and conduct an appropriate simulation. The results demonstrate that the developed system provides a high percentage (>90%) of anomaly detection in a secure cloud computing environment.

Index Terms—Anomaly Detection, Big Data, Information Security, Data Analysis, Machine Learning, Signature Detection, Data Center, Cloud Computing, Vulnerability, Security, Technology Architecture, Threat Model.

I. INTRODUCTION

Anomaly detection is one of the most important concepts of data analysis. An information object is considered an anomaly if it differs significantly from normal data behavior in some sphere. In general, this means that the object is not like the others in a particular data array [1]. It is important to detect these objects in order to consider them from a different angle and use other detection methods. During the anomaly detection process researchers deal with the following problems: determining a normal region that can be represented in an adequate form is often a difficult task; the boundary between normal and anomalous behavior is not always clear; the exact notion of an anomaly differs depending on the field of application; relevant data for training or checking may not be available; data can contain noise; normal behavior is dynamic and constantly evolving.

Anomaly detection methods are widely used in the following areas: cloud computing environments, fraud detection in banking and mobile services, monitoring of information system hardware, network intrusion detection systems, processing of CCTV images, detection of suspicious web sites, etc.

From this point of view, the aim of this paper is to develop an anomaly detection system in a secure cloud computing environment. To achieve this aim, the following tasks should be solved:

Developing a secure cloud data center model;
Developing an anomaly detection system for the protected Cloud Computing environment;
Analysis of the Big Data concept;
Experimental research of the anomaly detection module in the developed system for the secure Cloud Computing environment.
Copyright © 2017 MECS I.J. Computer Network and Information Security, 2017, 4, 10-21
Anomaly Detection System in Secure Cloud Computing Environment 11
The absence of known vulnerabilities is also an important criterion [7, 24]. According to Table 2, different vulnerabilities were recorded for almost all data centers, ranging from a powerful lightning strike on the data center building to multiple network attacks. The only data centers for which vulnerabilities were not detected (or information about them is hidden) are Tulip Data Center and DuPont Fabros Technology.

D. Anomaly Detection Methods

The analysis of modern anomaly detection methods made it possible to compare them (Table 3) by the following criteria [3]:

Low demand on computing resources (LDCR);
Lack of need in particular data distribution (LNDD);
Simplicity of implementation (SI);
Little amount of false-positive rate (LAFPR);
Unsupervised learning (UL).

According to the analysis, the Decision Tree method is one of the best bases for developing an anomaly detection system.

However, the drawbacks common to all of the above methods are the following:

The information system is unprotected while the anomaly detection system is learning and building a normal profile;
If malicious activity corresponds to the normal profile, there will be no alert about an anomaly;
High false-positive rate;
Notifications and warnings about anomalies may not contain enough information for further analysis, because large amounts of data are aggregated and particular information is abstracted away when moving to mathematical modeling [2].

Signature databases cannot always be updated in time, which is why we propose a system that combines detection of new anomalies with tracking of existing ones, using signature methods and available databases. To increase the speed of such a system, it is recommended to use Big Data methods and instruments.
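The combined approach described above (a signature lookup for known threats, with an anomaly check for everything else) can be illustrated with a minimal sketch. It is not the authors' implementation: the event field names `payload_hash` and `bytes`, the `Verdict` type and the threshold rule used as the anomaly check are all hypothetical and serve only to show the order of the two checks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Verdict:
    label: str   # "known_threat", "anomaly" or "normal"
    source: str  # module that decided: "signature" or "anomaly_model"


def make_hybrid_detector(
    signatures: Iterable[str],
    is_anomalous: Callable[[dict], bool],
) -> Callable[[dict], Verdict]:
    """Build a detector that tries the signature database first.

    `signatures` is a hypothetical set of known-threat hashes;
    `is_anomalous` is any callable standing in for the anomaly model.
    """
    known = frozenset(signatures)

    def detect(event: dict) -> Verdict:
        # Fast path: an exact match against the known-threat database.
        if event.get("payload_hash") in known:
            return Verdict("known_threat", "signature")
        # Slow path: ask the anomaly model about unseen activity.
        if is_anomalous(event):
            return Verdict("anomaly", "anomaly_model")
        return Verdict("normal", "anomaly_model")

    return detect


# Assumed stand-in for a trained model: flag unusually large transfers.
detect = make_hybrid_detector(
    signatures={"bad-hash-1"},
    is_anomalous=lambda event: event.get("bytes", 0) > 10_000,
)
```

Calling `detect({"payload_hash": "bad-hash-1", "bytes": 10})` takes the signature path, while an event with an unknown hash is passed on to the anomaly check. In a real system the lambda would be replaced by the trained model.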
The main idea of this model is that information security should not be a secondary part of overall security. It must be applied and implemented at all levels of the architecture.

5) Network Security; 6) Secure Encryption system and Key Management System.

Fig. 5 shows the relation between the levels of cloud data center protection and their interaction.
[Figure: INPUT DATA distributed by Master Nodes to Slave nodes, with Map() and Reduce() stages]
1. Calculate the entropy at node A (Fig.7):

Fig.7. Model Example (node A split into branches a and b)

$$H(S) = -\frac{M}{M+N}\log_2\frac{M}{M+N} - \frac{N}{N+M}\log_2\frac{N}{N+M},$$

where $M$ is the quantity of anomaly data in node A, $N$ is the quantity of normal data in node A, and $H(S)$ is the value of entropy before the split.

2. The data set is split into two branches by a feature; the entropy for each branch is calculated:

$$H_a = H(m, n);$$

$$H_a = -\frac{m}{m+n}\log_2\frac{m}{m+n} - \frac{n}{n+m}\log_2\frac{n}{n+m},$$

$$H_b = H(M-m, N-n),$$

$$H_b = -\frac{M-m}{(M-m)+(N-n)}\log_2\frac{M-m}{(M-m)+(N-n)} - \frac{N-n}{(N-n)+(M-m)}\log_2\frac{N-n}{(N-n)+(M-m)},$$

where $m$ is the quantity of anomaly data in node a and $n$ is the quantity of normal data in node a.

3. The entropy for each branch is added proportionally to get the total entropy for the split:

$$H(S \mid A) = P_a H_a + P_b H_b,$$

$$H(S \mid A) = \frac{m+n}{M+N} H_a + \frac{(M-m)+(N-n)}{M+N} H_b,$$

where $P_a$ is the ratio between the number of elements in node a and the number of elements in node A, and $P_b$ is the ratio between the number of elements in node b and the number of elements in node A.

4. The resulting entropy is subtracted from the entropy before the split, and the result, $H(S) - H(S \mid A)$, is the information gain, or decrease in entropy.

A decision tree is built by a greedy algorithm that grows the tree top-down. At each node it selects the feature that best classifies the local training samples. This process continues until the tree perfectly classifies the training samples, or all features have been used [2].

C. Big Data concept

To process large amounts of data, a set of special methods is used. One example is MapReduce [1]. MapReduce is a software framework for distributed computing that uses the "divide and conquer" method to split difficult big data problems into small blocks of work and process them in parallel.

MapReduce contains two steps. In the "Map" step, the master node splits the data into a large number of smaller subproblems. Worker nodes process some subsets under the JobTracker's control and save the result in the local file system. The "Reduce" step analyses and merges the input data from the previous step. A large number of Reduce steps is possible in order to execute the merge processes in parallel, so these tasks are also performed on worker nodes under the JobTracker's control.

Another tool is Hadoop. Hadoop contains a distributed file system; platforms for data analysis and storage; a parallel computing management level; and configuration administration.

One more utility is Apache Spark. Spark is a cluster-computing engine that provides extremely fast data processing and reliability. It has software interfaces based on different programming languages: Java, Python, and Scala.

It supports in-memory computing, which allows it to access data and process requests much faster than a disk-based system (e.g. Hadoop).

In general, Spark is a progressive and very useful update for Hadoop, aimed at improving real-time analysis. The main advantages of Apache Spark are:

The fastest engine for processing big arrays of data;
Worker processes are specified in a MapReduce style, which simplifies implementation alongside Hadoop;
Simple installation;
Spark is written in Scala, a modern object-oriented programming language, which has many resources and an active community;
Many platforms support Spark and its technology stack (MapR, Cloudera, Databricks);
Spark's reliability is supported by Intel's recommendation to use it in healthcare solutions;
One of the most used Spark features is the capability to consolidate data sets from several incompatible sources [8].
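The entropy steps for node A and the greedy choice of a feature can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: features are assumed to be binary, so each split produces exactly the two branches a and b, and the function names `entropy`, `information_gain` and `best_feature` are hypothetical. The arguments `M`, `N`, `m` and `n` carry the same meaning as in the entropy formulas of this section.

```python
from math import log2


def entropy(anomalies: int, normals: int) -> float:
    """Entropy of a node holding the given counts of anomaly and normal data."""
    total = anomalies + normals
    h = 0.0
    for count in (anomalies, normals):
        if count:  # the term 0 * log2(0) is taken as 0
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(M: int, N: int, m: int, n: int) -> float:
    """Entropy before the split minus the weighted entropy after it.

    M, N: anomaly / normal counts in node A (M + N must be positive).
    m, n: anomaly / normal counts in branch a; branch b gets the rest.
    """
    total = M + N
    h_before = entropy(M, N)
    h_a = entropy(m, n)
    h_b = entropy(M - m, N - n)
    h_after = (m + n) / total * h_a + ((M - m) + (N - n)) / total * h_b
    return h_before - h_after


def best_feature(rows, labels, features):
    """One greedy step: return the binary feature with the largest gain.

    rows: list of dicts mapping feature name -> bool (assumed layout).
    labels: list of 0/1 values, where 1 marks an anomaly.
    """
    M = sum(labels)
    N = len(labels) - M

    def gain(feature):
        m = sum(1 for row, y in zip(rows, labels) if row[feature] and y)
        n = sum(1 for row, y in zip(rows, labels) if row[feature] and not y)
        return information_gain(M, N, m, n)

    return max(features, key=gain)
```

A full tree would apply `best_feature` recursively to each branch until the samples in a branch are all of one class or no features remain, which matches the stopping rule stated in the text.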
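The Map and Reduce steps can be imitated on a single machine to show how the data flows. The sketch below is only an analogy and rests on several assumptions: a thread pool stands in for the worker nodes, there is no JobTracker or distributed file system, and the input is assumed to be a list of non-empty log lines whose first field is a source address. The function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(records, n_chunks):
    """Master-node role: cut the input into roughly equal subproblems."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Worker role in the Map step: count lines per source address."""
    return Counter(line.split()[0] for line in chunk)


def merge(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def map_reduce(records, n_workers=4):
    """Run Map on every chunk in parallel, then Reduce the partial counts."""
    chunks = split(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())


logs = [
    "10.0.0.1 GET /",
    "10.0.0.2 GET /",
    "10.0.0.1 POST /login",
]
counts = map_reduce(logs, n_workers=2)
```

Because `merge` is associative, the partial results could themselves be merged in parallel, which corresponds to running several Reduce steps at once as the text describes.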
security layers connection. When protection is enabled, the efficiency of the data center's interaction with the end user is higher, and the interaction takes place through secure communication.

After that, all relevant simulation results were compared with known data center models (Table 2), and it was determined that the secure data center model lacks the shortcomings identified in the gap analysis.
Fig.12. Performance Graphs of: Data Center Server (a), Data Center Network (b), Data Center with End User (c)
[27] "What type of data center do you need?" [Online]. Available: http://www.compassdatacenters.com/type-data-center-need/.
[28] "Whitepaper Cloud Computing Use Cases Version 3.0, produced by the Cloud Computing Use Case Discussion Group" [Online]. Available: http://opencloudmanifesto.org/cloud_computing_use_cases_whitepaper-3_0.pdf.
[29] "4 types of data centers", 2012. [Online]. Available: https://gigaom.com/2012/10/15/4-types-of-data-centers/.

Authors' Profiles

Zhengbing Hu PhD, Associate Professor of School of Educational Information Technology, Central China Normal University, M.Sc. (2002), Ph.D. (2006) from the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute". Postdoc (2008), Huazhong University of Science and Technology, China. Honorary Associate Researcher (2012), Hong Kong University, Hong Kong. Major research interests: Computer Science and Technology Applications, Artificial Intelligence, Network Security, Communications, Data Processing, Cloud Computing, Education Technology.

Ukrainian Scientific Journal of Information Security, Chairman in Young Scientist Association of NAU. Research interests: Cryptography, Quantum Key Distribution, Network & Internet Security, Information Security Incident Management, Cybersecurity & CIIP.

Oksana Koval Master's Degree Student. As a result of the Degree Thesis defense "Secured data center model based on Cloud Computing technology" in 2016 she received a Bachelor's Degree in Information Security Management from NAU. Research interests: Information Security, Data Analysis, Cloud Computing, Cybersecurity, Information Security Management Systems.

Viktor Gnatyuk PhD Student (2012-2015), Assistant Teacher (from 2013). In 2012 he received an MSc degree in Economic Cybernetic from Khmelnitsky National University (Khmelnitsky, Ukraine). He is currently working at NAU in the Telecommunication Systems Academic Department. Research interests: Computer Network & Internet Security, Information Security Incident Management.
How to cite this paper: Zhengbing Hu, Sergiy Gnatyuk, Oksana Koval, Viktor Gnatyuk, Serhii Bondarovets, "Anomaly Detection System in Secure Cloud Computing Environment", International Journal of Computer Network and Information Security (IJCNIS), Vol.9, No.4, pp. 10-21, 2017. DOI: 10.5815/ijcnis.2017.04.02