Deep Learning-Based Anomaly TR
Research Article
Deep Learning-Based Anomaly Traffic Detection Method in
Cloud Computing Environment
Junjie Cen¹ and Yongbo Li²
¹ College of Computer Science and Technology, Henan Institute of Technology, Xinxiang, Henan 453002, China
² College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453002, China
Received 25 January 2022; Revised 3 March 2022; Accepted 7 March 2022; Published 31 March 2022
Copyright © 2022 Junjie Cen and Yongbo Li. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
To address the problem of poor detection performance of existing intrusion detection methods in the environment of high-
dimensional massive data with uneven class distribution, a deep learning-based anomaly traffic detection method in cloud
computing environment is proposed. First, the fuzzy C-means (FCM) algorithm is introduced and is combined with the general
regression neural network (GRNN) to cluster the samples to be classified in the original space by FCM. Then, the GRNN model
is trained and the center point is updated using the sample closest to the FCM clustering center until a stable cluster center is
obtained. The parameters in FCM-GRNN are optimized using the global optimization feature of the modified fruit fly
optimization algorithm (MFOA), and the optimal spread value is found using the three-dimensional search method through an
iterative search. Finally, experiments are conducted based on the KDD CUP99 dataset, and the results demonstrate that the
detection rate (DR) and false alarm rate (FAR) of the proposed FCM-MFOA-GRNN method are 91% and 1.176%, respectively,
which are better than those of the comparison methods. Therefore, the proposed method has good anomaly traffic detection ability.
distribution, a deep learning-based anomaly traffic detection method in cloud computing environment is proposed. The contributions are as follows:

(1) The fuzzy C-means (FCM) algorithm is introduced and combined with the general regression neural network (GRNN). The samples to be classified in the original space are clustered by the FCM algorithm, and the sample closest to the FCM clustering center is used to train the GRNN model and update the center until a stable clustering center is obtained, which improves the stability of the anomaly traffic detection system.

(2) The parameters of the FCM-GRNN method are optimized by using the global search feature of the modified fruit fly optimization algorithm (MFOA). The optimal spread value is found by an iterative search using the three-dimensional search method, which draws on the keen olfactory and visual functions of fruit flies, so that the proposed algorithm can converge faster.

2. Related Works

In recent years, scholars have conducted in-depth research on abnormal traffic detection methods. The reported results show that, on the abnormal traffic detection data sets studied, deep learning methods outperform traditional methods. Literature [18] proposed a sliding window abnormal traffic detection method based on the mixed dimension of time and space. The detection algorithm combines machine learning with a neural network. A sliding window anomaly detection method based on network traffic was studied in Literature [19]. The method combined the sliding window and a deep learning architecture to analyze network traffic; features in each window were extracted, vectorized, and then fed into a deep neural network for training. Literature [20] proposed a network intrusion detection method based on a LeNet-5 model, which improved the detection accuracy. Blanco et al. used the genetic algorithm (GA) to optimize a CNN classifier to find a better input feature combination [21]. Literature [22] converts variable-length data sequences into fixed-length data through LSTM and uses an autoencoder to process the fixed-length data under unsupervised conditions, so as to reduce the dimension of the input data and extract reliable features at the same time. On the basis of cross validation, the threshold is set to classify the abnormal parts in the input traffic data series. In Literature [23], a deep autoencoder-based intrusion detection method was investigated with layer-by-layer greedy training to avoid overfitting. A self-learning framework based on stacked autoencoders for feature learning and dimensionality reduction was proposed in Literature [24]. It applied the support vector machine (SVM) approach for classification, which shows good performance in two-class and multiclass classification. In Literature [25], an unsupervised deep autoencoder model was used for training so as to learn normal network behaviors and generate optimal parameters. Then, the estimation algorithm of the ADE model was introduced in a supervised deep neural network model to efficiently tune its parameters and classify the network traffic. Literature [26] proposed a supervised LSTM-based intrusion detection algorithm that can detect DoS attacks and probe attacks that have unique time series features. Zhang et al. proposed a parallel cross convolutional neural network (PCCN) based on deep learning [27]. By fusing the traffic features learned from the two branches of CNN, a better feature extraction effect is obtained. Literature [28] combines CNN and LSTM to learn the temporal and spatial characteristics of network traffic. The above methods struggle to mine data features effectively and perform poorly on high-dimensional data, resulting in a low detection rate and a high false alarm rate.

3. Application Scenarios of the Proposed Method

In the design process of the anomaly traffic detection and analysis model, the principle of modular design is followed. Modular design simplifies complex problems, makes design faults easier to locate, and facilitates later updates and maintenance of the system. The specific functions of each module are as follows. As shown in Figure 1, the whole model can be divided into four major modules: the SDN controller module, the traffic collection module, the traffic analysis module, and the traffic cleaning module. The SDN controller realizes centralized control of the whole network. The Floodlight controller is used to divert the traffic from each OpenFlow switch to the traffic collection module to collect network traffic. As illustrated in Figure 1, the traffic in switch A, switch B, and switch C is controlled by the SDN controller and converged to the traffic collection module through the secure channel. The traffic analysis module is the core of the entire anomaly traffic detection model. It uses the FCM-MFOA-GRNN algorithm to cluster and analyze the collected traffic to separate the normal traffic from the attack traffic with different attack behaviors. The traffic cleaning module consists of many physical devices that can clean different attack traffic, such as IDS, UTM, and WAF.

4. The Proposed Method

4.1. Algorithm Flow Chart. Although the FCM algorithm can cluster the data and perform mining analysis, many intrusion types cannot be accurately classified because there are many kinds of data characterizing intrusion categories in intrusion detection systems and the differences between these data are subtle. Therefore, combined with the characteristics of GRNN, this paper proposes an improved FCM-MFOA-GRNN algorithm based on the FCM algorithm. The flow chart of the FCM-MFOA-GRNN algorithm is shown in Figure 2. It can be seen that the core module of the algorithm includes five parts, which are, in order, the FCM clustering algorithm, initial selection of network training data, MFOA-GRNN network training, MFOA-GRNN network prediction, and network training data selection.
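The FCM clustering step and the selection of the sample closest to each cluster center, as described above, might be sketched as follows. This is a minimal NumPy sketch of the standard fuzzy C-means update, not the authors' implementation; the function names `fuzzy_c_means` and `closest_samples`, the fuzzifier m = 2, the iteration limit, and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means (assumed variant); returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix whose rows sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every sample to every center (floored to avoid division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def closest_samples(X, centers):
    """Index of the sample nearest to each cluster center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=0)
```

In the flow described in Section 4.1, the samples returned by `closest_samples` would play the role of the initial network training data.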
Wireless Communications and Mobile Computing 3
Figure 1: Anomaly traffic detection model based on FCM-MFOA-GRNN.

4.2. MFOA-GRNN Network

4.2.1. Network Structure of GRNN. Figure 3 shows the structure diagram of the GRNN network. The input of the network is $X = [x_1, x_2, \cdots, x_n]^T$, the output is $Y = [y_1, y_2, \cdots, y_n]^T$.

(1) The number of neurons in the input layer is equal to the vector dimension of the learning sample and is the same as the number of neurons in the mode layer. The neuron transfer function in the mode layer is

It performs arithmetic summation on the outputs of all neurons in the mode layer, and the transfer function can be written as

$$S_D = \sum_{i=1}^{n} P_i. \tag{4}$$

$$y_j = \frac{S_{Nj}}{S_D}. \tag{7}$$

4.2.2. Network Flow of MFOA-Optimized GRNN. The performance of GRNN can be directly affected by the value of σ. This paper proposes a new MFOA-optimized GRNN, named MFOA-GRNN, for the purpose of optimizing the spread value. FOA is prone to local extremes and cannot search for the global optimum, which is mainly caused by its fitness function. Hence, the fitness function must be modified to get rid of the local extremes. On the other hand, if the distance Dist(i) is positive, its reciprocal
$$x_i = X_{Init} + \text{random value}, \quad y_i = Y_{Init} + \text{random value}, \quad z_i = Z_{Init} + \text{random value}. \tag{8}$$

Step 3. Calculate the distance Dist(i) between each point and the initial point.

$$S_i = \frac{1}{\mathrm{Dist}(i)} + \Delta, \tag{10}$$

Step 5. Find the individual with the optimal Smell(i) in the population, i.e., the minimal value of MSE.

(3) Randomly generate the initial position $(X_{Init}, Y_{Init}, Z_{Init})$, number of individuals, and maximum number of iterations of the Drosophila population and generate the random direction and distance of flight

Step 1. First find the sample mean $mean_i$ of each class in the n classes divided from the FCM clustering separately.
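Eqs. (8) and (10) together with Steps 3 and 5 might be sketched as the following search loop. This is a hedged sketch, not the authors' MFOA: the function name `mfoa_search`, the uniform random step, the default `delta` (standing in for Δ, whose definition is not given in this excerpt), and the rule that the swarm moves to the best position found are assumptions borrowed from the standard fruit fly optimization algorithm. The `fitness` callable is a hypothetical stand-in for the MSE of a GRNN trained with spread value S_i.

```python
import numpy as np

def mfoa_search(fitness, init=(0.0, 0.5, 0.0), pop_size=8, n_iter=150,
                step=1.0, delta=0.0, seed=0):
    """Three-dimensional fruit fly search for the value S minimizing fitness(S)."""
    rng = np.random.default_rng(seed)
    start = np.asarray(init, dtype=float)
    base = start.copy()
    best_s, best_smell = None, np.inf
    for _ in range(n_iter):
        # Eq. (8): each fly takes a random 3-D offset from the swarm position.
        pos = base + rng.uniform(-step, step, size=(pop_size, 3))
        # Step 3: distance between each fly and the initial point.
        dist = np.maximum(np.linalg.norm(pos - start, axis=1), 1e-12)
        # Eq. (10): judgment value S_i = 1 / Dist(i) + delta.
        s = 1.0 / dist + delta
        # Smell(i): fitness of each candidate, e.g. the MSE of the network.
        smell = np.array([fitness(v) for v in s])
        # Step 5: keep the individual with the minimal smell value.
        i = int(smell.argmin())
        if smell[i] < best_smell:
            best_s, best_smell = float(s[i]), float(smell[i])
            base = pos[i]  # swarm flies to the best position (standard FOA, assumed)
    return best_s, best_smell
```

The default initial position, population size, and iteration count mirror the values reported in Section 5.3.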
$$\mathrm{FAR} = \frac{FP}{TN + FP}, \tag{13}$$

where TP is the abnormal data detected as abnormal, FN

where $\bar{x}_k$ denotes the mean value of the kth attribute, $S_k$ denotes the mean absolute error of the kth attribute, and $x_{ik}$ denotes the kth attribute of the ith record.
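The false alarm rate of Eq. (13) can be computed directly from confusion-matrix counts, as in the short sketch below. The detection rate formula used here, TP / (TP + FN), is the standard definition and is an assumption, since the corresponding equation is not shown in this excerpt; the function name `detection_metrics` and the counts in the usage line are purely illustrative.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (DR, FAR) from confusion-matrix counts."""
    # Detection rate: share of abnormal records detected as abnormal
    # (standard definition, assumed).
    dr = tp / (tp + fn)
    # Eq. (13): share of normal records wrongly flagged as abnormal.
    far = fp / (tn + fp)
    return dr, far

# Hypothetical counts, not taken from the paper's experiments.
dr, far = detection_metrics(tp=90, fn=10, fp=40, tn=3760)
```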
$$Z_{ik} = \frac{x_{ik} - \bar{x}_k}{S_k}, \tag{16}$$

where $Z_{ik}$ represents the value of the kth attribute in the ith data record after normalization.

(3) Normalization: normalize each value after standardization to the interval [0,1].

$$X' = \frac{X - \min}{\max - \min}, \tag{17}$$

where max and min are the maximum and minimum values of the sample data, respectively.

Figure 5: Iterative optimization trajectory of fruit flies in the proposed method.
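The standardization of Eq. (16) and the normalization of Eq. (17) might be sketched as follows. This is a minimal NumPy sketch under stated assumptions: S_k is taken to be the mean absolute deviation of attribute k, in line with its description as the mean absolute error, and max and min are taken per attribute; the function names and the guards against constant columns are illustrative additions.

```python
import numpy as np

def standardize(X):
    """Eq. (16): Z_ik = (x_ik - mean_k) / S_k, column by column."""
    mean = X.mean(axis=0)
    # S_k: mean absolute deviation of attribute k (assumed definition).
    s = np.abs(X - mean).mean(axis=0)
    s = np.where(s == 0, 1.0, s)  # guard for constant attributes
    return (X - mean) / s

def min_max(Z):
    """Eq. (17): scale each attribute to the interval [0, 1]."""
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    span = np.where(hi - lo == 0, 1.0, hi - lo)  # guard for constant attributes
    return (Z - lo) / span

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled = min_max(standardize(X))
```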
5.3. Iterative Optimization Trajectory of the Proposed Method. Let the initial position of the Drosophila population be [0, 0.5, 0], the population size be 8, and the number of iterations be 150. Select 200 groups as training samples and 10 groups as prediction samples. The models proposed in this paper are used for prediction at the same time, and the results are shown in Figure 5. It can be seen that the fruit fly group in the proposed model does not follow a certain directional path to find the optimal solution sequentially, but there are only 6 position points in the trajectory route.

5.4. Intrusion Detection Results Based on the FCM-MFOA-GRNN Algorithm. In order to reflect a real network environment as much as possible, a number of data are selected from the KDD CUP99 dataset to create 5 groups of datasets, each of which contains 3800 normal data and 100 attack data. The attack categories are selected to be as evenly distributed as possible across these 5 groups. Three simulation experiments are run on each dataset using the FCM-MFOA-GRNN algorithm, and Table 1 reports the average of the three results. DR and FAR are the two key metrics of the algorithm's performance. As shown in Table 1, DR and FAR of the proposed method are 91% and 1.176%, respectively.

Table 1: Intrusion detection results based on the FCM-MFOA-GRNN method.

Dataset          Normal   Attack   DR     FAR
Data set 1       3742     91       91%    1.175%
Data set 2       3758     93       93%    1.297%
Data set 3       3756     90       90%    1.138%
Data set 4       3697     89       89%    1.109%
Data set 5       3762     92       92%    1.161%
Average value    3743     91       91%    1.176%

In order to demonstrate the performance of the proposed method, it is compared with the methods proposed in Literature [27] and Literature [28] under the same experimental conditions, and the comparison results are shown in Table 2. From the experimental results, it can be noted that DR of the method proposed in Literature [27] is only 89.24% and FAR is 2.075%. DR of the method of Literature [28] is 90.1% and FAR is 1.237%. DR and FAR of the proposed method are 91% and 1.176%, respectively, which are better than those of the comparison methods. This is because the proposed method combines the FCM algorithm with GRNN, so as to cluster the samples to be classified in the original space by the FCM algorithm and then use the sample closest to the FCM clustering center to train the GRNN model and update the center point until a stable clustering center is obtained. Therefore, it can better distinguish the small differences of attributes in complex spatial data and improve the accuracy of detection. In contrast, the comparison methods do not effectively mine the features of high-dimensional data, and therefore, the detection performance is poor.

Table 2: Performance comparison of different methods in intrusion detection.

Method             DR       FAR
Literature [27]    89.24%   2.075%
Literature [28]    90.1%    1.237%
Proposed method    91.0%    1.176%

In order to compare the running time of the proposed method with the methods of Literature [27] and Literature [28], different amounts of data are selected from the KDD CUP99 dataset for testing. The running time of each algorithm was compared, and the results are shown in Figure 6. The minimum number of data selected is 200, and the maximum number is 20000. It can be easily seen from Figure 6 that the method of Literature [27] takes the longest average detection time during the whole experiment, and the proposed method takes the shortest time. This is because the parameters of the FCM-GRNN model are optimized by using the global search feature of MFOA, and the three-dimensional search method is used to find the optimal
Figure 6: Comparison of running time when different sizes of datasets are detected by different methods (horizontal axis: dataset size; legend: Literature [27], Literature [28], proposed method).
[10] A. Drewek Ossowicka, M. Pietrołaj, and J. Rumiński, “A survey of neural networks usage for intrusion detection systems,” Journal of Ambient Intelligence and Humanized Computing, vol. 12, no. 1, pp. 497–514, 2021.
[11] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, and A. Hotho, “A survey of network-based intrusion detection data sets,” Computers & Security, vol. 86, no. 6, pp. 147–167, 2019.
[12] A. Bakshi and Sunanda, A comparative analysis of different intrusion detection techniques in cloud computing, Springer, Singapore, 2019.
[13] S. G. Kene and D. P. Theng, “A review on intrusion detection techniques for cloud computing and security challenges,” in 2nd international conference on electronics and communication systems, pp. 227–232, Coimbatore, India, 2015.
[14] N. Keegan, S. Y. Ji, A. Chaudhary, C. Concolato, B. Yu, and D. H. Jeong, “A survey of cloud-based network intrusion detection analysis,” Human-centric Computing and Information Sciences, vol. 6, no. 1, pp. 1–16, 2016.
[15] A. Aburomman and M. B. I. Reaz, “A survey of intrusion detection systems based on ensemble and hybrid classifiers,” Computers & Security, vol. 65, no. 4, pp. 135–152, 2017.
[16] A. Nisioti, A. Mylonas, P. D. Yoo, and V. Katos, “From intrusion detection to attacker attribution: a comprehensive survey of unsupervised methods,” IEEE Communication Surveys and Tutorials, vol. 20, no. 4, pp. 3369–3388, 2018.
[17] L. N. Tidjon, M. Frappier, and A. Mammar, “Intrusion detection systems: a cross-domain overview,” IEEE Communication Surveys and Tutorials, vol. 21, no. 4, pp. 3639–3681, 2019.
[18] C. Liu, J. Wang, J. Xu, J. Wang, C. Liu, and Y. Wang, “Abnormal data flow detection in the Internet of things,” in 4th International Conference on Electronics and Communication Engineering, Xi'an, China, 2021.
[19] M. Alauthman, N. Aslam, M. Al-Kasassbeh, S. Khan, A. Al-Qerem, and K. K. R. Choo, “An efficient reinforcement learning-based Botnet detection approach,” Journal of Network and Computer Applications, vol. 150, no. 11, article 102479, 2020.
[20] W.-H. Lin, H.-C. Lin, P. Wang, B.-H. Wu, and J.-Y. Tsai, “Using convolutional neural networks to network intrusion detection for cyber threats,” in international conference on applied system invention, 2018, pp. 1107–1110, Chiba, Japan, 2018.
[21] R. Blanco, P. Malagón, J. J. Cilla, and J. M. Moya, “Multiclass network attack classifier using CNN tuned with genetic algorithms,” in 28th international symposium on power and timing modeling, optimization and simulation, pp. 177–182, Platja d'Aro, Spain, 2018.
[22] A. H. Mirza and S. Cosan, “Computer network intrusion detection using sequential LSTM neural networks autoencoders,” in 26th signal processing and communications applications conference, 2018, pp. 1–4, Izmir, Turkey, 2018.
[23] F. Farahnakian and J. Heikkonen, “A deep auto-encoder based approach for intrusion detection system,” in 20th international conference on advanced communication technology, pp. 178–183, Chuncheon, Korea (South), 2018.
[24] M. Al-Qatf, Y. Lasheng, M. Al-Habib, and K. Al-Sabahi, “Deep learning approach combining sparse autoencoder with SVM for network intrusion detection,” IEEE Access, vol. 6, no. 5, pp. 52843–52856, 2018.
[25] A. L. H. Muna, N. Moustafa, and E. Sitnikova, “Identification of malicious activities in industrial Internet of things based on deep learning models,” Journal of Information Security and Applications, vol. 41, no. 12, pp. 1–11, 2018.
[26] R. C. Staudemeyer and C. W. Omlin, “Evaluating performance of long short-term memory recurrent neural networks on intrusion detection data,” in Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference, pp. 218–224, New York, NY, United States, 2013.
[27] Y. Zhang, X. Chen, D. Guo, M. Song, Y. Teng, and X. Wang, “PCCN: parallel cross convolutional neural network for abnormal network traffic flows detection in multi-class imbalanced network traffic flows,” IEEE Access, vol. 7, no. 9, pp. 119904–119916, 2019.
[28] A. Pektaş and T. Acarman, “A deep learning method to detect network intrusion through flow-based features,” International Journal of Network Management, vol. 29, no. 3, pp. 2019–2026, 2019.