Prediction of DDoS Flooding Attack using Machine Learning Models

2022 Third International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE) | 978-1-6654-5664-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICSTCEE56972.2022.10100083

S L Deshpande, Department of Computer Science Engineering, Visvesvaraya Technological University, Belagavi, India, 0000-0571-9134-6105
Geeta S Hukkeri, Department of Computer Science Engineering, Visvesvaraya Technological University, Belagavi, India, 0000-0001-9511-8578
Pooja S Patil, Department of Computer Network Engineering, Visvesvaraya Technological University, Belagavi, India, 0000-0001-9365-6225
R H Goudar, Department of Computer Science Engineering, Visvesvaraya Technological University, Belagavi, India, 0000-0002-4590-7744
Poonam Siddarkar, Department of Computer Network Engineering, Visvesvaraya Technological University, Belagavi, India, 0000-0002-7848-5402

Abstract— Many types of Distributed Denial of Service (DDoS) attacks now occur owing to the rapid growth in technology, and they can cause serious harm in Software Defined Network (SDN) architectures. As a consequence, DDoS has become one of the most critical and commonly occurring cyber-attacks. There are many traditional and advanced methods for detecting these attacks. This paper builds a Machine Learning (ML) based model for predicting DDoS flooding attacks of several types. The ML models used to classify these attacks are Logistic Regression, K-Nearest Neighbour, Multi-Layer Perceptron, and Decision Tree classifiers. The implementation was done in a Jupyter notebook with the required Python packages installed. Among these four classifiers, the KNN and Decision Tree classifiers showed nearly identical and best accuracy of 99.98 percent in TCP and ICMP flooding attack prediction. In UDP flooding attack prediction, the Decision Tree classifier showed the best accuracy, 77.23 percent.

Keywords— Cyber Attack, DDoS, Flooding attack, Machine Learning, Software-Defined Network

I. INTRODUCTION
The world is now more digitally connected through various types of networks than ever before. This gives criminals more opportunities to attack online systems, networks, and infrastructure, and it has led to a rapid increase in cyber threats [10]. A network is a collection of servers, computers, mainframes, and other electronic devices connected for data communication. A network can have both authorized users and unauthorized users, also called hackers. A hacker can adopt various methods to exploit or illegally access other users' data. These methods can be classified into active and passive attacks [10]. In a passive attack, the hacker only analyses the data and does not modify the data or resources. In an active attack, the hacker can modify the data and also block users from accessing resources. The Distributed Denial of Service (DDoS) attack is one of the most commonly observed cyber-attacks; it is an active attack and a subclass of the better-known Denial of Service (DoS) attack. In a DDoS attack, the attacker prevents normal users from accessing resources by flooding the server with internet traffic. Due to the increasing size of networks and the number of users in them, it is difficult to distinguish attackers from legitimate users. DDoS flooding attack types include the ICMP flood, the SYN (TCP synchronous) flood, and the UDP flood. There are numerous tools to detect DDoS attacks. The traditional methods used for predicting DDoS attacks are found to be less accurate and slower than Machine Learning (ML) based models, and ML based techniques provide better results for DDoS attack detection and prevention. DDoS attacks produce large volumes of traffic flows or series of TCP SYN requests, and these flooding requests overwhelm SDN switches [7]. Practical and lightweight techniques have been proposed to mitigate such attacks and guard SDN architectures against DDoS flooding. Because a Software Defined Network (SDN) relies on a centralized controller, its security challenges increase [2]. The SDN architecture provides a centralized network for data communication, and these same characteristics make it a target for cyber attackers [9]. The most common attack on SDN architectures is the DDoS flooding attack [3].

II. LITERATURE SURVEY
In paper [1], the authors discussed the TCP SYN flood attack. They applied several Data Mining and ML algorithms, as well as combinations of those algorithms, to the CAIDA dataset to detect the TCP SYN flood attack. The normal-traffic component was taken from data collected on the SSE network. In paper [2], the authors described DDoS attacks in SDN architecture. An attacker can make the SDN controller unavailable by generating a chain of TCP SYN requests that keep the controller busy with useless processing. The mitigation methods used are the SYN cookie technique and pre-generated cookies. Among the different SYN cookie methods, they selected the TCP-Reset method as the defensive mechanism against SYN flood attacks.
In paper [3], the authors described ML-based detection techniques and classes of flooding attacks in the SDN environment. They specified machine learning algorithms and classifiers to identify the threat, and Mininet was used to emulate the SDN model.


Authorized licensed use limited to: Visvesvaraya Technological University Belagavi. Downloaded on December 09,2023 at 06:34:40 UTC from IEEE Xplore. Restrictions apply.
In paper [4], the authors presented different supervised algorithms to analyse and mitigate DDoS flood attacks. They described single and hybrid ML approaches to defeat DDoS attacks. In a TCP SYN flood attack, the attacker abuses the TCP three-way handshake used to establish a connection with the server. The ICMP flooding attack, also called the ping attack, attacks the victim's server by sending a large number of echo requests.
In [5], an anomaly detection mechanism is put forth to identify DDoS attacks using string kernels and an Enhanced Support Vector Machine (ESVM). ESVM creates the model file from training data that represent typical user access behaviour, and data gathered during normal and attack situations are used as test samples. With ESVM, DDoS attacks on the application and network layers are classified with an accuracy of 99%.
In paper [6], a control technique based on sFlow mitigation technology is presented together with real-time detection of DDoS attacks on the SDN. When an attack is detected, sFlow analyses samples of packets obtained from network traffic and develops handling rules to be forwarded to the controller [8]. The proposed method was put into practice by simulating the network in Mininet on a Virtual Machine (VM), and it was shown to effectively detect and mitigate DDoS attacks.
In paper [11], the authors presented different classification and clustering ML algorithms to identify the TCP-SYN flood attack. The algorithms used for classification are Random Forest, Decision Tree, and XGBoost. The accuracy obtained was 0.99 after applying cross-validation.

III. CLASSIFICATION OF DDOS ATTACKS
DDoS attacks are classified into three broad kinds:
• Application layer attacks: Normally the server sends a response to each request message sent by the client. The most common application layer attack is the HTTP flood attack, in which the attacker keeps sending HTTP requests to the targeted server from different fake addresses.
• Protocol-based attacks: In this class of attacks, the attacker exhausts the server's resources by overloading it with a large number of requests. The attacker exploits the three-way handshake used to establish a connection between the client and the server.
• Volumetric attacks: In volumetric attacks, the attacker completely exhausts the bandwidth by bombarding the server with a large amount of traffic. The attacker bombards the targeted server with request messages using forged (spoofed) IP addresses.
Fig. 1 represents the types of DDoS attacks.

Fig. 1. DDoS attack types.

A. TCP SYN-Flood
The TCP SYN (synchronous) flood attack is one of the most frequently occurring types of DDoS attack. The attacker abuses the TCP three-way handshake, which is used to establish a reliable connection between the client and the server. In this handshake, the client requests communication with the server by sending a SYN message. Once the server receives the message, it replies to the client with a SYN_ACK message. The client receives the SYN_ACK message and sends an acknowledgement (ACK) message to the server. In a TCP SYN flood attack, the attacker sends a large volume of SYN packets to the targeted server but never sends the final ACK packet needed to complete the three-way handshake, leaving each connection half-open. As a result, the server does not have enough resources left to allocate to normal users.

B. ICMP-Flood
The ICMP flood attack is also called the ping attack. Normally, the server receives ICMP echo requests from a client and replies with ICMP echo reply messages. In this attack, the attacker sends a large number of echo requests to the targeted server, and the server must use its resources to process and reply to each one. Because of the large number of echo requests, the server becomes overloaded and inaccessible to normal users.

C. UDP-Flood
The UDP flood attack is a DDoS attack in which the attacker sends a large number of UDP packets to random ports on the targeted victim's server or routers, making it inaccessible to normal users.

IV. ML MODELS
ML is a field of AI that allows machines to predict outcomes accurately without being explicitly programmed. ML algorithms use historical data as input to predict new output values, and different ML algorithms suit different applications.

A. Machine Learning Types
ML algorithms are classified into four types, as shown in Fig. 2.
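The half-open state described in Section III.A is what a SYN flood exhausts. The toy model below is a minimal sketch, not a TCP implementation: the backlog capacity and the (host, port) keys are hypothetical, and the timeouts and SYN cookies that real servers use [2] are deliberately left out.

```python
class SynBacklog:
    """Toy model of a server's backlog of half-open TCP connections.

    Hypothetical illustration: capacity and (host, port) keys are assumptions;
    timeouts and SYN cookies are deliberately omitted.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.half_open: set = set()     # SYN received, SYN_ACK sent, no ACK yet
        self.established: set = set()   # three-way handshake completed

    def on_syn(self, peer) -> bool:
        """Handle a SYN; answer with SYN_ACK only if the backlog has room."""
        if len(self.half_open) >= self.capacity:
            return False                # backlog full: the SYN is dropped
        self.half_open.add(peer)
        return True

    def on_ack(self, peer) -> bool:
        """Handle the final ACK, moving the peer from half-open to established."""
        if peer not in self.half_open:
            return False
        self.half_open.remove(peer)
        self.established.add(peer)
        return True


server = SynBacklog(capacity=4)
# The attacker sends SYNs (e.g. from spoofed ports) and never sends the final ACK.
for port in range(10):
    server.on_syn(("attacker", port))
# A legitimate client's SYN now gets no SYN_ACK: the backlog is exhausted.
accepted = server.on_syn(("client", 5000))
print(len(server.half_open), accepted)  # 4 False
```

With room for only four half-open entries, ten unanswered SYNs are enough to make the server drop the legitimate client's SYN, which is the denial of service the section describes.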

Fig. 2. Machine Learning Types

1) Supervised Machine Learning: In this type, the machine learns from data under external supervision. The machine is trained on a "labelled" dataset and then predicts the output based on that training. Supervised machine learning is divided into Classification and Regression.
2) Unsupervised Machine Learning: In this type of machine learning, the machine learns from data without external supervision. These models are trained on an unlabelled dataset, and the algorithm learns and predicts the output with no supervision. Unsupervised learning can be divided into Clustering and Association.
3) Semi-Supervised Machine Learning: This is a combination of supervised and unsupervised machine learning. It uses both labelled and unlabelled datasets to learn and predict results, and it operates mostly on unlabelled data.
4) Reinforcement Learning: Reinforcement learning is a subfield of machine learning in which a computer program or intelligent agent interacts with an environment and learns to predict and act within it.

B. Classification Methods
1) Random Forest: Random forest is an ML technique composed of a large number of decision trees. Each decision tree (DT) learns to make decisions based on certain parameters. A tree has a root (parent) node at the top and child nodes below it, ending in leaf nodes.
2) Logistic Regression: Logistic Regression (LR) is a supervised learning method. It uses a given set of independent variables to learn and predict a categorical dependent variable, and it is used to build ML models that describe data and the relationship between a large number of independent variables and one dependent variable.
3) K-Nearest Neighbor (KNN): KNN is a machine learning algorithm generally used for classification. It classifies a new data point according to how its nearest neighbours, the previously measured data points most similar to it, are classified. The value of K must be chosen correctly for the method to work well.
4) MultiLayerPerceptron (MLP) Classifier: A multilayer perceptron is a type of artificial neural network that maps a set of inputs to a set of outputs. The MLP trains the network using backpropagation, and it is widely used for classification.
5) DecisionTreeClassifier (DT): A decision tree is a supervised machine learning method used for both regression and classification. It repeatedly splits the data according to a chosen feature parameter.

V. PROPOSED SYSTEM
The software used is the Anaconda distribution with Jupyter Notebook. The Python packages installed are numpy, sklearn, pandas, matplotlib, pickle, and tqdm; these packages must be installed to run the train.py and test.py Python files. The design of the DDoS flooding attack prediction model is shown in Fig. 3.

Fig. 3. DDoS flood attack prediction model.

A. Gathering Data: In this study, the selected dataset is KD99. The tool used for the dataset is the Mininet emulator for software-defined networks. Mininet creates a virtual network from virtual hosts, switches, and software-defined applications in a virtualized environment, and it uses OpenFlow switches. The dataset was downloaded for three protocols: ICMP, TCP, and UDP.
B. Data Preprocessing: Data preprocessing is the set of operations applied to raw data to remove noise and transform it so that a machine learning model can use it. Data preprocessing has two phases: feature extraction, and transforming the data to numerical values.
C. Feature Extraction: In this process, the raw data is transformed into numerical data that the machine learning model can process further, because a machine learning model does not yield good results when raw data is processed directly. The features used differ slightly for each of the three attacks. Once feature extraction is done, the rest of the data preprocessing is carried out.
D. Modification of data and Classification: The features extracted in the previous step are transformed into numerical values. These numerical features are given to the ML models as input for training and testing, and each ML model can then be run in parallel.
E. Decision Making: Four models are used for decision making, and they give four different results, shown as four nodes. Using four models allows their accuracies to be compared, and the best result can be chosen from them.
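Section IV.B describes KNN informally. The following minimal from-scratch sketch assumes Euclidean distance and a plain majority vote among the K neighbours (the paper does not state which distance or K it used); the two feature values per sample are hypothetical stand-ins, such as scaled src_bytes and count.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by the majority label among its k nearest training points.

    Distance is Euclidean (math.dist). If the vote ties, Counter.most_common
    returns the tied label that was encountered first, i.e. the nearer one.
    """
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, already-scaled training points and labels
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.80), (0.85, 0.95), (0.95, 0.90)]
train_y = ["normal"] * 3 + ["attack"] * 3

print(knn_predict(train_X, train_y, (0.80, 0.85), k=3))  # attack
```

Because KNN compares raw distances, features on very different scales (for example byte counts versus connection counts) would normally be scaled first; the toy values above are already in a common range.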

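Sections V.B to V.D (remove nulls, extract features, convert them to numbers) can be sketched with pandas. The column names follow the features listed in Section VI, but the records, the label column, and the choice of one-hot encoding for the categorical service feature are assumptions for illustration, not the authors' preprocessing code.

```python
import pandas as pd

# Hypothetical raw records; only the column names come from the paper.
raw = pd.DataFrame({
    "duration":  [0, 0, 0, 0, None],
    "service":   ["http", "private", "http", "ecr_i", "private"],
    "src_bytes": [215, 0, 181, 1032, 0],
    "count":     [1, 120, 2, 511, 130],
    "srv_count": [1, 5, 2, 511, 6],
    "label":     ["normal", "attack", "normal", "attack", "attack"],  # assumed
})


def preprocess(df: pd.DataFrame, features: list):
    """Drop rows with nulls, keep the chosen features, and make them numeric."""
    df = df.dropna()  # remove every row that contains a null value
    # One-hot encode the categorical 'service' column; numeric columns pass through
    X = pd.get_dummies(df[features], columns=["service"], dtype=int)
    y = (df["label"] == "attack").astype(int)  # 1 = attack, 0 = normal (assumed)
    return X, y


X, y = preprocess(raw, ["service", "src_bytes", "count", "srv_count"])
print(X.columns.tolist())
print(y.tolist())
```

The row with a missing duration is dropped even though duration is not a selected feature, matching the description that all null values are removed before feature extraction.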
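Sections V.D and V.E (train four models, compare their results, choose the best) can be sketched with scikit-learn. This is not the authors' train.py or test.py: it is a hedged illustration on synthetic data from make_classification, with hyperparameters such as n_neighbors=5 and hidden_layer_sizes=(32,) chosen arbitrarily. The last step shows one possible reading of the majority voting mentioned in Section VI, a hard vote over the four models' predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed flow features (assumption)
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# The four classifiers used in the paper; hyperparameters are arbitrary
models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
}

predictions, scores = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    scores[name] = accuracy_score(y_test, predictions[name])
    print(f"{name}: accuracy = {scores[name]:.4f}")

# Choose the model with the highest test accuracy
best = max(scores, key=scores.get)
print("best model:", best)
print(classification_report(y_test, predictions[best]))

# Hard majority vote: predict 1 only if more than half of the models say 1,
# so 2-2 ties go to class 0
votes = np.vstack([predictions[name] for name in models])
majority = (votes.sum(axis=0) * 2 > len(models)).astype(int)
print(f"majority vote: accuracy = {accuracy_score(y_test, majority):.4f}")
```

Keeping each model's predictions makes it easy to compare single models against the vote; which of the two the paper actually reports is not stated, so both are shown.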
VI. RESULT AND ANALYSIS
This section presents the raw dataset collection, the generated KD99 dataset. Data preprocessing is carried out with the extraction of important features as its first step, and the preprocessed data is then used to train and test the ML models. Features such as service, src_bytes, count, and srv_count are extracted. To examine the covariance between the extracted features, a covariance heatmap is plotted. The next step updates the feature selection to reduce overfitting and improve accuracy. The models are then trained and tested with the ML algorithms, their scores are analysed, and the accuracy of the algorithms is compared. In this study, the classification_report function is used as a performance evaluation tool that shows the precision, recall, F1 score, and support of each trained model. Four classifiers are used and the accuracy score of each model is calculated; the better model is then decided by a majority voting method. The sklearn.metrics module is imported to compute the following performance metrics.

Accuracy: The accuracy score measures the performance of the model as the ratio of the sum of True Positives (TP) and True Negatives (TN) to all positive and negative observations.

$$\text{Accuracy Score} = \frac{TP + TN}{TP + FN + TN + FP} \qquad (1)$$

F1 Score: The F1 score is the harmonic mean of precision and recall.

$$\text{F1 score} = \frac{2 \cdot P \cdot R}{P + R} \qquad (2)$$

where P is precision and R is recall.

Precision (P): Precision is a useful measure for imbalanced datasets. It is the ratio of true positives to the sum of true positives and False Positives (FP).

$$\text{Precision Score} = \frac{TP}{FP + TP} \qquad (3)$$

Recall (R): Recall is the ratio of true positives to the sum of true positives and False Negatives (FN).

$$\text{Recall Score} = \frac{TP}{FN + TP} \qquad (4)$$

A. TCP SYN flooding attack
In this design, the dataset, a .csv file, is read. All null values are removed from the data and the important features are extracted. A covariance heatmap is plotted to find the covariance between the features.

Fig. 4. Covariance Heatmap of TCP_SYN attack.

Fig. 4 shows the covariance heatmap for the TCP flood attack. According to this map the data is highly uncorrelated, as most features, such as duration, are single-valued.

TABLE I. CLASSIFICATION REPORT FOR TCP

Model  Report        Precision  Recall  F1-score  Support
LR     0             0.00       0.00    0.00      3
       1             1.00       1.00    1.00      5466
       Accuracy                         1.00      5469
       Macro avg     0.50       0.50    0.50      5469
       Weighted avg  1.00       1.00    1.00      5469
KNN    0             1.00       0.67    0.80      3
       1             1.00       1.00    1.00      5466
       Accuracy                         1.00      5469
       Macro avg     1.00       0.83    0.90      5469
       Weighted avg  1.00       1.00    1.00      5469
MLP    0             0.00       0.00    0.00      3
       1             1.00       1.00    1.00      5466
       Accuracy                         1.00      5469
       Macro avg     0.50       0.50    0.50      5469
       Weighted avg  1.00       1.00    1.00      5469
DT     0             1.00       0.67    0.80      3
       1             1.00       1.00    1.00      5466
       Accuracy                         1.00      5469
       Macro avg     1.00       0.83    0.90      5469
       Weighted avg  1.00       1.00    1.00      5469

Table I shows the performance metrics measured (precision, recall, F1-score, and support) for all four algorithms in TCP flood attack prediction.
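Equations (1) to (4) can be checked against Table I. For the KNN model, taking class 0 as the positive class, a support of 3 with recall 0.67 and precision 1.00 implies TP = 2, FN = 1, and FP = 0, which leaves TN = 5466. These counts are inferred from the rounded table values rather than reported by the authors; the plain-Python sketch below recomputes the metrics from them.

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 as defined in Eqs. (1) to (4).

    A zero denominator yields 0.0, the same value scikit-learn reports by
    default for an ill-defined precision or recall.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)        # Eq. (1)
    precision = tp / (fp + tp) if fp + tp else 0.0     # Eq. (3)
    recall = tp / (fn + tp) if fn + tp else 0.0        # Eq. (4)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)              # Eq. (2)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts inferred from the KNN row of Table I, with class 0 as "positive"
m = metrics(tp=2, tn=5466, fp=0, fn=1)
print({k: round(v, 4) for k, v in m.items()})
# {'accuracy': 0.9998, 'precision': 1.0, 'recall': 0.6667, 'f1': 0.8}
```

The accuracy of 0.9998 matches the 99.98% reported for KNN on the TCP data. For class 0 of LR and MLP, Table I shows a recall of 0.00, so TP = 0 and the precision is 0.00 whether or not the denominator is zero.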

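Table I also explains why the macro and weighted averages for LR and MLP differ so sharply (0.50 versus 1.00). The macro average gives each class equal weight, while the weighted average weights each class by its support, so the 3-sample class 0 barely moves the weighted figure. A minimal sketch using the rounded per-class F1 values of the LR row:

```python
def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values, supports):
    """Mean over classes, weighted by each class's support."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Per-class F1 of the LR model in Table I (class 0, class 1) and their supports
f1_per_class = [0.00, 1.00]
support = [3, 5466]
print(round(macro_avg(f1_per_class), 2),
      round(weighted_avg(f1_per_class, support), 2))  # 0.5 1.0
```

On a dataset this imbalanced, the macro average is the figure that reveals that LR and MLP fail completely on the minority class.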
Fig. 5. Accuracy score of TCP_SYN flood attack.

Fig. 5 shows the accuracy of the four ML models for the TCP flood attack. KNN and Decision Tree show the best accuracy, 99.98% each, compared with the other algorithms.

B. UDP flooding attack

Fig. 6. Covariance Heatmap of UDP attack.

Fig. 6 shows the covariance between the features for the UDP attack. According to this map the data is highly uncorrelated, as most features, such as duration, are single-valued.

TABLE II. CLASSIFICATION REPORT FOR UDP

Model  Report        Precision  Recall  F1-score  Support
LR     0             0.92       0.57    0.71      4857
       1             0.58       0.92    0.72      3154
       Accuracy                         0.71      8011
       Macro avg     0.75       0.75    0.71      8011
       Weighted avg  0.79       0.71    0.71      8011
KNN    0             0.82       0.73    0.77      4857
       1             0.65       0.75    0.69      3154
       Accuracy                         0.74      8011
       Macro avg     0.73       0.74    0.73      8011
       Weighted avg  0.75       0.74    0.74      8011
MLP    0             0.79       0.80    0.79      4857
       1             0.68       0.66    0.67      3154
       Accuracy                         0.75      8011
       Macro avg     0.73       0.73    0.73      8011
       Weighted avg  0.75       0.75    0.75      8011
DT     0             0.83       0.79    0.81      4857
       1             0.70       0.75    0.72      3154
       Accuracy                         0.77      8011
       Macro avg     0.76       0.77    0.76      8011
       Weighted avg  0.78       0.77    0.77      8011

Table II compares the performance metrics measured (precision, recall, F1-score, and support) for all four algorithms in UDP flood attack prediction.

Fig. 7. Accuracy score of UDP flood attack.

Fig. 7 shows the accuracy of the four ML models for the UDP flooding attack. Decision Tree shows the best accuracy score, 77.23%, compared with the other algorithms.

C. ICMP flooding attack

Fig. 8. Covariance Heatmap of ICMP attack.

Fig. 8 shows the covariance heatmap used to find the covariance between the features for the ICMP attack. The data is highly uncorrelated, as most features, such as duration, are single-valued.
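The heatmap observation for Figs. 4, 6, and 8, that features such as duration are single-valued, can be checked numerically. A constant column has zero variance, so its covariance with every other feature is zero, and its correlation coefficient is undefined (NaN) because computing it would divide by a zero standard deviation. The sketch below uses NumPy on invented values; plotting the resulting matrix (for example with matplotlib's imshow) would give a heatmap, but that step is omitted here.

```python
import numpy as np

# Hypothetical feature matrix; columns are duration, src_bytes, count.
# duration is constant (single-valued), as described for the heatmaps.
X = np.array([
    [0, 215, 1],
    [0, 0, 120],
    [0, 181, 2],
    [0, 1032, 511],
], dtype=float)

cov = np.cov(X, rowvar=False)  # 3x3 covariance matrix between the features
print(cov[0])                  # duration row: all zeros

# Correlation divides by each column's standard deviation, which is 0 for
# duration, so NumPy's divide warnings are silenced and NaNs are expected.
with np.errstate(divide="ignore", invalid="ignore"):
    corr = np.corrcoef(X, rowvar=False)
print(np.isnan(corr[0]).all())  # True: correlation with a constant is undefined
```

A single-valued feature therefore carries no information for the classifiers, which is one reason the feature selection step in Section VI can drop it.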

TABLE III. CLASSIFICATION REPORT FOR ICMP

Model  Report        Precision  Recall  F1-score  Support
LR     0             0.95       0.77    0.85      106
       1             1.00       1.00    1.00      49385
       Accuracy                         1.00      49491
       Macro avg     0.98       0.89    0.93      49491
       Weighted avg  1.00       1.00    1.00      49491
KNN    0             0.99       0.98    0.99      106
       1             1.00       1.00    1.00      49385
       Accuracy                         1.00      49491
       Macro avg     1.00       0.99    0.99      49491
       Weighted avg  1.00       1.00    1.00      49491
MLP    0             0.45       0.91    0.60      106
       1             1.00       1.00    1.00      49385
       Accuracy                         1.00      49491
       Macro avg     0.73       0.95    0.80      49491
       Weighted avg  1.00       1.00    1.00      49491
DT     0             0.99       1.00    1.00      106
       1             1.00       1.00    1.00      49385
       Accuracy                         1.00      49491
       Macro avg     1.00       1.00    1.00      49491
       Weighted avg  1.00       1.00    1.00      49491

Table III shows the accuracy of each model and the other performance metrics: precision, recall, F1-score, and support.

Fig. 9. Accuracy score of ICMP flood attack.

Fig. 9 shows the accuracy of the four ML models for the ICMP flooding attack. KNN and Decision Tree show the best accuracy score, 99.99% each, compared with the other algorithms.

CONCLUSION
Machine Learning based techniques are found to be a better way of predicting DDoS attacks. ML techniques can identify cyber-attacks more easily and faster than traditional methods, which require more time and a lot of manual work. The accuracy achieved by traditional methods is also somewhat lower. The four ML algorithms used in this proposal require less training than other methods. From the results we can choose the two best ML models for detecting the DDoS flooding attacks studied here with higher accuracy and in less time. The performance results show that for TCP and ICMP flooding attacks, the KNN and DecisionTree algorithms show the best accuracy, 99.98 percent for both models. For UDP, the DecisionTree model shows the best accuracy, 77.23 percent.

ACKNOWLEDGMENT
I would like to thank everyone who has supported me throughout the completion of this work.

REFERENCES
[1] S. Sumathi and R. Rajesh, "Comparative Study on TCP SYN Flood DDoS Attack Detection: A Machine Learning Algorithm Based Approach," WSEAS Transactions on Systems and Control, vol. 16, pp. 584-591, 2021.
[2] Sumantra and S. Indira Gandhi, "DDoS Flooding Attack Mitigation in Software Defined Networks," International Conference on System, Computation, Automation and Networking (ICSCAN), IEEE, pp. 1-5, 2020.
[3] Abimbola O. Sangodoyin, Mobayode O. Akinsolu, Prashant Pillai, and Vic Grout, "Detection and Classification of DDoS Flooding Attacks on Software-Defined Networks," IEEE Access, vol. 9, pp. 122495-122508, 2021, doi: 10.1109/ACCESS.2021.3109490.
[4] Ahamed Aljuhani, "Machine Learning Approaches for Combating Distributed Denial of Service Attacks in Modern Networking Environments," IEEE Access, pp. 42236-42264, 2021, doi: 10.1109/ACCESS.2021.3062909.
[5] Ramamoorthi, T. Subbulakshmi, and S. Mercy Shalinie, "Real Time Detection and Classification of DDoS Attacks using Enhanced SVM with String Kernels," IEEE International Conference on Recent Trends in Information Technology (ICRTIT), IEEE, pp. 91-96, 2011.
[6] Babatunde Hafis Lawal and Nuray At, "Real-Time Detection and Mitigation of Distributed Denial of Service (DDoS) Attacks in Software Defined Networking (SDN)," 26th Signal Processing and Communications Applications Conference (SIU), IEEE, pp. 1-4, 2018.
[7] I Gde Dharma N., M. Fiqri Muthohar, Alvin Prayuda J. D., Priagung K., and Deokjai Choi, "Time-based DDoS Detection and Mitigation for SDN Controller," 17th Asia-Pacific Network Operations and Management Symposium (APNOMS), IEEE, pp. 550-553, 2015, doi: 10.1109/APNOMS.2015.7275389.
[8] Kshira Sagar Sahoo, Amaan Iqbal, Prasenjit Maiti, and Bibhudatta Sahoo, "A Machine Learning Approach for Predicting DDoS Traffic in Software Defined Networks," International Conference on Information Technology (ICIT), IEEE, pp. 199-203, 2018, doi: 10.1109/ICIT.2018.00049.
[9] Josy Elsa Varghese and Balachandra Muniyal, "An Efficient IDS Framework for DDoS Attacks in SDN Environment," IEEE Access, vol. 9, pp. 69682-69699, 2021, doi: 10.1109/ACCESS.2021.3078065.
[10] S. Shanmuga Priya, M. Sivaram, D. Yuvaraj, and A. Jayanthiladevi, "Machine Learning based DDOS Detection," International Conference on Emerging Smart Computing and Informatics (ESCI), IEEE, pp. 234-237, 2020, doi: 10.1109/ESCI48226.2020.9167642.
[11] Berkay Ozcam, H. Hakan Kilinc, and Abdul Halim Zaim, "Detecting TCP Flood DDoS Attack by Anomaly Detection based on Machine Learning Algorithms," 6th International Conference on Computer Science and Engineering (UBMK), IEEE, pp. 512-516, 2021, doi: 10.1109/UBMK52708.2021.9558989.
