Volume 9, Issue 6, June – 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24JUN659
Intrusion Detection System with Ensemble Machine Learning Approaches using Voting Classifier

Karuna G. Bagde (Research Scholar)
Department of Computer Science & Engg.
Sant Gadge Baba Amravati University
Amravati, India

Atul D. Raut
Department of Computer Science & Engg.
P. R. Pote Patil College of Engg & Mgmt
Amravati, India
Abstract:- The Internet has become part of everyday life thanks to advances in electronics and signal-processing technologies over the past decades. Its tremendous growth has also increased network threats, and firewalls and anti-virus software often fail to protect a network on their own; an Intrusion Detection System (IDS) helps to fill this gap. Ensemble methods in machine learning combine multiple classifiers to improve predictive performance, and a voting classifier combines the predictions of the individual models to reach a final decision. In this paper we build an IDS on a voting ensemble that combines decision tree, logistic regression and support vector machine classifiers. We evaluate the proposed model on the NSL-KDD dataset, where the ensemble reaches an accuracy of 99.46%.

Keywords:- Intrusion Detection System, Ensemble Algorithm, Machine Learning.

I. INTRODUCTION

The Web has changed our lives, and as it has spread, computer systems have become exposed to an increasing number of threats. Although research and technological developments in this field are advancing quickly, complete cyber security remains a challenge.

Intrusion Detection Systems (IDSs) detect attacks against a given set of computer assets, ranging from a single desktop PC to a large corporate enterprise network. Attacks are detected by looking for a predetermined set of criteria that is not present during normal daily use.

IDSs observe and analyze network traffic to identify anomalies in network behavior and potential unauthorized access. Because they are designed to monitor the network constantly, they consume resources even when no attack is taking place.

Previous Work:
The paper [1] presents a cloud-based intrusion detection model using random forest and feature engineering that achieves high accuracy in detecting abnormal activity in network traffic.

The paper [2] proposes a prediction-level fusion model for intrusion detection and classification using machine learning techniques.

The papers [3] and [4] propose a combination of ant colony optimization and the firefly approach for feature selection in intrusion detection, using machine learning algorithms such as AdaBoost, gradient boost and Bayesian network. Gradient boost performs best in recognizing and classifying intrusions.

The paper [5] explores the use of machine learning algorithms for intrusion detection systems, focusing specifically on dataset selection, machine learning algorithms and performance metrics.

The paper [6] discusses the development of an Intrusion Detection System (IDS) that uses machine learning techniques such as Support Vector Machines, Random Forest and K-Nearest Neighbor to automatically identify attacks on complex networks and systems.

II. DATASET

NSL-KDD Dataset:
The NSL-KDD data set is an improved version of the KDD'99 intrusion data set. The data were captured from an evaluation test bed that included large numbers of virtual hosts and user automata. NSL-KDD is a randomly selected subset of KDD'99 obtained after redundant records were removed, and it is widely used as a benchmark for evaluating anomaly detection techniques. The dataset captures TCP, UDP and Internet Control Message Protocol (ICMP) traffic collected with the tcpdump utility. It contains four types of intrusion attacks, DoS, U2R, R2L and Probe, described in Table 1.
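A minimal loading and preprocessing sketch for the dataset just described follows. It is not the authors' code and rests on assumptions: it assumes the commonly distributed NSL-KDD text files (for example KDDTrain+.txt), where each comma-separated line without a header holds 41 features, then the attack label, then a difficulty score, and that columns 1 to 3 (protocol_type, service, flag) are the categorical ones. The names `read_nslkdd`, `one_hot`, `N_FEATURES` and `CATEGORICAL_COLS` are hypothetical.

```python
import csv

# Assumed layout of an NSL-KDD "+" file such as KDDTrain+.txt (no header row):
# 41 feature columns, then the attack label, then a difficulty score.
N_FEATURES = 41
CATEGORICAL_COLS = (1, 2, 3)  # protocol_type, service, flag


def read_nslkdd(lines):
    """Yield (features, label) pairs; categorical fields stay as strings."""
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        feats = [v if i in CATEGORICAL_COLS else float(v)
                 for i, v in enumerate(row[:N_FEATURES])]
        yield feats, row[N_FEATURES]  # the difficulty score is ignored


def one_hot(records, vocab=None):
    """Expand the categorical columns into 0/1 indicator columns.

    Pass the vocab built from the training file when encoding the test file,
    so both matrices get the same columns; unseen values become all zeros.
    """
    records = list(records)
    if vocab is None:
        vocab = {c: sorted({f[c] for f, _ in records}) for c in CATEGORICAL_COLS}
    X, labels = [], []
    for feats, label in records:
        row = []
        for i, v in enumerate(feats):
            if i in CATEGORICAL_COLS:
                row.extend(1.0 if v == cat else 0.0 for cat in vocab[i])
            else:
                row.append(v)
        X.append(row)
        labels.append(label)
    return X, labels, vocab
```

Because `csv.reader` accepts any iterable of strings, an open file handle such as `open("KDDTrain+.txt", newline="")` (a hypothetical local path) can be passed directly to `read_nslkdd`.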
IJISRT24JUN659 www.ijisrt.com 2690
Table 1 NSL-KDD Data Set
Type      Intrusion attacks
DoS       back, land, neptune, pod, smurf, teardrop, mailbomb, processtable, udpstorm, apache2, worm
U2R       buffer-overflow, loadmodule, perl, rootkit, sqlattack, xterm, ps
R2L       ftp-write, guess-passwd, imap, multihop, phf, spy, warezmaster, xlock, xsnoop, snmpguess, snmpgetattack, httptunnel, sendmail, named
Probe     ipsweep, nmap, portsweep, satan, mscan, saint
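Table 1 can be turned directly into a label-mapping step. The sketch below is an illustration, not the authors' code: it builds a reverse index from the attack names listed in Table 1 and normalizes hyphens to underscores, assuming the raw NSL-KDD label column spells multi-word names that way (for example buffer_overflow). Labels that Table 1 does not list are returned as None. `TABLE_1`, `categorize` and `is_attack` are hypothetical names.

```python
# Attack names per category, copied from Table 1 (hyphens as written there).
TABLE_1 = {
    "DoS": "back land neptune pod smurf teardrop mailbomb processtable udpstorm apache2 worm",
    "U2R": "buffer-overflow loadmodule perl rootkit sqlattack xterm ps",
    "R2L": "ftp-write guess-passwd imap multihop phf spy warezmaster xlock xsnoop "
           "snmpguess snmpgetattack httptunnel sendmail named",
    "Probe": "ipsweep nmap portsweep satan mscan saint",
}

# Reverse index: attack name -> category, using underscore spellings
# (assumed to match the raw NSL-KDD label column).
ATTACK_TO_CATEGORY = {
    name.replace("-", "_"): category
    for category, names in TABLE_1.items()
    for name in names.split()
}


def categorize(label):
    """Map a raw label to 'normal', a Table 1 category, or None if unlisted."""
    label = label.strip().lower().replace("-", "_")
    if label == "normal":
        return "normal"
    return ATTACK_TO_CATEGORY.get(label)


def is_attack(label):
    """Binary target: 1 for any non-normal record, 0 for normal traffic."""
    return 0 if label.strip().lower() == "normal" else 1
```

The binary target does not depend on Table 1 at all, so records whose attack name is missing from the table still count as attacks in a normal-versus-attack setting.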
III. EXISTING SYSTEM

Available Methods:

Support Vector Machines (SVM):
Support Vector Machine (SVM) is a classification algorithm that finds the hyperplane maximizing the margin between the distinct classes in the dataset. It handles both linear and non-linear data through kernel functions, such as linear, radial basis function (RBF), polynomial or sigmoid, which map the data into a higher-dimensional space. Because SVMs are computationally expensive, feature reduction methods such as Principal Component Analysis are often needed to improve efficiency. Several hyperparameters must be tuned when using an SVM, including the kernel type (for instance linear, rbf, poly or sigmoid) and the regularization parameter C.

Decision Trees (DT):
Decision Trees (DTs) are tree-structured models that iteratively partition the dataset on individual features to derive decision criteria. They can handle both classification and regression tasks. Decision trees tend to overfit the training data, which motivates the use of ensemble methods to alleviate this drawback.

Logistic Regression (LR):
Logistic Regression (LR) is a simple yet efficient classification method. It predicts the probability of a binary outcome from the input features. LR is known for its interpretability and performs well when the relationship between the predictors and the log-odds of the response variable is close to linear.

Proposed System:
In this work, an ensemble algorithm based on decision tree, support vector machine and logistic regression is used to improve the effectiveness of the intrusion detection system. Ensemble methods work best when the predictors are as independent of one another as possible, and one way to obtain diverse classifiers is to train them with very different algorithms. This increases the chance that they make different kinds of errors, which improves the ensemble's accuracy. The NSL-KDD data set is used to evaluate the algorithm.

The proposed intrusion detection system uses an ensemble method that combines the best available algorithms. Ensemble learning is a powerful technique that combines multiple machine learning models to create a stronger, more robust predictor.

Proposed Algorithm 1: Intrusion Detection Model using Ensemble Method
Input: Dataset
Output: Model for Intrusion Detection
Take the dataset.
Preprocess the data.
Select features: Cc = find the correlation of the data components and select those with high correlation values.
Train Logistic Regression, Decision Tree and SVM classifiers on Cc using the training data.
Combine the three classifiers with the ensemble voting algorithm to form the proposed ensemble model.
Test the proposed ensemble model on the test data.
Compute the accuracy, precision and recall.
Return the model.

Performance Analysis:
True Positive (TP). A true positive is an outcome where the model correctly predicts the positive class.
False Positive (FP). A false positive is an outcome where the model predicts the positive class but the actual class is negative.
True Negative (TN). A true negative is an outcome where the model correctly predicts the negative class.
False Negative (FN). A false negative is an outcome where the model predicts the negative class but the actual class is positive.

Accuracy.
Accuracy is the proportion of all predictions that are correct: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$.

Precision.
Precision is the proportion of predicted positives that are actually positive: $\text{Precision} = \frac{TP}{TP + FP}$.
Recall.
Recall is the proportion of actual positives that the model identifies correctly: $\text{Recall} = \frac{TP}{TP + FN}$.

F1-Score:
The F1 score is often a better metric than accuracy, especially when the classes are imbalanced, because it balances precision and recall. It is their harmonic mean:

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Fig 2 Performance Comparison with Previous Works
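Algorithm 1 and the metrics defined above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' implementation: the correlation step is read as ranking features by absolute Pearson correlation with a 0/1 normal-versus-attack label and keeping the top k, with k, the train/test split and all hyperparameters chosen arbitrarily. LR and SVM are wrapped with feature standardization, and the three models are combined by hard (majority) voting through scikit-learn's `VotingClassifier`. The function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def select_by_label_correlation(X, y, k):
    """Indices of the k columns of X with the largest |Pearson r| against y.

    Assumed reading of the "find correlation ... select high correlation
    values" step; zero-variance columns are given a correlation of 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, (Xc * yc[:, None]).sum(axis=0) / denom, 0.0)
    return np.argsort(-np.abs(r))[:k]


def build_voting_ensemble():
    """Hard-voting ensemble of LR, DT and SVM (hyperparameters illustrative)."""
    return VotingClassifier(
        estimators=[
            ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
            ("dt", DecisionTreeClassifier(random_state=0)),
            ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))),
        ],
        voting="hard",  # each model casts one vote; the majority label wins
    )


def run_algorithm_1(X, y, k=10, test_size=0.3, seed=0):
    """Split, select features on the training part only, fit, and score."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    cols = select_by_label_correlation(X_tr, y_tr, k)
    model = build_voting_ensemble().fit(X_tr[:, cols], y_tr)
    y_pred = model.predict(X_te[:, cols])
    scores = {
        "accuracy": accuracy_score(y_te, y_pred),
        "precision": precision_score(y_te, y_pred),  # TP / (TP + FP), attack = 1
        "recall": recall_score(y_te, y_pred),        # TP / (TP + FN)
        "f1": f1_score(y_te, y_pred),
    }
    return model, cols, scores
```

Selecting features on the training split only keeps test labels out of the selection step. With three voters and two classes a hard vote cannot tie; switching to `voting="soft"` would require probability estimates, i.e. `SVC(probability=True)`.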
IV. RESULTS

Fig 1 Accuracy Result of Various Classifiers

The algorithms were run on a machine with an Intel® Core™ i5-5200U CPU @ 2.20GHz × 4 and 7.7 GiB of memory, running Ubuntu 20.04.6 LTS.

The existing classifiers, Logistic Regression, Support Vector Machine and Decision Tree, were trained and tested on the NSL-KDD dataset; the results are shown in Fig 1. The proposed ensemble algorithm achieved an accuracy of 99.46%, followed by Decision Tree (99.44%), SVM (99.39%) and Logistic Regression (94.41%). Thus, our ensemble model achieved the best result among the classifiers tested.

The proposed ensemble model shows promising results in comparison with the SVM+RF, IntruDTree and PCA-FELM techniques (Table 2). The precision of the proposed model is 99.64% and its recall is 99.1%, which compare well with existing methods.

Table 2 Comparison with Existing Work
Model        Accuracy   Precision   Recall
Proposed     0.9946     0.9964      0.991
SVM+RF       0.675      0.636       0.426
IntruDTree   0.98       0.98        0.98
PCA-FELM     0.998      0.92

V. CONCLUSIONS

In this study, we proposed an ensemble intrusion detection model for detecting widely known attacks in networks. Our model uses correlation methods to select the best features and then applies an ensemble of classification algorithms, namely Decision Tree, Logistic Regression and SVM, to improve the accuracy rate. Results show our model's better performance on the NSL-KDD dataset in comparison to existing methods.

REFERENCES
[1] H. Attou, A. Guezzaz, S. Benkirane, M. Azrour, Y. Farhaoui (2023), "Cloud-Based Intrusion Detection Approach Using Machine Learning Techniques", Big Data Mining and Analytics, doi: 10.26599/bdma.2022.9020038
[2] R. Boraiah (2023), "Network intrusion detection and classification using machine learning predictions fusion", Indonesian Journal of Electrical Engineering and Computer Science, doi: 10.11591/ijeecs.v31.i2.pp1147-1153
[3] M. Paricherla, M. Ritonga, S. R. Shinde, S. M. Chaudhari, R. Linur, A. Raghuvanshi (2023), "Machine learning techniques for accurate classification and detection of intrusions in computer network", Bulletin of Electrical Engineering and Informatics, doi: 10.11591/beei.v12i4.4708
[4] "Machine learning techniques for accurate classification and detection of intrusions in computer network", Bulletin of Electrical Engineering and Informatics, doi: 10.11591/eei.v12i4.4708
[5] P. Dini, A. Elhanashi, A. Begni, S. Saponara, Q. Zheng, K. Gasmi (2023), "Overview on Intrusion Detection Systems Design Exploiting Machine Learning for Networking Cybersecurity", Applied Sciences, doi: 10.3390/app13137507
[6] Ch. Sai Sampath, P. Anuradha (2023), "Intrusion Detection using Machine Learning: A Random Forest-based Approach", International Journal For Multidisciplinary Research, doi: 10.36948/ijfmr.2023.v05i03.3408
[7] D. Xuan, H. Hu, B. Wang, B. Liu (2021), "Intrusion Detection System Based on RF-SVM Model Optimized with Feature Selection", 2021 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI), Beijing, China, pp. 1-5, doi: 10.1109/CCCI52664.2021.9583206
[8] I. H. Sarker, Y. B. Abushark, F. Alsolami, A. I. Khan (2020), "IntruDTree: A Machine Learning Based Cyber Security Intrusion Detection Model", Symmetry, 12, 754, doi: 10.3390/sym12050754
[9] E. Vishnu Balan, M. K. Priyan, C. Gokulnath, G. Usha Devi (2015), "Fuzzy Based Intrusion Detection Systems in MANET", Procedia Computer Science, vol. 50, pp. 109-114, ISSN 1877-0509, doi: 10.1016/j.procs.2015.04.071