Comparison of Single and Ensemble Intrusion Detection Techniques Using Multiple Datasets
and decreased efficiency. As a result of this, in order to improve IDS detection accuracy, selected features from a dataset should be extracted prior to using any detection approach. A preprocessing technique known as feature selection has been shown to be a suitable solution for an IDS [6,7]. It discovers highly important features and removes unnecessary ones. Motivated by the discussion above, our study focuses on the performance of various machine learning techniques used in detection classification systems when applied to four publicly available recent datasets.

The contributions of this work are as follows:
- We present an overview of intrusion detection systems that employ machine learning techniques.
- A feature extraction technique known as information gain was employed to extract the best features and to manage the large amount of irrelevant features in the datasets.
- Five algorithms were evaluated, spanning both individual and ensemble classifiers.
- We suggested a novel approach for intrusion detection that combines the benefits of feature selection with single and ensemble classifiers.
- We studied the performance of our approach and of each analyzed classifier using four real traffic datasets, and a comprehensive comparison was done.

The rest of the paper is organized as follows: Section II focuses on some of the major related works in the area of intrusion detection. Section III describes the experimentation procedure, tools and methodology used in the different steps of the evaluation. Our ensemble model is described in Section IV, and Section V discusses the results of the experiments. The conclusion and future work are presented in Sections VI and VII, respectively.

2. RELATED RESEARCH WORK

Intruders update themselves and the tools they use to develop new cyber-attacks on a daily basis. Due to this, intrusion detection techniques are being designed at a rapid pace to ensure that network systems are effectively secured against newly developed malware. Numerous studies have been conducted for this reason, and new ones are conducted daily to improve the efficacy of IDS systems. A study conducted in [8] concludes that datasets representing exact network systems are becoming increasingly important for evaluating intrusion detection algorithms.

As a result, Mahoney et al. [9] studied and discovered that the DARPA/MIT Lincoln Laboratory evaluation dataset results in an overly optimistic detection of network abnormality. Additionally, the authors proposed that this issue could be avoided by combining real-world traffic with a simulated dataset. Later, an approach using random forest for misuse-based, anomaly-based and hybrid IDSs was presented by the authors in [10]. Numerous machine learning techniques with improved accuracy have been developed over the past few years; a hybrid approach suggested in [11], which combines K-means clustering with the radial basis function (RBF) kernel of a support vector machine (SVM), is an example of such evolution.

In addition to these advancements, various performance comparisons of these intrusion detection systems have been conducted. Belavagi et al. [12] used the NSL-KDD dataset to evaluate Logistic Regression, Gaussian Naive Bayes, Support Vector Machine, and Random Forest techniques. According to the authors, the Random Forest classifier outperforms the other three algorithms (see Table 1 below).

Table 1
ALGORITHM               PRECISION (%)   ACCURACY (%)
Gaussian Naïve Bayes    79              79
Logistic Regression     83              84
Random Forest           76              75
Support Vector Machine  99              99

Additionally, Almseidin et al. [13] studied the Random Forest, Random Tree, Bayes Network, Naïve Bayes, Decision Table, MLP and J48 machine learning algorithms in 2017. On the KDD dataset, the decision tree had the lowest false negative value (0.002); however, random forest outperformed it in terms of accuracy (see Table 2 below).

Table 2
ALGORITHM       PRECISION (%)   ACCURACY (%)
Bayes Network   99.2            90.7
Decision Table  94.4            92.4
J48             98.9            93.1
MLP             97.8            91.9
Naïve Bayes     98.8            91.2
Random Forest   99.1            93.7
Random Tree     99.2            90.5

Likewise, Zaman et al. [14] conducted experiments to compare the precision, accuracy, and recall of Fuzzy C-Means, Radial Basis Function, k-Nearest Neighbors, Support Vector Machine, k-Means, Naïve Bayes, and an ensemble technique combining all six algorithms. The Kyoto+ dataset was used to evaluate these algorithms, and it was determined that Radial Basis Function outperformed the others. See the table below.
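The figures that follow report information gain values for the attributes of each dataset. As a minimal sketch of how such a ranking can be produced (the paper does not state its tooling here, so scikit-learn's mutual_info_classif is assumed as the scorer, and the CSV path and label column are placeholders):

```python
# Hedged sketch: ranking attributes by an information-gain style score.
# mutual_info_classif is an assumption; "dataset.csv" and "Label" are placeholders.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

df = pd.read_csv("dataset.csv")                          # placeholder path
X = df.drop(columns=["Label"]).select_dtypes("number")   # numeric attributes only
y = df["Label"]

scores = mutual_info_classif(X, y, random_state=0)
ranking = pd.Series(scores, index=X.columns).sort_values(ascending=False)
print(ranking.head(20))                                  # highest-information attributes
```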
[Figure 1: CIC2017 attributes (information gain values of attributes for the CICIDS2017 dataset)]

[Figure: information gain attribute values for the NSL-KDD dataset]

3.3 Evaluation of Algorithms

For evaluation purposes, this work considered the LR, SGD, LGBM, XGBoost, and DNN algorithms.

A. Logistic Regression
Logistic Regression is a supervised machine learning method used to classify data; it can be used with categorical dependent variables. The algorithm has gained importance in recent years, and its application has grown tremendously. The objective of logistic regression is to assign data to their appropriate classes based on their correlation. For a mathematical expression of logistic regression, consider the simple linear regression equation below:

y = b_0 + b_1 x    (2)

Applying the sigmoid function to the above equation gives:

p = 1 / (1 + e^{-y})    (3)

The logistic regression formula can then be derived by substituting eq. (2) into eq. (3) to give:

p = 1 / (1 + e^{-(b_0 + b_1 x)})    (4)
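As a minimal numeric illustration of eqs. (2)-(4), the following sketch passes a linear score through the sigmoid to obtain class probabilities; the coefficients and inputs are arbitrary values chosen for illustration, not parameters from the paper:

```python
# Illustration of eqs. (2)-(4): a linear score mapped through the sigmoid
# yields the class probability used by logistic regression.
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))      # eq. (3)

b0, b1 = -1.5, 0.8                       # illustrative parameters, not fitted values
x = np.array([0.0, 1.0, 2.0, 5.0])       # example feature values
y = b0 + b1 * x                          # eq. (2): linear score
p = sigmoid(y)                           # eq. (4): P(class = 1 | x)
print(np.round(p, 3))                    # [0.182 0.332 0.525 0.924]
```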
B. Stochastic Gradient Descent
SGD is an optimization technique used to calculate the model parameters that best fit the expected and actual outputs. It is a variant of gradient descent that addresses the issue of high computational time. In SGD, the gradient of a single, randomly selected training instance is used to update the parameters at each iteration:

w = w - η ∇Q(w, x_i, y_i)    (5)

where w = the model weights, (x_i, y_i) = a given data instance, η = the learning rate, and ∇Q = the gradient of the objective, which approximates the true gradient.
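A minimal sketch of the per-sample update in eq. (5), assuming a logistic loss and synthetic data (the learning rate, data, and number of epochs are illustrative, not the paper's settings):

```python
# Per-sample SGD update of eq. (5): at each step one (x_i, y_i) pair is
# drawn and the weights move against that sample's loss gradient.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # 200 samples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)

w = np.zeros(3)
eta = 0.1                                              # learning rate
for epoch in range(5):
    for i in rng.permutation(len(X)):
        p = 1.0 / (1.0 + np.exp(-(X[i] @ w)))          # predicted probability
        grad = (p - y[i]) * X[i]                       # gradient of the i-th sample's logistic loss
        w -= eta * grad                                # eq. (5): w = w - eta * grad
print(np.round(w, 2))                                  # roughly aligned with true_w (up to scale)
```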
[Figure 4: IoTID18 attributes (information gain values for attributes of IoT traffic 2018)]

C. Light Gradient Boosting Machine
LGBM is a high-performance gradient boosting framework that grows trees leaf-wise based on the best fit. Thus, when growing on the same leaf, the leaf-wise approach in LGBM can reduce loss significantly more than other existing boosting techniques. A diagrammatic explanation is given in the figure below.
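As a complementary, hedged illustration of such a leaf-wise learner, the sketch below trains an LGBMClassifier on synthetic data; the dataset and hyperparameters are placeholders rather than the paper's configuration:

```python
# Hedged sketch of a leaf-wise LightGBM classifier; data and settings are illustrative.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

clf = LGBMClassifier(
    n_estimators=200,
    num_leaves=31,        # caps how far each tree can grow leaf-wise
    learning_rate=0.1,
)
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```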
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (7)

B. Precision
It is the ratio of accurate positive results to the number of positive results predicted by the algorithm.

Precision = TP / (TP + FP)    (8)

C. Recall
It is calculated by dividing the number of accurate positive results by the total number of relevant samples.

Recall = TP / (TP + FN)    (9)

D. F-score
It is the harmonic mean of precision and recall.

[Figure 6: Framework of the Proposed Stacking Ensemble]

An advantage of using this approach is that the meta-classifier in the second stage can rectify the shortcomings of any or all of the base classifiers in the first stage. Since the objective is to obtain significantly better outcomes, our ensemble ...
... datasets, we chose to combine them with SGD and DNN as base classifiers to construct a stacking ensemble classifier, with Logistic Regression as the meta-classifier. In order to evaluate the performance of our proposed stacking ensemble method, we compare all the algorithms used in this work in terms of accuracy and F-score, as shown in Figures 8(a)-(b) below.

Table 4

Table 5

... observed that LGBM and XGBoost had the best performance, having 99.5% and 99.8% accuracy, 98.5% and 99.5% precision, 97.7% and 99.0% recall, and 98.0% and 99.2% F-score, respectively. In contrast, SGD performed relatively poorly in terms of accuracy and precision, having 89.1% and 55.8% respectively. DNN had the weakest performance in terms of recall, with 49.1%, and LR yielded the lowest F-score of 53.3%.
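A hedged sketch of the stacking arrangement described above, assuming scikit-learn's StackingClassifier with SGD and a small multilayer perceptron (standing in for the DNN) as base learners and logistic regression as the meta-classifier; the data, architectures, and hyperparameters are illustrative, and the paper's exact implementation is not given here:

```python
# Hedged sketch of a stacking ensemble: SGD and a small neural network as
# base learners, logistic regression as the meta-classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=3000, n_features=20, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)

base_learners = [
    ("sgd", make_pipeline(StandardScaler(), SGDClassifier(max_iter=1000))),
    ("dnn", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300))),
]

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # meta-classifier
    cv=5,   # base learners feed out-of-fold predictions to the meta-classifier
)
stack.fit(X_tr, y_tr)
print("stacked accuracy:", stack.score(X_te, y_te))
```

Additional base learners (for example, the LGBM and XGBoost models discussed above) can be appended to base_learners in the same way.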
[12]. M. C. Belavagi and B. Muniyal, "Performance Evaluation of Supervised Machine Learning Algorithms for Intrusion Detection," Procedia Computer Science, vol. 89, pp. 117-123, 2016.
[13]. M. Almseidin, M. Alzubi, S. Kovacs and M. Alkasassbeh, "Evaluation of Machine Learning Algorithms for Intrusion Detection System," in 15th International Symposium on Intelligent Systems and Informatics, 2017.
[14]. M. Zaman and C. H. Lung, "Evaluation of Machine Learning Techniques for Network Intrusion Detection," in IEEE/IFIP Network Operations and Management Symposium, 2018.
[15]. S. Aljawarneh, M. Aldwairi and M. B. Yassein, "Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model," Journal of Computational Science, vol. 25, pp. 152-160, 2018.
[16]. M. A. Ferrag, L. Maglaras, S. Moschoyiannis and H. Janicke, "Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study," Journal of Information Security and Applications, vol. 50, 2020, https://doi.org/10.1016/j.jisa.2019.102419.
[17]. H. A. Afolabi and A. A. Aburas, "An Evaluation of Machine Learning Classifiers for Prediction of Attacks to Secure Green IoT Infrastructure," International Journal of Emerging Trends in Engineering Research, 9(5), May 2021, pp. 549-557, https://doi.org/10.30534/ijeter/2021/03952021.
[18]. H. A. Afolabi and A. A. Aburas, "Proposed Back Propagation Deep Neural Network for Intrusion Detection in Internet of Things Fog Computing," International Journal of Emerging Trends in Engineering Research, 9(4), April 2021, pp. 464-469, https://doi.org/10.30534/ijeter/2021/23942021.
[19]. S. R. Khonde and V. Ulagamuthalvi, "Hybrid Framework for Intrusion Detection System using Ensemble Approach," International Journal of Advanced Trends in Computer Science and Engineering, 9(4), July-August 2020, pp. 4881-4890, https://doi.org/10.30534/ijatcse/2020/99942020.
[20]. Intrusion Detection Evaluation Dataset (CICIDS2017). [Online]. Available: https://www.unb.ca/cic/datasets/ids-2017.html. [Accessed 08 04 2019].
[21]. NSL-KDD dataset. [Online]. Available: https://www.unb.ca/cic/datasets/nsl.html. [Accessed 08 04 2019].
[22]. M. Tavallaee, E. Bagheri, W. Lu and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009, https://doi.org/10.1109/CISDA.2009.5356528.
[23]. S. Revathi and A. Malathi, "A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection," International Journal of Engineering Research & Technology, vol. 2, no. 12, 2013.
[24]. Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, D. Breitenbacher, A. Shabtai and Y. Elovici, "N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders," IEEE Pervasive Computing, Special Issue - Securing the IoT, July/September 2018.
[25]. Y. Mirsky, T. Doitshman, Y. Elovici and A. Shabtai, "Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection," in Network and Distributed System Security (NDSS) Symposium, San Diego, CA, USA, 2018.
[26]. I. Ullah and Q. H. Mahmoud, "A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Networks," in C. Goutte and X. Zhu (eds), Advances in Artificial Intelligence, Canadian AI 2020, Lecture Notes in Computer Science, vol. 12109, Springer, Cham, 2020, https://doi.org/10.1007/978-3-030-47358-7_52.
[27]. H. Kang, D. H. Ahn, G. M. Lee, J. D. Yoo, K. H. Park and H. K. Kim, "IoT network intrusion dataset," IEEE Dataport, September 27, 2019, https://dx.doi.org/10.21227/q70p-q449.
[28]. Sheena, K. Kumar and G. Kumar, "Analysis of Feature Selection Techniques: A Data Mining Approach," International Journal of Computer Applications, ICAET 2016 (1), pp. 17-21, 2016.
[29]. Z. Karimi, M. Mansour and A. Harpunabadi, "Feature Ranking in Intrusion Detection Dataset using Combination of Filtering Methods," International Journal of Computer Applications, vol. 78, no. 4, September 2013.
[30]. S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2016.
[31]. T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785-794.