Machine Learning Based Intrusion Detection Systems Using HGWCSO and ETSVM Techniques

The document discusses machine learning techniques for intrusion detection systems. It proposes combining the Hybrid Grey Wolf optimizer Cuckoo Search Optimization (HGWCSO) feature selection algorithm with the Enhanced Transductive Support Vector Machine (ETSVM) classification algorithm. The HGWCSO selects the top eight features from a dataset of 41 features without sacrificing precision or recall. Experimental results show the proposed system has higher accuracy, precision, recall, and F-measure than current systems. The literature survey also cites a prior model reported to reach 99.984% accuracy on a popular botnet dataset with 21.38 seconds of training time.



A. SRIKRISHNAN, DR. ARUN RAAZA, S. GOPALAKRISHNAN
Department of ECE, Vels Institute of Science, Technology & Advanced Studies (VISTAS), Chennai, India
2022 International Conference on Communication, Computing and Internet of Things (IC3IoT) | 978-1-6654-7995-0/22/$31.00 ©2022 IEEE | DOI: 10.1109/IC3IOT53935.2022.9767857

Abstract: In recent years, computer networks have grown significantly in size and complexity, and Intrusion Detection Systems (IDS) have become an integral part of the system foundation. An IDS must overcome obstacles such as a low detection rate and a high computational load. Insufficient feature selection in IDS can have a negative impact on the accuracy of machine learning methods, resulting in errors in the form of False Negatives (FN) and False Positives (FP), which must be minimised. The research presents an effective feature selection and classification technique for intrusion detection by combining the Hybrid Grey Wolf optimizer Cuckoo Search Optimization (HGWCSO) with the Enhanced Transductive Support Vector Machine (ETSVM). The proposed strategies are capable of selecting the top eight features from a total of 41 features without sacrificing precision or recall. The experimental results reveal that the proposed system outperforms the current system in terms of accuracy, precision, recall, and F-measure.

Keywords: HGWCSO, ETSVM, IDS, accuracy, precision, recall, and F-measure.

I. INTRODUCTION
In this study, the HGWCSO approach is integrated with the ETSVM algorithm to improve intrusion detection accuracy [1-4]. Min-max normalisation is employed for preprocessing, which enhances the attack detection accuracy. Feature selection is then carried out with the HGWCSO algorithm, which selects the best features from the KDD dataset. The best features are returned according to their improved fitness values. Using the ETSVM classification technique, the intrusion and normal records in the dataset are correctly separated [5]. The performance measures considered include precision, recall, specificity, and accuracy. The computational complexity of the approach requires further study. Various attacks on real-time data can be researched in the future.

II. LITERATURE SURVEY
[6] used TLMD, C5.0, and the Naive Bayes algorithm to improve the detection rate and false alarm rate of adaptive network intrusion detection. The TLMD approach also helps manage unbalanced datasets, cope with continuous attributes, and reduce noise in the training dataset. The detection rate, accuracy, and false alarm rate of the TLMD approach are compared to current methods on the KDD Cup99 benchmark intrusion detection dataset. The TLMD technique provides a low false alarm rate and a high detection rate on the unbalanced dataset.

[7] provides a novel method. With feature sets, feature selection algorithms, simplified sub spacing, and randomised meta-learning, T-IDS built an RDPLM. The model's accuracy is 99.984% on the popular Botnet dataset, and training takes 21.38 seconds. The sequential minimal optimization and random tree identification tasks were shown to reduce error pruning using deep neural networks.

[8] prioritised alerts based on risk. The approach employs priority, reliability, and asset value as decision criteria to quantify an alert's risk. With Snort, intrusion detection can be improved by classifying alerts by risk category, so that only the most serious alerts are displayed to the security administrator, reducing the number of FP. The approach was evaluated using the KDD Cup 99 dataset and pattern matching.

[9] tested the performance of NIDS on an OpenStack private cloud. The purpose of that study is to assess the NIDS's performance and accuracy in classifying attacks. The results show the model's output is safe and exact. The NIDS's real-time warning can also identify attacks across the network.

[10] created an ARIMA model for online service intrusion detection. First, the ARIMA model processes the training data. Second, it predicts future behaviour within a confidence interval. To check for anomalies, it analyses the testing data; if any observation falls outside the confidence interval, it notifies an administrator. Experiments are run and results are based on real-world data.

[11] proposes an intrusion detection method based on information gain, mutual correlation, and feature cardinality. The variable feature subset uses a genetic method. The results show that information-based feature selection can enhance detection rates, with this model obtaining 87.54 percent accuracy.

III. PROBLEM SPECIFICATION
Several approaches have been proposed for developing automated and intelligent IDS that can detect and eradicate attacks on computer networks. In many IDSs, rule-based expert systems and statistical methods are used as detectors. Because rule-based expert systems detect only certain well-known intrusions, detecting new intrusions is difficult, and the signature database must be updated often and manually. Furthermore, statistical IDS need adequate data to be collected in order to build a complex mathematical model, which is problematic in complex network traffic. To reduce training time and improve accuracy, it is necessary to identify significant network traffic features and to use a proficient classifier.

IV. METHODOLOGY
The HGWCSO with ETSVM method is used in the newly presented system to provide more accurate


Authorized licensed use limited to: SRM Institute of Science and Technology. Downloaded on April 06,2023 at 03:43:21 UTC from IEEE Xplore. Restrictions apply.
classification output for the KDD dataset. The proposed architecture is shown in Figure 1.

Fig. 1. Proposed architecture

A. Collection of Datasets

The experiments employ the KDD dataset, which has proven effective for comparing intrusion detection approaches. The separate training and testing sets in KDD are found to be meaningful. As a result, the analysis has been carried out on the whole dataset rather than on a selected part. KDD has specific advantages:
1. Redundant records are eliminated from the dataset, reducing the likelihood of a classifier biasing towards records that appear frequently.
2. A higher recognition rate, since the collection contains few or no duplicated records.
3. Assessment findings from different studies might be contradictory because they depend on the testing and training sets used. As a result, the KDD dataset is utilised for both training and testing.

B. KDD Preprocessing with Min-Max Normalization
Min-max normalisation is used to pre-process the dataset in the HGWCSO framework. Data normalisation is a data preparation method used as part of data stream mining. An attribute in a dataset is normalised by scaling its values so that they fall within a limited, fixed range, such as 0.0 to 1.0. The relationships between the actual data values are preserved under min-max normalisation.

C. HGWCSO Algorithm for Feature Selection
GWO is a novel meta-heuristic algorithm built on three main steps: encircling, hunting, and attacking. The social hierarchy and hunting tactics of grey wolves inspired this algorithm. In the mathematical model of the wolf leadership hierarchy, the best solution is alpha, followed by beta and delta; omega denotes the remaining solutions. In the current study, cuckoo search (CS) is used to enhance the reliability and search performance of GWO. The hybrid algorithm employs Levy flights in the search space to start searching and then uses the GWO update rule to accelerate the particles into convergence. Meanwhile, CS's random elimination mechanism efficiently escapes local optima, speeding up the search for the optimal configuration. Figure 2 shows the pseudo code for the HGWCSO model.

D. ETSVM-Based Classifications

ETSVM was recommended to improve classification accuracy. ETSVM is a semi-supervised learning algorithm that works by separating labelled and unlabelled data. The ETSVM works well with high-dimensional data. The KDD dataset is first normalised using min-max. The HGWCSO algorithm then selects the more significant and optimal features. The cuckoo search algorithm improves the reliability and search performance of GWO. Finally, the ETSVM algorithm helps to detect intrusion attacks more efficiently. Compared to other algorithms, the newly proposed HGWCSO with ETSVM method yields better performance in terms of precision, recall, specificity, and accuracy.

Fig. 2. Pseudo code for HGWCSO

V. EXPERIMENTAL RESULT
The methods are evaluated using the KDD dataset. For evaluating intrusion detection algorithms, this dataset has been the most trustworthy and robust public standard dataset. The collection includes 41 features as well as nine distinct characteristics. The dataset falls into the following four major categories, with attacks in each.
• DoS: Denial-of-Service.
• R2L: illegal access from a remote workstation, such as password guessing.
• U2R: illegal access to super-user (root) privileges on a local computer.
• Probe: surveillance and further probing, such as port scanning.
The detection accuracy on the KDD dataset is compared using its eight selected features.

TABLE I.  RESULT OF FEATURE SELECTION USING HGWCSO

In terms of accuracy, precision, recall, and F-measure, the newly proposed HGWCSO with ETSVM is compared

against Particle Swarm Optimization (PSO) with AdaBoost based SVM, Best Feature Selection Algorithm (BFS) with Genetic and Ant Colony Optimization (GACO), and Gaussian Firefly Algorithm (GFA) with Improved Relevance Vector Machine (IRVM) methods. 10-fold cross validation was used in the experiments. In 10-fold cross validation, the sample is divided into ten equal-sized subsamples. One of the ten subsamples is used for model validation, while the remaining nine are used for training. The cross-validation procedure is performed ten times (the folds), with each of the ten subsamples serving as the validation data exactly once. The ten results are then averaged to produce a single output. The entire data is used for both training and testing, and each observation is used for testing once. Confusion matrices are used to compute the accuracy, precision, and related measures. The confusion matrix is shown in Figure 3.

Fig. 3. Confusion matrix

A. Evaluation of Accuracy Value

Table II and Figure 4 show the accuracy comparison for the KDD Cup intrusion dataset. The number of samples is represented on the x-axis, while the y-axis depicts the accuracy. ETSVM is utilised to detect intrusion in this method. This method takes advantage of a growing disparity between labelled and unlabelled data. It improves the accuracy of intrusion detection.

TABLE II.  NUMBER OF SAMPLES VS ACCURACY

Number of samples   PSO+Adaboost based SVM   BFS + Hybrid GACO   GFA+IRVM   HGWCSO+ETSVM
50                  86.6                     93.1                94.2       95.95
100                 87.5                     94.6                95.8       96.86
150                 89.45                    93.0                96.6       96.95
200                 87.32                    94.12               95.8       98.12
250                 89.25                    93.5                97.7       98.5
300                 91.89                    95.58               98.2       99.12

Fig. 4. Number of samples Vs Accuracy

B. Estimation of Precision Ratio

In Table III and Figure 5, the precision of the proposed HGWCSO with ETSVM method is compared to that of conventional approaches. The x-axis represents the number of samples, while the y-axis depicts precision. The HGWCSO algorithm is used to choose the best features. The reliability and search performance of GWO are enhanced, and the accuracy rate is increased, thanks to the cuckoo search algorithm. In contrast to existing approaches, the experimental findings show that the proposed system achieves higher precision.

TABLE III.  NUMBER OF SAMPLES VS PRECISION

Number of samples   PSO+Adaboost based SVM   BFS + Hybrid GACO   GFA+IRVM   HGWCSO+ETSVM
50                  76.58                    81.36               89.6       93.6
100                 79.65                    82.35               91.2       94.65
150                 83.65                    86.35               92.65      95.69
200                 85.69                    88.31               95.6       97.65
250                 89.65                    91.25               96.5       98.6
300                 91.26                    94.35               97.22      99.12

Fig. 5. Number of samples Vs Precision

C. Evaluation of Recall Value

In Table IV and Figure 6, the newly developed HGWCSO with ETSVM scheme is compared to the previous approaches in terms of recall. The number of samples is represented on the x-axis, whereas the y-axis depicts recall. The newly proposed HGWCSO with ETSVM method has a greater recall and hence better classification results.

TABLE IV.  NUMBER OF SAMPLES VS RECALL

Number of samples   PSO+Adaboost based SVM   BFS + Hybrid GACO   GFA+IRVM   HGWCSO+ETSVM
50                  59.65                    66.95               72.3       81.36
100                 61.25                    68.56               75.3       83.65
150                 63.58                    69.89               76.58      84.96
200                 64.52                    71.58               79.36      86.35
250                 71.25                    73.65               78.65      89.65
300                 75.36                    75.69               82.36      91.23

Fig. 6. Number of samples Vs Recall
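
The accuracy, precision, and recall values reported in this section are obtained from confusion-matrix counts under 10-fold cross validation. The following is a minimal Python sketch of that evaluation arithmetic, not the authors' implementation. It assumes binary labels (1 = intrusion, 0 = normal) and contiguous, unshuffled folds; the function names `confusion_counts`, `metrics`, and `k_fold_indices` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating label 1 as intrusion and 0 as normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the evaluation measures from the four counts.

    Ratios with a zero denominator are reported as 0.0 (an assumption
    made here to keep the sketch total; the paper does not specify this).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
    }


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds.

    Each fold serves as the validation set exactly once; the remaining
    k-1 folds form the training set for that round.
    """
    folds = []
    start = 0
    for i in range(k):
        # Spread any remainder over the first (n_samples % k) folds.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(metrics(*confusion_counts(y_true, y_pred)))
    print(len(k_fold_indices(20, k=10)))
```

In a full experiment, a classifier would be trained on each `train` index list, scored on the matching `val` list with `metrics`, and the ten per-fold results averaged into the single figure reported per table cell.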

D. Estimated Ratio of F-Measure

In Table V and Figure 7, the HGWCSO with ETSVM scheme is compared to the other methods in terms of F-measure. The number of samples is represented on the x-axis, while the y-axis depicts the F-measure. In comparison to other approaches, the newly developed HGWCSO with ETSVM achieves a superior F-measure, as seen in the graph.

TABLE V.  NUMBER OF SAMPLES VS F-MEASURE

Number of samples   PSO+Adaboost based SVM   BFS + Hybrid GACO   GFA+IRVM   HGWCSO+ETSVM
50                  72.13                    81.346              89.06      95.95
100                 75.23                    83.065              91.02      96.86
150                 76.528                   84.5696             92.4565    96.95
200                 79.346                   86.035              95.63      98.12
250                 78.65                    89.65               96.50      98.5
300                 82.36                    91.23               97.22      99.12

Fig. 7. Number of samples Vs F-Measure

VI. CONCLUSION
To improve intrusion detection accuracy, the HGWCSO approach is paired with the ETSVM algorithm. Preprocessing with min-max normalisation enhances attack detection accuracy. The HGWCSO technique is then used to identify the best features from the KDD dataset. The best features are returned according to their improved fitness values. The ETSVM classifier sorts the dataset into intrusion and normal records. The performance measures examined include precision, recall, specificity, and accuracy. The computational complexity of the approach requires further study. Future research can look into real-time attacks.

ACKNOWLEDGMENT
The authors thank VISTAS for supporting this research.

REFERENCES
[1] Feng, Yanhong, et al. "A Novel Hybrid Cuckoo Search Algorithm with Global Harmony Search for 0-1 Knapsack Problems." Atlantis Press, https://doi.org/10.1080/18756891.2016.1256577. Accessed 8 Apr. 2022.
[2] Gaied, I., Jemili, F. & Korbaa, O., 'Intrusion Detection Based on Neuro-Fuzzy Classification', IEEE/ACS International Conference of Computer Systems and Applications (AICCSA), pp. 1-8.
[3] Gaikwad, D. P., et al. "Intrusion Detection System Using Bagging Ensemble Method of Machine Learning." ieeexplore.ieee.org, https://ieeexplore.ieee.org/document/7155853. Accessed 8 Apr. 2022.
[4] Mohan, Pilla Vaishno, et al. "Leveraging Computational Intelligence Techniques for Defensive Deception: A Review, Recent Advances, Open Problems and Future Directions." Sensors, MDPI, 11 Mar. 2022, https://doi.org/10.3390/s22062194.
[5] Helmer, G., Wong, J.S.K., Honavar, V. & Miller, L., 'Automated Discovery of Concise Predictive Rules for Intrusion Detection', Journal of Systems and Software, vol. 60, no. 3, pp. 165-175.
[6] Yuan, Y., Huo, L. & Hogrefe, D., 'Two Layers Multi-Class Detection Method for Network Intrusion Detection System', IEEE Symposium on Computers and Communications (ISCC), pp. 767-772.
[7] Koli, M.S. & Chavan, M.K., 'An Advanced Method for Detection of Botnet Traffic using Intrusion Detection System', IEEE International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 481-485.
[8] Chakir, E.M., Moughit, M. & Khamlichi, Y.I., 'An Efficient Method for Evaluating Alerts of Intrusion Detection Systems', IEEE International Conference on Wireless Technologies, Embedded and Intelligent Systems (WITS), pp. 1-6.
[9] Santoso, B.I., Idrus, M.R.S. & Gunawan, I.P., 'Designing Network Intrusion and Detection System using Signature-Based Method for Protecting OpenStack Private Cloud', IEEE International Annual Engineering Seminar (InAES), pp. 61-66.
[10] Gopalakrishnan, S., Ebenezer Abishek, B., Vijayalakshmi, A. & Rajendran, V., 2021. Analysis And Diagnosis Using Deep-Learning Algorithm On Erythemato-Squamous Disease. doi:10.14445/22315381/IJETT-V69I3P210.
[11] Gopalakrishnan, S., Ebenezer Abishek, B., Vijayalakshmi, A. & Rajendran, V., 2021. An MS-ROI based Detection and Segmentation of Erythemato-Squamous Disease. doi:10.14445/22315381/IJETT-V69I8P231.
