A Robust Intrusion Detection System Empowered by Generative Adversarial Networks



Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR1128

Vijayaganth V.¹; Dharshana M.G.²; Sureka P.³; Varuna Priya S.⁴
¹Assistant Professor (Sl.G)
¹,²,³,⁴Department of Artificial Intelligence and Data Science, KPR Institute of Engineering and Technology

Abstract:- There is a very bleak outlook on cyber security due to the rapid expansion of the Internet and the ever-changing terrain of cyber-attacks. This paper explores the field of intrusion detection through network analysis, with a particular emphasis on applying machine learning (ML) and deep learning (DL) approaches. For every ML/DL technique, a thorough tutorial overview is given together with a review of pertinent research publications. These studies were carefully read, indexed, and summarised according to their thermal or temporal correlations. The paper also provides information on frequently used network datasets in this field, which is relevant given the critical role that data plays in ML/DL techniques. It also discusses the difficulties in using ML/DL for cyber security and provides recommendations for future lines of inquiry. The KDD data set is a widely used benchmark for intrusion detection methods. A lot of work is being done to improve intrusion detection techniques, and both training the detection model and evaluating its quality depend on the quality of the data. The KDD data set is thoroughly analysed in this research, with a special emphasis on four different attribute classes: Basic, Content, Traffic, and Host. We use the Modified Random Forest (MRF) technique to classify these attributes.

Keywords:- Intrusion Detection, Feature Selection, Machine Learning.

I. INTRODUCTION

The security of computer networks and data has become crucial in the current digital era. Strong network intrusion detection systems (NIDS) are essential now more than ever because of the complexity of cyber threats and the interconnectedness of our systems. By detecting illegal access and averting possible risks to information systems, intrusion detection plays a critical part in protecting enterprises. But conventional intrusion detection techniques frequently struggle to keep up with the dynamic threat landscape. In order to address these issues and improve intrusion detection efficacy, we suggest a novel method dubbed "Network Intrusion Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature Selection." The goal of this project is to integrate state-of-the-art methods from cybersecurity, data science, and machine learning. We aim to improve network intrusion detection by fusing automatic feature selection and the capabilities of ensemble learning into a two-phased detection method.

A. Feature Selection
The security of networks and information systems has become critical in today's ever-expanding digital world. The proliferation of cyber dangers, encompassing sophisticated malware and advanced persistent threats, underscores the need for network intrusion detection systems (NIDS) to evolve constantly in order to thwart hostile actions and unauthorised access. NIDS effectiveness depends largely on the selection of the most pertinent data properties, or "features." In machine learning and data analysis, feature selection is a crucial step whose main goal is to find and keep informative attributes while eliminating redundant or unnecessary ones. In the context of network intrusion detection, careful feature selection is crucial for increasing the efficiency and precision of the detection process.


Fig 1: Feature Selection
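The idea of keeping informative attributes and discarding redundant ones can be illustrated with a simple filter-style ranking. The sketch below is only an illustration, not the selection method used in this paper (which is not specified): it assumes discrete features and ranks them by their mutual information with the class label. The toy records, the feature names (`protocol_type`, `flag`, `day`) and the helper names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def select_top_k(rows, labels, k):
    """Score every feature against the label and return the k best names plus all scores."""
    names = list(rows[0].keys())
    scores = {f: mutual_information([r[f] for r in rows], labels) for f in names}
    ranked = sorted(names, key=lambda f: scores[f], reverse=True)
    return ranked[:k], scores


# Hypothetical toy connection records (assumed data, for illustration only).
rows = [
    {"protocol_type": "tcp", "flag": "S0", "day": "mon"},
    {"protocol_type": "tcp", "flag": "S0", "day": "tue"},
    {"protocol_type": "tcp", "flag": "SF", "day": "mon"},
    {"protocol_type": "udp", "flag": "SF", "day": "tue"},
    {"protocol_type": "tcp", "flag": "S0", "day": "mon"},
    {"protocol_type": "udp", "flag": "SF", "day": "tue"},
    {"protocol_type": "tcp", "flag": "SF", "day": "mon"},
    {"protocol_type": "udp", "flag": "SF", "day": "tue"},
]
labels = ["attack", "attack", "normal", "normal",
          "attack", "normal", "normal", "normal"]

selected, scores = select_top_k(rows, labels, k=2)
print(selected)
```

In this toy data `flag` fully determines the label, `protocol_type` is partly informative, and `day` carries almost no information, so a top-2 selection keeps the first two. Continuous features would need discretisation or a different scoring function.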

II. LITERATURE REVIEW

Felix Obite et al. [1] suggest that the substantial increase in Internet traffic validates the transition of the backbone of the telecommunications industry from a time division multiplexing (TDM) orientation to an emphasis on Ethernet solutions. In a market historically dominated by DSL and cable modems, Ethernet PON, which combines inexpensive Ethernet and fibre infrastructures, has emerged as the leading technology. With the help of this technology, which is easy to use, reasonably priced, and scalable, end customers can receive a vast array of data services across a single network. The paper summarises EPON's history, with an emphasis on the present work on next-generation high-data-rate access networks including NG-PON2, WDM PON, and OFDM PON. Furthermore, the recently finished 100G-EPON is reviewed to illustrate the most current developments in the sector. By providing a thorough and current review, the paper aims to give network operators and interested practitioners the knowledge they need to plan and prioritise their actions. The study also attempts to identify technological questions for additional research. Broadband services capable of supporting high-speed internet transmission are required due to the rise in data traffic and the increasing number of online users who spend more time online and use bandwidth-intensive apps. It is anticipated that this will help the economy grow. Therefore, in order to support these novel and real-time broadband applications, future access networks will need to have a lot of capacity and mobility.

The fifth generation (5G) of wireless broadband access, which is presently being deployed by Mobile Network Operators, has garnered considerable attention in recent years [2]. Surprisingly, though, not as much focus has been placed on "Wi-Fi 6," the latest IEEE 802.11ax standard in the family of wireless local area networks, which is intended for private edge networks. Edward J. Oughton et al. review the suitability of cellular and Wi-Fi technologies for providing high-speed wireless Internet connectivity. With the goal of supporting the Internet of Things and machine-to-machine communications, as well as providing faster wireless broadband connectivity, both cellular and Wi-Fi technologies seek to improve performance. These technologies can therefore be considered technical alternatives in a variety of use cases. The authors conclude that both technologies will be significant players in the future, acting as competitors and allies at the same time. Due to its cheaper deployment costs, Wi-Fi 6 is expected to remain the favoured option for indoor use, while 5G is expected to remain the preferred technology for wide-area coverage.

Somayye Hajiheidari et al. [3] have suggested a system that lowers the power consumption of electrical appliances, adding a new dimension to intelligent things. By integrating electronic devices and linking them to the Internet, this system enhances commonplace physical things and allows for communication with cyberspace and local intelligence. The network of connected things is referred to as the Internet of Things (IoT). Nevertheless, because IoT items are directly connected to the Internet, malevolent people can attack them. These assaults, referred to as internal attacks, take advantage of IoT devices' resource limitations to compromise internal nodes and launch network attacks. Thus, Intrusion Detection Systems (IDSs) are highly significant in the Internet of Things. Notwithstanding this importance, there are few thorough and organised evaluations that examine the workings of IDSs in Internet of Things environments. This work proposes a Systematic Literature Review (SLR) of IDSs in the IoT to fill this gap. The article offers comprehensive classifications of IDSs according to their methodology (anomaly-based, signature-based, specification-based, and hybrid), architecture (centralised, distributed, hybrid), evaluation technique (simulation, theoretical), and attack types (denial of service, Sybil, replay, selective forwarding, wormhole, black hole, sinkhole, jamming, false data attack).


The ensemble of classifiers [4], often referred to as an ensemble learner, has attracted a lot of interest in cybersecurity research, especially in the area of intrusion detection systems (IDSs). IDSs are essential for stopping cyberattacks, and a better detection framework is necessary to increase their detection capabilities, particularly when using ensemble learning. The choice of base classifiers and the choice of combiner algorithm are two major issues in ensemble creation. This work uses a systematic mapping analysis to provide an overview of the use of ensemble learners in IDSs. A total of 124 well-known articles from the existing literature were gathered and analysed for the study. These papers were then categorised according to factors such as the datasets utilised, publication venues, years of publication, ensemble methodologies, and IDS techniques. Furthermore, the work reports and analyses an empirical study of a novel classifier ensemble approach for anomaly-based IDS dubbed stack of ensemble (SoE). The SoE is an ensemble classifier that combines three distinct ensemble learners (random forest, gradient boosting machine, and extreme gradient boosting machine) in a homogeneous way using a parallel architecture. The accuracy, false positive rates, area under the ROC curve, and Matthews correlation coefficients of the classification algorithms are statistically analysed to assess their performance. By offering a current comprehensive mapping analysis and a thorough empirical review of recent developments in ensemble learning techniques applied to IDSs, this work closes a gap in the literature.

As a result of recent developments in mobile technology, the ubiquitous use of IoT-enabled gadgets in our daily lives has created security challenges. Muhamad Erza Amina et al. [5] have provided a solution to these issues. The primary cause for concern is that open wireless networks, including Wi-Fi, are susceptible to impersonation attacks, in which the adversary poses as an authorised participant in a communications protocol or system. Due to the widespread use of linked devices, massive amounts of high-dimensional data are generated, which complicates simultaneous detection. To address this issue, the paper suggests a new strategy known as Deep-Feature Extraction and Selection (D-FES), which combines stacked feature extraction with weighted feature selection. Stacked autoencoding yields meaningful representations by reconstructing pertinent information from unprocessed inputs. This is then integrated with a modified weighted feature selection inspired by a shallow-structured machine learner. Experimental results on the Aegean Wi-Fi Intrusion Dataset (AWID), a well-referenced Wi-Fi network benchmark dataset, show the efficacy of the proposed D-FES. The findings report a detection accuracy of 99.918% and a false alarm rate of 0.012%. According to the authors, these results make D-FES the most accurate technique for identifying impersonation attacks documented in the literature. Furthermore, the reduced feature set obtained from D-FES lowers computational cost while also lessening the bias of machine learning models.

III. RELATED WORK

The increasing number and diversity of network threats mean that traditional firewalls and data encryption methods are no longer adequate to meet the expectations of modern network security. Thus, intrusion detection systems have been proposed to handle network threats. The current mainstream intrusion detection methods still suffer from low detection rates and a significant feature engineering overhead, despite the assistance of machine learning. In order to address the issue of low detection accuracy, this research suggests a deep learning model for network intrusion detection (DLNID). It accomplishes this by fusing an attention mechanism with a bidirectional long short-term memory (Bi-LSTM) network. Initially, a convolutional neural network (CNN) is used to extract sequence features of data traffic; subsequently, the attention mechanism is used to reassign the weights of each channel; and lastly, Bi-LSTM is used to learn the network of sequence features. Significant data imbalances are often present in public intrusion detection data sets. To address this imbalance, the paper creates a roughly balanced dataset by enlarging the minority classes with adaptive synthetic sampling (ADASYN). Moreover, information fusion is enhanced by reducing data dimensionality through the use of a modified stacked autoencoder.

IV. METHODOLOGY

The suggested network intrusion detection system (IDS) uses the Modified Random Forest (MRF) algorithm to categorise network traffic as either malicious or legitimate. The method divides the KDD dataset's attributes into four groups: Basic, Content, Traffic, and Host. An MRF classifier is then trained on each group. The system gathers traffic data from the monitored network, pre-processes it to extract pertinent features, and feeds the features into the MRF classifiers. Each classifier generates a prediction for every data point, and the system uses the average prediction as the final decision. Inspired by negative selection-based detection generation, the suggested methodology is tested on the NSL-KDD dataset, which is a revised version of the popular KDD CUP 99 dataset. Additionally, by automatically choosing parameter values based on the training dataset utilised, the system becomes more flexible and adaptive.

A. Probability Model
In this module, we preprocess the training data for the probability model that is used to capture a user's typical mentioning behaviour. In a social network stream, we describe a post by its number of mentions, k, and the set V of user names (IDs) that are mentioned in it. In this case, two kinds of infinity need to be taken into account. The first kind concerns k, the number of users that are mentioned in a post. We try not to impose an artificial restriction on the number of mentioned users, even if it is impracticable for a user to mention hundreds of other users in a single post. Rather, to remove any inherent limitation, we use a geometric distribution and integrate out the
parameter. The number of users who may be mentioned is the subject of the second kind of infinity. To avoid limiting the number of potential mentioned users, we use the Chinese Restaurant Process (CRP), which is well known for its application in handling infinite vocabularies, for estimation.

Fig 2: Block Diagram

B. Computing the Link-Anomaly Score
We introduce a mechanism in this module to compute the divergence of a user's behaviour from the modelled normal mentioning behaviour. To find the anomaly score of a new post by user u at time t, which includes k mentions to users V, we compute its likelihood using the training set T_u^(t). The posts published by user u within the time period [t-T, t], where T is set to 30 days in this project, comprise the training set T_u^(t). This computation is then used to define the link-anomaly score. The predictive distribution of the number of mentions and the predictive distribution of the users who are mentioned can be used to calculate the two terms of this score.

C. Change Point Analysis and DTO
This approach is a development of the suggested Change Finder technique, which uses new data compressibility to detect changes in a time series' statistical dependence structure. This module uses MRF coding, also known as Modified Random Forest (NML) coding, as a coding criterion in place of the plug-in predictive distribution. Two tiers of scoring procedures are involved in the identification of a change point. Outliers are identified by the first layer, whilst change points are detected by the second. The scoring criterion for each layer is determined by calculating the prediction loss for an autoregressive (AR) model using the MRF coding distribution. While determining the ideal NML code length is challenging, the suggested SNML offers a sequentially computed approximation. Discounting is also used by the MRF while training the AR models. Lastly, a threshold is applied in our method to turn the change-point scores into binary alarms.

D. Modified Random Forest Detection Method
We covered change-point detection based on MRF and DTO in the earlier parts. In this module, we test our approach in conjunction with Kleinberg's Modified Random Forest detection method. Specifically, we have implemented the two-state version of Kleinberg's Modified Random Forest detection approach. The two-state variant was selected because a non-hierarchical structure is anticipated in this experiment. A probabilistic automaton model with two states, the Modified Random Forest state and the non-Modified Random Forest state, serves as the foundation for the Modified Random Forest detection technique. It is assumed that some events, like posts arriving, occur in accordance with a time-varying Poisson process, the rate parameter of which depends on the state at any given moment.

V. ALGORITHM DETAILS

Machine Learning (ML) and Deep Learning (DL) approaches are used, with a particular emphasis on the Modified Random Forest (MRF) approach, to analyse the KDD dataset.

Intrusion Detection with Modified Random Forest

Step 1: Data Pre-processing
• Load the KDD dataset
• Pre-process the data: handle missing values, encode categorical features, etc.

Step 2: Feature Engineering
• Extract relevant features from the dataset
• Optionally, apply dimensionality reduction techniques

Step 3: Split the Dataset
• Split the dataset into training and testing sets

Step 4: Modified Random Forest (MRF) Training
• Initialise the MRF model with hyperparameters
• Train the MRF model using the training set

Step 5: Model Evaluation
• Use the trained MRF model to make predictions on the testing set
• Evaluate the model's performance using Detection Rate (DR) and False Alarm Rate (FAR)

Step 6: Attribute Analysis
• Analyse the contributions of each attribute class (Basic, Content, Traffic, Host) to DR and FAR
• Optimise the dataset by adjusting features to achieve maximum DR while minimising FAR
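The evaluation and attribute-analysis steps of the MRF procedure, together with the prediction averaging described in the Methodology, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the MRF classifier itself is not reproduced, the per-group scores are hypothetical numbers standing in for the outputs of four classifiers trained on the Basic, Content, Traffic, and Host attribute groups, and DR and FAR are taken to have their usual definitions, DR = TP / (TP + FN) and FAR = FP / (FP + TN), which the paper does not spell out. The 0.5 decision threshold is likewise an assumption.

```python
from statistics import mean


def detection_metrics(y_true, y_pred):
    """Return (DR, FAR) for binary labels where 1 = malicious and 0 = legitimate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    dr = tp / (tp + fn) if tp + fn else 0.0    # share of attacks that were caught
    far = fp / (fp + tn) if fp + tn else 0.0   # share of legitimate traffic flagged
    return dr, far


def average_vote(group_scores, threshold=0.5):
    """Average the per-group malicious scores of each record, then apply the threshold."""
    n = len(next(iter(group_scores.values())))
    return [
        1 if mean(scores[i] for scores in group_scores.values()) >= threshold else 0
        for i in range(n)
    ]


# Hypothetical scores from four per-group classifiers on six test records.
y_true = [1, 1, 1, 0, 0, 0]
group_scores = {
    "Basic":   [0.9, 0.4, 0.7, 0.2, 0.6, 0.1],
    "Content": [0.8, 0.3, 0.6, 0.1, 0.3, 0.2],
    "Traffic": [0.7, 0.6, 0.8, 0.3, 0.7, 0.1],
    "Host":    [0.9, 0.5, 0.4, 0.2, 0.2, 0.3],
}

# Step 5: combined prediction and overall DR / FAR.
y_pred = average_vote(group_scores)
overall_dr, overall_far = detection_metrics(y_true, y_pred)

# Step 6: DR / FAR of each attribute class taken on its own.
per_group = {
    name: detection_metrics(y_true, [1 if s >= 0.5 else 0 for s in scores])
    for name, scores in group_scores.items()
}

print(y_pred, overall_dr, overall_far)
for name, (dr, far) in per_group.items():
    print(name, round(dr, 3), round(far, 3))
```

Comparing the per-group pairs with the combined pair shows which attribute classes raise DR and which raise FAR. In these made-up numbers the Traffic group catches every attack but also raises a false alarm, while averaging all four groups removes the false alarm at the cost of one missed attack.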


VI. RESULT ANALYSIS

Results from the empirical analysis of the KDD dataset using the Modified Random Forest (MRF) technique are instructive for the Intrusion Detection Systems (IDS) field. By dividing the dataset into four categories (Basic, Content, Traffic, and Host), the study illustrates the distinct contributions of each attribute class to the Detection Rate (DR) and False Alarm Rate (FAR). This analysis makes it possible to optimise the dataset by reducing FAR, which avoids unnecessary false alerts, and by increasing DR, which measures successful intrusion detection. The findings demonstrate the critical role attribute class considerations play in the development of trustworthy intrusion detection models and provide useful data for enhancing the efficacy of cyber security protocols.

Table 1: Comparison Table
Algorithm     Accuracy (%)
NB and DT     75
MRF           88

Fig 3: Comparison Graph

The table shows the accuracy of several algorithms, namely Modified Random Forest (MRF), Decision Trees (DT), and Naive Bayes (NB), within the framework of this study. The Modified Random Forest (MRF) scores 88%, well above the combined accuracy of 75% achieved by Naive Bayes and Decision Trees. These accuracy figures indicate the MRF algorithm's effectiveness in the context under study by showing how well it performs in comparison to its NB and DT competitors. The table presents the MRF algorithm as a promising choice for the current task and underlines how important algorithm selection is to achieving higher accuracy rates.

VII. CONCLUSION

In summary, a network intrusion detection system (IDS) that employs a modified random forest (MRF) algorithm exhibits potential for precisely identifying network intrusions while addressing concerns related to overfitting, adaptability, flexibility, and resilience against new threats. MRF-based intrusion detection systems are easy to install and train, and they are capable of efficiently monitoring sizable networks. These systems are able to detect a wide range of network threats, such as malware, port scanning, and denial-of-service attacks. It is imperative to recognise that no intrusion detection system is perfect. Like other IDS systems, MRF-based systems are vulnerable to evasion strategies. Furthermore, MRF-based IDS systems can be computationally demanding to train and run.

FUTURE WORK

MRF classifiers are well known for their accuracy in classification tasks; however, there is room for improvement. Later research could focus on developing new MRF algorithms with improved precision and performance. Attackers are always coming up with new ways to get around IDS systems, so subsequent efforts can focus on creating MRF classifiers that are more resistant to these evasion strategies. Training and operation can involve significant computing costs. Subsequent investigations may focus on developing novel training algorithms and optimisation strategies that reduce the computing load of MRF classifiers.

REFERENCES

[1]. R. Kumar, A. Malik, and V. Ranga, "An intellectual intrusion detection system using hybrid hunger games search and remora optimisation algorithm for IoT wireless networks," Knowledge-Based Systems, Nov. 2022.
[2]. W. Wang, S. Jian, Y. Tan, Q. Wu, and C. Huang, a representation-learning-based network intrusion detection system that records explicit and implicit feature interactions, Computer Security, Jan. 2022.
[3]. W. Lehr, J. Oughton, K. Katsaros, I. Selinis, D. Bubley, and J. Kusuma, an investigation of the differences between Wi-Fi 6 and 5G wireless internet access possibilities, Telecommunication Policy, Jun. 2021.
[4]. B. A. Tama and S. Lim, a cross-benchmark evaluation and systematic mapping study on ensemble learning for intrusion detection systems, Computer Science Review, Feb. 2021.
[5]. S. Lei, C. Xia, Z. Li, X. Li, and T. Wang, HNN, a novel model for temporal-spatial analysis and multi-feature correlation as the foundation for intrusion detection, IEEE Transactions on Network Science and Engineering, Oct. 2021.
[6]. Y. Cheng, Y. Xu, H. Zhong, and Y. Liu, "Leveraging semisupervised hierarchical stacking temporal convolutional network for anomaly detection in IoT communication," IEEE Internet Things Journal, vol. 8, no. 1, pp. 144–155, Jan. 2021.
[7]. Z. Ma, C. Zhong, Y. Xiang, X. Li, M. Zhu, L. T. Yang, M. Xu, and H. Li, "Sustainable ensemble learning driving intrusion detection model," IEEE Trans. Dependable Secure Comput., vol. 18, no. 4, pp. 1591–1604, Jul./Aug. 2021.


[8]. Y. Zhou, G. Cheng, S. Jiang, and M. Dai, Developing an efficient feature selection and ensemble classifier-based intrusion detection system, Computer Networks, vol. 174, Art. no. 107247, Jun. 2020.
[9]. M. R. Ayyagari, G. Kumar, and K. Thakur, "MLEsIDSs: Machine learning-based ensembles for intrusion detection systems: A review," Journal of Supercomput., vol. 76, no. 11, pp. 8938–8971, Nov. 2020.
[10]. B. A. Tama, L. Nkenyereye, S. M. R. Islam, and K. Kwak, "An enhanced anomaly detection in web traffic using a stack of classifier ensemble," IEEE Access, vol. 8, pp. 24120–24134, 2020.