Machine Learning Based Intrusion Detection System

The document discusses machine learning based intrusion detection systems. It describes intrusion detection systems, machine learning techniques like support vector machines and naive bayes that are used in intrusion detection. It also discusses how these techniques can help improve the performance of intrusion detection systems.

Proceedings of the Third International Conference on Trends in Electronics and Informatics (ICOEI 2019)

IEEE Xplore Part Number: CFP19J32-ART; ISBN: 978-1-5386-9439-8

MACHINE LEARNING BASED INTRUSION DETECTION SYSTEM

Anish Halimaa A
Department of Computer Science and Engineering
Thiagarajar College of Engineering
Madurai, India
[email protected]

Dr. K. Sundarakantham
Department of Computer Science and Engineering
Thiagarajar College of Engineering
Madurai, India
[email protected]

Abstract: An intrusion detection system is used to examine malicious activity that occurs in a network or a system. It is software or a device that scans a system or a network for suspicious activity. With the growing connectivity between computers, intrusion detection has become vital to network security. Various machine learning techniques and statistical methodologies have been used to build different types of intrusion detection systems to protect networks. The performance of an intrusion detection system depends mainly on accuracy, which must be enhanced to reduce false alarms and to increase the detection rate. Different techniques have been used in recent works to improve performance. Analyzing huge volumes of network traffic data is the main work of an intrusion detection system, and a well-organized classification methodology is required to handle it. The proposed approach addresses this issue. Machine learning techniques such as Support Vector Machine (SVM) and Naïve Bayes, which are well known for solving classification problems, are applied. The NSL-KDD knowledge discovery dataset is used to evaluate the intrusion detection system. The outcomes show that SVM works better than Naïve Bayes. For the comparative analysis, the accuracy and misclassification rate of both classifiers are calculated.

Keywords: Intrusion Detection, Support Vector Machine, Naive Bayes, Machine Learning.

I. INTRODUCTION

An intrusion detection system (IDS) is used to recognize abnormal behaviour that occurs in a computer or network. IDSs can be characterized in several ways, of which misuse-based and anomaly-based IDSs are the most common. Misuse-based IDSs, such as Snort, detect known attacks efficiently and have a low false alarm rate, but they cannot recognize new attacks that match no rule in the database. An anomaly-based IDS builds a model of regular behaviour, then identifies any significant deviation from this model and treats that deviation as an intrusion. This type of IDS can detect both known and unknown attacks, but suffers from a high false alarm rate. Various machine learning techniques are incorporated to decrease the false alarm rate.

A. Intrusion Detection System

A single intrusion can steal or destroy information from computer or network systems in a short time. Intrusion is therefore one of the major issues in network security, and it can also damage system hardware. Various intrusion detection techniques exist, but accuracy remains one of the major problems. Detection rate and false alarm rate play an essential role in the analysis of accuracy: intrusion detection must be improved to reduce false alarms and to increase the detection rate. For this reason, Support Vector Machine (SVM) and Naïve Bayes, both of which address classification, are applied. Normalization and feature reduction are also applied to make a comparative analysis.

B. Machine Learning

Machine learning is a data analysis technique used to automate analytical model building. It is a branch of Artificial Intelligence based on the concept that a system can be trained, make decisions, and learn to identify patterns with little human intervention. Supervised and unsupervised learning are the two most widely used machine learning techniques. In supervised learning, algorithms are trained on labeled examples, that is, inputs paired with the desired output. In unsupervised learning, instances without historical labels are used for training; its two main objectives are to explore the data and to discover some structure within it. Apart from these methods, approaches such as semi-supervised learning and reinforcement learning are used.

For training, semi-supervised learning uses a small amount of labeled data and a large amount of unlabeled data. Reinforcement learning uses a trial-and-error method to find which actions yield the best rewards. Classification, regression and prediction methods are used. Agent, environment and actions are the three primary components of this type of learning. The goal is for the agent to select the actions that maximize the expected reward. By following a good policy, the agent reaches the goal much faster.

C. Support Vector Machine

Support Vector Machine (SVM) is a supervised learning method in which various types of data from different subjects are used for training. SVM constructs a hyperplane or a set of hyperplanes in a high-dimensional space. The hyperplane that optimally separates the given data into classes with the largest margin is considered the best hyperplane. To evaluate the margins between hyperplanes, a non-linear classifier applies various kernel functions. Maximizing the margins between hyperplanes is the main aim of these kernel functions, such as linear, polynomial,
radial basis, and sigmoid. Owing to the growing interest in SVMs, developers and researchers have established prominent applications. SVM plays a major role in image processing and pattern recognition applications. A classification task usually involves dividing data into two sets, namely training and testing datasets. The class label is defined as the "target variable" and the attributes are defined as features or "observed variables".

D. Naive Bayes

Bayesian classifiers are statistical classifiers. They can predict the probability that a given sample belongs to a particular class. They are based on Bayes' theorem and built on the hypothesis that, for a given class, the value of an attribute is independent of the values of the other attributes. This assumption is called class conditional independence.

II. LITERATURE SURVEY

Protecting the computer and network information of organizations and individuals has become an important task, because compromised information can cause huge losses. Intrusion detection systems are therefore used to prevent this damage, and different machine learning approaches have been developed to enrich the function of IDSs. The main objective of [2] is to address the problem of adaptability of the Intrusion Detection System (IDS). The proposed IDS is able to recognize well-known attacks as well as unknown attacks. It consists of three major mechanisms: Clustering Manager (CM), Decision Maker (DM), and Update Manager (UM). The NSL-KDD dataset is used to assess the proposed IDS. Both supervised and unsupervised techniques were used. The information supplied to the system is based on the knowledge of an agent who disregards the correction proposals presented by the IDS; this technique is applied in supervised mode. Both known and unknown traffic can be detected by the system when it works in unsupervised mode. After updating with recently arrived data from both supervised and unsupervised modes, the function of the system improved. The performance of the system improves when it runs in unsupervised mode.

In [3], a hybrid model is developed by incorporating machine learning techniques such as SVM and Extreme Learning Machine (ELM). Modified K-means is used to construct a high-quality dataset: it builds small datasets that represent the overall original training dataset. This step reduces the training time of the classifier. KDD Cup 1999 is used for implementation. The model shows an accuracy of about 95.75 percent.

Various machine learning techniques such as SVM, Random Forest (RF) and ELM are examined to address this problem. ELM shows better accuracy than the other techniques. The data is divided into one-fourth of the data samples, half of the dataset, and the full dataset. SVM produces better results on half of the data samples and on one-fourth of the data samples, whereas ELM is the best method for handling huge amounts of data, of about two lakh (200,000) instances and more.

A new hybrid classification algorithm based on Artificial Bee Colony (ABC) and Artificial Fish Swarm (AFS) is proposed [6]. Computer systems are now prone to various information thefts because of the widespread use of the internet, which has led to the emergence of IDSs. Fuzzy C-Means Clustering (FCM) and Correlation-based Feature Selection (CFS) are applied [6] to separate the training datasets and to eliminate irrelevant features. If-then rules are generated using the CART technique and applied to differentiate normal and anomalous records according to the selected features.

The correlation-based feature selection method, a simple filter-based model, is used in the proposed system. Datasets containing features that are highly correlated with the class, yet uncorrelated with each other, are applied. Using the NSL-KDD and UNSW-NB15 datasets, this approach achieved a 99 percent anomaly detection rate and a 0.01 percent false positive rate. A hybrid method for anomaly network-based IDS (A-NIDS) uses AdaBoost and Artificial Bee Colony algorithms to obtain a low false positive rate (FPR) and a high detection rate (DR).

III. EXISTING METHODOLOGIES

The performance of the proposed model is evaluated on the KDD Cup dataset. To train classifiers such as SVM and ELM, the 10 percent KDD training dataset, which contains a large number of instances, is taken. The 10 percent KDD dataset is used rather than the entire dataset because applying the entire dataset would cause several problems. Symbolic attributes such as protocol, service and flag are changed or removed. Finally, the instances are labeled under four categories: Normal, DoS, Probe, and R2L. The authors trained SVM and ELM with this dataset. For testing, they used a multi-level model with the corrected KDD dataset. The proposed model attained an accuracy of up to 95.75 percent and a false alarm rate of 1.87 percent on the KDD Cup 1999 dataset.

IV. PROBLEM STATEMENT

Because of the excessive volume of data, false alarm reports of network intrusion increase and detection accuracy is reduced. This is one of the major issues when the system encounters unknown attacks. The main objective is to increase the accuracy rate and to lessen the false alarm rate. To meet the above challenges, machine
learning algorithms such as SVM and Naïve Bayes have been used.

V. PROPOSED APPROACH

Dataset pre-processing, classification and result evaluation are the vital phases of the proposed model. Each phase is essential and has an important influence on its performance. Examining the function of the SVM and Naïve Bayes classifiers is the essential step of this work.

A. Pre-Processing

The dataset contains symbolic features, which the classifier is unable to process. Hence, pre-processing takes place. In this phase, all non-numeric or symbolic features are removed or replaced, as they do not play an important part in intrusion detection. Symbolic attributes such as protocol, service and flag are changed or removed. Finally, the instances are labeled under four categories: Normal, DoS, Probe, and R2L.

B. Methodology

• A comparative analysis is done between SVM and Naïve Bayes for classification of the dataset, to analyze their accuracy and misclassification rate. At first the raw dataset is taken; its class attribute contains 24 different types of attack, which are labeled under 4 categories: Normal, DoS, Probe, and R2L.

• After labeling, pre-processing is done to convert nominal attributes to binary attributes. To improve the performance of the intrusion detection system, non-numeric features are removed.

• For randomization, the dataset is processed in the WEKA tool using the Randomize filter. The Randomize filter randomly shuffles the order of the instances passed through it, using a random number generator whose seed value is reset. The first 19,000 instances are collected for comparative analysis.

• To obtain different results and to improve performance on the dataset, methods such as CfsSubsetEval are applied for feature reduction. After preprocessing, the dataset undergoes feature reduction and normalization.

• CfsSubsetEval is one of the methods of attribute selection. It evaluates a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them.

• SVM classification is a supervised learning method in which various types of data from different subjects are used for training. SVM constructs a hyperplane or a set of hyperplanes in a high-dimensional space.

• The hyperplane that optimally separates the given data into classes with the largest margin is considered the best hyperplane. To evaluate the margins between hyperplanes, a non-linear classifier applies various kernel functions. Maximizing the margins between hyperplanes is the main aim of these kernel functions, such as linear, polynomial, radial basis, and sigmoid.

• For the first 19,000 instances, classification is done for comparative analysis using SVM on the raw dataset, SVM under different normalization techniques, and SVM along with feature reduction. Accuracy and misclassification rate are noted.

• The same process is done using Naïve Bayes. Bayesian classifiers are statistical classifiers. They can predict the probability that a given sample belongs to a particular class. They are based on Bayes' theorem and work on the hypothesis that, for a given class, the value of an attribute is independent of the values of the other attributes. This assumption is called class conditional independence.

• The Naive Bayes classifier works as follows: a) The training set of samples is denoted by T, each sample with its class label. There are k classes, X1, X2, X3, ..., X(k-1), Xk. Each sample is represented by an n-dimensional vector A = {a1, a2, ..., an}, depicting n measured values of the n attributes m1, m2, ..., mn. b) For a given sample A, the classifier predicts that A belongs to the class having the maximum posterior probability conditioned on A.

The block diagram of this approach is given in Fig. 1. Accuracy has been calculated and a graph has been plotted from the obtained results. From the graph, it can be seen that SVM outperforms Naïve Bayes.
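The pre-processing and randomization steps listed in the methodology can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WEKA workflow used in the paper: the helper names (`one_hot`, `min_max`, `prepare`), the toy records and the seed value are hypothetical, and min-max scaling stands in for the normalization technique, which the paper does not name.

```python
import random

def one_hot(rows, col):
    """Replace the nominal column `col` with one binary column per distinct value."""
    values = sorted({r[col] for r in rows})
    out = []
    for r in rows:
        binary = [1.0 if r[col] == v else 0.0 for v in values]
        out.append(r[:col] + binary + r[col + 1:])
    return out

def min_max(rows):
    """Scale every numeric column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]

def prepare(features, labels, nominal_col, n_keep, seed=42):
    """Binarize one nominal attribute, shuffle with a fixed seed,
    keep the first n_keep instances, then normalize them."""
    rows = one_hot(features, nominal_col)
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # seeded, so the shuffle is repeatable
    keep = order[:n_keep]
    return min_max([rows[i] for i in keep]), [labels[i] for i in keep]

# Toy records (duration, protocol, bytes); the values are made up for the example.
features = [[0, "tcp", 200], [2, "udp", 50], [5, "icmp", 1000], [1, "tcp", 400]]
labels = ["normal", "DoS", "Probe", "normal"]
X, y = prepare(features, labels, nominal_col=1, n_keep=3)
```

Fixing the seed plays the role of the seed setting in the Randomize filter: the same subset of instances is selected on every run, so the classifiers are compared on identical data.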

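The Naive Bayes decision rule described in the methodology (choose the class with the maximum posterior probability under class conditional independence) can be sketched as follows. This is a minimal categorical version under assumptions that are not in the paper: the function names, the toy samples, and the use of Laplace smoothing and log-probabilities are illustrative choices.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    distinct = defaultdict(set)          # attribute index -> distinct values seen
    for a, x in zip(samples, labels):
        for j, v in enumerate(a):
            value_counts[(x, j)][v] += 1
            distinct[j].add(v)
    return class_counts, value_counts, distinct

def predict(model, a):
    """Return the class X maximizing P(X) * prod_j P(a_j | X)."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for x, n_x in class_counts.items():
        score = math.log(n_x / total)  # log prior P(X)
        for j, v in enumerate(a):
            # Laplace smoothing (an assumption) avoids zero probabilities
            num = value_counts[(x, j)][v] + 1
            den = n_x + len(distinct[j])
            score += math.log(num / den)
        if score > best_score:
            best_class, best_score = x, score
    return best_class

# Toy training set T with attributes (protocol, flag); values are made up.
T = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
     ("tcp", "S0"), ("tcp", "S0"), ("icmp", "REJ")]
classes = ["normal", "normal", "normal", "DoS", "DoS", "Probe"]
model = train_naive_bayes(T, classes)
print(predict(model, ("tcp", "S0")))  # DoS
```

The denominator P(A) of Bayes' theorem is the same for every class, so it can be dropped when only the maximizing class is needed.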

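The SUPPORT VECTOR MACHINE steps listed in this paper (a regularized hinge-loss objective minimized by gradient descent, with one update for misclassified samples and another for correctly classified ones) can be sketched for the linear, two-class case. The function names, hyperparameter values and toy data are assumptions for illustration; the paper's experiments were run in WEKA, not with this code.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Minimize lam*||w||^2 + sum of hinge losses with per-sample gradient steps.

    Labels must be +1 or -1. A constant 1.0 feature is appended to each
    sample in the toy data so that a bias term is learned as part of w.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            if margin < 1:
                # misclassified or inside the margin:
                # w = w + eta * (y_i * x_i - 2 * lam * w)
                w = [wj + eta * (y[i] * xj - 2 * lam * wj)
                     for wj, xj in zip(w, X[i])]
            else:
                # correctly classified: only the regularizer acts,
                # w = w + eta * (-2 * lam * w)
                w = [wj + eta * (-2 * lam * wj) for wj in w]
    return w

def predict(w, x):
    """Classify by the side of the hyperplane the sample falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Linearly separable toy data; the last feature is the constant bias input.
X = [[2, 2, 1], [3, 3, 1], [2.5, 3, 1], [-2, -2, 1], [-3, -1, 1], [-2, -3, 1]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

A multi-class problem such as Normal, DoS, Probe and R2L would need an extension such as one-vs-rest training, and the non-linear kernels mentioned in the paper are outside the scope of this sketch.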
Fig. 1. Block Diagram

NAIVE BAYES ALGORITHM

INPUT: Training Set T, Predictor Variable P
OUTPUT: A group of dataset for testing.
STEPS:
1. Read the training set T.
2. Calculate the conditional probability P for every class, where H is the dependent class and X is the class variable:
$$P(X \mid H) = \frac{P(H \mid X)\, P(X)}{P(H)}$$
3. Find the class with maximum probability.
4. Generate the confusion matrix.
5. Find the accuracy and misclassification rate.

SUPPORT VECTOR MACHINE

INPUT: Preprocessed Data
OUTPUT: Output Classes
STEPS:
1. Calculate the objective function T.
2. Objective function:
$$\min_{w} \; \lambda \lVert w \rVert^{2} + \sum_{i} \bigl(1 - y_i \langle x_i, w \rangle\bigr)_{+}$$
where $x_i$ is the input sample, $y_i$ is the output label, $w$ is the weight vector, and $\lambda$ is the regularization parameter.
3. Apply gradient descent learning with respect to the weight.
4. Update rule for the weight for a misclassified output: $w = w + \eta\,(y_i x_i - 2\lambda w)$
5. Update rule for the weight for a correctly classified output: $w = w + \eta\,(-2\lambda w)$, where $\eta$ is the learning rate.
6. Return T.
7. End function.

VI. PERFORMANCE ANALYSIS

The performance of the SVM and Naïve Bayes algorithms has been evaluated on 19,000 instances by analyzing the accuracy rate and the misclassification rate. These performance metrics are computed from the confusion matrix.

Methodology                  Accuracy Rate (%)   Misclassification Rate (%)
SVM                          97.29               2.705
Naïve Bayes                  67.26               32.73
SVM-CfsSubsetEval            93.95               6.04
Naïve Bayes-CfsSubsetEval    56.54               43.45
SVM-Normalization            93.95               2.705
Naïve Bayes-Normalization    71.001              28.998

TABLE I. Accuracy and misclassification rate of algorithms

A. Evaluation

The model is evaluated on the NSL-KDD dataset after applying pre-processing and randomization. The dataset consists of 19,000 samples. Accuracy rate and misclassification rate are taken as evaluation metrics.

Fig. 2. Accuracy and misclassification rate of SVM and Naive Bayes for 19,000 instances

The above graph compares the classification accuracy and misclassification rate on the original dataset after preprocessing. From the graph it can be inferred that SVM attains an accuracy of 97.29 percent and Naive Bayes attains an accuracy of 67.26 percent for 19,000 instances. Naive Bayes has a higher misclassification rate than SVM for 19,000 instances.
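The two evaluation metrics are both read off the confusion matrix: accuracy is the share of instances on the diagonal, and the misclassification rate is its complement. A minimal sketch follows; the function names and the eight toy predictions are hypothetical and are not results from the paper.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy_and_misclassification(matrix):
    """Return (accuracy, misclassification rate), both in percent."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    accuracy = 100.0 * correct / total
    return accuracy, 100.0 - accuracy

classes = ["normal", "DoS", "Probe", "R2L"]
actual = ["normal", "normal", "DoS", "DoS", "Probe", "R2L", "normal", "DoS"]
predicted = ["normal", "DoS", "DoS", "DoS", "Probe", "normal", "normal", "DoS"]
matrix = confusion_matrix(actual, predicted, classes)
acc, mis = accuracy_and_misclassification(matrix)
print(acc, mis)  # 75.0 25.0
```

Because the two rates are complementary, each pair of values computed this way sums to 100 percent.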

Fig. 3. Accuracy and misclassification rate of SVM and Naive Bayes for 19,000 instances after Normalization

The above graph compares the classification accuracy and misclassification rate on the dataset after normalization. From the graph it can be inferred that SVM attains an accuracy of 93.85 percent and Naive Bayes attains an accuracy of 71.001 percent for 19,000 instances. Naive Bayes has a higher misclassification rate than SVM for 19,000 instances. The accuracy rate has decreased for Naive Bayes for 19,000 instances.

Fig. 4. Accuracy and misclassification rate after feature reduction

The above graph compares the classification accuracy and misclassification rate on the dataset after feature reduction. From the graph it can be inferred that SVM attains an accuracy of 93.95 percent and Naive Bayes attains an accuracy of 56.54 percent for 19,000 instances. Naive Bayes has a higher misclassification rate than SVM for 19,000 instances.

VII. CONCLUSION

Intrusion detection and intrusion prevention are needed in current trends. As our regular activities depend mainly on networks and information systems, intrusion detection and intrusion prevention are vital. Many approaches have been applied in intrusion detection systems, and among them machine learning plays a vital role. This analysis deals with machine learning algorithms such as SVM and Naïve Bayes. It shows that, on 19,000 instances, SVM outperforms Naïve Bayes.

VIII. FUTURE WORK

Future work will deal with large volumes of data: a hybrid multi-level model will be constructed to improve accuracy. It will involve building a more effective model based on well-organised classifiers that are capable of categorising new attacks with better performance.

REFERENCES

[1] H. Wang, J. Gu, and S. Wang, "An effective intrusion detection framework based on SVM with feature augmentation," Knowl.-Based Syst., vol. 136, pp. 130-139, Nov. 2017.

[2] Setareh Roshan, Yoan Miche, Anton Akusok, Amaury Lendasse; "Adaptive and Online Network Intrusion Detection System using Clustering and Extreme Learning Machines", ELSEVIER, Journal of the Franklin Institute, Volume 355, Issue 4, March 2018, pp. 1752-1779.

[3] Wathiq Laftah Al-Yaseen, Zulaiha Ali Othman, Mohd Zakree Ahmad Nazri; "Multi-Level Hybrid Support Vector Machine and Extreme Learning Machine Based on Modified K-means for Intrusion Detection System", ELSEVIER, Expert System with Applications, Volume 66, Jan 2017, pp. 296-303.

[4] Iftikhar Ahmad, Mohammad Basheri, Muhammad Javed Iqbal, Aneel Raheem; "Performance Comparison of Support Vector Machine, Random Forest, and Extreme Learning Machine for Intrusion Detection", IEEE ACCESS, Survivability Strategies for Emerging Wireless Networks, Volume 6, May 2018, pp. 33789-33795.

[5] Buse Gul Atli, Yoan Miche, Aapo Kalliola, Ian Oliver, Silke Holtmanns, Amaury Lendasse; "Anomaly-Based Intrusion Detection Using Extreme Learning Machine and Aggregation of Network Traffic Statistics in Probability Space", SPRINGER, Cognitive Computation, June 2018, pp. 1-16.

[6] Pinjia He, Jieming Zhu, Shilin He, Jian Li, and Michael R. Lyu; "A Feature Reduced Intrusion Detection System Using ANN Classifier", ELSEVIER, Expert Systems with Applications, Vol. 88, December 2017, pp. 249-247.

[7] Vajiheh Hajisalem, Shahram Babaie; "A hybrid intrusion detection system based on ABC-AFS algorithm for misuse and anomaly detection", ELSEVIER, Department of Computer Engineering, Vol. 136, pp. 37-50, May 2018.

[8] Karen A. Garcia, Raul Monroy, Luis A. Trejo, Carlos Mex-Perera and Eduardo Aguirre, "Analyzing Log Files for Postmortem Intrusion Detection", IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42.6 (2012), pp. 1690-1704.

[9] R. M. Elbasiony, E. A. Sallam, T. E. Eltobely, and M. M. Fahmy, "A hybrid network intrusion detection framework based on random forests and weighted k-means," Ain Shams Eng. J., vol. 4, no. 4, pp. 753-762, 2013.

[10] Hudan Studiawan, Christian Payne, Ferdous Sohel; "Graph Clustering and Anomaly Detection of Access Control log for Forensic Purposes", ELSEVIER, Digital Investigation, Vol. 21, pp. 76-87, June 2017.

[11] Mazini, Mehrnaz, Babak Shirazi and Iraj Mahdavi; "Anomaly network-based intrusion detection system using a reliable hybrid artificial bee colony and AdaBoost algorithms", Journal of King Saud University - Computer and Information Sciences, 2018.

[12] Huang, G.-B., Zhou, H., Ding, X., & Zhang, R. "Extreme learning machine for regression and multiclass classification", IEEE Transactions on Systems, Man, and Cybernetics, 42(2), 513-529, 2012.
