Intrusion Detection System Using Unsupervised ML Algorithms: School of Information Technology and Engineering
A PROJECT ON
INTRUSION DETECTION SYSTEM USING
UNSUPERVISED ML ALGORITHMS
Submitted by:
Aditya Kumar (18BIT0235)
Ritvik Gupta (18BIT0218)
ABSTRACT
With the advent of vast amounts of information and technology, businesses of
all kinds around the world are becoming increasingly data-driven. Companies
collect and handle data of high velocity, variety and volume. This also opens
up various loopholes in the systems developed to work with such large amounts
of data.
METHODOLOGY
CATEGORIZING DATA
Here, we import the data into our project and categorize each record, on the
basis of its attack type, into one of 5 general categories, i.e. Normal, DoS,
R2L, U2R and Probe.
We also categorize each record into one of two classes, Normal or Attack. To
create these categories, we use Pipeline objects that take custom
Transformers and Estimators, defined in our code, as arguments.
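A minimal sketch of such a categorizing step, assuming scikit-learn's Pipeline and a pandas data frame with an attack-name column called "attack". The column name, the class AttackCategorizer and the dictionary ATTACK_CATEGORY are hypothetical, and the dictionary is only a subset of the usual KDD'99 / NSL-KDD attack-name mapping (the dataset is not named in this report).

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

# Hypothetical subset of the KDD'99 / NSL-KDD attack-name -> category mapping.
ATTACK_CATEGORY = {
    "normal": "Normal",
    "back": "DoS", "neptune": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "perl": "U2R", "loadmodule": "U2R",
}


class AttackCategorizer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer adding 5-class and binary label columns."""

    def __init__(self, label_col="attack"):
        self.label_col = label_col

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        X = X.copy()
        # 5 general categories; attack names missing from the dict become NaN.
        X["attack_category"] = X[self.label_col].map(ATTACK_CATEGORY)
        # Binary label: keep "Normal", everything else (including NaN) -> "Attack".
        X["binary_label"] = X["attack_category"].where(
            X["attack_category"] == "Normal", "Attack"
        )
        return X


pipeline = Pipeline([("categorize", AttackCategorizer())])

df = pd.DataFrame({"attack": ["normal", "neptune", "satan", "rootkit", "imap"]})
out = pipeline.fit_transform(df)
print(out)
```

Keeping the mapping inside a transformer lets the same step be applied unchanged to both the training and the test data frames.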
TRAIN AND TEST DATA OUTPUT
Train and Test data frames after applying the transformation Pipelines.
ONE-HOT ENCODING
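Many machine learning algorithms cannot operate on categorical (label) data
directly; they require all input and output variables to be numeric. One-hot
encoding converts each categorical feature into a set of binary columns, one
per category.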
FEATURE SCALING
A machine learning algorithm sees only numbers. If the features differ vastly
in range, say some ranging in the thousands and others in the tens, the
algorithm implicitly treats the larger-valued features as more important. For
example, in a Euclidean distance the feature with the largest range dominates.
These features then play a more decisive role while training the model, and
the model becomes biased towards them. To fix this, we scale the data.
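A minimal sketch of the effect, assuming standardization with scikit-learn's StandardScaler (the report does not say which scaler is used; MinMaxScaler is a common alternative). The two-column data is made up: one column in the thousands, one in the tens.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training data: column 0 in the thousands, column 1 in the tens.
X_train = np.array([[1000.0, 10.0],
                    [3000.0, 20.0],
                    [5000.0, 30.0]])
X_test = np.array([[4000.0, 25.0]])

# Unscaled, the distance between two rows is almost entirely column 0.
raw_dist = np.linalg.norm(X_train[0] - X_train[1])  # sqrt(2000^2 + 10^2) ≈ 2000.02

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# After scaling, both columns have mean 0 and std 1, so they contribute equally.
scaled_dist = np.linalg.norm(X_train_scaled[0] - X_train_scaled[1])
print(raw_dist, scaled_dist)
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Fitting the scaler on the training data and only transforming the test data keeps test statistics from leaking into training.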