Paper 2-Application of Machine Learning Approaches in Intrusion Detection System

This document summarizes a survey of 49 research papers on the application of machine learning approaches for intrusion detection systems between 2009 and 2014. The papers focused on single, hybrid, and ensemble classifier designs and architectures. A statistical comparison of commonly used classifier algorithms, datasets, and consideration of feature selection is provided. Machine learning techniques for intrusion detection can be categorized as anomaly detection or misuse detection. Anomaly detection identifies unknown attacks while misuse detection relies on known signatures or patterns.


(IJARAI) International Journal of Advanced Research in Artificial Intelligence,

Vol. 4, No.3, 2015

Application of Machine Learning Approaches in Intrusion Detection System: A Survey
Nutan Farah Haq, Musharrat Rafni, Abdur Rahman Onik, Faisal Muhammad Shah, Md. Avishek Khan Hridoy
Department of Computer Science and Engineering
Ahsanullah University of Science and Technology
Dhaka, Bangladesh

Dewan Md. Farid
Department of Computer Science and Engineering
United International University
Dhaka, Bangladesh

Abstract—Network security is one of the major concerns of the modern era. With the rapid development and massive usage of the Internet over the past decade, the vulnerabilities of network security have become an important issue. An intrusion detection system is used to identify unauthorized access and unusual attacks over secured networks. Over the past years, many studies have been conducted on intrusion detection systems. However, in order to understand the current status of the implementation of machine learning techniques for solving intrusion detection problems, this survey paper lists 49 related studies published between 2009 and 2014, focusing on the architecture of single, hybrid and ensemble classifier designs. This survey paper also includes a statistical comparison of classifier algorithms, the datasets being used and some other experimental setups, as well as consideration of the feature selection step.

Keywords—Intrusion detection; Survey; Classifiers; Hybrid; Ensemble; Dataset; Feature Selection

I. INTRODUCTION
The Internet has become the most essential tool and one of the best sources of information about the current world. The Internet can be considered one of the major components of education and business. Therefore, data across the Internet must be secure. Internet security is one of the major concerns nowadays. As the Internet is threatened by various attacks, it is essential to design a system to protect those data, as well as the users using those data. The intrusion detection system (IDS) is therefore an invention to fulfill that requirement. Network administrators adopt intrusion detection systems in order to prevent malicious attacks. Therefore, the intrusion detection system has become an essential part of security management. An intrusion detection system detects and reports any intrusion attempts or misuse on the network. An IDS can detect and block malicious attacks on the network, keep performance normal during any malicious outbreak, and perform an experienced security analysis.

Intrusion detection approaches can be classified into two categories: anomaly detection and signature-based detection, also known as misuse detection [4, 41]. Misuse detection identifies attacks in the form of a signature or pattern. Because misuse detection uses known patterns to detect attacks, its main disadvantage is that it will fail to identify any unknown attacks on the network or system. Anomaly detection, on the other hand, is used to detect unknown attacks. There are different ways to find anomalies, and different machine learning techniques have been introduced in order to identify them.

Over the years, many researchers and scholars have done significant work on the development of intrusion detection systems. This paper reviews the related studies in intrusion detection over the past six years, listing 49 papers in total from 2009 to 2014. It lists the proposed architecture of the classification techniques and the algorithms being used. A statistical comparison has been added to show the classifier design, chosen algorithms and datasets used, as well as the consideration of the feature selection step.

This paper is organized as follows: Section 2 provides the research topic overview, where a number of techniques for intrusion detection are described. Section 3 presents a statistical overview of articles over the years on the algorithms that were frequently used, the datasets for each experiment and the consideration of the feature selection step. Section 4 includes the discussion and conclusion as well as some issues highlighted for future research in intrusion detection systems using machine learning approaches.


II. RESEARCH PAPER OVERVIEW

A. Machine Learning Approach
Machine learning is a special branch of artificial intelligence that acquires knowledge from training data based on known facts. Arthur Samuel described machine learning in 1959 as a field of study that allows computers to learn without being explicitly programmed. Machine learning mainly focuses on prediction. Machine learning techniques are classified into three broad categories: supervised learning, unsupervised learning, and reinforcement learning.

1) Supervised Learning
Supervised learning is also known as classification. In supervised learning, data instances are labeled in the training phase. There are several supervised learning algorithms. Artificial Neural Network, Bayesian Statistics, Gaussian Process Regression, Lazy learning, Nearest Neighbor based algorithm, Support Vector Machine, Hidden Markov Model, Bayesian Networks, Decision Trees (C4.5, ID3, CART, Random Forest), K-nearest neighbor, Boosting, Ensemble classifiers (Bagging, Boosting), Linear Classifiers (Logistic regression, Fisher Linear discriminant, Naive Bayes classifier, Perceptron, SVM) and Quadratic classifiers are some of the most popular supervised learning algorithms.

2) Unsupervised Learning
In unsupervised learning, data instances are unlabeled. A prominent technique for this kind of learning is clustering. Some of the common unsupervised learners are Cluster analysis (K-means clustering, Fuzzy clustering), Hierarchical clustering, Self-organizing map, Apriori algorithm, Eclat algorithm and Outlier detection (Local outlier factor).

3) Reinforcement Learning
Reinforcement learning means a computer interacting with an environment to achieve a certain goal. A reinforcement approach can ask a user (e.g., a domain expert) to label an instance, which may be from a set of unlabeled instances.

B. Single Classifiers
One machine learning algorithm or technique for developing an intrusion detection system can be used as a standalone classifier or single classifier. This study discusses the machine learning techniques that were found to be frequently used as single classifiers in the 49 research papers studied.

1) Decision Tree
The task of a Decision tree (DT) is to create a classifier that predicts the value of a target class for an unseen test instance, based on several already known instances. A Decision tree classifies an unseen test instance through a sequence of decisions [11]. The Decision tree is very popular as a single classifier because of its simplicity and easier implementation [14]. Decision trees can be divided into 2 types: (i) Classification tree, with a range of symbolic class labels and (ii) Regression tree, with a range of numerically valued class labels [11].

2) Naive Bayes
Given the class label, Naive Bayes assumes that the attributes are conditionally independent and thus tries to estimate the class-conditional probability [15]. Naive Bayes often produces good results in classification problems where the relations are simpler. Naive Bayes requires only one scan of the training data, which eases the task of classification considerably.

3) K-nearest neighbor
Various distance measures are used in K-nearest neighbor. K-nearest neighbor finds the k samples in the training data that are nearest to the test sample and then assigns the most frequent class label among those training samples to the test sample. For classifying samples, K-nearest neighbor is known as the simplest nonparametric approach [8]. K-nearest neighbor is an instance-based learner rather than an inductive one [35].

4) Artificial Neural Network
An Artificial Neural Network (ANN) is an information processing unit inspired by the functionality of the human brain [23]. Typically, neural networks are organized in layers made up of a number of interconnected nodes, each of which contains an activation function. Patterns are presented to the network via the input layer, which communicates with one or more hidden layers where the actual processing is done via a system of weighted connections. The hidden layers then link to an output layer that produces the detection result as output.

5) Support Vector Machines
The support vector machine (SVM) was introduced in the mid-1990s [5]. The concept behind SVM for intrusion detection is basically to use the training data as a description of only the normal class of objects, known as non-attack in an intrusion detection system, and thus to assume the rest are anomalies [51]. The classifier constructed by the support vector machine methodology partitions the input space into a finite region where the normal objects are contained, and all the rest of the space is assumed to contain the anomalies [9].

6) Fuzzy Logic
For reasoning purposes, the truth values of dual logic can be either absolutely false (0) or absolutely true (1), but in Fuzzy logic these restrictions are relaxed [60]. That means that in Fuzzy logic the degree of truth of a statement can take any value between 0 and 1, as well as '0' and '1' themselves [11].

C. Hybrid Classifiers
A hybrid classifier combines more than one machine learning algorithm or technique in order to improve the intrusion detection system's performance substantially. For example, a clustering-based technique can be used to preprocess the training data and eliminate non-representative training samples, and the results of the clustering are then used as training samples for pattern recognition in order to design a classifier. Thus, either supervised or unsupervised learning approaches can be the first level of a hybrid classifier [11].
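The two-stage idea described for hybrid classifiers can be illustrated with a minimal sketch. It is not taken from any of the surveyed papers: it assumes k-means as the clustering stage, keeps only the cluster centroids as representative training samples, and uses a nearest-prototype rule as the second-stage classifier. The function names (`kmeans`, `fit_hybrid`, `predict`) and the toy records are hypothetical.

```python
from math import dist
from statistics import mean


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(mean(axis) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def fit_hybrid(samples, labels, k=2):
    """Stage 1: cluster each class and keep only the centroids as prototypes."""
    prototypes = []
    for cls in sorted(set(labels)):
        members = [s for s, y in zip(samples, labels) if y == cls]
        for centroid in kmeans(members, min(k, len(members))):
            prototypes.append((centroid, cls))
    return prototypes


def predict(prototypes, x):
    """Stage 2: nearest-prototype classification over the reduced training set."""
    return min(prototypes, key=lambda proto: dist(x, proto[0]))[1]


# Toy two-feature records; the values and labels are invented for illustration.
samples = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (0.15, 0.15),
           (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.2)]
labels = ["normal"] * 4 + ["attack"] * 4

prototypes = fit_hybrid(samples, labels, k=2)
print(predict(prototypes, (0.05, 0.1)))   # a record close to the "normal" cluster
print(predict(prototypes, (4.9, 5.05)))   # a record close to the "attack" cluster
```

Replacing the second stage with another supervised learner (for instance an SVM or a neural network, as several of the surveyed hybrid designs do) would follow the same pattern: the clustering output becomes the training set of the classifier.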


D. Ensemble Classifiers
Classifiers that perform slightly better than a random classifier are known as weak learners. When multiple weak learners are combined in order to improve classification performance significantly, the result is known as an ensemble classifier [11]. Majority vote, bagging and boosting are some common strategies for combining weak learners [15]. Although the disadvantages of the component classifiers are known to accumulate in the ensemble classifier, some combinations have produced very efficient performance, so researchers are becoming increasingly interested in ensemble classifiers.

III. STATISTICAL COMPARISONS OF RELATED WORK
A. Distribution of Papers by Year of Publication
The survey comprises 49 research papers published between 2009 and 2014. It discusses 8 papers from each of the years 2009, 2010 and 2012. The highest number of papers, 11, comes from 2011. 10 papers are listed for 2013 and 4 papers for 2014. Fig. 1 depicts the percentage distribution of papers by year of publication.

Fig. 1. Year-wise distribution of papers

B. Classifier design
Intrusion detection methods can be categorized into 3 categories, namely single, hybrid and ensemble [11]. Fig. 2 depicts the number of research papers in terms of single, hybrid and ensemble classifiers used in each year.
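As a quick check on the year-wise counts reported in Section III.A, the following short sketch tallies them and derives the percentage shares that a chart such as Fig. 1 would display. The counts come from the text; the percentages are computed here and rounded to one decimal, so they may differ slightly from the rounding used in the original figure.

```python
# Number of surveyed papers per publication year, as stated in Section III.A.
papers_per_year = {2009: 8, 2010: 8, 2011: 11, 2012: 8, 2013: 10, 2014: 4}

total = sum(papers_per_year.values())
shares = {year: round(100 * n / total, 1) for year, n in papers_per_year.items()}

for year, n in papers_per_year.items():
    print(f"{year}: {n:2d} papers ({shares[year]:.1f}%)")
print("total:", total)
```

The total confirms that the six yearly counts add up to the 49 papers the survey reports.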

TABLE I. TOTAL NUMBERS OF RESEARCH PAPERS FOR THE TYPES OF CLASSIFIER DESIGN

Classifier design type | No. of research papers | References
Single | 20 | (D. Sánchez, 2009)[12], (Su-Yun Wua, 2009)[50], (Jun Ma, 2009)[27], (Mao Ye, 2009)[31], (Feng Jiang, 2009)[16], (Yung-Tsung Hou, 2010)[58], (Min Seok Mok, 2010)[34], (Han-Ching Wu, 2010)[22], (Chengpo Mua, 2010)[10], (Wang Dawei, 2011)[53], (G. Davanzo, 2011)[17], (Levent Koc, 2012)[29], (Carlos A. Catania, 2012)[9], (Inho Kang, 2012)[26], (Prabhjeet Kaur, 2012)[38], (Yusuf Sahin, 2013)[59], (S. Devaraju, 2013)[42], (Guillermo L. Grinblat, 2013)[21], (Mario Poggiolini, 2013)[32], (Adel Sabry Eesa, 2014)[2].
Hybrid | 22 | (Kamran Shafi, 2009)[28], (M. Bahrololum, 2009)[30], (Gang Wang, 2010)[18], (Woochul Shim, 2010)[55], (Muna Mhammad T. Jawhar, 2010)[37], (Ilhan Aydin, 2010)[25], (Seung Kim, 2011)[45], (I.T. Christou, 2011)[24], (Mohammad Saniee Abadeh, 2011)[36], (Shun-Sheng Wang, 2011)[47], (Su, 2011)[49], (Seungmin Lee, 2011)[46], (Yinhui Li, 2012)[57], (Bose, 2012)[6], (Prof. D.P. Gaikwad, 2012)[39], (A.M.Chandrashekhar, 2013)[1], (Mazyar Mohammadi Lisehroodi, 2013)[33], (Dahlia Asyiqin Ahmad Zainaddin, 2013)[13], (Seongjun Shin, 2013)[44], (Gisung Kim, A novel hybrid intrusion detection method integrating anomaly detection with misuse detection, 2013)[19], (Wenying Feng, 2014)[54], (Ravi Ranjan, 2014)[40].
Ensemble | 7 | (Tich Phuoc Tran, 2009)[52], (C.A. Laurentys, 2011)[7], (Dewan Md. Farid M. Z., 2011)[15], (Yang Yi, 2011)[56], (Siva S. Sivatha Sindhu, 2012)[48], (Dewan Md. Farid L. Z., 2013)[14], (Akhilesh Kumar Shrivas, 2014)[3]

Fig. 2. Year wise distribution of research papers for the types of classifier design (bar chart: number of research papers per year, 2009 to 2014, for Single, Hybrid and Ensemble classifiers)


According to the statistical comparison between the listed papers, hybrid classifiers account for the highest number of studies in the time frame mentioned earlier, with a total of 22. Single classifiers come next, having been studied in 20 papers.

C. Single classifiers
Fig. 3 depicts the number of single learning algorithms used as classifiers. The number of research papers in the single classifier architecture using different classification techniques (e.g. Bayesian, SVM, DT, ANN, KNN, Fuzzy Logic) listed in this survey paper is twenty. Table II lists the proposed algorithms used in the articles reviewed in this paper. Table IV shows the year-wise distribution of single classifiers with the results and citation count of each article.

Support vector machine and Artificial neural network are the most popular approaches for single learning algorithm classifiers. Although only 49 related papers were considered and the number of comparative samples is small, the comparison implies that Support Vector machine is by far the most common single classification technique. In contrast, Fuzzy logic appears to be used less often among the single classifiers in the listed literature.

D. Ensemble classifiers
Multiple weak learners are combined in Ensemble classifiers. Table III lists the articles using ensemble classifiers in intrusion detection systems. The statistics show that AdaBoost is the most commonly used learning algorithm, along with majority voting. Table III also lists the detection rate of each classifier and the citation count of each article listed throughout the time period.

E. Hybrid classifiers
Table V depicts the year-wise distribution of Hybrid classifiers with the results and citation count of each article. Hybrid classifiers have become established in mainstream intrusion detection research in recent times due to their accuracy. The statistics show that hybrid classifiers have the highest number of articles in 2011. The table also shows the algorithms used in each article and their performance in intrusion detection.
Fig. 3. Distribution of Single classifiers over the Years (chart: per-year counts, 2009 to 2014, for SVM, Decision Tree, Naïve Bayes, ANN, Fuzzy Logic, Detector Generation Algorithm, Logistic Regression, RMDID algorithm, Cuttlefish algorithm, Negative selection algorithm, PODM algorithm and kNN)

TABLE II. ALGORITHMS USED IN SINGLE TYPE OF CLASSIFIER DESIGNED BASED RESEARCH PAPERS

Algorithm | Research paper title | Reference
Naive Bayes | A network intrusion detection system based on a hidden naive bayes multiclass classifier. | (Levent Koc, 2012)[29]
Naive Bayes | Malicious web content detection by machine learning. | (Yung-Tsung Hou, 2010)[58]
Support Vector Machine | An autonomous labeling approach to support vector machines algorithms for network traffic anomaly detection. | (Carlos A. Catania, 2012)[9]
Support Vector Machine | A differentiated one-class classification method with applications to intrusion detection. | (Inho Kang, 2012)[26]
Support Vector Machine | Abrupt change detection with One-Class Time Adaptive Support Vector Machines. | (Guillermo L. Grinblat, 2013)[21]
Support Vector Machine | Malicious web content detection by machine learning. | (Yung-Tsung Hou, 2010)[58]
Support Vector Machine | Anomaly detection techniques for a web defacement monitoring service. | (G. Davanzo, 2011)[17]
Decision Tree | Madam id for intrusion detection using data mining. | (Prabhjeet Kaur, 2012)[38]
Decision Tree | A cost-sensitive decision tree approach for fraud detection. | (Yusuf Sahin, 2013)[59]
Decision Tree | Data mining-based intrusion detectors. | (Su-Yun Wua, 2009)[50]
Decision Tree | Malicious web content detection by machine learning. | (Yung-Tsung Hou, 2010)[58]
Artificial Neural Network | Detection of accuracy for intrusion detection system using neural network classifier. | (S. Devaraju, 2013)[42]
Artificial Neural Network | Neural networks-based detection of stepping-stone intrusion. | (Han-Ching Wu, 2010)[22]
Fuzzy Logic | Data mining-based intrusion detectors. | (Su-Yun Wua, 2009)[50]
Detector Generation Algorithm | Evolving boundary detector for anomaly detection | (Wang Dawei, 2011)[53]
Negative Selection algorithm | Application of the feature-detection rule to the Negative Selection Algorithm | (Mario Poggiolini, 2013)[32]
Logistic regression | Random effects logistic regression model for anomaly detection | (Min Seok Mok, 2010)[34]
PODM | Projected outlier detection in high-dimensional mixed-attributes data set. | (Mao Ye, 2009)[31]
RMDID | Information inconsistencies detection using a rule-map technique | (Jun Ma, 2009)[27]
Cuttlefish algorithm | A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems. | (Adel Sabry Eesa, 2014)[2]
Sequence-based Outlier Detection algorithm | Some issues about outlier detection in rough set theory. | (Feng Jiang, 2009)[16]
K-nearest neighbour (KNN) | Anomaly detection techniques for a web defacement monitoring service. | (G. Davanzo, 2011)[17]

TABLE III. YEAR WISE DISTRIBUTION OF ENSEMBLE CLASSIFIERS REGARDING RESULTS AND CITATION OF EACH ARTICLE

Year | Research Paper Title | Reference | Algorithm used | Result (%) | Citation
2009 | Novel intrusion detection using probabilistic neural network & adaptive boosting | (Tich Phuoc Tran, 2009)[52] | NN; AdaBoost; BSPNN | DR: 94.31 | 14
2011 | A novel artificial immune system for fault behavior detection | (C.A. Laurentys, 2011)[7] | GA; Majority Vote | DR: 97.85 | 17
2011 | Adaptive intrusion detection based on boosting & naive Bayesian classifier | (Dewan Md. Farid M. Z., 2011)[15] | NB; AdaBoost | DR: 99.75 | 14
2011 | Incremental SVM based on reversed set for network intrusion detection | (Yang Yi, 2011)[56] | SVM; ISVM | DR: 81.377 | 30
2012 | Decision tree based light weight intrusion detection using a wrapper approach | (Siva S. Sivatha Sindhu, 2012)[48] | Neural ensemble decision tree | DR: 98.38 | 44
2013 | An adaptive ensemble classifier for mining concept drifting data streams | (Dewan Md. Farid L. Z., 2013)[14] | NB; C4.5; AdaBoost | DR: 92.65 | 13
2014 | An ensemble model for classification of attacks with feature selection based on KDD-99 & NSL-KDD data set | (Akhilesh Kumar Shrivas, 2014)[3] | ANN; Bayesian Network; Gain ratio FS | DR: 97.53 (using NSL-KDD); DR: 99.41 (using KDD-99) | a

a. Not cited yet.
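Majority voting, which the survey names as one of the common strategies for combining weak learners, can be sketched in a few lines. The three rule-based "weak learners" below, their feature names (`failed_logins`, `bytes_sent`, `duration`) and their thresholds are hypothetical and chosen only for illustration; none of them is taken from the surveyed papers.

```python
from collections import Counter


def majority_vote(classifiers, record):
    """Return the label predicted by most of the component classifiers."""
    votes = [clf(record) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Three deliberately simple rules, each looking at a single (hypothetical) feature.
def by_failed_logins(r):
    return "attack" if r["failed_logins"] > 3 else "normal"


def by_bytes_sent(r):
    return "attack" if r["bytes_sent"] > 1_000_000 else "normal"


def by_duration(r):
    return "attack" if r["duration"] < 0.01 else "normal"


ensemble = [by_failed_logins, by_bytes_sent, by_duration]

suspicious = {"failed_logins": 7, "bytes_sent": 2_500_000, "duration": 3.0}
ordinary = {"failed_logins": 0, "bytes_sent": 40_000, "duration": 0.001}

print(majority_vote(ensemble, suspicious))  # two of three rules flag this record
print(majority_vote(ensemble, ordinary))    # only one rule flags this record
```

An odd number of voters with two labels avoids ties. Boosting methods such as AdaBoost differ in that they weight each learner's vote instead of counting the votes equally.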

TABLE IV. YEAR WISE DISTRIBUTION OF SINGLE CLASSIFIERS REGARDING RESULTS AND CITATION OF EACH ARTICLE

Year | Research Paper Title | Reference | Algorithm used | Result (%) | Citation
2009 | Association rules applied to credit card fraud detection | (D. Sánchez, 2009)[12] | Association rule methodology | Certainty factor: 80.08 | 64
2009 | Data Mining based intrusion detectors | (Su-Yun Wua, 2009)[50] | C4.5 | DR: 70.62; FAR: 1.44 | 67
2009 | Some issues about outlier detection in rough set theory | (Feng Jiang, 2009)[16] | Outlier Detection algorithm | DR: SEQ based: 90; DIS based: 92 | 30
2009 | Projected outlier detection in high dimensional mixed attributes data set | (Mao Ye, 2009)[31] | PODM algorithm | DR: Credit approval data: 70; Breast Cancer Data: 80; Mushroom Data: 96; Synthetic Data: 97 | 24
2009 | Information inconsistencies detection using a rule map technique | (Jun Ma, 2009)[27] | RMDID algorithm | Error scales = 5.0%; Inconsistent entries in Train Set = 5, Test Set = 4 | 1
2010 | Malicious web content detection by machine learning | (Yung-Tsung Hou, 2010)[58] | Naïve Bayes; DT; SVM; AdaBoost | Accuracy: NB: 58.28; DT: 94.74; SVM: 93.51; Boosted DT: 96.14 | 39
2010 | Random effect logistic regression model for anomaly detection | (Min Seok Mok, 2010)[34] | Logistic regression model | Classification accuracy: Training dataset: 79.43 (Normal), 20.57 (Attack); Validation dataset: 79.17 (Normal), 20.83 (Attack) | 8
2010 | An intrusion response decision making model based on hierarchical task network planning | (Chengpo Mua, 2010)[10] | Hierarchical task network planning | ROC curve: excellent | 20
2010 | Neural Networks based detection of stepping stone intrusion | (Han-Ching Wu, 2010)[22] | Neural Network | Accuracy: 99.0 | 13
2011 | Evolving boundary detectors for anomaly detection | (Wang Dawei, 2011)[53] | Detector Generation algorithm | DR: Iris Dataset: 99.28 considering Self radius = 0.08, Boundary threshold = 0.04; KDD dataset: DOS: 94.5, Probing: 93.64, U2R: 78.85, R2L: 50.69 considering Self radius = 0.05, Boundary threshold = 0.025 | 6
2011 | Anomaly detection techniques for a web defacement monitoring service | (G. Davanzo, 2011)[17] | K nearest neighbor; Support Vector machine | FPR: K nearest neighbor: 19.43; SVM: 6.45 | 3
2012 | A network intrusion detection system based on Hidden Naïve bayes multiclass classifier | (Levent Koc, 2012)[29] | Hidden Naïve Bayes | Accuracy: 93.73; Error rate: 6.28 | 45
2012 | An autonomous labeling approach to support vector machines algorithms for network traffic anomaly detection | (Carlos A. Catania, 2012)[9] | Support Vector machine | DR: 88.64 (80% attack), 98.37 (1% attack) | 11
2012 | A differentiated one-class classification method with applications to intrusion detection | (Inho Kang, 2012)[26] | Support Vector machine | DR: M=200*; Targeted attack: 96.9 (4.7% more than ordinary detection) | 17
2012 | Madam id for intrusion detection using Data mining | (Prabhjeet Kaur, 2012)[38] | Decision Tree (J48) | FP rate: 75.00; Precision: 1.7; Recall: 66.7 | 7
2013 | A cost sensitive Decision tree approach for fraud detection | (Yusuf Sahin, 2013)[59] | Decision Tree | TPR: Direct cost: 74.6; Class Probability: 92.1; CS-Gini: 92.8; Cs-IG: 92.6 | 9
2013 | Detection of accuracy for intrusion detection system using neural network classifier | (S. Devaraju, 2013)[42] | Neural Network | Accuracy: FFNN: 79.49; ENN: 78.1; GRNN: 58.74; PNN: 85.50; RBNN: 83.51 | 4
2013 | Abrupt change detection with one class time adaptive Support Vector Machine | (Guillermo L. Grinblat, 2013)[21] | Support Vector Machine | 495.9 sequences correctly classified within 500 sequences. | 3
2013 | Application of feature-detection rule to the negative selection algorithm | (Mario Poggiolini, 2013)[32] | Negative Selection algorithm | Feature Detection rule: 0.9375; HD rule: 0.7686; RCHK (No MHC rule): 0.8258; RCHK (Global MHC rule): 0.5155; RCHK (MHC) rule: 0.9482 | 3
2014 | A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection system | (Adel Sabry Eesa, 2014)[2] | Cuttlefish algorithm | AR: 73.267; DR: 71.067; FPR: 17.685 | b

b. Not cited yet.
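The result columns in these tables report measures such as DR (detection rate), FAR or FPR (false alarm or false positive rate), accuracy and precision without defining them. The sketch below assumes the usual confusion-matrix definitions; individual surveyed papers may compute their figures differently, and the counts used here are invented purely to show the arithmetic.

```python
def detection_metrics(tp, fp, tn, fn):
    """Common IDS evaluation measures from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: normal records flagged as attacks
    tn: normal records passed as normal fn: attacks passed as normal
    """
    return {
        "DR": tp / (tp + fn),                        # detection rate (recall, TPR)
        "FAR": fp / (fp + tn),                       # false alarm rate (FPR)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: 100 attack records and 100 normal records.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because the surveyed experiments use different datasets and attack mixes, figures reported under the same metric name are not directly comparable across rows.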

TABLE V. A DETAILED INFORMATION ON RESEARCH PAPERS DESIGNED WITH HYBRID CLASSIFIER

Year | Research Paper Title | Reference | Algorithm(s) used | Result (%) | Citation
2009 | Anomaly intrusion detection design using hybrid of unsupervised and supervised neural network | (M. Bahrololum, 2009)[30] | NN | TP rate: 97.00 (Dos), 71.65 (Probe), 26.69 (R2L) | 11
2009 | An adaptive genetic-based signature learning system for intrusion detection | (Kamran Shafi, 2009)[28] | GA | Accuracy: 92; FA rate: 0.84 | 31
2010 | A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering | (Gang Wang, 2010)[18] | ANN; Fuzzy clustering | Accuracy: 96.71; Precision: 99.91 (Dos), 48.12 (Probe), 93.18 (R2L), 83.33 (U2R) | 114
2010 | A distributed sinkhole detection method using cluster analysis | (Woochul Shim, 2010)[55] | Hierarchical cluster analysis | DR: 96.61 | 7
2010 | Design Network Intrusion Detection System using hybrid Fuzzy-Neural Network | (Muna Mhammad T. Jawhar, 2010)[37] | Fuzzy C-means clustering; NN | Accuracy: 100 (Dos), 100 (U2R), 99.8 (Probe), 40 (R2L), 68.6 (Unknown) | 21
2010 | Chaotic-based hybrid negative selection algorithm and its applications in fault and anomaly detection | (Ilhan Aydin, 2010)[25] | Negative selection; Clonal selection; KNN | Accuracy: 97.65 | 51
2011 | Detecting fraud in online games of chance and lotteries | (I.T. Christou, 2011)[24] | LOF; K-means clustering; EXAMCE | DR: 98 | 3
2011 | Fast outlier detection for very large log data | (Seung Kim, 2011)[45] | Kd-tree indexing; Approximated KNN; LOF | Gained time efficiency: 293-8727 | 11
2011 | Design and analysis of genetic fuzzy systems for intrusion detection in computer networks | (Mohammad Saniee Abadeh, 2011)[36] | Fuzzy genetic based machine learning methods: (i) Michigan, (ii) Pittsburgh, (iii) IRL | DR: 88.13 (Michigan), 99.53 (Pittsburgh), 93.2 (IRL) | 21
2011 | An Integrated Intrusion Detection System for Cluster-based Wireless Sensor Networks | (Shun-Sheng Wang, 2011)[47] | BPN; ART; Rule based method | Accuracy: 95.13 | 24
2011 | Real-time anomaly detection systems for Denial-of-Service attacks by weighted k-nearest-neighbor classifiers | (Su, 2011)[49] | GA; KNN | Accuracy: 97.42 (with known attack), 78 (with unknown attack) | 16
2011 | Self-adaptive and dynamic clustering for online anomaly detection | (Seungmin Lee, 2011)[46] | SOM; K-means clustering | DR: 83.4 (offline), 86.4 (online) | 14
2012 | An efficient intrusion detection system based on support vector machines and gradually feature removal method | (Yinhui Li, 2012)[57] | K-means clustering; SVM; Ant colony | DR: 98.6249 | 40
2012 | The combined approach for anomaly detection using neural networks & clustering techniques | (Bose, 2012)[6] | SOM; K-means clustering | DR: 98.5 (Dos) | 2
2012 | Anomaly Based Intrusion Detection System Using Artificial Neural Network and Fuzzy Clustering | (Prof. D.P. Gaikwad, 2012)[39] | ANN; Fuzzy clustering | * | 6
2013 | Fortification of hybrid intrusion detection system using variants of neural networks & support vector machines | (A.M.Chandrashekhar, 2013)[1] | Fuzzy C-means clustering; Fuzzy neural network; SVM | Accuracy: 98.94 (Dos), 97.11 (Probe), 97.80 (U2R), 97.78 (R2L) | 2
2013 | Hybrid of fuzzy clustering Neural network over NSL data set for intrusion detection system | (Dahlia Asyiqin Ahmad Zainaddin, 2013)[13] | Fuzzy clustering | Recall: 99.1 (Dos), 94.1 (Prob), 78 (U2R), 89 (R2L) | 4
2013 | A hybrid framework based on neural network MLP & K-means clustering for intrusion detection system | (Mazyar Mohammadi Lisehroodi, 2013)[33] | K-means clustering; MLP | DR: 99.99 (Dos), 99.97 (Probe), 99.99 (U2R), 99.98 (R2L) | c
2013 | Advanced probabilistic approach for network intrusion forecasting and detection | (Seongjun Shin, 2013)[44] | Markov chain; K-means clustering; APAN | DR: 90 | 9
2013 | A novel hybrid intrusion detection method integrating anomaly detection with misuse detection | (Gisung Kim, A novel hybrid intrusion detection method integrating anomaly detection with misuse detection, 2013)[19] | C4.5; 1-class SVM | DR: 99.98 (with known attack), 97.4 (with unknown attack); Training time: 21.375 sec; Testing time: 10.13 sec | 9
2014 | Mining network data for intrusion detection through combining SVMs with ant colony networks | (Wenying Feng, 2014)[54] | CSOACN (self organized ant colony network); SVM; CSVAC (combining support vectors with ant colony) | DR: 94.86; FP: 6.01; FN: 1.00 | 10
2014 | A new clustering approach for anomaly intrusion detection | (Ravi Ranjan, 2014)[40] | C4.5; SVM; K-means clustering | DR: 96.12 (Dos), 90.10 (R2L), 70.51 (U2R), 70.13 (R2L); Accuracy: 96.38; False alarm rate: 3.2 | 4

c. Not cited yet.
F. Used Dataset in Researches
Datasets are typically assigned to default tasks, e.g., classification, regression, function learning, or clustering. The datasets reviewed in this paper are used for classification. As Fig. 4 depicts, by far the most commonly used dataset is the KDD Cup 1999 dataset, which contains 4,000,000 instances and 42 attributes. The number of papers using KDD Cup 1999 peaks in 2011, and in total 20 research papers mention KDD Cup 1999 as their dataset.
The Car Evaluation dataset [32] contains 1,728 instances with 6 categorical attributes. The Wisconsin Breast Cancer dataset [16] is multivariate, with 699 instances and 10 integer attributes. The Glass dataset [32] is multivariate, with 214 instances and 10 real attributes. The Mushroom dataset [32] contains 22 categorical attributes and 8,124 instances. The Lymphography dataset [16] contains 18 categorical attributes and 148 instances. The Yeast dataset [24] has 8 real attributes and 1,484 instances. The Fisher-Iris dataset [25] contains 4 real attributes and 150 instances. The Bicup2006 and CO2 datasets [27] have 1,323 and 296 instances respectively. Public datasets such as DARPA 1998, DARPA 2000, Fisher-Iris, and NSL-KDD are used in many related studies. The study also shows that few private or non-public datasets were used over the time frame. The study briefly highlights public datasets such as KDD Cup 99, DARPA 1998, and DARPA 2000, which are considered standard datasets for intrusion detection systems. The DARPA dataset contains around 1.5 million traffic instances [36]. The NSL-KDD dataset was proposed by removing all redundant instances from KDD'99; NSL-KDD therefore gives a more accurate evaluation of different learning techniques than KDD'99 [19]. Some datasets were used only sporadically by the researchers. Table VI shows the year-wise distribution of these datasets.

TABLE VI. YEAR-WISE DISTRIBUTION OF RANDOMLY USED DATASET
Data Set 2009 2010 2011 2012 2013 2014 Total
Car Evaluation 1 1
Glass 1 1
DAMADICS 1 1
Yeast 1 1
Ionosphere 1 1
Musk 1 1
Malicious Web pages 1 1 2
Bicup2006 1 1
CO2 1 1
Lymphography 1 1
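Section F notes that NSL-KDD was derived from KDD'99 by removing redundant instances. The sketch below illustrates only that one idea, exact-duplicate removal, on a handful of made-up connection records. The function name `remove_redundant`, the record layout, and the toy values are assumptions for illustration; the actual NSL-KDD construction may involve further steps that this sketch does not cover.

```python
def remove_redundant(records):
    """Return records with exact duplicates dropped, keeping first occurrences in order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)  # hashable form of the record
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical toy records: (duration, protocol, service, label).
toy_records = [
    (0, "tcp", "http", "normal"),
    (0, "tcp", "http", "normal"),
    (2, "udp", "domain", "normal"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
]

deduplicated = remove_redundant(toy_records)
print(len(toy_records), "->", len(deduplicated))
```

Removing repeated records keeps a classifier from being biased toward the most frequently repeated traffic and keeps evaluation scores from being dominated by those repeats, which is one plausible reading of the "more accurate evaluation" claim above.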


G. Feature Selection
Feature selection is an important step for improving system performance. It is performed before the training phase: it identifies the best features and eliminates redundant and irrelevant ones. Table VII shows the year-wise distribution of studies that considered a feature selection step. Out of 49 studies, 21 used feature selection in their proposed architecture. The number of papers using feature selection peaks in 2012, when 7 of the 8 papers from that year included it. In 2009 the situation was reversed: only 1 of 8 papers used feature selection. Although we considered 49 related papers and the differences among them are small, the comparison shows that 21 experiments used feature selection whereas 28 did not. This suggests that feature selection is not yet a popular procedure in intrusion detection. Tables VII and VIII give the year-wise distribution of feature selection in the related studies and list the corresponding papers.
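As a minimal sketch of what "eliminating redundant and irrelevant features" can mean in practice, the example below applies a very simple filter: it drops constant columns (treated here as irrelevant) and columns that exactly duplicate an earlier kept column (treated here as redundant). This is an illustrative assumption, not the method of any surveyed paper; the function name `select_features`, the feature names, and the toy values are all hypothetical.

```python
def select_features(rows, names):
    """Drop constant columns and columns that duplicate an earlier kept column."""
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    kept = []
    seen_columns = set()
    for name, column in zip(names, columns):
        if len(set(column)) <= 1:
            continue  # constant feature: carries no information for a classifier
        if column in seen_columns:
            continue  # exact copy of a kept feature: redundant
        seen_columns.add(column)
        kept.append(name)
    return kept


# Hypothetical feature names and toy values.
names = ["duration", "src_bytes", "src_bytes_copy", "land"]
rows = [
    (0, 181, 181, 0),
    (2, 239, 239, 0),
    (0, 235, 235, 0),
]

selected = select_features(rows, names)
print(selected)  # ['duration', 'src_bytes']
```

Practical feature selection methods usually rank features by a relevance score or search over feature subsets; the filter above only shows where such a step sits, namely before training, on the raw feature table.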
[Figure: number of usages of each dataset (y-axis, 0 to 7) per year (x-axis, 2009 to 2014). Series: KDD 99, NSL KDD, Synthetic Dataset using Gaussian distribution, DARPA 1998, Credit Card, Fisher Iris, Wisconsin Breast Cancer, Mushroom, DARPA 2000, Network tcpdump.]

Fig. 4. Distribution of popular datasets over the years

TABLE VII. YEAR-WISE DISTRIBUTION OF FEATURE SELECTION CONSIDERED

Feature Selection Considered 2009 2010 2011 2012 2013 2014 Total

YES 1 3 4 7 4 2 21

NO 7 5 7 1 6 2 28
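
The counts in Table VII can be turned into per-year adoption shares with a few lines of arithmetic. The sketch below copies the YES and NO rows from the table and computes, for each year, the fraction of papers that used feature selection; the variable names are illustrative.

```python
# Counts copied from Table VII (YES and NO rows).
years = [2009, 2010, 2011, 2012, 2013, 2014]
used = [1, 3, 4, 7, 4, 2]
not_used = [7, 5, 7, 1, 6, 2]

# Fraction of papers per year that included a feature selection step.
share = {}
for year, yes, no in zip(years, used, not_used):
    share[year] = yes / (yes + no)

total_used = sum(used)
total_papers = total_used + sum(not_used)
peak_year = max(share, key=share.get)

for year in years:
    print(year, f"{share[year]:.1%}")
print("overall", f"{total_used / total_papers:.1%}", "peak year", peak_year)
```

Expressed this way, the table gives 12.5% for 2009 (1 of 8 papers), 87.5% for 2012 (7 of 8 papers), and about 42.9% overall (21 of 49 papers).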

TABLE VIII. DISTRIBUTION OF RESEARCH PAPERS CONSIDERING THE FEATURE SELECTION STEP

Feature Selection    No. of research papers    Research papers

YES 21 A.M.Chandrashekhar, K. (2013)[1]; Adel Sabry Eesa, Z. O. (2014)[2]; Akhilesh Kumar Shrivas, A. K. (2014)[3]; Bose, A. A. (2012)[6]; Carlos A. Catania, F. B. (2012)[9]; Inho Kang, M. K. (2012)[26]; Levent Koc, T. A. (2012)[29]; M. Bahrololum, E. S. (2009)[30]; Mario Poggiolini, A. E. (2013)[32]; Min Seok Mok, S. Y. (2010)[34]; Prabhjeet Kaur, A. K. (2012)[38]; S. Devaraju, S. R. (2013)[42]; Seongjun Shin, S. L. (2013)[44]; Shun-Sheng Wang, K.-Q. Y.-C.-W. (2011)[47]; Siva S. Sivatha Sindhu, S. G. (2012)[48]; Su, M.-Y. (2011)[49]; Woochul Shim, G. K. (2010)[55]; Yang Yi, J. W. (2011)[56]; Yinhui Li, J. X. (2012)[57]; Yung-Tsung Hou, Y. C.-S.-M. (2010)[58]; Yusuf Sahin, S. B. (2013)[59].

NO 28 C.A. Laurentys, R. P. (2011)[7]; Chengpo Mua, Y. L. (2010)[10]; D. Sánchez, M. V. (2009)[12]; Dahlia Asyiqin Ahmad Zainaddin, Z. M. (2013)[13]; Dewan Md. Farid, L. Z. (2013)[14]; Dewan Md. Farid, M. Z. (2011)[15]; Feng Jiang, Y. S. (2009)[16]; G. Davanzo, E. M. (2011)[17]; Gang Wang, J. H. (2010)[18]; Gisung Kim, S. L. (2013)[19]; Ravi Ranjan, G. S. (2014)[40]; Guillermo L. Grinblat, L. C. (2013)[21]; Han-Ching Wu, S.-H. S. (2010)[22]; I.T. Christou, M. B. (2011)[24]; Ilhan Aydin, M. K. (2010)[25]; Jun Ma, J. L. (2009)[27]; Kamran Shafi, H. A. (2009)[28]; Mao Ye, X. L. (2009)[31]; Mazyar Mohammadi Lisehroodi, Z. M. (2013)[33]; Mohammad Saniee Abadeh, H. M. (2011)[36]; Muna Mhammad T. Jawhar, M. M. (2010)[37]; Prof. D.P. Gaikwad, S. J. (2012)[39]; Seung Kim, N. W.-H. (2011)[45]; Seungmin Lee, G. K. (2011)[46]; Su-Yun Wua, E. Y. (2009)[50]; Tich Phuoc Tran, L. C. (2009)[52]; Wang Dawei, Z. F. (2011)[53]; Wenying Feng, Q. Z. (2014)[54].


IV. DISCUSSION, FUTURE WORK AND CONCLUSION
The use of different classifier techniques in intrusion detection systems is an emerging area of study in machine learning and artificial intelligence, and it has held the attention of researchers for a long time. This paper has identified 49 research papers, published between 2009 and 2014, on the application of different classifiers to intrusion detection. Although this survey cannot claim to be an in-depth study of those papers, it presents a reasonable perspective and a valid comparison of work in this field over those years. The following issues could be useful for future research:

• Removal of redundant and irrelevant features before the training phase is a key factor for system performance. Consideration of feature selection will play a vital role in classification techniques in future work.
• Many feature selection algorithms are available. Comparing different feature selection algorithms and working with the best-performing one will help the classification techniques and should also encourage wider adoption of the feature selection step in intrusion detection.
• Single or baseline classifiers used in performance measurement can be replaced by hybrid or ensemble classifiers.

REFERENCES
[1] A.M.Chandrashekhar, K. (2013). Fortification of hybrid intrusion detection system using variants of neural networks & support vector machines. International Journal of Network Security & Its Applications (IJNSA).
[2] Adel Sabry Eesa, Z. O. (2014). A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems. Expert Systems with Applications, Elsevier.
[3] Akhilesh Kumar Shrivas, A. K. (2014). An Ensemble Model for Classification of Attacks with Feature Selection based on KDD99 and NSL-KDD Data Set. International Journal of Computer Applications.
[4] Anderson, J. (1995). An introduction to neural networks. Cambridge: MIT Press.
[5] Bernhard E Boser, I. M. (1992). A Training Algorithm for Optimal Margin Classifiers. Proceedings of the 5th Annual ACM Workshop on Computational , 144-152.
[6] Bose, A. A. (2012). THE COMBINED APPROACH FOR ANOMALY detection using neural networks & clustering techniques. Computer Science & Engineering: An International Journal (CSEIJ).
[7] C.A. Laurentys, R. P. (2011). A novel Artificial Immune System for fault behavior detection. Expert Systems with Applications, Elsevier.
[8] C.M.Bishop. (1995). Neural networks for pattern recognition. England: Oxford University.
[9] Carlos A. Catania, F. B. (2012). An autonomous labeling approach to support vector machines algorithms for network traffic anomaly detection. Expert Systems with Applications, Elsevier.
[10] Chengpo Mua, Y. L. (2010). An intrusion response decision-making model based on hierarchical. Expert Systems with Applications, Elsevier.
[11] Chih-Fong Tsai, Y.-F. H.-Y.-Y. (2009). Intrusion detection by machine learning: A review. Expert Systems with Applications, Elsevier.
[12] D. Sánchez, M. V. (2009). Association rules applied to credit card fraud detection. Expert Systems with Applications, Elsevier.
[13] Dahlia Asyiqin Ahmad Zainaddin, Z. M. (2013). HYBRID OF FUZZY CLUSTERING NEURAL NETWORK OVER NSL DATASET FOR INTRUSION DETECTION SYSTEM. Journal of Computer Science.
[14] Dewan Md. Farid, L. Z. (2013). An Adaptive Ensemble Classifier for Mining Concept-Drifting Data Streams. Expert Systems with Applications, Elsevier.
[15] Dewan Md. Farid, M. Z. (2011). Adaptive Intrusion Detection based on Boosting and. International Journal of Computer Applications.
[16] Feng Jiang, Y. S. (2009). Some issues about outlier detection in rough set theory. Expert Systems with Applications, Elsevier.
[17] G. Davanzo, E. M. (2011). Anomaly detection techniques for a web defacement monitoring service. Expert Systems with Applications, Elsevier.
[18] Gang Wang, J. H. (2010). A new approach to intrusion detection using Artificial Neural Networks and. Expert Systems with Applications, Elsevier.
[19] Gisung Kim, S. L. (2013). A novel hybrid intrusion detection method integrating anomaly detection with misuse detection. Expert Systems with Applications, Elsevier.
[20] Gisung Kim, J. C., S. K. (2012). A congestion-aware IDS node selection method for wireless sensor networks. IJDSN.
[21] Guillermo L. Grinblat, L. C. (2013). Abrupt change detection with One-Class Time-Adaptive Support Vector Machines. Expert Systems with Applications, Elsevier.
[22] Han-Ching Wu, S.-H. S. (2010). Neural networks-based detection of stepping-stone intrusion. Expert Systems with Applications, Elsevier.
[23] Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd Edition). New Jersey: Prentice Hall.
[24] I.T. Christou, M. B. (2011). Detecting fraud in online games of chance and lotteries. Expert Systems with Applications, Elsevier.
[25] Ilhan Aydin, M. K. (2010). Chaotic-based hybrid negative selection algorithm and its applications in fault. Expert Systems with Applications, Elsevier.
[26] Inho Kang, M. K. (2012). A differentiated one-class classification method with applications to intrusion detection. Expert Systems with Applications, Elsevier.
[27] Jun Ma, J. L. (2009). Information inconsistencies detection using a rule-map technique. Expert Systems with Applications, Elsevier.
[28] Kamran Shafi, H. A. (2009). An adaptive genetic-based signature learning system for intrusion detection. Expert Systems with Applications, Elsevier.
[29] Levent Koc, T. A. (2012). A network intrusion detection system based on a Hidden Naïve Bayes multiclass classifier. Expert Systems with Applications, Elsevier.
[30] M. Bahrololum, E. S. (2009). ANOMALY INTRUSION DETECTION DESIGN USING. International Journal of Computer Networks & Communications (IJCNC).
[31] Mao Ye, X. L. (2009). Projected outlier detection in high-dimensional mixed-attributes data set. Expert Systems with Applications, Elsevier.
[32] Mario Poggiolini, A. E. (2013). Application of the feature-detection rule to the Negative Selection Algorithm. Expert Systems with Applications, Elsevier.
[33] Mazyar Mohammadi Lisehroodi, Z. M. (2013). A HYBRID FRAMEWORK BASED ON NEURAL NETWORK MLP AND K-MEANS CLUSTERING FOR INTRUSION DETECTION SYSTEM. Proceedings of the 4th International Conference on Computing and Informatics, ICOCI 2013 (p. Paper No. 020). Sarawak, Malaysia: Universiti Utara Malaysia.
[34] Min Seok Mok, S. Y. (2010). Random effects logistic regression model for anomaly detection. Expert Systems with Applications, Elsevier.
[35] Mitchell, T. (1997). Machine learning. New York: McGraw Hill.
[36] Mohammad Saniee Abadeh, H. M. (2011). Design and analysis of genetic fuzzy systems for intrusion detection in. Expert Systems with Applications, Elsevier.
[37] Muna Mhammad T. Jawhar, M. M. (2010). Design Network Intrusion Detection System using hybrid. International Journal of Computer Science and Security.


[38] Prabhjeet Kaur, A. K. (2012). MADAM ID FOR INTRUSION DETECTION USING DATA MINING. International Journal of Research in IT & Management, IJRIM.
[39] Prof. D.P. Gaikwad, S. J. (2012). Anomaly Based Intrusion Detection System Using Artificial Neural Network & Fuzzy clustering. International Journal of Engineering Research & Technology (IJERT).
[40] Ravi Ranjan, G. S. (2014). A NEW CLUSTERING APPROACH FOR ANOMALY INTRUSION DETECTION. International Journal of Data Mining & Knowledge Management Process (IJDKP).
[41] Rhodes, B. M. (2000). Multiple self-organizing maps for intrusion detection. Baltimore, MD.
[42] S. Devaraju, S. R. (2013). DETECTION OF ACCURACY FOR INTRUSION DETECTION SYSTEM USING NEURAL NETWORK CLASSIFIER. International Journal of Emerging Technology and Advanced Engineering.
[43] Sahoo, R. R. (2014). A NEW CLUSTERING APPROACH FOR ANOMALY INTRUSION DETECTION. International Journal of Data Mining & Knowledge Management Process (IJDKP).
[44] Seongjun Shin, S. L. (2013). Advanced probabilistic approach for network intrusion forecasting and detection. Expert Systems with Applications, Elsevier.
[45] Seung Kim, N. W.-H. (2011). Fast outlier detection for very large log data. Expert Systems with Applications, Elsevier.
[46] Seungmin Lee, G. K. (2011). Self-adaptive and dynamic clustering for online anomaly detection. Expert Systems with Applications, Elsevier.
[47] Shun-Sheng Wang, K.-Q. Y.-C.-W. (2011). An Integrated Intrusion Detection System for Cluster-based Wireless. Expert Systems with Applications.
[48] Siva S. Sivatha Sindhu, S. G. (2012). Decision tree based light weight intrusion detection using a wrapper approach. Expert Systems with Applications, Elsevier.
[49] Su, M.-Y. (2011). Real-time anomaly detection systems for Denial-of-Service attacks by weighted. Expert Systems with Applications, Elsevier.
[50] Su-Yun Wua, E. Y. (2009). Data mining-based intrusion detectors. Expert Systems with Applications, Elsevier.
[51] Tax, D. &. (1999). Data domain description using support vectors. Proceedings of the European Symposium on Artificial Neural Networks, 251-256.
[52] Tich Phuoc Tran, L. C. (2009). Novel Intrusion Detection using Probabilistic Neural. (IJCSIS) International Journal of Computer Science and Information Security.
[53] Wang Dawei, Z. F. (2011). Evolving boundary detector for anomaly detection. Expert Systems with Applications.
[54] Wenying Feng, Q. Z. (2014). Mining network data for intrusion detection through combining SVMs with ant colony networks. Future Generation Computer Systems, Elsevier.
[55] Woochul Shim, G. K. (2010). A distributed sinkhole detection method using cluster analysis. Expert Systems with Applications, Elsevier.
[56] Yang Yi, J. W. (2011). Incremental SVM based on reserved set for network intrusion detection. Expert Systems with Applications.
[57] Yinhui Li, J. X. (2012). An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Systems with Applications, Elsevier.
[58] Yung-Tsung Hou, Y. C.-S.-M. (2010). Malicious web content detection by machine learning. Expert Systems with Applications, Elsevier.
[59] Yusuf Sahin, S. B. (2013). A cost-sensitive decision tree approach for fraud detection. Expert Systems with Applications, Elsevier.
[60] Zimmermann, H.-J. (2010). Fuzzy set theory. Advanced Review John Wiley & Sons, Inc.
