
Inductive Intrusion Detection in Flow-Based Network Data Using One-Class Support Vector Machines

Philipp Winter, Eckehard Hermann, Markus Zeilinger
Department of Secure Information Systems
Upper Austria University of Applied Sciences
4232 Hagenberg / Softwarepark 11, Austria
{philipp.winter, eckehard.hermann, markus.zeilinger}@fh-hagenberg.at

Abstract—Despite extensive research effort, ordinary anomaly detection systems still suffer from serious drawbacks such as high false alarm rates due to the enormous variety of network traffic. Also, increasingly fast network speeds pose performance problems for systems based on deep packet inspection. In this paper, we address these problems by proposing a novel inductive network intrusion detection system. The system operates on lightweight network flows and uses One-Class Support Vector Machines for analysis. In contrast to traditional anomaly detection systems, the system is trained with malicious rather than with benign network data. The system is suited for the load of large-scale networks and is less affected by the typical problems of ordinary anomaly detection systems.
Our evaluation yielded satisfying results which indicate that the proposed approach merits further research and complements traditional signature-based intrusion detection systems well.

Keywords: network intrusion detection; machine learning; support vector machine; netflow

I. INTRODUCTION

In the area of network intrusion detection, the research community usually focuses on either misuse or anomaly detection systems. While the former are meant to detect precisely specified attack signatures, the latter are supposed to detect patterns deviating from normal network operations. Both concepts exhibit disadvantages. Misuse detection systems struggle with steadily increasing network speeds and growing signature sets, while anomaly detection systems suffer from high false alarm rates and the lack of representative training data, to name just a few.
The network intrusion detection system (NIDS) proposed in this paper embraces concepts from both worlds. It operates on network flows rather than on entire network packets. Incoming flows are analysed using One-Class Support Vector Machines (OC-SVMs). The learning algorithm is trained solely with malicious rather than with benign data. By this means, the NIDS is supposed to recognize previously learned attacks, including attack variations, instead of detecting anomalies. We call the proposed concept an "inductive NIDS".
The remainder of this paper is organized as follows. Section II gives a brief overview of similar contributions. Section III introduces the proposed approach. Section IV describes the process of model and feature selection, while Section V evaluates the proposed approach and discusses the results. Finally, Section VI provides a conclusion and discusses future work.

II. RELATED WORK

Gao and Chen designed and developed a flow-based intrusion detection system in [1]. Karasaridis et al. [2], Shahrestani et al. [3] and Livadas et al. [4] proposed concepts for the detection of botnets in network flows. Finally, in [5] Sperotto et al. provided a comprehensive survey of current research in the domain of flow-based network intrusion detection.
A sound evaluation of a NIDS is a nontrivial task and requires high-quality training and testing sets. Unfortunately, the de facto standard is still the DARPA data set created by Lippmann et al. [6]. Despite its severe weaknesses and the critique published by McHugh [7], it is still widely used. The KDD Cup '99 data set can be regarded as another popular data set [8]. Finally, Sperotto et al. contributed the first labeled flow-based data set [9], intended for evaluating and training network intrusion detection systems; this data set is used in this paper.
Finally, in [10] Gates and Taylor examine whether the established anomaly detection paradigms are still valid. Similar work is contributed by Sommer and Paxson in [11], who point out why anomaly detection systems are hardly used in operational networks.

III. PROPOSED APPROACH

We decided to train the NIDS with malicious network data only. This approach is diametrically opposed to the way ordinary anomaly detection systems work. As pointed out in [11], machine learning methods perform better at recognizing inliers than at detecting outliers. Further advantages of this approach are discussed in Section V-B.
In short, the proposed NIDS receives network flows and analyzes them with an OC-SVM. The following two sections briefly introduce the concepts behind network flows and OC-SVMs, respectively.

A. Network flows

A set of unidirectional network packets sharing certain characteristics together forms a so-called network flow.



Fig. 1. The high-level view of the proposed approach for an inductive NIDS. (Diagram: an incoming flow is preprocessed and passed to a One-Class SVM that was trained on the malicious training set; flows classified as malicious are reported, all other flows are dropped.)

Fig. 2. The protocol distribution of the original training set. (Bar chart, amount of flows per protocol: FTP 13, IRC 7,383, HTTP 9,798, OTHER 18,970, AUTH/IDENT 191,339, SSH 13,942,629.)

This set of characteristics is defined as {IPsrc, IPdst, Portsrc, Portdst, IP Protocol}, i.e., source and destination IP addresses, ports and the IP protocol number. Additional information can be derived, such as the number of packets or bytes transferred in the flow. All this information is summed up to form a flow record (often just referred to as a flow) consisting of several flow attributes. This flow record is then sent from a router to a so-called flow collector. Network flows only provide meta information and do not carry any packet payload. We decided to use NetFlow version 5, the most popular and widespread protocol for dealing with network flows.
Several reasons led to the decision to work with network flows: their lightweight nature, their broad availability (only a NetFlow-enabled router is necessary) and the possibility to analyze even encrypted network traffic. A more comprehensive discussion of the advantages of flow-based network intrusion detection is provided in [5].
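To make the notion of a flow record concrete, the following minimal sketch models the five-tuple and the derived attributes described above as a Python data class. The field names and the example values are our own illustration and are not prescribed by NetFlow v5 or by the data sets used later.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowRecord:
    """One unidirectional flow record (illustrative field names)."""
    src_ip: str        # IPsrc
    dst_ip: str        # IPdst
    src_port: int      # Portsrc
    dst_port: int      # Portdst
    ip_protocol: int   # IP protocol number, e.g. 6 for TCP
    packets: int       # derived: packets transferred in the flow
    octets: int        # derived: bytes transferred in the flow
    duration: float    # derived: flow duration in seconds
    tcp_flags: int     # cumulative OR of the observed TCP flags

# Example: a single SSH flow as a collector might export it
flow = FlowRecord("192.0.2.1", "198.51.100.7", 51234, 22, 6,
                  packets=12, octets=1840, duration=3.2, tcp_flags=0x1b)
```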
B. One-Class Support Vector Machines

OC-SVMs as proposed by Schölkopf et al. [12] are an unsupervised learning method. Roughly speaking, unsupervised learning methods are aware of only a single class of data: OC-SVMs distinguish between vectors which are referred to as either in-class (inside the trained distribution) or outliers (outside the distribution). In the context of network intrusion detection, an OC-SVM as used in this paper distinguishes between "malicious" (in-class) and "not malicious" (outlier) network data. We decided to use the OC-SVM implementation provided by the popular library libsvm [13].
The choice fell on OC-SVMs for the following reasons: First of all, SVMs combine accurate detection rates with acceptable training time. Also, through the use of so-called kernels, SVMs are able to perform nonlinear classification. Finally, researchers have already achieved promising results with SVMs in intrusion detection [14].
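The paper relies on the OC-SVM implementation of libsvm [13]. As a minimal, non-authoritative sketch of the same idea, the snippet below uses scikit-learn's OneClassSVM (which wraps LIBSVM) to learn from malicious flow vectors only and to label unseen flows as in-class (+1) or outlier (-1). The random feature matrices and the parameter values are placeholders, not the model derived later in the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Placeholder training data: rows are preprocessed flow-attribute vectors of malicious flows.
rng = np.random.default_rng(0)
X_malicious = rng.random((1000, 4))   # e.g. source port, destination port, TCP flags, IP protocol

# nu bounds the fraction of training outliers, gamma is the width of the RBF kernel.
model = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5)
model.fit(X_malicious)

X_new = rng.random((5, 4))            # unseen, preprocessed flows
labels = model.predict(X_new)         # +1 = in-class ("malicious"), -1 = outlier ("not malicious")
print(labels)
```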
C. High-level view

Figure 1 provides a high-level view of the proposed NIDS. All flows received by the NIDS are first preprocessed. This step involves scaling the flow to a predefined numeric range and selecting only the relevant flow attributes (see Section IV). Afterwards, the scaled flow is passed on to the analysis engine. This engine makes use of an OC-SVM which classifies the incoming flow as either malicious (in-class) or not malicious (outlier). In the latter case the flow is ignored, while in the former case the flow is reported to the network operator.
Figure 1 also shows the training set which is used to train the OC-SVM. As already mentioned, this set contains only malicious flows. By this means, the OC-SVM is only aware of what malicious flows look like; benign flows are accordingly expected to be classified as outliers. The structure of the malicious training set is covered in Section III-D.
In addition to the malicious set, a data set consisting only of benign flows was created. This set is not part of Figure 1, though. The following two sections discuss how and why both sets were generated.
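The preprocessing and decision logic just described can be sketched as follows. The attribute names, the scaling bounds and the reporting behaviour are assumptions made for illustration; only the overall structure (scale, select, classify, report or drop) follows the text above.

```python
def scale(value, lo, hi):
    """Min-max scale one attribute to [0, 1]; lo and hi are assumed per-attribute bounds."""
    return (min(max(value, lo), hi) - lo) / (hi - lo)

# Assumed bounds for the selected flow attributes (not taken from the paper).
BOUNDS = {"src_port": (0, 65535), "dst_port": (0, 65535),
          "tcp_flags": (0, 255), "ip_protocol": (0, 255)}

def preprocess(flow):
    """Select the relevant flow attributes and scale them to a predefined numeric range."""
    return [scale(getattr(flow, name), lo, hi) for name, (lo, hi) in BOUNDS.items()]

def handle_flow(flow, model):
    """Report the flow if the OC-SVM classifies it as in-class (malicious), drop it otherwise."""
    if model.predict([preprocess(flow)])[0] == 1:
        print(f"ALERT: suspicious flow {flow.src_ip} -> {flow.dst_ip}:{flow.dst_port}")
    # outliers are silently dropped
```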
D. The malicious data set

The training set for the OC-SVM is a subset of the data set contributed by Sperotto et al. [9]. The authors created the set by setting up a honeypot which was exposed to the Internet for six days. The honeypot featured services for three protocols: HTTP, SSH and FTP. The authors identified attacks by monitoring the log files of these three services. That way, they gathered around 14 million flows which are mostly of malicious nature. For the protocol and attack semantics, we refer the reader to the original contribution [9].
Figure 2 illustrates the protocol distribution of the set. By far the most flows were collected for SSH. The 13 collected FTP flows turned out not to be malicious. Furthermore, the collection process yielded flows belonging to IRC and AUTH/IDENT. According to the authors, these flows can be considered side-effect traffic and are not malicious per se; the IRC flows, for instance, are not part of botnet activity. Rather, an attacker simply installed an IRC server on the honeypot. Thus, malicious flows in the original training set are effectively limited to SSH and HTTP, since there were no malicious FTP flows.
Figure 3 illustrates the types of attacks present in the original training set. The most interesting attacks from the perspective of a network operator are manual and succeeded attacks. These account for only 6 and 144 flows, respectively.

Fig. 3. The attack types of the original training set. (Bar chart, amount of flows per attack type: automated 14,156,775; manual 6; failed 26,694; succeeded 144.)

A reduction of the original set is necessary for two reasons: First, training on 14 million flows would take a vast amount of time. Second, the original set holds many flows which should not be trained on, such as side-effect traffic which is not malicious per se.
The reduction consists of three steps: 1) selecting only the relevant flow attributes, 2) deleting irrelevant flows and 3) performing random sampling to gain a smaller, yet representative subset:

1) The IP addresses of the original data set were discarded since they had been anonymized. Furthermore, all time information was condensed into a single attribute entitled "duration". In the end, the following flow attributes remained in the reduced training set: packets/flow, octets/flow, duration per flow, source port, destination port, TCP flags and IP protocol.
2) The deletion of irrelevant flows followed these rules: First of all, all 5,968 unlabeled flows of the original data set (i.e., flows whose nature could not be verified) were deleted. Next, all flows belonging to protocols other than SSH and HTTP were deleted; 215,123 flows were affected by this deletion.
3) Random sampling is necessary because the intermediate data set after steps 1) and 2) still contained the vast amount of almost 13 million flows. Training on that many flows would require an unacceptably high training time. The random sampling process selected every flow of the intermediate data set with a probability of 1/600. Finally, this step yielded a total of 22,924 flows. These flows represent the reduced training set which is used for model and feature selection in Section IV.
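As a rough sketch of these three reduction steps, assume the original set is loaded into a pandas DataFrame; the column names (label, protocol and the seven attributes) are hypothetical and merely mirror the description above.

```python
import pandas as pd

ATTRS = ["packets", "octets", "duration", "src_port", "dst_port", "tcp_flags", "ip_protocol"]

def reduce_original_set(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Apply the three reduction steps to the original flow data set (hypothetical column names)."""
    # 1) Keep only the seven remaining flow attributes (plus helper columns for filtering).
    df = df[ATTRS + ["label", "protocol"]]
    # 2) Drop unlabeled flows and flows of protocols other than SSH and HTTP.
    df = df[df["label"].notna() & df["protocol"].isin(["ssh", "http"])]
    # 3) Keep each remaining flow with a probability of 1/600.
    return df.sample(frac=1 / 600, random_state=seed)[ATTRS]
```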
E. The benign data set

A data set holding only benign flows was created from scratch. The purpose of this set is to aid in the process of model and feature selection (see Section IV); in the end it is also used to evaluate the NIDS (see Section V). The set was created by manually generating network traffic inside a virtual machine based on Grml Linux [15]. All the network traffic was captured with tcpdump [16] and then converted to flow format using softflowd [17]. The benign data set comprises flows belonging to the following protocols: HTTP, SSH, DNS, ICMP and FTP. Overall, the set consists of 1,904 flows.
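The authors generated the benign set with tcpdump and softflowd; purely as an illustration of the aggregation that such tools perform, the hedged sketch below groups packets from a capture file into unidirectional flow records with scapy. This is an alternative shown for clarity, not the tool chain used in the paper.

```python
from scapy.all import IP, TCP, UDP, rdpcap

def packets_to_flows(pcap_path):
    """Aggregate captured packets into unidirectional flow records keyed by the five-tuple."""
    flows = {}
    for pkt in rdpcap(pcap_path):
        if IP not in pkt:
            continue
        sport = pkt[TCP].sport if TCP in pkt else (pkt[UDP].sport if UDP in pkt else 0)
        dport = pkt[TCP].dport if TCP in pkt else (pkt[UDP].dport if UDP in pkt else 0)
        key = (pkt[IP].src, pkt[IP].dst, sport, dport, pkt[IP].proto)
        rec = flows.setdefault(key, {"packets": 0, "octets": 0, "first": pkt.time, "last": pkt.time})
        rec["packets"] += 1
        rec["octets"] += len(pkt)
        rec["last"] = pkt.time
    return flows
```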
F. Data set partitioning

Prior to model and feature selection, the malicious and the benign data set were each further divided into a testing set and a validation set. The partitioning is illustrated in Figure 4.

Fig. 4. Partitioning of the malicious and the benign data set. (Diagram: the malicious data set is split into a training and validation set and a testing set; the benign data set is split into a validation set and a testing set.)

The two validation sets are only used for model and feature selection, whereas the testing sets are used for the final evaluation described in Section V. One third of the malicious set and half of the benign set were randomly sampled and added to the respective testing set. The remainder of each set was chosen to be the validation set.
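A brief sketch of this partitioning, assuming the reduced malicious set and the benign set are available as feature matrices; the use of scikit-learn's train_test_split and the fixed random seed are our own choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrices standing in for the reduced malicious and the benign set.
X_mal = np.random.rand(22924, 7)
X_ben = np.random.rand(1904, 7)

# One third of the malicious set and half of the benign set become the testing sets.
mal_train_val, mal_test = train_test_split(X_mal, test_size=1 / 3, random_state=0)
ben_val, ben_test = train_test_split(X_ben, test_size=0.5, random_state=0)
```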
IV. MODEL AND FEATURE SELECTION

The sole purpose of model and feature selection is the optimization of the classifier's classification capability, i.e., the best possible adaptation of the classifier to the classification problem at hand.
Feature selection is the process of selecting a subset of features out of the original feature set. Within the scope of this paper this means that the original feature set F, consisting of the seven flow attributes described in Section III-D, is to be reduced to a subset S where S ⊆ F.
Model selection, or optimization, is the process of finding parameters for the respective machine learning method which lead to optimal classification results. OC-SVMs require the parameter ν, {ν ∈ R | 0 < ν ≤ 1}, which controls the fraction of outliers in the training set. For ν, we tested the values {0.001, 0.201, 0.401, 0.601, 0.801}. This linearly increasing set was chosen to approximately cover the possible range for ν. Furthermore, a radial basis function (RBF) kernel was chosen for the OC-SVM due to its general applicability [18]. The RBF kernel requires a parameter γ, {γ ∈ R | 0 ≤ γ}, which specifies the width of the RBF kernel. For γ, the values {0.1, 0.3, 0.5, 1, 2, 5, 10} are tested. This set increases nonlinearly and is also supposed to cover a small but realistic range for γ.
So, roughly speaking, model selection is the task of determining the best tradeoff between ν and γ for the OC-SVM.

A. Joint optimization

Model and feature selection was regarded as one joint rather than two independent optimization problems. As pointed out in [19] and [20], this methodology can lead to better results.
As argued by the authors of [11], limiting the false alarm rate of an anomaly detection system should have top priority. Hence, our proposed NIDS was optimized with respect to its false alarm rate.
The small search space enables the use of the popular grid search method for joint optimization. In the scope of this paper, the grid spans three dimensions:
1) The feature subset, with 2^7 − 1 = 127 possible non-empty subsets. The exponent stands for the 7 feature candidates, i.e., flow attributes.
2) The model parameter ν, for which 5 values are tested as discussed in the previous section.
3) The kernel parameter γ, for which 7 values are tested.
Overall, the grid holds a total of 127 × 5 × 7 = 4,445 points. For each grid point, an 8-fold cross-validation (the number of folds was chosen as a multiple of the four available CPU cores) is performed to determine the error rates, consisting of the false alarm rate and the miss rate; a simplified sketch of this procedure follows after the list below.
The optimization was divided into two consecutive steps:
1) The first step is entitled "coarse grained optimization". It is meant to determine the best feature subset as well as the best tradeoff between ν and γ.
2) Afterwards, "fine grained optimization" is performed. This step builds upon the results of the coarse grained optimization and further explores the surrounding region of the best ν/γ combination to achieve even better results.
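The following simplified sketch illustrates the joint grid search over feature subsets, ν and γ with an 8-fold cross-validation per grid point. Data loading, the exact fold handling and the use of scikit-learn's OneClassSVM are our own assumptions; the sketch only mirrors the procedure described above, not the authors' code.

```python
from itertools import combinations
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

FEATURES = ["packets", "octets", "duration", "src_port", "dst_port", "tcp_flags", "ip_protocol"]
NUS = [0.001, 0.201, 0.401, 0.601, 0.801]
GAMMAS = [0.1, 0.3, 0.5, 1, 2, 5, 10]

def feature_subsets(features):
    """Yield all 2^7 - 1 = 127 non-empty subsets of the seven flow attributes."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)

def error_rates(X_mal, X_ben_val, subset, nu, gamma, folds=8):
    """8-fold cross-validation for one grid point; returns (false alarm rate, miss rate)."""
    cols = [FEATURES.index(f) for f in subset]
    alarms, misses = [], []
    for train_idx, val_idx in KFold(n_splits=folds, shuffle=True, random_state=0).split(X_mal):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_mal[np.ix_(train_idx, cols)])
        # Malicious validation flows predicted as outliers (-1) are misses ...
        misses.append(np.mean(model.predict(X_mal[np.ix_(val_idx, cols)]) == -1))
        # ... benign validation flows predicted as in-class (+1) are false alarms.
        alarms.append(np.mean(model.predict(X_ben_val[:, cols]) == 1))
    return float(np.mean(alarms)), float(np.mean(misses))

def coarse_grid_search(X_mal, X_ben_val):
    """Evaluate all 127 * 5 * 7 = 4,445 grid points; rank by false alarm rate, then miss rate."""
    results = [(*error_rates(X_mal, X_ben_val, s, nu, g), g, nu, s)
               for s in feature_subsets(FEATURES) for nu in NUS for g in GAMMAS]
    return sorted(results)
```

The fine grained search described next would simply reuse error_rates over the denser ν and γ ranges around the best coarse grid point.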
B. Coarse grained optimization

The best five results of the coarse grained optimization with respect to the false alarm rate are listed in Table I. The abbreviations in the last column are as follows: P = packets per flow, SP = source port, DP = destination port, TF = TCP flags and PR = IP protocol.

TABLE I
RESULTS OF THE COARSE GRAINED OPTIMIZATION.

False alarm rate  Miss rate   γ    ν      Feature subset
0%                22.53807%   0.1  0.201  SP, DP, TF, PR
0%                22.53807%   0.1  0.201  SP, DP, TF
0%                22.61685%   0.3  0.201  SP, DP, TF, PR
0%                22.61685%   0.3  0.201  SP, DP, TF
0%                22.66281%   0.1  0.201  P, SP, DP, TF, PR

The results reveal that none of the best five combinations causes any false alarms. The best one scores only marginally better than the remaining four. For all five points, the model parameter ν is 0.201. The parameter γ, on the other hand, exhibits some variance in the range between 0.1 and 0.3.
Since all of the five points yield an identical false alarm rate, the point with the lowest miss rate is chosen for further optimization. In fact, the first two points of Table I have identical error rates; the difference is that one of these points contains the IP protocol feature whereas the other does not. We chose to take the one including this feature, since we believe that the additional feature enhances the generalization capability of the OC-SVM.

C. Fine grained optimization

The coarse grained optimization resulted in the point characterized by γ = 0.1, ν = 0.201 and the feature subset holding source port, destination port, TCP flags and IP protocol. As already mentioned, this feature subset is not changed anymore. Hence, only ν and γ remain for further optimization.
The fine grained testing range for ν and γ is chosen to lie between the nearest coarse grained values. Since the previous optimization came up with ν = 0.201 and γ = 0.1, the sets to be further explored start with 0.01 for both ν and γ, and end with 0.39 for ν and 0.29 for γ. For both parameters, a step size of 0.01 is chosen. In the end, the following set is tested for γ: {0.01, 0.02, ..., 0.28, 0.29}, whereas the set for ν contains {0.01, 0.02, ..., 0.38, 0.39}. With 39 values in the set for ν and 29 values in the set for γ, the fine grained optimization has to test 29 × 39 = 1,131 points. Again, for each point an 8-fold cross-validation is executed.
The best five results of the fine grained optimization are listed in Table II. All results yield a miss rate of around 4.7% with a false alarm rate of still 0%. The parameter ν is 0.02 for all results, whereas γ varies between 0.25 and 0.29.

TABLE II
RESULTS OF THE FINE GRAINED OPTIMIZATION.

False alarm rate  Miss rate  γ     ν
0%                4.71376%   0.26  0.02
0%                4.72689%   0.25  0.02
0%                4.74658%   0.29  0.02
0%                4.75315%   0.27  0.02
0%                4.76628%   0.28  0.02

The fine grained optimization was able to significantly lower the error rates of the coarse grained optimization: the miss rate, originally around 22.5%, could be lowered to 4.7%. Figure 5 illustrates how the miss rate varies during the fine grained optimization. One can clearly see the correlation between ν and the miss rate. The parameter γ, on the other hand, hardly influences the miss rate.

Fig. 5. Relationship between the varying parameters ν and γ and the miss rate during the fine grained grid search. (3D surface plot: miss rate in percent over ν ∈ [0.01, 0.39] and γ ∈ [0.01, 0.29].)

V. EVALUATION AND DISCUSSION

After model and feature selection determined the best set of features and model parameters, it is crucial to test the final model by making use of the dedicated testing sets created in Section III-F.

A. Testing set prediction

For the purpose of testing, the OC-SVM is trained with the malicious training set (see Figure 4), an RBF kernel and the parameters ν = 0.02 and γ = 0.26. Only the features source port, destination port, TCP flags and IP protocol are used for training.
TABLE III
PREDICTION OF THE TESTING SETS.

              Benign set               Malicious set
Type          Predicted      Actual    Predicted            Actual
In-class      0 (0%)         0         7,540 (98.07492%)    7,688
Outlier       942 (100%)     942       148 (1.92507%)       0

Table III lists the results of the prediction of both testing sets. None of the benign flows was predicted as in-class, i.e., malicious; the OC-SVM correctly classified all of the benign flows as outliers. This corresponds to a false alarm rate of 0%. Furthermore, around 98% of the malicious flows were correctly predicted as in-class, while almost 2% were classified as outliers, i.e., not malicious. This corresponds to a miss rate of about 2%.
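A minimal sketch of this final test, assuming the flows have already been reduced to the four selected features and scaled; the placeholder matrices stand in for the actual training and testing sets.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_mal_train = rng.random((15000, 4))   # malicious training flows (placeholder)
X_mal_test = rng.random((7688, 4))     # malicious testing flows (placeholder)
X_ben_test = rng.random((942, 4))      # benign testing flows (placeholder)

model = OneClassSVM(kernel="rbf", nu=0.02, gamma=0.26).fit(X_mal_train)

false_alarm_rate = np.mean(model.predict(X_ben_test) == 1)   # benign flows flagged as malicious
miss_rate = np.mean(model.predict(X_mal_test) == -1)         # malicious flows classified as outliers
print(f"false alarm rate: {false_alarm_rate:.2%}, miss rate: {miss_rate:.2%}")
```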
B. Discussion

Although the results of the evaluation seem highly promising at first glance, they have to be viewed critically. We identified the following drawbacks in our approach.
First of all, the OC-SVM appears to be highly dependent on the source and destination port. This assumption is affirmed by the fact that the evaluation results are significantly worse if these two features are omitted: in this case, the false alarm rate increases to 0.7% and the miss rate increases to as much as 81.1%.
Another drawback lies in the fact that the training set is effectively limited to attacks on SSH and HTTP. Also, the largest fraction of the training set is scan traffic. Many network operators deem probing activity uninteresting, but it can be an indicator of botnet infections or other compromised hosts when coming from "the inside".
On the other hand, the proposed inductive NIDS offers several important advantages. First of all, a once-trained OC-SVM can be distributed to multiple networks without the prior learning phase often required by anomaly detection systems. The use of NetFlow eases the monitoring of large-scale networks and allows the analysis of encrypted network connections. Finally, we believe that training on malicious rather than benign network traffic keeps the false alarm rate low and allows the detection of attack variations. However, further research will be necessary to prove these assumptions.

VI. CONCLUSIONS AND FUTURE WORK

We introduced a novel concept for an inductive NIDS. The NIDS operates on network flows and makes use of One-Class Support Vector Machines for analysis. It is designed to recognize attacks and their variations rather than to detect deviations from normal traffic.
The results of the evaluation suggest that the proposed approach yields quite satisfying results, although there is much room for further testing and improvement.
Predominantly, future work will concentrate on enhancing both data sets. A sound and comprehensive evaluation requires multiple representative and realistic data sets collected "in the wild".
The source code and the prepared data sets are available upon email request to the authors.

ACKNOWLEDGMENT

We want to thank the anonymous reviewers who provided us with many helpful comments.

REFERENCES

[1] Y. Gao, Z. Li, and Y. Chen, "A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks," in Proc. of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS '06), Washington, DC, USA, 2006, p. 39.
[2] A. Karasaridis, B. Rexroad, and D. Hoeflin, "Wide-scale botnet detection and characterization," in Proc. of the first conference on Hot Topics in Understanding Botnets (HotBots '07), Berkeley, CA, USA, 2007, p. 7.
[3] A. Shahrestani, M. Feily, R. Ahmad, and S. Ramadass, "Architecture for Applying Data Mining and Visualization on Network Flow for Botnet Traffic Detection," in Proc. of the International Conference on Computer Technology and Development (ICCTD '09), Washington, DC, USA, 2009, pp. 33-37.
[4] C. Livadas, R. Walsh, D. Lapsley, and W. T. Strayer, "Using Machine Learning Techniques to Identify Botnet Traffic," in 2nd IEEE LCN Workshop on Network Security (WoNS '06), 2006, pp. 967-974.
[5] A. Sperotto et al., "An Overview of IP Flow-Based Intrusion Detection," IEEE Communications Surveys & Tutorials, vol. 12, no. 3, pp. 343-356, 2010.
[6] R. P. Lippmann et al., "Evaluating Intrusion Detection Systems: The 1998 DARPA Off-line Intrusion Detection Evaluation," in Proc. of the DARPA Information Survivability Conference and Exposition, 2000, pp. 12-26.
[7] J. McHugh, "Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 262-294, 2000.
[8] "KDD Cup 1999: General Information," 1999, http://www.sigkdd.org/kddcup/index.php?section=1999&method=info.
[9] A. Sperotto, R. Sadre, F. Vliet, and A. Pras, "A Labeled Data Set for Flow-Based Intrusion Detection," in Proc. of the 9th IEEE International Workshop on IP Operations and Management (IPOM '09), Berlin, Heidelberg, 2009, pp. 39-50.
[10] C. Gates and C. Taylor, "Challenging the anomaly detection paradigm: a provocative discussion," in Proc. of the 2006 workshop on New security paradigms (NSPW '06), New York, NY, USA, 2007, pp. 21-29.
[11] R. Sommer and V. Paxson, "Outside the closed world: On using machine learning for network intrusion detection," in IEEE Symposium on Security and Privacy (S&P), May 2010, pp. 305-316.
[12] B. Schölkopf et al., "Estimating the Support of a High-Dimensional Distribution," Neural Computation, vol. 13, no. 7, pp. 1443-1471, 2001.
[13] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[14] L. Khan, M. Awad, and B. Thuraisingham, "A new intrusion detection system using support vector machines and hierarchical clustering," The VLDB Journal, vol. 16, no. 4, pp. 507-521, 2007.
[15] "Grml," http://www.grml.org.
[16] "tcpdump / libpcap," http://www.tcpdump.org.
[17] "softflowd," http://www.mindrot.org/projects/softflowd/.
[18] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, "A Practical Guide to Support Vector Classification," 2000.
[19] Q.-Z. Yao, J. Cai, and J.-L. Zhang, "Simultaneous Feature Selection and LS-SVM Parameters Optimization Algorithm Based on PSO," in Proc. of the WRI World Congress on Computer Science and Information Engineering (CSIE '09), Washington, DC, USA, 2009, pp. 723-727.
[20] S.-W. Lin, T.-Y. Tseng, S.-C. Chen, and J.-F. Huang, "A SA-Based Feature Selection and Parameter Optimization Approach for Support Vector Machine," in IEEE International Conference on Systems, Man and Cybernetics (SMC '06), vol. 4, 2006, pp. 3144-3145.
