Anomaly Detection by Using CFS Subset and Neural Network with WEKA Tools


J. Jabez, S. Gowri, S. Vigneshwari, J. Albert Mayan
and Senduru Srinivasulu

Abstract An intrusion detection system (IDS) is a software application or device that
monitors network or system activities for policy violations or malicious activities and
reports them to a management system. The principal focus of intrusion detection and
prevention systems (IDPS) is to identify possible incidents, log information about them,
and report intrusion attempts. Organizations also use IDPS for other purposes, such as
identifying problems with security policies, documenting existing threats, and deterring
individuals from violating security policies. In this paper, anomalies are identified
using enhanced correlation-based feature selection (CFS), a subset technique, combined
with an extreme learning machine, a multilayer perceptron, and feature selection. The
scope of this project is to identify anomalies at an early stage and to increase the
accuracy of detection.

Keywords IDS · Feature selection · Anomaly · Multilayer perceptron · ELM

1 Introduction

In today's networked environment, many malicious activities are present in systems. An
intrusion detection system (IDS) recognizes malicious activities inside and outside of
the system. Securing systems from intrusions or attacks is getting harder day by day, as
intrusions are highly advanced and spreading rapidly across networks. The risk of
information loss, hacking, and intrusion has been increasing with the growing number of
Internet users [1–4].

J. Jabez (✉) · S. Gowri · S. Vigneshwari · J. Albert Mayan · S. Srinivasulu
School of Computing, Sathyabama Institute of Science & Technology, Chennai, Tamil Nadu, India

© Springer Nature Singapore Pte Ltd. 2019
S. C. Satapathy and A. Joshi (eds.), Information and Communication Technology
for Intelligent Systems, Smart Innovation, Systems and Technologies 107,
https://doi.org/10.1007/978-981-13-1747-7_66

The alerts generated across integrated networks help reduce the damage when an intrusion
is detected [5]. The multilayer perceptron approach, which aims to improve detection
accuracy for low-frequency attacks and detection stability, has two stages: training with
large normal datasets and testing with intrusion datasets [6, 7]. The neural network (NN)
is an important machine learning paradigm for solving complex real-world problems and has
been applied in IDS [8, 9]. However, two shortcomings limit neural-network-based IDS:
(i) lower detection precision, mainly for low-frequency attacks, and (ii) poor stability
of anomaly detection.

2 Literature Survey

This section describes work in the area of network-based IDS (NIDS); most of the
detection work has relied on the KDD dataset. Rule-based expert systems and statistical
approaches are the two major approaches commonly used for intrusion detection [10, 11].
The detection rate of the attack remains at 78%, while the detection rate of other …
Haystack [12, 13] was later developed as a system to evaluate an intrusion detection
strategy based on user and anomaly techniques. The six types of intrusions are masquerade
attacks, attempted break-ins by unauthorized users, malicious use, leakage, denial of
service, and penetration of the security control system. The normal profile results from
exploring call sequences, relating intrusion detection to defense mechanisms in the human
system.
An attack in this framework is treated as a deviation of a sequence from the normal
profile sequences. The framework works offline on previously collected data and applies a
view table algorithm (VTA) to learn program profiles [14, 15].

3 Intrusion Detection System (IDS)

Intrusion detection is the process of monitoring and analyzing the activities occurring
in a system or network in order to detect signs of security problems. There are two main
IDS techniques: anomaly detection and misuse detection. Anomaly detection tries to
identify behavior that does not conform to normal behavior, whereas misuse detection
attempts to match patterns and signatures of already known attacks in the network
traffic. The basic function of an IDS is to act as a passive alerting system. When an
intrusion is detected, the IDS generates an alert and provides all the relevant
information (time, IP packets, and so on) that triggered the alert [7, 8, 16].

Fig. 1 Proposed system architecture

Our principal aim is to develop an intrusion detection system (IDS) based on an anomaly
detection model that is accurate, hard to fool with small variations in patterns, low in
false alarms, adaptive, and real-time. Figure 1 depicts the proposed system architecture,
in which the intrusion packets are received from the Internet. First, the features are
extracted from the data packets and then sent to the proposed IDS [10]. The proposed IDS
then computes the distance between the extracted features and the trained model. Here,
the trained model consists of large datasets in a distributed storage environment, which
improves the performance of the intrusion detection system. Finally, if the outlier value
is greater than the predefined threshold, the system raises an alarm.
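
The distance-and-threshold decision described above can be illustrated with a minimal Python sketch. The paper does not specify the distance function or how the trained model is stored, so the Euclidean distance, the nearest-neighbor outlier value, the toy `trained_model`, and the threshold of 1.0 are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def outlier_value(features, trained_model):
    """Distance from `features` to the nearest vector in the trained model."""
    return min(euclidean(features, ref) for ref in trained_model)

def raises_alarm(features, trained_model, threshold):
    """True when the outlier value exceeds the predefined threshold."""
    return outlier_value(features, trained_model) > threshold

# Hypothetical trained model: feature vectors of normal connections.
trained_model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

print(raises_alarm((0.2, 0.1), trained_model, threshold=1.0))  # False
print(raises_alarm((5.0, 5.0), trained_model, threshold=1.0))  # True
```

A vector close to any known normal record stays below the threshold, while a distant vector triggers the alarm. The threshold itself would have to be tuned on labeled data.
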
WEKA Tool. WEKA stands for Waikato Environment for Knowledge Analysis; it is a Java-based,
freely available, and widely used machine learning software package. It supports several
standard data mining tasks, such as clustering, data preprocessing, regression, feature
selection, visualization, and classification [11]. WEKA helps uncover hidden information
in file systems and databases through a visual interface and simple options.
Correlation-based feature selection (CFS): CFS is one of the simplest feature selection
methods. It relies on the assumption that features are conditionally independent given
the class, and it evaluates feature subsets under this hypothesis [12]. A good feature
subset is one whose features are highly correlated with the class yet uncorrelated with
each other. One benefit of CFS is that it is a filter-based algorithm, which makes it
significantly faster than a wrapper selection method because it does not have to invoke
a learning algorithm.
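
The trade-off between class correlation and inter-feature correlation is commonly scored with the merit heuristic from Hall's formulation of CFS, merit = k·r_cf / sqrt(k + k(k − 1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The paper does not state this formula, so applying it here is an assumption; the function name and inputs below are illustrative.

```python
import math

def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a subset (Hall's heuristic, assumed here).

    feature_class_corr: one correlation with the class per selected feature.
    feature_feature_corr: one correlation per pair of selected features.
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corr) / k
    r_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)

# Two equally relevant features: uncorrelated vs. fully redundant.
independent = cfs_merit([0.8, 0.8], [0.0])
redundant = cfs_merit([0.8, 0.8], [1.0])
print(independent > redundant)
```

With fully redundant features the merit falls back to that of a single feature, which is why the heuristic favors subsets whose members are uncorrelated with each other.
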

4 Proposed Algorithm

To overcome the existing problems, we propose a technique that combines the CFS subset
algorithm and a neural network using WEKA tools. The CFS subset algorithm performs
feature selection, one of the most frequently used and important techniques. Feature
selection identifies and removes unnecessary and irrelevant features. The measure applied
to a feature or attribute is a correlation coefficient.
CFS Subset Algorithm. Feature selection is a process that selects the relevant features
as a subset of the original feature set. It is one of the most frequently used and
important techniques in data preprocessing for data mining. Its goal is to identify and
remove unnecessary and irrelevant features [17]. There are two types of learning,
supervised and unsupervised, and feature selection can be applied to both. The optimality
of a feature subset is measured by an evaluation criterion. As the dimensionality of a
domain expands, the number of features N increases. Finding an optimal feature subset is
generally intractable, and many problems related to feature selection have been shown to
be NP-hard. A typical feature selection process consists of four stages: (i) subset
generation, (ii) subset evaluation, (iii) stopping criterion, and (iv) result validation.
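
The first three stages can be sketched as a greedy forward search in Python. This is only one possible search strategy, not necessarily the one used in the paper; the feature names, the `RELEVANCE` values, the redundancy penalty, and the `toy_score` criterion are hypothetical. Stage (iv), result validation, would be done separately on held-out data and is outside the sketch.

```python
def forward_select(features, evaluate):
    """Greedy forward search over feature subsets."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # (i) subset generation and (ii) subset evaluation:
        # try adding each remaining feature and score the candidate subset.
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        # (iii) stopping criterion: stop when no candidate improves the score.
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Hypothetical relevance of each feature and one redundant pair.
RELEVANCE = {"duration": 0.6, "src_bytes": 0.5, "dst_bytes": 0.1}
REDUNDANT_PAIRS = {frozenset(("src_bytes", "dst_bytes"))}

def toy_score(subset):
    """Toy criterion: reward relevance, penalize redundant pairs."""
    relevance = sum(RELEVANCE[f] for f in subset)
    redundant = sum(
        1
        for i, a in enumerate(subset)
        for b in subset[i + 1:]
        if frozenset((a, b)) in REDUNDANT_PAIRS
    )
    return relevance - 0.3 * redundant

subset, score = forward_select(RELEVANCE, toy_score)
print(subset)  # ['duration', 'src_bytes']
```

The search stops before adding `dst_bytes` because its small relevance does not offset the redundancy penalty. A greedy search like this avoids enumerating all 2^N subsets, at the cost of possibly missing the optimal one.
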
Another technique is the neural network, for which three methods are used: multilayer
perceptron, logistic regression, and extreme learning machine. The multilayer perceptron
(MLP) is used to train the neural network [13]. Logistic regression is a form of
regression analysis used to predict the outcome of a categorical dependent variable from
predictor variables. It is used to estimate the empirical values of the parameters in a
qualitative response model. It also measures the relationship between the dependent
variable and the independent variables, and it can be binomial or multinomial. A
well-known attribute measure is the linear correlation coefficient, for which the formula
is given below
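
\[ \mathrm{Correlation}(r) = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} \tag{1} \]

\[ H(Y) = -\sum_{y \in R_y} p(y)\,\log p(y) \tag{2} \]

\[ H(Y/X) = -\sum_{x \in R_x} p(x) \sum_{y \in R_y} p(y/x)\,\log p(y/x) \tag{3} \]

\[ C(Y/X) = \frac{H(Y) - H(Y/X)}{H(Y)} \tag{4} \]

where X and Y are the two features/attributes, N is the number of observations, and
R_x and R_y are the sets of values taken by X and Y.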
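
As a worked illustration of Eqs. (1)–(4), the following Python sketch computes the linear correlation coefficient and the entropy-based measure C(Y/X) from raw samples. Probabilities are estimated by relative frequencies and the logarithm is taken to base 2; both choices are assumptions, since the paper does not state them, and the function names are illustrative.

```python
import math
from collections import Counter

def pearson_r(xs, ys):
    """Linear correlation coefficient, Eq. (1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )

def entropy(ys):
    """H(Y), Eq. (2), with p(y) estimated by relative frequency."""
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())

def conditional_entropy(xs, ys):
    """H(Y/X), Eq. (3): entropy of Y within each value of X, weighted by p(x)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum((len(g) / n) * entropy(g) for g in groups.values())

def uncertainty_coefficient(xs, ys):
    """C(Y/X), Eq. (4): fraction of the entropy of Y explained by X."""
    h_y = entropy(ys)
    return (h_y - conditional_entropy(xs, ys)) / h_y

labels = ["normal", "normal", "attack", "attack"]
print(pearson_r([1, 2, 3], [2, 4, 6]))                 # 1.0
print(uncertainty_coefficient([0, 0, 1, 1], labels))   # 1.0
print(uncertainty_coefficient([0, 1, 0, 1], labels))   # 0.0
```

A feature that fully determines the label gives C(Y/X) = 1, and a feature that carries no information about the label gives 0. The ratio in Eq. (4) does not depend on the base of the logarithm.
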


The multilayer perceptron (MLP) uses back-propagation to learn a set of weights for
predicting the class label, where the class label marks each connection as an attack or
as normal. For better results, we reduce the training time of the neural network by
keeping the size of the input small.

4.1 Algorithm for MLP

Step 1 Provide the input data in attribute-relation file format (ARFF). We use the WEKA
toolbox for the MLP to compute the activation of every unit, denoted 'a' and 'u'.
Step 2 Compute the output error term for every tuple:

\[ \delta_i(t) = \bigl(d_i(t) - y_i(t)\bigr)\, g'\bigl(a_i(t)\bigr) \]

Step 3 Back-propagate the errors to the hidden layer:

\[ \partial_i(t) = g'\bigl(u_i(t)\bigr) \sum_k \delta_k(t)\, w_{ki} \]

Step 4 Calculate the updated weights:

\[ v_{ij}(t+1) = v_{ij}(t) + \eta\, \partial_i(t)\, x_j(t) \]

\[ w_{ij}(t+1) = w_{ij}(t) + \eta\, \delta_i(t)\, z_j(t) \]
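
Steps 2–4 can be traced in a small Python sketch of one back-propagation update for an MLP with a single hidden layer. It assumes a sigmoid activation g, no bias terms, and, as in standard back-propagation, updates the output weights w with the output error term and the hidden weights v with the back-propagated hidden error term. The network size, initial weights, and learning rate are hypothetical, and WEKA's own MLP implementation is not reproduced here.

```python
import math

def g(s):
    """Sigmoid activation (an assumption; the paper does not name g)."""
    return 1.0 / (1.0 + math.exp(-s))

def g_prime(s):
    """Derivative of the sigmoid."""
    return g(s) * (1.0 - g(s))

def train_step(x, d, v, w, eta):
    """One update. v[i][j]: input j -> hidden i; w[i][j]: hidden j -> output i."""
    # Forward pass: hidden activations u, hidden outputs z,
    # output activations a, network outputs y.
    u = [sum(v_ij * x_j for v_ij, x_j in zip(v_i, x)) for v_i in v]
    z = [g(u_i) for u_i in u]
    a = [sum(w_ij * z_j for w_ij, z_j in zip(w_i, z)) for w_i in w]
    y = [g(a_i) for a_i in a]
    # Step 2: output error terms.
    delta = [(d_i - y_i) * g_prime(a_i) for d_i, y_i, a_i in zip(d, y, a)]
    # Step 3: back-propagate the errors to the hidden layer.
    hidden = [
        g_prime(u_i) * sum(delta[k] * w[k][i] for k in range(len(w)))
        for i, u_i in enumerate(u)
    ]
    # Step 4: weight updates with learning rate eta.
    new_v = [[v[i][j] + eta * hidden[i] * x[j] for j in range(len(x))]
             for i in range(len(v))]
    new_w = [[w[i][j] + eta * delta[i] * z[j] for j in range(len(z))]
             for i in range(len(w))]
    error = 0.5 * sum((d_i - y_i) ** 2 for d_i, y_i in zip(d, y))
    return new_v, new_w, error

# Hypothetical 2-input, 2-hidden, 1-output network trained on one tuple.
v = [[0.1, -0.2], [0.3, 0.4]]
w = [[0.2, -0.1]]
errors = []
for _ in range(200):
    v, w, err = train_step([1.0, 0.0], [1.0], v, w, eta=0.5)
    errors.append(err)
print(errors[0] > errors[-1])
```

Repeating the update on the same tuple drives the squared error down, which is the behavior the weight-update rule in Step 4 is meant to produce.
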

5 Results and Discussion

In our study, a dataset is extracted and a number of experiments are performed on it to
measure the IDS performance. The experiments were carried out on the following
configuration: Windows 7 with an Intel Pentium(R) CPU G2020 running at 2.90 GHz.
The extracted dataset includes training data of about two thousand connection records
and test data of five thousand connection records. In addition, the dataset includes a
set of forty-one derived features for every connection and a label that identifies
whether the connection record is of normal type or attack type. The features, which
comprise symbolic, discrete, and continuous variables, fall into four groups:
1. The first group contains the intrinsic features of a TCP connection, such as
connection duration, type of network service (telnet, http), and protocol type (UDP, TCP).
2. The second group contains the content features within a connection, suggested by
domain knowledge, and is used to assess the payload content of the TCP packets (such as
the number of failed login attempts).
3. The same-host features examine the connections established in the previous two
seconds that have the same target host as the current connection, and compute statistics
related to protocol, service, behavior, etc.
4. The same-service features examine the connections in the previous two seconds that
have the same service as the current connection.

Fig. 2 Big dataset size versus execution time

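
The two-second window behind the same-host and same-service features described above can be illustrated with a short Python sketch. The `Connection` record, its field names, and the choice to count only earlier connections (excluding the current one) are assumptions for illustration; the records are assumed to be sorted by time.

```python
from collections import namedtuple

# Hypothetical minimal connection record.
Connection = namedtuple("Connection", "time dst_host service")

def window_counts(connections, window=2.0):
    """For each connection, count earlier connections within `window` seconds
    that share its destination host and, separately, its service."""
    results = []
    for i, cur in enumerate(connections):
        recent = [c for c in connections[:i] if cur.time - c.time <= window]
        same_host = sum(1 for c in recent if c.dst_host == cur.dst_host)
        same_service = sum(1 for c in recent if c.service == cur.service)
        results.append((same_host, same_service))
    return results

conns = [
    Connection(0.0, "10.0.0.5", "http"),
    Connection(0.5, "10.0.0.5", "telnet"),
    Connection(1.0, "10.0.0.9", "http"),
    Connection(3.5, "10.0.0.5", "http"),
]
print(window_counts(conns))  # [(0, 0), (1, 0), (0, 1), (0, 0)]
```

The last record has no counts because every earlier connection is more than two seconds old. The quadratic scan is fine for a sketch; a sliding window would be preferable on large traces.
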
Figure 2 gives an overview of the execution times for various dataset sizes. The proposed
intrusion detection system takes less execution time at every size than other existing
machine learning approaches. This is because it uses less training data, which keeps the
distance computation between the training and testing datasets simple.
Figure 3 shows the anomaly detection rate in the computer network. The proposed intrusion
detection system identifies almost all types of attacks, such as Probe, DoS, U2R, and
R2L. The anomaly detection rate depends on the outlier values of the testing data. If the
propagation value increases, the dataset is treated as an intrusion dataset.
Figure 4 shows a graphical comparison of CPU utilization levels for various dataset
sizes. In the machine learning approaches, CPU utilization is very high compared with the
proposed approach. Most published work applies machine learning approaches only with the
help of large quantities of training data and training functions. In our proposed
approach, we use only limited datasets to train the proposed IDS.

Fig. 3 Big dataset size versus anomaly detection

Fig. 4 Big dataset size versus CPU utilization

6 Conclusion

This work proposed a new approach that combines the CFS subset algorithm with a neural
network, using the MLP, logistic regression, and extreme learning machine (ELM), to
identify intrusions in computer networks. Our training model contains two large datasets
in a distributed environment, which improves the intrusion detection process. Machine
learning approaches identify intrusions in computer networks at a considerable cost in
execution time and storage. Compared with existing IDS techniques, the proposed IDS takes
less time for execution and for storing the test dataset. In this study, the proposed IDS
performs better than other existing machine learning approaches and can detect anomalous
data in computer networks effectively. In future, the proposed work could be extended
with several distance computation functions between the testing and training data. Our
research work can thus help improve the efficiency of IDS.

References

1. Tsai, C.-F., Hsu, Y.-F., Lin, C.-Y., Lin, W.-Y.: Intrusion detection by machine learning: a review.
Expert Syst. Appl. (2009)
2. Garg, T., Khurana, S.S.: IEEE International Conference on Recent Advances and Innovations
in Engineering (ICRAIE-2014), 09–11 May 2014
3. Shambhu, J.P., Upadhyaya, J., Venugopal Govindaraju, F.F.: Proceedings of the 20th
International Conference on Data Engineering. IEEE (2004)
4. Kumar, G., Kumar, K., Sachdeva, M.: The Use of Artificial Intelligence Based Techniques
for Intrusion Detection: A Review. Published online: 4 September 2010 © Springer
Science+Business Media (2010)
5. Lin, C.C., Wang, M.-S.: Genetic-clustering algorithm for intrusion detection system. Int. J. Inf.
Comput. Secur. 2, 218–234 (2008)
6. Raj, A., ArunPrasath, R., Vigneshwari, S.: Efficient mechanism for sharing private data in a
secured manner. In: 2016 International Conference on Circuit, Power and Computing
Technologies (ICCPCT), pp. 1–4, Mar 2016
7. Mukherjee, S., Sharma, N.: Intrusion Detection using Naive Bayes Classifier with Feature
Reduction. Elsevier (2012)
8. Al-Jarrah, O.Y., Siddiqui, A., Elsalamouny, M., Yoo, P.D., Muhaidat, S., Kim, K.:
Machine-learning-based feature selection techniques for large-scale network intrusion
detection. In: 2014 IEEE 34th International Conference on Distributed Computing Systems
Workshops
9. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.: A detailed analysis of the KDD CUP 99 data
set. In: 2009 IEEE International Conference on Computational Intelligence for Security and
Defense Applications, pp. 53–58 (2009)
10. Liu, H., Setiono, R., Motoda, H., Zhao, Z.: Feature Selection: An Ever Evolving Frontier in
Data Mining. In: JMLR: Workshop and Conference Proceedings: The Fourth Workshop on
Feature Selection in Data Mining, vol. 10, pp. 4–13 (2010)
11. Lakshmi Praba, N., Nancy, V., Vigneshwari, S.: Mobile based privacy protected location based
services with three layer security. Int. J. Appl. Eng. Res. 10(4), 10101–10108 (2015). ISSN
0973-4562
12. Mitchell, T.M.: Machine Learning, 2nd edn., Chapter 1, pp. 1–17 (2010)
13. Lakhina, S., Joseph, S., Verma, B.: Feature reduction using principal component analysis for
effective anomaly-based intrusion detection on NSL-KDD. Int. J. Eng. Sci. Technol. 2(6),
1790–1799 (2010)
14. Gowri, S., Vigneshwari, S., Sathiyavathi, R., Lakshmi, T.R.K.: A framework for group decision
support system using cloud database for broadcasting earthquake occurrences. Adv. Intell. Syst.
Comput. 438, 611–615 (2016). ISBN: 978-981-10-0767-5
15. Dhanabal, L., Shantharajah, S.P.: A study on NSL-KDD dataset for intrusion detection system
based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 4(6) (2015)
16. Saranya, R., Gowri, S., Monisha, S., Vigneshwari, S.: An ontological approach for originating
data services with hazy semantics. Indian J. Sci. Technol. 9(23), 0974-5645 (2016)
17. Liu, H., Setiono, R., Motoda, H., Zhao, Z.: Feature selection: an ever evolving frontier in data
mining. In: JMLR: Workshop and Conference Proceedings, vol. 10, pp. 4–13 (2010)
