Detection of Cyber Attack in Network Using Machine Learning Techniques Final
SUBMITTED BY
G. HIMA HARSHA - 121710301017
G. PAVAN KUMAR - 121710301018
N. SURESH - 121710301033
C. RAKESH KUMAR - 121710301043
UNDER THE ESTEEMED GUIDANCE OF
Ms. Bhanu Sree
ASSOCIATE PROFESSOR
GITAM
ABSTRACT
TABLE OF CONTENTS
1.INTRODUCTION
1.1 Motivation
1.2 Existing System
1.3 Objective
1.4 Outcome
1.5 Applications
1.6 Structure of project
1.5.2 System Design
1.5.3 Implementation
1.5.4 Testing
1.5.5 Deployment of System and Maintenance
ABSTRACT
1.INTRODUCTION
Compared with the past, developments in computer and communication
technologies have brought extensive and advanced changes. The use of new
technologies gives great benefits to individuals, companies, and
governments; however, it also causes some problems for them, for
example the privacy of important information, the security of stored
data platforms, the availability of knowledge, and so on. Arising from
these problems, cyber terrorism is one of the most important issues
today. Cyber terror, which has caused a great deal of problems for
individuals and institutions, has reached a level that could threaten
public and national security through various groups such as criminal
organizations, professional persons and cyber activists. For this
reason, Intrusion Detection Systems (IDS) have been developed to guard
against cyber attacks. In this work, deep learning and support vector
machine (SVM) algorithms were used to detect port scan attempts based on
the new CICIDS2017 dataset, and accuracy rates of 97.80% and 69.79% were
achieved respectively. In addition to SVM, other algorithms such as
random forest, CNN and ANN can be applied; the accuracies obtained were
SVM 93.29%, CNN 63.52%, Random Forest 99.93% and ANN 99.11%.
1.1 MOTIVATION
The use of new technologies gives great benefits to individuals,
companies, and governments; however, it also causes some problems for
them, for example the privacy of important information, the security of
stored data platforms, the availability of knowledge, and so on. Arising
from these problems, cyber terrorism is one of the most important issues
today. Cyber terror, which has caused a great deal of problems for
individuals and institutions, has reached a level that could threaten
public and national security through various groups such as criminal
organizations, professional persons and cyber activists. For this
reason, Intrusion Detection Systems (IDS) have been developed to guard
against cyber attacks.
1.2 Objectives
The objective of this project is to detect cyber attacks using machine
learning algorithms such as:
• ANN
• CNN
• Random forest
1.3 Objectives
Fig: 1 Project SDLC
• Practical Implementation
This is the first and foremost stage of any project. As ours is an
academic project, for requirements gathering we followed IEEE
journals, collected many IEEE papers and finally selected a
paper titled “Individual web revisitation by setting and
substance importance input”. For the analysis stage we took
references from that paper, carried out a literature survey of
some papers and gathered all the requirements of the project in
this stage.
1.5.2 SYSTEM DESIGN
1.5.4 TESTING
UNIT TESTING
1. Data Collection
2. Data Pre-processing
4. Modeling
5. Predicting
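Each of the units listed above could be covered by a small test. The following is a minimal sketch using Python's unittest module. The stage functions collect_rows, preprocess and predict, the sample data and the threshold rule are hypothetical stand-ins and not the project's actual code; the modeling unit is left out because the model code is not shown in this report.

```python
import csv
import io
import math
import unittest

# Hypothetical sample in the spirit of a flow-feature CSV (not real dataset rows).
SAMPLE_CSV = (
    "duration,packets,label\n"
    "1.5,10,BENIGN\n"
    "Infinity,3,PortScan\n"
    "0.2,400,PortScan\n"
)


def collect_rows(text):
    """Data collection: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Data pre-processing: convert features to float and drop non-finite rows."""
    clean = []
    for row in rows:
        features = [float(row["duration"]), float(row["packets"])]
        if all(math.isfinite(v) for v in features):
            clean.append((features, row["label"]))
    return clean


def predict(features, packet_threshold=100.0):
    """Predicting: placeholder rule standing in for a trained model."""
    return "PortScan" if features[1] > packet_threshold else "BENIGN"


class PipelineUnitTests(unittest.TestCase):
    def test_data_collection(self):
        self.assertEqual(len(collect_rows(SAMPLE_CSV)), 3)

    def test_data_preprocessing_drops_infinite_values(self):
        self.assertEqual(len(preprocess(collect_rows(SAMPLE_CSV))), 2)

    def test_predicting(self):
        self.assertEqual(predict([0.2, 400.0]), "PortScan")
        self.assertEqual(predict([1.5, 10.0]), "BENIGN")


def run_pipeline_tests():
    """Run the suite programmatically and return True if every test passed."""
    suite = unittest.TestLoader().loadTestsFromTestCase(PipelineUnitTests)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Calling run_pipeline_tests() runs the three test cases and reports whether all of them passed.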
NON-FUNCTIONAL REQUIREMENT (NFR) specifies the quality
attributes of a software system. NFRs judge the software
system based on responsiveness, usability, security,
portability and other non-functional standards that are critical to
the success of the software system. An example of a
non-functional requirement is "how fast does the website load?"
Failing to meet non-functional requirements can result in
systems that fail to satisfy user needs. Non-functional
requirements allow you to impose constraints or restrictions on
the design of the system across the various agile
backlogs. For example, the site should load in 3 seconds when
the number of simultaneous users is > 10000. The description of
non-functional requirements is just as critical as that of a
functional requirement.
property rights, and so forth ought to be reviewed.
level programming subsystem
2.LITERATURE SURVEY
2.1 R. Christopher, “Port scanning techniques and the defense against
them,” SANS Institute, 2001.
Port scanning is one of the most popular techniques attackers use to
discover services that they can exploit to break into systems. All
systems that are connected to a LAN or the Internet via a modem run
services that listen on well-known and not so well-known ports. By
port scanning, the attacker can find the following information about
the targeted systems: what services are running, what users own those
services, whether anonymous logins are supported, and whether certain
network services require authentication. Port scanning is accomplished
by sending a message to each port, one at a time. The kind of response
received indicates whether the port is in use and can be probed for
further weaknesses. Port scanners are important to network security
technicians because they can reveal possible security vulnerabilities
on the targeted system. Just as port scans can be run against your
systems, port scans can be detected, and the amount of information
about open services can be limited, using the proper tools. Every
publicly available system has ports that are open and available for
use. The objective is to limit the exposure of open ports to authorized
users and to deny access to the closed ports.
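The probing of one port at a time described above can be illustrated with a TCP connect scan. This is a minimal sketch that assumes a full TCP connection attempt is used as the "message"; the function names are illustrative, and the demo only scans a listener it opens itself on 127.0.0.1. Scans should only be run against hosts you are authorized to test.

```python
import socket


def scan_ports(host, ports, timeout=0.5):
    """Try a TCP connection to each port in turn and report which ones accept it."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise
            results[port] = (sock.connect_ex((host, port)) == 0)
    return results


def demo():
    """Open a listener on an ephemeral local port and scan that single port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0 lets the OS choose a free port
        server.listen(1)
        open_port = server.getsockname()[1]
        return open_port, scan_ports("127.0.0.1", [open_port])
```

demo() returns the port chosen by the operating system together with the scan result for it, which marks that port as open because a listener is bound to it.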
network are portscanning it routinely. However, defenders will not
usually wish to hide their portscanning, while attackers will. For
definiteness, in the rest of this paper, we will speak of the attackers
scanning the network, and the defenders trying to detect the
scan. There are several legal/ethical debates about portscanning
which break out regularly on Internet mailing lists and newsgroups.
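One simple way a defender might flag a scan is to count how many distinct destination ports each source touches within a short time window. The sketch below assumes connection records given as (timestamp, source, destination port) tuples. The record format, the window length and the threshold are illustrative assumptions, and this is not the detection method of the cited paper.

```python
from collections import defaultdict, deque


def detect_port_scans(events, window_s=10.0, max_ports=20):
    """Return the sources that touch more than max_ports distinct ports
    within any window of window_s seconds.

    events: iterable of (timestamp_seconds, source_ip, destination_port),
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)   # source -> deque of (timestamp, port)
    flagged = set()
    for ts, src, port in events:
        q = recent[src]
        q.append((ts, port))
        # discard records that have fallen out of the time window
        while q and ts - q[0][0] > window_s:
            q.popleft()
        if len({p for _, p in q}) > max_ports:
            flagged.add(src)
    return flagged
```

A source that probes many ports in quick succession is flagged, while a source that repeatedly contacts a single port is not. A scan spread over a period much longer than the window would evade this simple counter.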
2.4 S. Aljawarneh, M. Aldwairi, and M. B. Yassein, “Anomaly-based
intrusion detection system through feature selection analysis and
building hybrid efficient model,” Journal of Computational Science,
vol. 25, pp. 152–160, 2018.
3. PROBLEM ANALYSIS
3.1 EXISTING APPROACH:
Naive Bayes and Principal Component Analysis (PCA) were used with the
KDD99 dataset by Almansob and Lomte [9]. Similarly, PCA, SVM, and
KDD99 were used by Chithik and Rabbani for IDS [10]. In Aljawarneh et
al.'s paper, the analysis and experiments were carried out on the
NSL-KDD dataset for their IDS model [11]. Literature studies show that
the KDD99 dataset is continually used for IDS [6]–[10]. There are 41
features in KDD99, and it was created in 1999. Consequently, KDD99 is
old and does not provide any information about up-to-date attack
types, for example zero-day exploits. Therefore, we used the
up-to-date CICIDS2017 dataset [12] in our study.
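Before any model is trained, flow records from a dataset such as CICIDS2017 have to be turned into numeric features and labels. The sketch below is a minimal loader under stated assumptions: that the data is a CSV with a label column, that column names may carry stray spaces, and that some numeric fields may hold Infinity or NaN. The column names and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
import math


def load_flows(csv_text, label_column="Label"):
    """Parse flow records and return (feature_names, X, y), with y = 1 for attack rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]  # headers may carry stray spaces
    label_idx = header.index(label_column)
    feature_names = [n for i, n in enumerate(header) if i != label_idx]
    X, y = [], []
    for row in reader:
        if len(row) != len(header):
            continue  # skip malformed lines
        try:
            values = [float(v) for i, v in enumerate(row) if i != label_idx]
        except ValueError:
            continue  # skip rows with non-numeric feature values
        if not all(math.isfinite(v) for v in values):
            continue  # skip rows containing Infinity or NaN
        X.append(values)
        y.append(0 if row[label_idx].strip().upper() == "BENIGN" else 1)
    return feature_names, X, y


# Made-up sample rows (hypothetical column names and values).
SAMPLE = (
    " Destination Port, Flow Duration,Flow Bytes/s, Label\n"
    "80,1200,350.5,BENIGN\n"
    "22,15,Infinity,PortScan\n"
    "443,40,NaN,BENIGN\n"
    "8080,3,2000.0,PortScan\n"
)
```

With the sample above, load_flows(SAMPLE) keeps the first and last rows and drops the two rows that contain non-finite values. Dropping such rows is only one option; replacing the values is another, and the report does not state which the project used.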
3.1.1 Drawbacks
1) Strict Regulations
3) Restrictive to assets
3.2.1 Advantages
3.3 Software And Hardware Requirements
SOFTWARE REQUIREMENTS
The functional requirements or the overall description documents
include the product perspective and features, operating system and
operating environment, graphics requirements, design constraints and user
documentation.
The specification of requirements and implementation constraints
gives a general overview of the project with regard to its areas of
strength and weakness and how to address them.
HARDWARE REQUIREMENTS
Minimum hardware requirements are very dependent on the particular
software being developed by a given Enthought Python / Canopy / VS
Code user. Applications that need to store large arrays/objects in
memory will require more RAM, whereas applications that need to
perform numerous calculations or tasks more quickly will require a
faster processor.
• Operating system : Windows, Linux
• Processor : minimum Intel i3
• RAM : minimum 4 GB
• Hard disk : minimum 250 GB
4. SYSTEM DESIGN
UML DIAGRAMS
The System Design Document describes the system requirements, operating
environment, system and subsystem architecture, files and database
design, input formats, output layouts, human-machine interfaces,
detailed design, processing logic, and external interfaces.
Actor: An actor represents the role a user plays with respect to the
system. An actor interacts with, but has no control over, the use
cases.
Fig: Diagram elements: Start, Localhost, Detection of Attack, Visualisation, End
CLASS DIAGRAM
Fig: Class diagram with the classes User and System
User: agriculture
Operations: Start(), Localhost(), Register & Login to Application(),
Real Time Malware Detection(), Data Stores in SQL(), User Add Data(),
Attack Classification based on model(), Detection of Attack(),
Visualisation(), end()
SEQUENCE DIAGRAM
Fig: Sequence diagram between User and System: Start, Localhost, User Add Data, Visualisation
5. IMPLEMENTATION
6. CODE
7. RESULTS AND DISCUSSIONS
Data preprocessing
Data EDA
ML Deploy
From the accuracy scores we conclude that Decision Tree (DT) and Random
Forest (RF) give the best accuracy, and we build a pickle file of the
model for predicting on user input.
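Saving the chosen model with pickle and loading it again at prediction time might look like the sketch below. To stay dependency-free, a small dictionary of rule parameters stands in for the fitted DT or RF model object. The file name, the feature name and the threshold are hypothetical.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize any picklable model object to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a previously saved model object. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_with(model, sample):
    """Apply the stand-in rule: flag a flow whose feature value exceeds the threshold."""
    return "attack" if sample[model["feature"]] > model["threshold"] else "benign"


def demo():
    # Stand-in for a fitted classifier (hypothetical parameters).
    model = {"feature": "packets_per_second", "threshold": 500.0}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        save_model(model, path)
        restored = load_model(path)
    return restored, predict_with(restored, {"packets_per_second": 900.0})
```

The same save_model and load_model helpers work for any picklable object, so a fitted classifier could be passed in place of the dictionary.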
Application
Localhost - run python app.py in cmd
Enter the input
Predict attack
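The application steps above (start the server on localhost, enter the input, predict the attack) could be served by a small web app. The report does not show app.py, so the sketch below is an assumption throughout. It uses the standard-library wsgiref server rather than whatever framework the project used, and the /predict route, the packets form field, the port number and the threshold rule are hypothetical placeholders for the pickled model.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical input form with a single numeric field.
FORM = b"""<form method="post" action="/predict">
Packets per second: <input name="packets">
<input type="submit" value="Predict attack">
</form>"""


def classify(packets, threshold=500.0):
    """Placeholder for the loaded model's prediction (hypothetical rule)."""
    return "Attack detected" if packets > threshold else "No attack detected"


def app(environ, start_response):
    """WSGI callable: POST /predict returns the prediction, anything else shows the form."""
    if environ["REQUEST_METHOD"] == "POST" and environ.get("PATH_INFO") == "/predict":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        fields = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        try:
            body = classify(float(fields["packets"][0])).encode("utf-8")
            status = "200 OK"
        except (KeyError, ValueError):
            body, status = b"Invalid input", "400 Bad Request"
    else:
        body, status = FORM, "200 OK"
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


def serve(port=5000):
    """Serve the app on localhost only; the port number is arbitrary."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Calling serve() starts the server, after which the form is reachable in a browser on the chosen localhost port. In a real deployment classify would call the model loaded from the pickle file.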
8. CONCLUSION
In this study, support vector machine, ANN, CNN, Random Forest and deep
learning algorithms based on the up-to-date CICIDS2017 dataset were
presented comparatively. Results show that the deep learning algorithm
performed significantly better than SVM, ANN, RF and CNN. In the future,
we intend to use not only port scan attempts but also other attack types
with machine learning and deep learning algorithms, together with Apache
Hadoop and Spark technologies, based on this dataset. All these
algorithms help us to detect cyber attacks in a network. Many attacks
have occurred over the years; when such attacks are recognized, the
feature values at which they occurred are stored in datasets. Using
these datasets, we predict whether a cyber attack has taken place or
not. These predictions can be made by four algorithms: SVM, ANN, RF and
CNN. This paper helps to identify which algorithm achieves the best
accuracy and therefore gives the best results in identifying whether a
cyber attack has happened or not.
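Choosing the algorithm with the best accuracy comes down to comparing each model's predictions with the true labels on held-out data. A minimal sketch follows; the labels and the per-model predictions are made-up toy values and not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(y_true, predictions):
    """Return (name, accuracy) of the model with the highest accuracy."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy example: 1 = attack, 0 = benign (illustrative values only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],   # 7 of 8 correct
    "model_b": [1, 1, 0, 1, 0, 1, 1, 0],   # 5 of 8 correct
}
```

Here best_model(y_true, predictions) selects model_a with an accuracy of 0.875. Accuracy alone can be misleading when attack and benign records are very unevenly represented, so precision and recall are often reported alongside it.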
FUTURE SCOPE
As an enhancement, we will add more ML algorithms to increase accuracy.
9. REFERENCES
[1] K. Graves, CEH: Official Certified Ethical Hacker Review Guide: Exam 312-50. John Wiley & Sons, 2007.
[2] R. Christopher, “Port scanning techniques and the defense against them,” SANS Institute, 2001.
[3] M. Baykara, R. Daş, and I. Karadoğan, “Bilgi güvenliği sistemlerinde kullanılan araçların incelenmesi,” in 1st International Symposium on Digital Forensics and Security (ISDFS13), 2013, pp. 231–239.
[4] S. Staniford, J. A. Hoagland, and J. M. McAlerney, “Practical automated detection of stealthy portscans,” Journal of Computer Security, vol. 10, no. 1-2, pp. 105–136, 2002.
[5] S. Robertson, E. V. Siegel, M. Miller, and S. J. Stolfo, “Surveillance detection in high bandwidth environments,” in DARPA Information Survivability Conference and Exposition, 2003. Proceedings, vol. 1. IEEE, 2003, pp. 130–138.
[6] K. Ibrahimi and M. Ouaddane, “Management of intrusion detection systems based-kdd99: Analysis with lda and pca,” in Wireless Networks and Mobile Communications (WINCOM), 2017 International Conference on. IEEE, 2017, pp. 1–6.
[7] N. Moustafa and J. Slay, “The significant features of the unsw-nb15 and the kdd99 data sets for network intrusion detection systems,” in Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), 2015 4th International Workshop on. IEEE, 2015, pp. 25–31.
[8] L. Sun, T. Anthony, H. Z. Xia, J. Chen, X. Huang, and Y. Zhang, “Detection and classification of malicious patterns in network traffic using benford’s law,” in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017. IEEE, 2017, pp. 864–872.
[9] S. M. Almansob and S. S. Lomte, “Addressing challenges for intrusion detection system using naive bayes and pca algorithm,” in Convergence in Technology (I2CT), 2017 2nd International Conference for. IEEE, 2017, pp. 565–568.
[10] M. C. Raja and M. M. A. Rabbani, “Combined analysis of support vector machine and principle component analysis for ids,” in IEEE International Conference on Communication and Electronics Systems, 2016, pp. 1–5.
[11] S. Aljawarneh, M. Aldwairi, and M. B. Yassein, “Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model,” Journal of Computational Science, vol. 25, pp. 152–160, 2018.
[12] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization.” in ICISSP, 2018, pp. 108–116.
[13] D. Aksu, S. Ustebay, M. A. Aydin, and T. Atmaca, “Intrusion detection with comparative analysis of supervised learning techniques and fisher score feature selection algorithm,” in International Symposium on Computer and Information Sciences. Springer, 2018, pp. 141–149.
[14] N. Marir, H. Wang, G. Feng, B. Li, and M. Jia, “Distributed abnormal behavior detection approach based on deep belief network and ensemble svm using spark,” IEEE Access, 2018.
[15] P. A. A. Resende and A. C. Drummond, “Adaptive anomaly-based intrusion detection system using genetic algorithm and profiling,” Security and Privacy, vol. 1, no. 4, p. e36, 2018.
[16] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning, vol. 20, no. 3, pp. 273–297, 1995.
[17] R. Shouval, O. Bondi, H. Mishan, A. Shimoni, R. Unger, and A. Nagler, “Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in sct,” Bone marrow transplantation, vol. 49, no. 3, p. 332, 2014.