AI Anomaly Detection in Network Traffic

This document presents a comparative study on detecting anomalies in computer networks using machine learning techniques. It discusses the limitations of traditional security systems and introduces three advanced AI models: Decision Trees, Autoencoders, and Generative Adversarial Networks (GANs), detailing their methodologies, strengths, and weaknesses. The conclusion emphasizes the need for continuous monitoring, data augmentation, and advanced analytics to enhance network security.

Uploaded by

konkasireesha11
Copyright
© All Rights Reserved

COMPARATIVE STUDY TO DETECT ANOMALIES IN COMPUTER NETWORK USING MACHINE LEARNING

This presentation will explore the use of AI models for detecting
anomalies in network traffic. We'll delve into the challenges of
traditional security systems, introduce three advanced AI models for
anomaly detection, and examine their strengths and limitations.

BATCH NO: C2
NAME OF THE GUIDE: Mr. D. Varun Pra

TEAM MEMBERS:

21H71A05F9 Yenuga Naga Venkata Sai Praveen Teja

21H71A05I6 Shaik Subhani

21H71A05F6 Sadam Naga Chaitanya

22H75A0514 Nagiredla Manoj Kumar


Table of Contents
1. Machine Learning Anomaly Detection in Computer Network
2. The Problem: Outsmarting Cyberattacks
3. Machine Learning Models for Anomaly Detection
4. Methodology: Training and Evaluation
5. Decision Trees: Interpretability and Efficiency
6. Autoencoders: Learning Normal Patterns
7. Generative Adversarial Networks (GANs): Synthetic Data Generation
8. Discussion: Trade-offs and Considerations
9. UML Diagrams
10. References
11. Conclusion
The Problem: Outsmarting Cyberattacks

Traditional Systems' Shortcomings

Legacy security systems, often based on signature-based detection, rely on identifying known threats. This approach is inherently reactive, struggling to adapt to new or zero-day attacks. The sheer volume of network traffic makes it challenging to analyze every packet effectively, leading to many threats slipping through. Furthermore, these systems are frequently inflexible and struggle to adapt to the ever-changing tactics used by cybercriminals. This results in a slower response time and increased vulnerability to sophisticated attacks.

The Need for Advanced Techniques

The limitations of traditional security measures necessitate a move towards proactive and adaptive anomaly detection. Advanced techniques, powered by Artificial Intelligence (AI), offer the potential to identify previously unseen threats by focusing on deviations from established patterns of normal behavior. AI models can learn and adapt to evolving threat landscapes, offering superior protection against a wider range of attacks. This ability to proactively identify anomalies is essential in securing networks in today's ever-evolving cyber threat landscape.
Machine Learning Models for Anomaly Detection
Decision Trees
Supervised (used when labels are available). Interpretable and efficient for classification tasks, but may struggle with complex data. In this study, an XGBoost model is used for detection.

Autoencoders
Unsupervised (used when the data has no labels). Neural networks that learn patterns in normal data to detect deviations.

Generative Adversarial Networks (GANs)
Unsupervised (used when the data has no labels). Neural networks that can generate synthetic data to learn complex patterns, but training GANs can be computationally expensive.
Methodology: Training and Evaluation
The methodology outlines the approach to training and evaluating the three anomaly detection methods. Each approach has its unique characteristics:
1. XGBoost: provides strong performance with labeled data and offers interpretability through feature importance.
2. GAN: is more flexible and can learn complex patterns in the data without requiring full labels.
3. Autoencoder: is most suitable for scenarios with limited or no labels, using reconstruction error as an anomaly signal.

1. XGBoost (Supervised):
• Training:
   • GPU-accelerated with 'gpu_hist'
   • Learning rate: 0.1
   • 10 training rounds
   • Binary and multiclass variants
• Evaluation:
   • ROC-AUC score
   • Confusion matrix
   • Feature importance

2. GAN (Semi-Supervised):
• Training:
   • Generator: 5 layers (64→512 neurons)
   • Discriminator: 5 layers (256→1 neurons)
   • Batch size: 512
   • Learning rate: 0.00001
   • 10 epochs
• Evaluation:
   • Discriminator scores
   • 1st percentile threshold for anomalies

3. Autoencoder (Unsupervised):
• Training:
   • Encoder: 114→4 dimensions
   • Decoder: 4→114 dimensions
   • Batch size: 512
   • Adam optimizer
   • 10 epochs
• Evaluation:
   • Reconstruction error
   • Statistical threshold (μ + 5σ)
   • KMeans clustering of anomalies

Common Elements:
• MinMaxScaler normalization
• 75-25 train-test split
• ROC curves and confusion matrices
• Precision, recall, F1-score metrics
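The common elements listed above (min-max normalization, a 75-25 split, and precision/recall/F1) can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names are hypothetical, and fitting the scaler on the training rows only is an assumption about how the normalization was applied.

```python
import random


def fit_min_max(rows):
    """Record the per-column minimum and maximum of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Scale each column to [0, 1]; a constant column maps to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split_75_25(rows, labels, seed=0):
    """Shuffle the indices and keep 75% for training, 25% for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(0.75 * len(indices))
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def precision_recall_f1(y_true, y_pred):
    """Metrics for the positive (anomaly = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Test rows that fall outside the training range will scale to values below 0 or above 1, which is the usual behavior when the scaler is fitted on the training split alone.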
Decision Trees with XGBoost

XGBoost
Optimized version of gradient boosted trees (GBT), incorporating parallelism, tree pruning, and regularization.

Gradient Boosting
Uses gradient descent to minimize errors in the sequentially built trees.

Boosting
Trees are built sequentially, each one minimizing the errors of the previous trees, with better-performing trees weighted more heavily.

Random Forest
Uses random subsets of a dataset to build multiple decision trees.

Bagging
Ensemble of multiple decision trees that arrives at a decision through majority voting.

Decision Trees
Tree-based algorithm that outputs decisions based on certain conditions.
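The idea behind boosting and gradient boosting can be shown with a toy example. The sketch below is not XGBoost: it is a simplified stand-in that fits one-split trees (stumps) on a single feature, each one trained on the residuals left by the previous trees, with squared-error loss assumed. The defaults of 10 rounds and a learning rate of 0.1 mirror the settings in the methodology; all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:  # the largest value leaves nothing on the right side
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    if best is None:  # every x is identical, so no split is possible
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, rounds=10, learning_rate=0.1):
    """Build stumps sequentially, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict_boosted(model, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)
```

Each extra round shrinks the training error a little, which is the "sequentially minimizing errors from previous trees" behavior described above. XGBoost adds regularization, pruning, and parallel split finding on top of this basic loop.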
Autoencoders: Learning Normal Patterns
1. Strengths
Can detect anomalies in high-dimensional data.

2. Unsupervised Learning
Does not require labeled data for training.

3. Limitations
May be sensitive to noise in the data.
Generative Adversarial Networks (GANs): Synthetic Data Generation
Strengths
Can learn complex, high-dimensional features for improved anomaly detection.

Data Augmentation
Generating synthetic data to enhance training data.

Limitations
Training GANs can be computationally expensive and unstable.
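For the GAN, the methodology scores samples with the discriminator and applies a 1st percentile threshold. The sketch below shows one way to read that rule. It assumes the discriminator outputs higher scores for normal-looking traffic, so scores below the 1st percentile of normal scores are flagged; that direction is an assumption, as are the function names and the linear-interpolation percentile.

```python
def percentile(values, q):
    """Percentile with linear interpolation, for q between 0 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def fit_score_threshold(normal_scores, q=1.0):
    """Use the q-th percentile of scores on normal traffic as the cutoff."""
    return percentile(normal_scores, q)


def flag_by_score(scores, threshold):
    """True marks a sample the discriminator rates below the cutoff."""
    return [s < threshold for s in scores]
```

With this rule, roughly 1% of normal traffic is expected to fall below the cutoff, which sets the false-alarm rate on normal data by construction.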
Discussion: Trade-offs and Considerations
Complexity
GANs offer the most complex model, while Decision Trees are relatively simple.

Computational Requirements
GANs demand significant computational resources, while Decision Trees are more efficient.

Detection Performance
GANs generally achieve the best performance, but other models may be suitable depending on the specific requirements.
UML Diagrams
A UML diagram is a way to visualize systems and software using Unified
Modeling Language (UML). Software engineers create UML diagrams to
understand the designs, code architecture, and proposed implementation of
complex software systems. UML diagrams are also used to model workflows
and business processes.
The most used UML Diagrams are as follows:
1. Use Case Diagram
2. Class Diagram
3. Sequence Diagram
4. Activity Diagram
Use Case Diagram
1. Supervised Learning:
• Uses XGBoost with GPU acceleration
• Implements both binary (normal vs anomaly) and multi-class classification
• Features complete labeled data usage
• Strong performance metrics with explicit anomaly labeling

2. Semi-Supervised Learning:
• Implements a GAN (Generative Adversarial Network)
• Generator creates synthetic normal network traffic
• Discriminator learns to distinguish normal vs anomalous traffic
• Trains primarily on normal traffic data
• Uses threshold-based anomaly detection

3. Unsupervised Learning:
• Uses an autoencoder architecture
• Learns normal network traffic patterns
• Detects anomalies through reconstruction error
• Implements KMeans clustering for anomaly categorization
• Features dimensionality reduction through the latent space

Each approach has its advantages:
• XGBoost is best when you have labeled data
• GAN works well with partially labeled data
• Autoencoder is ideal when you don't have labeled data
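The guidance on matching each approach to label availability can be written as a small selection rule. This is only an illustration: the function name, the return strings, and the use of a labeled fraction as the input are all hypothetical, not part of the project.

```python
def choose_detector(labeled_fraction):
    """Map the share of labeled samples (0.0 to 1.0) to an approach."""
    if not 0.0 <= labeled_fraction <= 1.0:
        raise ValueError("labeled_fraction must be between 0 and 1")
    if labeled_fraction == 1.0:
        return "xgboost"      # fully labeled data: supervised
    if labeled_fraction > 0.0:
        return "gan"          # partially labeled data: semi-supervised
    return "autoencoder"      # no labels: unsupervised
```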
Class Diagram
The class diagram shows the main components of the system:
1. DataPreprocessor:
• Handles all data preprocessing tasks
• Includes scaling, encoding, and data splitting
• Contains methods for reducing anomalies in the dataset

2. XGBoostDetector:
• Implements supervised learning approach
• Manages model training and prediction
• Includes binary and multiclass classification capabilities

3. GANDetector with Generator and Discriminator:
• Implements semi-supervised learning
• Generator creates synthetic normal samples
• Discriminator learns to distinguish normal vs anomalous patterns
• Contains methods for training and anomaly detection

4. Autoencoder with EncoderNetwork and DecoderNetwork:
• Implements unsupervised learning
• EncoderNetwork compresses data to latent space
• DecoderNetwork reconstructs data from latent space
• Uses reconstruction error for anomaly detection

5. AnomalyEvaluator:
• Common evaluation methods for all approaches
• Includes ROC curves, confusion matrices
• Handles visualization of results

Key relationships:
• All detection classes depend on DataPreprocessor
• Each detector uses an AnomalyEvaluator for performance metrics
• GANDetector contains Generator and Discriminator
• Autoencoder contains EncoderNetwork and DecoderNetwork
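The relationships in the class diagram can be sketched as a Python skeleton. The class names come from the diagram; everything else is an assumption made for illustration, including the `BaseDetector` helper (not in the diagram), the field names, and the empty network classes that stand in for real models.

```python
from dataclasses import dataclass, field


class DataPreprocessor:
    """Placeholder: scaling, encoding and splitting would live here."""

    def transform(self, rows):
        return [[float(v) for v in row] for row in rows]


class AnomalyEvaluator:
    """Shared evaluation used by every detector."""

    def confusion_matrix(self, y_true, y_pred):
        counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
        for truth, pred in zip(y_true, y_pred):
            key = ("t" if truth == pred else "f") + ("p" if pred == 1 else "n")
            counts[key] += 1
        return counts


class Generator: ...
class Discriminator: ...
class EncoderNetwork: ...
class DecoderNetwork: ...


@dataclass
class BaseDetector:
    """Hypothetical base: every detector depends on these two components."""
    preprocessor: DataPreprocessor
    evaluator: AnomalyEvaluator


@dataclass
class XGBoostDetector(BaseDetector):
    multiclass: bool = False  # binary or multiclass variant


@dataclass
class GANDetector(BaseDetector):
    generator: Generator = field(default_factory=Generator)
    discriminator: Discriminator = field(default_factory=Discriminator)


@dataclass
class Autoencoder(BaseDetector):
    encoder: EncoderNetwork = field(default_factory=EncoderNetwork)
    decoder: DecoderNetwork = field(default_factory=DecoderNetwork)
```

Passing the same `DataPreprocessor` and `AnomalyEvaluator` instances to all three detectors keeps preprocessing and metrics identical across approaches, which is what makes the comparison fair.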
Sequence Diagram
The sequence diagram shows three main pathways:
1. Supervised Learning Path (XGBoost):
• Loads preprocessed data
• Initializes XGBoost with GPU parameters
• Trains on labeled data
• Makes predictions and evaluates

2. Semi-Supervised Learning Path (GAN):
• Initializes Generator and Discriminator networks
• Iterative training process between networks
• Generator creates synthetic normal samples
• Discriminator learns to detect anomalies
• Final anomaly detection and evaluation

3. Unsupervised Learning Path (Autoencoder):
• Initializes autoencoder network
• Trains on normal data only
• Encodes and decodes data
• Uses reconstruction error for detection
• Clusters detected anomalies
• Evaluates results

Common Elements:
• All paths start with data preprocessing
• Each method uses the evaluator component
• Final results include ROC curves and confusion matrices
• Client interacts primarily with preprocessor and gets final results

The flow shows how each method:
1. Receives preprocessed data
2. Processes it differently based on the approach
3. Performs anomaly detection
4. Gets evaluated using common metrics
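All three paths end with the same evaluation, which includes ROC curves. The area under the ROC curve can be computed directly from anomaly scores, as in the sketch below. It uses the pairwise ranking definition of AUC and assumes that a higher score means "more anomalous"; the function name is hypothetical, and the quadratic loop is for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """Probability that a random anomaly outscores a random normal sample.

    Ties between an anomaly and a normal sample count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For detectors whose raw output points the other way (for example a discriminator that scores normal traffic higher), the scores would need to be negated before calling this function.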
Activity Diagram
The activity diagram illustrates the complete workflow with three main paths:
1. Data Preparation Flow:
• Load KDD Cup network data
• Preprocessing steps:
   • Encode categorical variables
   • Normalize features
   • Split into train/test sets

2. Model Selection & Training:
a) Supervised Path (XGBoost):
   • Initialize XGBoost with GPU parameters
   • Train on labeled data
   • Choose between binary/multiclass
   • Predict anomalies
b) Semi-Supervised Path (GAN):
   • Initialize Generator and Discriminator
   • Training loop between networks
   • Generate fake samples
   • Update networks until convergence
c) Unsupervised Path (Autoencoder):
   • Initialize autoencoder architecture
   • Train encoder and decoder components
   • Calculate reconstruction errors
   • Set anomaly threshold
   • Cluster detected anomalies

3. Evaluation Flow:
• Calculate performance metrics
• Generate ROC curves
• Create confusion matrices
• Produce final results

Key Decision Points:
1. Model Selection based on label availability
2. Binary vs Multiclass in supervised approach
3. Convergence check in GAN training
4. Threshold setting in autoencoder
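The "encode categorical variables" step in the data preparation flow can be illustrated with one-hot encoding. The slides do not say which encoding was used, so one-hot is an assumption here, as are the function names. KDD Cup records contain categorical fields such as the protocol type, which is used in the example values.

```python
def fit_categories(rows, column):
    """Collect the distinct values seen in one column of the training rows."""
    return sorted({row[column] for row in rows})


def one_hot(rows, column, categories):
    """Replace one categorical column with 0/1 indicator columns.

    The remaining columns keep their order and the indicators are appended.
    A value not seen during fitting becomes an all-zero indicator.
    """
    encoded = []
    for row in rows:
        rest = [v for i, v in enumerate(row) if i != column]
        indicators = [1.0 if row[column] == c else 0.0 for c in categories]
        encoded.append(rest + indicators)
    return encoded
```

For example, fitting on rows whose first column holds "tcp" and "udp" yields two indicator columns, so each categorical field widens the feature vector by its number of distinct values.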
References
1. Dhaliwal, S., Nahid, A., & Abbas, R. (2018). Effective Intrusion Detection System Using XGBoost. Information, 9(7), 149. doi:10.3390/info9070149
2. Brownlee, J. A Gentle Introduction to XGBoost for Applied Machine Learning. Machine Learning Mastery. Available online: http://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
3. A Study on NSL-KDD Dataset for Intrusion Detection System Based on Classification Algorithms. Available online: https://pdfs.semanticscholar.org/1b34/80021c4ab0f632efa99e01a9b073903c5554.pdf
4. Haowen, Chen, Zhao, Li, Zeyan, Zhihan, . . . Honglin. (2018, February 12). Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications. Retrieved from https://arxiv.org/abs/1802.03903
5. Ellison, D. (n.d.). Fraud Detection Using Autoencoders in Keras with a TensorFlow Backend. Retrieved from https://www.datascience.com/blog/fraud-detection-with-tensorflow
6. Zenati, H., Foo, C., Lecouat, B., Manek, G. and Chandrasekhar, V. (2018). Efficient GAN-Based Anomaly Detection. [online] Arxiv.org. Available at: https://arxiv.org/abs/1802.06222
Conclusion: Future Directions

Continuous Monitoring
Implementing real-time anomaly detection systems to ensure ongoing network security.

Data Augmentation
Using synthetic data generation to enhance training datasets and improve model performance.

Advanced Analytics
Exploring advanced AI techniques, such as deep learning and reinforcement learning, for enhanced anomaly detection.
Thank you !!
