AI Anomaly Detection in Network Traffic
The Problem: Outsmarting Cyberattacks
Autoencoders
Unsupervised (used when you don't have labels for your data) neural networks that learn the patterns of normal data and flag deviations from them.
XGBoost
An optimized gradient-boosted-tree implementation that adds parallelism, tree pruning, and regularization.
Gradient Boosting
Uses gradient descent to minimize the errors of the sequentially built trees.
Boosting
Trees are built sequentially, each minimizing the errors of the previous trees, with better-performing trees weighted more heavily.
Random Forest
Builds multiple decision trees on random subsets of the dataset.
Bagging
An ensemble of multiple decision trees that reaches a decision through majority voting.
Decision Trees
A tree-based algorithm that outputs decisions based on a sequence of conditions.
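The "trees built sequentially on previous errors" idea can be sketched in a few lines of NumPy, with one-split decision stumps as a toy stand-in for full trees (this is an illustration of the boosting principle, not the project's XGBoost setup): each round fits a stump to the current residuals, which for squared loss are the negative gradient, and adds a shrunken copy of it to the ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = (x > 5).astype(float)  # a step function the ensemble should learn

def fit_stump(x, residual):
    """Fit a one-split regression tree to the current residuals."""
    best = None
    for t in np.linspace(0.5, 9.5, 19):  # candidate split thresholds
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

# Boosting loop: each stump is fit to the residuals of the current
# ensemble, and a shrunken (learning-rate-scaled) copy is added in.
lr, stumps, pred = 0.5, [], np.zeros_like(y)
for _ in range(20):
    stump = fit_stump(x, y - pred)  # residual = negative gradient of squared loss
    stumps.append(stump)
    pred += lr * stump(x)

def ensemble(q):
    return sum(lr * s(q) for s in stumps)
```

After 20 rounds the ensemble closely reproduces the step at x = 5; the learning rate is the same shrinkage idea XGBoost exposes as `eta`.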
Autoencoders: Learning Normal Patterns
Strengths
• Can detect anomalies in high-dimensional data.
• Unsupervised learning: does not require labeled data for training.
Limitations
• May be sensitive to noise in the data.
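A minimal sketch of reconstruction-based detection, using PCA as a stand-in for a trained neural autoencoder (PCA is the optimal linear encoder/decoder pair; the data here is synthetic and illustrative): fit on normal traffic only, then flag points whose reconstruction error exceeds a percentile threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
# "Normal traffic": 5 features with redundancy to compress;
# "anomalies" scattered far from the normal pattern.
normal = rng.normal(0, 1, (200, 5))
normal[:, 1] = 0.9 * normal[:, 0]
anomalies = rng.uniform(-6, 6, (5, 5))

# Train on normal data only: SVD gives the principal components,
# which act as a linear encoder/decoder.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:3]  # 3-dimensional latent space

def reconstruction_error(x):
    z = (x - mean) @ components.T  # encode to latent space
    x_hat = z @ components + mean  # decode back
    return np.mean((x - x_hat) ** 2, axis=1)

# Flag anything reconstructed worse than the 99th percentile of
# errors observed on normal data.
threshold = np.percentile(reconstruction_error(normal), 99)
flags = reconstruction_error(anomalies) > threshold
```

The same encode/decode/threshold structure carries over to a neural autoencoder; only the encoder and decoder become learned nonlinear networks.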
Generative Adversarial Networks (GANs): Synthetic Data Generation
Strengths
• Can learn complex, high-dimensional features for improved anomaly detection.
• Data augmentation: generates synthetic data to enhance the training set.
Limitations
• Training GANs can be computationally expensive and unstable.
Discussion: Trade-offs and Considerations
Complexity
GANs are the most complex model, while Decision Trees are relatively simple.
Computational Requirements
GANs demand significant computational resources, while Decision Trees are more efficient.
Detection Performance
GANs generally achieve the best performance, but other models may be suitable depending on the specific requirements.
UML Diagrams
A UML diagram is a way to visualize systems and software using Unified
Modeling Language (UML). Software engineers create UML diagrams to
understand the designs, code architecture, and proposed implementation of
complex software systems. UML diagrams are also used to model workflows
and business processes.
The most commonly used UML diagrams are:
1. Use Case Diagram
2. Class Diagram
3. Sequence Diagram
4. Activity Diagram
Use Case Diagram
1. Supervised Learning:
• Uses XGBoost with GPU acceleration
• Implements both binary (normal vs anomaly) and multi-class classification
• Features complete labeled data usage
• Strong performance metrics with explicit anomaly labeling

2. Semi-Supervised Learning:
• Implements a GAN (Generative Adversarial Network)
• Generator creates synthetic normal network traffic
• Discriminator learns to distinguish normal vs anomalous traffic
• Trains primarily on normal traffic data
• Uses threshold-based anomaly detection

3. Unsupervised Learning:
• Uses an autoencoder architecture
• Learns normal network traffic patterns
• Detects anomalies through reconstruction error
• Implements KMeans clustering for anomaly categorization
• Features dimensionality reduction through the latent space

Each approach has its advantages:
• XGBoost is best when you have labeled data
• GAN works well with partially labeled data
• Autoencoder is ideal when you don't have labeled data
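The threshold-based detection step of the semi-supervised path can be sketched as follows, with a NumPy logistic regression standing in for a trained GAN discriminator and fixed noise standing in for its generator (all data and names here are illustrative): score each record with an estimated P(real), then flag records scoring below a low percentile of the normal-traffic scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: "normal traffic" clusters near the origin, while the
# generator's crude synthetic samples sit elsewhere in feature space.
normal = rng.normal(0.0, 1.0, (300, 4))
synthetic = rng.normal(4.0, 1.0, (300, 4))

X = np.vstack([normal, synthetic])
y = np.concatenate([np.ones(300), np.zeros(300)])  # 1 = real traffic

# Stand-in discriminator: logistic regression trained by gradient
# descent to estimate P(real | x).
w, b = np.zeros(4), 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

def realness(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Threshold-based detection: flag traffic the discriminator scores
# as unlikely to be real.
threshold = np.percentile(realness(normal), 1)
attack = rng.normal(4.0, 1.0, (10, 4))
flags = realness(attack) < threshold
```

In the full GAN, the generator and discriminator are trained adversarially, but the detection step keeps this same shape: score, threshold, flag.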
Class Diagram
The class diagram shows the main components of the system:

1. DataPreprocessor:
• Handles all data preprocessing tasks
• Includes scaling, encoding, and data splitting
• Contains methods for reducing anomalies in the dataset

2. XGBoostDetector:
• Implements the supervised learning approach
• Manages model training and prediction
• Includes binary and multiclass classification capabilities

3. GANDetector with Generator and Discriminator:
• Implements semi-supervised learning
• Generator creates synthetic normal samples
• Discriminator learns to distinguish normal vs anomalous patterns
• Contains methods for training and anomaly detection

4. Autoencoder with EncoderNetwork and DecoderNetwork:
• Implements unsupervised learning
• EncoderNetwork compresses data to latent space
• DecoderNetwork reconstructs data from latent space
• Uses reconstruction error for anomaly detection

5. AnomalyEvaluator:
• Common evaluation methods for all approaches
• Includes ROC curves, confusion matrices
• Handles visualization of results

Key relationships:
• All detection classes depend on DataPreprocessor
• Each detector uses an AnomalyEvaluator for performance metrics
• GANDetector contains Generator and Discriminator
• Autoencoder contains EncoderNetwork and DecoderNetwork
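A Python skeleton makes the diagram's containment and dependency relationships concrete (class names follow the diagram's labels; the placeholder bodies and constructor signatures are assumptions, not the project's actual code):

```python
import numpy as np

class DataPreprocessor:
    """Scaling, encoding, and train/test splitting (sketch)."""
    def scale(self, X):
        X = np.asarray(X, dtype=float)
        return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)

class AnomalyEvaluator:
    """Evaluation methods shared by every detector (sketch)."""
    def accuracy(self, y_true, y_pred):
        return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

class XGBoostDetector:
    """Supervised detector; the real class would wrap an XGBoost model."""
    def __init__(self, preprocessor, evaluator):
        self.preprocessor = preprocessor  # depends on DataPreprocessor
        self.evaluator = evaluator        # uses AnomalyEvaluator

class Generator: ...
class Discriminator: ...

class GANDetector:
    """Semi-supervised detector; contains Generator and Discriminator."""
    def __init__(self, preprocessor, evaluator):
        self.preprocessor = preprocessor
        self.evaluator = evaluator
        self.generator = Generator()       # composition
        self.discriminator = Discriminator()

class EncoderNetwork: ...
class DecoderNetwork: ...

class AutoencoderDetector:
    """Unsupervised detector; contains encoder and decoder networks."""
    def __init__(self, preprocessor, evaluator):
        self.preprocessor = preprocessor
        self.evaluator = evaluator
        self.encoder = EncoderNetwork()    # compresses to latent space
        self.decoder = DecoderNetwork()    # reconstructs from latent space
```

Injecting the shared DataPreprocessor and AnomalyEvaluator through the constructors is one way to realize the "all detectors depend on / use" arrows in the diagram.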
Sequence Diagram
The sequence diagram shows three main detection pathways:

1. Supervised Learning Path (XGBoost):
• Loads preprocessed data
• Initializes XGBoost with GPU parameters
• Trains on labeled data
• Makes predictions and evaluates results

Common Elements:
• All paths start with data preprocessing
Activity Diagram
The activity diagram illustrates the complete workflow with three main paths:

1. Data Preparation Flow:
• Load KDD Cup network data
• Preprocessing steps: encode categorical variables, normalize features, split into train/test sets

2. Model Selection & Training:
a) Supervised Path (XGBoost):
• Initialize XGBoost with GPU parameters
• Train on labeled data
• Choose between binary/multiclass
• Predict anomalies
c) Unsupervised Path (Autoencoder):
• Initialize autoencoder architecture
• Train encoder and decoder components
• Calculate reconstruction errors
• Set anomaly threshold
• Cluster detected anomalies

3. Evaluation Flow:
• Calculate performance metrics
• Generate ROC curves
• Create confusion matrices
• Produce final results

Key Decision Points:
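The data-preparation flow above can be sketched with NumPy on a toy stand-in for the KDD Cup records (the column choice and sizes here are illustrative; the real dataset has 41 features):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy records: one categorical column plus two numeric features.
protocol = np.array(["tcp", "udp", "icmp"] * 40)
numeric = rng.uniform(0, 1000, (120, 2))

# Encode categorical variables as integer codes.
_, codes = np.unique(protocol, return_inverse=True)

# Normalize all features to [0, 1] (min-max scaling).
X = np.column_stack([codes.astype(float), numeric])
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Split into train/test sets (80/20, shuffled).
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
X_train, X_test = X[idx[:cut]], X[idx[cut:]]
```

In the actual pipeline these steps live in the DataPreprocessor class, and the split outputs feed all three model paths.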
Conclusion: Future Directions