batch-05-ml-ppt-1 (1)

The document discusses the application of Fuzzy C-Means (FCM) clustering for intrusion detection in network traffic, emphasizing its ability to assign varying degrees of membership to data points. It outlines the steps for implementing FCM, including data normalization, membership value calculation, and cluster centroid updates, while also proposing a hybrid AI-driven Intrusion Detection System that integrates both signature-based and anomaly-based detection methods. The document highlights the importance of advanced machine learning algorithms and real-time data analysis to enhance detection accuracy and reduce false positives in cybersecurity.
Copyright © All Rights Reserved

FUZZY C-MEANS CLUSTERING
T2 REVIEW
BATCH-05
PRESENTED BY: 221FA04319-Yashwanth Reddy
221FA04330-Sravya
221FA04425-Charishma
221FA04617-Yamini
SUBMITTED TO: Mr. Sourav Mondal
T1-Question
Problem Statement
a) How would you apply clustering algorithms to identify and categorize different types of network traffic
patterns and system behaviors indicative of potential intrusions? Discuss innovative approaches for
clustering network data streams, system logs, and user activities to uncover hidden patterns and anomalies
in the network traffic.
b) Apply the Fuzzy C-Means clustering algorithm to the given dataset for intrusion detection.
c) Propose novel approaches to distinguish between normal network activities and suspicious or
malicious behavior. Furthermore, explore the integration of advanced machine learning
algorithms, anomaly detection techniques, and real-time data analysis to enhance the system's
ability to adapt to emerging threats and minimize false positives. Your solution should prioritize
both detection accuracy and scalability to effectively safeguard critical network infrastructure
against evolving cyber threats.
Abstract :

Fuzzy C-Means is a soft clustering technique in which every data point is assigned a degree of
membership in every cluster, rather than a single cluster label.
Soft Clustering
■ Soft clustering, also known as fuzzy clustering or probabilistic clustering, assigns each data point
a degree of membership (or probability) for each cluster, indicating how strongly the point belongs
to that cluster.
■ Soft clustering therefore allows a data point to belong to multiple clusters at the same time.
Fuzzy C-Means and Gaussian Mixture Models are examples of soft clustering.
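To make the contrast concrete, here is a small illustrative Python sketch (not from the slides) that compares a hard, K-Means-style assignment with FCM-style graded memberships for a single point. The distances to three centroids are made-up numbers, and the fuzziness parameter m = 2 is an assumption.

```python
def hard_assignment(distances):
    """K-Means style: 1.0 for the nearest centroid, 0.0 for every other one."""
    nearest = min(range(len(distances)), key=lambda j: distances[j])
    return [1.0 if j == nearest else 0.0 for j in range(len(distances))]


def fuzzy_memberships(distances, m=2.0):
    """FCM style: graded memberships that sum to 1 (assumes no distance is zero)."""
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** p for d_k in distances) for d_j in distances]


# Hypothetical distances from one data point to three cluster centroids.
dists = [1.0, 2.0, 4.0]
print(hard_assignment(dists))     # [1.0, 0.0, 0.0]
print(fuzzy_memberships(dists))   # about [0.762, 0.190, 0.048], i.e. 16/21, 4/21, 1/21
```

The hard assignment discards the information that the point is also fairly close to the second centroid; the fuzzy memberships keep it.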

How to Run the FCM Algorithm


1. Initialization: Randomly choose c data points from the data set as the initial cluster centroids and
set the fuzziness parameter m (m > 1), which controls how soft the memberships are.
2. Membership Update: Calculate the degree of membership of each data point in each cluster from its
distances to the cluster centroids, using a distance metric such as Euclidean distance.
3. Centroid Update: Recalculate each cluster centroid as a membership-weighted average of the data
points.
4. Convergence Check: Repeat steps 2 and 3 until the membership values and centroids converge to
stable values or a specified number of iterations is reached.
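The effect of the fuzziness parameter m can be seen on a single point. The sketch below is illustrative only: it assumes a hypothetical point at distance 1 from centroid 1 and distance 2 from centroid 2, and evaluates the standard FCM membership formula for several values of m. As m approaches 1 the memberships approach a hard 0/1 split; as m grows they drift toward 1/c.

```python
def memberships(distances, m):
    """FCM membership of one point in each cluster, given its distances to the centroids."""
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** p for d_k in distances) for d_j in distances]


# Hypothetical point: distance 1 to centroid 1, distance 2 to centroid 2.
for m in (1.5, 2.0, 3.0, 5.0):
    u = memberships([1.0, 2.0], m)
    print(f"m={m}: u1={u[0]:.3f}, u2={u[1]:.3f}")
# m=1.5: u1=0.941, u2=0.059
# m=2.0: u1=0.800, u2=0.200
# m=3.0: u1=0.667, u2=0.333
# m=5.0: u1=0.586, u2=0.414
```

m = 2 is the usual default, and it is also the value used in the worked example of part (b).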
The Maths Behind Fuzzy C Means
■ In a traditional k-means algorithm, we solve the problem through the following steps:
■ Randomly initialize k cluster centers.
■ Calculate the distance from each data point to each centroid using a distance metric, e.g. Euclidean distance or Manhattan distance.
■ Assign each data point to its nearest centroid, forming k clusters.
■ For each cluster, compute the mean of the data points belonging to that cluster and use it as the updated centroid.
■ Repeat until the centroids stop changing or a predefined number of iterations is reached.
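For comparison with FCM, the K-Means steps above can be sketched in NumPy (assumed to be installed). This is a minimal illustration of Lloyd's algorithm with random initialization from the data points, not production code.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-Means following the steps above. X has shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize k centers by sampling k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every center: shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Hard assignment: each point joins its nearest center.
        labels = dists.argmin(axis=1)
        # New center = mean of its points (an empty cluster keeps its old center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:  # centers stopped moving
            break
    return labels, centers


X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
labels, centers = kmeans(X, k=2)
print(labels)  # the first two points share one label, the last two share the other
```

Note that each point ends up with exactly one label; FCM replaces this hard `argmin` step with graded memberships.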
1. Our objective is to minimize the following objective function:
$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} w_{ij}^{m} \, \lVert x_i - v_j \rVert^{2}$$
Here:
n = number of data points
c = number of clusters
x_i = the i-th data point
v_j = centroid of cluster j
w_ij = membership value of data point i in cluster j
m = fuzziness parameter (m > 1)
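2. Update the membership values using the formula:
$$w_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}}$$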
3. Update cluster centroid values using a weighted average of the data points:
$$v_j = \frac{\sum_{i=1}^{n} w_{ij}^{m} \, x_i}{\sum_{i=1}^{n} w_{ij}^{m}}$$
4. Keep updating the membership values and the cluster centers until the membership values and cluster centers
stop changing significantly or when a predefined number of iterations is reached.

5. For a hard labeling, assign each data point to the cluster in which it has the highest membership value; points with comparable memberships in several clusters can be reported as belonging to all of them.
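The update rules above can be sketched in NumPy as follows, with W holding the memberships w_ij and V the centroids v_j. This is a minimal illustration rather than the presenters' implementation: it assumes NumPy is available, initializes the membership matrix at random (a common variant; the slides instead pick initial centroids), and stops once no membership changes by more than `tol`.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy C-Means sketch. X: (n, d) data, c: number of clusters, m > 1.

    Returns memberships W (n, c), centroids V (c, d), and the objective J_m.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, each row normalized to sum to 1.
    W = rng.random((n, c))
    W /= W.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centroid update: v_j = sum_i w_ij^m x_i / sum_i w_ij^m.
        Wm = W ** m
        V = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        # Distances ||x_i - v_j||, floored to avoid division by zero.
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)
        # Membership update: w_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        W_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(W_new - W).max() < tol
        W = W_new
        if converged:
            break
    # Objective J_m for the final memberships and the centroids they were computed from.
    J = float(((W ** m) * D ** 2).sum())
    return W, V, J


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
W, V, J = fcm(X, c=2)
print(W.round(3))         # each row sums to 1
print(W.argmax(axis=1))   # step 5: hard label = highest membership
```

Using `W.argmax(axis=1)` corresponds to step 5; keeping the full `W` preserves the partial memberships that make FCM useful for borderline traffic.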
Description :

Fuzzy C-Means (FCM):


Fuzzy C-Means (FCM) is a clustering algorithm that assigns data points to multiple clusters
with varying degrees of membership rather than forcing a strict classification. Unlike K-
Means, where each data point belongs to only one cluster, FCM allows for partial
membership, making it more flexible for detecting anomalies in network traffic.
Working Mechanism:
1. Initialize cluster centroids randomly.
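2. Compute the membership matrix using:
$$u_{ij} = \frac{1}{\sum_{k} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}$$
where the sum runs over all clusters k and m > 1 is the fuzziness parameter.
3. Update cluster centroids:
$$c_j = \frac{\sum_{i} u_{ij}^{m} \, x_i}{\sum_{i} u_{ij}^{m}}$$
4. Repeat steps 2 and 3 until convergence (when centroids stabilize).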
Solution :
a) Applying Clustering Algorithms for Intrusion Detection
Intrusion detection in networks involves identifying unusual patterns in traffic that may indicate
cyber threats like hacking attempts, malware, or data breaches. Clustering algorithms can help by
grouping similar types of network behavior, making it easier to detect and categorize normal vs.
suspicious activities.
Steps to analyze network traffic and detect intrusions:
Step 1: Understanding Network Traffic and System Behavior
Network traffic consists of different types of data moving between devices. Each communication (or
"packet") has attributes like:
• Source & destination IP addresses (who is sending/receiving data)
• Protocol used (HTTP, FTP, SSH, etc.)
• Data size & transmission speed
• Request frequency (how often a device connects)
Step 2: Using Clustering Algorithms for Intrusion Detection
1. Choosing the Right Clustering Algorithm
Different clustering algorithms can be used based on the type of data and required accuracy:
• K-Means Clustering – Groups network traffic into distinct clusters based on similarity (e.g., normal
users vs. suspicious activities).
• Fuzzy C-Means (FCM) – Allows overlapping clusters, making it useful for cases where behaviors
are not completely normal or completely malicious.
• DBSCAN (Density-Based Clustering) – Detects clusters based on data density, making it good for
spotting outliers like rare cyber attacks.
2. Steps to Apply Clustering:
1. Collect Data – Gather system logs, network packets, and user activities from servers and firewalls.
2. Preprocess Data – Remove irrelevant data, encode non-numeric fields as numbers (e.g., convert IP
addresses into integers), normalize feature values, and extract key features like connection frequency,
request type, and response time.
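As an illustration of the encoding step, the sketch below uses Python's standard `ipaddress` module to turn IP addresses into integers. The record layout and the protocol codes are hypothetical choices made for this example only.

```python
import ipaddress

# Hypothetical integer codes for the protocols named earlier.
PROTOCOL_CODES = {"HTTP": 0, "FTP": 1, "SSH": 2}


def encode_record(src_ip, dst_ip, protocol, bytes_sent):
    """Turn one raw log record into a numeric feature vector (illustrative only)."""
    return [
        int(ipaddress.ip_address(src_ip)),   # IPv4 address as a 32-bit integer
        int(ipaddress.ip_address(dst_ip)),
        PROTOCOL_CODES.get(protocol, len(PROTOCOL_CODES)),  # unknown protocols share one code
        float(bytes_sent),
    ]


print(encode_record("192.168.1.20", "203.0.113.2", "SSH", 4000))
# [3232235796, 3405803778, 2, 4000.0]
```

One caveat with this choice: numerically close integers do not always mean related hosts, so subnet-level features (e.g., the /24 prefix) or one-hot encodings are common alternatives before min-max scaling.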
3. Benefits of Clustering in Intrusion Detection
Feature | Description
Unsupervised Learning | No need for labeled attack data
Outlier Detection | Detects previously unseen attacks
Behavior Profiling | Models normal user/machine behavior
Adaptability | Adjusts to changing network patterns
4. Visualization & Interpretation
Use t-SNE or PCA to visualize high-dimensional clusters.
Use heatmaps, scatter plots, and cluster timelines to interpret behavior visually.
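For the PCA option, a minimal projection to two dimensions can be sketched with NumPy's SVD (NumPy assumed available; t-SNE is not shown). The random feature matrix is only a stand-in for real traffic features.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X (n samples, d features) onto the first two principal components."""
    Xc = X - X.mean(axis=0)                    # center every feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                       # (n, 2) coordinates for a scatter plot


rng = np.random.default_rng(0)
features = rng.normal(size=(50, 5))            # stand-in for normalized traffic features
coords = pca_2d(features)
print(coords.shape)  # (50, 2)
```

Coloring the resulting 2-D points by their FCM hard label (or by their highest membership value) gives the kind of cluster scatter plot described above.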
Example Use-Case
Cluster user login attempts by time of day, IP location, and device ID.
Anomaly: multiple login attempts from different continents within a short time window indicate suspicious behavior.
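The anomaly in this use case can also be expressed as a simple rule applied to the login records. The sketch below is illustrative: it assumes each login is a (user, timestamp, continent) tuple and uses a one-hour window, both of which are assumptions rather than details from the slides.

```python
from datetime import datetime, timedelta


def flag_fast_continent_switches(logins, window=timedelta(hours=1)):
    """Flag users whose consecutive logins come from different continents within `window`.

    logins: list of (user, timestamp, continent) tuples (hypothetical schema).
    """
    last_seen = {}
    flagged = set()
    # Process each user's logins in time order.
    for user, ts, continent in sorted(logins, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user)
        if prev is not None and prev[1] != continent and ts - prev[0] <= window:
            flagged.add(user)
        last_seen[user] = (ts, continent)
    return flagged


logins = [
    ("alice", datetime(2024, 3, 1, 8, 0), "Asia"),
    ("alice", datetime(2024, 3, 1, 8, 20), "Europe"),   # 20 minutes later, other continent
    ("bob", datetime(2024, 3, 1, 8, 0), "Asia"),
    ("bob", datetime(2024, 3, 1, 12, 0), "Europe"),     # 4 hours later: outside the window
]
print(flag_fast_continent_switches(logins))  # {'alice'}
```

In a clustering pipeline, a rule like this would typically be used to label or explain clusters rather than replace them.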
1. Apply Clustering Algorithm – Use a clustering algorithm like FCM to classify
activities into different groups (e.g., normal traffic, suspicious activity, and malicious
attacks).
2. Analyze the Results – Check if any cluster has unusual patterns like sudden spikes in
traffic, repeated failed login attempts, or unexpected data transfers.
3. Detect & Respond – If a cluster is labeled as suspicious, trigger an alert for network
administrators to investigate further.
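Because FCM produces memberships rather than a single label, the "Detect & Respond" step can treat clear-cut and borderline points differently. The sketch below is one possible policy, with hypothetical thresholds: alert when a point strongly belongs to the suspicious cluster, and send it for manual review when no cluster clearly dominates.

```python
def triage(memberships, suspicious_cluster, alert_at=0.7, review_below=0.6):
    """Turn FCM memberships into actions (thresholds are illustrative).

    memberships: list of per-point membership lists (each sums to 1).
    Returns "alert", "review", or "ok" for each point.
    """
    actions = []
    for u in memberships:
        if u[suspicious_cluster] >= alert_at:
            actions.append("alert")    # strongly belongs to the suspicious cluster
        elif max(u) < review_below:
            actions.append("review")   # ambiguous: no cluster clearly dominates
        else:
            actions.append("ok")
    return actions


# Hypothetical memberships for three connections; cluster 1 is the suspicious one.
print(triage([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]], suspicious_cluster=1))
# ['ok', 'review', 'alert']
```

The "review" band is where soft clustering helps most: these are the behaviors that are neither clearly normal nor clearly malicious.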
b) Given network traffic data:

Timestamp | Source IP | Destination IP | Bytes Transferred | Packets Transferred | Intrusion Detected
2024-03-01 08:00 | 34567 | 80 | 5000 | 20 | No
2024-03-01 08:05 | 80 | 34567 | 10000 | 30 | No
2024-03-01 08:10 | 12345 | 22 | 2000 | 10 | Yes
2024-03-01 | 203.0.113.2 | 192.168.1.20 | 4000 | 15 | Yes
Normalize the Data
Since FCM works best with normalized data, we scale each feature between 0 and 1 using Min-Max
Normalization:
$$X_{\text{normalized}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

Feature | Min Value | Max Value
Source Port | 80 | 34567
Destination Port | 22 | 34567
Bytes Transferred | 2000 | 10000
Packets Transferred | 10 | 30

Normalized Values Calculation:

For each value, apply the Min-Max formula.

For Data Point A (34567, 80, 5000, 20):
Source Port: (34567 - 80) / (34567 - 80) = 1.0
Destination Port: (80 - 22) / (34567 - 22) ≈ 0.0017
Bytes Transferred: (5000 - 2000) / (10000 - 2000) = 0.375
Packets Transferred: (20 - 10) / (30 - 10) = 0.5
The remaining data points are normalized in the same way.
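The normalization can be checked with a few lines of plain Python. This sketch treats the Source IP and Destination IP values of the first three rows (A, B, C) as port numbers, as the min/max table does, and leaves out the fourth row because its fields are IP addresses.

```python
# Rows A, B, C from the table: (source port, destination port, bytes, packets).
rows = {
    "A": (34567, 80, 5000, 20),
    "B": (80, 34567, 10000, 30),
    "C": (12345, 22, 2000, 10),
}


def min_max_normalize(rows):
    """Scale every feature column to [0, 1] with (x - min) / (max - min).

    Assumes no column is constant (max > min), otherwise the division fails.
    """
    columns = list(zip(*rows.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        name: tuple((x - lo) / (hi - lo) for x, lo, hi in zip(values, lows, highs))
        for name, values in rows.items()
    }


for name, vec in min_max_normalize(rows).items():
    print(name, [round(v, 4) for v in vec])
# A [1.0, 0.0017, 0.375, 0.5]
# B [0.0, 1.0, 1.0, 1.0]
# C [0.3556, 0.0, 0.0, 0.0]
```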
Step 3: Initialize Clusters
We assume C = 2 clusters (Normal and Intrusion) and initialize cluster centers randomly:
• Cluster 1 (C1): (0.9, 0.5, 0.4, 0.6)
• Cluster 2 (C2): (0.2, 0.2, 0.1, 0.1)
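Step 4: Compute Membership Values
■ Membership values u_ij are calculated using:
$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{D_{ij}}{D_{ik}} \right)^{\frac{2}{m-1}}}$$
where D_ij is the distance from data point i to the centroid of cluster j.
(We assume m = 2 for simplicity.)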
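Euclidean Distance Calculation
Distance between data point x_i and cluster centroid c_j, summed over the four normalized features f:
$$D_{ij} = \sqrt{\sum_{f=1}^{4} \left( x_{if} - c_{jf} \right)^{2}}$$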
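For A (1.000, 0.0017, 0.375, 0.5):
• Distance to Cluster 1:
$$D_{A1} = \sqrt{(1.000 - 0.9)^2 + (0.0017 - 0.5)^2 + (0.375 - 0.4)^2 + (0.5 - 0.6)^2} \approx 0.519$$
• Distance to Cluster 2:
$$D_{A2} = \sqrt{(1.000 - 0.2)^2 + (0.0017 - 0.2)^2 + (0.375 - 0.1)^2 + (0.5 - 0.1)^2} \approx 0.957$$
Now, calculating membership values (with m = 2, the exponent 2/(m-1) equals 2):
$$u_{A1} = \frac{1}{1 + \left( D_{A1} / D_{A2} \right)^{2}} \approx 0.773$$
$$u_{A2} = 1 - u_{A1} \approx 0.227$$
Following similar calculations for B (0, 1, 1, 1) and C (0.356, 0, 0, 0), we get:

Data Point | Membership in Cluster 1 | Membership in Cluster 2
A (Normal) | 0.773 | 0.227
B (Normal) | 0.593 | 0.407
C (Intrusion) | 0.073 | 0.927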
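The Step 4 calculation can be reproduced in plain Python. This minimal sketch uses the same assumptions as the slides (m = 2, the Step 3 initial centroids, Euclidean distance) and the min-max normalized points; C's source-port value is written as the exact fraction (12345 - 80) / (34567 - 80).

```python
import math

# Normalized data points and the initial centroids from Step 3.
points = {
    "A": (1.000, 0.0017, 0.375, 0.5),
    "B": (0.0, 1.0, 1.0, 1.0),
    "C": ((12345 - 80) / (34567 - 80), 0.0, 0.0, 0.0),
}
centroids = [(0.9, 0.5, 0.4, 0.6), (0.2, 0.2, 0.1, 0.1)]
m = 2.0


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def memberships(x, centroids, m):
    """u_ij = 1 / sum_k (D_ij / D_ik)^(2/(m-1)) for a single data point x."""
    d = [euclidean(x, v) for v in centroids]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** p for d_k in d) for d_j in d]


for name, x in points.items():
    u = memberships(x, centroids, m)
    print(name, [round(val, 3) for val in u])
```

This covers only the first membership update; Step 5 would feed these values into the centroid formula and repeat.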
Step 5: Update Cluster Centers and Repeat
Using:
■ $$c_j = \frac{\sum_{i} u_{ij}^{m} \, x_i}{\sum_{i} u_{ij}^{m}}$$
■ New centroids are computed and iterations continue until convergence.
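Final Result (based on the membership values from Step 4):
• A and B have higher membership in Cluster 1 and are classified as Normal Traffic
• C has higher membership in Cluster 2 and is classified as Intrusion
■ On this example, the highest-membership assignments of A, B, and C agree with the Intrusion Detected column, so Fuzzy C-Means separates the intrusion from normal activity.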
c) To effectively distinguish between normal network activities and malicious behavior, we can
propose a hybrid AI-driven Intrusion Detection System (IDS) that integrates advanced machine
learning, anomaly detection, and real-time data analysis. Below is a structured approach:
1. Hybrid Detection Approach
A combination of signature-based detection (for known threats) and
anomaly-based detection (for new and evolving threats) ensures a
comprehensive security system.
a. Signature-Based Detection:
• Utilizes predefined attack signatures to detect known cyber threats.
• Incorporates continuously updated threat intelligence feeds to stay relevant.
b. Anomaly-Based Detection:
• Uses machine learning (ML) models to detect deviations from normal behavior.
• Identifies zero-day attacks and advanced persistent threats (APTs).
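The two layers can be combined as "check signatures first, then score anomalies". The sketch below is a minimal illustration of that control flow; the event fields, the SSH brute-force signature, and the byte-count baseline (mean 5000, standard deviation 1500) are all hypothetical, and in practice the anomaly score would come from a trained model such as FCM memberships or an autoencoder.

```python
def hybrid_detect(event, signatures, anomaly_score, threshold=3.0):
    """Signature check first (known attacks), then an anomaly score (unknown ones).

    event: dict of connection features (hypothetical schema).
    signatures: list of (name, predicate) pairs describing known attacks.
    anomaly_score: callable mapping an event to a number; higher = more unusual.
    """
    for name, predicate in signatures:
        if predicate(event):
            return ("known-attack", name)
    score = anomaly_score(event)
    if score > threshold:
        return ("anomaly", score)
    return ("normal", score)


# Hypothetical signature: repeated failed logins against SSH (port 22).
signatures = [("ssh-bruteforce",
               lambda e: e["dst_port"] == 22 and e["failed_logins"] >= 5)]


def bytes_zscore(event, mean=5000.0, std=1500.0):
    """Hypothetical baseline: distance of the byte count from normal traffic, in std units."""
    return abs(event["bytes"] - mean) / std


print(hybrid_detect({"dst_port": 22, "failed_logins": 8, "bytes": 2000},
                    signatures, bytes_zscore))   # ('known-attack', 'ssh-bruteforce')
print(hybrid_detect({"dst_port": 443, "failed_logins": 0, "bytes": 20000},
                    signatures, bytes_zscore))   # ('anomaly', 10.0)
```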
2. Advanced Machine Learning Algorithms for Threat Detection
a. Supervised Learning (For Classification of Attacks)
• Random Forest / XGBoost: Robust in handling imbalanced datasets.
• Deep Learning (CNN, LSTMs): Useful for time-series and sequential attack
patterns.
1. Hybrid Learning System (Unsupervised + Supervised + Online Learning)
Component | Purpose | Algorithm Examples
Unsupervised | Learn hidden patterns from raw data | FCM, DBSCAN, Autoencoders
Supervised | Classify known threats accurately | XGBoost, Random Forest, SVM
Online ML | Adapt to new behaviors in real time | Hoeffding Trees, Online k-NN

2. Real-Time Anomaly Detection with Autoencoders


An autoencoder (AE) learns a compressed representation of normal traffic.
If a sample's reconstruction error exceeds a threshold, the sample is flagged as an anomaly.
Typical uses: DDoS detection, port scans, insider anomalies.
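Training a neural autoencoder needs a deep-learning framework, so the sketch below swaps in a linear stand-in: a PCA projection, which spans the same subspace that an optimally trained linear autoencoder with squared-error loss learns. It assumes NumPy, fits only on traffic assumed to be normal, and sets the threshold at the 99th percentile of the training reconstruction errors (an assumed choice). The synthetic data is purely illustrative.

```python
import numpy as np


class LinearAEDetector:
    """Reconstruction-error detector; PCA stands in for a trained linear autoencoder."""

    def __init__(self, n_components=2, percentile=99.0):
        self.k = n_components          # size of the compressed representation
        self.percentile = percentile   # threshold = this percentile of training errors

    def fit(self, X_normal):
        self.mean = X_normal.mean(axis=0)
        _, _, Vt = np.linalg.svd(X_normal - self.mean, full_matrices=False)
        self.W = Vt[: self.k]          # (k, d): encode with W.T, decode with W
        self.threshold = np.percentile(self.errors(X_normal), self.percentile)
        return self

    def errors(self, X):
        Xc = X - self.mean
        recon = (Xc @ self.W.T) @ self.W   # encode to k dims, then decode
        return ((Xc - recon) ** 2).sum(axis=1)

    def predict(self, X):
        return self.errors(X) > self.threshold   # True means "flag as anomaly"


rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))
normal = rng.normal(size=(500, 2)) @ A + 0.01 * rng.normal(size=(500, 4))  # synthetic "normal" traffic
detector = LinearAEDetector(n_components=2).fit(normal)
off_pattern = 10 * np.linalg.svd(A)[2][-1][None, :]   # a direction normal traffic never uses
print(detector.predict(off_pattern))                  # expected to be flagged
```

A nonlinear autoencoder follows the same fit-threshold-predict pattern; only the encode/decode step changes.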
3. Graph-Based Behavior Modeling
Represent network entities (IP, MAC, user, port, service) as nodes
Connections and interactions are edges
Use Graph Neural Networks (GNNs) or DeepWalk to detect abnormal flows or lateral movement.
Detects: Privilege escalation, lateral movement, command & control (C2) traffic.
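GNNs and DeepWalk need dedicated libraries, so the sketch below illustrates only the graph-building idea with a much simpler signal: hosts are nodes, observed connections are edges, and a host whose number of distinct peers is far above the median (a crude hint of scanning or lateral movement) is flagged. The factor of 3 and the host names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fan_out_outliers(connections, factor=3.0):
    """connections: iterable of (src_host, dst_host) edges.

    Returns the hosts whose number of distinct destinations exceeds factor * median.
    """
    peers = defaultdict(set)
    for src, dst in connections:
        peers[src].add(dst)
    degrees = {host: len(dsts) for host, dsts in peers.items()}
    cutoff = factor * median(degrees.values())
    return {host for host, deg in degrees.items() if deg > cutoff}


edges = [("h1", "s1"), ("h2", "s1"), ("h2", "s2"), ("h3", "s2"), ("h4", "s1"), ("h4", "s3")]
edges += [("h5", f"t{i}") for i in range(10)]   # h5 suddenly talks to 10 different hosts
print(fan_out_outliers(edges))  # {'h5'}
```

A GNN would learn richer patterns from the same node/edge structure instead of relying on one hand-picked degree feature.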
4. Ensemble Techniques for Reduced False Positives
Combine the predictions from multiple models:
Supervised (RF, XGBoost)
Unsupervised (DBSCAN, Isolation Forest)
Neural models (LSTM, CNN, AE)
Use majority voting, stacking, or meta-classifiers.
Combining models can improve accuracy, help balance precision against recall, and reduce overfitting.
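Majority voting, the simplest of these combiners, can be sketched with `collections.Counter`. The model names and predictions below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: list of per-model label lists, all the same length.

    Returns the most common label for each sample; ties go to the label seen first.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical outputs of three detectors on the same three connections.
random_forest = ["attack", "normal", "normal"]
isolation_forest = ["attack", "attack", "normal"]
lstm = ["normal", "attack", "normal"]
print(majority_vote([random_forest, isolation_forest, lstm]))
# ['attack', 'attack', 'normal']
```

Requiring agreement from several models before raising an alert is one way the ensemble can cut false positives, at some cost in recall.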
OUTPUT
Thank you
