K-means Clustering of Driver Data

The document outlines a lab exercise on K-means clustering applied to a dataset of 4000 drivers, focusing on their mean distance driven per day and mean overspeed percentage. It details the steps taken to import libraries, load the dataset, visualize it with scatter plots, and apply K-means clustering for different values of K (3, 4, 5, and 6). The conclusion drawn from the visual inspection of the clustering results is that K=4 is the optimal value for effectively separating the clusters.

Uploaded by

alpeshoza

24/09/2024, 23:52 Labsheet2

Applications of Machine Learning Labsheet 2 - K-means Clustering of Drivers Data (10 Marks)
Name: Alpesh Oza
Roll No. : [Link].U3CSC2107022
The given data consists of 4000 drivers with id, mean_dist_day and mean_over_speed_perc.
Get the scatter plot of the dataset.
Apply the K-means clustering algorithm with K=3, 4, 5 and 6.
Plot the dataset as clusters.
Visually inspect the plots and infer: in your view, what is the apt value for K?
Step 1: Import the Required Libraries
We start by importing the necessary libraries:
pandas: For handling and manipulating the dataset.
numpy: For numerical operations.
matplotlib: For plotting scatter plots and cluster results.
KMeans from sklearn.cluster: For applying K-means clustering on the data.
In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

Step 2: Load the Dataset

We load the [Link] file, which contains one record for each of the 4000 drivers, with the following columns:
id: Unique identifier for each driver.
mean_dist_day: Average distance driven per day.

mean_over_speed_perc: Percentage of time the driver spends overspeeding.
We then display the first few rows to understand the dataset structure.
In [2]: # Load the dataset
data = pd.read_csv('[Link]')

# Display the first few rows to understand the dataset
print(data.head())

           id  mean_dist_day  mean_over_speed_perc
0  3423311935          71.24                    28
1  3423313212          52.53                    25
2  3423313724          64.54                    27
3  3423311373          55.69                    22
4  3423310999          54.58                    25
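
Before plotting, it can help to confirm the shape of the data and check for missing values. The sketch below is only an illustration: it re-enters the five rows printed above by hand into a DataFrame named sample (a name introduced here, not part of the lab sheet) so that it runs without the original file. On the real data, the same three calls would be made on data.

```python
import pandas as pd

# Stand-in built from the five rows printed above; NOT the full 4000-row file
sample = pd.DataFrame({
    "id": [3423311935, 3423313212, 3423313724, 3423311373, 3423310999],
    "mean_dist_day": [71.24, 52.53, 64.54, 55.69, 54.58],
    "mean_over_speed_perc": [28, 25, 27, 22, 25],
})

print(sample.shape)         # (number of rows, number of columns)
print(sample.isna().sum())  # count of missing values in each column
print(sample.describe())    # summary statistics for the numeric columns
```

Missing values in either feature column would need handling before clustering, since KMeans does not accept NaN inputs.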

Step 3: Plot the Scatter Plot of the Data

We visualize the dataset using a scatter plot where:
X-axis: Represents the mean distance driven per day.
Y-axis: Represents the mean percentage of time the driver spends overspeeding.
This plot helps us understand the distribution of the data before applying clustering.
In [3]: # Scatter plot of the dataset
plt.scatter(data['mean_dist_day'], data['mean_over_speed_perc'], color='blue')
plt.title('Scatter plot of Drivers Data')
plt.xlabel('Mean Distance per Day')
plt.ylabel('Mean Overspeed Percentage')
plt.show()


Step 4: Apply K-means Clustering with Different Values of K

We apply K-means clustering using different values of K (number of clusters). Specifically, we try K=3, 4, 5, and 6.
For each value of K, the K-means algorithm assigns drivers to clusters based on their average distance driven and overspeeding
percentage.
The resulting clusters are plotted using different colors to represent different groups.
In [4]: # Function to plot K-means clusters with explicit n_init parameter
def plot_clusters(k, data):
    # Fit K-means on the two features and store each driver's cluster label
    kmeans = KMeans(n_clusters=k, n_init=10)  # Explicitly set n_init to 10
    data['cluster'] = kmeans.fit_predict(data[['mean_dist_day', 'mean_over_speed_perc']])

    # Plot the clusters
    plt.scatter(data['mean_dist_day'], data['mean_over_speed_perc'], c=data['cluster'], cmap='viridis')
    plt.title(f'K-means Clustering with K={k}')
    plt.xlabel('Mean Distance per Day')
    plt.ylabel('Mean Overspeed Percentage')
    plt.show()

# Apply K-means clustering with K=3, 4, 5, 6 and plot results
for k in [3, 4, 5, 6]:
    plot_clusters(k, data)
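
To make the assignment of drivers to clusters more concrete, here is a simplified NumPy sketch of Lloyd's algorithm, the iteration that K-means is built on. It is not scikit-learn's implementation, which by default also uses k-means++ initialisation and keeps the best of several restarts. The function name lloyd_kmeans, its parameters and the synthetic points are assumptions made for illustration.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Index of the closest centroid (Euclidean distance) for every point."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Simplified K-means on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct points drawn from the data
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        labels = nearest_centroid(points, centroids)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return nearest_centroid(points, centroids), centroids

# Synthetic two-feature points (NOT the drivers dataset): two separated blobs
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(0, 1, size=(50, 2)),
    rng.normal(100, 1, size=(50, 2)),
])
labels, centroids = lloyd_kmeans(points, k=2)
print(centroids)
```

Because both steps use plain Euclidean distance, a feature with a larger numeric range has more influence on the resulting clusters.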


Step 5: Determine the Optimal Value of K

After visually inspecting the clustering results for K=3, 4, 5, and 6, we infer the optimal value of K based on how well-separated the
clusters are.
A good value for K will have well-separated clusters with minimal overlap.
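
Visual inspection can be complemented with numeric scores. The sketch below, which is not part of the original lab sheet, computes two common ones for each K: the inertia (within-cluster sum of squared distances, exposed by scikit-learn as inertia_) and the mean silhouette coefficient from sklearn.metrics.silhouette_score. The helper score_k_values, the fixed random_state and the synthetic blobs are assumptions for illustration. The blobs stand in for the drivers data, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_k_values(features, k_values, seed=0):
    """Return {k: (inertia, mean silhouette)} for each k in k_values."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(features)
        scores[k] = (model.inertia_, silhouette_score(features, labels))
    return scores

# Synthetic stand-in (NOT the lab dataset): four Gaussian blobs in two features
rng = np.random.default_rng(0)
centres = np.array([[50, 5], [50, 30], [180, 10], [180, 60]])  # hypothetical values
features = np.vstack([rng.normal(c, [8, 3], size=(200, 2)) for c in centres])

scores = score_k_values(features, [3, 4, 5, 6])
for k, (inertia, silhouette) in scores.items():
    print(f"K={k}: inertia={inertia:.0f}, silhouette={silhouette:.3f}")
```

Inertia tends to fall as K grows, so it is usually read by looking for the point where the drop flattens (the "elbow"), while a higher silhouette indicates tighter, better-separated clusters. The two scores need not agree with each other or with the visual impression.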
Step 6: Conclusion

Based on the visual inspection of the scatter plots, we conclude that K=4 is the optimal value for clustering the drivers' data. This value
provides the best separation of clusters, ensuring that each group of drivers is distinct based on their driving behavior (mean distance
per day and mean overspeed percentage).