The document outlines an assignment by Manjiri Makode on data classification and clustering using Python. It includes the implementation of a K-Nearest Neighbors (KNN) classifier and a Support Vector Machine (SVM) classifier on a dataset of emails, providing metrics such as accuracy, precision, recall, and F1 score for both models. Additionally, it demonstrates clustering using K-Means on the Iris dataset, visualizing the results with PCA.


Name: Manjiri Makode

Roll No: 2441015


Batch: C
Assignment No. 05: Perform data classification using a classification algorithm, or perform data clustering using a clustering algorithm.

1. Classification
In [1]: import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from sklearn import metrics

In [2]: df=pd.read_csv('emails.csv')

In [3]: df.head()
Out[3]:
    Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay  infrastructure  military
0     Email 1    0   0    1    0    0   0    2    0    0  ...         0    0       0    0               0         0
1     Email 2    8  13   24    6    6   2  102    1   27  ...         0    0       0    0               0         0
2     Email 3    0   0    1    0    0   0    8    0    0  ...         0    0       0    0               0         0
3     Email 4    0   5   22    0    5   1   51    2   10  ...         0    0       0    0               0         0
4     Email 5    7   6   17    1    5   2   57    0    9  ...         0    0       0    0               0         0

5 rows × 3002 columns

In [4]: df.tail()

Out[4]:
       Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay  infrastructure  military
5167  Email 5168    2   2    2    3    0   0   32    0    0  ...         0    0       0    0               0         0
5168  Email 5169   35  27   11    2    6   5  151    4    3  ...         0    0       0    0               0         0
5169  Email 5170    0   0    1    1    0   0   11    0    0  ...         0    0       0    0               0         0
5170  Email 5171    2   7    1    0    2   1   28    2    0  ...         0    0       0    0               0         0
5171  Email 5172   22  24    5    1    6   5  148    8    2  ...         0    0       0    0               0         0

5 rows × 3002 columns

In [5]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5172 entries, 0 to 5171
Columns: 3002 entries, Email No. to Prediction
dtypes: int64(3001), object(1)
memory usage: 118.5+ MB

In [6]: df.describe()

Out[6]:
               the           to          ect          and          for           of            a          you  ...
count  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  5172.000000  ...
mean      6.640565     6.188128     5.143852     3.075599     3.124710     2.627030    55.517401     2.466551  ...
std      11.745009     9.534576    14.101142     6.045970     4.680522     6.229845    87.574172     4.314444  ...
min       0.000000     0.000000     1.000000     0.000000     0.000000     0.000000     0.000000     0.000000  ...
25%       0.000000     1.000000     1.000000     0.000000     1.000000     0.000000    12.000000     0.000000  ...
50%       3.000000     3.000000     1.000000     1.000000     2.000000     1.000000    28.000000     1.000000  ...
75%       8.000000     7.000000     4.000000     3.000000     4.000000     2.000000    62.250000     3.000000  ...
max     210.000000   132.000000   344.000000    89.000000    47.000000    77.000000  1898.000000    70.000000  ...

8 rows × 3001 columns

In [7]: df.columns

Out[7]: Index(['Email No.', 'the', 'to', 'ect', 'and', 'for', 'of', 'a', 'you', 'hou',
...
'connevey', 'jay', 'valued', 'lay', 'infrastructure', 'military',
'allowing', 'ff', 'dry', 'Prediction'],
dtype='object', length=3002)

In [8]: df.dtypes

Out[8]: Email No.    object
the          int64
to int64
ect int64
and int64
...
military int64
allowing int64
ff int64
dry int64
Prediction int64
Length: 3002, dtype: object

In [9]: df.size

Out[9]: 15526344
In [10]: df.isna().sum()
Out[10]: Email No. 0
the 0
to 0
ect 0
and 0
..
military 0
allowing 0
ff 0
dry 0
Prediction 0
Length: 3002, dtype: int64

In [11]: df.dropna(inplace=True)

In [12]: df.drop(['Email No.'],axis=1,inplace=True)

In [13]: X = df.drop(['Prediction'],axis = 1)
X

Out[13]:
      the  to  ect  and  for  of    a  you  hou  in  ...  enhancements  connevey  jay  valued  lay  infrastructure
0       0   0    1    0    0   0    2    0    0   0  ...             0         0    0       0    0               0
1       8  13   24    6    6   2  102    1   27  18  ...             0         0    0       0    0               0
2       0   0    1    0    0   0    8    0    0   4  ...             0         0    0       0    0               0
3       0   5   22    0    5   1   51    2   10   1  ...             0         0    0       0    0               0
4       7   6   17    1    5   2   57    0    9   3  ...             0         0    0       0    0               0
...   ...  ..  ...  ...  ...  ..  ...  ...  ...  ..  ...           ...       ...  ...     ...  ...             ...
5167    2   2    2    3    0   0   32    0    0   5  ...             0         0    0       0    0               0
5168   35  27   11    2    6   5  151    4    3  23  ...             0         0    0       0    0               0
5169    0   0    1    1    0   0   11    0    0   1  ...             0         0    0       0    0               0
5170    2   7    1    0    2   1   28    2    0   8  ...             0         0    0       0    0               0
5171   22  24    5    1    6   5  148    8    2  23  ...             0         0    0       0    0               0

5172 rows × 3000 columns

In [14]: y = df['Prediction']
y
Out[14]: 0 0
1 0
2 0
3 0
4 0
..
5167 0
5168 0
5169 1
5170 1
5171 0
Name: Prediction, Length: 5172, dtype: int64

In [15]: from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

x = scale(X)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
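
Note that scale(X) above standardizes the full feature matrix before the train/test split, so the test rows also contribute to the scaling statistics. A minimal alternative sketch, not part of the original assignment, that fits the scaler on the training split only (assuming the same X and y):

In [ ]: # Sketch: standardize after splitting, so only training data defines the scaling
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # fit mean/std on the training split only
X_te_scaled = scaler.transform(X_te)      # reuse those statistics on the test split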
KNN Classifier
In [16]: from sklearn.neighbors import KNeighborsClassifier
knn= KNeighborsClassifier(n_neighbors=7)
knn.fit(x_train,y_train)
y_pred=knn.predict(x_test)

In [17]: print("Prediction",y_pred)

Prediction [0 0 1 ... 1 1 1]

In [18]: print("Confusion Matrix: ")
print(metrics.confusion_matrix(y_true=y_test, y_pred=y_pred))

Confusion Matrix:
[[804 293]
[ 16 439]]
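
If a plotted confusion matrix is preferred over the printed array, a minimal sketch (assuming the same y_test and y_pred, and scikit-learn >= 1.0 for ConfusionMatrixDisplay.from_predictions):

In [ ]: # Sketch: draw the KNN confusion matrix as a heatmap
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, display_labels=["Not Spam", "Spam"])
plt.title("KNN Confusion Matrix")
plt.show()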

In [19]: print("KNN Accuracy: ",metrics.accuracy_score(y_test,y_pred))

KNN Accuracy: 0.8009020618556701

In [20]: print("KNN Precision score: ",metrics.precision_score(y_test,y_pred))

KNN Precision score: 0.5997267759562842

In [21]: print("KNN Recall score: ",metrics.recall_score(y_test,y_pred))

KNN Recall score: 0.9648351648351648

In [22]: print("KNN F1 Score: ",metrics.f1_score(y_test,y_pred))

KNN F1 Score: 0.7396798652064027

In [23]: print("Classification Report:\n", metrics.classification_report(y_test, y_pred,
         target_names=["Not Spam", "Spam"]))

Classification Report:
              precision    recall  f1-score   support

    Not Spam       0.98      0.73      0.84      1097
        Spam       0.60      0.96      0.74       455

    accuracy                           0.80      1552
   macro avg       0.79      0.85      0.79      1552
weighted avg       0.87      0.80      0.81      1552
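
The value n_neighbors=7 is fixed above without tuning; one way to choose k is a cross-validated grid search, sketched below (the candidate values and the F1 scoring are illustrative, not taken from the assignment):

In [ ]: # Sketch: pick k for KNN by 5-fold grid search on the training split
from sklearn.model_selection import GridSearchCV
param_grid = {"n_neighbors": [3, 5, 7, 9, 11]}  # hypothetical candidate values
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="f1")
grid.fit(x_train, y_train)
print("Best k:", grid.best_params_["n_neighbors"])
print("Best cross-validated F1:", grid.best_score_)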

SVM Classifier
In [24]: from sklearn.svm import SVC
model=SVC(C=1)
model.fit(x_train,y_train)
y_pred=model.predict(x_test)

In [25]: print('Confusion Matrix: ')
print(metrics.confusion_matrix(y_true=y_test, y_pred=y_pred))

Confusion Matrix:
[[1091 6]
[ 90 365]]
In [26]: print("SVM accuracy: ",metrics.accuracy_score(y_test,y_pred))

SVM accuracy: 0.9381443298969072

In [27]: print("SVM Precision score: ",metrics.precision_score(y_test,y_pred))

SVM Precision score: 0.9838274932614556

In [28]: print("SVM Recall score: ",metrics.recall_score(y_test,y_pred))

SVM Recall score: 0.8021978021978022

In [29]: print("SVM F1 Score: ",metrics.f1_score(y_test,y_pred))

SVM F1 Score: 0.8837772397094431

In [30]: print("SVM Classification Report:\n", metrics.classification_report(y_test, y_pred,
         target_names=["Not Spam", "Spam"]))

SVM Classification Report:
              precision    recall  f1-score   support

    Not Spam       0.92      0.99      0.96      1097
        Spam       0.98      0.80      0.88       455

    accuracy                           0.94      1552
   macro avg       0.95      0.90      0.92      1552
weighted avg       0.94      0.94      0.94      1552
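
As a quick check beyond the single 70/30 split, both classifiers could also be compared with k-fold cross-validation; a sketch, assuming the scaled matrix x and labels y defined earlier (5 folds and accuracy scoring are illustrative choices):

In [ ]: # Sketch: 5-fold cross-validated accuracy for both models
from sklearn.model_selection import cross_val_score
for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=7)), ("SVM", SVC(C=1))]:
    scores = cross_val_score(clf, x, y, cv=5, scoring="accuracy")
    print(name, "mean CV accuracy:", round(scores.mean(), 4))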

2. Clustering using K-Means


In [31]: from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

In [32]: # Load dataset
iris = load_iris()
X = iris.data

In [33]: # Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X)

In [34]: # Visualizing using PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

In [35]: plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='viridis')
plt.title("K-Means Clustering (PCA Reduced)")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.show()
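
Here n_clusters=3 matches the three iris species; a common check on that choice is the elbow method, plotting the K-Means inertia (within-cluster sum of squares) against k. A sketch, with an illustrative range of k values:

In [ ]: # Sketch: elbow method - inertia versus number of clusters
inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

plt.plot(list(k_values), inertias, marker="o")
plt.title("Elbow Method for K-Means on Iris")
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia")
plt.show()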
