2 - Jupyter Notebook
In [3]: df = pd.read_csv('emails.csv')
In [4]: df
Out[4]:
      Email No.   the   to  ect  and  for   of    a  you  hou  ...  connevey  jay  valued  lay  infras
0     Email 1       0    0    1    0    0    0    2    0    0  ...         0    0       0    0
1     Email 2       8   13   24    6    6    2  102    1   27  ...         0    0       0    0
2     Email 3       0    0    1    0    0    0    8    0    0  ...         0    0       0    0
3     Email 4       0    5   22    0    5    1   51    2   10  ...         0    0       0    0
4     Email 5       7    6   17    1    5    2   57    0    9  ...         0    0       0    0
...   ...         ...  ...  ...  ...  ...  ...  ...  ...  ...  ...       ...  ...     ...  ...
5167  Email 5168    2    2    2    3    0    0   32    0    0  ...         0    0       0    0
5168  Email 5169   35   27   11    2    6    5  151    4    3  ...         0    0       0    0
5169  Email 5170    0    0    1    1    0    0   11    0    0  ...         0    0       0    0
5170  Email 5171    2    7    1    0    2    1   28    2    0  ...         0    0       0    0
5171  Email 5172   22   24    5    1    6    5  148    8    2  ...         0    0       0    0
localhost:8888/notebooks/Desktop/B190594295/2.ipynb 1/6
08/11/2023, 13:02 2 - Jupyter Notebook
In [5]: df.describe()
In [6]: df.shape
In [7]: df.isnull().any()
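The outputs of the three inspection cells above are not in the printout. As a rough illustration of what `shape` and `isnull().any()` report, here is a minimal sketch on a made-up three-row frame. The frame `toy` and its values are assumptions for illustration, not data from `emails.csv`.

```python
import pandas as pd

# Toy stand-in for emails.csv: word-count columns plus a 0/1 label (values are made up).
toy = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3"],
    "the": [0, 8, 0],
    "to": [0, 13, None],   # one deliberately missing value
    "Prediction": [0, 1, 0],
})

n_rows, n_cols = toy.shape          # (number of rows, number of columns)
has_null = toy.isnull().any()       # per-column flag: True if any value is missing
null_counts = toy.isnull().sum()    # per-column count of missing values

print(n_rows, n_cols)
print(has_null[has_null].index.tolist())  # names of the columns with missing data
```

`isnull().any()` only says whether a column has gaps; `isnull().sum()` is often more useful because it also says how many.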
Out[8]:
      the   to  ect  and  for   of    a  you  hou   in  ...  connevey  jay  valued  lay  infrastru
0       0    0    1    0    0    0    2    0    0    0  ...         0    0       0    0
1       8   13   24    6    6    2  102    1   27   18  ...         0    0       0    0
2       0    0    1    0    0    0    8    0    0    4  ...         0    0       0    0
3       0    5   22    0    5    1   51    2   10    1  ...         0    0       0    0
4       7    6   17    1    5    2   57    0    9    3  ...         0    0       0    0
...   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...       ...  ...     ...  ...
5167    2    2    2    3    0    0   32    0    0    5  ...         0    0       0    0
5169    0    0    1    1    0    0   11    0    0    1  ...         0    0       0    0
5170    2    7    1    0    2    1   28    2    0    8  ...         0    0       0    0
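Out[8] no longer shows the `Email No.` column, but the cell that produced it is not in the printout. One plausible way to get there is to drop the identifier column by name. The sketch below assumes that approach and uses a made-up frame `toy`.

```python
import pandas as pd

# Made-up frame with the same kind of layout: an ID column, a word count, a label.
toy = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2"],
    "the": [0, 8],
    "Prediction": [0, 1],
})

# Assumption: the identifier is removed by name. drop() returns a new frame
# and leaves the original untouched unless the result is assigned back.
without_id = toy.drop(columns=["Email No."])

print(without_id.columns.tolist())
```

An identifier such as `Email 1` carries no signal about spam, so it is normally removed before the word counts are passed to a classifier.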
In [9]: df.columns
Out[9]: Index(['the', 'to', 'ect', 'and', 'for', 'of', 'a', 'you', 'hou', 'in',
        ...
        'connevey', 'jay', 'valued', 'lay', 'infrastructure', 'military',
        'allowing', 'ff', 'dry', 'Prediction'],
        dtype='object', length=3001)
In [10]: df.Prediction.unique()
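The output of this cell and the following cell (In [11]) are missing from the printout. The predictions printed later in the notebook use the strings 'Not spam' and 'Spam', which suggests the numeric labels were mapped to names at some point. The sketch below shows one way to do that. It assumes that 1 means spam and 0 means not spam, and the series `labels` is made up.

```python
import pandas as pd

# Made-up label column standing in for df['Prediction'].
labels = pd.Series([0, 1, 0, 0, 1], name="Prediction")

distinct = sorted(labels.unique().tolist())   # distinct label values
counts = labels.value_counts()                # how many emails per class

# Assumption: 1 = spam, 0 = not spam.
label_names = {0: "Not spam", 1: "Spam"}
readable = labels.map(label_names)

print(distinct)
print(readable.tolist())
```

`value_counts()` is worth checking alongside `unique()`, because accuracy alone can look good on a heavily imbalanced label column.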
In [12]: df
Out[12]:
      the   to  ect  and  for   of    a  you  hou   in  ...  connevey  jay  valued  lay  infrastru
0       0    0    1    0    0    0    2    0    0    0  ...         0    0       0    0
1       8   13   24    6    6    2  102    1   27   18  ...         0    0       0    0
2       0    0    1    0    0    0    8    0    0    4  ...         0    0       0    0
3       0    5   22    0    5    1   51    2   10    1  ...         0    0       0    0
4       7    6   17    1    5    2   57    0    9    3  ...         0    0       0    0
...   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...       ...  ...     ...  ...
5167    2    2    2    3    0    0   32    0    0    5  ...         0    0       0    0
5169    0    0    1    1    0    0   11    0    0    1  ...         0    0       0    0
5170    2    7    1    0    2    1   28    2    0    8  ...         0    0       0    0
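In [13]: # Features: every word-count column. Target: the 'Prediction' label column.
# columns= already selects the axis, so axis=1 is redundant and omitted.
X = df.drop(columns='Prediction')
Y = df['Prediction']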
In [14]: X.columns
Out[14]: Index(['the', 'to', 'ect', 'and', 'for', 'of', 'a', 'you', 'hou', 'in',
         ...
         'enhancements', 'connevey', 'jay', 'valued', 'lay', 'infrastructure',
         'military', 'allowing', 'ff', 'dry'],
         dtype='object', length=3000)
In [15]: Y.head()
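The KNN cell uses `x_train`, `x_test`, `y_train` and `y_test`, but the cell that creates them (In [16]) is not in the printout. A common way to produce such a split is scikit-learn's `train_test_split`. The sketch below assumes that function. The arrays `X_toy` and `y_toy`, the 25% test share and the random seed are illustrative choices, not values from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_toy = rng.integers(0, 5, size=(20, 4))   # 20 fake emails, 4 fake word counts
y_toy = np.array([0, 1] * 10)              # balanced fake labels

# test_size and random_state are illustrative, not taken from the notebook.
# stratify keeps the class proportions the same in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

print(x_train.shape, x_test.shape)
```

Fixing `random_state` makes the split repeatable, so accuracy figures from different models are compared on the same test emails.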
In [17]: # k-nearest neighbours with k = 7: fit on the training split, predict the test split
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
Prediction:
['Not spam' 'Spam' 'Not spam' ... 'Not spam' 'Not spam' 'Not spam']
In [19]: M = metrics.accuracy_score(y_test,y_pred)
print("KNN accuracy: ", M)
In [20]: C = metrics.confusion_matrix(y_test,y_pred)
print("Confusion matrix: ", C)
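The printed matrix itself is missing from the printout. For reading one, scikit-learn's `confusion_matrix` puts true classes on the rows and predicted classes on the columns, with labels in sorted order, so 'Not spam' comes before 'Spam'. The sketch below derives accuracy, precision and recall from a 2x2 matrix in that layout. The counts in `C_toy` are made up and are not the notebook's results.

```python
# Made-up 2x2 matrix in scikit-learn's layout: rows = true class, columns = predicted class.
# With labels ordered ['Not spam', 'Spam'] this reads [[TN, FP], [FN, TP]].
C_toy = [[50, 10],
         [5, 35]]

(tn, fp), (fn, tp) = C_toy
total = tn + fp + fn + tp

accuracy = (tp + tn) / total    # share of all emails classified correctly
precision = tp / (tp + fp)      # of emails flagged as spam, share that really are spam
recall = tp / (tp + fn)         # of real spam emails, share that were flagged

print(accuracy, round(precision, 4), recall)
```

For a spam filter, precision and recall matter separately: a false positive hides a legitimate email, while a false negative lets spam through.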
In [22]: n = metrics.accuracy_score(y_test,y_pred)
print("SVM accuracy: ", n)
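The cell that trains the SVM (In [21]) is not in the printout, so the kernel and parameters used are unknown. As a minimal sketch of the general pattern, the example below fits scikit-learn's `SVC` with its default settings on two made-up, well-separated clusters. All data and variable names here are assumptions for illustration.

```python
import numpy as np
from sklearn import metrics
from sklearn.svm import SVC

# Two well-separated fake clusters so the outcome is easy to check by eye.
X_tr = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_tr = np.array(["Not spam"] * 3 + ["Spam"] * 3)
X_te = np.array([[0.5, 0.5], [5.5, 5.5]])
y_te = np.array(["Not spam", "Spam"])

svm = SVC()                   # default RBF kernel; the notebook's settings are unknown
svm.fit(X_tr, y_tr)
svm_pred = svm.predict(X_te)

svm_acc = metrics.accuracy_score(y_te, svm_pred)
print("SVM accuracy on toy data:", svm_acc)
```

Storing the SVM predictions under their own name, as `svm_pred` does here, avoids overwriting the KNN predictions held in `y_pred`.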
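In [24]: # Summary table; the accuracies are entered by hand as percentages.
# Note that this reuses the name df and so replaces the email data frame.
df = pd.DataFrame({
    'Model Name': ['KNN', 'SVM'],
    'Accuracy Score': [87.05, 90.14]
})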
In [25]: df
Out[25]:
   Model Name  Accuracy Score
0         KNN           87.05
1         SVM           90.14
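The percentages in this table are typed in by hand. A less error-prone variant builds the table from the score variables directly. The sketch below assumes accuracy fractions of 0.8705 and 0.9014, chosen only to match the percentages shown. The names `scores`, `results` and `best` are hypothetical.

```python
import pandas as pd

# Assumed accuracy fractions standing in for the values returned by accuracy_score.
scores = {"KNN": 0.8705, "SVM": 0.9014}

results = pd.DataFrame({
    "Model Name": list(scores),
    "Accuracy Score": [round(100 * s, 2) for s in scores.values()],  # as percentages
})

# Name of the model with the highest accuracy.
best = results.loc[results["Accuracy Score"].idxmax(), "Model Name"]

print(results)
print("Best model:", best)
```

Using a separate name such as `results` also keeps the original email data frame available under `df`.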