Ml-Exp-2 - Jupyter Notebook

Uploaded by

engageelite1407

10/20/24, 11:16 PM ml-exp-2 - Jupyter Notebook

In [24]:  # Import necessary libraries


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

# Load the dataset
url = '/kaggle/input/email-spam-classification-dataset-csv/emails.csv'
df = pd.read_csv(url)

# Display the first few rows of the dataset
#print("Initial Dataset:")
#print(df.head())

# Check unique values in 'Prediction' before mapping
#print("\nUnique values in 'Prediction' before mapping:")
#print(df['Prediction'].unique())

# Drop rows where 'Prediction' is NaN to avoid issues
#df = df.dropna(subset=['Prediction'])

# Check the size of the DataFrame after dropping NaNs
#print(f"\nNumber of rows after dropping NaNs: {len(df)}")

# Check unique values in 'Prediction' after dropping NaNs
#print("\nUnique values in 'Prediction' after dropping NaNs:")
#print(df['Prediction'].unique())

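# Split dataset into features and labels
# Note: 'Email No.' appears to be a row identifier rather than message text,
# so TF-IDF features built from it are unlikely to carry any spam signal.
X = df['Email No.']
y = df['Prediction']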

# Check if the data has sufficient samples
if len(X) == 0 or len(y) == 0:
    raise ValueError("The dataset is empty after preprocessing. Please ...")

# Split the dataset into training and test sets
# The source line is cut off after test_size=0.2; any further arguments are unknown.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Convert the text data into numerical data using TF-IDF vectorization
tfidf = TfidfVectorizer(stop_words='english', max_df=0.7)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)

# K-Nearest Neighbors Classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_tfidf, y_train)
y_pred_knn = knn.predict(X_test_tfidf)

# Support Vector Machine Classifier
svm = SVC()
svm.fit(X_train_tfidf, y_train)
y_pred_svm = svm.predict(X_test_tfidf)

# Evaluate the models
print("K-Nearest Neighbors (KNN) Results:")
print(classification_report(y_test, y_pred_knn))
print(f"Accuracy: {accuracy_score(y_test, y_pred_knn) * 100:.2f}%")

print("\nSupport Vector Machine (SVM) Results:")
print(classification_report(y_test, y_pred_svm))
print(f"Accuracy: {accuracy_score(y_test, y_pred_svm) * 100:.2f}%")

K-Nearest Neighbors (KNN) Results:
              precision    recall  f1-score   support

           0       0.71      1.00      0.83       739
           1       0.00      0.00      0.00       296

    accuracy                           0.71      1035
   macro avg       0.36      0.50      0.42      1035
weighted avg       0.51      0.71      0.59      1035

Accuracy: 71.40%

Support Vector Machine (SVM) Results:
              precision    recall  f1-score   support

           0       0.71      1.00      0.83       739
           1       0.00      0.00      0.00       296

    accuracy                           0.71      1035
   macro avg       0.36      0.50      0.42      1035
weighted avg       0.51      0.71      0.59      1035

Accuracy: 71.40%
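
Both reports are identical, and recall for class 1 is 0.00 in each, which suggests that neither model predicted class 1 for any test row. If so, the accuracy is simply the share of class 0 in the test set. A minimal check using only the support counts printed in the reports (the variable names are illustrative and do not come from the notebook):

```python
# Support counts taken from the classification reports.
support_class_0 = 739
support_class_1 = 296
total = support_class_0 + support_class_1

# Accuracy of a model that always predicts class 0.
majority_baseline = support_class_0 / total
print(f"Total test samples: {total}")
print(f"Majority-class baseline: {majority_baseline * 100:.2f}%")
```

The baseline matches the 71.40% reported for both KNN and SVM, so neither model appears to have learned anything beyond the class frequencies.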

/opt/conda/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
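
The warning is raised because no sample was predicted as class 1, so precision for that class is a division by zero. As the message itself suggests, the `zero_division` parameter of `classification_report` sets the value to report and suppresses the warning. A small sketch with made-up labels (`y_true` and `y_pred` here are hypothetical, not the notebook's data):

```python
from sklearn.metrics import classification_report

# Hypothetical labels: the model never predicts class 1.
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0]

# zero_division=0 reports 0.0 for the undefined precision without a warning.
report = classification_report(y_true, y_pred, zero_division=0, output_dict=True)
print(report["1"]["precision"])
print(report["accuracy"])
```

This only silences the message. The underlying issue, a model that never predicts the positive class, remains.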
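
One possible reason for the collapse to class 0 is that `X` holds the 'Email No.' column, which looks like a row label rather than message text. If the remaining columns of the CSV are numeric per-word counts (an assumption; the notebook never prints the columns), the classifiers could be trained on those directly, without TF-IDF. The sketch below uses a tiny made-up DataFrame with hypothetical word columns (`free`, `meeting`) to show the idea:

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-in for the CSV: an ID column, word-count columns, a label.
df_demo = pd.DataFrame({
    "Email No.": [f"Email {i}" for i in range(1, 9)],
    "free": [5, 4, 6, 5, 0, 0, 1, 0],
    "meeting": [0, 0, 1, 0, 4, 5, 3, 4],
    "Prediction": [1, 1, 1, 1, 0, 0, 0, 0],
})

# Use every numeric count column as a feature; drop the ID and the label.
X_demo = df_demo.drop(columns=["Email No.", "Prediction"])
y_demo = df_demo["Prediction"]

knn_demo = KNeighborsClassifier(n_neighbors=3)
knn_demo.fit(X_demo, y_demo)

# Two unseen rows: one with many "free" counts, one with many "meeting" counts.
new_rows = pd.DataFrame({"free": [6, 0], "meeting": [0, 5]})
pred_demo = knn_demo.predict(new_rows)
print(pred_demo)
```

Applied to the real dataset, the same `drop(columns=[...])` step would replace the `TfidfVectorizer` stage, provided the column layout matches this assumption.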
