Saurabh

The document describes developing a spam mail detection system using Python. It discusses using machine learning algorithms like KNN, SVM, random forest and Naive Bayes to classify SMS messages as spam or ham. It presents results of applying these algorithms on SMS spam datasets, with KNN and linear SVM performing best with error rates of 3.11% and 1.19% respectively. Feature extraction and preprocessing are done before applying the algorithms. The aim is to accurately filter spam SMS messages.

Uploaded by

saurabh

MICRO PROJECT

REPORT

Develop a Spam mail detection using Python

BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION

THDC INSTITUTE OF HYDROPOWER ENGINEERING AND

TECHNOLOGY, TEHRI, UTTARAKHAND, INDIA
(UTTARAKHAND TECHNICAL UNIVERSITY,
DEHRADUN) 2022-2023

Ms. Nidhi lakhera Divyansh Kataria (700970102004)

Saurabh Sharma
(700970102007)
SIGNATURE ……… ECE(4th Year, 8th Semester)
Acknowledgement

As we worked on this project, we found a great deal of information that helped us, and we are glad that we completed it successfully and were able to understand many things. Thank you, Nidhi Lakhera ma'am, for giving us this opportunity to complete this project and for enabling us to learn so much. We have no more valuable words to express our gratitude, but our hearts are still full of gratitude for all the kindness shown to us.

Saurabh Sharma,
Divyansh Kataria
B. Tech, 8th Sem ECE

I endorse the above declaration of the Student.

(Name and Signature of the Supervisor)

CONTENT

Chapter Name

1. Abstract
2. Introduction
3. Requirements
4. Flow Chart
5. Introduction to the Machine Learning Algorithms
   5.1 K-Nearest Neighbours
   5.2 Support Vector Machines (SVM)
   5.3 Random Forest
   5.4 Naïve Bayes
       Step 1: About Bayes Theorem
       Step 2: Understand data
       Step 3: bag_of_word
       Step 4: training_and_testing
       Step 5: Implementing NB ML algorithm
       Step 6: Evaluate model
   5.5 AdaBoost
   5.6 NLTK
6. Python Code ScreenShot
7. Result ScreenShot
8. Conclusion
1. ABSTRACT

Short Message Service (SMS), which allows users to send and receive messages, has become a multi-billion dollar industry as mobile phone usage has soared. The cost of messaging services has also decreased, which has led to an increase in the amount of spam delivered to mobile devices. Up to 40% of SMS messages in some regions of Asia were spam in 2012. Because of the short message length, the informal language, and the lack of reliable databases for SMS spam, current email filtering algorithms may not perform well on SMS. In this project, real SMS spam databases from the ML repository are used. After feature extraction and preprocessing, numerous machine learning methods are applied to the databases. The results are then compared, and the best algorithm for text message spam filtering is presented. The best model in this study reduces the total error rate of the best model in the original research that used this dataset. The following algorithms are used to categorise spam in mobile device communication: decision trees, K-Nearest Neighbour, and logistic regression. The SMS spam collection set is used to test the approach.
2. INTRODUCTION

Fig 1.1

Chat technology is simply one aspect of SMS. SMS technology was made possible by an accepted international standard. Spam is the term for the abuse of electronic messaging services to send large numbers of unwanted messages to anybody. Even though email spam is the most well-known example, identical offences in other media are frequently referred to as "spam" as well. In this sense, SMS spam is unsolicited bulk communication that usually carries some commercial interest and is quite similar to email spam.

Phishing URLs and business promotions are spread via SMS spam. Because most countries outlaw the practice, commercial spammers use malware to transmit SMS spam. Since it is challenging to pinpoint the origin of spam when it is sent from a hacked computer, spammers take less of a risk while doing so. Only letters, numbers, and a few symbols are permitted in SMS messages. A brief glance at the messages reveals a clear pattern: almost all spam messages direct users to call a phone number or go to a website. A simple SQL query on the spam yields results that reveal this trend. Because of the low cost and large bandwidth of the SMS network, SMS spam is widely used.
Fig 1.2

Every time a user receives an SMS spam message, their mobile phone notifies them of the message's arrival. The user will be unhappy when they realise the message is unwanted, and SMS spam also uses up some of the storage space on their mobile device.

There are several notable differences between emails and text messages. Unlike email, for which a range of sizable datasets is available, actual databases for SMS spam are quite scarce. The number of features that can be used to classify text messages is also considerably smaller than for emails because text messages are much shorter. There is also no header in this case. In addition, text messages use significantly less formal language than emails do and are full of acronyms. All of these elements could lead to a significant decline in the effectiveness of the major email spam filtering algorithms when they are applied to short text messages.

The aim of this project is to apply ML algorithms to the problem of classifying SMS spam, compare their results to learn more and further research the problem, and create a programme based on one of these approaches that can precisely filter SMS spam. Data is first analysed in MATLAB, where feature extraction and basic analysis are performed, and then several machine learning algorithms are applied using the scikit-learn module in Python.
Fig 1.3

In this third installment of a three-part series, we examine the spam or ham classifier from the standpoint of AI ideas, experiment with several classification algorithms in a web-based Python environment, and compare them based on performance criteria.

Spam detection is one of the major applications of machine learning in modern internet technology. Service providers have integrated spam detection algorithms that label such content as "Junk Mail" when it is received.

In this project, the Naive Bayes approach is used to create a model that, depending on the training data we provide, can classify a dataset of messages as spam or not spam. The words "free," "win," "winner," "cash," "prize," and similar expressions are frequently used in these messages because they are meant to catch your attention and, in a sense, persuade you to open them. Exclamation marks and writing in all capitals are other characteristics of spam messages. Since spam texts are often pretty evident to the receiver, we want to train a model to identify them for us.

Finding spam messages is a binary classification problem, since each message is categorised as either spam or not spam and nothing else. It is also a supervised learning problem, because we will be giving the model a labelled dataset to learn from.
3. Requirements

Hardware specifications

Processor: 1.5 GHz or more
RAM: 4 GB or more
HDD: at least 100 GB

Software specifications

Python 3 IDLE or the Anaconda Jupyter Notebook

4. Flow Chart

5. Introduction to the Machine Learning Algorithms

5.1 K-Nearest Neighbours

K-Nearest Neighbours (KNN) is a straightforward instance-based learning technique that can be used to solve classification problems. In this method, the label of a test sample is predicted by a majority vote of its K nearest neighbours in the training set.

K     Overall error (%)   Spam caught (%)   Blocked ham (%)
3     3.11                82.5              0.35
12    3.25                86.2              0.42
22    2.91                79.4              0.41
52    3.25                78.6              0.34
90    4.12                69.5              0.17
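
The voting rule described above can be sketched in a few lines of plain Python. This is only an illustration, not the code used for the table: the two features (message length and digit count), the sample values, and the function name knn_predict are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest training points."""
    # Euclidean distance from the query to every training sample
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest samples and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D features: (message length, number of digits)
train_points = [(120, 11), (150, 9), (140, 14), (30, 0), (45, 1), (25, 0)]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

prediction = knn_predict(train_points, train_labels, (135, 10), k=3)
print(prediction)
```

Changing k changes how many neighbours take part in the vote, which is the parameter varied in the table above.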

5.2 Support Vector Machine

A support vector machine (SVM) is applied to the dataset. The results with various kernels are shown in Table II for 10-fold cross-validation. The table shows that the linear kernel outperforms the other mappings. The error rate decreases when the degree of the polynomial kernel is raised from two to three, but it does not decrease further as the degree is raised higher. Another kernel applied to the dataset is the radial basis function (RBF). The following equation represents the RBF kernel for two samples.
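
The equation itself is not reproduced in this copy of the report. A commonly used form of the RBF kernel is K(x, x') = exp(-gamma * ||x - x'||^2); the sketch below assumes that form, and the value of gamma is an arbitrary choice for illustration rather than the setting used in the experiments.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

same = rbf_kernel([1.0, 2.0], [1.0, 2.0])    # identical samples, distance 0
apart = rbf_kernel([1.0, 2.0], [3.0, 5.0])   # squared distance 4 + 9 = 13
print(same, apart)
```

The kernel equals 1 for identical samples and decays towards 0 as the samples move apart, with gamma controlling how fast it decays.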

Table II: SVM results with different kernels

Kernel function         Overall error (%)   Spam caught (%)   Blocked ham (%)
Linear                  1.19                94.1              0.45
Degree 2 polynomial     2.04                86.3              0.23
Degree 3 polynomial     1.67                90.4              0.47
Degree 4 polynomial     2.01                92.45             0.65
Radial basis function   23.16               79.6              0.35
Sigmoid                 22.4                0                 0

We observe that the number of characters in a text message is a very helpful feature for classifying spam. When features are ranked by the mutual information criterion, this feature has the highest mutual information with the target labels. Also, although text messages with lengths below a specific threshold are normally ham, they can be mistakenly labelled as spam because of the tokens they contain. This becomes apparent when looking at the samples that were misclassified.
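
To make the mutual information criterion concrete, here is a minimal sketch for discrete features. The report ranks the raw character count; this example simplifies that to a yes/no "long message" flag, and the toy labels, the second feature has_hello, and the function name are assumptions made purely for illustration.

```python
import math
from collections import Counter

def mutual_information(feature_values, labels):
    """Mutual information (in bits) between a discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, y), count in joint.items():
        p_joint = count / n
        p_f = feature_counts[f] / n
        p_y = label_counts[y] / n
        mi += p_joint * math.log2(p_joint / (p_f * p_y))
    return mi

# Hypothetical toy data: eight messages with two binary features
labels = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
is_long = [True, True, True, False, False, False, False, True]
has_hello = [True, False, True, False, True, False, True, False]

mi_length = mutual_information(is_long, labels)
mi_hello = mutual_information(has_hello, labels)
print(mi_length, mi_hello)
```

In this toy data the length flag separates the classes much better than the word flag, so it receives the higher score and would be ranked first.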
The results show that SVM with different kernels offers no accuracy advantage over the naïve Bayes algorithm, despite the model being more complex and taking longer to train on the data.

5.3 Random Forest

Random forest is a classification technique that uses ensemble averaging. The ensemble is a group of decision trees, each built from a bootstrap sample of the training set. When a node is split during the construction of a decision tree, the split that is chosen is the best one among a random selection of features. This increases the bias of a single model, but averaging lowers the variance and can also make up for the increase in bias. As a result, a better model is created. The random forest implementation of the scikit-learn Python library, which averages the probabilistic predictions, is used in this study. Two numbers of estimators are simulated for this method. With 12 estimators, the overall error is 1.91%, the spam caught (SC) is 86.6%, and the blocked ham (BH) is 0.71%. With 90 estimators, the overall error is 1.41%, the SC is 92.2%, and the BH is 0.52%. We notice that, compared to the naïve Bayes algorithm, performance does not improve despite the model's increased complexity.
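
The three ingredients named above (bootstrap samples, a random selection of features at each split, and averaging of probabilistic predictions) can be sketched without any library. This is not a full random forest and not the scikit-learn implementation used in the study; the function names, the feature counts, and the per-tree probabilities are hypothetical.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as done for each tree."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, subset_size, rng):
    """Pick the candidate features considered at one node split."""
    return sorted(rng.sample(range(n_features), subset_size))

def average_probabilities(per_tree_probs):
    """Average the spam probabilities predicted by the individual trees."""
    return sum(per_tree_probs) / len(per_tree_probs)

rng = random.Random(0)     # fixed seed so the sketch is repeatable
rows = list(range(10))     # stand-in for 10 training messages
sample = bootstrap_sample(rows, rng)
candidates = random_feature_subset(n_features=16, subset_size=4, rng=rng)

# Hypothetical spam probabilities from three trees for one message
spam_probability = average_probabilities([0.9, 0.6, 0.75])
print(sample, candidates, spam_probability)
```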
5.4 Naïve Bayes Algorithm

Step 1: About Bayes Theorem

Bayes' theorem, one of the first probabilistic inference algorithms, was created by Reverend Bayes (who used it, no less, to try to infer the existence of God) and still works incredibly well in some situations. To understand this theorem, an example is helpful. Consider yourself a Secret Service agent tasked with protecting the Democratic presidential candidate as he or she delivers a campaign speech. Your task is challenging, and you must always be on guard for threats because it is a public event that is open to everyone. A reasonable place to start is therefore to assign each person a threat level. Based on a person's characteristics, such as their age and sex, and other minor details such as whether or not they are carrying a bag or seem tense, you can judge whether they pose a threat.

If a person ticks all the boxes up to the point where your threshold of doubt is crossed, you can take action and have them removed from the area. Bayes' theorem works similarly: we determine the probability of an event (a person who poses a threat) based on the probabilities of numerous related events (age, sex, carrying a bag, nervousness, and so on).

The independence of these features from one another is something to take into account. For instance, if a child exhibits signs of anxiety during the event, the likelihood that they pose a threat is lower than, say, if it were a grown man. To clarify, age and anxiousness are the two features we are taking into account here. If we examine each of these features separately, we might create a model that marks EVERYONE who exhibits anxiety as a possible threat. But since any children present at the event are likely to be anxious, we would probably get a lot of false positives.

Thus, by taking a person's age into account in addition to the "nervousness" feature, we would undoubtedly reach a more accurate conclusion regarding who poses a threat and who does not.

The "Naive" part of the theorem is that it assumes each feature is independent of the others, which may not always be the case and may therefore influence the final verdict.

In essence, Bayes' theorem determines the likelihood that an event will occur based on the probability distributions of a number of other events; in this case, the likelihood that a message is spam. Later in the project, we will go into how Bayes' theorem operates, but first, let's examine the data we will be using.
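
As a small worked example of the theorem applied to spam, the sketch below computes P(spam | message contains "free") from a prior and two likelihoods. All three input probabilities are made-up numbers chosen for illustration; they are not measured from the report's dataset.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    # P(B) by the law of total probability
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

p_spam = 0.2                # assumed share of messages that are spam
p_free_given_spam = 0.5     # assumed P("free" appears | spam)
p_free_given_ham = 0.05     # assumed P("free" appears | ham)

p_spam_given_free = posterior(p_spam, p_free_given_spam, p_free_given_ham)
print(round(p_spam_given_free, 3))
```

With these numbers, seeing the word "free" raises the probability of spam from the prior of 0.2 to 0.1 / 0.14, or about 0.714.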
Step 2: Understand data
Fig 2

Step 3: bag_of_word

We have a substantial set of text data (5580 rows of data). Emails and other messages usually contain a lot of text, yet the majority of machine learning algorithms require numerical data as input.

In this part, we introduce the Bag of Words (BoW) concept, a term for problems that involve processing a "bag of words", that is, a collection of text data. The fundamental idea of BoW is to count the occurrences of each word in a given body of text. The order in which the words appear is irrelevant under the BoW concept, which treats each word separately.

Using a technique we will cover later, we can turn a collection of documents into a matrix in which each document is a row, each word or token is a column, and each (row, column) value is the frequency with which that word or token appears in that document.
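
The document-by-word matrix described above can be built with the standard library alone. This is a simplified sketch: the three sample messages, the tokenizer (lowercase, punctuation dropped, split on whitespace), and the function names are assumptions for illustration, and CountVectorizer() may tokenize differently.

```python
from collections import Counter

def tokenize(text):
    """Lowercase the text, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()

def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix

documents = ["Win a FREE prize now!", "Are you free now?", "Call now, win cash"]
vocabulary, matrix = bag_of_words(documents)

print(vocabulary)
for row in matrix:
    print(row)
```

Each row has one entry per vocabulary word, so word order inside a message is discarded and only the counts remain.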

Step 4: training_and_testing

Now that we know how to handle the Bag of Words problem, we can return to our dataset and continue our analysis. So that we can test our model later, we first divide our dataset into a training set and a testing set.

After dividing the data, our next goal is to carry out Step 3's procedure: convert our data into the desired bag-of-words matrix format. As before, we will use CountVectorizer() to accomplish this. Here, there are two steps to think about: fitting the training data and returning its matrix, and then transforming the testing data into a matrix.

We will be using the data from X_test, which has been transformed into a matrix, to make predictions about the "sms message" column. Then, in a subsequent step, we will compare those predictions with the actual labels.
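
The split and the two vectorizing steps can be sketched as follows. This stand-in uses only the standard library instead of scikit-learn, and the helper names, the eight sample messages, the 25% test fraction, and the seed are all assumptions made for the example.

```python
import random

def train_test_split(messages, labels, test_fraction=0.25, seed=1):
    """Shuffle the indices and hold out a fraction of the data for testing."""
    indices = list(range(len(messages)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [messages[i] for i in train_idx]
    X_test = [messages[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

def fit_vocabulary(train_messages):
    """Learn the vocabulary from the training messages only."""
    return sorted({word for msg in train_messages for word in msg.lower().split()})

def transform(messages, vocabulary):
    """Count vocabulary words in each message; words outside the vocabulary are ignored."""
    return [[msg.lower().split().count(word) for word in vocabulary] for msg in messages]

messages = ["win cash now", "see you at lunch", "free prize win", "call me later",
            "cash prize free", "lunch at noon", "win free cash", "are you home"]
labels = ["spam", "ham", "spam", "ham", "spam", "ham", "spam", "ham"]

X_train, X_test, y_train, y_test = train_test_split(messages, labels)
vocabulary = fit_vocabulary(X_train)           # step 1: fit on the training data
train_matrix = transform(X_train, vocabulary)
test_matrix = transform(X_test, vocabulary)    # step 2: transform the testing data
print(len(X_train), len(X_test), len(vocabulary))
```

Learning the vocabulary from the training messages alone keeps the test set unseen until prediction time.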
Step 5: Implementing NB ML algorithm

We will use the Naive Bayes technique to produce predictions on our dataset for SMS spam detection.

In particular, we will apply the multinomial Naive Bayes implementation. This classifier is appropriate for categorising data with discrete features. It accepts word counts in the form of integers as input.
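
To show what a multinomial Naive Bayes classifier does with integer word counts, here is a from-scratch sketch with Laplace (add-one) smoothing. It is not the library implementation used in the project; the class name, the four training messages, and the whitespace tokenization are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for message, label in zip(messages, labels):
            self.word_counts[label].update(message.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, message):
        total_messages = sum(self.class_counts.values())
        scores = {}
        for label, n_messages in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + self.alpha * len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods
            score = math.log(n_messages / total_messages)
            for word in message.lower().split():
                if word in self.vocabulary:  # words never seen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)

train_messages = ["win free cash prize", "free prize call now",
                  "see you at lunch", "are you coming home"]
train_labels = ["spam", "spam", "ham", "ham"]

model = MultinomialNaiveBayes().fit(train_messages, train_labels)
print(model.predict("free cash now"), model.predict("see you at home"))
```

Working with log probabilities avoids numerical underflow when many small word probabilities are multiplied together.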

Step 6: Evaluate model

Now that we have made predictions on our test set, our next goal is to evaluate how well our model is performing. There are several ways to do this, so let's first quickly go over them.

Accuracy measures how frequently the classifier makes the correct prediction. It is the ratio of correct predictions to the total number of predictions.

Precision tells us what proportion of the messages we classified as spam actually were spam. It is the ratio of true positives (messages flagged as spam that are actually spam) to all positives (all messages flagged as spam, regardless of whether that classification was correct).

Recall (sensitivity) tells us what proportion of the messages that actually were spam we classified as spam. It is the ratio of true positives (messages flagged as spam that are actually spam) to all messages that were actually spam.

For classification problems where the class distribution is skewed, for example if we had 100 text messages of which only two were spam and the other 98 were not, accuracy by itself is not a very good metric: a classifier that labels every message as ham would be 98% accurate while catching no spam at all. In such cases, precision and recall are more informative.
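
The three metrics can be computed directly from the predictions, using the 100-message skewed example above. The function name, the two hypothetical classifiers, and the convention of reporting 0.0 when a ratio has an empty denominator are assumptions made for this sketch.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Convention assumed here: report 0.0 when a ratio has an empty denominator
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# 100 messages: 2 spam followed by 98 ham
y_true = ["spam"] * 2 + ["ham"] * 98

# Classifier A labels everything as ham
always_ham = evaluate(y_true, ["ham"] * 100)

# Classifier B catches one spam, misses one, and wrongly flags one ham
mixed = evaluate(y_true, ["spam", "ham", "spam"] + ["ham"] * 97)

print(always_ham)
print(mixed)
```

Both classifiers reach the same accuracy, but only precision and recall reveal that the first one never catches any spam.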
5.5 AdaBoost

Adaptive boosting (AdaBoost) is a boosting ensemble method that builds classifiers one at a time, refining each one to account for examples that were misclassified by prior classifiers. Even if the classifiers employed are only slightly better than random guessing, the final model will be improved. This ensemble strategy can also be used in combination with other methods.

Weights are assigned to the training samples at each AdaBoost iteration. These weights are distributed uniformly before the first iteration. After each iteration, the weights are increased for samples that the current model classified wrongly and decreased for samples that it classified correctly. This means that the new predictor concentrates on the weaknesses of the previous classifiers.
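
One reweighting step can be sketched as below. It follows the classic discrete AdaBoost update, which is an assumption: the library implementation used in the project may use a different variant. The five-sample example and the function name are hypothetical.

```python
import math

def adaboost_reweight(weights, misclassified):
    """One AdaBoost round: raise weights of misclassified samples, lower the rest."""
    # Weighted error of the current weak classifier (must lie strictly between 0 and 1)
    error = sum(w for w, miss in zip(weights, misclassified) if miss)
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [
        w * math.exp(alpha if miss else -alpha)
        for w, miss in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated], alpha

weights = [1 / 5] * 5                                # uniform before the first iteration
misclassified = [False, False, True, False, False]   # the weak classifier got sample 2 wrong

new_weights, alpha = adaboost_reweight(weights, misclassified)
print(new_weights, alpha)
```

After this round the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.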

Model                         SC (%)   BH (%)   Accuracy (%)
NB                            95.35    0.52     97.73
SVM                           93.36    0.63     89.63
KNN                           83.87    0.32     97.56
Random forest                 91.62    0.63     97.52
AdaBoost with decision tree   93.21    0.45     98.56

We investigated implementing AdaBoost with decision trees using the scikit-learn module. In the simulation with 12 estimators, the total error rate is 3.1%, the SC is 86.7%, and the BH is 0.64%. When the number of estimators is increased to 90, these figures change to 3.41%, 93.6%, and 0.71% respectively. As with the previous methods, the naïve Bayes algorithm performs better than AdaBoost with decision trees, even though AdaBoost is significantly more sophisticated.

5.6 NLTK

NLTK is a leading platform for building Python programs that work with human language data. It offers straightforward interfaces to more than 50 corpora and lexical resources, including WordNet, as well as a collection of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for powerful NLP libraries, and an active discussion forum.

Thanks to a hands-on guide covering programming fundamentals alongside topics in computational linguistics, plus full API documentation, NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike. NLTK is available for Windows, Mac OS X, and Linux. Best of all, the project is community-driven, free, and open source.

Fig 3
6. Python code ScreenShot
7. Result ScreenShot
8. Conclusion

The results of the various classification models run on the SMS Spam dataset have been presented. According to the simulation results, the best classifiers for SMS spam detection are SVM with a linear kernel and multinomial naive Bayes with Laplace smoothing. In the original research that used this dataset, the SVM-based classifier had the highest overall accuracy (92.64%), making it the best one. With an overall accuracy of 92.60%, enhanced naive Bayes is the next best classifier in that research. Compared to the outcome of that earlier research, our classifier cuts the overall error in half. The factors behind this improvement include the addition of significant features such as the number of characters in a message, the addition of specific thresholds for the length, and the analysis of learning curves and misclassified data.

The ability of Naive Bayes to handle an exceptionally high number of features is one of its key benefits over other classification methods. In our case, there are hundreds of distinct words, and each of them is treated as a feature. It also performs well even when irrelevant features are present and is mostly unaffected by them. Its relative simplicity is another key benefit: it rarely needs parameter tuning, except in situations where the data distribution is known. It also rarely overfits the data. Another key benefit is how quickly the model trains and predicts given the volume of data it can handle. Overall, the Naive Bayes algorithm really is a treasure.

