REPORT
ON
SMS SPAM DETECTION USING MACHINE LEARNING
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION
and we were able to understand many things. Thank you, Nidhi Lakhera ma'am, for your guidance and support; words cannot express it fully, but our hearts are still full of gratitude for all the kindness shown to us.
Saurabh Sharma,
Divyansh Kataria
B. Tech, 8th Sem, ECE
Chapter Name
1. Abstract
2. Introduction
3. Requirements
4. Flow Chart
5. Introduction to the Machine Learning Algorithms
Fig 1.1
Chat technology is simply one aspect of SMS. SMS technology was made possible by an accepted international standard. Spam is the term for the abuse of electronic messaging services to send large numbers of unwanted messages to anybody. Even though email spam is the most well-known example, similar offences in other media are frequently referred to as "spam" as well. In this sense, SMS spam is typically unsolicited bulk communication that carries some commercial interest and is quite similar to email spam.

Phishing URLs and business promotions are spread via SMS spam. Commercial spammers use malware to transmit SMS messages, since most countries outlaw the practice; because it is challenging to pinpoint the origin of spam sent from a compromised computer, spammers take on less risk this way. Only letters, numbers, and a few symbols are permitted in SMS messages. A brief glance at the messages reveals a clear pattern: almost all spam messages direct users to call a phone number or visit a website. A simple SQL query on the spam collection yields results that reveal this trend. Because of the low cost and large bandwidth of the SMS network, SMS spam is widely used.
Fig 1.2
There are several notable differences between emails and text messages. Contrary to email spam filtering, which can draw on a range of sizable datasets, actual databases of SMS spam are quite scarce. The number of features that can be utilised to classify text messages is also considerably smaller than for emails, due to the shorter length of text messages; there is also no header in this case. In addition, text messages use significantly less formal language than emails do and are chock-full of acronyms. All of these elements could lead to a significant decline in the effectiveness of the most important spam-filtering algorithms when they are applied to short text messages.
In this report, we'll examine the spam-or-ham classifier from the standpoint of machine learning concepts and experiment with several classification algorithms, comparing them on the basis of performance criteria, in a web-based Python environment.
HDD: at least 100 GB

Software specifications
5.1 KNN
K-Nearest Neighbour (KNN) is a straightforward instance-based learning technique that can be used to solve classification challenges. According to this method, a test sample's label is predicted using the majority vote of its k nearest neighbours.
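As a minimal sketch of this idea in scikit-learn (the toy messages and variable names below are illustrative, not taken from the project code):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    # Tiny made-up corpus for illustration (not the project dataset)
    messages = ["win a free prize now", "call now to claim cash",
                "are we meeting today", "see you at lunch"]
    labels = ["spam", "spam", "ham", "ham"]

    # KNN needs numeric input, so the texts are vectorised first
    vec = CountVectorizer()
    X = vec.fit_transform(messages)

    # The test message gets the majority label of its k = 3 nearest neighbours
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, labels)
    print(knn.predict(vec.transform(["free cash prize now"])))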
5.2 SVM
A support vector machine (SVM) is applied to the dataset. The results with various kernels are shown in Table II for a 10-fold cross-validation. The table demonstrates that the linear kernel outperforms the alternative mappings in terms of performance. The error rate decreases as the degree of the polynomial kernel is raised from two to three, but it does not decrease as the degree is raised higher. Here, the dataset is also subjected to another kernel called the radial basis function (RBF). The RBF kernel for two samples x1 and x2 is given by

    K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))
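A sketch of this kernel comparison, assuming scikit-learn's SVC and cross_val_score; synthetic stand-in data is generated here, whereas the real experiment would use the vectorised SMS feature matrix:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Synthetic stand-in data; the real experiment uses the SMS feature matrix
    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # 10-fold cross-validation for each kernel, mirroring the Table II setup
    for kernel in ["linear", "poly", "rbf"]:
        clf = SVC(kernel=kernel, degree=3, gamma="scale")
        scores = cross_val_score(clf, X, y, cv=10)
        print(f"{kernel}: mean accuracy = {scores.mean():.3f}")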
5.3 Naive Bayes
Bayes' theorem, one of the first probabilistic algorithms, was created by Reverend Thomas Bayes (and used, no less, to try to infer the existence of God); it still works incredibly well in some situations. To understand this theorem, an example helps. Consider yourself a Secret Service agent tasked with protecting the democratic presidential candidate as he or she delivers a campaign speech. Your task is challenging, and you must always be on guard for threats, because it is a public event that is open to everyone. Consequently, a reasonable place to start is by giving each person a distinct threat level. Based on a person's physical characteristics, such as their age and sex, and other minor details, like whether or not they are carrying a bag or seem tense, you can estimate whether they pose a threat.
If a person checks enough of the boxes that your threshold of doubt is crossed, you can have them removed from the area. Naive Bayes works similarly: we determine the probability of an event (a person posing a threat) based on the probabilities of numerous individual features. The independence of these features from one another is something to take into account. For instance, if a child exhibits signs of anxiety throughout the event, the likelihood that they pose a threat is lower than, say, if it were a large man showing the same signs. To clarify, age AND anxiousness are the two characteristics we are taking into account here. If we examined each of these characteristics separately, we might create a model that marks EVERYONE who exhibits anxiety as a possible threat. But given the likelihood that any children present at the event will be anxious, we would get a lot of false positives.
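A small numeric illustration of how the theorem combines such evidence (all numbers below are made up purely for the example):

    # Hypothetical numbers, purely illustrative. Bayes' theorem:
    # P(threat | anxious) = P(anxious | threat) * P(threat) / P(anxious)
    p_threat = 0.01             # prior: 1% of attendees pose a threat
    p_anx_given_threat = 0.90   # a threat is very likely to look anxious
    p_anx_given_safe = 0.20     # many harmless people (children!) look anxious too

    p_anxious = (p_anx_given_threat * p_threat
                 + p_anx_given_safe * (1 - p_threat))
    print(p_anx_given_threat * p_threat / p_anxious)  # ~0.043

Even with these generous assumptions, anxiety alone raises the threat probability to only about 4%, which is exactly why a model using anxiousness by itself would flood us with false positives.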
Step 3: bag_of_words
We have a substantial set of text data (5,580 rows). Emails and other messages usually contain a lot of text, yet the majority of machine learning algorithms require numerical data as input.

In this part, we introduce the notion of Bag of Words (BOW), a term for the problem of processing a single text or a collection of texts. BOW's fundamental concept is to count the occurrences of each word within a given body of text. The order in which the words appear is irrelevant according to the BOW notion, which analyses each word separately.

Using a technique we'll cover shortly, we can turn a group of documents into a matrix, where each document is a row, each word or token is a column, and the value in each cell is the frequency with which that word or token appears in that document.
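A minimal sketch of this conversion, assuming scikit-learn's CountVectorizer (the four documents here are made up for illustration):

    from sklearn.feature_extraction.text import CountVectorizer

    # Small made-up document set, not the project dataset
    docs = ["Hello, how are you!",
            "Win money, win from home.",
            "Call me now.",
            "Hello, call hello you tomorrow?"]

    vec = CountVectorizer()              # lowercases and tokenises by default
    matrix = vec.fit_transform(docs)     # rows = documents, columns = words

    print(vec.get_feature_names_out())   # the learned vocabulary
    print(matrix.toarray())              # word counts per document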
Step 4: training_and_testing
Now that we know how to handle the Bag of Words problem, we can return to our dataset and continue our analysis. So that we can test our model later, we first divide our dataset into a training set and a testing set.

After dividing the data, our next goal is to carry out the procedures from Step 3: convert our data into the desired bag-of-words matrix format. As before, we will use CountVectorizer() to accomplish this. Here, there are two steps to think about: first, we fit the vectoriser to the training data and transform it into a matrix; then, in a subsequent step, we transform the test data using the same vocabulary, so that the resulting X_test matrix can be used to make predictions on the "sms_message" column.
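A sketch of both steps, assuming the data sits in a pandas DataFrame with "sms_message" and "label" columns (a tiny stand-in DataFrame is built inline here; the real project loads the full dataset):

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer

    # Stand-in DataFrame; the real project loads ~5,580 labelled messages
    df = pd.DataFrame({
        "sms_message": ["win a free prize", "see you at lunch",
                        "claim cash now", "meeting at five"],
        "label": [1, 0, 1, 0],  # 1 = spam, 0 = ham
    })

    X_train, X_test, y_train, y_test = train_test_split(
        df["sms_message"], df["label"], random_state=1)

    vec = CountVectorizer()
    training_data = vec.fit_transform(X_train)  # learn vocabulary on training data only
    testing_data = vec.transform(X_test)        # reuse that vocabulary for the test set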
Step 5: Implementing the NB ML algorithm
We'll utilise the Naive Bayes technique to produce predictions on our dataset for SMS spam detection.

Note that ours is a classification problem with skewed class distributions: for example, if we had 100 text messages and only two of them were spam while the other 98 were not, the spam class would be heavily under-represented.
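Continuing the Step 4 sketch, a minimal implementation assuming MultinomialNB (the Naive Bayes variant commonly used with word counts):

    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # training_data / testing_data are the CountVectorizer matrices from
    # Step 4; y_train / y_test are the 1 = spam / 0 = ham labels
    naive_bayes = MultinomialNB()
    naive_bayes.fit(training_data, y_train)
    predictions = naive_bayes.predict(testing_data)

    # With skewed classes (e.g. 2 spam out of 100), accuracy alone is
    # misleading, so precision and recall are worth reporting as well
    print(accuracy_score(y_test, predictions))
    print(precision_score(y_test, predictions, zero_division=0))
    print(recall_score(y_test, predictions, zero_division=0))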
5.5: Adaboost
AdaBoost trains classifiers one at a time, refining each one to account for the examples that were misidentified by prior classifiers. Even if the individual classifiers are only moderately better than random guessing, the final combined model is improved. It is an ensemble strategy that combines several weak classifiers into a stronger one.

Certain weights are assigned to the training samples at each AdaBoost iteration. These weights are distributed uniformly prior to the first iteration. After each iteration, the weights of samples that were wrongly classified are increased and the weights of samples that were correctly classified are decreased. This means that each new predictor concentrates on the examples the previous ones got wrong.
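A minimal sketch with scikit-learn, continuing the same training_data / testing_data variables from the earlier steps:

    from sklearn.ensemble import AdaBoostClassifier

    # By default AdaBoost uses depth-1 decision trees ("stumps") as weak
    # learners; each boosting round re-weights the training samples so the
    # next stump focuses on previously misclassified examples
    ada = AdaBoostClassifier(n_estimators=50, random_state=1)
    ada.fit(training_data, y_train)
    print(ada.score(testing_data, y_test))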
5.6 NLTK
Fig 3
6. Python Code Screenshot
7. Result Screenshot
8. Conclusion