“TWITTER SENTIMENT ANALYSIS”
NLP Project
BACHELOR OF
ENGINEERING IN
COMPUTER ENGINEERING
Submitted By
Naman Bhalani (02)
Sachin Prajapati (31)
Dhrumil Upadhyay (53)
Shrey Varma (56)
University of Mumbai
(AY 2024-25)
TABLE OF CONTENTS
1. Introduction
2. Literature Review
3. Implementation
4. Resources
5. Emoticons
6. Unicode
7. Case
8. Targets
9. Negation
10. Sequence of repeated characters
11. Machine learning
12. Naive Bayes
13. Baseline
14. Improvements
15. Conclusion
16. References
Introduction
Sentiment analysis deals with identifying and classifying opinions or sentiments expressed in source
text. Social media generates a vast amount of sentiment-rich data in the form of tweets, status updates,
blog posts, etc. Sentiment analysis of this user-generated data is very useful for knowing the opinion of
the crowd. Twitter sentiment analysis is difficult compared to general sentiment analysis due to the
presence of slang words and misspellings. The maximum number of characters allowed in a tweet
is 140. The knowledge-based approach and the machine learning approach are the two strategies used for
analyzing sentiments in text. In this project, we analyze Twitter posts about electronic
products like mobiles, laptops, etc. using the machine learning approach. By doing sentiment analysis in a
specific domain, it is possible to identify the effect of domain information on sentiment classification.
We present a new feature vector for classifying tweets as positive or negative and for extracting people's
opinions about products. We chose to classify tweets from Twitter into “positive” or
“negative” sentiment by building a model based on probabilities. Twitter is a microblogging website
where people can share their feelings quickly and spontaneously by sending tweets limited to 140
characters. You can directly address a tweet to someone by adding the target sign “@” or
participate in a topic by adding a hashtag “#” to your tweet. Because of the way Twitter is used, it is
a perfect source of data for determining the current overall opinion about anything.
Implementation
To gather the data, many options are possible. In some previous research papers, the authors built a
program to automatically collect a corpus of tweets based on two classes, “positive” and
“negative”, by querying Twitter with two types of emoticons:
● Happy emoticons, such as “:)”, “:P”, “: )”, etc.
● Sad emoticons, such as “:(“, “:’(”, “=(“.
Others build their own dataset of tweets by collecting and annotating them manually, which is very
long and tedious.
In addition to finding a way of getting a corpus of tweets, we need to take care to have a balanced data
set, meaning we should have an equal number of positive and negative tweets, but it also needs to
be large enough. Indeed, the more data we have, the better we can train our classifier and the higher
its accuracy will be.
After much research, we found a dataset of 1,578,612 tweets in English coming from two sources:
Kaggle and Sentiment140. It is composed of four columns: ItemID, Sentiment,
SentimentSource and SentimentText. We are only interested in the Sentiment column,
corresponding to our label class and taking a binary value (0 if the tweet is negative, 1 if the tweet is
positive), and the SentimentText column, containing the tweets in raw format.
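As a minimal sketch of this step (the file name and encoding are assumptions, not given in the report), the corpus can be loaded with pandas:

```python
import pandas as pd

# Load the corpus; 'dataset.csv' and the encoding are assumptions,
# adapt them to wherever the Kaggle/Sentiment140 file is stored.
data = pd.read_csv('dataset.csv', encoding='ISO-8859-1')

# Keep only the label and the raw tweet text.
data = data[['Sentiment', 'SentimentText']]
print(data.head(10))  # the first ten posts, as in Table 1 below
```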
Table 1. Example of Twitter posts annotated with their corresponding sentiment: 0 if negative, 1
if positive.
Table 1, showing the first ten Twitter posts, already reveals some particularities
and difficulties that we are going to encounter during the pre-processing steps:
● The presence of acronyms, such as "bf" or, more complicated, "APL". Does it mean apple? Apple (the
company)? In this context "friend" follows, so we could think it refers to a
smartphone and thus Apple, but what if the word "friend" were not there?
● The presence of sequences of repeated characters such as
"Juuuuuuuuuuuuuuuuussssst" or "hmmmm". In general, when we repeat
several characters in a word, it is to emphasize it, to increase its impact.
● The presence of emoticons, ":O", "T_T", ": |" and many more, which give insights about the
user's mood.
● Spelling mistakes and “urban grammar” like "im gunna" or "mi".
● The presence of nouns such as "TV" or "New Moon".
Furthermore, we can also add:
● People also indicate their moods, emotions and states between two asterisks, such as *cries*,
*hummin*, *sigh*.
● Negation, as in “can't”, “cannot”, “don't”, “haven't”, which we need to handle: in “I don’t
like chocolate”, “like” is negative in this case.
We could also be interested in the grammatical structure of the tweets, or in whether a tweet is
subjective or objective, and so on. As you can see, it is extremely complex to deal with language,
and even more so when we want to analyse text typed by users on the Internet, because people don’t
take care to write grammatically correct sentences and use a ton of acronyms and words
that are more or less English, in our case. We can visualize the dataset a bit more by making a
chart of how many positive and negative tweets it contains.
We have exactly 790,177 positive tweets and 788,435 negative tweets, which signifies that the dataset
is well balanced. There are also no duplicates.
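Both checks can be reproduced directly on the loaded DataFrame; a small sketch assuming the `data` frame from the loading snippet above:

```python
# Class balance: prints about 790177 positive (1) and 788435 negative (0).
print(data['Sentiment'].value_counts())

# Duplicate check: prints 0 if every tweet text is unique.
print(data.duplicated(subset='SentimentText').sum())
```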
Finally, let’s recall the Twitter terminology, since we are going to have to deal with it in the tweets:
● Hashtag: A hashtag is any word or phrase immediately preceded by the # symbol. When you click
on a hashtag, you’ll see other Tweets containing the same keyword or topic.
● @username: A username is how you’re identified on Twitter, and is always preceded immediately
by the @ symbol. For instance, Katy Perry is @katyperry.
● MT: Similar to RT (Retweet), an abbreviation for “Modified Tweet.” Placed before
the Retweeted text when users manually retweet a message with modifications, for example
shortening a Tweet.
● Retweet: RT, a Tweet that you forward to your followers, is known as a Retweet. Often used to
pass along news or other valuable discoveries on Twitter, Retweets always retain the original
attribution.
● Emoticons: Composed using punctuation and letters, they are used to express
emotions concisely, ";) :) ...".
Now that we have the corpus of tweets, we need to use other resources to make the pre-processing
step easier.
Resources
In order to facilitate the pre-processing of the data, we introduce five resources:
● An emoticon dictionary regrouping 132 of the most used emoticons in the western world,
with their sentiment, negative or positive.
● An acronym dictionary of 5465 acronyms with their translation.
● A stop word dictionary corresponding to words which are filtered out before or
after processing of natural language data, because they are not useful in our case.
● Positive and negative word dictionaries giving the polarity (sentiment out of context) of words.
● A dictionary of negative contractions and auxiliaries, which will be used to detect negation in
a given tweet, such as “don’t”, “can’t”, “cannot”, etc.
The introduction of these resources will allow us to make tweets uniform and remove some of their
complexity, with the acronym dictionary for instance, because a lot of acronyms are used in tweets.
The positive and negative word dictionaries could be useful for increasing (or not) the accuracy score
of the classifier. The emoticon dictionary has been built from Wikipedia, with each emoticon
annotated manually. The stop word dictionary contains 635 words such as “the”, “of”, “without”.
Normally they should not be useful for classifying tweets according to their sentiment, but it is
possible that they are.
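The report does not show how these resources are stored, so here is a hypothetical layout of the five resources as Python structures, with a couple of illustrative entries each:

```python
# Sample entries only; the real resources contain 132 emoticons,
# 5465 acronyms and 635 stop words.
emoticons = {':)': '||pos||', ':P': '||pos||', ':(': '||neg||'}
acronyms = {'lol': 'laughing out loud', 'u': 'you', 'im': 'i am'}
stop_words = {'the', 'of', 'without'}
positive_words = {'good', 'love', 'great'}
negative_words = {'bad', 'hate', 'awful'}
negations = {"don't", "can't", 'cannot', 'not', 'no', 'never'}
```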
We also use Python 2.7 (https://www.python.org/), a programming language widely used in
data science, and scikit-learn (http://scikit-learn.org/), a very complete and useful machine
learning library containing every technique and method we need, whose website is also full of
well-explained tutorials. With Python, the libraries NumPy (http://www.numpy.org/) and pandas
(http://pandas.pydata.org/), for manipulating data easily and intuitively, are simply essential.
Pre-processing
Now that we have the corpus of tweets and all the resources that could be useful, we can pre-process
the tweets. This is a very important step, since all the modifications we make during this process
will directly impact the classifier’s performance. Pre-processing includes cleaning, normalization,
transformation, feature extraction and selection, etc. The result of pre-processing is consistent
and uniform data, workable in a way that maximizes the classifier's performance. All of the tweets are
pre-processed by passing through the following steps in the same order.
Emoticons:
We replace all emoticons by their sentiment polarity, ||pos|| or ||neg||, using the emoticon
dictionary. To do the replacement, we pass through each tweet and, using a regex, find
out whether it contains emoticons; if so, they are replaced by their corresponding polarity.
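A minimal sketch of this replacement, assuming the `emoticons` dictionary sketched in the Resources section maps each emoticon to ||pos|| or ||neg||:

```python
import re

# One alternation over all emoticons; longest first, so ":'(" is
# matched before ":(".
emoticon_re = re.compile('|'.join(
    re.escape(e) for e in sorted(emoticons, key=len, reverse=True)))

def replace_emoticons(tweet):
    # Substitute every matched emoticon by its polarity tag.
    return emoticon_re.sub(lambda m: emoticons[m.group()], tweet)

data['SentimentText'] = data['SentimentText'].apply(replace_emoticons)
```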
Unicode:
The data set contains 19469 positive emoticons and 11025 negative emoticons.
Case:
Letter case can appear useless, but in fact it is really important for distinguishing proper
nouns from other kinds of words. Indeed, “General Motor” is not the same thing as “general motor”,
nor “MSc” the same as “msc”. So reducing all letters to lowercase should normally be done wisely. In this
project, for simplicity, we do not take care of that, since we assume it should not impact the
classifier’s performance too much.
Targets:
Targets correspond to Twitter usernames, preceded by the “@” symbol. They are used to address a tweet to
someone or just to grab their attention. We replace all usernames/targets by the tag ||target||. Notice that
the data set contains 735,757 targets.
Table 8. Tweets before processing targets.
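A one-line sketch of the target replacement; the handle pattern (word characters after “@”) is an assumption:

```python
# Replace every @username by the ||target|| tag.
data['SentimentText'] = data['SentimentText'].str.replace(
    r'@\w+', '||target||', regex=True)
```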
Acronyms:
We replace all acronyms with their translation. An acronym is an abbreviation formed from the initial
components of a phrase or a word. Usually these components are individual letters (as in NATO or laser)
or parts of words or names (as in Benelux). Many acronyms are used in our data set of tweets, as you can
see in the following bar chart. At this point, tweets are tokenized by getting rid of the
punctuation and splitting on whitespace, in order to make the process really fast. We could use an NLTK
tokenizer, but it is definitely much slower (though also more accurate).
Figure 3. Top 20 of acronyms in the data set of tweets
As you can see, “lol”, “u”, “im” and “2” are used really often. The table below shows the top 20
acronyms with their translation and their count.
Table 10. Top 20 of acronyms in the data set of tweets with their translation and count
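A sketch of this combined tokenization-and-translation step, reusing the (assumed) `acronyms` dictionary from the Resources section:

```python
import re

def tokenize_and_translate(tweet):
    # Strip punctuation (keeping the | of our tags and apostrophes),
    # then split on whitespace: fast, but rougher than NLTK's tokenizers.
    tokens = re.sub(r"[^\w\s'|]", ' ', tweet).split()
    # Replace known acronyms; a translation may span several words.
    words = []
    for tok in tokens:
        words.extend(acronyms.get(tok.lower(), tok).split())
    return words

data['Tokens'] = data['SentimentText'].apply(tokenize_and_translate)
```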
Negation:
We replace all negation words such as “not”, “no”, “never” by the tag ||not||, using the negation dictionary,
in order to handle sentences like "I don't like it". Here "like" should not be considered as
positive because of the "don't" before it. To do so, we replace "don't" by ||not||, and the word "like" will
not be counted as positive. We should say that each time a negation is encountered, the polarity of the
words following the negation word that are contained in the positive and negative word dictionaries will be
reversed: positive becomes negative, negative becomes positive. We will do this when we try to find
positive and negative words.
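A sketch of the tag substitution on the token lists, assuming the `negations` set from the Resources section:

```python
# Replace each negation word by the ||not|| tag.
def replace_negations(tokens):
    return ['||not||' if tok.lower() in negations else tok for tok in tokens]

data['Tokens'] = data['Tokens'].apply(replace_negations)
```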
Sequence of repeated characters:
Now, we replace all sequences of repeated characters by two characters (e.g. "helloooo" becomes "helloo")
to keep the emphasized usage of the word.
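A regex sketch of this squeeze: any run of three or more identical characters is reduced to two:

```python
import re

# (.)\1{2,} matches a character followed by two or more copies of itself.
repeat_re = re.compile(r'(.)\1{2,}')

def squeeze_repeats(tokens):
    # e.g. "helloooo" -> "helloo", "Juuuuussst" -> "Juusst"
    return [repeat_re.sub(r'\1\1', tok) for tok in tokens]

data['Tokens'] = data['Tokens'].apply(squeeze_repeats)
```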
Machine learning
Once we have applied the different steps of the pre-processing part, we can now focus on the machine
learning part. There are three major models used in sentiment analysis to classify a sentence as positive
or negative: SVM, Naive Bayes and Language Models (N-gram). SVM is known to be the model giving
the best results, but in this project we focus only on the probabilistic models, Naive Bayes and
Language Models, which have been widely used in this field. Let’s first introduce the Naive Bayes model,
which is well known for its simplicity and efficiency in text classification.
Naive Bayes
In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on
applying Bayes' theorem with strong (naive) independence assumptions between the features. Naive
Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables
(features/predictors) in a learning problem. Maximum likelihood training can be done by evaluating a
closed-form expression (a mathematical expression that can be evaluated in a finite number of operations),
which takes linear time. The classifier is based on the application of Bayes' rule, given by the following
formula:

$P(C = c \mid D = d) = \dfrac{P(D = d \mid C = c)\, P(C = c)}{P(D = d)}$

where $D$ denotes the document and $C$ the category (label), $d$ and $c$ are instances of $D$ and $C$,
and $P(D = d) = \sum_{c} P(D = d \mid C = c)\, P(C = c)$.
There are three variants of the Naive Bayes classifier:
● The Multi-variate Bernoulli Model: also called the binomial model, useful if our feature vectors are
binary (e.g. 0s and 1s). An application can be text classification with the bag-of-words model, where the
0s and 1s are "word does not occur in the document" and "word occurs in the document" respectively.
● The Multinomial Model: typically used for discrete counts. In text classification, we extend the
Bernoulli model further by counting the number of times a word $w_i$ appears over the number of words,
rather than saying 0 or 1 if the word occurs or not.
● The Gaussian Model: we assume that features follow a normal distribution. Instead of discrete counts,
we have continuous features.
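For reference, all three variants are available in scikit-learn (class names as in the modern API):

```python
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

bernoulli = BernoulliNB()      # binary word-occurrence features
multinomial = MultinomialNB()  # word-count features, used in this project
gaussian = GaussianNB()        # continuous, normally distributed features
```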
Baseline
In every machine learning task, it is always good to have what we call a baseline: often a “quick and
dirty” implementation of a basic model, used to do a first classification, whose accuracy we then try to
improve on. We use Multinomial Naive Bayes as the learning algorithm, with Laplace smoothing,
which represents the classic way of doing text classification. Since we need to extract features from our
data set of tweets, we use the bag-of-words model to represent them. The bag-of-words model is a
simplifying representation of a document in which it is represented as a bag of its words, without taking
the grammar or word order into consideration. In text classification, the count (number of times) each
word appears in a document is used as a feature for training the classifier.

Firstly, we divide the data set into two parts, the training set and the test set. To do this, we first shuffle
the data set to get rid of any ordering applied to the data; then, from the set of positive tweets and the set
of negative tweets, we take 3/4 of the tweets from each set and merge them together to make the training
set. The rest is used to make the test set. The size of the training set is thus 1,183,958 tweets and the test
set 394,654 tweets. Notice that they are balanced and follow the same distribution as the initial data set.

Once the training set and the test set are created, we actually need a third set of data, called the
validation set. It is really useful because it is used to validate our model against unseen data and to tune
the possible parameters of the learning algorithm, for example to avoid underfitting and overfitting. We
need this validation set because our test set should be used only to verify how well the model will
generalize. If we used the test set rather than the validation set, our model could be overly optimistic and
twist the results. To make the validation set, there are two main options:
● Split the training set into two parts (80%, 20%) with a ratio of 2:8, where each part contains an equal
distribution of example types. We train the classifier with the larger part, and make predictions with the
smaller one to validate the model. This technique works well, but has the disadvantage that our classifier
does not get trained and validated on all the examples in the data set (without counting the test set).
● K-fold cross-validation: we split the data set into k parts, hold out one, combine the others and train on
them, then validate against the held-out portion. We repeat that process k times (each fold), holding out a
different portion each time, then average the scores measured for each fold to get a more accurate
estimation of our model's performance.

We split the training data into 10 folds and cross-validate on them using scikit-learn. The number of
folds is arbitrary; it is usually set to 10, but this is not a rule. In fact, determining the best k is still an
unsolved problem: with a lower k, cross-validation is computationally cheaper, with less variance but
more bias; with a large k, it is computationally expensive, with higher variance but lower bias. We can
now train the naive Bayes classifier with the training set, validate it using the held-out part of the data
taken from the training set (the validation set), repeat this 10 times and average the results to get the
final accuracy, which is about 0.77, as shown in the results below.
Figure 7. Result of the naive Bayes classifier, with the score representing the average of the results of
the 10 cross-validation folds, and the overall confusion matrix.
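A minimal sketch of the whole baseline, written against the modern scikit-learn API (the report used Python 2.7, where the same helpers lived under `sklearn.cross_validation`); the joined token lists stand in for the pre-processed tweets:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB

# Bag of words: each tweet becomes a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(data['Tokens'].str.join(' '))
y = data['Sentiment']

# Shuffle, then hold out 1/4 of the tweets as the test set; stratify
# keeps the positive/negative balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=True, stratify=y)

# Multinomial Naive Bayes with Laplace smoothing (alpha=1.0), scored
# with 10-fold cross-validation on the training set.
clf = MultinomialNB(alpha=1.0)
scores = cross_val_score(clf, X_train, y_train, cv=10)
print(scores.mean())  # about 0.77 for this baseline
```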
Improvements
From the baseline, the goal is to improve the accuracy of the classifier, which is 0.77, in order to
better determine which tweets are positive and which are negative. There are several ways of doing this,
and we present only a few possible improvements (or not). First, we could try to remove what we call
stop words. Stop words usually refer to the most common words in the English language (in our case),
such as "the", "of", “to” and so on. They do not indicate any valuable information about the sentiment
of a sentence, and it can be necessary to remove them from the tweets in order to keep only the words
in which we are interested. To do this we use the list of 635 stop words that we found. In the table below,
you can see the most frequent words in the data set with their counts.
Table 13. Most frequent words in the data set with their corresponding count.
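Stop-word removal can be folded into the vectorizer; a sketch assuming the 635-word `stop_words` list from the Resources section:

```python
from sklearn.feature_extraction.text import CountVectorizer

# CountVectorizer drops the given stop words while building the vocabulary.
vectorizer = CountVectorizer(stop_words=list(stop_words))
X = vectorizer.fit_transform(data['Tokens'].str.join(' '))
```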
We could also try to stem the words in the data set. Stemming is the process by which endings are
removed from words in order to remove things like tense or plurality. The stem form of a word may
not exist in a dictionary (unlike with lemmatization). This technique allows us to unify words and
reduce the dimensionality of the dataset. It is not appropriate for all cases, but can make it easier to
connect tenses together to see if you are covering the same subject matter. It is also faster than
lemmatization (which removes inflectional endings only and returns the base or dictionary form of a word,
known as the lemma). Using NLTK, a Python library specialized in
natural language processing, we get the following results after stemming the words in the data set.
We actually lose 0.002 in accuracy score compared to the results of the baseline. We conclude that
stemming the words does not improve the classifier’s accuracy and actually does not make any sensible
change.
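A sketch of the stemming step; the report only says NLTK was used, so the choice of PorterStemmer here is an assumption:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()  # an assumption: any NLTK stemmer fits here

def stem_tokens(tokens):
    # "waited", "waiting" -> "wait"; the stem may not be a dictionary word.
    return [stemmer.stem(tok) for tok in tokens]

data['Tokens'] = data['Tokens'].apply(stem_tokens)
```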
Let’s introduce language models to see if we can get better results than those of our baseline. Language
models are models assigning probabilities to sequences of words. They were initially used extensively in
speech recognition and spelling correction, but it turns out that they also give good results in text
classification.
An important note is that n-gram classifiers are in fact a generalization of Naive Bayes: a unigram
classifier with Laplace smoothing corresponds exactly to the traditional naive Bayes classifier. Since we
use the bag-of-words model, meaning we translate the sentence "I don't like chocolate" into "I", "don't",
"like", "chocolate", we could try to use a bigram model to take care of negation, with "don't like" for this
example. Using bigrams as features in the classifier, we get the following results.
Figure. Results of the naive Bayes classifier with bigram features.
Using only bigram features, we slightly improve our accuracy score, by about 0.01. Based on that,
we can think that adding unigrams and bigrams together could increase the accuracy score even more.

Figure. Results of the naive Bayes classifier with unigram and bigram features.

And indeed, we slightly increase the accuracy score, by about 0.02 compared to the baseline.
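With scikit-learn, switching the feature set is a one-parameter change to the vectorizer; a sketch of both configurations:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Bigrams only: ngram_range=(2, 2) pairs consecutive tokens, so
# "don't like" becomes a single feature (about +0.01 over the baseline).
bigram_vec = CountVectorizer(ngram_range=(2, 2))

# Unigrams and bigrams together (about +0.02 over the baseline).
uni_bigram_vec = CountVectorizer(ngram_range=(1, 2))
```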
Conclusion
Nowadays, sentiment analysis, or opinion mining, is a hot topic in machine learning. We are still
far from detecting the sentiment of a corpus of texts very accurately, because of the complexity of the
English language, and even more so if we consider other languages such as Chinese.

In this project we tried to show a basic way of classifying tweets into positive or negative
categories, using Naive Bayes as a baseline, and how language models, which are related to Naive Bayes,
can produce better results. We could further improve our classifier by trying to extract more
features from the tweets, trying different kinds of features, tuning the parameters of the naive
Bayes classifier, or trying another classifier altogether.
References
[1] Alexander Pak, Patrick Paroubek. 2010. Twitter as a Corpus for Sentiment Analysis and
Opinion Mining.
[2] Alec Go, Richa Bhayani, Lei Huang. Twitter Sentiment Classification using Distant
Supervision.
[3] Jin Bai, Jian-Yun Nie. Using Language Models for Text Classification.
[4] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, Rebecca Passonneau. Sentiment
Analysis of Twitter Data.
[5] Fuchun Peng. 2003. Augmenting Naive Bayes Classifiers with Statistical Language Models.