Research on Sentiment Analysis
Sanjay Singla Prashant Ahluwalia Abhay Choudhary
Chandigarh University Chandigarh University Chandigarh University
Abstract: Social media is now used on a massive scale, and people routinely express their emotions online through social media apps. Many institutes, companies, and businesses want to collect reviews of their products online. The people and organizations that understand the emotions of others with empathy and act on them are the ones that serve their customers best and capture the largest share of the market. Emotions are always a driving factor in the success of any company. If an organization correctly assesses people's emotions, acts on them, and shapes its products accordingly, its business is likely to flourish.

The same applies to restaurants and food apps. If a food app or restaurant understands what people say in reviews of its food products, it can act accordingly: adding new dishes and cutting those that are not wanted will increase its profits and its share of the market.

Political parties all over the globe also use sentiment analysis to learn how people feel about their work. There have even been many instances of data leaks, which suggests that large organizations today are hungry for data. Data is the new oil.

From such data we can learn how others feel about a particular thing. It is the same as knowing what someone feels about you: if you know, you can improve yourself, but if you cannot find out, improving becomes very challenging.

If you can make the end consumer's life easy and happy, you can become the best in your business domain, and sentiment analysis is, in my opinion, a very good tool for finding out whether the end customer is happy with the product.

I. INTRODUCTION

We will carry out an NLP project from start to finish. NLP is used here to identify the emotions behind a text. We will start with the traditional approach using NLTK and then move on to a more complex model, the RoBERTa model from Hugging Face. We will analyze how these models perform, and we will also explore pre-trained pipelines that make sentiment analysis easier.

Sentiment analysis is used everywhere nowadays, and data science coupled with AI is the technology of the 21st century.

The objective is to study how reliable the sentiment analysis tools available in the 21st century are in the real world.

II. LITERATURE REVIEW

Many research papers on sentiment analysis have been published in the past, in which the authors discuss topics such as text preparation, sentiment documentation, and sentiment classification.

These were crucial to my understanding of sentiment analysis. Further study of the lexical and machine learning approaches, and of how they differ, helped me understand the basic differences between the two models compared here.

III. PROBLEM STATEMENT

Businesses all over the globe face many challenges nowadays. One such challenge is knowing how the end consumer really feels about the product. If this can be learned at the right time, the required adjustments can be made. This can be a lifesaver for the business; the alternative
is the complete destruction of the business as it falls into irrelevance.

The emotion reported by a traditional sentiment analysis model may be inaccurate, because every person is a unique individual and people sometimes use sarcasm. To address this problem, we first apply the simple VADER sentiment analysis model from NLTK, then the more accurate RoBERTa model from Hugging Face, and finally compare the results of the two models.

Fig. 1: Different results of sentiment analysis

IV. OBJECTIVES

The objectives of this sentiment analysis are:

1. To obtain feedback from the customers of the food app about their sentiments and emotions regarding the food being served to them.

2. To make the process as clear and unambiguous as possible by making sure the system also understands sarcasm and other literary devices.

3. To understand human emotions better by using a stronger model such as RoBERTa by Hugging Face.

4. To compare the RoBERTa model and the VADER model and check the correctness of each.

5. To draw a conclusion about the reliability of the models, that is, how accurately the RoBERTa or VADER model reflects human perception.

V. SENTIMENT ANALYSIS

1. Detecting the sentiment

Sentiment detection means finding the sentiment in a review using NLP or machine learning techniques. It is also known as opinion mining. Detection classifies the words of a review as positive, negative, or neutral.

2. Machine learning vs. lexical-based approach

The lexical-based approach relies on a large corpus of data, together with a dictionary, to detect the sentiment.

The machine learning-based approach, on the other hand, uses machine learning models and techniques to assess sentiment.

Machine learning-based models are more scalable and more flexible than lexical-based models.

Machine learning approaches are generally observed to be more accurate than lexical-based approaches.

On the other hand, because lexical-based approaches are more rudimentary, they need fewer resources than machine learning-based approaches.

Machine learning approaches are easier to apply to different domains, such as other languages, whereas this is harder with a lexical-based approach.

The only drawback of the machine learning approach is that it needs more computational capacity than the lexical-based approach. Most of the time, however, sentiment analysis is done by large organizations that have ample computational capacity and the teams to pull off these jobs.

Fig. 2: Types of sentiment analysis
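To make the contrast above concrete, here is a minimal Python sketch of the lexical-based idea: each word carries a fixed sentiment weight, and a review is scored by summing the weights of the words it contains. The lexicon, its weights, and the function names `lexicon_score` and `classify` are hypothetical and chosen only for illustration; they are not taken from any real dictionary or from the models compared in this paper.

```python
import re

# Hypothetical toy lexicon: word -> sentiment weight.
# Real sentiment dictionaries are far larger and carefully validated.
LEXICON = {
    "good": 1.0, "great": 2.0, "like": 1.0, "happy": 1.5,
    "bad": -1.0, "horrible": -2.0, "sucked": -1.5,
}


def lexicon_score(text):
    """Sum the weights of all lexicon words that occur in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0.0) for token in tokens)


def classify(text):
    """Map the summed score to a coarse positive/negative/neutral label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    reviews = (
        "I like the food",
        "The pasta was horrible",
        "We ordered pasta",
        "The food was not good",
    )
    for review in reviews:
        print(review, "->", classify(review))
```

Because a scorer of this kind only counts words, it ignores context such as negation: the last review is labeled positive because of the word "good". This is one reason practical lexical tools add extra rules on top of the dictionary, and one reason machine learning models can do better on such sentences.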
VI. PROPOSED METHODOLOGY

1. Data collection and preprocessing

The first step in any sentiment analysis model is data collection. Data is collected from people in the form of reviews.

Consumers are asked to write reviews and can also give a rating, as a number of stars, in their feedback.

This method is used everywhere: in the Google Play Store, the Amazon store, Shopify, and even online services such as consultations.

This helped us collect data and, in turn, learn about people's emotions.

But this is not enough: after collection, the data is also cleaned and preprocessed. Several techniques are used for cleaning and preprocessing when discrepancies are found.

Sometimes values are missing and are filled in with the mean value; at other times, values that are not relevant are simply dropped to make the process smoother, since we need only a specific part of the data to make a prediction.

Some parts of the data are cut and some factors are ignored to make the model more accurate. Achieving 100% accuracy is not possible; in real models that are considered good today, accuracy is generally between 70% and 80%. Only artificially prepared datasets lead to accuracy close to 100%.

Fig. 3: Sentiment analysis

2. Application of the VADER model

NLTK

The Natural Language Toolkit (NLTK) is a group of libraries used for NLP in the Python language. It supports tasks such as classification, tokenization, and stemming.

Tokenization means breaking a large text down into small parts called tokens. It helps us understand people's sentiment and emotion by separating the words, so that we get precisely what we want from the paragraph.

It makes sure we get the words we want to analyze from a sentence. For example, breaking up the sentence "I like the food" gives each word as a separate token, and here we would analyze the sentiment through the word "like".

VADER is a model shipped with NLTK. The VADER model is applied to the dataset, and the negative, positive, and neutral reviews are recorded.

VADER is a rule-based sentiment analyzer. It is available in the NLTK package as a subset of NLTK. It has a large set of lexical features, that is, words, which are labeled by their sentiment value as positive, negative, or neutral.

It tells us the emotion in a sentence as well as the intensity of that emotion.

This model is very good at handling the human emotion expressed in sentences.
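The tokenization and scoring steps described above can be sketched in Python as follows. This is a minimal sketch, assuming NLTK is installed and its `vader_lexicon` resource can be downloaded. `SentimentIntensityAnalyzer.polarity_scores` is NLTK's VADER interface; the helper names `simple_tokens`, `vader_scores`, and `label_from_compound`, the whitespace tokenizer, and the 0.05 threshold are assumptions made for illustration and are not part of the setup described in this paper.

```python
def simple_tokens(text):
    """Split a sentence into word tokens on whitespace.

    A simplified stand-in for a real tokenizer, used here only to
    illustrate the idea of breaking text into tokens.
    """
    return text.split()


def vader_scores(text):
    """Return VADER's neg/neu/pos/compound scores for one text."""
    # Imported lazily so the other helpers work without NLTK installed.
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    # One-time download of the lexicon VADER relies on.
    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer().polarity_scores(text)


def label_from_compound(compound, threshold=0.05):
    """Turn a compound score in [-1, 1] into a coarse label.

    The 0.05 cut-off is a commonly used convention and an assumption
    here; the paper does not state which threshold was applied.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Example use (needs NLTK and the lexicon download):
#   print(simple_tokens("I like the food"))
#   scores = vader_scores("I like the food")
#   print(scores, label_from_compound(scores["compound"]))
```

The compound score summarizes the overall polarity of the text, while the separate negative, neutral, and positive scores indicate how much of the text falls into each class.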
For example, VADER can easily interpret an informal sentence such as "It kinda sucked".

The large quantity of data we possess lets us see that the VADER model is fairly accurate and good at sentiment analysis.

The VADER model returns negative, neutral, positive, and compound scores for each piece of text in the dataset.

But it has a disadvantage: it cannot understand human language as a whole, and it fails to arrive at the meaning a human reader would.

For example, we passed it the following review: "The pasta was horrible. Only positive point was that I was not even expecting that it would be good. My friend ordered it, but I am happy that she realized her mistake now."

As humans, we can see that this is clearly a negative sentiment about the product; in fact, it is fully negative. Yet the model rates the statement as positive, because it picks up only on specific words such as "good" and "happy" and treats them as positive sentiment.

This is therefore not the most reliable model.

3. Application of RoBERTa by Hugging Face

The RoBERTa model was then applied to the same dataset. The results were still poor, but better than those of the VADER model, because RoBERTa can understand some part of human language, having been trained earlier on human sentiment using deep learning.

This model understood the reviews much better than the VADER model did. It came with one caveat, however: it performed well on text resembling the data it had been trained on, but less well on real-world reviews.

Like the VADER model, it was still far from perfect, but it was somewhat better.

4. Comparison of the results

When the results of the two models were compared, the RoBERTa model performed much better than the VADER model, which is a primitive model by comparison.

The RoBERTa model's results were more confident than the VADER model's.

With the RoBERTa model, 5-star reviews scored more on the positive side than with the VADER model. Similarly, 1-star reviews scored more on the negative side than with the VADER model. The VADER model can therefore be called the more confused of the two; RoBERTa was confused less often, although neither was 100% accurate.

That is to say, the RoBERTa model's scores followed more closely what people actually wanted to convey.

VII. CONCLUSION

The conclusion is that deep learning models have come a long way. Machines are becoming more and more empathetic, and computers can understand human emotions better than at any earlier time in history. This is a new dimension of human experience that was not possible earlier.

This keeps getting better as time passes, although human society is ever changing and, as throughout most of history, new words keep entering human language.

In my opinion, these models need to become self-learning so that AI can become more effective. The day AI learns the meaning of words by itself, without needing to be trained on large datasets by machine learning engineers, it will become as good as humans at understanding human emotions and sentiments.

In my opinion, this is possible if AI starts learning by collecting data from its surroundings on its own, asking questions and having humans answer them. This is scary, yet very exciting.

I expect this to happen, given the pace at which development is taking place. The only thing that can stop it is practicality, because it is impossible for us to provide AI with such a large amount of memory and processing power.
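As a closing illustration of the comparison in Section VI, the sketch below shows one way the scores of either model could be averaged per star rating, so that 1-star and 5-star reviews can be compared side by side. It is a minimal sketch under stated assumptions: the checkpoint name `cardiffnlp/twitter-roberta-base-sentiment` is an assumption, since the paper does not say which RoBERTa checkpoint was used, and the function names `roberta_labels` and `mean_score_by_stars` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def roberta_labels(texts, model_name="cardiffnlp/twitter-roberta-base-sentiment"):
    """Classify texts with a pre-trained Hugging Face pipeline.

    The default checkpoint is an assumption made for illustration.
    Returns the pipeline's raw output: one dict with a label and a
    confidence score per input text.
    """
    # Imported lazily so the aggregation helper works without transformers.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=model_name)
    return classifier(list(texts))


def mean_score_by_stars(rows):
    """Average a model's sentiment score for each star rating.

    rows: iterable of (stars, score) pairs, where score is any signed
    sentiment score produced by one of the models.
    """
    grouped = defaultdict(list)
    for stars, score in rows:
        grouped[stars].append(score)
    return {stars: mean(scores) for stars, scores in sorted(grouped.items())}


# Example use with made-up scores (not results from this paper):
#   mean_score_by_stars([(5, 1.0), (5, 0.5), (1, -0.5), (1, -1.0)])
```

Running the same aggregation on the scores of both models gives a simple per-rating view of which model separates low-star and high-star reviews more clearly.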