
SENTIMENT ANALYSIS

ASHISH SHUKLA
(Bachelor of Technology, Final year student), Associate Professor,
Department of Computer Science & Engineering, GLA University, Mathura, U.P., India
Abstract: Through different social networking sites and review sites, the internet generates vast amounts of data that capture user ideas, feelings, opinions, and discussions on social issues, politics, brands, and products. Politicians, corporations, and individuals are all strongly affected by these opinions. Sentiment analysis is becoming increasingly popular because this data is unstructured and therefore calls for analysis and organization. Sentiment analysis is the process of organizing textual data in order to recognize and classify emotional expressions or opinions into positive, negative, and neutral categories. Although essential for deriving insights from user feedback, the field faces difficulties, most notably the lack of labelled datasets in Natural Language Processing (NLP). Sentiment analysis uses deep learning techniques, which are known for their ability to automatically learn and extract features, to tackle these obstacles.

Keywords: Sentiment analysis, deep learning models, machine learning, natural language processing.

INTRODUCTION

To transform this raw textual data into meaningful knowledge, sentiment analysis, also referred to as opinion mining, is crucial. It involves classifying opinions and emotional expressions into categories such as positive, negative, and neutral. Sentiment analysis methodically organizes input to provide decision-makers with relevant data. The primary challenge facing the field, despite its increasing popularity, is the scarcity of annotated datasets required to create effective Natural Language Processing (NLP) models. In order to better understand how deep learning techniques can overcome current constraints and improve the accuracy and efficiency of sentiment categorization systems, this paper explores the most recent applications of these techniques to sentiment analysis.

OVERVIEW

Sentiment analysis is a key component of Natural Language Processing (NLP) research that aims to identify and classify feelings or views expressed in text. Large amounts of data are produced by the expanding world of platforms such as social networks, forums, blogs, and review sites. These data reflect a range of opinions on social and political issues as well as on goods and services. Because it can reveal information about public opinion, this unlabelled data is crucial for informing decisions in fields including politics, marketing, and governance. By categorizing sentiments as positive, negative, or neutral, governments can measure public sentiment effectively and businesses can assess customer satisfaction.

MOTIVATION

The rise in digital communication highlights the need for automated systems that can process and analyse textual data at scale. Manual analysis of such large amounts of data is not only challenging but also prone to bias and error. Sentiment analysis overcomes this challenge by providing efficient, scalable, and objective methods for interpreting sentiments. Because sentiment analysis has so many uses, it is constantly evolving. It helps with customer service improvement, product feedback analysis, and brand reputation monitoring. Numerous organizations use it to predict outcomes, identify new trends, and assess public opinion. Additionally, sentiment analysis is important to academic research because it provides valuable insights in fields like linguistics, psychology, and sociology.

OBJECTIVE

The main goal of this study is to identify and categorize the sentiments, opinions, or emotions found in textual data by analysing it systematically. Using advanced computational techniques, the researchers aim to turn unlabelled text into valuable, structured insight, offering a clearer understanding of human thoughts across different areas.
The study focuses on these key objectives:

● Build Effective Analytical Models: Use deep learning models to improve the accuracy and efficiency of sentiment analysis.
● Overcome Language Challenges: Develop strategies to handle complex language issues such as proverbs, idioms, and context-related ambiguities.
● Broaden the Application Scope: Create simple models that can be applied to various tasks, such as analysing social media, evaluating product reviews, assessing policies, and supporting academic research.

RELATED WORK

Substantial research has been carried out in the field of sentiment analysis, spanning a wide array of approaches. The earliest solutions were lexicon-based, relying on dictionaries of terms associated with sentiment. For example, the AFINN lexicon, which is frequently used in first-level sentiment classification tasks, assigns each word a score that indicates its affective polarity. However, these methods were often unable to account for context and for the way language changes over time. This limitation motivated machine learning models, which enabled large improvements in the accuracy and expressiveness of sentiment analysis systems.
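To make the lexicon-based approach concrete, here is a minimal sketch of AFINN-style scoring. The word scores shown are illustrative placeholders, not the actual AFINN values: each word carries a signed score, and a review's polarity is simply the sum of the scores of the words it contains.

```python
# Toy lexicon-based sentiment scoring (illustrative scores, not the real AFINN values).
lexicon = {"good": 3, "great": 3, "love": 3, "bad": -3, "terrible": -3, "broke": -2}

def lexicon_score(review: str) -> int:
    # Sum the score of every known word; unknown words contribute 0.
    return sum(lexicon.get(word, 0) for word in review.lower().split())

print(lexicon_score("great phone love the battery"))      # 6  -> positive
print(lexicon_score("terrible product broke in a week"))  # -5 -> negative
```

As the paragraph above notes, such fixed word scores cannot capture context, negation, or shifts in word usage over time, which is what motivated the machine learning approaches discussed next.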
With the advent of machine learning, sentiment analysis entered a new phase in which large datasets could be handled and features extracted automatically. The algorithms that became popular in the early years were Naïve Bayes, Support Vector Machines (SVM), and Logistic Regression. Naïve Bayes was fast to compute and well suited to text classification; SVM handled high-dimensional data efficiently and found accurate separating hyperplanes; and, complementing the other two, Logistic Regression offered a probabilistic viewpoint that was easy to interpret. These techniques were used extensively across domains, including analysing customer feeds, gauging public sentiment during polls, and measuring product feedback for further improvement. However, these models depended on labelled data and handcrafted features, which limited their scalability and flexibility.

With the arrival of deep learning, sentiment analysis has advanced to incorporate context and semantics. Today, recurrent architectures such as LSTMs and transformer models such as BERT are used. LSTM networks are designed to cope with sequential dependencies: they contain a memory cell that can capture long-term dependencies, which makes them effective with out-of-vocabulary words and with the order and context of words over long passages of text. BERT, in contrast, relies on a bidirectional transformer encoder architecture, which allows the model to process input in both the forward and reverse directions. This bidirectional approach significantly improves its ability to understand subtle language rules and phrasing compared with other methods for sentiment classification. These advantages were clearly demonstrated by Devlin et al. (2018), whose BERT model showed that bidirectional transformers promote contextual learning.

However, some problems remain: models must be adapted to the genre and language of the material, the computational requirements are high, and language itself is often ambiguous. Researchers have been addressing these problems with approaches such as distilled models that reduce the computational load, lightweight transformer variants, and real-time machine learning applications. Moreover, transfer learning and domain adaptation are two further research areas that aim to improve the generalization of models across different settings. For resolving lexical ambiguity, the most promising route to better contextual comprehension appears to be combining sentiment-aware embeddings with fine-tuned transformer attention. Such ongoing improvements continue to make sentiment analysis systems capable of handling a wider range of tasks.

PROPOSED WORK

The first important step in sentiment analysis is data collection, which obtains textual information from multiple sources that represents user sentiments, opinions, and emotions. The effectiveness of sentiment analysis models is directly affected by the quality and applicability of the data that is gathered. An overview of the main elements and methods used in this work is provided below.

A. Data Collection
Data collection is the first and most fundamental step of sentiment analysis; it gathers the basic data on which the rest of the work is done. Consumers express their sentiments about particular products on e-commerce websites such as Amazon and Flipkart. These opinions are expressed in many different ways, with varied vocabulary, writing context, short forms, and slang, which makes the data huge and disorganized. Manual analysis of such sentiment data is virtually impossible, so we use automated sentiment analysis to make the effort manageable. The website hosts many products, and the data collected about them consists of the reviews and ratings stored for each product. The assessment is gathered in the form of reviews given by the different consumers of a product, which express their opinions.

B. Pre-processing
Pre-processing is the filtering of the extracted data before analysis. It includes identifying and removing non-textual content and content that is irrelevant to the area of study.
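The paper does not list the exact filtering rules, so the following is only a minimal sketch of typical pre-processing under assumed rules: lower-casing, removal of HTML tags, URLs, and non-alphabetic characters, and whitespace normalization.

```python
import re

def preprocess(review: str) -> str:
    # Assumed cleaning rules; adjust to whatever is irrelevant to the study area.
    text = review.lower()
    text = re.sub(r"<[^>]+>", " ", text)           # drop HTML tags
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)          # keep only letters
    return re.sub(r"\s+", " ", text).strip()       # collapse whitespace

print(preprocess("Great phone!!! <br/> Battery lasts 2 days :) http://example.com"))
# -> "great phone battery lasts days"
```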
C. Feature Extraction
In feature extraction we derive the most suitable features from the reviews and ratings generated about a particular product on the e-commerce sites. For this we use the bag-of-words method.

Bag of words is a natural language processing technique used to extract features from text; these features can then be used to train machine learning algorithms. It creates a vocabulary of all the unique words occurring in all the documents of the training set, and its output for each document is a word-frequency vector.

Fig. 2: Word counts for different terms
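As a minimal illustration of the bag-of-words step, the sketch below uses scikit-learn's CountVectorizer (an assumed tooling choice; the paper does not name a library) to build the vocabulary and produce one frequency vector per review.

```python
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "good product good battery",
    "bad product broke quickly",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)      # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # vocabulary of unique words
print(X.toarray())                         # one word-frequency vector per review
```

The resulting frequency vectors are the features fed to the classifiers described next.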
D. Sentiment Classification
❖ Naive Bayesian Classifier
Consider the case of one normally distributed predictor and two classes. We assign the class with the highest input density, taking the prior into account. A generative classifier is a model that specifies how to generate the data given the class-conditional densities p(x | y = c) and the (prior) class probabilities p(y = c); this is a model of the joint distribution p(y, x). We compute the conditional probabilities for classification using Bayes' theorem:

p(y = c | x) = p(x | y = c) p(y = c) / Σ_{c′ ∈ Y} p(x | y = c′) p(y = c′)

The Naive Bayes classifier (NBC) is a simple generative model based on the assumption that the predictors are conditionally independent given the class label. The class-conditional density then becomes

p(x | y = c) = ∏_{j=1}^{p} p(x_j | y = c)
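The following numeric sketch (with assumed, illustrative parameter values rather than values from the paper) applies the Bayes rule above to the one-dimensional Gaussian case described in this section: each class posterior is its class-conditional density times its prior, normalized by the evidence.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    # Class-conditional density p(x | y = c) for a normally distributed predictor.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

priors = {"negative": 0.4, "positive": 0.6}                  # p(y = c), assumed values
params = {"negative": (-1.0, 1.0), "positive": (1.5, 1.2)}   # (mean, std) per class, assumed

x = 0.3  # a new observation
evidence = sum(gaussian_pdf(x, *params[c]) * priors[c] for c in priors)
for c in priors:
    posterior = gaussian_pdf(x, *params[c]) * priors[c] / evidence  # p(y = c | x)
    print(c, round(posterior, 3))
```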
E. Data Set
The data set is divided into two parts:
Training data set: The training data is taken from the UCI Machine Learning Repository and contains 1000 reviews labelled as positive or negative (1 or 0). We used 75% of the data to train the classifier and 25% to evaluate its performance.
The second type of data is unlabelled data: reviews of various products that are to be classified as having positive or negative polarity using our trained classifier.
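Below is a minimal end-to-end sketch of the pipeline described above: a 75%/25% split, bag-of-words features, and a Naive Bayes classifier from scikit-learn (an assumed tooling choice). The reviews shown are placeholders; in the actual study the 1000 labelled UCI reviews would be loaded here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Placeholder data; replace with the 1000 labelled reviews (1 = positive, 0 = negative).
texts = ["good battery life", "works great", "broke after a week", "terrible quality",
         "love this product", "waste of money", "excellent value", "very disappointed"]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42)

vectorizer = CountVectorizer()
classifier = MultinomialNB()
classifier.fit(vectorizer.fit_transform(X_train), y_train)

predictions = classifier.predict(vectorizer.transform(X_test))
print("accuracy:", accuracy_score(y_test, predictions))
```

The same fitted vectorizer and classifier can then be applied to the unlabelled product reviews to assign their polarity.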
RESULT ANALYSIS

The report generated by the sentiment analysis provides a comprehensive view of user sentiments based on their reviews. The analysis categorizes feedback into positive or negative responses, derived from the specific features mentioned by users. The polarity classification is instrumental in summarizing the general sentiment trends and highlights the critical aspects influencing user opinions. The results showed a wide range of model scores and different advantages and disadvantages of the tested models.

1. Model Performance:

● Naive Bayes: The Naïve Bayes model reached 78% accuracy, demonstrating its efficiency in circumstances that require simplicity and very fast decisions. The model works on the premise that all features are independent given the class label, which makes it computationally efficient. Although it handles large amounts of data while consuming few computational resources, it is not very effective at modelling relationships between features. However, the fact that it provides a reasonable performance baseline at comparatively low computational cost makes it generally appropriate for a first investigation of sentiment classification problems.

● Support Vector Machine (SVM): The SVM model achieved an accuracy of 83%, which shows its strong capability in dealing with high-dimensional data. SVM is well suited to such settings because of its capacity to maximize the distance between classes in multi-dimensional space. This characteristic enables it to attain high accuracy, particularly on problems where the classes separate well. Despite this, its computational cost can become a problem on larger datasets. Kernel functions and the other parameters of the optimization procedure played a vital role in enhancing its classification ability while keeping the computational cost at reasonable levels.

● Long Short-Term Memory (LSTM): The LSTM model used in this project attained an impressive 88% accuracy, owing to LSTM's inherent capacity to extract sequential dependencies and contextual consistencies from textual data. For tasks like sentiment analysis, where the order of words and phrases heavily affects the sentiment, LSTM is well suited; it addresses the vanishing gradient problem using gated mechanisms. The architecture allowed the model to recognize context-specific dependencies, such as comparative and exaggerated phrasing in textual reviews, making it suitable for datasets with strong sequential dependencies.
● BERT: BERT fared better than the other models, earning an accuracy score of 92%. Its bidirectional transformer architecture makes it possible to understand context, since the transformer processes text in both directions. Because BERT is pre-trained on large text corpora, it copes with complex and even ambiguous sentence structures, which makes the model especially valuable for sentiment analysis. It was particularly effective at recognizing complex sentiments, irony, and mixed emotions that conventional approaches fail to accommodate. Furthermore, fine-tuning made it possible to adapt the model to the product-review dataset for domain-specific optimization (a minimal transformer inference sketch follows this list).
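As referenced in the list above, here is a minimal sketch of transformer-based sentiment scoring using the Hugging Face transformers library (an assumed tooling choice). It only runs inference with an off-the-shelf sentiment checkpoint; the fine-tuning on the product-review dataset described above is a separate training step not shown here.

```python
from transformers import pipeline

# Loads a default pre-trained sentiment-analysis checkpoint (downloaded on first use).
classifier = pipeline("sentiment-analysis")

reviews = [
    "The battery life is excellent and delivery was quick.",
    "This product broke within a week, but at least I got some good exercise returning it.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(result["label"], round(result["score"], 3), "-", review)
```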
2. Error Analysis: Some issues persisted in particular areas, for example in detecting sarcasm, mixed polarity, and domain-specific language. For instance, the review "This product broke within a week, but at least I got some good exercise returning it" perfectly illustrates the difficulty models have in flagging sarcasm. Likewise, reviews containing mixed or opposing sentiments, or specialist words and phrases, were misclassified. These problems could be addressed with more advanced techniques, such as learning sentiment-specific embeddings and better attention schemes within transformers.

3. Visualization: The findings were presented with graphics and similar visual summaries. Pie charts and bar graphs were used to represent sentiments by count and by their association with particular forms of media, while word clouds were used to show how often particular words appeared in positive and negative remarks. These graphical representations improved understanding of the results and would help in presenting them to different users.
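A small matplotlib sketch of the kind of summary charts described above, using assumed counts rather than the study's actual numbers.

```python
import matplotlib.pyplot as plt

# Assumed example counts of classified reviews.
sentiment_counts = {"positive": 620, "negative": 380}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.pie(list(sentiment_counts.values()), labels=list(sentiment_counts.keys()), autopct="%1.0f%%")
ax1.set_title("Share of review polarity")
ax2.bar(list(sentiment_counts.keys()), list(sentiment_counts.values()))
ax2.set_title("Number of reviews per polarity")
plt.tight_layout()
plt.show()
```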

CONCLUSION

In this paper, an integrated approach for performing sentiment analysis on product reviews is proposed, using both conventional and deep learning techniques. The findings reveal that while the traditional methods are useful for fast and straightforward diagnostic tests, deep learning models offer greater precision with context.

From this experiment, we were able to converge on the best model, namely the BERT model, which achieved an accuracy of 92%. Difficulties remain, such as recognizing sarcasm, mixed sentiments and beliefs, or domain-specific terms, and these point to directions for further study. Possible future refinements of sentiment analysis include the use of ensemble models, domain adaptation techniques, and multimodal data integration at the image and video level.
REFERENCES

1. Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
3. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
4. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
5. Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
6. Jurafsky, D., & Martin, J. H. (2021). Speech and language processing (3rd ed.). Pearson.
