
SENTIMENTAL ANALYSIS ON MOBILE PHONE REVIEWS

A project report submitted to


JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY ANANTAPUR,
ANANTAPURAMU
In partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY

Submitted by
M. LEKHANA (17BF1A1232)
K. MOULI (17BF1A1231)
G. RAGA MALIKA (17BF1A1220)

Under the esteemed guidance of


Mr. Y. VIJAYA KUMAR, M.Tech.
Senior Assistant Professor

SRI VENKATESWARA COLLEGE OF ENGINEERING


(AUTONOMOUS)
DEPARTMENT OF INFORMATION TECHNOLOGY
(Approved by AICTE, New Delhi & Affiliated to JNTUA, Anantapur)
(Accredited by NBA: B.Tech - CIVIL, EEE, MECH, ECE, CSE & IT)
Opp. LIC Training Center, Karakambadi road, Tirupati-517507, A.P
(2017 – 2021)
SRI VENKATESWARA COLLEGE OF ENGINEERING
(AUTONOMOUS)
DEPARTMENT OF INFORMATION TECHNOLOGY
(Approved by AICTE, New Delhi & Affiliated to JNTUA, Anantapur)
(Accredited by NBA: B.Tech - CIVIL, EEE, MECH, ECE, CSE & IT)
Karakambadi road, Tirupati-517507, A.P
2017 – 2021

CERTIFICATE
This is to certify that the project report entitled,

“SENTIMENTAL ANALYSIS ON MOBILE PHONE REVIEWS”

is a bonafide record of the project work done and submitted by

M. LEKHANA (17BF1A1232)
K. MOULI (17BF1A1231)
G. RAGA MALIKA (17BF1A1220)

In partial fulfillment of the requirements for the award of the Degree of BACHELOR
OF TECHNOLOGY in INFORMATION TECHNOLOGY, JNTUA, Anantapur.

Project Guide:
Mr. Y. Vijaya Kumar, M.Tech.
Senior Assistant Professor
Dept. of Information Technology
S.V. College of Engineering

Head of the Department:
Dr. S. Murali Krishna, M.Tech., Ph.D.
Professor and Head of the Department
Dept. of Information Technology
S.V. College of Engineering

Submitted for project viva-voce held on:

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT
We are thankful to our guide, Mr. Y. Vijaya Kumar, Senior Assistant Professor, for his valuable guidance
and encouragement. His helping attitude and suggestions have helped in the successful completion of the
project.

We are thankful to our project coordinator, Mr. A. Basi Reddy, Senior Assistant Professor, for his guidance
and regular schedules.

We would like to express our grateful and sincere thanks to Dr. S. Murali Krishna, Head, Dept of IT,
for his kind help and encouragement during the course of our study and in the successful completion of
the project work.

We have great pleasure in expressing our hearty thanks to our beloved Principal, Dr. N. Sudhakar Reddy, for
spending his valuable time with us to complete the project.

Successful completion of any project cannot be achieved without proper support and encouragement. We
sincerely thank the Management for providing all the necessary facilities during the course of study.

We would like to thank our parents and friends, who have made the greatest contributions to all our
achievements, for their great care and blessings in making us successful in all our endeavors.

M. LEKHANA 17BF1A1232
K. MOULI 17BF1A1231
G. RAGA MALIKA 17BF1A1220
DECLARATION

We hereby declare that the project report entitled “SENTIMENTAL ANALYSIS ON MOBILE
PHONE REVIEWS”, submitted to the Department of Information Technology, Sri Venkateswara
College of Engineering, Tirupati in partial fulfilment of the requirements for the award of the degree of
Bachelor of Technology, is the result of our own effort and has not been submitted to any other
university or institution for the award of any degree or diploma other than specified above.

M. LEKHANA 17BF1A1232
K. MOULI 17BF1A1231
G. RAGA MALIKA 17BF1A1220
CONTENTS

CHAPTER  TITLE

         ABSTRACT
         LIST OF ABBREVIATIONS
         LIST OF FIGURES
1        INTRODUCTION
2        LITERATURE SURVEY
         2.1 ASPECT BASED SENTIMENT ANALYSIS
         2.2 CLASSIFICATION LEVELS
         2.3 SENTIMENT CLASSIFICATION TECHNIQUES
         2.4 COMBINATION OR HYBRID METHOD
3        COMPUTATIONAL ENVIRONMENT
         3.1 HARDWARE REQUIREMENTS
         3.2 SOFTWARE REQUIREMENTS
         3.3 SOFTWARE FEATURES
4        FEASIBILITY STUDY
         4.1 ECONOMICAL FEASIBILITY
         4.2 TECHNICAL FEASIBILITY
         4.3 SOCIAL FEASIBILITY
5        SYSTEM ANALYSIS
         5.1 EXISTING SYSTEM
         5.2 PROPOSED SYSTEM
6        SYSTEM DESIGN
         6.1 UML DIAGRAMS
         6.1.1 CLASS DIAGRAM
         6.1.2 USE CASE DIAGRAM
         6.1.3 ACTIVITY DIAGRAM
         6.1.4 COMPONENT DIAGRAM
         6.1.5 DEPLOYMENT DIAGRAM
         6.2.1 DATA FLOW DIAGRAM
         6.2.2 CONTROL FLOW DIAGRAM
7        SYSTEM IMPLEMENTATION
         7.1 MODULES
         7.2 ALGORITHMS
8        TESTING
         8.1 UNIT TESTING
         8.2 INTEGRATION TESTING
         8.3 FUNCTIONAL TESTING
         8.4 SYSTEM TESTING
         8.5 ACCEPTANCE TESTING
9        SAMPLE SOURCE CODE
10       SCREEN LAYOUTS
11       CONCLUSION
12       REFERENCES
ABSTRACT

Sentiment analysis is the automated process of understanding the sentiment or opinion of
a given text. This machine learning tool can provide insights by automatically analyzing
product reviews and separating them into tags: Positive, Neutral and Negative. It is hard to
determine whether a product is good or bad from hundreds of mixed reviews, and it is
very time-consuming to read many reviews. So, opinion mining of reviews is necessary.

In this project, we aim to perform sentiment analysis on product-based reviews. The data
used in this project are mobile phone reviews collected from “amazon.com”. In our work,
raw reviews are first pre-processed into cleaned reviews and then converted from
text to a vector representation using a range of feature extraction techniques, such as the BoW
model (Bag of Words model) and TF-IDF. A logistic regression algorithm is applied to these features
to predict the output. We expect to perform review-level categorization of the review data with
promising outcomes.

Index Terms — Sentiment Analysis, Bag of Words Model, TF-IDF and Logistic Regression

LIST OF ABBREVIATIONS

1  ABSA  Aspect Based Sentiment Analysis
2  API   Application Program Interface
3  UML   Unified Modeling Language
4  IDE   Integrated Development Environment
5  DFD   Data Flow Diagram
6  CFD   Control Flow Diagram

( ii )
LIST OF FIGURES
FIG NO   FIG NAME

6.1.1    Class Diagram
6.1.2    Use Case Diagram
6.1.3    Activity Diagram
6.1.4    Component Diagram
6.1.5    Deployment Diagram
6.2.1    Data Flow Diagram
6.2.2    Control Flow Diagram
7.2.1    Logistic Regression Flow
7.2.2    Logistic Regression Model
7.2.3    Logistic Regression Graph
7.2.4    Logistic Regression Value Graph
7.2.5    Confusion Matrix
7.2.6    Confusion Matrix with Respective Formulae


1. INTRODUCTION
1.1 Introduction & Objective:
Sentiment Analysis is one of the interesting applications of text analytics.
Although it is often associated with sentiment classification of documents, broadly
speaking it refers to the use of text analytics approaches applied to the set of problems
related to identifying and extracting subjective material in text sources.

Sentiment analysis is the interpretation and classification of text-based
data. The point of this analysis is to categorize each data point into a class that represents
its quality (positive, negative, etc.). Sentiment analysis focuses on the polarity, emotions
and intentions of authors. Classic sentiment analysis consists of the following steps: pre-
processing, training, feature extraction and classification. The goal of the project is to
predict the sentiment of a given review. The predicted data helps in identifying whether the
product quality is good or bad based on the reviews collected. Here we use a mobile phone
review dataset to identify the top and most genuine mobile phone products on the Amazon website.

1.2 Problem statement:


To build a model that predicts the sentiment of a given review and to conduct a
sentiment analysis on product reviews. The trained model predicts a user’s sentiment based
on reviews.

1.3 Objectives

The main objective of this project is to go the extra mile and provide users with an
output that is the analysis of thousands of reviews, saving time: analyzed manually, those
reviews could take decades to get through.

➢ To build a prediction model for sentiment classification.
➢ To visualize the data with pie charts and bar graphs.
➢ To create a word cloud.
1.4 What is Sentiment Analysis?

Sentiment is an idea or feeling that someone expresses in words. With that in mind,
sentiment analysis is the process of predicting/extracting these ideas or feelings. We want
to know if the sentiment of a piece of writing is positive, negative or neutral. Exactly what
we mean by positive/negative sentiment depends on the problem we’re trying to solve.

Consider, for example, predicting a listener’s opinion of a song: positive sentiment
would mean the listener enjoyed it. Or we could be using sentiment analysis to flag
potential hate speech on a platform; in this case, negative sentiment would mean the text
contained racist/sexist opinions. Some other examples include predicting irony/sarcasm or
even a person’s intentions (i.e., are they planning to buy a product?).

1.5 Sentiment Analysis with Python


There are many types of sentiment analysis models and many ways to do sentiment
analysis. We’ll concentrate on applying one of these methods: creating a bag-of-words
model using an SVM. Let’s start with the Python packages that will help us do this.
Packages

We use some standard packages such as Pandas/NumPy to handle our data and
Matplotlib/Seaborn to visualise it. For modelling, we use the svm module from scikit-learn,
along with its metrics functions to measure the performance of our model. The
last set of packages is used for text processing; they will help us clean our text data and create
model features.
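A minimal import block matching this description might look like the following sketch (the specific metrics and text-processing packages chosen here, such as NLTK, are assumptions rather than a confirmed listing from the report):

# Data handling and visualisation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Modelling and evaluation
from sklearn import svm
from sklearn.metrics import accuracy_score, confusion_matrix

# Text processing (assumed: regular expressions, punctuation table, NLTK stopwords)
import re
import string
from nltk.corpus import stopwords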

Dataset

To train our sentiment analysis model, we use a sample of tweets from the dataset. This dataset
contains 1.6 million tweets that have been classified as having either a positive or negative
sentiment.
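As a sketch, loading and sampling such a dataset with Pandas could look like this; the file name, column layout and encoding below follow the common public distribution of this tweet corpus and are assumptions, not details confirmed by the report:

import pandas as pd

# Assumed layout: no header row, latin-1 encoding, the label in the first
# column (0 = negative, 4 = positive) and the tweet text in the last.
cols = ['target', 'id', 'date', 'flag', 'user', 'text']
df = pd.read_csv('tweets.csv', encoding='latin-1', names=cols)

sample = df.sample(n=50000, random_state=42)  # work with a manageable sample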

Text cleaning

The next step is to clean the text. We do this to remove aspects of the text that are not important,
and hopefully make our models more accurate. Specifically, we will make our text lower case and
remove punctuation. We will also remove very common words, known as stopwords, from the text.

To do this, we create a function which takes a piece of text, performs the above
cleaning and returns the cleaned text; we then apply this function to every tweet in our dataset, as sketched below.
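A cleaning function along these lines might look as follows (a sketch; it assumes the NLTK stopword list has been downloaded beforehand with nltk.download('stopwords')):

import re
import string
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

def clean_text(text):
    text = text.lower()  # lower case
    text = re.sub(r'[%s]' % re.escape(string.punctuation), ' ', text)  # remove punctuation
    words = [w for w in text.split() if w not in stop_words]  # remove stopwords
    return ' '.join(words)

# Apply the cleaning function to every tweet in the sample
sample['clean_text'] = sample['text'].apply(clean_text)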

Feature engineering (bag-of-words)

Like all ML models, SVMs cannot understand text, even after cleaning. What we mean by this is
that our model cannot take the raw text as an input; we have to first represent the text in a
mathematical way. In other words, we must transform the tweets into model features. One way to
do this is by using N-grams.


N-grams are sets of N consecutive words. As an example, a sentence can be broken down into
1-grams (unigrams) and 2-grams (bigrams): unigrams are just the individual words in the
sentence, bigrams are the set of all two consecutive words, trigrams (3-grams) would be the set
of all three consecutive words, and so on. You can represent text mathematically by simply
counting the number of times certain N-grams occur, as sketched below.
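One way to produce such counts, sketched here with scikit-learn's CountVectorizer (the two example sentences are illustrative):

from sklearn.feature_extraction.text import CountVectorizer

# Count unigrams and bigrams in two toy sentences
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(["the phone battery is great",
                              "the battery is bad"])
print(vectorizer.get_feature_names_out())  # the unigram/bigram vocabulary
print(X.toarray())  # how often each N-gram occurs in each sentence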


2. Literature survey
Sentiment analysis, or opinion mining, is the computational study of people’s opinions,
sentiments, attitudes and emotions expressed in written language. It is one of the most
active research areas in natural language processing and text mining in recent years.

2.1 ASPECT BASED SENTIMENT ANALYSIS (ABSA)

ABSA deals with identifying aspects of given target entities and estimating the sentiment
polarity for each mentioned aspect.

2.1.1 Aspect extraction

Aspect extraction can be thought of as a kind of information extraction that deals with
recognizing aspects of the entity. There are two common approaches to aspect extraction:

• Find highly frequent words or phrases across reviews and filter them by conditions
such as “occurs right after a sentiment word”.
• Specify all the aspects in advance and find them in the reviews.

In our project, we display a word cloud, a graphical representation of the most
frequently used words in the dataset; see the sketch below.
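A minimal word-cloud sketch (it assumes the third-party wordcloud package is installed; the review strings are illustrative):

from wordcloud import WordCloud
import matplotlib.pyplot as plt

reviews = ["good battery life", "camera quality is great", "poor battery"]  # illustrative
text = ' '.join(reviews)  # join all cleaned reviews into one string

wc = WordCloud(width=800, height=400, background_color='white').generate(text)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()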

2.1.2 Aspect sentiment classification

Aspect sentiment classification deals with opinions and illustrates whether a certain
opinion on an aspect is positive, negative or neutral.

A word can be described in many ways: it may express attraction, shock, a positive or
negative stance, or an emotion. To identify the type of word description we need to perform
sentiment analysis, where we understand the classification between the words and refine the
sentiments into certain categories such as positive and negative. This can be done using a
lexicon, a collection of hash tables, dictionaries and wordlists used to identify the polarity
(positive or negative) of words. Using these lexicons we can classify word sentiment, as
sketched below.
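As an illustrative sketch of such lexicon-based scoring (the tiny word list below is made up for demonstration; real lexicons contain thousands of scored words):

# A toy sentiment lexicon mapping words to polarities
lexicon = {'good': 1, 'great': 1, 'excellent': 1,
           'bad': -1, 'poor': -1, 'terrible': -1}

def lexicon_score(text):
    # Sum the polarity of every lexicon word found in the text
    return sum(lexicon.get(word, 0) for word in text.lower().split())

print(lexicon_score("the battery is good but the camera is terrible"))  # prints 0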

2.2 CLASSIFICATION LEVELS

Sentiment Analysis (SA) is considered a three-layered approach. The first layer is
document-based, the second is sentence-based, and the third is the aspect level, also
considered word- or phrase-based.


2.2.1 Document Level

In SA, the first level is the document level. At this level, an entire document is
considered as a whole for SA, and the one making an opinion is considered a single source or
an individual entity. A different aspect of SA at this level is sentiment regression: to gauge
the extent of a positive or negative outlook, many researchers have turned to supervised
learning to estimate the ratings of a document. In another study, researchers proposed a
linear-based combination approach based on polarities observed in text documents. A major
problem observed at the document level is that not all sentences involving expressions of
opinion can be deemed subjective sentences. Hence, the accuracy of results depends on how
closely each sentence is extracted and analysed individually. This method therefore promotes
extracting subjective sentences for the purpose of SA while objective sentences are set aside.
Research in SA techniques often places major thrust and emphasis at the sentence level.

2.2.2 Sentence Level

Research studies abound on classifying and analysing each sentence in a document or piece
of text as either objective or subjective. In one example, the authors propose
conducting SA on subjective sentences alone, following classification. As a tool for
detecting subjective sentences, machine learning has been an asset for researchers and
scholars in this field. In one study, the authors proposed a model based on logarithmic
probability rates and a number of root terms to form the basis of a scorecard for categorizing
each subjective sentence classified. In another paper, research scholars postulated
a model that takes into account the sentiments of all terms in each sentence to formulate an
overall sentiment for the sentence under consideration. SA at the sentence level is not
without its own shortcomings: there may be objective sentences whose sentiments go
undetected. An example of such a sentence could be: “I bought a table from
a reputed online store only to find that its legs are not stable enough.”

2.2.3 Aspect Level

Aspect-level SA aims at addressing the shortcomings of the document and sentence levels of
SA. Fine-grained control can be exercised with its help, since the target focus of aspect-level
SA is to examine opinions critically and exclusively. Aspect-level SA assumes that
an opinion can only be one among a positive, neutral, negative or outright objective
sentiment. If one were to consider the sentence “Telephone call quality of Sony
phones is remarkable, save and except for the quality of the battery”, two statements are
immediately apparent: the call quality of Sony phones is good, while
that of the battery is not so much. This approach therefore enables turning unstructured
content into an organized form of information that can be subjected to a range of subjective
and quantitative experiments. Such a finer degree of SA is mostly beyond the scope of
document- and sentence-level analyses.


2.3 SENTIMENT CLASSIFICATION TECHNIQUES

Sentiment classification techniques can be segregated into three categories: machine
learning, lexicon-based and hybrid approaches. The first of these involves
popular machine learning (ML) algorithms together with linguistic features.
The second involves analyses through a collection of sentiment terms precompiled
into a sentiment lexicon; it is further divided into dictionary-based and corpus-based
approaches that use semantic or statistical methods to gauge the extent of sentiment
polarity. The hybrid approach involves combining ML and lexicon-based approaches.
The following sections aim at providing an insight into the more popular algorithms used
in sentiment classification techniques.

2.3.1 Machine Learning Approach

In the machine learning approach, machine learning (ML) algorithms are used almost
exclusively and extensively to conduct SA. ML algorithms are used in conjunction with
linguistic and syntactic features.

Supervised Learning

Supervised learning involves datasets that are clearly labelled. Such datasets are shared
with assorted supervised learning models [23, 24, 25, 27].

Decision Tree Classifiers

This type of classifier extends a hierarchical breakdown of the training data space in which
attribute values are used for data segregation [28]. The method is predicated upon
the absence or presence of one or more words and is applied recursively until a
minimum number of records are registered with the leaf nodes that are used for classification.

Linear Classification

Linear classification models include the support vector machine (SVM), a form of
classifier that is focused on finding direct separators between different classes; they also
include neural systems [29]. SVM is a form of supervised learning model
and works chiefly on the principle of decision boundaries set up by decision planes. A
decision plane is one that separates sets of objects having different class memberships.

Support Vector Machines Classifiers (SVM)

SVMs have been designed to fundamentally identify and isolate linear separators in the search
space for the purpose of categorizing assorted classes. Text data is considered ideally suited
to SVM classification in view of the sparse nature of text: only a select number of features
are irrelevant, and features tend to be correlated and organized into linearly distinguishable
buckets. With the help of an SVM, a non-linear decision surface can be constructed in the
original feature space by mapping the data instances non-linearly to an inner product space
where the classes can be linearly segregated with a hyperplane.

Neural Network (NN)

Neural networks are constituted, as the term suggests, of neurons as the basic
fundamental building block. The inputs to a neuron are depicted as a vector denoting
the word frequencies in a document. A set of weights is associated with each
neuron to enable computation of a function of its inputs. For boundaries that are
nonlinear, multilayer neural networks are employed: the multiple layers are used in
conjunction with multiple pieces of linear boundaries that approximate the enclosed
region of a particular class. Neuron outputs generated in earlier layers are used to
feed the neurons in subsequent layers, and the training process becomes
progressively more complex as errors are back-propagated across all layers. As shown by
the authors, SVM and NN can also be deployed for classifying relationships of a
personal nature in biographical texts; in that research, relations between two individuals
were marked as positive, neutral or unknown.

Maximum Entropy Classifier (ME)

A type of probabilistic classifier belonging to the exponential class of models, the ME
classifier does not rely on the assumption that components are independent. Instead, it is
based on the Principle of Maximum Entropy and selects the model that has the largest
entropy. ME classifiers find use in applications involving language identification, sentiment
analysis, topic classification, etc.

2.3.2 Unsupervised Techniques

In this form of approach, classification of sentiment is achieved through comparison:
components are compared against word lexicons that have been assigned sentiment
values before use. The more popular forms of this group of techniques are hierarchical
and partial clustering.

Lexicon-based Approach

This kind of approach determines polarity by employing opinion words from a
sentiment dictionary and matching them with the data. Such an approach assigns sentiment
scores to indicate positive, negative or objective types of words. Lexicon-based approaches
depend on a sentiment lexicon involving a set of precompiled and known sentiment
phrases, terms and idioms. Two subclassifications exist for this type of approach;
these are discussed in the following sections.

Dictionary-based Approach

In this approach, a seed set of opinion words with known orientations is collected
manually. The conclusion set is then generated by looking up a notable corpus, WordNet,
for appropriate synonyms and antonyms relevant to the SA. The process is iterative:
subsequent iterations follow as newly found words are progressively appended to the seed
list, and it stops when no new words are detected. After the process stops, a manual
appraisal is conducted to evaluate and correct errors. However, this approach is not
without its flaws, as it is often unable to detect opinion words with domain- or
context-specific orientations.

Corpus-Based

This approach involves dictionaries specific to a given domain. The dictionaries are
produced from seeds of opinion terms that grow through a search for related words
using statistical or semantic procedures.

2.4 Combination or Hybrid Method

Aside from the individual ML approaches or the lexicon-based approach described
earlier, a select few research techniques involve a mixture of both. Improved Naïve Bayes
and SVM algorithms find frequent mention in such research studies. To narrow the gap
between positive and negative, feature selections such as unigrams and bigrams are often
used. Studies have shown that a combination of ML and dictionary-based methods can
significantly improve the quality of sentiment classification.


3. COMPUTATIONAL ENVIRONMENT

3.1 HARDWARE REQUIREMENTS:

Processor        - Quad-core processor
RAM              - 4 GB or higher
Hard Disk Space  - 160 GB
Monitor          - SVGA
A keyboard and a mouse

3.2 SOFTWARE REQUIREMENTS:

Operating System - Windows 8 or newer, or Linux


IDE - Jupyter Notebook
Technology - Python 3.8
Frame Work - Streamlit
Tool - Anaconda3
Web Browser - Microsoft Edge or Google Chrome

3.3 SOFTWARE FEATURES

3.3.1 Jupyter Notebook

Jupyter Notebook is a free, open-source, interactive web tool known as a computational
notebook, which can be used to combine software code, computational output, explanatory text
and multimedia resources in a single document. Jupyter Notebook is maintained by the
people at Project Jupyter. Jupyter Notebooks are a spin-off from the IPython
project, which used to have an IPython Notebook project of its own. The Jupyter Notebook App
is a server-client application that allows editing and running notebook documents via a
web browser. It can be executed on a local desktop requiring no
internet access, or it can be installed on a remote server and accessed through the internet.

In addition to displaying, editing and running notebook documents, the Jupyter Notebook App
has a “Dashboard”, a “control panel” showing local files and allowing users to open
notebook documents or shut down their kernels.

Notebook documents contain the inputs and outputs of an interactive session as well as
additional text that accompanies the code but is not meant for execution. In this way,
notebook files can serve as a complete computational record of a session, interleaving
executable code with explanatory text, mathematics and rich representations of resulting
objects. These documents are internally JSON files and are saved with the .ipynb
extension.


Installing Jupyter Notebook using pip:

pip is a package management system used to install and manage software packages/libraries
written in Python. These packages are stored in a large online repository termed the
Python Package Index (PyPI), which pip uses as the default source for packages and their
dependencies.
To install Jupyter using pip, we need to first check if pip is updated in our system. Use the
following command to update pip:
python -m pip install --upgrade pip
After updating the pip version, follow the instructions provided below to install Jupyter:
• Command to install Jupyter:
python -m pip install jupyter
Launching Jupyter:
Use the following command to launch Jupyter using command-line: jupyter notebook

3.3.2 Python

Python is an interpreted, high-level, general-purpose programming language. Python's


design philosophy emphasizes code readability with its notable use of significant
whitespace. Its language constructs and object-oriented approach aim to help programmers
write clear, logical code for small and large-scale projects. Python interpreters are available
for many operating systems. Python's large standard library, commonly cited as one of its
greatest strengths, provides tools suited to many tasks. For Internet-facing applications,
many standard formats and protocols such as MIME and HTTP are supported. It includes
modules for creating graphical user interfaces, connecting to relational databases,
generating pseudorandom numbers, arithmetic with arbitrary-precision decimals,
manipulating regular expressions, and unit testing. Most Python implementations include
a read–eval–print loop (REPL), permitting them to function as a command line interpreter
for which the user enters statements sequentially and receives results immediately. Other
shells, including IDLE and IPython, add further abilities such as improved auto-
completion, session state retention and syntax highlighting.

HISTORY

Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde &
Informatica (CWI) in the Netherlands as a successor to the ABC language (itself inspired
by SETL), capable of exception handling and interfacing with the Amoeba operating
system. Its implementation began in December 1989. Van Rossum shouldered sole
responsibility for the project, as the lead developer, until July 12, 2018, when he announced
his "permanent vacation" from his responsibilities as Python's Benevolent Dictator for
Life, a title the Python community bestowed upon him to reflect his long-term commitment


as the project's chief decision-maker. In January 2019, active Python core developers elected Brett
Cannon, Nick Coghlan, Barry Warsaw, Carol Willing and Van Rossum to a five-member "Steering
Council" to lead the project; Van Rossum now shares leadership as a member of this council. Python
2.0 was released on 16 October 2000 with many major new features, including a cycle-detecting
garbage collector and support for Unicode. Python 3.0 was released on 3 December 2008. It was a
major revision of the language that is not completely backward-compatible, though many of its major
features were backported to the Python 2.6.x and 2.7.x version series. Releases of Python 3 include
the 2to3 utility, which automates (at least partially) the translation of Python 2 code to Python 3.
Python 2.7's end-of-life date was initially set for 2015, then postponed to 2020 out of concern that a
large body of existing code could not easily be forward-ported to Python 3.

How to Install Python on Windows


There are three installation methods on Windows:

1. The Microsoft Store


2. The full installer
3. Windows Subsystem for Linux

Step 1: Download the Full Installer


Follow these steps to download the full installer:

1. Open a browser window and navigate to Python.org.
2. Under the “Python Releases for Windows” heading, click the link for the Latest Python 3
Release - Python 3.x.x. As of this writing, the latest version was Python 3.8.4.
3. Scroll to the bottom and select either Windows x86-64 executable installer for 64-
bit or Windows x86 executable installer for 32-bit.

When the installer is finished downloading, move on to the next step.

Step 2: Run the Installer


Once you’ve chosen and downloaded an installer, run it by double-clicking on the downloaded
file. A dialog box like the one below will appear:


There are four things to notice about this dialog box:

1. The default install path is in the AppData/ directory of the current Windows user.
2. The Customize installation button can be used to customize the installation location and
which additional features get installed, including pip and IDLE.
3. The Install launcher for all users (recommended) checkbox is checked by default. This
means every user on the machine will have access to the py.exe launcher. You can
uncheck this box to restrict Python to the current Windows user.
4. The Add Python 3.8 to PATH checkbox is unchecked by default. There are several
reasons you might not want Python on PATH, so make sure you understand the
implications before you check this box.

The full installer gives you total control over the installation process. Congratulations—

you now have the latest version of Python 3 on your Windows machine!

3.3.3 Anaconda

Anaconda is a free and open-source distribution of the programming languages Python
and R. The distribution comes with the Python interpreter and various packages related to
machine learning and data science. Basically, the idea behind Anaconda is to make it easy
for people interested in those fields to install all (or most) of the packages they need with a
single installation. It also ships with Conda, an open-source package and environment
management system, which makes it easy to install and update packages.


Conda is an open-source package management and environment management system
that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages
and their dependencies, and it easily creates, saves, loads and switches between environments
on your local computer. It was created for Python programs, but it can package and distribute
software for any language. Conda is a package manager for any software (installation,
upgrade and uninstallation) and works with virtual system environments. As a packaging tool
and installer, Conda aims to do more than pip: it handles library dependencies outside of the
Python packages as well as the Python packages themselves, and it also creates virtual
environments. Conda is written entirely in Python, which makes it easier to use in Python
virtual environments; furthermore, it can be used for C libraries, R packages, Java packages
and so on, since it installs binaries. The conda build tool builds packages from source, and
conda install installs things from built conda packages. Conda is the package manager of
Anaconda, the Python distribution provided by Continuum Analytics. Anaconda is a set of
binaries that includes SciPy, NumPy and Pandas along with all their dependencies; machine
learning libraries like TensorFlow, scikit-learn and Theano; data science libraries like pandas,
NumPy and Dask; visualization libraries like Bokeh, Datashader, matplotlib and HoloViews;
and Jupyter Notebook, a shareable notebook that combines live code, visualizations and text.

Installing on Windows
1. Download the installer:
o Miniconda installer for Windows.
o Anaconda installer for Windows.
2. Verify your installer hashes.
3. Double-click the .exe file.
4. Follow the instructions on the screen.
If you are unsure about any setting, accept the defaults. You can change them later.
When installation is finished, from the start menu, open the Anaconda Prompt.
5. Test your installation. In your terminal window or Anaconda Prompt, run the
command conda list. A list of installed packages appears if the installation was
successful.


3.3.4 STREAMLIT:

Streamlit is a tool for creating web-based frontends with a focus on machine learning
scientists and engineers: it lets them quickly put together a Python-based web GUI. It is an
open-source Python library that makes it easy to create and share beautiful, custom web apps
for machine learning and data science.

Installing on Windows
1. Make sure that you have Python 3.6 - 3.8 installed.
2. Install Streamlit using pip and run the ‘hello world’ app:
pip install streamlit
streamlit hello
3. That’s it! In the next few seconds, the sample app will open in a new tab in your default
browser.

Still with us? Great! Now make your own app in just 3 more steps:

1. Open a new Python file, import Streamlit, and write some code
2. Run the file with:
streamlit run [filename]
3. When you’re ready, click ‘Deploy’ from the Streamlit menu to share your app with the
world!
Now that you’re set up, let’s dive into more of how Streamlit works and how to build great apps.


4. FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis,
the feasibility study of the proposed system is carried out to ensure that the
proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.
4.1 TYPES OF FEASIBILITY
Three key considerations involved in the feasibility analysis are
• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

4.1.1 ECONOMICAL FEASIBILITY


This study is carried out to check the economic impact that the system will have on
the organization. The amount of funds that the company can pour into the research and
development of the system is limited, and the expenditures must be justified. The
developed system is well within the budget; this was achieved because most of the
technologies used are freely available, and only the customized products had to be purchased.

4.1.2 TECHNICAL FEASIBILITY


This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, as this would also lead to high demands being placed on the
client. The developed system must have modest requirements, as only minimal or no
changes are required for implementing this system.

4.1.3 SOCIAL FEASIBILITY


This aspect of the study checks the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The user must
not feel threatened by the system; instead, they must accept it as a necessity. The level of
acceptance by the users solely depends on the methods employed to educate users about
the system and to make them familiar with it. Their level of confidence must be raised so
that they are also able to make constructive criticism, which is welcomed, as they are the
final users of the system.


5. SYSTEM ANALYSIS
5.1 EXISTING SYSTEM
Due to the increase in demand for e-commerce, with people preferring online
purchase of goods and products, there is a vast amount of information being shared.
E-commerce websites are loaded with millions of reviews. A customer finds it
difficult to precisely find the reviews for a particular feature of a product that he or she
intends to buy, and the mixture of positive and negative reviews makes it difficult to
predict whether a product is genuine or not. Also, the reviews suffer from spam
reviews posted by unauthenticated users.
DISADVANTAGES
➢ There are thousands of reviews on e-commerce websites; analyzed manually, they
could take decades to get through.
➢ It is difficult to tell whether the sentiment of a review is positive or negative.

5.2 PROPOSED SYSTEM


The proposed system is to build a prediction model for analyzing customer reviews to
identify the sentiment of each review. The predicted data can help a customer decide
whether or not to purchase a product. The prediction model is built by implementing a simple
logistic regression method, as this is a supervised learning classification problem.
This predictive modelling approach helps customers predict whether the product
quality is good or bad based on the reviews collected, and it visualizes the top products.
ADVANTAGES
➢ It helps to maintain product quality and to meet customer expectations, which helps
the organization increase sales.
➢ It reduces customer churn.
➢ Positive and negative reviews can be classified.
➢ It helps to detect changes in the overall opinion towards a particular brand.


6. SYSTEM DESIGN

System Design

Systems design is the process of defining the architecture, components, modules,
interfaces and data for a system to satisfy specified requirements. One could see it as the
application of systems theory to product development. There is some overlap with the
disciplines of systems analysis, systems architecture and systems engineering. If the
broader topic of product development “blends the perspective of marketing, design, and
manufacturing into a single approach to product development”, then design is the act of
taking the marketing information and creating the design of the product to be
manufactured. Systems design is therefore the process of defining and developing systems
to satisfy the specified requirements of the user.

Physical Design

Physical design relates to the actual input and output processes of the system. This is laid
down in terms of how data is input into a system, how it is verified / authenticated, how it
is processed, and how it is displayed as output. In physical design, the following requirements
about the system are decided:
➢ Input requirements
➢ Output requirements
➢ Storage requirements
➢ Processing requirements
➢ System control and backup or recovery
The physical portion of systems design can generally be broken down into three sub-tasks:
➢ User Interface Design
➢ Data Design
➢ Process Design
User Interface Design is concerned with how users add information to the system and with
how the system presents information back to them. Data Design is concerned with how the
data is represented and stored within the system. Finally, Process Design is concerned with
how data moves through the system, and with how and where it is validated, secured and/or
transformed as it flows into, through and out of the system.

Design Using UML Diagrams

The Unified Modeling Language allows the software engineer to express an analysis model using
a modeling notation that is governed by a set of syntactic, semantic and pragmatic rules.


6.1 UML Diagrams:


The Unified Modeling Language (UML) is a standard language for specifying,
visualizing, constructing, and documenting the artifacts of software systems, as well as for
business modeling and other non-software systems. The UML represents a collection of best
engineering practices that have proven successful in the modeling of large and complex
systems. The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the design
of software projects. In our project, five basic UML diagrams are explained:

• Class Diagram
• Use Case Diagram
• Activity Diagram
• Component Diagram
• Deployment Diagram

6.1.1 Class Diagram


UML class diagrams model static class relationships that represent the fundamental
architecture of the system. Note that these diagrams describe the relationships between classes,
not those between specific objects instantiated from those classes. Thus the diagram applies to
all the objects in the system. A class diagram consists of the following features:
• Classes: These titled boxes represent the classes in the system and contain information
about the name of the class, its fields, methods and access specifiers. Abstract roles of the
class in the system can also be indicated.
• Interfaces: These titled boxes represent interfaces in the system and contain
information about the name of the interface and its methods.
• Relationships: lines that model the relationships between classes and interfaces in the
system.
❖ Generalization
• Inheritance: a solid line with a solid arrowhead that points from a sub-class to
a super class or from a sub-interface to its super-interface.


• Implementation: a dotted line with a solid arrowhead that points from a class
to the interface that it implements.

• Association: a solid line with an open arrowhead that represents a "has a"
relationship. The arrow points from the containing class to the contained class.
Associations can be one of the following two types, or not specified.

• Aggregation: Represented by an association line with a hollow diamond at the


tail end. An aggregation models the notion that one object uses another object
without "owning" it and thus is not responsible for its creation or destruction.

• Dependency: a dotted line with an open arrowhead that shows that one entity depends
on the behavior of another entity.

Class Diagram

Figure 6.1.1: Class Diagram


6.1.2 Use case Diagram


A use case diagram in the Unified Modelling Language (UML) is a type of behavioural
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases.

A use case is a methodology used in system analysis to identify, clarify, and organize system
requirements. The use case is made up of a set of possible sequences of interactions between
systems and users in a particular environment and related to a particular goal. It consists of a
group of elements (for example, classes and interfaces) that can be used together in a way that
will have an effect larger than the sum of the separate elements combined.

The main purpose of a use case diagram is to show what system functions are performed
for which actor. Roles of the actors in the system can be depicted.

Relationships

Generalization

In the third form of relationship among use cases, a generalization/specialization


relationship exists. The notation is a solid line ending in a hollow triangle drawn from the
specialized to the more general use case (following the standard generalization notation).

Associations

Associations between actors and use cases are indicated in use case diagrams by solid lines.
Associations are modelled as lines connecting use cases and actors to one another, with an
optional arrowhead on one end of the line.

Steps to draw use cases:

1. Identify the actors.
2. Identify the use cases.
3. Review each use case for completeness.


Use case diagram

Figure 6.1.2: Use case diagram

6.1.3 Activity Diagram


Activity diagram is another important diagram in UML to describe the dynamic aspects of
a system. An activity diagram is basically a flowchart representing the flow from one activity
to another, where an activity can be described as an operation of the system. The control
flow is drawn from one operation to another, and this flow can be sequential, branched or
concurrent. Activity diagrams deal with all types of flow control by using different elements
such as fork, join, etc.

How to draw Activity Diagram


Activity diagrams are mainly used as a flowchart consisting of the activities performed by
the system. However, an activity diagram is not exactly a flowchart, as it has some additional
capabilities, such as branching, parallel flow and swim lanes.


Before drawing an activity diagram, we must have a clear understanding of the
elements used in it. An activity is a function performed by the system. After
identifying the activities, we need to understand how they are associated with constraints and
conditions. So before drawing an activity diagram we should identify the following elements:
• Activities
• Association
• Conditions
• Constraints

The following are the basic notational elements that can be used to make up a diagram:

Initial state
An initial state represents a default vertex that is the source for a single transition to the default
state of a composite state. There can be at most one initial vertex in a region. The outgoing
transition from the initial vertex may have behaviour, but not a trigger or guard. It is
represented by a filled circle pointing to the initial state.

Final state
A special kind of state signifying that the enclosing region is completed. If the enclosing
region is directly contained in a state machine and all other regions in the state machine are also
completed, then the entire state machine is completed. It is represented by a hollow
circle containing a smaller filled circle, indicating the final state.

Rounded rectangle
It denotes a state. The top of the rectangle contains the name of the state. It can contain a
horizontal line in the middle, below which the activities done in that state are indicated.


Arrow
It denotes transition. The name of the event (if any) causing this transition labels the arrow body.

How to draw Activity Diagram

Draw the activity flow of a system.

Describe the sequence from one activity to another.

Describe the parallel, branched and concurrent flow of the system.

Activity Diagram

Figure 6.1.3: Activity Diagram


6.1.4 Component Diagram

A component diagram is used to break down a large object-oriented system into
smaller components, so as to make them more manageable. It models the physical view of a
system: the executables, files, libraries, etc. that reside within a node.

It visualizes the relationships as well as the organization between the components present
in the system and helps in forming an executable system. A component is a single unit of the
system, which is replaceable and executable. The implementation details of a component are
hidden, and it necessitates an interface to execute its functions; it is like a black box whose
behavior is explained by its provided and required interfaces. The purpose of a component
diagram is to show the relationship between the different components in a system.

How to create component diagrams?

The steps below outline the major steps in creating a UML component diagram:
1) Decide on the purpose of the diagram.
2) Add components to the diagram, grouping them within other components if appropriate.
3) Add other elements to the diagram, such as classes, objects and interfaces.
4) Add the dependencies between the elements of the diagram.

Component Diagram

Figure:6.1.4: Component Diagram


6.1.5 Deployment Diagram

Deployment diagrams are used to visualize the topology of the physical components of a system,
where the software components are deployed.
Deployment diagrams are used to describe the static deployment view of a system. They
consist of nodes and their relationships: a deployment diagram shows the configuration of
run-time processing nodes and the components that live on them. A deployment diagram is a
kind of structure diagram used in modelling the physical aspects of an object-oriented
system. The purpose of deployment diagrams can be described as follows:


• Visualize the hardware topology of a system.
• Describe the hardware components used to deploy software components.
• Describe the runtime processing nodes.

Deployment Diagram

Figure 6.1.5: Deployment Diagram


6.2 .1 Data Flow Diagram

A data flow diagram (DFD) is a graphical representation of the "flow" of data through
an information system, modeling its process aspects. A DFD is often used as a preliminary
step to create an overview of the system without going into great detail, which can later be
elaborated.

Admin:

Figure 6.2.1: Data Flow Diagram


6.2.2 Control Flow Diagram


A control flow diagram helps us understand the detail of a process. It shows us where
control starts and ends and where it may branch off in another direction, given certain situations.
A control flow diagram can consist of a subdivision to show sequential steps, with if-then-else
conditions, repetition, and/or case conditions. Suitably annotated geometrical figures are used
to represent operations, data, or equipment, and arrows are used to indicate the sequential flow
from one to another.

Figure 6.2.2: Control Flow Diagram


7. SYSTEM IMPLEMENTATION
7.1 MODULES
Modules:
The system can be divided into three major modules, which can be subtasked
further. The modules are as follows:

1. Data Pre-processing and Bag of Words (BoW) module


2. Predictive modelling module
3. Exploratory Data visualizations

7.1.1 Data Pre-Processing and Bag of Words (BoW)module:


Data preprocessing is a data mining technique that involves transforming raw data into an
understandable, readable and efficient format. This is the phase wherein all kinds of noise
in the text data need to be filtered out. If there are any markers or extra white spaces,
they need to be identified and eliminated before jumping into the processing and analytics
phase. If there are any HTML tags, punctuation marks, numbers, etc. in the text document,
they need to be meticulously identified and erased in order to arrive at a highly organized
and optimized document. There are other cleaning operations to be accomplished through
a number of reviews and refinements. The raw data is pre-processed to improve quality,
as sketched below.

7.1.2 Bag of Words (BoW) Model:

The Bag of Words (BoW) model learns a vocabulary from all of the documents, and then
models each document by counting the number of times each word appears. In this model,
a text (such as a sentence or a document) is represented as the bag (multiset) of its words,
disregarding grammar and even word order but keeping multiplicity. The BoW model is
commonly used in methods of document classification, where the (frequency of)
occurrence of each word is used as a feature for training a classifier.
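Continuing the preprocessing sketch above, both feature extraction techniques named in the abstract can be applied with scikit-learn (the vectorizer settings are illustrative):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Bag of Words: learn a vocabulary and count word occurrences per review
bow = CountVectorizer(max_features=5000)
X_bow = bow.fit_transform(reviews['clean'])

# TF-IDF: down-weight words that appear in many documents
tfidf = TfidfVectorizer(max_features=5000)
X_tfidf = tfidf.fit_transform(reviews['clean'])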

7.1.3 Predictive Modelling Module:


In this module, the data obtained from preprocessing is divided for training and testing
purposes (60% of the data for training and the rest for testing), fit to the algorithm, and the
resulting model is pickled so that it can be hosted on the server. This helps in improving
efficiency and reducing computing power.


In this module, we used the Logistic Regression algorithm to build the model, as sketched below.
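A minimal training sketch matching this description; the 60/40 split and the pickling step come from the text above, while the 'sentiment' label column and the file name, along with the X_bow features from the earlier sketch, are assumptions:

import pickle
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 60% of the data for training and the rest for testing, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X_bow, reviews['sentiment'], train_size=0.6, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Pickle the fitted model so it can be hosted on the server
with open('sentiment_model.pkl', 'wb') as f:
    pickle.dump(model, f)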


7.2 Algorithms
7.2.1 Logistic Regression Algorithm:

● We build the prediction model based on the logistic regression algorithm, as the problem
is a supervised learning classification problem: the target variable (or output), y, can take
only discrete values for a given set of features (or inputs), X.
● Logistic regression is a mathematical model used in statistics to estimate the probability
of an event occurring given some previous data. It is used when the dependent
variable (target) is categorical. The outcome can be Yes or No, 0 or 1, True or
False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values
which lie between 0 and 1.
● Logistic regression generally explains the relationship between one dependent binary
variable and one or more nominal, ordinal, interval or ratio-level independent variables.
● In this algorithm we use the sigmoid function to map predicted values to
probabilities. This function maps any prediction into a probability between
0 and 1, as sketched below.
● The performance of the logistic regression algorithm is evaluated with a confusion matrix
and a classification report.
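A quick numerical sketch of the sigmoid mapping (the input values are illustrative):

import numpy as np

def sigmoid(z):
    # Map any real-valued prediction z to a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10, 0, 10])))  # approximately [0.0000454, 0.5, 0.9999546]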

Logistic Regression flow:

Figure 7.2.1: Logistic Regression Flow


Logistic regression model

Figure 7.2.2: Logistic Regression model graph

The plot shows a model of the relationship between a continuous predictor and the
probability of an event or outcome. The linear model clearly does not fit if this is the true
relationship between X and the probability. In order to model this relationship directly, you must
use a nonlinear function. The plot displays one such function. The S-shape of the function is
known as sigmoid.

Logit transformation

A logistic regression model applies a logit transformation to the probabilities. The logit is the
natural log of the odds.


logit(P) = ln(P / (1 − P))

where P is the probability of the event and ln is the natural log (to the base e); the logit is
also denoted Ln. Setting the logit equal to a linear function of the predictor,
logit(P) = b0 + b1X, and solving for P gives the final logistic regression model formula:

P = 1 / (1 + e^(−(b0 + b1X)))

Logistic Regression Graph

Figure 7.2.3: Logistic Regression Graph

Here, in the diagram, if the value of z goes to positive infinity, the predicted value of y
becomes 1; if it goes to negative infinity, the predicted value of y becomes 0.

Figure 7.2.4: Logistic Regression Value Graph

In our project, a confusion matrix is applied to the logistic regression algorithm to check its
performance.

Confusion matrix

A confusion matrix is a summary of prediction results on a classification problem. Calculating
a confusion matrix can give you a better idea of what your classification model is getting right
and what types of errors it is making. The numbers of correct and incorrect predictions are
summarized with count values and broken down by each class; this is the key to the confusion
matrix. The confusion matrix shows the ways in which your classification model is confused
when it makes predictions. It gives you insight not only into the errors being made by your
classifier but, more importantly, the types of errors that are being made. It is this breakdown
that overcomes the limitation of using classification accuracy alone.

How to Calculate a Confusion Matrix

Below is the process for calculating a confusion matrix:

1) You need a test dataset or a validation dataset with expected outcome values.
2) Make a prediction for each row in your test dataset.
3) From the expected outcomes and predictions count:
   a. The number of correct predictions for each class.
   b. The number of incorrect predictions for each class, organized by the class that was
      predicted.

These numbers are then organized into a table, or matrix, as follows:

● Expected down the side: each row of the matrix corresponds to an actual (expected) class.
● Predicted across the top: each column of the matrix corresponds to a predicted class.

A confusion matrix is a performance measurement for machine learning algorithms where the
output can be two or more classes. It is a table with 4 different combinations of predicted
and actual values.
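
A minimal sketch with scikit-learn, using small hypothetical label vectors rather than the project's data:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # expected outcomes from the test dataset
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions for the same rows

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]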

Figure 7.2.5: Confusion Matrix

True Positive:
Interpretation: You predicted positive and it’s true. You predicted that a woman is pregnant and
she actually is.
True Negative:
Interpretation: You predicted negative and it’s true. You predicted that a man is not pregnant and
he actually is not.
False Positive (Type 1 Error):
Interpretation: You predicted positive and it’s false. You predicted that a man is pregnant but
he actually is not.
False Negative (Type 2 Error):
Interpretation: You predicted negative and it’s false. You predicted that a woman is not pregnant
but she actually is.

We describe predicted values as Positive and Negative and actual values as True and False.
Calculation of metrics from the confusion matrix:
Recall:

    Recall = TP / (TP + FN)

From all the actual positive classes, how many did we predict correctly? Recall should be as
high as possible.

Precision:

    Precision = TP / (TP + FP)

From all the classes we predicted as positive, how many are actually positive? Precision should
be as high as possible.

Accuracy:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

From all the classes (positive and negative), how many did we predict correctly? Accuracy
should be as high as possible.

F-Measure:

    F-Score = 2 × (Precision × Recall) / (Precision + Recall)

It is difficult to compare two models with low precision and high recall, or vice versa. To
make them comparable, we use the F-Score, which measures recall and precision at the same
time. It uses the harmonic mean in place of the arithmetic mean, punishing the extreme values
more.
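
Continuing the hypothetical labels above, these metrics can be computed directly with scikit-learn:

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Recall   :", recall_score(y_true, y_pred))     # TP=3, FN=1 -> 0.75
print("Precision:", precision_score(y_true, y_pred))  # TP=3, FP=1 -> 0.75
print("Accuracy :", accuracy_score(y_true, y_pred))   # 6 of 8 correct -> 0.75
print("F1-score :", f1_score(y_true, y_pred))         # harmonic mean -> 0.75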


Figure 7.2.6: Confusion matrix with respective formulae


We also generate a classification report for the logistic regression algorithm applied in this
project.
Classification Report:

A classification report is used to measure the quality of predictions from a classification
algorithm: how many predictions are true and how many are false? More specifically, the true
positives, false positives, true negatives and false negatives are used to compute the metrics
of the classification report.

The report shows the main classification metrics precision, recall and f1-score on a per-class
basis. The metrics are calculated using the true and false positives and the true and false
negatives.

Positive and negative in this case are generic names for the predicted classes. There are four
ways to check if the predictions are right or wrong:

1. TN / True Negative: the case was negative and predicted negative.
2. TP / True Positive: the case was positive and predicted positive.
3. FN / False Negative: the case was positive but predicted negative.
4. FP / False Positive: the case was negative but predicted positive.

Here precision is what percentage of your positive predictions were correct, recall is what
percentage of the positive cases you caught, and the F1-score combines the two as their
harmonic mean.
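
A minimal sketch, again with the same hypothetical labels:

from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Prints precision, recall and f1-score on a per-class basis
print(classification_report(y_true, y_pred))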


EXPLORATORY DATA VISUALIZATIONS

The visualization figures are plotted as pie charts and bar graphs based on the review
analysis; a minimal sketch of one such plot follows the list.
➢ Distribution of Rating.
➢ Number of reviews for top 20 brands and top 50 brands.
➢ Number of reviews for top 20 products and top 50 products.
➢ Distribution of Review Length.
➢ Polarity Distribution.
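
As a minimal sketch (assuming the dataframe df with its 'Rating' column, as loaded in the source code of Section 9), the distribution of ratings can be plotted like this:

import matplotlib.pyplot as plt

# Bar chart of how many reviews received each star rating (1-5)
df['Rating'].value_counts().sort_index().plot(kind='bar')
plt.title('Distribution of Rating')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()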

WORD CLOUD
We created a word cloud for the positive and negative sentiment reviews of a selected brand of
mobile phones.
Word clouds, also known as text clouds, work in a simple way: the more often a specific word
appears in a source of textual data, the bigger and bolder it appears in the word cloud. A word
cloud is a collection, or cluster, of words depicted in different sizes; the bigger and bolder a
word appears, the more often it is mentioned within a given text and the more important it is.
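
A minimal sketch of the idea (the input string here is a hypothetical sample, not the project data):

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "great phone great battery poor camera great screen"  # hypothetical sample text
# The more frequent a word ("great"), the larger it is drawn
wordcloud = WordCloud(background_color='white').generate(text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()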


8.TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of tests, and each test addresses a specific testing requirement.

TYPES OF TESTS

8.1 Unit testing


Unit testing involves the design of test cases that validate that the internal program
logic is functioning properly, and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual software
units of the application. It is done after the completion of an individual unit before
integration. This is a structural testing, that relies on knowledge of its construction and is
invasive. Unit tests perform basic tests at component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each unique path
of a business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.

8.2 Integration testing


Integration tests are designed to test integrated software components to determine
if they actually run as one program. Testing is event driven and is more concerned with
the basic outcome of screens or fields. Integration tests demonstrate that although the
components were individually satisfaction, as shown by successfully unit testing, the
combination of components is correct and consistent. Integration testing is specifically
aimed at exposing the problems that arise from the combination of components.


8.3 Functional test


Functional tests provide systematic demonstrations that functions tested are available
as specified by the business and technical requirements, system documentation, and user
manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. In addition, systematic coverage pertaining to identifying
business process flows, data fields, predefined processes, and successive processes must
be considered for testing. Before functional testing is complete, additional tests are
identified and the effective value of current tests is determined.

8.4 System Testing


System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An example
of system testing is the configuration-oriented system integration test. System testing is
based on process descriptions and flows, emphasizing pre-driven process links and
integration points.

8.4.1 White Box Testing


White Box Testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used to test
areas that cannot be reached from a black-box level.

8.4.2 Black Box Testing


Black Box Testing is testing the software without any knowledge of the inner
workings, structure or language of the module being tested. Black box tests, as most other
kinds of tests, must be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is treated as a black
box: you cannot “see” into it. The test provides inputs and responds to outputs without
considering how the software works.
Unit Testing
Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct
phases.

Test Strategy and Approach


Field testing will be performed manually and functional tests will be written in detail.
Test Objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested
• Verify that the entries are of the correct format.
• No duplicate entries should be allowed.
• All links should take the user to the correct page.

8.5 Acceptance Testing


User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.


9.SAMPLE SOURCE CODE


Sentiment.ipynb

(JUPYTER NOTEBOOK)

#imported modules
import pandas as pd
import numpy as np
import nltk
import future
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import metrics
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.preprocessing import label_binarize
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from wordcloud import WordCloud
import seaborn as sns
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, SnowballStemmer  # SnowballStemmer is used by cleanText when stemming=True
from nltk.tokenize import word_tokenize
from bs4 import BeautifulSoup
import re
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module='bs4')
# Load csv file
df = pd.read_csv('Amazon_Unlocked_Mobile.csv')


df.head()
#description
print("\nTotal number of reviews: ",len(df))
print("\nTotal number of brands: ", len(list(set(df['Brand Name']))))
print("\nTotal number of unique products: ", len(list(set(df['Product Name']))))
#labelled data
def label_data():
    rows = pd.read_csv('Amazon_Unlocked_Mobile.csv', header=0, index_col=False, delimiter=',')
    labels = []
    for cell in rows['Rating']:
        if cell >= 4:
            labels.append('2')  # Good
        elif cell == 3:
            labels.append('1')  # Neutral
        else:
            labels.append('0')  # Poor
    rows['Label'] = labels
    del rows['Review Votes']
    return rows

def clean_data(data):
    # replace blank values in all the cells with 'nan'
    df.replace('', np.nan, inplace=True)
    # delete all the rows which contain at least one cell with nan value
    df.dropna(axis=0, how='any', inplace=True)
    # save output csv file
    df.to_csv('labelled_dataset.csv', index=False)
    return data

clean_data(df)
df = pd.read_csv('labelled_dataset.csv')


df.head()
l = df["Rating"].values
print(list(l).count(0))
print(list(l).count(1))
print(list(l).count(2))
print(list(l).count(3))
print(list(l).count(4))
print(list(l).count(5))
df3 = pd.DataFrame([["Positive", 230674], ["Neutral", 26058], ["Negative", 77603]], columns=["Polarity", "Frequency"])
df = df.sample(frac=0.1, random_state=0)  # work on a 10% sample; remove this line to use the full set of data
# Drop missing values
df.dropna(inplace=True)
# Remove any 'neutral' ratings equal to 3
df = df[df['Rating'] != 3]
# Encode 4s and 5s as 1 (positive sentiment) and 1s and 2s as 0 (negative sentiment)
df['Sentiment'] = np.where(df['Rating'] > 3, 1, 0)
df.head()
def cleanText(raw_text, remove_stopwords=False, stemming=False, split_text=False):
    '''
    Convert a raw review to a cleaned review
    '''
    text = BeautifulSoup(raw_text, 'lxml').get_text()  # remove html
    letters_only = re.sub("[^a-zA-Z]", " ", text)  # remove non-characters
    words = letters_only.lower().split()  # convert to lower case
    if remove_stopwords:  # remove stopwords
        stops = set(stopwords.words("english"))
        words = [w for w in words if w not in stops]
    if stemming == True:  # stemming
        # stemmer = PorterStemmer()
        stemmer = SnowballStemmer('english')
        words = [stemmer.stem(w) for w in words]
    if split_text == True:  # split text
        return (words)
    return (" ".join(words))
# Split data into training set and validation set
X_train, X_test, y_train, y_test = train_test_split(df['Reviews'], df['Sentiment'],
                                                    test_size=0.1, random_state=0)
print('Load %d training examples and %d validation examples. \n'
      % (X_train.shape[0], X_test.shape[0]))
print('Show a review in the training set : \n', X_train.iloc[10])
# Preprocess text data in training set and validation set
X_train_cleaned = []
X_test_cleaned = []
for d in X_train:
    X_train_cleaned.append(cleanText(d))
print('Show a cleaned review in the training set : \n', X_train_cleaned[11])
for d in X_test:
    X_test_cleaned.append(cleanText(d))
# Split review text into parsed sentences using NLTK's punkt tokenizer
# nltk.download()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
def parseSent(review, tokenizer, remove_stopwords=False):
    '''
    Parse text into sentences
    '''
    raw_sentences = tokenizer.tokenize(review.strip())
    sentences = []
    for raw_sentence in raw_sentences:
        if len(raw_sentence) > 0:
            sentences.append(cleanText(raw_sentence, remove_stopwords, split_text=True))
    return sentences

# Parse each review in the training set into sentences
sentences = []
for review in X_train_cleaned:
    sentences += parseSent(review, tokenizer)
print('%d parsed sentences in the training set\n' % len(sentences))
print('Show a parsed sentence in the training set : \n', sentences[2])
# Fit and transform the training data to a document-term matrix using TfidfVectorizer
tfidf = TfidfVectorizer(min_df=5)  # minimum document frequency of 5
X_train_tfidf = tfidf.fit_transform(X_train_cleaned)  # fit on the cleaned reviews, matching the cleaned validation data used below
print("Number of features : %d \n" % len(tfidf.get_feature_names()))
print("Show some feature names : \n", tfidf.get_feature_names()[::1000])
# Logistic Regression
lr = LogisticRegression()
lr.fit(X_train_tfidf, y_train)
X_test_cleaned[0]
# Look at the top 10 features with the smallest and the largest coefficients
feature_names = np.array(tfidf.get_feature_names())
sorted_coef_index = lr.coef_[0].argsort()
print('\nTop 10 features with smallest coefficients :\n{}\n'.format(feature_names[sorted_coef_index[:10]]))
print('Top 10 features with largest coefficients :\n{}'.format(feature_names[sorted_coef_index[:-11:-1]]))
def modelEvaluation(predictions):
    '''
    Print model evaluation for the predicted result
    '''
    print("\nAccuracy on validation set: {:.4f}".format(accuracy_score(y_test, predictions)))
    print("\nAUC score : {:.4f}".format(roc_auc_score(y_test, predictions)))
    print("\nClassification report : \n", metrics.classification_report(y_test, predictions))
    print("\nConfusion Matrix : \n", metrics.confusion_matrix(y_test, predictions))

# Evaluate on the validation set
predictions = lr.predict(tfidf.transform(X_test_cleaned))
modelEvaluation(predictions)
# pickle files added: serialize the vectorizer and the model for the web application
import pickle
with open('tfidf.pkl', 'wb') as th:
    pickle.dump(tfidf, th)
with open('lr.pkl', 'wb') as lh:
    pickle.dump(lr, lh)
# Plot distribution of rating
plt.figure(figsize=(12, 8))
# sns.countplot(df['Rating'])
x = df['Rating'].value_counts().sort_index().plot(kind='bar')
plt.title('Distribution of Rating')
plt.xlabel('Rating')
plt.ylabel('Count')
# Plot number of reviews for top 20 brands
brands = df["Brand Name"].value_counts()
# brands.count()
plt.figure(figsize=(12, 8))
brands[:20].plot(kind='bar')
plt.title("Number of Reviews for Top 20 Brands")
# Plot number of reviews for top 50 brands
brands = df["Brand Name"].value_counts()
# brands.count()
plt.figure(figsize=(12, 8))
brands[:50].plot(kind='bar')
plt.title("Number of Reviews for Top 50 Brands")
# Plot number of reviews for top 20 products
products = df["Product Name"].value_counts()
# brands.count()
plt.figure(figsize=(12, 8))
products[:20].plot(kind='bar')
plt.title("Number of Reviews for Top 20 products")
# Plot distribution of review length
review_length = df["Reviews"].dropna().map(lambda x: len(x))
plt.figure(figsize=(12, 8))
review_length.loc[review_length < 1500].hist()
plt.title("Distribution of Review Length")
plt.xlabel('Review length (Number of characters)')
plt.ylabel('Count')
review_length = df["Reviews"].dropna().map(lambda x: len(x))
df["length"] = review_length
#word cloud
def create_word_cloud(brand, sentiment):
    try:
        df_brand = df.loc[df['Brand Name'].isin([brand])]
        df_brand_sample = df_brand.sample(frac=0.1)
        word_cloud_collection = ''
        if sentiment == 1:
            df_reviews = df_brand_sample[df_brand_sample["Sentiment"] == 1]["Reviews"]
        if sentiment == 0:
            df_reviews = df_brand_sample[df_brand_sample["Sentiment"] == 0]["Reviews"]
        for val in df_reviews.str.lower():
            tokens = nltk.word_tokenize(val)
            tokens = [word for word in tokens if word not in stopwords.words('english')]
            for words in tokens:
                word_cloud_collection = word_cloud_collection + words + ' '
        wordcloud = WordCloud(max_font_size=50, background_color='white', width=500,
                              height=300).generate(word_cloud_collection)
        plt.figure(figsize=(10, 10))
        plt.imshow(wordcloud)
        plt.axis("off")
        plt.show()
    except:
        pass

#display cloud
create_word_cloud(brand='Apple', sentiment=1)

APPLICATION
app1.py
import streamlit as st
import numpy as np
import pandas as pd
import plotly.express as px
from wordcloud import WordCloud
import seaborn as sns
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, SnowballStemmer  # SnowballStemmer is used by cleanText when stemming=True
from nltk.tokenize import word_tokenize
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import re
import nltk
import warnings


from PIL import Image


import pickle
nltk.download('punkt')
nltk.download('stopwords')
df = pd.read_csv('Amazon_Unlocked_Mobile.csv')
page = st.sidebar.selectbox("Select Activity", ["Sentiment Analyser", "Visualization", "Word cloud"])
pkl_file1 = open('lr.pkl', 'rb')
lr = pickle.load(pkl_file1)
pkl_file2 = open('tfidf.pkl', 'rb')
tfidf = pickle.load(pkl_file2)
# st.write(page)
def label_data():
    rows = pd.read_csv('Amazon_Unlocked_Mobile.csv', header=0, index_col=False, delimiter=',')
    labels = []
    for cell in rows['Rating']:
        if cell >= 4:
            labels.append('2')  # Good
        elif cell == 3:
            labels.append('1')  # Neutral
        else:
            labels.append('0')  # Poor
    rows['Label'] = labels
    del rows['Review Votes']
    return rows

def clean_data(data):
    # replace blank values in all the cells with 'nan'
    df.replace('', np.nan, inplace=True)
    # delete all the rows which contain at least one cell with nan value
    df.dropna(axis=0, how='any', inplace=True)
    # save output csv file
    df.to_csv('labelled_dataset.csv', index=False)
    return data

clean_data(df)
df = pd.read_csv('labelled_dataset.csv')
df.head()
df = df.sample(frac=0.1, random_state=0)  # work on a 10% sample; remove this line to use the full set of data
# Drop missing values
df.dropna(inplace=True)
# Remove any 'neutral' ratings equal to 3
df = df[df['Rating'] != 3]
# Encode 4s and 5s as 1 (positive sentiment) and 1s and 2s as 0 (negative sentiment)
df['Sentiment'] = np.where(df['Rating'] > 3, 1, 0)
df.head()
def cleanText(raw_text, remove_stopwords=False, stemming=False, split_text=False):
    '''
    Convert a raw review to a cleaned review
    '''
    text = BeautifulSoup(raw_text, "html.parser").get_text()  # remove html
    letters_only = re.sub("[^a-zA-Z]", " ", text)  # remove non-characters
    words = letters_only.lower().split()  # convert to lower case
    if remove_stopwords:  # remove stopwords
        stops = set(stopwords.words("english"))
        words = [w for w in words if w not in stops]
    if stemming == True:  # stemming
        # stemmer = PorterStemmer()
        stemmer = SnowballStemmer('english')
        words = [stemmer.stem(w) for w in words]
    if split_text == True:  # split text
        return (words)
    return (" ".join(words))
if page == "Visualization" :
st.header("Distribution of Rating")
df1 = pd.DataFrame(data = [[1,5787],[2,2023],[4,5009],[5,17998]],columns =
["Rating","Count"])
fig = px.pie(df1, values= "Count", names='Rating', title='Distribution of Rating')
st.plotly_chart(fig,use_container_width=10)
st.header("Number of Reviews for Top 20 Brands")
brands = df["Brand Name"].value_counts()
b = brands.to_frame()
b = b.reset_index()
b = b.iloc[0:20,:]
b.columns = ["Brand Name","Number of Reviews"]
fig = px.bar(b,x='Brand Name',y = "Number of Reviews",title="Number of Reviews
for Top 20 Brands")
st.plotly_chart(fig,use_container_width=10)
st.header("Number of Reviews for Top 50 Brands")
brands = df["Brand Name"].value_counts()
b = brands.to_frame()
b = b.reset_index()
b = b.iloc[0:50,:]
b.columns = ["Brand Name","Number of Reviews"]
fig = px.bar(b,x='Brand Name',y = "Number of Reviews",title="Number of Reviews for
Top 50 Brands")
st.plotly_chart(fig,use_container_width=10)
st.header("Number of Reviews for Top 20 products")
brands = df["Product Name"].value_counts()
b = brands.to_frame()
b = b.reset_index()

Department of IT, SVCE Page 52


SENTIMENTAL ANALYSIS ON MOBILE PHONE REVIEWS

b = b.iloc[0:20,:]
b.columns = ["Product Name","Number of Reviews"]
fig = px.bar(b,x='Product Name',y = "Number of Reviews",title="Number of Reviews
for Top 20 products")
st.plotly_chart(fig,use_container_width=30)
st.header("Number of Reviews for Top 50 products")
brands = df["Product Name"].value_counts()
b = brands.to_frame()
b = b.reset_index()
b = b.iloc[0:50,:]
b.columns = ["Product Name","Number of Reviews"]
fig = px.bar(b,x='Product Name',y = "Number of Reviews",title="Number of Reviews
for Top 50 products")
st.plotly_chart(fig,use_container_width=30)
st.header("Distribution of Review Length")
review_length = df["Reviews"].dropna().map(lambda x: len(x))
df["Review length (Number of character)"] = review_length
fig = px.histogram(df, x="Review length (Number of character)",title = "Distribution of
Review Length" )
st.plotly_chart(fig,use_container_width=20)
st.header("Polarity Distribution")
df3
=pd.DataFrame([["Positive",230674],["Neutral",26058],["Negative",77603]],columns=
["Polarity","Frequency"])
fig = px.bar(df3,x='Polarity',y = "Frequency",title = "Polarity Distribution")
st.plotly_chart(fig,use_container_width=20)
def create_word_cloud(brand, sentiment):
    df_brand = df.loc[df['Brand Name'].isin([brand])]
    df_brand_sample = df_brand.sample(frac=0.1)
    word_cloud_collection = ''
    if sentiment == 1:
        df_reviews = df_brand_sample[df_brand_sample["Sentiment"] == 1]["Reviews"]
    if sentiment == 0:
        df_reviews = df_brand_sample[df_brand_sample["Sentiment"] == 0]["Reviews"]
    for val in df_reviews.str.lower():
        tokens = nltk.word_tokenize(val)
        tokens = [word for word in tokens if word not in stopwords.words('english')]
        for words in tokens:
            word_cloud_collection = word_cloud_collection + words + ' '
    wordcloud = WordCloud(max_font_size=50, width=500, height=300).generate(word_cloud_collection)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    # save the figure before any display call, so the written image is not blank
    plt.savefig('WC.jpg')
    img = Image.open("WC.jpg")
    return img
if page == "Word cloud" :
st.header("Word cloud")
form = st.form(key='my_form1')
brand = form.text_input(label='Enter Brand Name')
s = form.selectbox("Select The Sentiment",["Positive","Negative"])
submit_button = form.form_submit_button(label='Plot Word Cloud')
if submit_button:
if s=="Positive" :
img = create_word_cloud(brand,1 )
st.image(img)
else :
img = create_word_cloud(brand,0 )
st.image(img)


if page == "Sentiment Analyser":


st.header("Product Review Prediction")
form = st.form(key='my_form2')
r = form.text_input(label='Enter Product Review')
submit_button = form.form_submit_button(label='Predict Review')
if submit_button :
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
S = cleanText(r)
l = []
l.append(S)
pred = lr.predict(tfidf.transform(l))
if int(pred) == 1 :
st.header("Positive Sentiment")
else :
st.header("Negative Sentiment")


10.SCREEN LAYOUTS

10.1 USER DETAIL PAGE:

10.2 DATASET:


10.3 PREDICTION PAGE:

Results:

10.4 RESULT PAGE 1


10.5 RESULT PAGE 2


11.CONCLUSION & FUTURE WORK

Sentiment analysis deals with the classification of texts based on the sentiments they
contain. This sentiment analysis model consists of three core steps, namely data
preparation, review analysis and sentiment classification, and this report describes
representative techniques involved in those steps. The prediction model is built using a
logistic regression algorithm for sentiment analysis on mobile phone reviews, and is used
to predict whether a review is positive or negative. At the end, we have used quality
metric parameters to measure the performance of the prediction model. Exploratory
visualizations are performed on the mobile reviews and plotted in graphs. Finally, a word
cloud is created for the positive and negative sentiment reviews of a selected brand.
Sentiment analysis is an emerging research area in text mining and computational linguistics.

The future of sentiment analysis is going to continue to dig deeper, far past the surface of
the number of likes, comments and shares, and aim to reach, and truly understand, the
significance of social media interactions and what they tell us about the consumers behind
the screens. This forecast also predicts broader applications for sentiment analysis: brands
will continue to leverage this tool, but so will individuals in the public eye, governments,
nonprofits, education centers and many other organizations.


12.REFERENCES

➢ Data source from Kaggle:
  https://www.kaggle.com/PromptCloudHQ/amazon-reviews-unlocked-mobile-phones
➢ “Working with Text Data” from scikit-learn:
  http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
➢ Sentiment Analysis and Opinion Mining by Bing Liu:
  http://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.html
➢ Sentiment Analysis by Professor Dan Jurafsky:
  https://web.stanford.edu/class/cs124/lec/sentiment.pdf
➢ S. Blair-Goldensohn, Hannan, McDonald, Neylon, Reis and Reynar, 2008. Building a
  Sentiment Summarizer for Local Service Reviews:
  http://www.ryanmcd.com/papers/local_service_summ.pdf
➢ Ding, Xiaowen, Bing Liu, and Philip S. Yu. “A holistic lexicon-based approach to opinion
  mining.” Proceedings of the 2008 International Conference on Web Search and Data
  Mining, ACM, pp. 231-240, 2008.
➢ Word Cloud:
  https://boostlabs.com/blog/what-are-word-clouds-value-simple-visualizations/
