Online - Reviews Sentiment - Analysis
I. INTRODUCTION
The Internet and computers rule the present world, and they have reshaped the current market scenario.
In the 21st century, marketing is mostly done online. With millions of products being sold on
hundreds of platforms, it is difficult for the common consumer to decide which service is the best in the market.
In the case of hotel booking, this problem becomes even more evident. To overcome this problem, we have created
an online travel planner that uses sentiment analysis to generate reviews based on the ratings
given by previous customers. A customer mostly gives a review based on the experience he or she had
during the stay. When reviewing a hotel, various key areas are considered, such as
maintenance, hospitality, food, room cleanliness, cost efficiency, and Wi-Fi, and reviews are generally given on
these aspects. Research has shown that people tend to rely more on recommendations from people they trust,
such as their friends, than on online recommender systems [1]. The sentiments expressed by users range over
excellent, very good, good, bad, and very bad. In such competitive times, every hotel wants to hear
about its services from its customers, but in most cases customers hesitate to share any reviews. With the
help of sentiment analysis, it gets easier for the customer to rate every service provided by the hotel. In this module,
we have collected data sets from various sources and applied machine learning and natural language
processing techniques to the data to analyze the sentiments of the customers. The interface has been kept minimal
and user friendly. Since online review systems generate a lot of data, it becomes easier for hotels to focus on
their individual departments. Text analysis techniques also help in identifying and improving the weak areas,
and can help in checking the credibility of customers to avoid false reviews.
Organization of Paper
The paper is organized into several sections. The abstract gives a brief introduction to the underlying
problem statement. The introduction explains the use of the proposed research in the current scenario as well as
the prevailing methods. Previous works related to this topic that were examined during this research
are described under the related work heading.
The methodology explains the basic theorems and techniques used in our analysis, such as natural language
processing. The algorithm used is explained under the proposed solution heading, and the end result under
the result analysis heading. Finally, the conclusion and the future aspects of this research are explained along
with merits and demerits. All references are listed under the references heading.
http://indusedu.org Page 25
This work is licensed under a Creative Commons Attribution 4.0 International License
Shivam Singh et al., International Journal of Research in Engineering, IT and Social Sciences, ISSN 2250-0588,
Impact Factor: 6.565, Volume 10 Issue 04, April 2020, Page 25-29
II. RELATED WORK
Angela S.H. Lee et al. [2] proposed a method in which they collected data from the TripAdvisor
website. The data on the website was unstructured text. In total there were around 11340 reviews for 4
hotels in the four-to-five-star category. They used SAS Text Miner for text analytics, processing the text
through a sequence of nodes. The first node created a dictionary and performed word stemming to link terms
derived from the same root word. The output of this node was the input for the next node, the text filter, in
which a spell check was applied to the text; the spell checker listed the terms from the highest to the lowest
frequency. The end result showed some improvements in the field of text mining, as well as the potential for
applications in fields other than hotel reviews.
Sopian Aji et al. [3] proposed a system that used sentiment analysis based on the Naïve Bayes
classifier. The data was manually filtered and grouped as positive or negative. The Naïve Bayes classifier
used here was optimized with the Particle Swarm Optimization technique. The accuracy on the experiment data,
reviews of the Four Seasons resort in Bali, Indonesia, was measured with an application called RapidMiner Studio.
The review data was collected from websites like www.tripadvisor.com [18]. It comprised 200 reviews, of which
100 were positive and 100 were negative. Positive and negative data were kept in their respective folders as
.txt files. For text processing in RapidMiner, the researchers used tokenize, transform cases, and stop words
(dictionary). Naïve Bayes was used to obtain the accuracy, and the confusion matrix, the ROC curve, and the
proportion of predicted values were obtained. It was found that the result can be improved by using Naïve Bayes
along with the Particle Swarm Optimization technique.
George Markopoulos et al. [4] proposed a system in which they first collected a data set, which was
then used to train an SVM classifier. They selected unigrams as features and adopted two
methods. In the first method, the classification algorithm weights term frequencies using the TF-IDF weighting
scheme, while in the latter method (Term Occurrence) the algorithm counts the occurrences of individual words
that express positive or negative sentiment, and each item is considered positive or negative based on the
polarity terms that it contains. The TF-IDF bag-of-words approach has been reported to increase the classification
accuracy of sentiment analysis systems (Paltoglou and Thelwall 2010). The TF-IDF method was more efficient than
the Term Occurrence approach.
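The difference between the two feature representations can be illustrated with a minimal sketch. It is not the code used in [4]; the toy reviews, the function names, and the particular idf formula (natural log of N over document frequency) are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy reviews, invented for illustration only.
reviews = [
    "good room good food",
    "bad room bad service",
    "good service",
]

def term_occurrence(doc):
    """Raw count of each unigram in one document."""
    return Counter(doc.split())

def tf_idf(docs):
    """TF-IDF weight per unigram per document, with idf = ln(N / df)."""
    n_docs = len(docs)
    counts = [term_occurrence(d) for d in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    weighted = []
    for c in counts:
        total = sum(c.values())
        weighted.append(
            {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
        )
    return weighted

occ = [term_occurrence(d) for d in reviews]
weights = tf_idf(reviews)
```

Under term occurrence, "room" and "food" both count once in the first review; under TF-IDF, "food" receives a higher weight because it appears in fewer reviews.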
Nadeem Akhtar et al. [5] proposed a system for sentiment analysis in which the data was
collected from TripAdvisor. A custom scraper was built using the Python libraries ‘beautifulsoup’ and ‘urllib’.
The system took the URL of any hotel’s webpage as input and processed the reviews one at a time. Two files were
created: one containing the review text and the other containing the review along with its metadata. The reviews
were then classified into predefined categories. The MALLET tool was used for topic modelling, with the number of
topics assumed to be 15. MALLET is a Java-based package for statistical natural language processing that
provides tools for tasks such as clustering, topic modelling, and classification. Textual processing was done with
the help of the Natural Language Toolkit (NLTK) 3.0, a Python library that contains text-processing functions.
In this method, the crawled data is not yet used to its full extent. The review date can further be used to find
seasonal information about hotels. In addition, the customer can be authenticated to avoid false reviews, which
can defame the hotel and reduce its ratings. This research therefore also contributed to verifying the
authenticity of customers, thus reducing the chances of false comments.
III. METHODOLOGY
The assumption in Naïve Bayes is that each feature makes an equal and independent contribution towards the
outcome.
The term ‘NAÏVE’ is used because the algorithm assumes that the features
used to make the predictions are independent of each other. In the Naïve Bayes technique, the
essential idea is to find the probabilities of classes given a text document by using the joint probabilities of
words and classes.
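The independence assumption can be written out as follows. The symbols are introduced here for illustration and do not appear in the original text: c denotes a class (for example good or bad) and w_1, ..., w_n the words of a document.

```latex
P(c \mid w_1, \dots, w_n) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The product over words is exactly where the "naïve" assumption enters: each word contributes to the class score independently of the others.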
NATURAL LANGUAGE PROCESSING:
Natural language processing, or NLP, comprises techniques for processing human language, such as speech
and text, by the use of programming methods. It is an area of computer science that comes under the field of
Artificial Intelligence (AI). It involves programming computers so that they can process and analyze large
amounts of language data. It uses various machine learning techniques for predictive
analysis.
NLP is used extensively in our hotel review system, since it allows the algorithm to generate reviews
based on the predictive text analysis of the data.
PROPOSED SOLUTION:
In our proposed work, we use various machine learning algorithms to predict the sentiment of the texts in
the data sets. These algorithms can evaluate huge amounts of data. The data sets are collected from websites like
Kaggle. We have data sets for hotels in the city of Chennai. We have applied the Naïve Bayes classifier to the
data sets to get predictions about the reviews. The algorithm can be applied on a very large scale, but due to
limitations we have performed this on small data sets. The data sets to be used should be clean.
MACHINE LEARNING:
The algorithms implemented in this system are machine learning algorithms. Machine learning (ML) is
a sub-branch of Artificial Intelligence that can predict future results based on past data. It has several
applications in fields such as weather forecasting and voice assistants. In our system, it helps in generating
reviews for hotels, which are classified as ‘Good’ or ‘Bad’. Machine learning comprises various
algorithms such as random forest, k-means clustering, and Support Vector Machines (SVM) [1]. We have used SVMs in
our system to classify the data.
CLASSIFICATION:
Classification, in layman's terms, is the sorting of different items into their
respective groups or classes. In machine learning terms, it is the task of identifying to which of a set of
categories a particular observation belongs. In our hotel review system, the reviews are classified into
categories such as good and bad. Thus, with the help of ML, the data is sorted into its respective categories.
ALGORITHM USED:
In this research, we perform sentiment classification of hotel reviews with good and bad
labels. To get optimized results, two different scenarios were built for testing and comparing
the model:
SCENARIO 1: Calculate the performance of the model with and without pre-processing. This is done to measure
the effect of pre-processing on the model's performance.
SCENARIO 2: Calculate the performance of two word (feature) selection methods: selection 1 is frequency based,
and selection 2 erases all features with the lowest difference between positive and negative probability. This is
done to find the most optimal performance for the model classification.
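The two feature selection methods of Scenario 2 might look roughly like the sketch below. The toy data, the thresholds `min_count` and `min_gap`, and the use of Laplace smoothing are all assumptions made for illustration; the paper does not state the values or the exact probability estimate it used.

```python
from collections import Counter

# Toy labelled reviews, invented for illustration only.
data = [
    ("good clean room", "good"),
    ("good food good staff", "good"),
    ("bad dirty room", "bad"),
    ("bad food rude staff", "bad"),
]

def select_by_frequency(samples, min_count=2):
    """Selection 1: keep words that occur at least min_count times overall."""
    totals = Counter(w for text, _ in samples for w in text.split())
    return {w for w, n in totals.items() if n >= min_count}

def select_by_class_gap(samples, min_gap=0.1):
    """Selection 2: keep words whose P(w|good) and P(w|bad) differ by at least min_gap."""
    per_class = {"good": Counter(), "bad": Counter()}
    for text, label in samples:
        per_class[label].update(text.split())
    vocab = set(per_class["good"]) | set(per_class["bad"])
    good_total = sum(per_class["good"].values()) + len(vocab)
    bad_total = sum(per_class["bad"].values()) + len(vocab)
    kept = set()
    for w in vocab:
        # Laplace-smoothed conditional probabilities (an assumed estimator).
        p_good = (per_class["good"][w] + 1) / good_total
        p_bad = (per_class["bad"][w] + 1) / bad_total
        if abs(p_good - p_bad) >= min_gap:
            kept.add(w)
    return kept

frequent = select_by_frequency(data)
discriminative = select_by_class_gap(data)
```

On this toy data, selection 1 keeps every word seen at least twice, while selection 2 drops words such as "room" and "food" that are equally likely in both classes and therefore carry no sentiment signal.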
This analysis is done in different stages. The data is first pre-processed; after pre-processing, the
resulting tokens are compared across feature extraction, feature selection 1, and feature selection 2. Feature
extraction uses the bag-of-words model. Feature selection uses two methods: feature selection 1 is
frequency based, erasing features that have a low word appearance frequency, and feature selection 2 erases
features with the minimum difference between good and bad values. The classification stage uses Naïve Bayes, and
evaluation uses ten-fold cross-validation.
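Ten-fold cross-validation can be sketched as follows. The helper names `k_fold_indices`, `cross_validate`, and the `train_and_score` callback are hypothetical, and the folds are taken in order without shuffling, which is a simplification; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(samples, labels, train_and_score, k=10):
    """Average the score returned by train_and_score over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        score = train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx],
        )
        scores.append(score)
    return sum(scores) / len(scores)

folds = list(k_fold_indices(25, k=10))
```

Each review is used for testing in exactly one fold and for training in the other nine, so the averaged score depends less on one particular train/test split.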
1. Begin procedure.
2. Import the dataset from various online sources.
3. Use fit_transform of CountVectorizer to split each review in the dataset into words.
4. Convert the data into a frequency table.
5. Use fit_transform of TfidfTransformer on the data to set the weight of each word.
6. Use the MultinomialNB method (imported from the naive_bayes package) to train and test the weighted data.
7. Print the results.
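The steps above can be sketched without external dependencies as shown below. This is not the authors' code: it does not call the library classes named in the steps but mirrors their stages (word counts, TF-IDF weighting, multinomial Naïve Bayes) in plain Python. The toy reviews, the function names, the smoothed idf formula, and the smoothing parameter `alpha` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy labelled reviews, invented for illustration; the paper's data is not reproduced here.
train = [
    ("the room was clean and the staff was good", "good"),
    ("good food and good service", "good"),
    ("the room was dirty and the service was bad", "bad"),
    ("bad food and rude staff", "bad"),
]

def count_words(text):
    """Steps 3-4: split a review into words and build its frequency table."""
    return Counter(text.lower().split())

def fit_idf(count_tables):
    """Step 5 (fit): smoothed inverse document frequency per word."""
    n_docs = len(count_tables)
    df = Counter(w for table in count_tables for w in table)
    return {w: math.log((1 + n_docs) / (1 + d)) + 1 for w, d in df.items()}

def weight(table, idf):
    """Step 5 (transform): scale counts by idf; words unseen in training are dropped."""
    return {w: n * idf[w] for w, n in table.items() if w in idf}

def train_nb(weighted_docs, labels, vocab, alpha=1.0):
    """Step 6 (train): multinomial Naive Bayes with additive smoothing."""
    feature_sums = {c: defaultdict(float) for c in set(labels)}
    for doc, label in zip(weighted_docs, labels):
        for w, v in doc.items():
            feature_sums[label][w] += v
    model = {}
    for c, sums in feature_sums.items():
        total = sum(sums.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(c) / len(labels))
        log_likelihood = {w: math.log((sums[w] + alpha) / total) for w in vocab}
        model[c] = (log_prior, log_likelihood)
    return model

def predict(model, weighted_doc):
    """Step 6 (test): pick the class with the highest log posterior."""
    def score(c):
        log_prior, log_likelihood = model[c]
        return log_prior + sum(v * log_likelihood[w] for w, v in weighted_doc.items())
    return max(model, key=score)

texts = [text for text, _ in train]
labels = [label for _, label in train]
tables = [count_words(t) for t in texts]
idf = fit_idf(tables)
weighted = [weight(t, idf) for t in tables]
model = train_nb(weighted, labels, vocab=set(idf))

def classify(review):
    """Run one new review through the same counting, weighting, and prediction steps."""
    return predict(model, weight(count_words(review), idf))

# Step 7: print the results for two unseen reviews.
print(classify("good clean room"))
print(classify("bad dirty room"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.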
Figure 2 shows how the Naïve Bayes algorithm works on the given data. The test data
is labelled as good or bad based on the data sets chosen, which also showcases the usability of the algorithm.
SVMs are used in conjunction with Naïve Bayes to obtain the results.
VI. REFERENCES
1. P. Sanjay Bhargava, G. Nagarjuna Reddy, R.V. Ravi Chand, K. Pujitha, Anjali Mathur, International Journal of Innovative
Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-8 Issue-6, April 2019.
2. Angela S.H Lee, Zaharin Yusoff, Zuraini Zainol, Pillai V., International Journal of Engineering & Technology, 7 (4.31)
(2018) 341-347.
3. Sopian Aji, Warjiyono, Dany Pratmanto, Angga Ardiansyah, Andrian Eko Widodo, Husni Faqih, Suleman, Fandhilah,
ICCSET 2018, October 25-26, Kudus, Indonesia, Copyright 2018 EAI, DOI 10.4108/eai.24-10-2018.2280546.
4. George Markopoulos, George Mikros, Anastasia Iliadi, Michalis Lionto, Springer International Publishing Switzerland 2015,
V. Katsoni (ed.), Cultural Tourism in a Digital Era, Springer Proceedings in Business and Economics,
DOI 10.1007/978-3-319-15859-4_31.
5. Nadeem Akhtar, Nashez Zubair, Abhishek Kumar, Tameem Ahmad, 7th International Conference on Advances in Computing &
Communications, ICACC-2017, 22-24 August 2017, Cochin, India.
6. Tran Sy Bang and Virach Sornlertlamvanich, “Sentiment Classification for Hotel Booking Review Based on Sentence
Dependency Structure and Sub-Opinion Analysis”, pp. 910-916.
7. Youngseok Choi & Habin Lee, “Data properties and the performance of sentiment classification for electronic commerce
applications”, pp. 994-1012.
8. Wararat Songpan, “The Analysis and Prediction of Customer Review Rating Using Opinion Mining”, pp. 71-77.
9. Sumbal Riaz, Mehvish Fatima, M. Kamran, M. Wasif Nisar, “Opinion mining on large scale data using sentiment analysis
and kmeans clustering”.