Abstract—Shifting from traditional marketing to online marketing has allowed people to share their experiences with various aspects of products using textual comments known as product reviews. As a result of this shift, people can access various websites where they can find reviews for all kinds of products, even rare ones. Thus, these reviews act as supplementary information and help people make the right decision before buying a product. Reviews that influence one's decision are considered influential reviews, as they provide truthful experiences. Given the list of reviews for a certain product, each user can vote for any given review as helpful or unhelpful. As a result, each review is given a number that represents how many users found it helpful, which indicates how influential the review is. Consequently, buyers rely on these reviews and on those who wrote them. This study emphasizes the importance of using user votes as a source of information for new users. The contribution of this work lies in two aspects. First, it provides a comprehensive statistical analysis of a previously published dataset of Amazon reviews. Second, it insists on the importance of using user votes. This study is the first phase of many interesting future directions. It was shown that the relationship between the number of reviews and the percentage of votes is an inverse one.

Index Terms—Influential Reviews, Helpful Reviews, User Feedback, Viral Marketing, Review Voting.

I. INTRODUCTION

In electronic trading systems, which are concerned with online selling and marketing, all forms of customer reviews of any product are considered essential [1]. Nowadays, there are many types of online shopping. One main type is the use of social networking sites such as Facebook, Twitter, and Instagram to market the products of a particular company through its pages. The second main type is the use of the company's own web page. In both cases, customers can express their views on products and share their experiences with others by posting reviews as comments that can be seen by everyone. As a result, these stores and e-commerce websites have increased their sales and revenues significantly. For example, Amazon earned a revenue of $232 billion in 2018 [2].

Therefore, new customers have started to rely very heavily on the reviews of previous users of a product, who know the true quality of the product from their own experience. This experience may vary from one person to another, and these reviews are therefore the main source of the decision to buy the product or not [3], [4].

However, when we consider the huge number of reviews that many customers leave for one product, it becomes very difficult for any new customer to follow all of them, and difficult to distinguish between helpful reviews that provide real content and unhelpful ones that contain nothing. It is therefore very important to study and analyze such reviews in order to identify the helpful comments that should be recommended to new customers who would like to examine and buy that product.

To a certain extent, this task may seem difficult even for humans, because it naturally differs from sentiment analysis, which categorizes reviews as positive or negative: positive and negative reviews of the same product can both be helpful. Therefore, this task may be one of the most challenging tasks in natural language processing.

Clearly, the reviews of previous customers are very important to new customers. Such feedback plays a big role in evaluating products and increasing their sales, especially for customers who tie the purchase of a product to its reviews [3], [4].

E-shopping sites allow customers to evaluate products after confirming the purchase. Amazon's customers can rate a product with one of five stars: one star refers to "I hated it", two stars to "I didn't like it", three stars to "It was OK", four stars to "I liked it", and five stars to "I loved it". After that rating, the customer can write a review of the product describing his/her experience and explaining why he/she gave that rating.

In a later step, future users can vote for each of the existing reviews as helpful or unhelpful. In this study, the Amazon reviews dataset published by [5], [6] will be studied thoroughly.
This dataset provides details about the customers' reviews. For example, each review consists of a set of tags such as "reviewerID", "asin", "reviewerName", "helpful", "reviewText", "overall", "summary", and finally "reviewTime". All these tags are described in Section III. In this study, we provide interesting statistical information about the dataset; this information represents the first phase of many future phases of this work.

The rest of this paper is organized as follows: related work is presented in Section II. The dataset description and preparation are described in Section III. Section IV discusses the utilization of this dataset, while conclusions and future work are provided in Section V.

II. LITERATURE REVIEW

In recent years, the evaluation and analysis of online customer reviews has become a major research topic.

The total number of votes on a review is the key to adopting the review for building any prediction model. For this reason, some researchers exclude reviews that have never received any votes: in [7], Hong et al. discarded all reviews that are free of votes. Others filter out all reviews that have a low number of votes: in [8], Qu et al. discarded reviews with fewer than six total votes.

Some studies, such as [9], rely on extracting the emotions from the text of the review to predict helpful reviews using the "Geneva Affect Label Coder" (GALC) lexicon [10]. They apply several traditional supervised learning methods, such as Support Vector Machines (SVM), Naive Bayes (NB), and Random Forests (RF), on three different datasets: an Amazon reviews dataset with 303,937 reviews, a TripAdvisor dataset with 68,049 reviews, and a Yelp dataset with 229,908 reviews. They extracted a set of features from the three datasets, such as the emotionality of the reviews, sentiment analysis, part-of-speech (POS) tags, and text statistics such as the "Flesch Reading Ease" (FLES) measure [11], "1EN1", "RATE", and "TF-IDF". Regarding the Amazon dataset, they considered a review helpful when 60 % of its votes marked it as helpful. The proposed model was evaluated with the Support Vector Machine method on the Amazon dataset and achieved about 68 % accuracy, 68 % F1, and 80 % AUC.

In [12], Nguy cast the problem as binary classification. The author proposed a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM); reviews with a helpful-vote ratio greater than 50 % were labeled as helpful, and as unhelpful otherwise. Reviews were extracted from three categories of the small "5-core" subsets provided by the authors of the Amazon dataset and specially prepared for experimentation, namely "Digital Music", "Musical Instruments", and "Patio, Lawn and Garden", resulting in 88,239 reviews. The proposed model was evaluated and achieved an F1 of 54.9 % and an accuracy of 65 %.

In [13], [14], Wei et al. proposed another deep learning model for helpfulness prediction that works in two ways: their research aims to classify the best and the worst reviews. Due to resource limitations, they randomly picked 50,000 reviews from the previously mentioned "5-core" dataset and divided them into two parts, 80 % for training and 20 % for testing. They proposed a Recurrent Neural Network (RNN); their model identifies good product reviews with 80.50 % accuracy and an 88 % area under the curve (AUC), and identifies bad product reviews with 75.70 % accuracy and an 83 % AUC.

Some works extract users' hidden interests from their reviews or ratings, such as [15], [16], [17], [18], [19], [20]. Another work was done by Qu et al. in [8], who proposed a Convolutional Neural Network (CNN) and introduced two approaches for review helpfulness assessment: using different initializations of the word vectors (random initialization and GloVe (Global Vectors) [21], [22], [23]), and using different review word lengths. They chose two main categories for their experiments, "Books" and "Electronics", and excluded reviews with fewer than six votes. They consider a review helpful if the percentage of helpful votes on the review exceeds 50 % of the votes, and unhelpful otherwise. The proposed model was evaluated and achieved an accuracy of 76.96 % and an F1 of 77 % for the Electronics category, and an accuracy of 75.17 % and an F1 of 75 % for the Books category.

III. DATASET DESCRIPTION AND PREPARATION

A. Dataset Description

In this study, the Amazon review dataset is used, which comes from the Stanford Network Analysis Platform (SNAP) and is publicly available at [5], [6]. The Amazon reviews dataset contains about 83 million unique reviews, covering 24 main product categories and spanning from May 1996 to July 2014. Table I shows the 24 main product categories, with the total number of reviews and the number of products covered in each.

TABLE I: Amazon reviews dataset description.

Category | Reviews contained | Products covered
Books | 22,507,155 | 2,370,585
Electronics | 7,824,482 | 498,196
Movies and TV | 4,607,047 | 208,321
CDs and Vinyl | 3,749,004 | 492,799
Clothing, Shoes and Jewelry | 5,748,920 | 1,503,384
Home and Kitchen | 4,253,926 | 436,988
Kindle Store | 3,205,467 | 434,702
Sports and Outdoors | 3,268,695 | 532,197
Cell Phones and Accessories | 3,447,249 | 346,793
Health and Personal Care | 2,982,326 | 263,032
Toys and Games | 2,252,771 | 336,072
Video Games | 1,324,753 | 50,953
Tools and Home Improvement | 1,926,047 | 269,120
Beauty | 2,023,070 | 259,204
Apps for Android | 2,638,173 | 61,551
Office Products | 1,243,186 | 134,838
Pet Supplies | 1,235,316 | 110,707
Automotive | 1,373,768 | 331,090
Grocery and Gourmet Food | 1,297,156 | 171,760
Patio, Lawn and Garden | 993,490 | 109,094
Baby | 915,446 | 71,317
Digital Music | 836,006 | 279,899
Musical Instruments | 500,176 | 84,901
Amazon Instant Video | 583,933 | 30,648

Each record in the dataset corresponds to one review of a specific product in the related category. Each record contains the following nine tags: reviewerID, asin, reviewerName, helpful, reviewText, overall, summary, unixReviewTime, and reviewTime. Each tag expresses a certain value. The following is the description of each tag:
1) reviewerID: represents the ID of the reviewer on the Amazon website.
2) asin: represents the identification number that identifies the item.
3) reviewerName: represents the name of the reviewer.
4) helpful: represents the ratio of customers who found the review helpful, e.g., [20/32].
5) reviewText: represents the text of the review.
6) overall: represents the star rating given to the product.
7) summary: represents the summary of the review.
8) unixReviewTime: represents the time of the review as Unix time.
9) reviewTime: represents the time and date of the review.

Figure 1 represents one sample review from the dataset.

Fig. 1: A review sample from Amazon dataset.
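To make the record layout concrete, the following minimal Python sketch builds one such nine-tag object in the dataset's one-JSON-object-per-line format and reads the helpfulness votes back out of it. The field values are invented placeholders in the documented format, not a record quoted from the corpus.

```python
import json

# One review record in the dataset's loose-JSON, one-object-per-line format.
# All field values below are illustrative placeholders.
sample_line = json.dumps({
    "reviewerID": "A1XXXXXXXXXXXX",
    "asin": "0000000000",
    "reviewerName": "J. Doe",
    "helpful": [20, 32],           # [helpful votes, total votes]
    "reviewText": "Sturdy, arrived on time, works as advertised.",
    "overall": 4.0,                # star rating from 1.0 to 5.0
    "summary": "Does what it says",
    "unixReviewTime": 1252800000,  # Unix time of the review
    "reviewTime": "09 13, 2009",   # human-readable date of the review
})

record = json.loads(sample_line)
helpful_votes, total_votes = record["helpful"]
print(helpful_votes / total_votes)  # 0.625, i.e. the [20/32] example above
```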
B. Reviews Helpfulness Voting Distribution

In order to analyze the dataset, a statistical analysis was conducted. It is concerned with ranking each review based on its helpfulness level, which is based on the votes that users have given to the review. The analysis was made on the first four categories in the dataset, which are "Books", "Electronics", "Movies and TV", and "CDs and Vinyl".

Through the statistical analysis, the helpfulness level is divided into six classes. The first class represents unvoted reviews. The second class represents reviews that obtained one to five votes. The third class represents reviews that obtained five to ten votes. The fourth class represents reviews that got 10 to 50 votes. The fifth class represents reviews that got 50 to 100 votes, and the final class represents reviews that got more than 100 votes.
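As written, adjacent classes share their boundary values (five, ten, 50, and 100 each appear in two ranges), so any implementation has to pick a convention. The sketch below assumes each shared boundary belongs to the lower class, i.e. the bins are 0, 1-5, 6-10, 11-50, 51-100, and more than 100 votes.

```python
def vote_class(total_votes: int) -> int:
    """Map a review's total vote count to one of the six helpfulness-level
    classes. The prose ranges overlap at their endpoints, so right-closed
    bins are assumed: 0 / 1-5 / 6-10 / 11-50 / 51-100 / >100."""
    bounds = [0, 5, 10, 50, 100]   # upper bound of classes 1..5
    for cls, upper in enumerate(bounds, start=1):
        if total_votes <= upper:
            return cls
    return 6                       # more than 100 votes

assert vote_class(0) == 1 and vote_class(3) == 2 and vote_class(7) == 3
assert vote_class(25) == 4 and vote_class(60) == 5 and vote_class(500) == 6
```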
C. Text Normalization

In order to prepare the dataset and extract the review texts with their helpfulness ratings from the original dataset (JSON files), a special parser was implemented to extract the specific tags that we need. For each comment, only the review text and the helpfulness rates were extracted. As mentioned in Figure 1, each object in the JSON files represents one review with its information in nine tags. Only the "helpful" and the "reviewText" tags were considered. The remaining tags ("reviewerID", "asin", "reviewerName", "overall", "summary", "unixReviewTime", and "reviewTime") were ignored because they are not needed for this task.
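The parser itself is not listed in the paper; the following is a minimal sketch of the extraction step it describes, assuming the SNAP distribution's one-JSON-object-per-line files. The function name and the file name in the usage comment are hypothetical.

```python
import json

def extract_reviews(path):
    """Yield (review_text, helpful_votes, total_votes) from a reviews file
    that stores one JSON object per line; every other tag is skipped."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            helpful_votes, total_votes = record["helpful"]
            yield record["reviewText"], helpful_votes, total_votes

# Hypothetical usage over one category file:
# for text, up, total in extract_reviews("reviews_Books.json"):
#     ...
```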
In order to define and determine the helpful reviews, a study by Ghose et al. [24] compared different experts' opinions about the helpfulness of reviews. The results showed that a review can be considered helpful if the percentage of helpful votes on the review exceeds 60 % of all votes. In this work, due to the divergence of customers' views, a review is considered helpful only if the percentage of helpful votes exceeds 75 % of the total amount of votes. On the other hand, a review is considered unhelpful if the percentage of helpful votes is less than 35 %, to ensure that the considered review is an extremely unhelpful one.
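Read as a rule, these two thresholds leave unvoted reviews and reviews whose helpful share falls between 35 % and 75 % unlabeled. A small sketch of the rule, checked against the helpful ratios of the two reviews quoted in Section IV:

```python
def label_review(helpful_votes: int, total_votes: int):
    """Label a review by its share of helpful votes: 'helpful' above 75 %,
    'unhelpful' below 35 %, otherwise None (unvoted or ambiguous)."""
    if total_votes == 0:
        return None
    share = helpful_votes / total_votes
    if share > 0.75:
        return "helpful"
    if share < 0.35:
        return "unhelpful"
    return None

assert label_review(20, 32) is None       # 62.5 % falls in the gray zone
assert label_review(4, 4) == "helpful"    # 100 % helpful votes
assert label_review(1, 9) == "unhelpful"  # about 11 % helpful votes
```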
IV. DISCUSSION

When reviewers write their reviews, they often explain their evaluation of the product based on several common factors: their satisfaction with the product, whether the product met their requirements, the speed of delivery, and whether the real product characteristics matched what was stated in the advertisement.

Below is an example of two reviews quoted from two different categories of the Amazon reviews dataset:
• From the "Movies and TV" category:
Review text: "Not too bad at the beginning, but you expect it to build to something more than it does. Kind of a let down with the ending.", "helpful": [1, 9], "overall": 2.0.
• From the "CDs and Vinyl" category:
Review text: "I've never felt such sadness for something that happened so long ago because of music! I felt my Scottish blood stirring as I listened to how my ancestors were forced from their homes. All Scotsmen (and women) should have this CD as part of their collection.", "helpful": [4, 4], "overall": 5.0.
From the quoted reviews, the text of each review and its total number of votes can be clearly seen. In the first review, from the "Movies and TV" category, one person out of nine voted the review up and recommended it as helpful, while the rest thought otherwise. In the second review, from the "CDs and Vinyl" category, four people out of four voted the review up and recommended it as helpful. Reading the first review carefully, we can observe that it does not provide any helpful content and nothing is gained from it, which is why it got eight votes out of nine as an unhelpful review. The second review, judging by its voting rate, appears to be a helpful review, and indeed it is.

As shown in Figure 2, customers can vote for any review as helpful or not.
Regarding the Books category, its voting distribution is shown in Figure 3. Considering the reviews that got more than 10 total votes, some reviews received 100 % of their votes as helpful reviews; on the other hand, there are 20,999 reviews that got 100 % of their votes as unhelpful reviews.

Fig. 3: Reviews voting distribution for Books category.

Regarding the Electronics category, as can be seen in Table I, the number of reviews for this category is about 7.8 million. After analyzing the data, about 58 % of the reviews, which are about 4.5 million reviews and represent more than half of the reviews of the category, have zero votes. The second class, which includes reviews that got from 1 to 5 votes, represents about 33 % of the total reviews, which means about 2.5 million reviews. The rest of the classes got different ratios, which can be seen in Figure 4. On a related note, the highest number of votes for one review in this category was 31,453 votes, of which 30,735 were helpful votes. Also, for this category, considering the reviews that got more than 10 total votes, there are 61,944 reviews that got 100 % of their votes as helpful reviews; on the other hand, there are 6,007 reviews that got 100 % of their votes as unhelpful reviews.
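Percentages like the 58 % and 33 % above follow from a single pass that bins every review of a category. A sketch of that aggregation, reusing the hypothetical extract_reviews and vote_class helpers from the sketches in Section III:

```python
from collections import Counter

def class_distribution(path):
    """Count how many reviews in a category file fall into each of the six
    vote classes and return each class's share of the total, in percent."""
    counts = Counter()
    for _text, _up, total in extract_reviews(path):
        counts[vote_class(total)] += 1
    n = sum(counts.values())
    return {cls: round(100 * counts[cls] / n, 2) for cls in sorted(counts)}

# e.g. class_distribution("reviews_Electronics.json") would be expected to
# put roughly 58 % in class 1 (zero votes) and 33 % in class 2 (1-5 votes).
```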
For the Movies and TV category, the rest of the classes got different ratios, which can be seen in Figure 5. As a side note, the highest number of votes for one review in this category was 21,749 votes, of which 20,241 were helpful votes. Also, for this category, considering the reviews that got more than 10 total votes, there are 41,800 reviews that got 100 % of their votes as helpful reviews. On the other hand, there are 8,821 reviews that got 100 % of their votes as unhelpful reviews.
As for the CDs and Vinyl category, Table I shows that the number of reviews for this category is about 3.7 million. After analyzing the data, about 36 % of the reviews, which are about 1.3 million reviews, have not been voted on at all. The second class, which includes reviews that got from 1 to 5 votes, represents about 45 % of the total reviews, which means about 1.6 million reviews. The rest of the classes got different ratios, which can be seen in Figure 6 below. As a side note, the highest number of votes for one review in this category was 2,013 votes, of which 1,955 were helpful votes. Also, for this category, considering the reviews that got more than 10 total votes, there are 60,529 reviews that got 100 % of their votes as helpful reviews. On the other hand, there are 6,952 reviews that got 100 % of their votes as unhelpful reviews.

Fig. 6: Reviews voting distribution for CDs and Vinyl category.
From the results of the statistical analysis, it can be observed that, on average, approximately 50 % of the total reviews in each of the four main categories have got zero votes. It is worth mentioning that these zero-vote reviews are not necessarily useless or without useful content; on the contrary, some of them have a high value but have not received attention and did not appear to readers. Also, to avoid negativity, people sometimes tend to stay neutral in their opinion if they have a negative attitude toward a particular thing. On the other hand, nearly 39 % of the reviews from the four main categories received between one and five votes.
REFERENCES

[1] Y. Liu, X. Huang, A. An, and X. Yu, "Arsa: a sentiment-aware model for predicting sales performance using blogs," in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2007, pp. 607-614.
[2] MarketWatch.com. (2019). Amazon.com Inc. [Online]. Available: https://www.marketwatch.com/investing/stock/amzn/financials [Accessed 10 Mar. 2019].
[3] W. Duan, B. Gu, and A. B. Whinston, "The dynamics of online word-of-mouth and product sales: an empirical investigation of the movie industry," Journal of Retailing, vol. 84, no. 2, pp. 233-242, 2008.
[4] B. Fang, Q. Ye, D. Kucukusta, and R. Law, "Analysis of the perceived value of online tourism reviews: Influence of readability and reviewer characteristics," Tourism Management, vol. 52, pp. 498-506, 2016.
[5] R. He and J. McAuley, "Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering," in Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016, pp. 507-517.
[6] J. McAuley, C. Targett, Q. Shi, and A. Van Den Hengel, "Image-based recommendations on styles and substitutes," in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2015, pp. 43-52.
[7] Y. Hong, J. Lu, J. Yao, Q. Zhu, and G. Zhou, "What reviews are satisfactory: novel features for automatic helpfulness voting," in Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2012, pp. 495-504.
[8] X. Qu, X. Li, and J. R. Rose, "Review helpfulness assessment based on convolutional neural network," arXiv preprint arXiv:1808.09016, 2018.
[9] L. Martin and P. Pu, "Prediction of helpful reviews using emotions extraction," in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[10] K. R. Scherer, "What are emotions? And how can they be measured?" Social Science Information, vol. 44, no. 4, pp. 695-729, 2005.
[11] J. P. Kincaid, R. P. Fishburne Jr., R. L. Rogers, and B. S. Chissom, "Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel," 1975.
[12] B. Nguy, "Evaluate helpfulness in Amazon reviews using deep learning," Stanford University, USA, 2016.
[13] J. Wei, J. Ko, and J. Patel, "Predicting Amazon product review helpfulness," IEEE Transactions on Neural Networks, vol. 5, no. 1, pp. 3-14, 2016.