JOU Classification of sentiment reviews using n-gram machine learning
Article info

Article history:
Received 3 March 2015
Revised 15 March 2016
Accepted 16 March 2016
Available online 24 March 2016

Keywords: Sentiment analysis; Naive Bayes (NB); Maximum Entropy (ME); Stochastic Gradient Descent (SGD); Support Vector Machine (SVM); N-gram; IMDb dataset

Abstract

With the ever-increasing number of social networking and online marketing sites, the reviews and blogs obtained from them act as an important source for further analysis and improved decision making. These reviews are mostly unstructured by nature and thus need processing, such as classification or clustering, to provide meaningful information for future use. These reviews and blogs may be classified into different polarity groups such as positive, negative, and neutral in order to extract information from the input dataset. Supervised machine learning methods help to classify these reviews. In this paper, four different machine learning algorithms, namely Naive Bayes (NB), Maximum Entropy (ME), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM), have been considered for classification of human sentiments. The different methods are critically examined in order to assess their performance on the basis of parameters such as precision, recall, f-measure, and accuracy.

© 2016 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2016.03.028
0957-4174/© 2016 Elsevier Ltd. All rights reserved.
118 A. Tripathy et al. / Expert Systems With Applications 57 (2016) 117–126
accuracy. The results obtained in this paper indicate higher values of accuracy when compared with studies made by other authors.

The structure of the paper is as follows: Section 2 presents the literature survey. Section 3 describes the methodology of the classification algorithms and their details. In Section 4, the proposed approach is explained. Section 5 describes the implementation of the proposed approach. In Section 6, performance evaluation of the proposed approach is carried out. The last section, Section 7, concludes the paper and presents the scope for future work.

2. Literature survey

The literature on sentiment analysis indicates that a good amount of study has been carried out by various authors on document level sentiment classification.

2.1. Document level sentiment classification

Pang et al. have considered sentiment classification as a categorization study with positive and negative sentiments (Pang, Lee, and Vaithyanathan, 2002). They have undertaken the experiment with three different machine learning algorithms, namely NB, SVM, and ME. The classification process is undertaken using n-gram techniques such as unigram, bigram, and the combination of both unigram and bigram. They have used a bag-of-words feature framework to implement the machine learning algorithms. As per their analysis, the NB algorithm shows the poorest result among the three algorithms and the SVM algorithm yields the most convincing result.

Salvetti et al. have discussed the Overall Opinion Polarity (OvOp) concept using machine learning algorithms such as NB and the Markov model for classification (Salvetti, Lewis, and Reichenbach, 2004). In their paper, the hypernyms provided by WordNet and Part-Of-Speech (POS) tags act as lexical filters for classification. Their experiment shows that the result obtained by the WordNet filter is less accurate than that of the POS filter. In the field of OvOp, accuracy is given more importance than recall. The authors also presented a system where they rank reviews based on a function of probability. According to them, their approach shows better results in the case of web data.

Beineke et al. have used the NB model for sentiment classification. They have extracted pairs of derived features which are linearly combinable to predict the sentiment (Beineke, Hastie, and Vaithyanathan, 2004). In order to improve the accuracy, they have added additional derived features to the model and used labeled data to estimate their relative influence. They have followed the approach of Turney, which effectively generates a new corpus of labeled documents from the existing documents (Turney, 2002). This idea allows the system to act as a probability model which is linear on the logistic scale. The authors have chosen five positive and five negative words as anchor words, which produce 25 possible pairs, and used them for the coefficient estimation.

Mullen and Collier have applied the SVM algorithm for sentiment analysis, where values are assigned to a few selected words and then combined to form a model for classification (Mullen and Collier, 2004). Along with this, different classes of features having closeness to the topic are assigned favorable values, which help in classification. The authors have presented a comparison of their proposed approach with data having topic annotation and hand annotation. The proposed approach has shown better results in comparison with topic annotation, whereas the results need further improvement when compared with hand-annotated data.

Dave et al. have used a tool for synthesizing reviews, then shifted them and finally sorted them using aggregation sites (Dave, Lawrence, and Pennock, 2003). These structured reviews are used for testing and training. From these reviews, features are identified and finally scoring methods are used to determine whether the reviews are positive or negative. They have used a classifier to classify the sentences obtained from web search through a search query using the product name as the search condition.

Matsumoto et al. have used the syntactic relationship among words as a basis of document level sentiment analysis (Matsumoto, Takamura, and Okumura, 2005). In their paper, frequent word sub-sequences and dependency sub-trees are extracted from sentences, which act as features for the SVM algorithm. They extract unigrams, bigrams, word sub-sequences and dependency sub-trees from each sentence in the dataset. They used two different datasets for conducting the classification, i.e., the IMDb dataset (IMDb, 2011) and the Polarity dataset (Pang and Lee, 2004). In the case of the IMDb dataset, the training and testing data are provided separately, but for the Polarity dataset the 10-fold cross validation technique is considered for classification, as there is no separate data designated for testing or training.

Zhang et al. have proposed the classification of Chinese comments based on word2vec and SVMperf (Zhang, Xu, Su, and Xu, 2015). Their approach is based on two parts. In the first part, they have used the word2vec tool to cluster similar features in order to capture the semantic features in the selected domain. Then, in the second part, the lexicon-based and POS-based feature selection approach is adopted to generate the training data. The word2vec tool adopts the Continuous Bag-of-Words (CBOW) model and the continuous skip-gram model to learn the vector representation of words (Mikolov, Chen, Corrado, and Dean, 2013). SVMperf is an implementation of SVM for multivariate performance measures, which follows an alternative structural formulation of the SVM optimization problem for binary classification (Joachims, 2006).

Liu and Chen have proposed different multi-label classification methods for sentiment classification (Liu and Chen, 2015). They have compared eleven multi-label classification methods on two micro-blog datasets, using eight different evaluation metrics for analysis. Apart from that, they have also used three different sentiment dictionaries for multi-label classification. According to the authors, the multi-label classification process performs the task mainly in two phases, i.e., problem transformation and algorithm adaptation (Zhang and Zhou, 2007). In the problem transformation phase, the problem is transformed into multiple single-label problems. During the training phase, the system learns from these transformed single-label data, and in the testing phase, the learned classifier makes a prediction for a single label and then translates it to multiple labels. In algorithm adaptation, the data is transformed as per the requirement of the algorithm.

Luo et al. have proposed an approach to convert the text data into a low-dimensional emotional space (ESM) (Luo, Zeng, and Duan, 2016). They have annotated small size words, which have definite and clear meaning. They have also used Paul Ekman's research to classify the words into six basic categories: anger, fear, disgust, sadness, happiness and surprise (Ekman and Friesen, 1971). They have further considered two different approaches for assigning weight to words by emotional tags. The total weight of all emotional tags is calculated and, based on these values, the messages are classified into different groups. Although their approach yields a reasonably good result for a stock message board, the authors claim that it can be applied to any dataset or domain.

Niu et al. have proposed a Multi-View Sentiment Analysis (MVSA) dataset, including a set of image-text pairs with manual annotation collected from Twitter (Niu, Zhu, Pang, and El Saddik, 2016). Their approach to sentiment analysis can be categorized into two parts, i.e., lexicon-based and statistical learning. In the case of lexicon-based analysis, a set of opinion words or phrases are
Table 1
Comparison of sentiment techniques.

Author | Approach | Algorithms used | Obtained result | Dataset used
Pang et al. (2002) | Classify the dataset using different machine learning algorithms and n-gram model | Naive Bayes (NB), Maximum Entropy (ME), Support Vector Machine (SVM) | Unigram: SVM (82.9), Bigram: ME (77.4), Unigram + Bigram: SVM (82.7) | Internet Movie Database (IMDb)
Salvetti et al. (2004) | Assessed the overall opinion polarity (OvOp) concept using machine learning algorithms | Naive Bayes (NB) and Markov Model (MM) | NB: 79.5, MM: 80.5 | Internet Movie Database (IMDb)
Beineke et al. (2004) | Linearly combinable paired features are used to predict the sentiment | Naive Bayes | NB: 65.9 | Internet Movie Database (IMDb)
Mullen and Collier (2004) | Values assigned to selected words, then combined to form a model for classification | Support Vector Machine (SVM) | SVM: 86.0 | Internet Movie Database (IMDb)
Dave et al. (2003) | Information retrieval techniques used for feature retrieval, and the results of various metrics are tested | SVMlite, Machine learning using Rainbow, Naive Bayes | Naive Bayes: 87.0 | Dataset from Cnet and Amazon site
Matsumoto et al. (2005) | Syntactic relationship among words used as a basis of document level sentiment analysis | Support Vector Machine (SVM) | Unigram: 83.7, Bigram: 80.4, Unigram + Bigram: 84.6 | Internet Movie Database (IMDb), Polarity dataset
Zhang et al. (2015) | Use word2vec to capture similar features, then classify reviews using SVMperf | SVMperf | Lexicon based: 89.95, POS based: 90.30 | Chinese comments on clothing products
Liu and Chen (2015) | Used multi-label classification with eleven state-of-the-art multi-label methods, two micro-blog datasets, and eight different evaluation metrics on three different sentiment dictionaries | Eight different evaluation metrics | Average highest Precision: 75.5 | Dalian University of Technology Sentiment Dictionary (DUTSD), National Taiwan University Sentiment Dictionary (NTUSD), Howset Dictionary (HD)
Luo et al. (2016) | Paul Ekman's research approach (Ekman and Friesen, 1971) is used to convert the text into a low-dimensional emotional space (ESM), which is then classified using machine learning techniques | Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT) | SVM: 78.31, NB: 63.28, DT: 79.21 | Stock message text data (The Lion forum)
Niu et al. (2016) | Used lexicon-based analysis to transform data into the required format and then statistical learning methods to classify the reviews | BOW feature with TF and TF-IDF approach | Text: 71.9, Visual Feature: 68.7, Multi-view: 75.2 | Manually annotated Twitter data
Proposed approach | Converting text reviews into numeric matrices using CountVectorizer and TF-IDF, which are then given as input to machine learning algorithms for classification | Support Vector Machine (SVM), Naive Bayes (NB), Maximum Entropy (ME), Stochastic Gradient Descent (SGD), N-gram | NB: 86.23, ME: 88.48, SVM: 88.94, SGD: 85.11 | Internet Movie Database (IMDb)
considered, which have a pre-defined sentiment score. In statistical learning, various machine learning techniques are used with dedicated textual features.

Table 1 provides a comparative study of the different approaches to sentiment classification adopted by different authors.

2.2. Motivation for proposed approach

The above-mentioned literature survey helps to identify some possible research areas which can be extended further. The following aspects have been considered for carrying out further research.

i. Most of the authors, apart from Pang et al. (2002) and Matsumoto et al. (2005), have used the unigram approach to classify the reviews. This approach provides comparatively good results, but fails in some cases. The comment "The item is not good," when analyzed using the unigram approach, yields a neutral polarity for the sentence because of the presence of one positive polarity word 'good' and one negative polarity word 'not'. But when the statement is analyzed using the bigram approach, it gives the polarity of the sentence as negative due to the presence of the words 'not good', which is correct. Therefore, when a higher level of n-gram is considered, the result is expected to be better. Thus, analyzing the research outcome of several authors, this study makes an attempt to extend the sentiment classification using unigram, bigram, trigram, and their combinations for classification of movie reviews.

ii. Also, a number of authors have used Part-of-Speech (POS) tags for classification purposes. But it is observed that the POS tag for a word is not fixed and changes as per the context of its use. For example, the word 'book' can have the POS 'noun' when used as reading material, whereas in the case of "ticket booking" the POS is verb. Thus, in order to avoid confusion, instead of using POS as a parameter for classification, the word as a whole may be considered for classification.

iii. Most of the machine learning algorithms work on data represented as a matrix of numbers. But the sentiment data are always in text format. Therefore, they need to be converted to a number matrix. Different authors have considered TF or TF-IDF to convert the text into a matrix of numbers. But in this paper, in order to convert the text data into a matrix of numbers, the combination of TF-IDF and CountVectorizer has been applied. Each row of the matrix of numbers represents a particular text file, whereas each column represents a word / feature present in that respective file, as shown in Table 3.

3. Methodology

Classification of sentiments may be categorized into two types, i.e., binary sentiment classification and multi-class sentiment classification (Tang, Tan, and Cheng, 2009). In binary classification, each document d_i in D, where D = {d_1, ..., d_n}, is classified as a label in C, where C = {Positive, Negative} is a predefined category set. In multi-class sentiment analysis, each document d_i is classified as a label in C*, where C* = {strong positive, positive, neutral, negative, strong negative}. It is observed in the literature survey that a good number of authors have applied the binary classification method for sentiment analysis.

The movie reviews provided by the reviewers are mainly in text format; but for classification of the sentiment of the reviews using machine learning algorithms, numerical matrices are required. Thus, the task of converting the text data in reviews into numerical matrices is carried out using different methods such as:

• CountVectorizer: It converts the text document collection into a matrix of integers (Garreta and Moncecchi, 2013). This method helps to generate a sparse matrix of the counts.
• Term frequency - Inverse document frequency (TF-IDF): It reflects the importance of a word in the corpus or the collection (Garreta and Moncecchi, 2013). The TF-IDF value increases with an increase in the frequency of a particular word in the document. In order to control the generality of more common words, the term frequency is offset by the frequency of the word in the corpus. Term frequency is the number of times a particular term appears in the text. Inverse document frequency measures the occurrence of any word in all documents.

In this paper, the combination of methods, i.e., CountVectorizer and TF-IDF, has been applied to transform the text document into a numerical vector, which is then considered as input to the supervised machine learning algorithm.

value in terms of exponential function can be expressed as

\[ P_{ME}(c \mid d) = \frac{1}{Z(d)} \exp\left( \sum_{i} \lambda_{i,c} \, f_{i,c}(d, c) \right) \qquad (2) \]

where P_{ME}(c|d) refers to the probability of document 'd' belonging to class 'c', f_{i,c}(d, c) is the feature / class function for feature f_i and class c, \lambda_{i,c} is the parameter to be estimated and Z(d) is the normalizing factor.

In order to use ME, a set of features needs to be selected. For text classification purposes, word counts are considered as features. The feature / class function can be instantiated as follows:

\[ f_{i,c}(d, c') = \begin{cases} 0 & \text{if } c \neq c' \\ \dfrac{N(d, i)}{N(d)} & \text{otherwise} \end{cases} \qquad (3) \]

where f_{i,c}(d, c') refers to features in the word-class combination in class 'c' and document 'd', N(d, i) represents the number of occurrences of feature 'i' in document 'd' and N(d) is the number of words in 'd'. As per the expression, if a word occurs frequently in a class, the weight of the word-class pair becomes higher in comparison to other pairs. These highest frequency word-class pairs are considered for classification purposes.

iii. Stochastic gradient descent (SGD) method: This method is used when the training data size is observed to be large. In the SGD method, instead of computing the gradient exactly, each iteration estimates the value of the gradient on the basis of a single randomly picked example, as considered by Bottou (2012).

\[ w_{t+1} = w_t - \gamma_t \nabla_w Q(z_t, w_t) \qquad (4) \]
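The update in Eq. (4) can be illustrated with a minimal sketch. It assumes a logistic loss for Q, a constant learning rate in place of \gamma_t, and a tiny hand-made dataset; none of these choices come from the paper, and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sgd_step(w, x, y, lr):
    """One update w <- w - lr * grad Q(z, w), with z = (x, y) and y in {0, 1}.

    Q is assumed to be the logistic loss, whose gradient is (p - y) * x.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]


def train_sgd(data, n_features, lr=0.5, steps=2000, seed=0):
    """Run SGD, estimating the gradient from one randomly picked example per step."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    for _ in range(steps):
        x, y = rng.choice(data)
        w = sgd_step(w, x, y, lr)
    return w


def predict(w, x):
    """Return 1 (positive) if the linear score is above zero, else 0 (negative)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


# Toy data (assumed): each vector is [count("good"), count("bad")].
toy_data = [
    ([2.0, 0.0], 1),
    ([1.0, 0.0], 1),
    ([0.0, 1.0], 0),
    ([0.0, 2.0], 0),
]
weights = train_sgd(toy_data, n_features=2)
```

Each step touches a single example, which is why the method remains practical when the training set is large; a decaying learning rate could be substituted for the constant one used here.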
Table 2
Confusion matrix.

         | Correct label: Positive | Correct label: Negative
Positive | TP (True positive)      | FP (False positive)
Negative | FN (False negative)     | TN (True negative)

Table 3
Matrix generated under CountVectorizer scheme.

good','a good one"' where a set of words having count equal to three is considered.

3.2. Performance evaluation parameters

The parameters used to evaluate the performance of a supervised machine learning algorithm are based on the elements of a matrix known as the confusion matrix or contingency table. It is used in supervised machine learning to help in assessing the performance of any algorithm. From the classification point of view, the terms "True Positive (TP)", "False Positive (FP)", "True Negative (TN)" and "False Negative (FN)" are used to compare the class labels in this matrix, as shown in Table 2 (Mouthami, Devi, and Bhaskaran, 2013). True Positive represents the number of reviews that are positive and are also classified as positive by the classifier, whereas False Positive indicates negative reviews that the classifier wrongly classifies as positive. Similarly, True Negative represents the reviews that are negative and are also classified as negative by the classifier, whereas False Negative indicates positive reviews that the classifier wrongly classifies as negative.

Based on the values obtained from the confusion matrix, other parameters such as "precision", "recall", "f-measure", and "accuracy" are found for evaluating the performance of any classifier.

• Precision: It measures the exactness of the classifier result. It is the ratio of the number of examples correctly labeled as positive to the total number of examples classified as positive.

\[ Precision = \frac{TP}{TP + FP} \qquad (6) \]

• Recall: It measures the completeness of the classifier result. It is the ratio of the number of examples correctly labeled as positive to the total number of examples which are truly positive.

\[ Recall = \frac{TP}{TP + FN} \qquad (7) \]

• F-Measure: It is the harmonic mean of precision and recall. It [...] recall, which have more influence on final result.

\[ F\text{-}Measure = \frac{2 \ast Precision \ast Recall}{Precision + Recall} \qquad (8) \]

• Accuracy: It is the most common measure of the classification process. It can be calculated as the ratio of correctly classified examples to the total number of examples.

\[ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (9) \]

3.3. Dataset used

In this paper, the acl Internet Movie Database (IMDb) dataset is considered for sentiment analysis (IMDb, 2011). It consists of 12,500 positively labeled test reviews and 12,500 positively labeled train reviews. Similarly, there are 12,500 negatively labeled test reviews and 12,500 negatively labeled train reviews. Apart from the labeled supervised data, an unsupervised dataset is also present with 50,000 unlabeled reviews.

4. Proposed approach

The reviews of the IMDb dataset are processed to remove the stop words and unwanted information from the dataset. The textual data is then transformed to a matrix of numbers using vectorization techniques. Further, training of the dataset is carried out using machine learning algorithms. The steps of the approach are shown in Fig. 1.

Fig. 1. (Diagram only partially recovered: Dataset; Preprocessing: stop word, numeric and special character removal.)

Step 1: The aclIMDb dataset, consisting of 12,500 positive and 12,500 negative reviews for training and also 12,500 positive and 12,500 negative reviews for testing (IMDb, 2011), is taken into consideration.
Step 2: The text reviews sometimes contain absurd data, which need to be removed before the reviews are considered for classification. The usually identified absurd data are:
• Stop words: They do not play any role in determining
• Numeric and special characters: In the text reviews, it is often observed that there are different numeric (1, 2, ..., 5 etc.) and special characters (@, #, $, % etc.) present, which do not have any effect on the analysis. But they often create confusion during conversion of the text file to a numeric vector.
Step 3: After the preprocessing of the text reviews, they need to be converted to a matrix of numeric vectors. The following methodologies are considered for conversion of text files to numeric vectors:
• CountVectorizer: It converts the text reviews into a matrix of token counts. It implements both tokenization and occurrence counting. The output matrix obtained after this process is a sparse matrix. This process of conversion may be explained using the following example:
• Calculation of CountVectorizer matrix: An example is considered to explain the steps of calculating the elements of the matrix (Garreta and Moncecchi, 2013), which helps in improving the understandability. Suppose three different documents containing the following sentences are taken for analysis:
Sentence 1: "Movie is nice".
Sentence 2: "Movie is Awful".
Sentence 3: "Movie is fine".
A matrix of size 4 ∗ 6 (counting the header row and the label column) may be formed, as there exist 3 documents and 5 distinct features. In the matrix given in Table 3, an element is assigned the value '1' if the feature is present; in case of the absence of the feature, the element is assigned the value '0'.
• TF-IDF: It suggests the importance of the word to the document and the whole corpus. Term frequency informs about the frequency of a word in a document and IDF informs about the frequency of the particular word in the whole corpus (Garreta and Moncecchi, 2013).
• Calculation of TF-IDF value: An example may be considered to improve understandability. Suppose a movie review contains 1000 words, wherein the word "Awesome" appears 10 times. The term frequency (i.e., TF) value for the word "Awesome" may be found as 10/1000 = 0.01. Again, suppose there are 1 million reviews in the corpus and the word "Awesome" appears 1000 times in the whole corpus. Then, the inverse document frequency (i.e., IDF) value is calculated as log10(1,000,000/1,000) = 3. Thus, the TF-IDF value is calculated as 0.01 ∗ 3 = 0.03.
Step 4: After the text reviews are converted to matrices of numbers, these matrices are considered as input for the following four different supervised machine learning algorithms for classification purposes.
• NB method: Using a probabilistic classifier and pattern learning, the set of documents is classified (McCallum et al., 1998).
• ME method: The training data are used to set constraints on the conditional distribution (Nigam et al., 1999). Each constraint is used to express characteristics of the training data. These constraints are then used for testing the data.
• SGD method: The SGD method is used when the training data size is large. Each iteration estimates the gradient on the basis of a single randomly picked example (Bottou, 2012).
• SVM method: Data are analyzed and decision boundaries are defined by hyperplanes. In the two-category case, the hyperplane separates the document vectors of one class from those of the other class, where the separation is kept as large as possible (Hsu et al., 2003).
Step 5: As mentioned in Step 1, the movie reviews of the acl IMDb dataset are considered for analysis, using the machine learning algorithms discussed in Step 4. Then different variations of the n-gram methods, i.e., unigram, bigram, trigram, unigram + bigram, unigram + trigram, and unigram + bigram + trigram, are applied to obtain the results, which are shown in Section 5.
Step 6: The results obtained from this analysis are compared with the results available in other literature, as shown in Section 6.

5. Implementation

• Application of NB method: The confusion matrix and various evaluation parameters such as precision, recall, f-measure, and accuracy values obtained after classification using NB n-gram techniques are shown in Table 4.
As shown in Table 4, the accuracy value obtained using bigram is better than the values obtained using techniques such as unigram and trigram. The NB method is a probabilistic method, where the features are assumed to be independent of each other. Hence, when analysis is carried out using "single word (unigram)" and "double word (bigram)", the accuracy value obtained is comparatively better than that obtained using trigram. But when "triple word (trigram)" is considered for analysis of features, words are repeated a number of times; thus, it affects the probability of the document. For example, for the statement "it is not a bad movie", the trigrams "it is not" and "is not a" show negative polarity, whereas the sentence represents positive sentiment. Thus, the accuracy of classification decreases. Again, when the trigram model is combined with unigram or bigram or unigram + bigram, the impact of the trigram makes the accuracy value comparatively low.
• Application of ME method: The confusion matrix and evaluation parameters such as precision, recall, f-measure, and accuracy values obtained after classification using ME n-gram techniques are shown in Table 5.
As represented in Table 5, the accuracy value obtained using unigram is better than that of bigram and trigram. As the ME algorithm is based on the conditional distribution and word-class pairs help to classify the review, the unigram method, which considers a single word for analysis, provides the best result in comparison with the other methods. In both bigram and trigram methods, a negative or positive polarity word appears more than once, thus affecting the classification result. When the bigram and trigram methods are combined with unigram and between themselves, the accuracy values of the various combinations are observed to be low.
• Application of SVM method: The confusion matrix and evaluation parameters such as precision, recall, f-measure, and accuracy values obtained after classification using SVM n-gram techniques are shown in Table 6.
As exhibited in Table 6, the accuracy value obtained using unigram is better than the values obtained using bigram and trigram. As the SVM method is a non-probabilistic linear classifier and trains a model to find a hyperplane in order to separate the dataset, the unigram model, which analyzes single words, gives a better result. In bigram and trigram, there exist multiple word combinations, which, when plotted in a particular hyperplane, confuse the classifier and thus provide a less accurate result in comparison with the value obtained using unigram. Thus, the less accurate bigram and trigram, when combined with unigram and with each other, also provide a less accurate result.
• Application of SGD method: The confusion matrix and evaluation parameters such as precision, recall, f-measure, and accuracy values obtained after classification using SGD n-gram techniques are shown in Table 7.
As illustrated in Table 7, the accuracy obtained using unigram is better than that of bigram and trigram. In the SGD method, the gradient is estimated on single randomly picked reviews using a learning rate to minimize the risk. In unigram, a single word is randomly picked for analysis, but in bigram and trigram the combination of words adds noise, which reduces the value of accuracy. Thus, when the bigram and trigram models are combined with other models, their lower accuracy affects the accuracy of the total system.

Table 4
Confusion matrix, evaluation parameter and accuracy for naive bayes n-gram classifier.
Table 5
Confusion matrix, evaluation parameter and accuracy for maximum entropy n-gram classifier.

6. Performance evaluation

The comparative analysis of the results obtained using the proposed approach and those of other literature using the IMDb dataset and n-gram approaches is shown in Table 8.

Pang et al. have used machine learning algorithms, viz., NB, ME, and SVM, with the n-gram approaches of unigram, bigram and the combination of unigram and bigram. Salvetti et al. and Beineke et al. have implemented the NB method for classification, but only the unigram approach is used. Mullen and Collier have proposed the SVM method for classification, with the unigram approach only. Matsumoto et al. have also implemented SVM for classification and used unigram, bigram, and the combination of both for classification.
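The n-gram configurations discussed in this paper (unigram, bigram, trigram, and their combinations) can be made concrete with a small sketch. It is not the authors' implementation: tokenization is assumed to be lowercase whitespace splitting, the matrix is dense rather than sparse, and the helper names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values):
    """Count n-grams of every order in n_values, e.g. (1, 2) for unigram + bigram."""
    tokens = text.lower().split()  # assumed tokenizer: lowercase, split on spaces
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return counts


def count_matrix(documents, n_values):
    """Build a document-by-feature count matrix over a sorted vocabulary."""
    per_doc = [ngram_features(doc, n_values) for doc in documents]
    vocabulary = sorted(set().union(*per_doc))
    matrix = [[counts[term] for term in vocabulary] for counts in per_doc]
    return vocabulary, matrix


# The three example sentences from the CountVectorizer illustration.
docs = ["Movie is nice", "Movie is awful", "Movie is fine"]
vocab, matrix = count_matrix(docs, n_values=(1,))

# Trigrams of the statement used in the NB discussion.
trigrams = ngrams("it is not a bad movie".split(), 3)
```

With `n_values=(1, 2)` the comment "The item is not good" gains the feature "not good", which is the bigram that the unigram representation cannot express.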
Table 6
Confusion matrix, evaluation parameter and accuracy for support vector machine n-gram classifier.
Table 7
Confusion matrix, Evaluation parameter and accuracy for stochastic gradient descent n-gram classifier.
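The evaluation parameters reported alongside each confusion matrix follow Eqs. (6)-(9). As a rough illustration, the sketch below computes them from the four confusion-matrix counts; the counts passed in are made-up values, not results from the paper, and the function name is hypothetical.

```python
def evaluation_parameters(tp, fp, fn, tn):
    """Compute precision, recall, f-measure and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts chosen only to exercise the formulas.
scores = evaluation_parameters(tp=40, fp=10, fn=20, tn=30)
```

The zero-denominator guards are a design choice of this sketch, covering the degenerate case where a classifier never predicts the positive class.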
In the present paper, four different algorithms, viz. NB, ME, SVM, and SGD, are applied using n-gram approaches, namely unigram, bigram, trigram, unigram+bigram, bigram+trigram, and unigram+bigram+trigram. The result obtained with the present approach is observed to be better than the results available in the literature where both the IMDb dataset and the n-gram approach are used.

Table 8
Comparative result of values on "Accuracy" obtained with different literature using IMDb dataset and n-gram approach.
Method Pang et al. Salvetti et al. Beineke et al. Mullen & Collier Matsumoto et al. Proposed approach
indicate that the algorithm is not considered by the author in their respective paper

6.1. Managerial insights based on result

The managerial insights based on the obtained result can be explained as follows:

• It was a common practice for sellers to send questionnaires to customers asking for feedback on the product they had bought. Nowadays, however, people share those views through reviews or blogs.
• The reviews can be collected and given as input to the proposed approach for qualitative decisions.
• The proposed approach classifies the reviews into either positive or negative polarity; hence, it is able to guide managers properly by informing them about the shortcomings or good features of the product which they need to incorporate to sustain the market competition.

7. Conclusion and future work

This paper makes an attempt to classify movie reviews using various supervised machine learning algorithms, such as Naive Bayes (NB), Maximum Entropy (ME), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM). These algorithms are further applied using the n-gram approach on the IMDb dataset. It is observed that as the value of 'n' in n-gram increases, the classification accuracy decreases, i.e., for unigram and bigram, the result obtained using the algorithms is remarkably better; but when trigram, four-gram, and five-gram classification are carried out, the accuracy decreases.

As discussed in Section 2.2, instead of using unigram and POS tag, the use of unigram, bigram, trigram, and their combinations has shown a better result. Again, the use of TF-IDF and CountVectorizer techniques in combination for converting the text into a matrix of numbers also helps to improve the accuracy when machine learning techniques are used.

The present study also has some limitations, as mentioned below:

• The Twitter comments are mostly small in size. Thus, the proposed approach may have some issues while considering these reviews.
• Different reviews or comments contain symbols which help in presenting the sentiment, but these, being images, are not taken into consideration in this study for analysis.
• In order to stress a word, it is observed that some persons often repeat the last character of the word a number of times, such as "greatttt, Fineee". These words do not have a proper meaning, but they may be considered and further processed to identify sentiment. However, this aspect is also not considered in this paper.

In this paper, after removal of stop words, the other words are considered for classification. The list of words finally obtained is observed to be very large in a good number of cases; thus, in future, different feature selection mechanisms may be identified to select the best features from the set of features, based on which the classification process may be carried out. It may also happen that the accuracy improves if some of the hybrid machine learning techniques are considered for classification of the sentiment. All of the above-mentioned limitations may be considered for future work, in order to improve the quality of sentiment classification.

References

Beineke, P., Hastie, T., & Vaithyanathan, S. (2004). The sentimental factor: improving review classification via human-provided information. In Proceedings of the 42nd annual meeting on association for computational linguistics (p. 263). Association for Computational Linguistics.
Bottou, L. (2012). Stochastic gradient descent tricks. In Neural networks: tricks of the trade (pp. 421–436). Springer.
Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In Proceedings of the 12th international conference on World Wide Web (pp. 519–528). ACM.
Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2), 124.
Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 82–89.
Garreta, R., & Moncecchi, G. (2013). Learning scikit-learn: machine learning in python. Berlin Heidelberg: Packt Publishing Ltd.
Gautam, G., & Yadav, D. (2014). Sentiment analysis of twitter data using machine learning approaches and semantic analysis. In Contemporary computing (IC3), 2014 seventh international conference on (pp. 437–442). IEEE.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Unsupervised learning. New York: Springer.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification. Simon Fraser University, 8888 University Drive, Burnaby BC, Canada, V5A 1S6.
IMDb, Internet movie database sentiment analysis dataset (IMDB), 2011,
Joachims, T. (2006). Training linear svms in linear time. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 217–226). ACM.
Liu, S. M., & Chen, J.-H. (2015). A multi-label classification based approach for sentiment classification. Expert Systems with Applications, 42(3), 1083–1093.
Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.
Luo, B., Zeng, J., & Duan, J. (2016). Emotion space model for classifying opinions in stock message board. Expert Systems with Applications, 44, 138–146.
Matsumoto, S., Takamura, H., & Okumura, M. (2005). Sentiment classification using word sub-sequences and dependency sub-trees. In Advances in knowledge discovery and data mining (pp. 301–311). Berlin Heidelberg: Springer.
McCallum, A., Nigam, K., et al. (1998). A comparison of event models for naive bayes text classification. In AAAI-98 workshop on learning for text categorization: 752 (pp. 41–48). Citeseer.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mouthami, K., Devi, K. N., & Bhaskaran, V. M. (2013). Sentiment analysis and classification based on textual reviews. In Information communication and embedded systems (ICICES), 2013 international conference on (pp. 271–276). IEEE.
Mullen, T., & Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources. In EMNLP: 4 (pp. 412–418).
Nigam, K., Lafferty, J., & McCallum, A. (1999). Using maximum entropy for text classification. In IJCAI-99 workshop on machine learning for information filtering: 1 (pp. 61–67).
Niu, T., Zhu, S., Pang, L., & El Saddik, A. (2016). Sentiment analysis on multi-view social data. In Multimedia modeling (pp. 15–27). Springer.
Pang, B., & Lee, L. (2004). A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on association for computational linguistics (p. 271). Association for Computational Linguistics.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79–86). Association for Computational Linguistics.
Salvetti, F., Lewis, S., & Reichenbach, C. (2004). Automatic opinion polarity classification of movie. Colorado research in linguistics, 17, 2.
Tang, H., Tan, S., & Cheng, X. (2009). A survey on sentiment detection of reviews. Expert Systems with Applications, 36(7), 10760–10773.
Turney, P. D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 417–424). Association for Computational Linguistics.
Zhang, D., Xu, H., Su, Z., & Xu, Y. (2015). Chinese comments sentiment classification based on word2vec and svm perf. Expert Systems with Applications, 42(4), 1857–1863.
Zhang, M.-L., & Zhou, Z.-H. (2007). Ml-knn: a lazy learning approach to multi-label learning. Pattern recognition, 40(7), 2038–2048.