(2018) Cross-Domain Aspect Extraction For Sentiment Analysis
PII: S0167-9236(18)30138-6
DOI: 10.1016/j.dss.2018.08.009
Reference: DECSUP 12983
To appear in: Decision Support Systems
Received date: 10 April 2018
Revised date: 27 July 2018
Accepted date: 21 August 2018
Please cite this article as: Ricardo Marcondes Marcacini, Rafael Geraldeli Rossi, Ivone
Penque Matsuno, Solange Oliveira Rezende , Cross-domain aspect extraction for
sentiment analysis: a transductive learning approach. Decsup (2018), doi:10.1016/
j.dss.2018.08.009
This is a PDF file of an unedited manuscript that has been accepted for publication. As
a service to our customers we are providing this early version of the manuscript. The
manuscript will undergo copyediting, typesetting, and review of the resulting proof before
it is published in its final form. Please note that during the production process errors may
be discovered which could affect the content, and all legal disclaimers that apply to the
journal pertain.
ACCEPTED MANUSCRIPT
a Institute of Mathematics and Computer Sciences (ICMC), University of São Paulo (USP), Av. Trabalhador São Carlense, 400, 13566-590, São Carlos, SP, Brazil
b Federal University of Mato Grosso do Sul (UFMS), Av. Ranulpho Marques Leal, 3484, 79613-000, Três Lagoas, MS, Brazil
Abstract
support the aspect extraction of another domain where there are no labeled aspects. Existing cross-domain transfer learning approaches learn classifiers from labeled aspects in the source domain and then apply these classifiers in the target domain, i.e., two separate stages that may cause inconsistency due
∗ Corresponding author
Email addresses: [email protected] (Ricardo Marcondes Marcacini),
[email protected] (Rafael Geraldeli Rossi), [email protected] (Ivone Penque
Matsuno), [email protected] (Solange Oliveira Rezende)
extraction from heterogeneous networks, where the linguistic features are used
as a bridge for this propagation. Our algorithm is based on a transductive
learning process, where we explore both labeled and unlabeled aspects during
the label propagation. Experimental results show that CD-ALPHN outperforms state-of-the-art methods in scenarios where there is a high level of inconsistency between the source and target domains — the most common scenario in real-world applications.
Keywords: Cross-Domain, Opinion Mining, Aspect Extraction
1. Introduction
Opinion Mining and Sentiment Analysis have become very popular for automating knowledge extraction from user reviews about products and services (Liu, 2012; Cambria, 2016; Breck & Cardie, 2017). In general, the goal is to determine the consumer's opinion on some topic through the sentiment polarity expressed in textual reviews (e.g., positive, negative, or neutral) (Liu & Zhang, 2012). While the first studies on sentiment analysis attempted to extract the polarity from the entire text review (document-level sentiment analysis) or from the text sentences (sentence-level sentiment analysis), more recent studies investigate aspect-based sentiment analysis (ABSA) (Feldman, 2013; Lau et al., 2014; Schouten & Frasincar, 2016; Akhtar et al., 2017). In this case, consumer reviews are analyzed at a high level of detail, where the opinion about each aspect is identified. This task is more complex (Mukherjee & Liu, 2012; Wang et al., 2014; Matsuno et al., 2017) because we need to deal with a challenging question during the sentiment analysis process: how to automatically extract the aspects of each product or service from consumer reviews?
Existing works for aspect extraction are based on linguistic feature patterns (Rana & Cheah, 2016), where natural language processing techniques are used to identify grammatical classes and syntactic relations of the text reviews. In this case, a set of previously labeled aspects (training set) is used in a supervised inductive learning task, which induces a classification model considering only labeled examples. Next, aspects are extracted from the new text reviews according to the linguistic features presented to the classifier. For example, a supervised rule-based classifier that learned the pattern “IF a word h is a noun AND h is in a relationship with a verb t THEN h is extracted as an aspect” is able to identify the aspect “camera” from the review presented in Figure 1, where h=“camera” and t=“is”. The relationship between h and t is defined by the adjective “nice”.
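As a minimal sketch of this rule (hypothetical toy code, not the paper's actual classifier), the dependency parse can be represented as triples and the noun subject of a copular predicate extracted as an aspect:

```python
# Toy sketch of the rule-based extraction described above (assumption:
# parse is hardcoded; tags and relations follow the Figure 1 example).

def extract_aspects(deps):
    """deps: dependency triples ((head, head_tag), relation, (dep, dep_tag))."""
    nsubj_of = {}  # predicate word -> its noun subject (h)
    cop_verb = {}  # predicate word -> its copular verb (t)
    for (head, _htag), rel, (dep, dtag) in deps:
        if rel == "nsubj" and dtag.startswith("NN"):
            nsubj_of[head] = dep
        elif rel == "cop" and dtag.startswith("VB"):
            cop_verb[head] = dep
    # h (noun) is linked to t (verb) through the shared predicate
    # (the adjective "nice" in the example), so h is extracted as an aspect
    return {nsubj_of[p] for p in nsubj_of if p in cop_verb}

# "The camera is nice": det(camera, The), nsubj(nice, camera), cop(nice, is)
deps = [
    (("camera", "NN"), "det", ("The", "DT")),
    (("nice", "JJ"), "nsubj", ("camera", "NN")),
    (("nice", "JJ"), "cop", ("is", "VBZ")),
]
print(extract_aspects(deps))  # -> {'camera'}
```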
Figure 1: Part-of-Speech (PoS) and Syntactic Relations extracted from the sentence “The
camera is nice”.
1999). Transductive learning directly classifies unlabeled data and makes use of this known unlabeled data to improve classification performance (Kong et al., 2013).
Another challenge related to the aspect extraction for sentiment analysis is
to deal with specific characteristics and limitations of the application domain
(Duric & Song, 2012; Deng et al., 2017). For example, reviews collected from social networks may contain short texts with informal language (e.g. neologisms,
slang, and jargon terms), while specialized forums may contain more technical
language and longer texts. Therefore, an important research question is how
can we effectively exploit the already labeled aspects in some domain to aid the aspect extraction in another domain? Studies addressing this question are known as Cross-Domain Transfer Learning and have been reported as a promising solution to problems where frequent data labeling is impossible or expensive (Pan & Yang, 2010; Schouten & Frasincar, 2016; Zhang et al., 2016; Rana & Cheah, 2017).
The problems mentioned above can be attenuated with the use of transductive semi-supervised learning. While supervised inductive learning obtains a
classification model to classify unknown examples with possibly different feature
space and distributions, transductive semi-supervised learning already knows
the predictive space and can make use of it to improve the classification performance. Besides, when cross-domain transfer learning is applied, usually the examples to be classified are already collected. Thus, transductive semi-supervised
learning is more adequate for this type of situation. However, it is necessary to
deal with two important research questions for transductive learning in the transfer learning scenario: (1) How to properly structure information from different domains into a unified representation? (2) How to effectively exploit this unified representation to transfer knowledge from the source domain to the target domain? These questions motivated us to investigate and propose a specific
While existing cross-domain transfer learning approaches learn classifiers from labeled aspects in the source domain and then
apply these classifiers in the target domain, i.e., two separate stages that may
cause inconsistency due to different feature spaces or different marginal probability distributions, we propose a unified transductive learning process from
both source and target feature spaces. The main contributions are two-fold:
• We propose a heterogeneous network representation to structure knowledge from different domains for cross-domain transfer learning.
• We present a transductive learning algorithm for heterogeneous networks,
where labeled and unlabeled nodes are exploited in a label propagation process. In this case, the source domain label information (labeled aspects) is propagated to the linguistic feature nodes. Next, linguistic feature labels are propagated to the target domain (aspect candidate nodes) — which then propagates label information back to the network.
This propagation process continues until convergence, i.e., when the label information of the network nodes no longer changes significantly. Thus,
(SVM). Experimental evaluation results show that our approach is very competitive, outperforming MLP and SVM methods in scenarios where there is a high level of inconsistency between the source and target domains — the most
2. Related Work
Cross-Domain Transfer Learning aims to utilize labeled data from other domains to help the current learning task (Pan & Yang, 2010). The domain with labeled data is called the source domain, while the domain without labeled data is called the target domain. In real-world applications, different domains may have different feature spaces as well as different underlying data distributions — thereby requiring appropriate learning approaches to address these drawbacks (Long et al., 2014a).
Most of the existing studies for cross-domain transfer learning are known as inductive transfer learning (Pan & Yang, 2010; Lu et al., 2015). In this case, the goal is to identify features that are useful in both the target domain and the source domain, thereby obtaining a shared feature space for the cross-domain transfer learning problem. In some studies, the best features for both source and target domains are selected and reweighted to improve the final classification (Chen et al., 2014; Lu et al., 2015). Inductive transfer learning is also presented as a two-stage cross-domain transfer learning (Wu & Tan, 2011; Rana & Cheah, 2017). In the first stage, a set of features that are common to both domains is extracted, and in the second stage, useful features specific to the target domain are selected to learn a classifier (Wu & Tan, 2011; Rana & Cheah, 2017).
Different strategies to learn classifier models have been proposed for cross-domain transfer learning (Lu et al., 2015). The most common strategy (with promising results) is to train well-known classifiers such as Multilayer Perceptron, SVM, Naive-Bayes, and kNN, considering the feature space that is common to both domains. Other strategies are based on the consensus of several classifiers with different data sampling from both the source and target domain (Luo
et al., 2008; Zhuang et al., 2010). Moreover, there are strategies that use active learning to improve the transfer learning process, i.e., they require human feedback when a set of classifiers disagree about the class of some instance in the target domain (Li et al., 2013; Wu et al., 2017).
The vast majority of existing studies focus primarily on the transfer of sentiment polarity between different domains, while the use of cross-domain transfer learning for aspect extraction is underexplored (Al-Moslmi et al., 2017). Some recent initiatives explore linguistic features extracted from the aspects to obtain a common feature space between the two domains (Zhang et al., 2016; Rana & Cheah, 2017). However, these studies also use the two-stage inductive transfer learning approach, which works well only under the following assumption: the source and target domain data are drawn from the same feature space and the same distribution (Pan & Yang, 2010). In other words, inductive learning approaches will fail when there is a high level of inconsistency between the source and target domains — which is the most common case in real-world applications (Pan & Yang, 2010; Long et al., 2015).
more satisfactory results (Rohrbach et al., 2013; Long et al., 2014b; Chang et al., 2017). Thus, graph nodes represent examples of both the source and target domains and the edges represent the relations between examples. Some nodes are labeled, and the learning process involves identifying the labels of unlabeled nodes according to the topological properties of the graph. Both the graph construction and the graph-based learning algorithm are challenging tasks, especially in relation to the scalability of graph-based transductive learning (Ryan & Michailidis, 2017). For example, a common technique for graph building is the nearest neighbor-based algorithm, which has quadratic time and space complexity.
Transductive learning is a promising approach that explores both labeled and unlabeled data (e.g. source and target domains) during the training process. Transductive learning assigns weights or relevance scores to examples for each one of the classes, and the examples are classified considering these weights.
It is worth mentioning that transductive learning has been shown to be helpful for cross-domain transfer learning problems (Wu & Tan, 2011; Al-Moslmi et al., 2017; Chang et al., 2017; Ryan & Michailidis, 2017). This observation has motivated us to employ related approaches for aspect extraction in sentiment analysis, which in this particular setting is a challenge not addressed in the literature.
3. Cross-Domain Aspect Label Propagation through Heterogeneous Networks
sentation between the source and target domains. In the second step, we discuss
our proposed algorithm for transductive learning, which uses label propagation
a heterogeneous network, i.e., a network composed of different types of nodes and relations. Formally, our proposed network-based representation is defined by G = (V, E, W), where V represents the set of nodes, E represents the edges connecting the nodes, and W represents the weights of the edges. The set V is composed of aspects from the source domain (A_S), aspects from the target domain (A_T), and linguistic features L extracted by a Part-of-Speech (PoS) process, as shown in Figure 1. Thus, V = {A_S ∪ A_T ∪ L}.
Figure 2 illustrates the general scheme of the heterogeneous network proposed in this work. The set of edges E connects the linguistic features L to the nodes in sets A_S and A_T. Thus, if two aspects participate in the same syntactic relationship, then the two nodes that represent these aspects will be connected to the node representing the linguistic feature of this syntactic relation. The words of an aspect usually appear in many text reviews, and thus the set W indicates the weight of the edges — calculated by the frequency of occurrence between the (candidate) aspect nodes and the linguistic feature nodes. Linguistic features are common to both source and target domains and can be
Figure 2: A general scheme of the heterogeneous network proposed for a unified representation of the feature spaces between the source domain (labeled aspect nodes) and the target domain (candidate aspect nodes), in which linguistic features are used as bridge nodes.
domain. Rectangular forms represent the terms, and a circular form represents the linguistic features. Linguistic features are composed of the PoS tag, type of
(noun - NN) contains the input relation nsubj; and the term nice (adjective - JJ) contains the output relation nsubj. Thus, two nodes with their respective linguistic features are generated: NN:nsubj:in and JJ:nsubj:out. Also in the same example, we can notice that the weight of the edge between the term the and the linguistic feature DT:det:in is 2, since the term the occurs two times in Sentence #1 and both times belongs to the relation DT:det:in.
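This construction can be sketched as follows (a toy implementation under our reading of the text; the dependency triples and feature names are illustrative, with the dependent side producing an input relation and the head side an output relation):

```python
from collections import defaultdict

def build_edges(parsed_sentences):
    """Each sentence is a list of dependency triples
    ((head, head_tag), relation, (dep, dep_tag)).
    Returns edge weights {(term, linguistic_feature): frequency}."""
    weights = defaultdict(int)
    for deps in parsed_sentences:
        for (head, htag), rel, (dep, dtag) in deps:
            # dependent has an input (incoming) relation,
            # head has an output (outgoing) relation
            weights[(dep, f"{dtag}:{rel}:in")] += 1
            weights[(head, f"{htag}:{rel}:out")] += 1
    return dict(weights)

# "The camera is nice": det(camera, The), nsubj(nice, camera), cop(nice, is)
sentence = [
    (("camera", "NN"), "det", ("The", "DT")),
    (("nice", "JJ"), "nsubj", ("camera", "NN")),
    (("nice", "JJ"), "cop", ("is", "VBZ")),
]
edges = build_edges([sentence])
# e.g. edges[("The", "DT:det:in")] == 1 and edges[("camera", "NN:nsubj:in")] == 1
```

Repeated occurrences of the same term in the same relation simply increment the edge weight, which matches the frequency-based W described above.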
In Table 2 we present four sentences about wine reviews that we consider as the target domain in order to build a complete network considering aspects and non-aspects from the source domain, aspect candidates from the target domain,
Table 1: Examples of the network generation for each review about computers (source domain). [Recoverable rows: Sentence 1, “the battery of computer is nice”, and Sentence 2, “the video resolution is low”, each with their dependency-derived linguistic feature nodes (e.g. DT:det:in, NN:nsubj:in, VBZ:cop:in, JJ:nsubj:out) and edge weights.]
and linguistic features from both domains. The complete network with labeled aspects, aspect candidates, and linguistic features is presented in Figure 3.
Figure 3: Illustration of the proposed network for cross-domain aspect extraction considering the sentences presented in Tables 1 and 2. The network has three layers: labeled aspects from the source domain (computers), linguistic features from both the source and target domains, and aspect candidates from the target domain (wines).
Table 2: Examples of the network generation for each review about wines (target domain). [Recoverable rows include “the palate is overly expressive” and “The tomatoey acidity is tasteful”, each with their dependency-derived linguistic feature nodes and edge weights.]
In summary, the heterogeneous network contains (1) nodes with aspects labeled as “YES” and “NO” for source domain data, (2) nodes with aspect candidates that are terms (e.g. noun, verb, adjective, and adverb) extracted from the target domain, and (3) nodes with linguistic features extracted from the text reviews of both domains — which we use as a bridge to propagate the labels of the source domain
to the nodes of the target domain. Moreover, all nodes in the network contain a vector of label information.
Thus, even nodes with linguistic features will receive a vector of label information, although these nodes are only used as a bridge to transfer such information to the target domain.
We present a transductive learning algorithm based on graph regularization to propagate label information from the source domain to the target domain, where the linguistic features are used as a bridge for this propagation. Our graph regularization algorithm was inspired by the successful use of a similar algorithm introduced in Zhu et al. (2003) and has the basic premise that nodes that share the same neighbor nodes tend to be classified with the same label.
The regularization function to be minimized, representing the aspect label propagation, is presented in Equation 1, where a_s is a node from the set A_S representing labeled aspects of the source domain, a_t is a node from the set A_T representing candidate aspects of the target domain, and l is a node from the set L representing linguistic features. The edge weights between linguistic features and aspects are represented by w_{a_s,l} (for labeled aspects) and w_{a_t,l} (for candidate aspects). The matrix W stores the edge weights of each network node.
Each node of the network has a vector of labels f, which represents the pertinence level of the node to each one of the classes c_l ∈ C. In the aspect extraction scenario, we have C = {Aspect = Yes, Aspect = No}. The matrix F summarizes the pertinence levels of all the nodes and represents the solution of the regularization function. In the case of labeled nodes, there is also a real label vector y. Such a vector has the value 1 in the position corresponding to the class and 0 in the others. In our proposal, only nodes in the set A_S will be previously
labeled and therefore have a y vector. The matrix Y stores all the y vectors of the labeled aspects from the source domain.
Q(F) = \frac{1}{2} \sum_{a_s \in A_S} \sum_{l \in L} w_{a_s,l} \, (f_{a_s} - f_l)^2 + \frac{1}{2} \sum_{a_t \in A_T} \sum_{l \in L} w_{a_t,l} \, (f_{a_t} - f_l)^2 + \lim_{\mu \to \infty} \mu \sum_{a_s \in A_S} (f_{a_s} - y_{a_s})^2    (1)
While the first two terms of Equation 1 determine that nearby nodes share similar label information, the last term determines that the f information of the labeled aspects is not modified during the label propagation process. This means that the node information of the labeled aspects must always be close to the true label information of the aspect nodes from the source domain (y).
The minimization of the regularization function in Equation 1 can be obtained via an iterative algorithm, since the use of solvers for quadratic programming does not scale well. In the iterative algorithm, the function is minimized by making the label vector of an object o_i (f_{o_i}) equal to the harmonic average of the label vectors of the neighboring objects. Thus, this step of the
In this case, the values of the submatrices P_{A_S A_S}, P_{A_S A_T}, P_{A_T A_S}, P_{A_T A_T}, and P_{LL} are 0, since there are no relations among objects of the same type. Hence, CD-ALPHN performs the transductive classification as presented in Algorithm 1, in which the input is composed of the node sets A_S, A_T, and L of the heterogeneous network, a matrix Y_{A_S} with the real label information of the aspects from the source domain, a matrix W of edge weights, and a diagonal matrix D with the degree (sum of the edge weights) of each node, i.e., d_{v_i,v_i} = \sum_{v_j \in V} w_{v_i,v_j}.
These steps are repeated until the stopping criteria are reached. We adopted as stopping criteria a maximum number of iterations and a minimum mean squared difference between the values of matrix F in consecutive iterations. Both stopping criteria are possible since the differences between the values of the matrix F in consecutive iterations will decrease, i.e., the values of the matrix F will converge, as shown in the following paragraphs.
F_L^{(n)} = \sum_{i=0}^{n-1} (P_{L A_T} P_{A_T L})^i \, P_{L A_S} Y_{A_S} + (P_{L A_T} P_{A_T L})^{n-1} P_{L A_T} F_{A_T}^{(0)}    (3)

F_{A_T}^{(n)} = \sum_{i=0}^{n-1} (P_{A_T L} P_{L A_T})^i \, P_{A_T L} P_{L A_S} Y_{A_S} + (P_{A_T L} P_{L A_T})^n F_{A_T}^{(0)}    (4)
Since each row of the matrix P is row-normalized, i.e., the sum of the values of each row is 1, the sum of the values of a row of the resulting matrix (P_{A_T L} P_{L A_T}) or (P_{L A_T} P_{A_T L}) will always be less than 1. Thus, there exists a
Algorithm 1: Cross-Domain Aspect Label Propagation through Heterogeneous Network (CD-ALPHN)
Input: A_S, A_T, L, Y, W, D
1 begin
2   P ← (D^{-1}) · W
3   repeat
4     foreach A_S, A_T, L ⊂ V do
5       F_{A_S} ← Y_{A_S} ;  /* Setting the f vectors of the labeled aspects from the source domain equal to the real labels */
6       F_L ← P_{L,A_S} · F_{A_S} + P_{L,A_T} · F_{A_T} ;  /* Propagating labels to the linguistic feature (bridge) nodes */
7       F_{A_T} ← P_{A_T,L} · F_L ;  /* Propagating labels to the candidate aspects of the target domain */
8     end
9   until stopping criteria
10 end
11 return F_{A_T}
γ such that

\sum_{j=1}^{|A_T|} (P_{A_T L} P_{L A_T})[i,j] \le \gamma < 1 ,

and, for any power n,

\sum_{j=1}^{|A_T|} \left( (P_{A_T L} P_{L A_T})^n \right)[i,j] \le \gamma^n .    (5)
For n → ∞,

\lim_{n \to \infty} (P_{A_T L} P_{L A_T})^n = 0 ,    (6)
class(a_i) = \arg\max_{c_l \in C} \; \Pr(c_l) \cdot \frac{f_{a_i,c_l}}{\sum_{a_j \in A} f_{a_j,c_l}}    (7)
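A sketch of this decision rule (hypothetical helper; the class priors Pr(c_l) are supplied by the caller, and the toy pertinence values are illustrative):

```python
def classify(F, priors, classes):
    """Arg-max over prior-weighted, column-normalized pertinence scores."""
    # denominator: total pertinence per class over all candidate nodes
    totals = {c: sum(F[a][c] for a in F) or 1.0 for c in classes}
    return {a: max(classes, key=lambda c: priors[c] * F[a][c] / totals[c])
            for a in F}

# Toy pertinence vectors after propagation (hypothetical values).
F = {"palate": {"YES": 0.9, "NO": 0.1}, "good": {"YES": 0.2, "NO": 0.8}}
labels = classify(F, {"YES": 0.5, "NO": 0.5}, ["YES", "NO"])
# -> {'palate': 'YES', 'good': 'NO'}
```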
presented in Figure 3. Through this illustration, we can notice that some features that do not occur in the source domain will be useful to classify aspect candidates in the target domain. For instance, neither aspect in the source domain is linked to the linguistic node NN:compound:out. However, some nodes in the target domain, acidity and palate, propagate their labels to the node NN:compound:out, which posteriorly propagates its label to the node aroma.
We highlight that the node aroma would not be classified as an aspect in this example if only the linguistic nodes that appear in the source domain were considered. The same occurs for the target domain nodes pleasant and good. These nodes do not have any relation to the linguistic nodes of the source domain, and they were classified through the label propagation from the target node tasteful to some linguistic node connected to pleasant and good in the target domain.
Figure 4: Illustration of the label propagation through the CD-ALPHN algorithm, considering the network presented in Figure 3.
4. Experimental Evaluation
4.1. Datasets
We used seven text review benchmark datasets that are widely cited in the aspect-based sentiment analysis literature. Table 3 presents an overview of the datasets, with the number of reviews (#Reviews), the total number of aspects (#Aspects), the total number of unique terms of the dataset after the text preprocessing (#TotalTerms), and the average number of terms per document (#AvgTerms). Datasets D1 to D5 were obtained from Hu & Liu (2004) and datasets D6 and D7 were obtained from Pontiki et al. (2014).
Table 3: Description of the text review datasets used in the experimental evaluation.
ID Dataset #Reviews #Aspects #TotalTerms #AvgTerms
All texts were preprocessed using the Stanford CoreNLP (Manning et al.,
2014) natural language processing tool to extract the linguistic features.
to classify aspects in the target domain. We used five popular classifiers and selected the best configurations considering the following parameters:

• kNN (Instance-based classifier): we analyzed the values {1, 3, 5, 7, 9, 11, 13, 15} for the number of nearest neighbors (k parameter), with cosine similarity.

• MLP (Multilayer Perceptron classifier): we analyzed the values {8, 16, 32} for the number of neurons in the hidden layer, with the sigmoid function as the activation function.

• NB (Naive-Bayes classifier): this classifier is parameter free.

• SVM (Support Vector Machine classifier): we analyzed the values {10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}} for the complexity parameter C, with a polynomial kernel.
The cross-domain experiments were performed as follows: given a dataset to represent the target domain, all other datasets are combined to compose the source domain. For example, if dataset D7 represents the target domain, then the source domain is represented by the labeled aspects (and their linguistic features) extracted from datasets D1, D2, D3, D4, D5, and D6. Thus, the proposed experimental evaluation is similar to a real-world scenario where labeled aspects from different source domains can be used to extract aspects of a new target domain.
Our experimental setup also considers the level of inconsistency between the
source and target domains. In this case, we consider the number of aspects of
the target domain (A_T) that occur in the source domain (A_S) to compute the level of inconsistency (β parameter), as defined in Equation 8.
\beta = 100 \times \frac{|A_T \cap A_S|}{|A_T|}    (8)
When the cross-domain transfer learning process is performed with a low level of inconsistency, the target domain feature space is well represented in the source domain (there is a significant amount of shared aspects between both domains). For example, we have a low level of inconsistency when the source domain contains aspects about some Digital Camera models (dataset D1) and the target domain contains aspects about other Digital Camera models (dataset D2). Although they are different models of Digital Camera, the aspects of this type of product are very similar. On the other hand, when the target domain consists of a completely new product or service with different aspects, then we have a high level of inconsistency.
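Computing β is straightforward (toy example with hypothetical aspect sets):

```python
def inconsistency_beta(target_aspects, source_aspects):
    """Percentage of target-domain aspects that also occur in the source."""
    a_t, a_s = set(target_aspects), set(source_aspects)
    return 100.0 * len(a_t & a_s) / len(a_t)

# half of the target aspects also occur in the source domain
beta = inconsistency_beta(["screen", "battery", "zoom", "menu"],
                          ["battery", "menu", "price"])
# beta == 50.0; note that beta measures overlap, so by the paper's
# thresholds a small beta (< 20%) corresponds to a HIGH level of inconsistency
```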
(20% < β ≤ 50%), and High-Level (β < 20%). We underlined the source
domain datasets that have aspects shared with the target domain.
We used the F1-Measure (Equation 9) to evaluate the aspect extraction performance, which is the harmonic mean between the Precision (Equation 10) and Recall (Equation 11) measures, where:
• FN (False Negative) indicates the number of terms that are aspects and which were not inferred as aspects.
Table 4: Level of inconsistency (β) considering the number of aspects of the target domain
that occur in the source domain.
Source Domains → Target Domain (Level of Inconsistency):
• MP3 Player (D4), DVD Player (D5), Laptop (D6), Restaurant (D7) → Digital Camera 1 (D1): Very Low-Level
• Digital Camera 1 (D1), Cellular Phone (D3), MP3 Player (D4), DVD Player (D5), Laptop (D6), Restaurant (D7) → Digital Camera 2 (D2): Very Low-Level
• Digital Camera 1 (D1), Digital Camera 2 (D2), MP3 Player (D4), DVD Player (D5), Laptop (D6), Restaurant (D7) → Cellular Phone (D3): Mid-Level
• [Remaining rows garbled in extraction; the surviving source-domain entries are Laptop (D6), Restaurant (D7), Digital Camera 1 (D1), and Digital Camera 2 (D2).]
F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}    (9)

Precision = \frac{TP}{TP + FP}    (10)

Recall = \frac{TP}{TP + FN}    (11)
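Equations 9-11 reduce to a few lines of code (the counts here are toy values, not results from the paper):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and their harmonic mean (F1) from TP/FP/FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
# p == r == f1 == 0.8
```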
4.4. Results and Discussion
Table 5 presents the F1-Measure results of the aspect extraction process. We highlighted the best algorithm for each dataset in bold. Below we discuss the experimental results according to the level of inconsistency of the cross-domain process.
Table 5: F1-Measure results of the aspect extraction process for each approach.
Target Domain CD-ALPHN J48 kNN MLP NB SVM
two stages was sufficient for the transfer learning process, since the feature spaces and probability distributions are similar. The Neural Network (MLP) classifier obtained the best results.
process was sufficient for the transfer learning process. The Support Vector Machine (SVM) classifier obtained the best results.
However, the target domain contains a significant amount of new aspects. In this scenario, the transductive learning process proposed in our CD-ALPHN algorithm obtained the best aspect extraction performance, while remaining competitive with traditional two-stage approaches, especially when compared with the MLP and SVM classifiers.
1 All algorithms were implemented using the Java language and run in the same computing
environment (Debian Operating System, Intel i7 Processor and 64GB of RAM).
4.5. Example of Application
To complement the experimental analysis based on benchmark datasets, we discuss the practical use of the CD-ALPHN approach integrated into a decision support system. Thus, we developed a data analytics system for aspect-based sentiment analysis called “Websensors-SentimentAnalysis” (Websensors for Aspect-based Sentiment Analysis), which uses the concept of “information as a sensor” (Marcacini et al., 2017). Websensors are proposed as an instance of the Web Science research area, with the differential of exploring data science techniques and analytical intelligence to explore the relationship between the digital world and our physical world (Phethean et al., 2016). Websensors have been used successfully in real-world applications such as time series forecasting (Marcacini et al., 2016) and urban violence events (Florence et al., 2017). In this paper, we argue that sentiment analysis is a promising application to explore relationships between the virtual world (text reviews) and the real world (products and consumers) — a concept also investigated by authors who study human behavior in online e-commerce systems to support decision making (Zhang &
Table 6: Average execution time in seconds of each cross-domain aspect extraction approach
Benyoucef, 2016). In this case, in addition to the text reviews, geographic information (e.g., consumer location) and time information (e.g., date of publication) are also collected to improve the information sensor.
Figure 5 illustrates one of the main interfaces of the Websensors-SentimentAnalysis system, built from 93 real reviews (written in Portuguese) about a Brazilian restaurant (target domain), where the aspect extraction is performed using our CD-ALPHN approach with labeled aspects from computer and laptop reviews (source domain). Initially, we summarize all aspects according to the positive and negative polarities (Figure 5A), thereby providing an overview of the product or service. When the user selects a polarity of interest (Positive or Negative), the Websensors-SentimentAnalysis system presents: the text reviews with the highlighted aspects (Figure 5B), a temporal evolution of the polarity according to the frequency of positive or negative aspects over time (Figure 5C), and a geographical mapping according to the locality of the consumers that submitted the reviews (Figure 5D).
[Figure 5: main interface of the Websensors-SentimentAnalysis system; panels A, B, C, and D are described in the text.]
Figure 6: Example of a heat map that represents the geographical occurrence of a selected aspect according to the location of the consumers.
The user can also select one of the extracted aspects and then get an overview of the text reviews in which this aspect occurs.
Once the aspects are extracted, several other functionalities of a data analytics system can be implemented, such as notification of the occurrence of a new aspect, reports of the top-k aspects most commented on in the reviews, and alerts about aspects that need attention (e.g., an increase in negative polarity). We believe that these functionalities can be used as guidelines for the development
27
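As an illustration of how the report and alert functionalities described above could be implemented on top of the extracted aspects, the sketch below uses hypothetical per-period counts; the thresholds and time windows are design choices the paper leaves open, not part of the actual system:

```python
from collections import Counter

# Hypothetical negative-mention counts per aspect for two time periods
previous = Counter({"battery": 3, "screen": 1})
current = Counter({"battery": 7, "screen": 1, "price": 2})

def top_k_aspects(counts, k):
    """Report: the k aspects most commented on in the reviews."""
    return [aspect for aspect, _ in counts.most_common(k)]

def negative_alerts(previous, current, min_increase=2):
    """Alert: aspects whose negative mentions grew by at least min_increase."""
    return [a for a in current if current[a] - previous.get(a, 0) >= min_increase]
```

Here `top_k_aspects(current, 2)` returns `["battery", "price"]`, and `negative_alerts` flags both "battery" and "price" as aspects needing attention.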
5. Concluding Remarks
In this paper, we presented a cross-domain aspect extraction approach for sentiment analysis that deals with the inconsistency between domains (1) by modeling the relationships between different domains through heterogeneous networks, and (2) by using a cross-domain transfer learning process through label propagation with transductive learning.
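The label propagation step can be illustrated with the local-and-global-consistency formulation of Zhou et al. (2004), on which transductive propagation methods of this kind build: labels spread over a similarity graph, with the closed-form solution F = (1 − α)(I − αS)⁻¹Y. The sketch below runs this closed form on a toy four-node chain graph; it is a generic illustration of the technique, not the CD-ALPHN implementation:

```python
import numpy as np

# Toy affinity matrix W over a chain of 4 nodes: 0 - 1 - 2 - 3
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Symmetric normalization S = D^{-1/2} W D^{-1/2}
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))

# Initial labels: node 0 -> class 0, node 3 -> class 1; nodes 1, 2 unlabeled
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)

alpha = 0.9  # trade-off between propagated and initial label information
# Closed-form fixed point of the iteration F <- alpha*S*F + (1-alpha)*Y
F = np.linalg.solve(np.eye(4) - alpha * S, (1 - alpha) * Y)
labels = F.argmax(axis=1)  # unlabeled nodes inherit the nearest labeled class
```

On this chain, node 1 is assigned class 0 and node 2 class 1, i.e., each unlabeled node inherits the label that reaches it through the shortest graph paths.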
Experimental results validate the effectiveness (F1-measure) and efficiency (computational time) of our approach. Our proposed CD-ALPHN (Cross-Domain Aspect Label Propagation through Heterogeneous Networks) was compared to state-of-the-art approaches for cross-domain tasks and obtained competitive results in all scenarios. Moreover, CD-ALPHN is potentially useful in scenarios where there is a high level of inconsistency between the source and target domains, outperforming the state-of-the-art MLP approach. Real-world scenarios are prone to such inconsistency, since reviews are collected from different data sources and written in different styles. In addition to the experimental evaluation with benchmark datasets, we discussed how the proposed approach can be employed in decision support systems.

Directions for future work involve the enrichment of the model representation based on heterogeneous networks. We plan to include the sentiment polarity of the aspects in the network representation.
All the datasets used in this work, as well as the source code of our CD-ALPHN approach, are available at http://websensors.net.br/absa/.
Acknowledgment

The authors acknowledge the Brazilian research agencies FUNDECT-MS [grant number 147/2016 - SIAFEM 25907], FINEP, CNPq, CAPES, and FAPESP [grant numbers 2014/08996-0 and 2017/08804-2] for their support of this work. The authors also thank NVIDIA for donating computer equipment (GPU Grant Academic Program).
References
Akhtar, M. S., Gupta, D., Ekbal, A., & Bhattacharyya, P. (2017). Feature
selection and ensemble construction: A two-step method for aspect based
sentiment analysis. Knowledge-Based Systems, 125 , 116–135.
Al-Moslmi, T., Omar, N., Abdullah, S., & Albared, M. (2017). Approaches to cross-domain sentiment analysis: A systematic literature review. IEEE Access, 5, 16173–16192.
Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geo-
metric framework for learning from labeled and unlabeled examples. Journal
of Machine Learning Research, 7 , 2399–2434.
Breck, E., & Cardie, C. (2017). Opinion mining and sentiment analysis. In The
Oxford Handbook of Computational Linguistics. (2nd ed.).
Breve, F. A., Zhao, L., Quiles, M. G., Pedrycz, W., & Liu, J. (2012). Particle competition and cooperation in networks for semi-supervised learning. IEEE Transactions on Knowledge and Data Engineering, 24, 1686–1698.
Chang, W.-C., Wu, Y., Liu, H., & Yang, Y. (2017). Cross-domain kernel induc-
tion for transfer learning. In Proceedings of the Thirty-First AAAI Conference
on Artificial Intelligence (AAAI-17) (pp. 1763–1769).
Chapelle, O., Schölkopf, B., & Zien, A. (Eds.) (2006). Semi-Supervised Learning. MIT Press.
Chen, Z., Mukherjee, A., & Liu, B. (2014). Aspect extraction with automated prior knowledge learning. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (pp. 347–358).
Deng, S., Sinha, A. P., & Zhao, H. (2017). Adapting sentiment lexicons to domain-specific social media texts. Decision Support Systems, 94, 65–76.
Duric, A., & Song, F. (2012). Feature selection for sentiment analysis based on content and syntax models. Decision Support Systems, 53, 704–711.
Gupta, M., Kumar, P., & Bhasker, B. (2015). A new relevance measure for
Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Pro-
ceedings of the 10th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD-2004) (pp. 168–177).
Kong, X., Ng, M. K., & Zhou, Z.-H. (2013). Transductive multilabel learning via label set propagation. IEEE Transactions on Knowledge and Data Engineering, 25, 704–719.
Lau, R. Y., Li, C., & Liao, S. S. (2014). Social analytics: learning fuzzy product ontologies for aspect-oriented sentiment analysis. Decision Support Systems, 65, 80–94.
Li, S., Xue, Y., Wang, Z., & Zhou, G. (2013). Active learning for cross-domain sentiment classification. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI) (pp. 2127–2133).
Liu, B., & Zhang, L. (2012). A survey of opinion mining and sentiment analysis. In Mining Text Data (pp. 415–463). Springer.
Long, M., Wang, J., Ding, G., Pan, S. J., & Philip, S. Y. (2014a). Adaptation regularization: A general framework for transfer learning. IEEE Transactions on Knowledge and Data Engineering, 26, 1076–1089.
Long, M., Wang, J., Ding, G., Shen, D., & Yang, Q. (2014b). Transfer learning
with graph co-regularization. IEEE Transactions on Knowledge and Data
Engineering, 26 , 1805–1818.
Long, M., Wang, J., Sun, J., & Philip, S. Y. (2015). Domain invariant transfer
kernel learning. IEEE Transactions on Knowledge and Data Engineering, 27 ,
1519–1532.
Lu, J., Behbood, V., Hao, P., Zuo, H., Xue, S., & Zhang, G. (2015). Transfer
learning using computational intelligence: a survey. Knowledge-Based Sys-
tems, 80 , 14–23.
Luo, P., Zhuang, F., Xiong, H., Xiong, Y., & He, Q. (2008). Transfer learning
from multiple source domains via consensus regularization. In Proceedings of
the 17th ACM conference on Information and knowledge management (pp.
103–112). ACM.
Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55–60).
Marcacini, R. M., et al. (2016). On combining websensors and DTW distance for kNN time series forecasting. In Pattern Recognition (ICPR), 2016 23rd International Conference on (pp. 2521–2525). IEEE.
Marcacini, R. M., Rossi, R. G., Nogueira, B. M., Martins, L. V., Cherman, E. A., & Rezende, S. O. (2017). Websensors analytics: Learning to sense the real world using web news events. In Proceedings of the 23rd Brazilian Symposium on Multimedia and the Web: Workshop on Tools and Applications (pp. 169–173).
Matsuno, I. P., Rossi, R. G., Marcacini, R. M., & Rezende, S. O. (2017). Aspect-
based sentiment analysis using semi-supervised learning in bipartite heteroge-
neous networks. Journal of Information and Data Management, 7 , 141–154.
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions
on knowledge and data engineering, 22 , 1345–1359.
Phethean, C., Simperl, E., Tiropanis, T., Tinati, R., & Hall, W. (2016). The
role of data science in web science. IEEE Intelligent Systems, 31 , 102–107.
Rana, T. A., & Cheah, Y.-N. (2016). Aspect extraction in sentiment analysis:
comparative analysis and survey. Artificial Intelligence Review , 46 , 459–483.
Rana, T. A., & Cheah, Y.-N. (2017). A two-fold rule-based model for aspect extraction. Expert Systems with Applications, 89, 273–285.
Rohrbach, M., Ebert, S., & Schiele, B. (2013). Transfer learning in a trans-
ductive setting. In Proceedings of Advances in Neural Information Processing
Systems (NIPS) (pp. 46–54).
Rossi, R. G., Lopes, A. A., & Rezende, S. O. (2014). A parameter-free label propagation algorithm using bipartite heterogeneous networks for text classification. In Proceedings of the Symposium on Applied Computing (pp. 79–84). ACM.
Subramanya, A., & Bilmes, J. (2008). Soft-supervised learning for text classi-
fication. In Proceedings of the Conference on Empirical Methods in Natural
Language Processing (pp. 1090–1099). Association for Computational Lin-
guistics.
Wang, T., Cai, Y., Leung, H.-f., Lau, R. Y., Li, Q., & Min, H. (2014). Product
aspect extraction supervised with online domain knowledge. Knowledge-Based
Systems, 71 , 86–100.
Wu, F., Huang, Y., & Yan, J. (2017). Active sentiment domain adaptation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1701–1711).
Wu, Q., & Tan, S. (2011). A two-stage framework for cross-domain sentiment classification. Expert Systems with Applications, 38, 14269–14275.
Zhang, K. Z., & Benyoucef, M. (2016). Consumer behavior in social commerce: A literature review. Decision Support Systems, 86, 95–108.
Zhang, P., Wang, J., Wang, Y., & Wang, Y. (2016). A statistical approach to opinion target extraction using domain relevance. In Proceedings of the 2nd IEEE International Conference on Computer and Communications (ICCC) (pp. 273–277). IEEE.
Zhou, D., Bousquet, O., Lal, T. N., Weston, J., & Schölkopf, B. (2004). Learning with local and global consistency. In Proceedings of the Advances in Neural Information Processing Systems (NIPS) (pp. 321–328).

Zhu, X., Ghahramani, Z., & Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03) (pp. 912–919).
Zhuang, F., Luo, P., Xiong, H., Xiong, Y., He, Q., & Shi, Z. (2010). Cross-domain learning from multiple sources: A consensus regularization perspective. IEEE Transactions on Knowledge and Data Engineering, 22, 1664–1678.
Biography
Ivone Penque Matsuno is a doctoral candidate at the Institute of Mathematics and Computer Science at the University of São Paulo, Brazil. Her research interests include sentiment analysis and opinion mining. Her research has appeared in the Journal of Information and Data Management, and in conference proceedings such as the Brazilian Symposium on Multimedia and the Web and the International Conference on Engineering Design.
Solange Oliveira Rezende is a full professor of Computer Science at the University of São Paulo, Brazil. She has a PhD in Mechanical Engineering from the University of São Paulo and completed postdoctoral research in Computer Science at the University of Minnesota, USA. Her research interests include data and text mining, machine learning, and recommendation systems. She has published papers in a number of international journals, such as Pattern Recognition Letters, Journal of Information and Data Management, Knowledge-Based Systems, Intelligent Data Analysis, Information Retrieval Journal, and Information Processing & Management.