Identifying Groups of Fake Reviewers Using A Semisupervised Approach
Abstract: Online product reviews have become increasingly important in digital consumer markets, where they play a crucial role in the purchasing decisions of most consumers. Unfortunately, spammers often take advantage of online reviews by writing fake reviews to promote/demote certain products. Most of the previous studies have focused on detecting fake reviews and individual fake reviewer-ids. However, to target a particular product, fake reviewers work collaboratively in groups and/or create multiple fake ids to write reviews and control the sentiments of the product. This article addresses the problem of finding such fake reviewer groups. More specifically, we propose a top-down framework for candidate fake reviewer groups’ detection based on the DeepWalk approach on reviewers’ graph data and a (modified) semisupervised clustering method, which can incorporate partial background knowledge. We validate our proposed framework on a real review dataset from the Google Play Store, which has partial ground-truth information about 2207 fraud reviewer-ids out of all 38 123 reviewer-ids in the dataset. Our experimental results demonstrate that the proposed approach is able to identify the candidate spammer groups with reasonable accuracy. The proposed approach can also be extended to detect groups of opinion spammers in social media (e.g., fake comments or fake postings) with temporal affinity, semantic characteristics, and sentiment analysis.

Index Terms: Computational social science, fake reviewer groups’ detection, semisupervised clustering, spammer detection.

I. INTRODUCTION

five million apps on the iOS and Google Play Store together, opinion spamming will become worse and more sophisticated in the future. Hence, detecting fake reviews and fake reviewers becomes extremely challenging.

Crowdsourcing sites play a vital role in such scenarios. Though generic crowdsourcing sites exist in the market, some specialized fraudsters’ crowdsourcing sites have emerged whose primary focus is only on performing fraud activities. Since the incentive structure appeals to fraudsters to perform malicious behaviors, fraudulent sites hire such people and form communities to commit fraud, such as spreading false opinions, promoting activities, instigating controversy and debates to build hype about certain topics, or controlling the sentiments about certain products to promote/demote them. Such communities often work in groups to fully control the sentiment of target products and distribute the total effort. Moreover, to prevent being detected, some group members review nontarget products and review like normal users to deceive spam detection tools. Therefore, such fraudster groups are more harmful than the individual fraudster.

In recent years, there has been a rapid increase in such communities that perform fraudulent activities in groups. Fraudsters first create multiple unique accounts, then use crowdsourcing sites, try to connect with developers, and ultimately perform fraudulent activities by posting fraudulent public opinions.
Authorized licensed use limited to: Consortium - Algeria (CERIST). Downloaded on October 31,2022 at 10:14:50 UTC from IEEE Xplore. Restrictions apply.
1370 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 8, NO. 6, DECEMBER 2021
RATHORE et al.: IDENTIFYING GROUPS OF FAKE REVIEWERS USING A SEMISUPERVISED APPROACH 1371
detect small-size and tighter groups. Unlike GSRank, GSBP does not take into consideration the review content as its features. Akoglu et al. [11] proposed a Markov random field (MRF)-based model, FRAUDEagle, which ranks the individual reviewers based on their spamicity, by exploiting the network effect among reviewers and products. To detect the spammer groups, they used a graph clustering technique on the subgraphs containing top-ranked spammers and products. Rayana and Akoglu [30] proposed SpEagle that exploits both the relational data (review graph with only the review nodes) and metainformation (e.g., review content, timestamp, and ratings) as prior probabilities to detect spammer groups and fake reviews.

Choo et al. [24] performed sentiment analysis on reviews’ interaction data among users to differentiate nonspam communities from spam ones using community structures. The authors obtained comparable results on the Amazon dataset with respect to state-of-the-art approaches. Their results suggested that opinion group spammers have strong positive communities. Ye and Akoglu [18] proposed a new two-step model, called GroupStrainer, to identify the products targeted by a group of spammers. First, they compute the likelihood of the products being spam campaign targets using a new network footprint score (NFS) measure and then cluster spammers on a two-hop subgraph induced by top-ranking products.

Dhawan et al. [21] proposed a model, called DeFrauder, which first detects candidate fraud groups by leveraging the underlying graph and incorporating behavioral signals and then ranks each reviewer group based on the spam score. Bitarafan and Dadkhah [22] proposed a heterogeneous information network (HIN)-based approach that first identifies candidate spammer groups using biconnected components in the reviewer graph and then classifies the groups based on several group spammer indicators.

Most of the existing methods to discover candidate spammer groups exploit the FIM technique. However, there are several limitations [20] with FIM-based methods for candidate groups’ detection, such as: 1) a low support value causes high computational complexity as frequent itemsets grow exponentially; 2) a high support threshold value results in loss of many useful candidate groups; and 3) FIM is unable to capture overlapping groups. Moreover, the majority of the existing methods manually label the fake reviews or reviewers based on various spam indicators. To the best of our knowledge, this is the first study that validates the candidate spammer groups identified using our proposed framework against the actual (ground-truth) fake reviewer groups.

III. PROPOSED METHODOLOGY

In this section, first, we define the problem statement and then discuss our proposed computing framework.

A. Problem Definition

Let G = (R, E) denote a weighted coreview graph, where R is the node-set representing reviewer-ids, and E is the edge-set whose weights represent the number of apps coreviewed by the reviewer-ids of the associated vertices. For example, an edge (r_1, r_2) with edge weight w means that reviewer accounts r_1 and r_2 have coreviewed the same w apps. Therefore, given the partially labeled data about the relationship between a set of reviewer spammers (or group members) R_g ⊂ R, the task is to identify all the candidate spammer groups from the unlabeled nodes in R \ R_g, where the number of candidate spammer groups is unknown.

In the proposed framework, as shown in Fig. 2, we detect candidate spammer groups merely based on the topological structure of the coreview graph. First, we obtain a representation for each node (reviewer-id) in G using the DeepWalk method, and then, we partition these reviewer-ids, represented by feature vectors, into different candidate spammer groups by employing a modified version of a semisupervised clustering algorithm, known as PCKMeans (Pairwise Constrained K-Means).

B. DeepWalk

We leverage the DeepWalk approach to get the high-level representation of our coreview graph of reviewers. It is a natural language-based modeling approach that aims to generate
a meaningful representation between vertices. It achieves this utilizing two different techniques: Random-Walk followed by Word2vec [31].

Through Random-Walk, it performs random sampling on the graph to generate a sequence of reviewer-ids. The sampling is governed by some hyperparameters that include the following:
1) number of walks;
2) minimum path length (number of edges along the path);
3) minimum distance (sum of edge weights along the path).

Word2vec: It accepts random walk output generated in the previous step as its input. It is basically a neural network, which uses the skip-gram [32] technique to generate the vector representation of each node. This representation has semantic meaning, wherein similar reviewer accounts will get similar vector representation and vice versa. The Word2vec model is governed by following two hyperparameters: 1) representation size (number of features for the path) and 2) window size

In our modification, the first λ centroids are computed from ML and CL constraints as mentioned above; however, the next k − λ centroids are chosen using the maximin (MM) sampling scheme. The MM sampling scheme iteratively selects the points that are further from each other in the input data. Thus, we initialize the first λ samples with the λ centroids, and then, the remaining k − λ points (centroids) are chosen using MM sampling, in turn, to have maximum separation from the existing centers. These cluster centers are well scattered over the sample space.

In the second (cluster assignment) step, each point is assigned to its cluster such that its distance to the cluster centroid is minimized while satisfying as many ML and CL constraints as possible by the assignment. Let M and C denote the set of ML and CL constraints, respectively. Also, let l_i ∈ {1, 2, . . . , k} be the cluster assignment of a point x_i and 1 be the indicator function with 1[true] = 1 and 1[false] = 0; then, the PCKMeans seeks to minimize the following modified objective function:
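The authors' modified objective function is not reproduced in this text. As a rough illustration only of the two steps described above, the following Python sketch shows (a) maximin sampling that extends λ given centroids to k centroids and (b) a PCKMeans-style cost: squared-distance distortion plus a penalty for every violated ML or CL constraint. The function names, the single shared violation cost `w`, and the unscaled distortion term are assumptions made for the sketch; scaling conventions differ between presentations, and the authors' exact formulation may differ.

```python
import numpy as np


def maximin_extend(X, init_centroids, k):
    """Extend a non-empty set of initial centroids to k centroids.

    Each new centroid is the data point whose distance to its nearest
    already-chosen centroid is largest (maximin sampling).
    """
    X = np.asarray(X, dtype=float)
    centroids = [np.asarray(c, dtype=float) for c in init_centroids]
    stacked = np.stack(centroids)
    # Distance from every point to its nearest existing centroid.
    d_min = np.linalg.norm(X[:, None, :] - stacked[None, :, :], axis=2).min(axis=1)
    while len(centroids) < k:
        idx = int(np.argmax(d_min))          # farthest point from all centres
        centroids.append(X[idx])
        d_min = np.minimum(d_min, np.linalg.norm(X - X[idx], axis=1))
    return np.stack(centroids)


def pckmeans_objective(X, labels, centroids, must_link, cannot_link, w=1.0):
    """PCKMeans-style cost (assumed form): distortion + constraint penalties."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centroids = np.asarray(centroids, dtype=float)
    distortion = np.sum((X - centroids[labels]) ** 2)
    # A must-link pair is violated when its two points land in different clusters.
    ml_penalty = sum(w for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its two points share a cluster.
    cl_penalty = sum(w for i, j in cannot_link if labels[i] == labels[j])
    return float(distortion + ml_penalty + cl_penalty)
```

In this sketch, constraints are index pairs `(i, j)` into `X`, and `init_centroids` stands in for the λ centroids derived from the constraints.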
TABLE I
CLASSIFICATION ACCURACY ON PARTIALLY LABELED DATA
TABLE II
COMPARISON OF THREE UNSUPERVISED AND FIVE SEMISUPERVISED CLUSTERING ALGORITHMS BASED ON THE AVERAGE PA (%)
constraints, we randomly select two instances from the partially available data and check their labels (which are used only for constraints generation and evaluation and not for clustering). If they are from the same class, we place the pair in the ML constraint set, else in the CL constraint set. Note that we could generate hundreds of thousands of constraints from the partially labeled data available to us; however, not all the constraints will be useful. Therefore, we use the active learning technique, as mentioned in Section III-C2, to select the most informative constraints from the total available constraints.

F. Validation of Proposed Framework Using Ground-Truth Data, X_l

Before applying the proposed framework to the entire dataset X to identify candidate spammer groups, we first validate it on the 2207 fraud reviewer-accounts’ data, X_l, for which we have ground-truth information readily available. We also compare the semisupervised clustering algorithm, (modified) PCKMeans (used in the proposed framework), with three standard unsupervised clustering algorithms, namely, k-means, DBSCAN [40], Ward’s (hierarchical) algorithm [41], and four other semisupervised clustering algorithms, namely, seeded k-means (Seeded-KM) [27], constraint k-means (CKM) [27], pairwise-constraint based k-means (COPKM) [42], PCKMeans [33], and constraint iVAT (ConiVAT) [43]. For a detailed explanation of these algorithms, we refer readers to [26], [43], [44].

In many practical cases, the partially labeled data have information about only some datapoints from certain clusters and may not have any representation from the remaining clusters present in the full data. Therefore, to have the same characteristics in the partially labeled data, we consider only 40% of the datapoints from 15 clusters (classes) of X_l to generate constraints in this experiment. In this way, the partially labeled data used to generate constraints in this experiment do not have any representation from the remaining eight clusters. We generated a total of 200 constraints (100 ML and 100 CL) and used them in all four semisupervised clustering algorithms as an input. We set the constraint violation cost w = 1.

The similarity of output partitions to ground-truth labels is measured using the partition accuracy (PA). The PA of a clustering algorithm is the ratio of the number of objects with matching ground-truth labels and algorithmic labels to the total number of objects in the data. The value of PA (%) ranges from 0 to 100, and a higher value implies a better match to the ground-truth partition. Before the PA can be calculated, we ensure that the algorithmic labels obtained from the clustering algorithms correspond to the same subsets in the ground truth. For a fair comparison, we set the number of clusters k = 23 as an input for all seven comparison algorithms. The PA is computed only on those datapoints for which we did not use ground-truth information for constraints generation.

Table II shows the comparison of three unsupervised and five semisupervised clustering algorithms based on the average (ten trials) PA (%). We observed that: 1) among unsupervised clustering algorithms, Ward’s clustering algorithm achieves the highest clustering accuracy; 2) all five semisupervised algorithms boost the clustering accuracy; and 3) among the five semisupervised clustering algorithms, PCKMeans and Modified PCKMeans achieve reasonably better accuracy than the other three clustering algorithms. Though the accuracy of PCKMeans and Modified PCKMeans is comparable, an additional boost of 2% in the latter algorithm is achieved through our modifications, as discussed in Section III.
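The constraint generation and the PA measure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `sample_constraints` draws pairs purely at random (the active-learning selection step is omitted), and `partition_accuracy` aligns algorithmic labels with ground-truth labels through an optimal one-to-one matching computed with SciPy's `linear_sum_assignment`, which is one common way to make the two labelings correspond; the text does not state which matching procedure was used. Both function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sample_constraints(labels, n_ml, n_cl, seed=0):
    """Randomly draw must-link (ML) and cannot-link (CL) index pairs.

    A pair with equal labels goes to ML, otherwise to CL. The caller must
    request no more pairs of each kind than the labels can provide,
    otherwise the loop does not terminate.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    ml, cl = set(), set()
    while len(ml) < n_ml or len(cl) < n_cl:
        i, j = rng.choice(len(labels), size=2, replace=False)
        pair = (int(min(i, j)), int(max(i, j)))
        if labels[i] == labels[j]:
            if len(ml) < n_ml:
                ml.add(pair)
        elif len(cl) < n_cl:
            cl.add(pair)
    return sorted(ml), sorted(cl)


def partition_accuracy(true_labels, pred_labels):
    """PA (%): share of objects whose matched cluster equals their true class."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids, t_idx = np.unique(true_labels, return_inverse=True)
    pred_ids, p_idx = np.unique(pred_labels, return_inverse=True)
    # contingency[p, t] = number of objects in predicted cluster p and true class t
    contingency = np.zeros((len(pred_ids), len(true_ids)), dtype=int)
    np.add.at(contingency, (p_idx, t_idx), 1)
    # One-to-one cluster-to-class matching that maximizes total agreement.
    rows, cols = linear_sum_assignment(contingency, maximize=True)
    return 100.0 * contingency[rows, cols].sum() / len(true_labels)
```

To follow the protocol in the text, PA would be evaluated only on the datapoints whose labels were not used for constraint generation.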
Fig. 6. Partially labeled data: Silhouette score for different number of clusters, with an optimal number (vertical dashed line) of clusters (fraudster groups) in the labeled data.

Fig. 7. Entire data: Silhouette score for different number of clusters, with an optimal number (vertical dashed line) of clusters (fraudster groups) in the entire data.
not essential in our semisupervised approach, but it boosts the performance of fake reviewer groups’ detection. A key application of our approach is in evaluating a new review for genuineness: when the associated reviewer-id is a member of a spammer group, the search space is reduced significantly, from the entire collection of past reviews to the set of reviews by the reviewer-ids of the spammer group.

In our work, we used the DeepWalk embedding approach under the assumption that the network is static. Since the DeepWalk method is a transductive technique, it must be retrained whenever a new node is added. In most social networks, nodes and edges accrue to a growing network as new data arrive. Therefore, to extend our approach for dynamic networks, we plan to leverage a real-time streaming graph-embedding technique [46], [47] in our future work. The augmentation of stream data processing in our approach will dynamically reconfigure spammer groups in real time with newly created reviewer-ids, which will enable real-time detection of fake reviewer groups. Future online reviews from reviewer accounts of fraudsters detected by this approach can be quarantined at online forums to improve the authenticity of online reviews. In this work, the hyperparameters for the DeepWalk approach are learned directly using the partial ground-truth data in a semisupervised fashion. In our future work, we intend to explore other graph embedding techniques, such as node2vec [48], which can learn representations that organize nodes based on their network roles and/or communities they belong to. Also, in the future, if we have additional information, such as temporal information, ratings, and review text, then our proposed approach would be able to give better results for fake review groups’ detection.

Also, we can adapt our proposed approach to detect opinion spammers on social media. Most spammers work collaboratively in groups to spread fake opinions through social media, such as Facebook and Twitter, to reach a receptive audience. Mostly, they use the same pieces of false information to reply to or comment on the posting from real social media users on certain topics or issues. Our proposed approach can directly be extended to establish clusters of spammer-ids that have correlated semantic characteristics and temporal affinity. Furthermore, suitable natural language processing (NLP) techniques will enhance the accuracy of semantic correlation.

VI. CONCLUSION

Online review spamming has increasingly become a serious issue for online rating systems. Therefore, identifying group spamming activities is an important problem to prevent online customers from being influenced by fake reviews. In this article, we proposed a top-down framework to identify candidate fake reviewers’ groups from social media data. The proposed framework first uses a DeepWalk approach to represent different reviewers in the form of feature vectors and then employs a semisupervised clustering method to identify candidate fake reviewers’ groups using partial background knowledge.

We conducted experiments on a real-world review dataset that consists of 640 Google Play Store apps reviewed by 38 123 reviewer-ids and partial background knowledge about 2207 reviewer-ids as ground-truth information. We validated our proposed framework on the partial ground data (2207 fake reviewer-ids belonging to 23 unique reviewers) to identify fraud reviewer groups from reviewers’ graph data with ∼67% accuracy. Moreover, our experimental results on the entire (both labeled and unlabeled) data demonstrate that the proposed framework is able to identify candidate spammer groups with satisfactory accuracy, without review text data analysis.

REFERENCES

[1] D. Kaemingk. (2019). 20 Online Review Stats to Know in 2019. Accessed: Aug. 25, 2020. [Online]. Available: https://www.qualtrics.com/blog/online-review-stats/
[2] R. Kats. (2020). How are Consumers Spending Some Their Time? Reading Reviews. Lots of Reviews. Accessed: Aug. 25, 2020. [Online]. Available: https://www.emarketer.com/content/how-are-consumers-spending-some-of-their-time-reading-reviews-lots-of-reviews
[3] K. Saleh. (2015). The Importance of Online Customer Review [Infographic]. Accessed: Aug. 25, 2020. [Online]. Available: https://www.invespcro.com/blog/the-importance-of-online-customer-reviews-infographic/
[4] N. Jindal and B. Liu, “Review spam detection,” in Proc. 16th Int. Conf. World Wide Web, 2007, pp. 1189–1190.
[5] N. Jindal and B. Liu, “Opinion spam and analysis,” in Proc. Int. Conf. Web Search Web Data Mining, 2008, pp. 219–230.
[6] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw, “Detecting product review spammers using rating behaviors,” in Proc. 19th ACM Int. Conf. Inf. Knowl. Manage., 2010, pp. 939–948.
[7] F. H. Li, M. Huang, Y. Yang, and X. Zhu, “Learning to identify review spam,” in Proc. 22nd Int. Joint Conf. Artif. Intell., 2011, pp. 1–6.
[8] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, “Finding deceptive opinion spam by any stretch of the imagination,” 2011, arXiv:1107.4557. [Online]. Available: http://arxiv.org/abs/1107.4557
[9] S. Feng, L. Xing, A. Gogar, and Y. Choi, “Distributional footprints of deceptive product reviews,” in Proc. ICWSM, 2012, vol. 12, no. 98, p. 105.
[10] G. Fei, A. Mukherjee, B. Liu, M. Hsu, M. Castellanos, and R. Ghosh, “Exploiting burstiness in reviews for review spammer detection,” in Proc. ICWSM, 2013, vol. 13, pp. 175–184.
[11] L. Akoglu, R. Chandy, and C. Faloutsos, “Opinion fraud detection in online reviews by network effects,” in Proc. ICWSM, 2013, vol. 13, nos. 2–11, p. 29.
[12] A. Heydari, M. A. Tavakoli, N. Salim, and Z. Heydari, “Detection of review spam: A survey,” Expert Syst. Appl., vol. 42, no. 7, pp. 3634–3642, May 2015.
[13] M. Crawford, T. M. Khoshgoftaar, J. D. Prusa, A. N. Richter, and H. A. Najada, “Survey of review spam detection using machine learning techniques,” J. Big Data, vol. 2, no. 1, p. 23, Dec. 2015.
[14] A. Mukherjee, B. Liu, and N. Glance, “Spotting fake reviewer groups in consumer reviews,” in Proc. 21st Int. Conf. World Wide Web, 2012, pp. 191–200.
[15] C. Xu, J. Zhang, K. Chang, and C. Long, “Uncovering collusive spammers in Chinese review websites,” in Proc. 22nd ACM Int. Conf. Inf. Knowl. Manage., 2013, pp. 979–988.
[16] M. Allahbakhsh et al., “Collusion detection in online rating systems,” in Proc. Asia–Pacific Web Conf. Berlin, Germany: Springer, 2013, pp. 196–207.
[17] C. Xu and J. Zhang, “Combating product review spam campaigns via multiple heterogeneous pairwise features,” in Proc. SIAM Int. Conf. Data Mining, Jun. 2015, pp. 172–180.
[18] J. Ye and L. Akoglu, “Discovering opinion spammer groups by network footprints,” in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases. Cham, Switzerland: Springer, 2015, pp. 267–282.
[19] J. Soni and N. Prabakar, “Effective machine learning approach to detect groups of fake reviewers,” in Proc. 14th Int. Conf. Data Sci. (ICDATA), Las Vegas, NV, USA, 2018, pp. 3–9.
[20] Z. Wang, T. Hou, D. Song, Z. Li, and T. Kong, “Detecting review spammer groups via bipartite graph projection,” Comput. J., vol. 59, no. 6, pp. 861–874, Jun. 2016.
[21] S. Dhawan, S. C. R. Gangireddy, S. Kumar, and T. Chakraborty, “Spotting collective behaviour of online frauds in customer reviews,” 2019, arXiv:1905.13649. [Online]. Available: http://arxiv.org/abs/1905.13649
[22] A. Bitarafan and C. Dadkhah, “SPGD_HIN: Spammer group detection based on heterogeneous information network,” in Proc. 5th Int. Conf. Web Res. (ICWR), Apr. 2019, pp. 228–233.
[23] J. Soni, N. Prabakar, and H. Upadhyay, “Feature extraction through deepwalk on weighted graph,” in Proc. 15th Int. Conf. Data Sci. (ICDATA), 2019, pp. 1–7.
[24] E. Choo, T. Yu, and M. Chi, “Detecting opinion spammer groups through community discovery and sentiment analysis,” in Proc. IFIP Annu. Conf. Data Appl. Secur. Privacy. Cham, Switzerland: Springer, 2015, pp. 170–187.
[25] J. K. Rout, A. Dalmia, K.-K.-R. Choo, S. Bakshi, and S. K. Jena, “Revisiting semi-supervised learning for online deceptive review detection,” IEEE Access, vol. 5, pp. 1319–1327, 2017.
[26] Y. Qin, S. Ding, L. Wang, and Y. Wang, “Research progress on semi-supervised clustering,” Cogn. Comput., vol. 11, no. 5, pp. 599–612, Oct. 2019.
[27] S. Basu, A. Banerjee, and R. Mooney, “Semi-supervised clustering by seeding,” in Proc. 19th Int. Conf. Mach. Learn. (ICML), 2002, pp. 27–34.
[28] A. Mukherjee, V. Venkataraman, B. Liu, and N. S. Glance, “What yelp fake review filter might be doing?” in Proc. ICWSM, 2013, pp. 409–418.
[29] A. Mukherjee, B. Liu, J. Wang, N. Glance, and N. Jindal, “Detecting group review spam,” in Proc. 20th Int. Conf. Companion World Wide Web, 2011, pp. 93–94.
[30] S. Rayana and L. Akoglu, “Collective opinion spam detection: Bridging review networks and metadata,” in Proc. 21th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2015, pp. 985–994.
[31] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” 2013, arXiv:1301.3781. [Online]. Available: http://arxiv.org/abs/1301.3781
[32] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 3111–3119.
[33] S. Basu, A. Banerjee, and R. J. Mooney, “Active semi-supervision for pairwise constrained clustering,” in Proc. SIAM Int. Conf. Data Mining, Apr. 2004, pp. 333–344.
[34] R. J. Hathaway, J. C. Bezdek, and J. M. Huband, “Scalable visual assessment of cluster tendency for large data sets,” Pattern Recognit., vol. 39, no. 7, pp. 1315–1324, Jul. 2006.
[35] P. Rathore, D. Kumar, J. C. Bezdek, S. Rajasegarar, and M. Palaniswami, “A rapid hybrid clustering algorithm for large volumes of high dimensional data,” IEEE Trans. Knowl. Data Eng., vol. 31, no. 4, pp. 641–654, Apr. 2018.
[36] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. Cambridge, MA, USA: MIT Press, 2009.
[37] S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. A. Alemi, “Watch your step: Learning node embeddings via graph attention,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 9180–9190.
[38] A. Epasto and B. Perozzi, “Is a single embedding enough? Learning node representations that capture multiple social contexts,” in Proc. World Wide Web Conf., 2019, pp. 394–404.
[39] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
[40] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. KDD, 1996, vol. 96, no. 34, pp. 226–231.
[41] J. H. Ward, “Hierarchical grouping to optimize an objective function,” J. Amer. Stat. Assoc., vol. 58, no. 301, pp. 236–244, Mar. 1963.
[42] K. Wagstaff et al., “Constrained k-means clustering with background knowledge,” in Proc. ICML, vol. 1, 2001, pp. 577–584.
[43] P. Rathore, J. C. Bezdek, P. Santi, and C. Ratti, “ConiVAT: Cluster tendency assessment and clustering with partial background knowledge,” 2020, arXiv:2008.09570. [Online]. Available: http://arxiv.org/abs/2008.09570
[44] E. Bair, “Semi-supervised clustering methods,” Wiley Interdiscipl. Rev., Comput. Statist., vol. 5, no. 5, pp. 349–361, Sep. 2013.
[45] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, pp. 53–65, Nov. 1987.
[46] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1024–1034.
[47] X. Liu, P.-C. Hsieh, N. Duffield, R. Chen, M. Xie, and X. Wen, “Real-time streaming graph embedding through local actions,” in Proc. Companion World Wide Web Conf., May 2019, pp. 285–293.
[48] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2016, pp. 855–864.

Punit Rathore received the master’s degree in instrumentation engineering from IIT Kharagpur, Kharagpur, India, in 2011, and the Ph.D. degree from the Department of Electrical and Electronics Engineering, University of Melbourne, Melbourne, VIC, Australia, in 2019.
He was a Research Fellow with the School of Computing, National University of Singapore (NUS), Singapore. He is currently a Post-Doctoral Researcher with the Senseable City Laboratory, Department of Urban Studies and Planning, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA. His research interests include big data clustering, spatiotemporal data mining, the Internet of Things, and urban analytics.

Jayesh Soni is currently pursuing the Ph.D. degree with the School of Computing and Information Sciences, Florida International University, Miami, FL, USA.
His current research on anomaly detection at the system level by leveraging artificial intelligence techniques is supported by the Department of Defense, Test Resource Management Center, USA. His research interests include cyber-security, big data, and parallel processing.

Nagarajan Prabakar received the M.Eng. degree in automation from the Indian Institute of Science, Bangalore, in 1979, and the Ph.D. degree in computer science from The University of Queensland, Brisbane, QLD, Australia, in 1985.
He is currently an Associate Professor with the School of Computing and Information Sciences, Florida International University, Miami, FL, USA. His research interests include machine learning-based object detection, anomaly detection for system security, and distributed optimization for real-world problems.

Marimuthu Palaniswami (Life Fellow, IEEE) received the M.E. degree in electrical, electronic and control engineering (EECE) from the Indian Institute of Science, Bengaluru, India, in 1979, the M.Eng.Sc. degree in EECE from the University of Melbourne, VIC, Australia, in 1983, and the Ph.D. degree from the University of Newcastle, Callaghan, NSW, Australia, in 1987.
He is currently a Professor with the University of Melbourne. He is representing Australia as a Core Partner in EU FP7 projects such as SENSEI, SmartSantander, IoT Initiative, and SocIoTal. He has been funded by several Australian Research Council (ARC) and industry grants (over 40 million) to conduct research in sensor network, IoT, health, environmental, and machine learning areas. His current research interests include sensor networks, IoT, machine learning, pattern recognition, and signal processing and control.

Paolo Santi received the Laurea degree and the Ph.D. degree in computer science from the University of Pisa, Pisa, Italy, in 1994.
He is currently a Principal Research Scientist with the MIT Senseable City Laboratory, and a Senior Researcher with the Istituto di Informatica e Telematica, CNR, Pisa. His research interests include wireless multihop networks, vehicular networks, smart mobility, and intelligent transportation.
Dr. Santi is a member of the IEEE Computer Society and has recently been recognized as a Distinguished Scientist by the Association for Computing Machinery. He has been involved in the technical and organizing committee of several conferences in the field. He is/has been an Associate Editor of the IEEE TRANSACTIONS ON MOBILE COMPUTING, the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, and Computer Networks.