Identifying Fake Facebook Profiles Using Data Mining Techniques
All content following this page was uploaded by Ahmad Altamimi on 20 October 2019.
Abstract. Facebook, the popular online social network, has changed our lives.
Users can create a customized profile to share information about themselves with
others who have agreed to be their ‘friend’. However, this gigantic social network
can be misused for carrying out malicious activities. Facebook faces the problem
of fake accounts that enable scammers to violate users’ privacy by creating fake
profiles to infiltrate personal social networks. Many techniques have been
proposed to address this issue. Most of them detect fake profiles/accounts on the
basis of the characteristics of the user profile. However, the limited profile
data that Facebook makes publicly available is insufficient for applying the
existing approaches to fake profile identification. Therefore, this research
utilized data mining techniques to detect fake profiles. Three supervised (ID3
decision tree, k-NN, and SVM) and two unsupervised (k-Means and k-medoids)
algorithms were applied to 12 behavioral and non-behavioral discriminative
profile attributes from a dataset of 982 profiles. The results showed that ID3
had the highest accuracy in the detection process, while k-medoids had the
lowest accuracy.
1 Introduction
In 2003, Mark Zuckerberg started work on a new concept, which eventually
turned into the global social network known as Facebook. Since then, Facebook
has expanded over the whole world, reaching more than 2.3 billion monthly
active users as of December 2018 [1]. A platform of this scale has changed the
way people interact with each other.
Received October 13th, 2018, Revised December 12th, 2018, Accepted for publication August 20th, 2019.
Copyright © 2019 Published by ITB Journal Publisher, ISSN: 2337-5787, DOI: 10.5614/itbj.ict.res.appl.2019.13.2.2
108 Mohammed Basil Albayati & Ahmad Mousa Altamimi.
platform for so many people, the privacy of users can be the target of scammers
[3], for example by creating fake profiles using false information to impersonate
the victim in order to steal valuable information, or by exploiting the user’s
contacts for abusive actions such as financial fraud [8,9].
The rest of this paper is structured as follows. Section 2 presents related works,
the material and methodology are discussed in Section 3, and Section 4
presents the experiment and the results obtained. Section 5 discusses the
experimental results. Finally, Section 6 concludes the paper.
2 Related Works
Many approaches have been proposed for detecting fake profiles on online
social networks. Most of them employed supervised algorithms to analyze fake
profiles from different perspectives. The authors of [11] proposed a model that
employs supervised algorithms (SVM, Naïve Bayes, and Decision Tree) to exploit
profile attributes (e.g. ‘number of friends’, ‘education and work’, ‘gender’,
and others). The proposed model was implemented using Python scripts on a
dataset of 975 profiles extracted from one Facebook account. However,
collecting profiles from a single account may yield an unrepresentative sample
and, consequently, inaccurate results and misleading observations.
In contrast, the authors of [12] collected their dataset using the Facebook API.
The dataset consisted mainly of behavioral attributes (‘user online activities’
and ‘user interactions’), which were characterized through a set of 17
attributes; a total of 12 supervised machine-learning techniques were then
applied to the dataset. The results showed an accuracy of only 79%, which is
insufficient for reliable detection. The works [13,14] used a similar approach.
Ref. [13], for example, utilized three supervised algorithms (Naive Bayes, JRip,
and Decision Tree J48) to identify spam profiles on Facebook and Twitter based
on a set of 14 generic features (attributes). The algorithms were also used to
assess the impact of each attribute on the classification process. Ref. [14], on
the other hand, proposed an add-on for the Firefox browser. The add-on utilized
Mining Techniques for Detecting Fake Facebook Profiles 109
Our work differs from the related works presented above in several aspects.
Firstly, our work utilized both supervised and unsupervised learning techniques
in order to detect fake Facebook profiles. Secondly, most of the presented works
utilized behavioral attributes (e.g. [13,14]), whereas others used
non-behavioral attributes, such as [17], which detected fake LinkedIn profiles
using supervised mining techniques. In this work, both types of attributes
(behavioral and non-behavioral) were considered, using both supervised and
unsupervised techniques.
3 Methodology
5 Discussion
As shown in the previous section, the supervised algorithms outperformed the
unsupervised algorithms. However, before explaining these results, some
important points should be mentioned. Firstly, the model relies on the
informative attributes to make a decision. These attributes are illustrated in
Figure 1. As can be seen, the ‘mutual friends’ attribute is the most informative,
while the ‘introduction’ attribute is the least informative. Secondly, some
attributes had the same values in both real and fake profiles. For example, fake
profiles typically have zero tags, zero posts, and high liking activity.
Unfortunately, many real profiles have the same values, which misleads the
classification techniques.
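The notion of an ‘informative’ attribute can be made concrete with information gain, the splitting criterion used by the ID3 decision tree. The following is a minimal sketch, assuming categorical attribute values; the attribute names and the eight toy profiles are hypothetical and are not taken from the paper's dataset. It illustrates that an attribute separating the classes cleanly yields a high gain, while an attribute whose values are shared by fake and real profiles yields a gain close to zero.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    n = len(labels)
    # Group the class labels by the attribute value they co-occur with.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (NOT the paper's dataset): 4 fake and 4 real profiles.
labels = ["fake"] * 4 + ["real"] * 4
# Hypothetical attribute that separates the two classes perfectly.
has_mutual_friends = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
# Hypothetical overlapping attribute: zero posts occurs in both classes.
zero_posts = ["yes", "yes", "yes", "no", "yes", "yes", "no", "no"]

print(round(information_gain(has_mutual_friends, labels), 3))  # 1.0
print(round(information_gain(zero_posts, labels), 3))          # 0.049
```

ID3 would place the high-gain attribute near the root of the tree, whereas the overlapping attribute contributes almost nothing to separating the classes.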
Figure 2 (1-5) shows the histograms of these interfered attributes, i.e.
attributes whose value distributions overlap between the two class labels (fake
and real). Hence, the algorithm that handles the interfered attributes best can
be expected to make the most accurate decisions. Accordingly, the next
subsections explain the performance of each technique by examining how it
resolved the interfered attributes.
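One simple way to quantify how strongly an attribute is ‘interfered’ is the intersection of its per-class histograms: 0 means the fake and real distributions do not overlap at all, and 1 means they are identical. The sketch below uses a measure chosen here for illustration, not one used by the authors; it assumes discrete attribute values (continuous attributes would first need to be binned), and the toy values for ‘number of tags’ and ‘mutual friends’ are hypothetical.

```python
from collections import Counter


def class_histograms(values, labels):
    """Relative frequency of each attribute value within each class."""
    hists = {}
    for cls in set(labels):
        vals = [v for v, y in zip(values, labels) if y == cls]
        hists[cls] = {v: c / len(vals) for v, c in Counter(vals).items()}
    return hists


def histogram_overlap(values, labels, a="fake", b="real"):
    """Histogram intersection: 0.0 = classes fully separated, 1.0 = identical distributions."""
    h = class_histograms(values, labels)
    support = set(h[a]) | set(h[b])
    return sum(min(h[a].get(v, 0.0), h[b].get(v, 0.0)) for v in support)


# Hypothetical toy values (NOT the paper's data) for 4 fake and 4 real profiles.
labels = ["fake"] * 4 + ["real"] * 4
num_tags = [0, 0, 0, 0, 0, 0, 3, 5]           # every fake and half of the reals have 0 tags
mutual_friends = [0, 0, 1, 0, 12, 30, 8, 45]  # no value is shared by the two classes

print(histogram_overlap(num_tags, labels))        # 0.5
print(histogram_overlap(mutual_friends, labels))  # 0.0
```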
Moreover, the unsupervised algorithms had low accuracy rates because these
clustering techniques do not use the class labels: they treat the dataset as a
single unit and group profiles with similar attribute values into one cluster.
As a result, the interfered attributes posed a problem, since the values of
some informative attributes did not separate into different clusters. Thus,
these techniques could not correctly cluster profiles into fake and real.
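This effect can be reproduced on a toy example. The sketch below implements plain k-means (Lloyd's algorithm, seeded with the first k points) and scores the result by mapping each cluster to its majority label. The two features (number of posts, number of tags) and all values are hypothetical, and this is not the RapidMiner configuration used in the experiments; k-medoids is not shown.

```python
from collections import Counter


def sq_dist(p, c):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, c))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; the first k points are used as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assign == assign:
            break  # assignments unchanged, so the algorithm has converged
        assign = new_assign
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_accuracy(assign, labels):
    """Map each cluster to its majority label and return the fraction of matches."""
    correct = 0
    for j in set(assign):
        members = [y for a, y in zip(assign, labels) if a == j]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)


# Hypothetical (posts, tags) values; NOT the paper's data.
points = [(40, 12), (0, 0), (35, 9), (1, 0), (0, 0), (0, 1), (0, 0), (1, 0)]
labels = ["real", "fake", "real", "fake", "fake", "fake", "real", "real"]

assign = kmeans(points, k=2)
print(assign)                            # [0, 1, 0, 1, 1, 1, 1, 1]
print(cluster_accuracy(assign, labels))  # 0.75
```

The clustering converges, yet the two inactive real profiles land in the same cluster as the four fake ones, so only 6 of 8 profiles are assigned correctly: the clusters follow attribute similarity rather than the class boundary.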
6 Conclusion
This work considered the detection of fake Facebook profiles using data-mining
techniques. A model was proposed that utilizes five techniques (three
supervised and two unsupervised) with 12 discriminative (behavioral and
non-behavioral) attributes. RapidMiner Studio 8.0.1 was employed to conduct an
experiment to evaluate the accuracy of the model on a dataset of 982 profiles
(781 real and 201 fake). The supervised algorithms outperformed the
unsupervised algorithms and showed high and promising accuracy rates in all
experiments. More specifically, the ID3 decision tree exhibited the highest
accuracy of all algorithms, while the unsupervised algorithms all showed
similarly low accuracy. A detailed explanation of these results was given in
Section 5.
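Since the dataset is imbalanced, a helpful reference point when reading accuracy rates is the accuracy of a trivial rule that labels every profile as real. Using only the class counts given above (the notation is introduced here for illustration), this baseline is about 79.5%:

```latex
\mathrm{Acc}_{\text{all-real}}
  = \frac{N_{\text{real}}}{N_{\text{real}} + N_{\text{fake}}}
  = \frac{781}{781 + 201}
  = \frac{781}{982}
  \approx 0.795
```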
Acknowledgment
The authors are grateful to the Applied Science Private University, Amman,
Jordan, for the full financial support granted to cover the publication fee of
this research article.
References
[1] Smith, A.N., Fischer, E. & Yongjian, C., How Does Brand-related
User-generated Content Differ Across YouTube, Facebook, and Twitter?,
Journal of Interactive Marketing, 26(2), pp. 102-113, 2012.
[2] Romero, D.M., Galuba, W., Asur, S. & Huberman, B.A., Influence and
Passivity in Social Media, in Proceedings of the 20th International
Conference Companion on World Wide Web, ACM, pp. 113-114, 2011.
[3] Obar, J.A. & Wildman, S.S., Social Media Definition and the
Governance Challenge: An Introduction to the Special Issue,
Telecommunications Policy, 2015. DOI: 10.1016/j.telpol.2015.07.014.
[4] Kaplan, A.M. & Haenlein, M., Users of the World, Unite! The
Challenges and Opportunities of Social Media, Business Horizons, 53(1),
pp. 59-68, 2010.
[5] Agichtein, E., Castillo, C., Donato, D., Gionis, A. & Mishne, G., Finding
High-Quality Content in Social Media, in Proceedings of the 2008
International Conference on Web Search and Data Mining, ACM, pp.
183-194, 2008.
[6] O’Keeffe, G.S. & Clarke-Pearson, K., The Impact of Social Media on
Children, Adolescents, and Families, Pediatrics, 127(4), pp. 800-804,
2011.
[7] Tang, Q., Gu, B. & Whinston, A.B., Content Contribution for Revenue
Sharing and Reputation in Social Media: A Dynamic Structural Model,
Journal of Management Information Systems, 29(2), pp. 41-76, 2012.
[8] Kontaxis, G., Polakis, I., Ioannidis, S. & Markatos, E.P., Detecting
Social Network Profile Cloning, in 2011 IEEE International Conference on
Pervasive Computing and Communications Workshops (PERCOM Workshops),
IEEE, pp. 295-300, 2011.
[9] Wani, M.A., Jabin, S. & Ahmad, N., A Sneak into the Devil’s Colony -
Fake Profiles in Online Social Networks, arXiv preprint
arXiv:1705.09929, 2017.
[10] RapidMiner, https://rapidminer.com/ (3rd August 2019).
[11] Kumar, N. & Reddy, R.N., Automatic Detection of Fake Profiles in
Online Social Networks, Ph.D. diss., National Institute of Technology
Rourkela, 2012.
[12] Gupta, A. & Kaushal, R., Towards Detecting Fake User Accounts in
Facebook, in 2017 ISEA Asia Security and Privacy (ISEASP), IEEE,
pp. 1-6, 2017.
[13] Ahmed, F. & Abulaish, M., A Generic Statistical Approach for Spam
Detection in Online Social Networks, Computer Communications,
36(10), pp. 1120-1129, 2013.
[14] Fire, M., Kagan, D., Elyashar, A. & Elovici, Y., Friend or Foe? Fake
Profile Identification in Online Social Networks, Social Network Analysis
and Mining, 4(1), pp. 194-210, 2014.
[15] Wang, X., Lai, C.M., Hong, Y., Hsieh, C.J. & Wu, S.F., Multiple
Accounts Detection on Facebook Using Semi-Supervised Learning on
Graphs, arXiv preprint arXiv:1801.09838, 2018.
[16] Viswanath, B., Bashir, M.A., Crovella, M., Guha, S., Gummadi, K.P.,
Krishnamurthy, B. & Mislove, A., Towards Detecting Anomalous User
Behavior in Online Social Networks, in USENIX Security Symposium,
pp. 223-238, 2014.
[17] Adikari, S. & Dutta, K., Identifying Fake Profiles in LinkedIn,
in PACIS, p. 278, 2014.
[18] Nazir, A., Raza, S., Chuah, C.N. & Schipper, B., Ghostbusting
Facebook: Detecting and Characterizing Phantom Profiles in Online
Social Gaming Applications, in WOSN, 2010.
[19] Bhat, S.Y. & Abulaish, M., Community-Based Features for Identifying
Spammers in Online Social Networks, in 2013 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining
(ASONAM), IEEE, pp. 100-107, 2013.
[20] Han, J., Pei, J. & Kamber, M., Data Mining: Concepts and
Techniques, Elsevier, 2011.
[21] Hron, K., Templ, M. & Filzmoser, P., Imputation of Missing Values for
Compositional Data Using Classical and Robust Methods,
Computational Statistics & Data Analysis, 54(12), pp. 3095-3107, 2010.