Detection of Fake and Clone Accounts in Twitter Using Classification and
Abstract—An Online Social Network (OSN) is a network hub where people with similar interests or real-world relationships interact. As the popularity of OSNs increases, the security and privacy issues related to them are also rising. Fake and clone profiles create serious security problems for social network users. Cloning of user profiles is one serious threat, in which an existing user’s details are stolen to create duplicate profiles that are then misused to damage the identity of the original profile owner. Attackers can also launch threats such as phishing, stalking and spamming. A fake profile is a profile created in the name of a person or a company that does not really exist in social media, in order to carry out malicious activities. In this paper, a detection method is proposed which can detect fake and clone profiles in Twitter. Fake profiles are detected using a set of rules that can effectively classify fake and genuine profiles. For profile cloning detection two methods are used: one using Similarity Measures and the other using the C4.5 decision tree algorithm. In Similarity Measures, two types of similarity are considered: Similarity of Attributes and Similarity of Network relationships. C4.5 detects clones by building a decision tree that takes information gain into consideration. A comparison is made to check how well these two methods detect clone profiles.

Index Terms—Clone, C4.5, Fake, Identity Theft, Online Social Networks, OSN

[...] the social networks and easily fall prey to these attacks. The risks are greater if the victims are children. In a Profile Cloning attack, the profile information of existing users is stolen to create duplicate profiles, and these profiles are misused to damage the identity of the original profile owners [1-6]. There are two types of Profile Cloning, namely Same Site and Cross Site Profile Cloning [1,7-9].

If user credentials are taken from one network to create a clone profile in the same network, it is called Same Site profile cloning [1,10-12]. In Cross Site profile cloning, the attacker takes the user information from one network to create a duplicate profile in another network in which the user does not have an account [1,13-15].

As the registration process in social networks has become very simple in order to attract more users, the creation of fake profiles is also increasing at an alarming rate. An attacker creates a fake profile in order to connect to a victim and carry out malicious activities, and also to spread fake news and spam messages.

The paper is organized as follows. Section II describes the literature survey. Section III explains the proposed methodology. Section IV discusses the results. Finally, Section V concludes the paper.
Authorized licensed use limited to: Cornell University Library. Downloaded on September 02,2020 at 04:21:21 UTC from IEEE Xplore. Restrictions apply.
Piotr Bródka, Mateusz Sobas and Henric Johnson [3] proposed two methods for detecting cloned profiles. The first method is based on the similarity of attribute values between the original and cloned profiles, and the second is based on network relationships. A person who suspects that his or her profile has been cloned is chosen as the victim. Treating the name as the primary key, a query search is made for profiles with the same name as the victim. Each potential clone (Pc) is compared with the victim profile (Pv) and a similarity S is calculated. If S(Pc, Pv) > Threshold, the profile is suspected to be a clone. Verification is done manually by the user, who knows which profile is the original and which is a duplicate.
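The search-and-threshold procedure described for [3] can be sketched as follows. This is only an illustration under stated assumptions, not the implementation from [3]: the profile dictionaries, the function names `attribute_similarity` and `suspected_clones`, the default threshold of 0.8, and the use of Python's `difflib.SequenceMatcher` as the similarity S are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def attribute_similarity(pc, pv, attributes):
    """Average string similarity S(Pc, Pv) in [0, 1] over the attributes
    that are present in both profiles (stand-in measure, assumed here)."""
    scores = []
    for attr in attributes:
        if attr not in pc or attr not in pv:
            continue  # skip attributes missing from either profile
        a = str(pc[attr]).lower()
        b = str(pv[attr]).lower()
        scores.append(SequenceMatcher(None, a, b).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def suspected_clones(victim, profiles, attributes, threshold=0.8):
    """Return the profiles suspected of cloning the victim profile."""
    # Step 1: the name acts as the search key for candidate profiles.
    victim_name = victim.get("name", "").lower()
    candidates = [
        p for p in profiles
        if p is not victim and p.get("name", "").lower() == victim_name
    ]
    # Step 2: flag candidates whose similarity exceeds the threshold.
    return [
        p for p in candidates
        if attribute_similarity(p, victim, attributes) > threshold
    ]


if __name__ == "__main__":
    victim = {"id": 1, "name": "Alice Rao", "location": "Mumbai", "language": "en"}
    profiles = [
        victim,
        {"id": 2, "name": "Alice Rao", "location": "Mumbai", "language": "en"},
        {"id": 3, "name": "Alice Rao", "location": "Toronto", "language": "fr"},
        {"id": 4, "name": "Bob Lee", "location": "Mumbai", "language": "en"},
    ]
    flagged = suspected_clones(victim, profiles, ["location", "language"])
    print([p["id"] for p in flagged])
```

The final manual verification step is deliberately left out: as described above, the victim decides which of the flagged profiles is a duplicate.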
Fig. 1. Architecture of proposed system.
S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi and M. Tesconi [4] reviewed some of the most relevant existing features and rules (proposed by academia and media) for detecting fake Twitter accounts. They used these rules and features to train a set of machine learning classifiers, and arrived at a Class A classifier which can effectively classify original and fake accounts.

Ahmed El Azab, Amira M Idrees, Mahmoud A Mahmoud and Hesham Hefny [5] proposed a classification method for detecting fake accounts on Twitter. They collected effective features for the detection process from different research works and filtered and weighted them in a first stage. Various experiments were conducted to find the minimum set of attributes that gives accurate results. From 22 attributes, only seven that can effectively detect fake accounts were selected, and these were applied to classification techniques. The classification techniques were compared on their results and the one providing the most accurate result was selected.

III. PROPOSED SYSTEM

Fake and clone profiles have become a very serious social threat. As information such as phone number, email id, school or college name, company name and location is readily exposed in social networks, attackers can easily harvest this information to create fake or clone profiles. They then attempt various attacks such as phishing, spamming and cyberbullying, and may even try to defame the legitimate owner or the organisation. Therefore, a detection method is proposed which can detect both fake and clone profiles in order to make the social life of users more secure. The architecture of the proposed system is shown in Fig. 1.

The proposed architecture consists of modules for Fake Profile detection and Clone Profile detection.

A. Fake Profile Detection

This module is used to detect fake Twitter profiles. Fake profiles are detected based on rules that effectively distinguish fake profiles from genuine ones. Some of the rules are as follows. Fake profiles usually do not have a profile name or image. They do not include any description of the account. The geo-enabled field is false, as they do not want to expose their location in tweets. They usually post a large number of tweets, or sometimes no tweets at all. The rules are applied to the profile and, for each matching rule, a counter is incremented. If the counter value is greater than a pre-defined threshold, the profile is termed fake.

B. Clone Profile Detection using Similarity Measures

This module detects clones based on attribute and network similarity. A user profile is taken as input and user-identifying information is extracted from it. Profiles whose attributes match those of the user’s profile are searched for. A similarity index is calculated and, if it is greater than the threshold, the profile is termed a clone; otherwise it is termed normal [1].

i) Attribute Similarity

Attribute similarity is calculated based on the similarity of attribute values between the profiles. The attributes considered for similarity measurement are Name, ScreenName, Language, Location and Time_zone. Two similarity measures are used to measure the similarity between the attributes: Cosine similarity and Levenshtein distance. Cosine similarity is used to find the similarity between words and Levenshtein distance is used to find the similarity between two sequences.

The cosine similarity formula is given by equation (1)

\[ \cos(\theta) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \qquad (1) \]

where A_i and B_i are the components of two non-zero vectors A and B [1]. Two vectors have a cosine similarity of 1 if they have the same orientation, 0 if they are at 90° and -1 if they are diametrically opposed [1].

Levenshtein distance is a metric for measuring the similarity between two sequences. Given two sequences, the Levenshtein distance between them is the minimum number of insert, delete or substitution operations required to change one sequence into the other. Mathematically, the Levenshtein distance between two strings a, b of length i and j respectively is given by equation (2)

\[ \mathrm{lev}_{a,b}(i,j) = \begin{cases} \max(i,j) & \text{if } \min(i,j) = 0, \\ \min \begin{cases} \mathrm{lev}_{a,b}(i-1,j) + 1 \\ \mathrm{lev}_{a,b}(i,j-1) + 1 \\ \mathrm{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)} \end{cases} & \text{otherwise,} \end{cases} \qquad (2) \]

where 1_{(a_i \neq b_j)} equals 1 when the i-th character of a differs from the j-th character of b, and 0 otherwise.
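The two attribute similarity measures of Section III-B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in the paper: cosine similarity is assumed to operate on word-count vectors of an attribute value, Levenshtein distance on character sequences, and the helper `levenshtein_similarity`, which rescales the distance to the range [0, 1], is a hypothetical addition for comparing against a threshold.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two strings."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # the formula is only defined for non-zero vectors
    return dot / (norm_a * norm_b)


def levenshtein_distance(a, b):
    """Minimum number of insertions, deletions or substitutions
    needed to turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Distance rescaled to [0, 1]; 1.0 means identical (assumed helper)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    print(cosine_similarity("New York City", "New York"))
    print(levenshtein_distance("kitten", "sitting"))
    print(levenshtein_similarity("john_doe", "john_d0e"))
```

Word-level cosine similarity suits multi-word fields such as Name or Location, while the character-level Levenshtein measure catches small edits in single-token fields such as ScreenName.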
[...] on given data. At each node of the tree, the attribute that most effectively splits the sample set into subsets is chosen.

The splitting criteria used in C4.5 are information gain and entropy. The attribute with the highest information gain is chosen to make the decision, and the algorithm then recurses over the partitioned sub-trees. The information (entropy) from which the gain is computed is shown in equation (4)

\[ \mathrm{Info}(D) = -\sum_{i=1}^{n} P_i \log_2 P_i \qquad (4) \]

where P_i is the probability that a record in D belongs to class i.

The C4.5 algorithm finds the similarity between the attributes by building a tree-like structure. The given profile is compared against the profiles already in the database. If the given profile matches any of the profiles in the database, it is termed a clone; otherwise it is termed normal.

IV. EXPERIMENTS AND RESULTS

A. Datasets Used

The datasets used in the experiment were collected from MIB projects. They consist of genuine and fake Twitter datasets. The genuine accounts dataset contains accounts of people who volunteered to be part of an academic study for detecting fake accounts on Twitter; it is mostly a mixture of accounts of researchers, social experts and journalists from Italy, the US and other European countries [4]. The fake accounts were purchased from three different Twitter online markets, namely fastfollowerz.com, intertwitter.com and twittertechnology.com [4].

TABLE I
No. of fake records detected by rule as fake (TP)            990
No. of genuine records detected by rule as fake (FP)         105
No. of fake records detected by rule as genuine (FN)         110
No. of genuine records detected by rule as genuine (TN)      995

TABLE II
PERFORMANCE EVALUATION OF CLONE DETECTION USING SIMILARITY MEASURES
Total no. of records checked                                 800
No. of normal records detected by system as normal (TN)      769
No. of normal records detected by system as clone (FP)       11
No. of clone records detected by system as normal (FN)       2
No. of clone records detected by system as clone (TP)        18

For detection of fake profiles, a total of 2200 accounts were fed into the system, of which 1100 were genuine and 1100 were fake. The rule set was able to classify genuine and fake accounts with an accuracy of 90.2%, as shown in Fig. 2. Table I gives the performance evaluation of the fake detection module.

For detection of clone profiles, 780 normal profiles along with 20 artificially generated clone profiles were fed to the
modules to check how accurately they detect clone profiles from the given set. The modules were able to detect clones with good accuracy. Table II and Table III give the performance evaluation of clone detection using Similarity Measures and using C4.5 respectively.

TABLE III
PERFORMANCE EVALUATION OF CLONE DETECTION USING C4.5
Total no. of records checked                                 800
No. of normal records detected by system as normal (TN)      765
No. of normal records detected by system as clone (FP)       15
No. of clone records detected by system as normal (FN)       4
No. of clone records detected by system as clone (TP)        16

Fig. 2. Performance Evaluation Result.

Results of Table II and Table III show that 18 out of 20 clones were detected using similarity measures, whereas only 16 clones were detected using the C4.5 classification algorithm. It can therefore be concluded that clone detection using similarity measures gives better results than clone detection using the C4.5 classification algorithm.

V. CONCLUSION

Fake and clone profiles have become a very serious problem in online social networks, and threats caused by these profiles are reported in everyday life. A detection method has therefore been proposed which can find both fake and clone Twitter profiles. For fake detection, a set of rules was used which, when applied, can classify fake and genuine profiles. Clone detection was carried out using Similarity Measures and the C4.5 algorithm, and a comparison was made to check the performance. Clone detection using Similarity Measures worked better than C4.5 and was able to detect most of the clones fed into the system. In this work only the profile attributes were considered for fake and clone detection. In future this work can be extended by also taking tweets into consideration through NLP techniques.

REFERENCES
[1] Sowmya P and Madhumita Chatterjee, “Detection of Fake and Cloned Profiles in Online Social Networks”, Proceedings 2019: Conference on Technologies for Future Cities (CTFC)
[2] Georgios Kontaxis, Iasonas Polakis, Sotiris Ioannidis and Evangelos P. Markatos, “Detecting Social Network Profile Cloning”, 2013
[3] Piotr Bródka, Mateusz Sobas and Henric Johnson, “Profile Cloning Detection in Social Networks”, 2014 European Network Intelligence Conference
[4] Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi, “Fame for sale: Efficient detection of fake Twitter followers”, 2015 Elsevier’s journal Decision Support Systems, Volume 80
[5] Ahmed El Azab, Amira M Idrees, Mahmoud A Mahmoud, Hesham Hefny, “Fake Account Detection in Twitter Based on Minimum Weighted Feature set”, World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering Vol:10, 2016
[6] M.A.Devmane and N.K.Rana, “Detection and Prevention of Profile Cloning in Online Social Networks”, 2014 IEEE International Conference on Recent Advances and Innovations in Engineering
[7] Kiruthiga. S, Kola Sujatha. P and Kannan. A, “Detecting Cloning Attack in Social Networks Using Classification and Clustering Techniques”, 2014 International Conference on Recent Trends in Information Technology
[8] Buket Erşahin, Ozlem Aktaş, Deniz Kilinç, Ceyhun Akyol, “Twitter fake account detection”, 2017 International Conference on Computer Science and Engineering (UBMK)
[9] Arpitha D, Shrilakshmi Prasad, Prakruthi S, Raghuram A.S, “Python based Machine Learning for Profile Matching”, International Research Journal of Engineering and Technology (IRJET), 2018
[10] Olga Peled, Michael Fire, Lior Rokach, Yuval Elovici, “Entity Matching in Online Social Networks”, 2013 International Conference on Social Computing
[11] Aditi Gupta and Rishabh Kaushal, “Towards Detecting Fake User Accounts in Facebook”, 2017 ISEA Asia Security and Privacy (ISEASP)
[12] Michael Fire, Roy Goldschmidt, Yuval Elovici, “Online Social Networks: Threats and Solutions”, IEEE Communications Surveys & Tutorials
[13] Ashraf Khalil, Hassan Hajjdiab and Nabeel Al-Qirim, “Detecting Fake Followers in Twitter: A Machine Learning Approach”, 2017 International Journal of Machine Learning and Computing
[14] Mohammad Reza Khayyambashi and Fatemeh Salehi Rizi, “An approach for detecting profile cloning in online social networks”, 2013 International Conference on e-Commerce in Developing Countries: with focus on e-Security
[15] Mauro Conti, Radha Poovendran and Marco Secchiero, “FakeBook: Detecting Fake Profiles in On-line Social Networks”, 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
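As a cross-check on the counts reported in Tables I to III, the usual accuracy, precision and recall can be recomputed from the four confusion counts. The sketch below is illustrative only. The function name `evaluate` is hypothetical, and it assumes the standard convention in which the positive class is "fake" (Table I) or "clone" (Tables II and III), so a normal record flagged as a clone counts as a false positive and a missed clone as a false negative.

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Counts taken from Tables I, II and III.
fake_rules = evaluate(tp=990, fp=105, fn=110, tn=995)
clone_similarity = evaluate(tp=18, fp=11, fn=2, tn=769)
clone_c45 = evaluate(tp=16, fp=15, fn=4, tn=765)

if __name__ == "__main__":
    results = {
        "Fake detection (rules)": fake_rules,
        "Clone detection (similarity)": clone_similarity,
        "Clone detection (C4.5)": clone_c45,
    }
    for name, metrics in results.items():
        summary = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
        print(f"{name}: {summary}")
```

With the Table I counts the accuracy comes to about 0.902, which agrees with the reported 90.2%. The recall values for the two clone detectors correspond to the 18 out of 20 and 16 out of 20 clones detected.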