International Conference on Communication and Signal Processing, July 28 - 30, 2020, India

Detection of Fake and Clone Accounts in Twitter using Classification and Distance Measure Algorithms
Sowmya P and Madhumita Chatterjee


Abstract—An Online Social Network (OSN) is a network hub where people with similar interests or real-world relationships interact. As the popularity of OSNs increases, the security and privacy issues related to them are also rising. Fake and clone profiles create dangerous security problems for social network users. Cloning of user profiles is one serious threat, in which an existing user's details are stolen to create duplicate profiles that are then misused to damage the identity of the original profile owner. The attackers can even launch threats such as phishing, stalking and spamming. A fake profile is a profile created in the name of a person or a company that does not really exist in social media, in order to carry out malicious activities. In this paper, a detection method is proposed which can detect fake and clone profiles in Twitter. Fake profiles are detected based on a set of rules that can effectively classify fake and genuine profiles. For profile cloning detection, two methods are used: one using Similarity Measures and the other using the C4.5 decision tree algorithm. In Similarity Measures, two types of similarity are considered: Similarity of Attributes and Similarity of Network relationships. C4.5 detects clones by building a decision tree that takes information gain into consideration. A comparison is made to check how well these two methods help in detecting clone profiles.

Index Terms—Clone, C4.5, Fake, Identity Theft, Online Social Networks, OSN

I. INTRODUCTION

ONLINE Social Networks (OSN) like Facebook, Twitter, LinkedIn and Instagram are used by billions of users all around the world to build network connections. The ease and accessibility of social networks have created a new era of networking. OSN users share a lot of information in the network, such as photos, videos, school name, college name, phone numbers, email address, home address, family relations, bank details and career details. If this information falls into the hands of attackers, the after-effects are very severe. Most OSN users are unaware of the security threats that exist in social networks and easily fall prey to these attacks. The risks are more dangerous if the victims are children. In a Profile Cloning attack, the profile information of existing users is stolen to create duplicate profiles, and these profiles are misused to spoil the identity of the original profile owners [1-6]. There are two types of Profile Cloning, namely Same Site and Cross Site Profile Cloning [1,7-9].

If user credentials are taken from one network to create a clone profile in the same network, it is called Same Site profile cloning [1,10-12]. In Cross Site profile cloning, the attacker takes the user information from one network to create a duplicate profile in another network in which the user does not have an account [1,13-15].

As the registration process in social networks has become very simple in order to attract more and more users, the creation of fake profiles is also increasing at an alarming rate. An attacker creates a fake profile in order to connect to a victim and carry out malicious activities, and also to spread fake news and spam messages.

The paper is organized as follows. Section II describes the literature survey. Section III explains the proposed methodology. Section IV discusses the results. Finally, Section V concludes the paper.

II. LITERATURE SURVEY

Today, fake and clone profiles have become a very serious threat in social networks. So, a detection method is very much necessary to find these frauds, who exploit people's trust to gather private information and create duplicate profiles. Many authors have worked in this area and have proposed methods to identify these types of profiles in social networks. Some of these methods are discussed below.

Georgios Kontaxis, Iasonas Polakis, Sotiris Ioannidis and Evangelos P Markatos [2] have proposed a prototype to check whether users have become victims of a cloning attack. Information is extracted from the user profile, a search is made in the OSN to find profiles which match the user profile, and a similarity score is calculated based on the commonality of attribute values. If the similarity score is above the threshold value, the particular profile is termed as clone.

Sowmya P is with the Department of Computer Engineering, Pillai College of Engineering, University of Mumbai, Maharashtra, India (e-mail: [email protected]).
Madhumita Chatterjee is with the Department of Computer Engineering, Pillai HOC College of Engineering and Technology, University of Mumbai, Maharashtra, India (e-mail: [email protected]).

978-1-7281-4988-2/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: Cornell University Library. Downloaded on September 02,2020 at 04:21:21 UTC from IEEE Xplore. Restrictions apply.
Piotr Bródka, Mateusz Sobas and Henric Johnson [3] have proposed two novel methods for detecting cloned profiles. The first method is based on the similarity of attribute values between the original and cloned profiles, and the second is based on network relationships. A person who suspects that his profile has been cloned is chosen as the victim. Then, treating the name as the primary key, a query search is made for profiles with the same name as the victim. Each potential clone (Pc) is compared with the victim profile (Pv) and a similarity S is calculated. If S(Pc, Pv) > Threshold, the profile is suspected to be a clone. The verification step is done manually by the user, as he knows which profile is his original and which is a duplicate.
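
The search-and-threshold flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the profile dictionaries, the function names and the use of Python's difflib.SequenceMatcher as the similarity S are all hypothetical stand-ins and are not the measures used in [3].

```python
from difflib import SequenceMatcher


def attribute_similarity(victim, candidate, attributes):
    """Mean string similarity over the chosen attributes (stand-in for S)."""
    scores = []
    for attr in attributes:
        a = str(victim.get(attr, "")).lower()
        b = str(candidate.get(attr, "")).lower()
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common
        scores.append(SequenceMatcher(None, a, b).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def suspected_clones(victim, profiles, attributes, threshold):
    """Return the profiles with the victim's name whose S(Pc, Pv) > threshold."""
    # Step 1: the name acts as the search key
    candidates = [
        p for p in profiles
        if p is not victim and p["name"].lower() == victim["name"].lower()
    ]
    # Step 2: keep only the candidates that exceed the threshold
    return [
        p for p in candidates
        if attribute_similarity(victim, p, attributes) > threshold
    ]


# Hypothetical data, invented for this example
victim = {"id": 1, "name": "Asha Rao", "location": "Mumbai", "language": "en"}
profiles = [
    victim,
    {"id": 2, "name": "asha rao", "location": "Mumbai", "language": "en"},
    {"id": 3, "name": "Asha Rao", "location": "Toronto", "language": "fr"},
    {"id": 4, "name": "Ravi K", "location": "Mumbai", "language": "en"},
]
flagged = suspected_clones(victim, profiles, ["location", "language"], 0.8)
```

Only the flagged profiles would then be passed to the manual verification step; the threshold value of 0.8 is an arbitrary choice for the example.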
Cresci S, Di Pietro R, Petrocchi M, Spognardi A and Tesconi M [4] have reviewed some of the most relevant existing features and rules (proposed by academia and media) for detecting fake Twitter accounts. They used these rules and features to train a set of machine learning classifiers, and then derived a Class A classifier that can effectively classify original and fake accounts.

Ahmed El Azab, Amira M Idrees, Mahmoud A Mahmoud and Hesham Hefny [5] have proposed a classification method for detecting fake accounts on Twitter. In the first stage, they collected effective features for the detection process from different research works and filtered and weighted them. Various experiments were conducted to find the minimum set of attributes that gives accurate results. From 22 attributes, only seven were selected which can effectively detect fake accounts, and these were applied to classification techniques. The classification techniques were compared based on their results, and the one providing the most accurate result was selected.

III. PROPOSED SYSTEM

Fake and clone profiles have become a very serious social threat. As information like phone number, email id, school or college name, company name and location is readily exposed in social networks, attackers can easily harvest this information to create fake or clone profiles. They then try to carry out various attacks like phishing, spamming and cyberbullying. They even try to defame the legitimate owner or the organisation. So, a detection method has been proposed which can detect both fake and clone profiles in order to make the social life of the users more secure. The architecture of the proposed system is shown in Fig. 1.

Fig. 1. Architecture of proposed system.

The proposed architecture consists of modules for Fake Profile detection and Clone Profile detection.

A. Fake Profile Detection

This module is used to detect fake Twitter profiles. Fake profiles are detected based on rules that effectively distinguish fake profiles from genuine ones. Some of the rules used are as follows. Fake profiles usually do not have a profile name or image. They do not include any description about the account. The geo-enabled field will be false, as they do not want to expose their location in tweets. They usually post a large number of tweets, or sometimes have not posted any tweets at all. The rules are applied to the profile and, for each matching rule, a counter is incremented. If the counter value is greater than a pre-defined threshold, the profile is termed as fake.

B. Clone Profile Detection using Similarity Measures

This module detects clones based on Attribute and Network similarity. The user profile is taken as input and the user-identifying information is extracted from it. Profiles whose attributes match those of the user's profile are searched for. A similarity index is calculated and, if it is greater than the threshold, the profile is termed as clone, else normal [1].

i) Attribute Similarity

Attribute similarity is calculated based on the similarity of attribute values between the profiles. The attributes considered for similarity measurement are Name, ScreenName, Language, Location and Time_zone. Two similarity measures are used to measure the similarity between the attributes: Cosine similarity and Levenshtein distance. Cosine similarity is used to find the similarity between words and Levenshtein distance is used to find the similarity between two sequences.

Cosine similarity is given by equation (1)

\[ \cos(\theta) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \tag{1} \]

where A_i and B_i are the components of two non-zero vectors A and B [1]. Two vectors have a cosine similarity of 1 if they have the same orientation, 0 if they are at 90° and -1 if they are diametrically opposed [1].

Levenshtein distance is a metric for measuring the similarity between two sequences. Given two sequences, the Levenshtein distance between them is the minimum number of insert, delete or substitution operations required to change one sequence into the other. Mathematically, the Levenshtein distance between two strings a, b of length i and j respectively is given by equation (2)

\[ \mathrm{lev}_{a,b}(i,j) = \begin{cases} \max(i,j) & \text{if } \min(i,j) = 0, \\ \min \begin{cases} \mathrm{lev}_{a,b}(i-1,j) + 1 \\ \mathrm{lev}_{a,b}(i,j-1) + 1 \\ \mathrm{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)} \end{cases} & \text{otherwise,} \end{cases} \tag{2} \]

where 1_{(a_i \neq b_j)} is 1 when the i-th character of a differs from the j-th character of b, and 0 otherwise.
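
The two attribute-similarity measures described in this section, cosine similarity and Levenshtein distance, can be sketched as below. The paper does not say how attribute values are turned into vectors, so the word-count vectors used here are an assumption, and the function names are illustrative only.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the word-count vectors of two attribute values.

    Assumption: each value is split on whitespace and lower-cased; the
    paper does not specify its vectorisation.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # cosine is undefined for a zero vector; treated as 0 here
    return dot / (norm_a * norm_b)


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string needs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


location_score = cosine_similarity("New York City", "New York")
name_distance = levenshtein("kitten", "sitting")  # classic example: 3 edits
```

Because word counts are never negative, this particular vectorisation yields cosine values between 0 and 1 only. The Levenshtein function keeps just two rows of the dynamic-programming table, which is enough to evaluate the recurrence for short profile fields such as names.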

ii) Network Similarity

Network similarity is calculated based on network relationships [1]. Here, the Followers_ids attribute is used to find the network similarity between the profiles. Followers_ids gives the list of accounts which follow the user. A clone profile tries to connect to the same set of users as the legitimate owner in order to appear genuine. So, by comparing the Followers_ids of two profiles, we can find whether they are similar with respect to network relationships or not.

Network similarity is calculated as given in equation (3)

\[ \mathrm{NetSim}(P_v, P_c) = \frac{|MFF_{vc}|}{\sqrt{|F_v| \cdot |F_c|}} \tag{3} \]

where [3]:
NetSim - Network Similarity
P_v - Profile of victim
P_c - Profile of clone
MFF_{vc} - Set of matching Followers_ids of P_v and P_c
F_v - Set of Followers_ids of P_v
F_c - Set of Followers_ids of P_c

If the NetSim value is greater than the threshold, the profile is treated as clone, else normal [1].

C. Clone Profile Detection using C4.5 algorithm

In this module, the C4.5 algorithm is used to detect whether the given profile is a clone or not. C4.5 is a decision tree algorithm used for classification. It builds a decision tree based on the given data. At each node of the tree, the attribute that most effectively splits the sample set into subsets is chosen.

The splitting criteria used in C4.5 are information gain and entropy. The attribute with the highest information gain is chosen to make the decision, and the algorithm then recurses over the partitioned subsets. The entropy from which the information gain is computed is given by equation (4)

\[ \mathrm{Info}(D) = -\sum_{i=1}^{n} P_i \log_2 P_i \tag{4} \]

where P_i is the probability that a record in D belongs to class i.

The C4.5 algorithm finds the similarity between the attributes by building a tree-like structure. The given profile is compared against the profiles which are already in the database. If the given profile matches any of the profiles in the database, the profile is termed as clone, else normal.

IV. EXPERIMENTS AND RESULTS

A. Datasets Used

The datasets used in the experiment are collected from MIB projects. They consist of Genuine and Fake Twitter datasets. The Genuine accounts dataset contains accounts of people who came forward to be part of an academic study for detecting fake accounts on Twitter, and it is mostly a mixture of accounts of researchers, social experts and journalists from Italy, the US and other European countries [4]. The fake accounts were purchased from three different Twitter online markets, namely fastfollowerz.com, intertwitter.com and twittertechnology.com [4].

B. Evaluation Metrics

In order to evaluate the performance of the system, various evaluation metrics are used, based on the following four standard indicators
• True Positive (TP): fake or clone records that are correctly detected by the system as fake or clone.
• True Negative (TN): genuine or normal records that are correctly detected by the system as genuine or normal.
• False Positive (FP): genuine or normal records that are wrongly detected by the system as fake or clone.
• False Negative (FN): fake or clone records that are not detected by the system.

The evaluation metrics considered are
1. Accuracy, which gives the ratio of the number of correct results to the total number of inputs
2. Precision, which gives the proportion of positive detections that were actually correct
3. Recall, which gives the proportion of actual positives that were detected correctly
4. F1 Score, which takes into account both precision and recall. The F1 score is the harmonic mean of precision and recall. Its best value is 1 and its worst value is 0.

TABLE I
PERFORMANCE EVALUATION OF FAKE DETECTION

Total no. of records checked                                 2200
No. of fake records detected by rule as fake (TP)             990
No. of genuine records detected by rule as fake (FP)          105
No. of fake records detected by rule as genuine (FN)          110
No. of genuine records detected by rule as genuine (TN)       995

TABLE II
PERFORMANCE EVALUATION OF CLONE DETECTION USING SIMILARITY MEASURES

Total no. of records checked                                  800
No. of normal records detected by system as normal (TN)       769
No. of normal records detected by system as clone (FP)         11
No. of clone records detected by system as normal (FN)          2
No. of clone records detected by system as clone (TP)          18

For detection of fake profiles, a total of 2200 accounts were fed into the system, of which 1100 were genuine and 1100 were fake. The rule set worked well and was able to classify genuine and fake accounts with an accuracy of 90.2%, as shown in Fig. 2. Table I gives the performance evaluation of the fake detection module.

For detection of clone profiles, 780 normal profiles along with 20 artificially generated clone profiles were fed to the
modules to check how accurately they detect clone profiles from the given set. The modules worked well and were able to detect clones with good accuracy. Table II and Table III give the performance evaluation of clone detection using Similarity Measures and using C4.5 respectively.

TABLE III
PERFORMANCE EVALUATION OF CLONE DETECTION USING C4.5

Total no. of records checked                                  800
No. of normal records detected by system as normal (TN)       765
No. of normal records detected by system as clone (FP)         15
No. of clone records detected by system as normal (FN)          4
No. of clone records detected by system as clone (TP)          16

Fig. 2. Performance Evaluation Result.

The results in Table II and Table III show that 18 out of 20 clones were detected using similarity measures, whereas only 16 clones were detected using the C4.5 classification algorithm. So it can be concluded that clone detection using similarity measures gives better results than clone detection using the C4.5 classification algorithm.

V. CONCLUSION

Fake and clone profiles have become a very serious problem in online social networks. We hear of one threat or another caused by these profiles in everyday life. So a detection method has been proposed which can find both fake and clone Twitter profiles. For fake detection, a set of rules was used which, when applied, can classify fake and genuine profiles. Clone detection was carried out using Similarity Measures and the C4.5 algorithm, and a comparison was made to check the performance. Clone detection using Similarity Measures worked better than C4.5 and was able to detect most of the clones which were fed into the system. In this work we have considered only the profile attributes for fake and clone detection. In future, this work can be extended by also taking tweets into consideration and applying NLP techniques.

REFERENCES

[1] Sowmya P and Madhumita Chatterjee, “Detection of Fake and Cloned Profiles in Online Social Networks”, Proceedings 2019: Conference on Technologies for Future Cities (CTFC)
[2] Georgios Kontaxis, Iasonas Polakis, Sotiris Ioannidis and Evangelos P. Markatos, “Detecting Social Network Profile Cloning”, 2013
[3] Piotr Bródka, Mateusz Sobas and Henric Johnson, “Profile Cloning Detection in Social Networks”, 2014 European Network Intelligence Conference
[4] Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi, “Fame for sale: Efficient detection of fake Twitter followers”, 2015 Elsevier’s journal Decision Support Systems, Volume 80
[5] Ahmed El Azab, Amira M Idrees, Mahmoud A Mahmoud, Hesham Hefny, “Fake Account Detection in Twitter Based on Minimum Weighted Feature set”, World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering Vol:10, 2016
[6] M.A. Devmane and N.K. Rana, “Detection and Prevention of Profile Cloning in Online Social Networks”, 2014 IEEE International Conference on Recent Advances and Innovations in Engineering
[7] Kiruthiga. S, Kola Sujatha. P and Kannan. A, “Detecting Cloning Attack in Social Networks Using Classification and Clustering Techniques”, 2014 International Conference on Recent Trends in Information Technology
[8] Buket Erşahin, Ozlem Aktaş, Deniz Kilinç, Ceyhun Akyol, “Twitter fake account detection”, 2017 International Conference on Computer Science and Engineering (UBMK)
[9] Arpitha D, Shrilakshmi Prasad, Prakruthi S, Raghuram A.S, “Python based Machine Learning for Profile Matching”, International Research Journal of Engineering and Technology (IRJET), 2018
[10] Olga Peled, Michael Fire, Lior Rokach, Yuval Elovici, “Entity Matching in Online Social Networks”, 2013 International Conference on Social Computing
[11] Aditi Gupta and Rishabh Kaushal, “Towards Detecting Fake User Accounts in Facebook”, 2017 ISEA Asia Security and Privacy (ISEASP)
[12] Michael Fire, Roy Goldschmidt, Yuval Elovici, “Online Social Networks: Threats and Solutions”, IEEE Communications Surveys & Tutorials
[13] Ashraf Khalil, Hassan Hajjdiab and Nabeel Al-Qirim, “Detecting Fake Followers in Twitter: A Machine Learning Approach”, 2017 International Journal of Machine Learning and Computing
[14] Mohammad Reza Khayyambashi and Fatemeh Salehi Rizi, “An approach for detecting profile cloning in online social networks”, 2013 International Conference on e-Commerce in Developing Countries: with focus on e-Security
[15] Mauro Conti, Radha Poovendran and Marco Secchiero, “FakeBook: Detecting Fake Profiles in On-line Social Networks”, 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
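
As a worked check of the evaluation metrics defined in Section IV-B, the following sketch recomputes accuracy, precision, recall and F1 score from the four counts reported in Table I for fake detection. The function name is illustrative, and the code assumes that none of the denominators is zero.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 score from the four indicators."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # correct results / all inputs
    precision = tp / (tp + fp)            # correct positives / detected positives
    recall = tp / (tp + fn)               # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts taken from Table I (fake detection)
table_i = evaluation_metrics(tp=990, fp=105, fn=110, tn=995)
for name, value in table_i.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 1985/2200, which rounds to the 90.2% reported for the fake detection module, and the recall is 990/1100 = 0.9.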
