
Hindawi

Wireless Communications and Mobile Computing


Volume 2022, Article ID 6356152, 10 pages
https://doi.org/10.1155/2022/6356152

Research Article
Machine Learning-Based Secure Data Acquisition for Fake
Accounts Detection in Future Mobile Communication Networks

B. Prabhu Kavin,1 Sagar Karki,2 S. Hemalatha,3 Deepmala Singh,2 R. Vijayalakshmi,4
M. Thangamani,5 Sulaima Lebbe Abdul Haleem,6 Deepa Jose,7 Vineet Tirth,8
Pravin R. Kshirsagar,9 and Amsalu Gosu Adigo10

1Sri Ramachandra Institute of Higher Education and Research and Technology, Chennai, India
2LBEF Campus (In Academic Collaboration with APU Malaysia), Kathmandu, Nepal
3Department of Computer Science and Engineering, Panimalar Institute of Technology, Chennai, India
4Department of Computer Science and Engineering, Velammal College of Engineering and Technology, Madurai, India
5Department of Information Technology, Kongu Engineering College, Perundurai, Tamil Nadu, India
6Department of Information & Communication Technology, South Eastern University of Sri Lanka (SEUSL), Sri Lanka
7KCG College of Technology, Karapakkam, Chennai, Tamil Nadu, India
8Mechanical Engineering Department, College of Engineering, King Khalid University, 61411 Abha, Asir, Saudi Arabia
9Department of Artificial Intelligence, G. H. Raisoni College of Engineering, Nagpur, India
10Center of Excellence for Bioprocess and Biotechnology, Department of Chemical Engineering, College of Biological and Chemical Engineering, Addis Ababa Science and Technology University, Ethiopia

Correspondence should be addressed to Pravin R. Kshirsagar; [email protected] and Amsalu Gosu Adigo; [email protected]

Received 20 December 2021; Revised 25 December 2021; Accepted 29 December 2021; Published 27 January 2022

Academic Editor: Deepak Kumar Jain

Copyright © 2022 B. Prabhu Kavin et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Social media websites are becoming more prevalent on the Internet, and users spend a significant share of their time online on sites such as Twitter, Facebook, and Instagram. On social media, people share thoughts, views, and facts and make new acquaintances. These sites supply users with a great deal of useful information, and this enormous quantity of social media data invites hackers to abuse it. Such hackers establish fraudulent profiles impersonating real people and distribute useless material. Spam content may include advertisements and harmful URLs that disrupt legitimate users, and it is a massive problem in social networks, so spam identification is a vital procedure on social media networking platforms. In this paper, we propose an artificial intelligence technique for spam detection on the Twitter social network. In this approach, we employ a support vector machine, an artificial neural network, and a random forest technique to build a model. The results indicate that, compared with the RF and ANN algorithms, the proposed support vector machine algorithm achieves the highest precision, recall, and F-measure. The findings of this paper would be useful in monitoring and tracking photos shared on social media to identify inappropriate content and forged images and to safeguard social media from digital threats and attacks.

1. Introduction

In the last few years, online social networks (OSNs), including Facebook, Twitter, and LinkedIn, have become extremely common. People use OSNs to remain in contact, exchange details, plan activities, and even operate their e-businesses [1]. The data set created has been preprocessed to identify false accounts on social networking sites, and intelligent systems have identified the false accounts. Random forest, neural network, and support vector machine classification output is used to identify fraudulent accounts. The precision rates for fake accounts are compared across these algorithms, and the

method with the highest accuracy is indicated [2]. In the past twenty years, social media have expanded exponentially. Various forms of social networking have gained vast numbers of people, many events have been created, and social networking carries a great many misleading profiles and much bogus news. The false accounts are used for multiple aims, including circulating rumors that impact a certain economy or even a culture as a wider market. The identification of deceptive news is an ongoing problem [3]. Twitter is a large form of online communication that contains vast knowledge, which opens up new opportunities for tweet content analysis. In reality, 74 percent of people state that either a "lack of IT infrastructure" or an overarching cost-benefit study is the main barrier to using such technology. Despite these obstacles, the technology appears to be gradually being embraced: more than half of the insurers analyzed said they have been utilizing antifraud technology solutions in the last five years, and several said in the last two years [2, 3].

Twitter offers several options for reporting spam to address assaults by hackers. By clicking on a report link, a web user can flag spam on a webpage; Twitter then evaluates the user reports and deactivates the spam profiles. The Twitter network works to reveal fraudulent messages and suspect accounts efficiently [4]. However, several real login credentials are also blocked when Twitter blocks harmful tweets and suspect profiles. We thus need effective ways to detect trash and spammers instantly, while such techniques have no impact on authentic user tweets. In this paper, we suggest an approach to detecting fraudulent social accounts, using a Twitter data set [5]. The data set obtained is used to produce a normal data collection. Content-based features and user-based features were the types of features retrieved. To develop a model with these features, we employ a support vector machine, an artificial neural network, and the random forest algorithm [1, 6].

2. Objective

In today's social networks, there are numerous issues such as fraudulent profiles and online impersonation. In this paper, we plan to highlight a conceptual model for the automatic recognition of fake accounts, to ensure that people's online lives are protected. By utilizing artificial intelligence techniques, we can also make it much simpler for sites to manage a larger number of accounts, which is incredibly difficult to accomplish manually at the moment due to a lack of resources.

3. Threats

As online social networks (OSNs) are widely used, many consumers are unequivocally vulnerable to both privacy and protection risks. These risks may be grouped into four primary groups [7].

(i) The first group involves classic risks to privacy and protection, which target not only OSN members but even web users who do not use social networking sites [7, 8]

(ii) The second group describes emerging risks, including attacks that are essentially new to the OSN ecosystem and use OSN technology to threaten the security and anonymity of users

(iii) The third group is combination risks, explaining how hackers today can, and sometimes do, merge multiple styles of threats to produce complex and deadly attacks [4]

(iv) The fourth and last group contains risks targeting children directly using social networks

Figure 1 illustrates all the risks discussed in the parts below. However, the boundaries between the several risks may be obscured, as strategies and goals sometimes overlap.

3.1. Classic Threats. Classic attacks have been a concern since the rise of the Internet. They remain a persistent problem and include malware and ransomware assaults, spam, cross-site scripting (XSS), and phishing. While these challenges have been discussed previously, owing to the structure and nature of OSNs, such threats have become highly viral and may easily propagate to network users [8]. Classic threats may exploit the personal details a user has posted in a social network not just to target the user but also his friends, simply by tailoring the threat with the user's details [5].

(i) Malware. Malware is software intended to disrupt a device's operation to capture user passwords and gain entry to private data. Social network malware uses the OSN framework to spread across members and their network mates. In certain instances, the malware may use the passwords acquired to impersonate the client and deliver messages to online contacts [9]

(ii) Phishing Attacks. A phishing attack is a type of social engineering in which the attacker poses as a reputable third party to obtain sensitive, personal user details. New research has found that people are more prone to phishing scams because of the social and trusting nature of engagement on social media platforms. One assault was perpetrated on Facebook and drew people to bogus Facebook login sites. The assault then spread among Facebook users by encouraging friends to click on the link on the initial user's profile [7]. Luckily, this assault was prevented by Facebook

(iii) Cross-Site Scripting (XSS). An XSS intrusion is a web-based assault. The intruder who uses XSS abuses the website client's trust and lets the client's computer run spyware to gather confidential details

3.2. Modern Threats. These risks are typically unique to online social networking, where attackers aim to collect the details of users and, in contrast to classic threats, of their connections as well. Attackers on social networking sites like Twitter aim at the confidentiality of a user

[Figure 1 shows a taxonomy of threats to online social network users: classic threats (malware, phishing attacks, cross-site scripting (XSS)); modern threats (clickjacking, de-anonymization attacks, fake profiles, inference attacks); combination threats; and threats targeting children (online predators, risky behaviors, cyberbullying).]

Figure 1: Threats to online social network users.

since this is highly essential for them. An attacker can access such data whenever confidential information is made public [2, 3]; otherwise, an attacker can send a friend request to users who have a private configuration, and the confidential information is communicated once the targeted user confirms the friend request. Below are the most prominent modern threats.

(i) Clickjacking. Clickjacking is a malicious process that tricks users into clicking on something other than what they intend to click on. In this way, the intruder can post spam notifications on a victim's Facebook wall and make the client unintentionally execute "like" actions [9]

(ii) Deanonymization Attacks. Users can protect their privacy and confidentiality using pseudonyms in certain OSNs such as Twitter and MySpace. Deanonymization attacks use cookie tracking, network architecture, and user community affiliation methods to expose the true identity of the user [7, 8]

(iii) Fake Profiles. Automatic or semiautomatic profiles (also known as sybils or social bots) simulate human actions in OSNs. Fake accounts may also be used to gather members' personal details from social networking sites. By initiating connection requests to other people in the OSN, who often grant the requests, social bots are able to capture private user details that should be accessible only to friends of the user [4]

(iv) Inference Attacks. Inference attacks are used in OSNs to predict users' confidential, private details, such as religious affiliation or sexual identity, that they did not want to reveal. Such attacks can be carried out using data mining methods in conjunction with accessible public OSN data, such as the network topology and user friends' data [7]

3.3. Combination Threats. To build a more complex threat, attackers today may mix classic and modern menaces. For example, a phishing attack can be used by an intruder to capture a Facebook user's password and then post messages with a clickjack on the user's timeline, so that the user's Facebook friends click on a posted message and get a hidden virus installed on their own devices [10]. An additional example is the usage of cloned accounts to obtain personal details about the cloned user's mates. The attacker could submit special, personalized spam emails containing a virus, using confidential info given by the victim's friends. The malware is much more likely to be triggered when it utilizes personal details [8].

3.4. Threats Targeting Children. Children, both small children and teens, certainly encounter the classic and modern threats described above, but some threats target younger OSN users deliberately and in particular [4, 10].

(i) Online Predators. The biggest issue regarding the privacy of children's confidential details is Internet child predators, commonly known as cyber predators. To better understand the danger and harm associated with such online events, EU Kids Online's Livingstone and Haddon described a typology [4]

(ii) Risky Behaviors. Children's potentially dangerous habits can involve overt Internet contact with strangers, the usage of discussion forums for encounters with strangers, sexually provocative conversations with strangers, and providing private details and images to strangers

(iii) Cyberbullying. Cyberbullying (also known as cyber abuse) occurs when an intruder uses the web to annoy the

victim by sending hurtful texts, making lewd comments, intimidating on multiple occasions, posting embarrassing images or videos of the victim, or participating in other offensive behaviors inside a technology network such as e-mail, chat, mobile conversations, and OSNs [1, 6]

4. Literature Review

To provide effective identification of bogus Twitter accounts and bots, [1] implemented a modern SVM-NN algorithm using feature selection and dimensionality reduction strategies. This suggested methodology (SVM-NN) uses fewer than 98% of the records of the training set but is still able to classify properly.

Fake accounts on social media, particularly on Facebook, were addressed in a 2018 study. A machine learning function was used in that study to help predict counterfeit accounts from their comments and the posts on their walls. Support vector machine (SVM) and naïve Bayes (NCB) classifiers, suitable for validating material based on classification and interpretation of text, were used. The authors of [3] suggested spatiotemporal mining in social grids, with latent semantic analysis, to identify the circle of consumers interested in malicious incidents. They then compared the effects of spatiotemporal coincidence with the results of the initial organizations/stories on the social network, as the wavelet covalue and real organization produce very motivating covalues.

A proof-of-concept enhancer model was developed in [6] and successfully used for the identification of bots. The authors of [9] detected spam in SMPs and used the value of features in iterating toward a higher-output collection of rules. Machine learning methods require environmental input to be adapted and improved. The authors of [7] effectively trained a neural network on error-level analysis of 4000 fake and 4000 real images; with a strong success rate, the trained neural network managed to classify a picture as fake or real.

A review of hackers on Twitter was proposed in [8] to help grasp their behavioral features. One hundred thousand messages were collected over one month to carry out the analysis. The assessment covered two separate spammer types using different trolling techniques. Three key groups of tools for identifying spammers were also identified: profile characteristics, social connections, and account assets. In [4], Facebook users were shown to accept friendship invites from strangers they may not know but with whom they share several mutual friends; users inadvertently divulge their private knowledge to complete strangers by accepting these friend requests.

The authors of [6, 11, 12] elaborate on the use of artificial intelligence for different classification and prediction problems and furthermore explain the use of hybrid artificial intelligence for feature extraction, classification, and prediction, along with modeling with different algorithms and optimization techniques [13].

In [10], the author addressed the interest in making progress on the effective recognition of false identities produced by people on SMPs and applied it to a series of fake human accounts.

5. Proposed System for Detecting Fake Accounts in Twitter Using AI

We utilized many approaches for spam detection in Twitter data in the proposed method, as shown in Figure 2. Each approach employs its own data set and data categorization functionality. Spam detection methods make use of a variety of forms of functionality, including user-based and content-based features and graphs, among others. The advantages and disadvantages of each extracted feature are addressed [10, 11]. We use these characteristics to develop a classification method that distinguishes between false and true information. To get the best classification results, we created an integrated classification model that includes support vector machines, artificial neural networks, and the random forest approach.

5.1. Data Collection. There are two ways to gather the data set needed for experimental evaluation. The first was to collect the information manually: here, users collect the information that is present and label it by hand. A Twitter account with 1150 followers was utilized to gather the data manually; these were the real accounts [12]. User profile data is collected via the Twitter REST API, and sets of three persons perform additional labeling and verification. Another data set, from "The Fake Project," was obtained and incorporated together with the data collected. The final data set consists of 7,973 accounts' information, divided into two portions: 75% used for model training and 25% used for model testing [14].

5.2. Data Preprocessing. The obtained information must be preprocessed through multiple measures before entering any classifier, to ensure the algorithm recognizes the data and creates the best possible model. Formatting and data cleaning are among the preprocessing activities [11]. Formatting is an essential step by which the data is made acceptably readable for the classifier, for example, by translating the data into a text file or a flat format. Cleaning means managing the missing values of a data set, like missing labels or missing values of certain properties; this is accomplished manually, by plurality voting over the matching values of other instances, or even by deleting certain instances that adversely influence the classifier's learning process [12]. Furthermore, cleaning also means deleting personal details that may breach the privacy of some people.

5.2.1. Tokenization. Tokenization is the breakdown of a text stream into words, sentences, symbols, or other essential components known as tokens. The objective is to explore the sentences of a passage one phrase at a time. The token list becomes an input for parsing or for text mining in further analysis. Tokenization is valuable both in linguistics (where textual material is segmented into a format) and in computer technology, as a component of reading passages [10, 11]. Textual knowledge at the beginning is in its most basic form: all recognized retrieval techniques need the terms of the data set, and for this purpose, a processor has to tokenize the data. This might be easy when the text is already recorded in readable codecs in the computer system.

[Figure 2 shows the proposed pipeline: collection of data → data preprocessing (tokenization, removing stop words, stemming) → feature selection → analysis using the AI model → account detection (fake or real).]

Figure 2: Fake account detection in Twitter using artificial intelligence.
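The stages of Figure 2 can be wired together as in this toy sketch. Every stage here is a simplified, hypothetical stand-in: a real system would use the full preprocessing of Section 5.2, the eleven features of Section 5.3, and a trained classifier from Section 5.4:

```python
def preprocess_tweet(tweet: str) -> list[str]:
    """Stand-in for Section 5.2: lowercase and strip simple punctuation."""
    return [w.lower().strip("#@!.,") for w in tweet.split()]

def extract_features(tweets: list[list[str]]) -> dict[str, float]:
    """Stand-in for Section 5.3: a single toy feature, the fraction
    of tokens drawn from a small spam-word list."""
    spam_words = {"free", "win", "click"}
    tokens = [t for tw in tweets for t in tw]
    return {"spam_word_ratio": sum(t in spam_words for t in tokens) / max(len(tokens), 1)}

def classify(features: dict[str, float], threshold: float = 0.2) -> str:
    """Stand-in for the trained AI model of Section 5.4."""
    return "fake" if features["spam_word_ratio"] > threshold else "real"

def detect_account(raw_tweets: list[str]) -> str:
    processed = [preprocess_tweet(t) for t in raw_tweets]
    return classify(extract_features(processed))

print(detect_account(["WIN free prizes!", "Click here to WIN"]))
```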

Nevertheless, certain issues remain, such as deleting punctuation marks; various characters such as brackets, hyphens, and others also have to be processed [12].

5.2.2. Stop Word Removal. Stop words are words more common than content-bearing phrases, like "and," "are," etc. They are not useful as a basis for the data collected and must thus be eliminated. Moreover, the distribution of such stop words across text documents is complicated and uneven [15]. This step reduces the text volume and enhances the performance of the approach, as every textual report includes such words that are not essential for textual data solutions.

5.2.3. Stemming. Stemming is a crude heuristic procedure that cuts off the ends of words, attaining this aim properly more often than not [7]. It frequently includes the removal of prefixes and suffixes, which are common occurrences in the English language.

5.3. Feature Selection. Eleven characteristics have been identified in the proposed spam detection approach. The retrieved characteristics are split into two categories.

5.3.1. User-Based Features. The activity of Twitter users is characterized using user-based characteristics, which are attributes unique to each user. These characteristics are based on data from the Twitter data set, which includes user relationships and user profiles, among other things. It is usual for users to collaborate with other users in online communities to build their social networks. Phishers would like to follow numerous accounts [13]; thus, they try to follow numerous people in order to spread disinformation. Usually, we assume that the number of people a fraudster follows is greater than the number of users who follow him. To construct a model, we make use of several user-based features [16]. User features are associated with user profiles, and the attributes of users are derived from their profiles. Our approach takes advantage of a variety of user-based characteristics, including:

(i) Number of Followers. This feature defines the number of other users in the network that are following the tweets from a profile. In general, the number of followers determines the attractiveness of a person's profile. Phishers are often less recognized and have a smaller number of followers than other types of users

(ii) Number of Following. This feature determines the set of other user profiles that a user is following. When you follow somebody on Twitter, their tweets may appear in your timeline. The Twitter network is aware of who you are following and who is following you

(iii) Age of Account. This feature indicates the date and time at which the account was established

(iv) Follower-to-Following Ratio. This is the relation between the number of accounts a user follows and the number of the user's followers. The FF ratio is usually lower for normal users, but for frauds, it is greater:

FF ratio = Number of following / Number of followers. (1)

(v) Reputation. This is the relation between the number of followers and the total number of followers

and following:

Reputation = Followers / (Followers + Following). (2)

5.3.2. Content-Based Features. These characteristics are linked to user tweets. Regular users rarely post duplicate material, yet a lot of duplicate tweets are posted by fraudsters. Content-based features are based on the material written by users, and spam communications may be detected with this content functionality. Fraudsters are malevolent people who distribute a lot of disinformation to members of the network [17, 18]; the disinformation comprises advertising and harmful links for their goods. Our method uses the following content-based features:

(i) Number of Tweets. A person's total number of tweets since the profile was first created

(ii) Hashtag Ratio. This is the proportion of duplicate hashtags to the number of unique hashtags multiplied by the total number of tweets submitted:

Hashtag ratio = Duplicate hashtags / (Unique hashtags × Tweet count). (3)

(iii) URL Ratio. This corresponds to the number of duplicate URLs in tweets relative to the number of distinct URLs multiplied by the number of tweets:

URL ratio = Duplicate URLs / (Unique URLs × Tweet count). (4)

(iv) Mention Ratio. Users of Twitter are recognized by an account @username, and @username can be tweeted anytime. Fraudsters exploit this function to send spam comments to real network members; such communications typically carry a significant number of mention tags, which leads users to regard the senders as spam users:

@Tweets = Tweets containing @ / Total number of tweets. (5)

(v) Tweet Frequency. Spammers typically tweet more frequently than legitimate Twitter users

(vi) Spam Words. We employ particular spam phrases and measure the number of times they appear in individuals' tweets. Fraudsters make use of these spam phrases to convey false information to Internet users

5.4. Analysis Using Artificial Intelligence Model. Once the features and the training and test sets have been established, it is essential to select the most appropriate classification approach for the model [19]. Saying that a single classification approach is perfect for every data set would be an exaggeration in the field of analytics; thus, a "fit model" must be created to achieve excellent efficiency based on the data.

5.4.1. Support Vector Machines (SVM). SVM is one of the most basic and useful techniques for data grouping, training, and prediction problems [13]. The support vectors are the input data points nearest to the decision boundary. Maximum-margin classification is the most fundamental and significant way to classify, being among the simplest classification models for lower-dimensional transfer learning with discrete classes [10]. SVM is a simple classification model, and Equation (6) is used to compute its output:

y(x) = sign( Σ_{k=1}^{n} α_k y_k φ(x, x_k) + b ), (6)

where α_k is a positive real constant and b is a real constant.

5.4.2. Artificial Neural Network (ANN). The ANN is a machine learning model based on the structure and function of biological neural networks. Input and output are transformed as knowledge flows across the network, and this flow affects the ANN structure [11]. The ANN is considered a nonlinear data modeling method that models complex input-output relations [14]. Three basic layers are contained in a neural network, as shown in Figure 3.

In other words, for a one-hidden-layer MLP, the output variable k_u(x) is given by

k_u(x) = A(o_2 + z_2 (s(o_1 + z_1 x))), (7)

where z_2 and z_1 are the weight matrices, A is the output activation (kernel) function, and o_2 and o_1 are the bias objects. Moreover, the hidden state h is defined as

h(x) = s(o_1 + z_1 x). (8)

During this method, iterations are used to reduce the number of potential errors until the necessary input-output mapping has been achieved; a collection of training data, including certain input vectors and their associated output vectors, is required here [15]. We learn all model parameters to train an MLP. Let theta = {z_2, o_2, z_1, o_1} be the set of parameters to learn.

5.4.3. Random Forest (RF). The random forest algorithm is a supervised classification algorithm. This algorithm generates a forest with many trees, as the name implies [12]. Broadly, the greater the number of trees in the forest, the more robust the random forest classification. The random forest learning algorithm uses the general bootstrap aggregation (bagging) technique.

[Figure 3 shows a feedforward network with an input layer of five inputs, a single hidden layer, and one output.]

Figure 3: Schematic diagram of ANN structure.
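To make Equations (7) and (8) concrete, here is a minimal pure-Python forward pass for a one-hidden-layer MLP like the one in Figure 3. The logistic function stands in for s, the output activation A is taken as the identity, and all weight values are purely illustrative:

```python
import math

def s(v):
    """Logistic activation applied elementwise to the hidden layer."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(z, x, o):
    """Affine map o + z*x for a weight matrix z and bias vector o."""
    return [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(z, o)]

def mlp_forward(x, z1, o1, z2, o2):
    """One-hidden-layer MLP of Equations (7)-(8):
    h(x) = s(o1 + z1*x), then k_u(x) = A(o2 + z2*h(x)) with A = identity."""
    h = s(matvec(z1, x, o1))   # Equation (8)
    return matvec(z2, h, o2)   # Equation (7)

# Tiny example: 2 inputs -> 2 hidden units -> 1 output (illustrative weights)
z1 = [[0.5, -0.2], [0.1, 0.4]]
o1 = [0.0, 0.1]
z2 = [[1.0, -1.0]]
o2 = [0.2]
print(mlp_forward([1.0, 2.0], z1, o1, z2, o2))
```

Training would iterate over labeled examples and adjust theta = {z2, o2, z1, o1} to reduce the error, as the text describes.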

Given a training set X = x_1, …, x_n with responses Y = y_1, …, y_n, bagging repeatedly (G times) selects a random sample with replacement from the training set and fits trees to these samples.

For g = 1, …, G:

(1) Sample, with replacement, n training examples from X and Y; call these X_g and Y_g

(2) Train a regression tree f_g on X_g and Y_g

Predictions for unseen samples x′ may be made after training by averaging the predictions of all the individual regression trees at x′, as shown in Equation (9):

f̂ = (1/G) Σ_{g=1}^{G} f_g(x′), (9)

or by taking the majority vote in the case of classification trees.

[Figure 4 shows the flow of rumor detection: tweets relating to each subject are split by whether the tweet's source is a media source (news set vs. public set); each corresponding pair is labeled as match/mismatch based on semantic and sentimental analysis; the mismatch ratio is then calculated and compared against a threshold to decide match or mismatch.]

Figure 4: Flow chart of rumor detection in Twitter.

5.5. Evaluation and Assessment. This section describes the evaluation in terms of positives (P) and negatives (N). A fake profile correctly detected is a true positive (TP), a legitimate profile correctly recognized is a true negative (TN), a legitimate profile wrongly flagged as fake is a false positive (FP), and an undetected fake profile is a false negative (FN) [14, 15].

Accuracy is determined as the ratio of correctly classified profiles over the total number of profiles, as shown in Equation (10):

Accuracy = (TP + TN) / (TP + FN + FP + TN). (10)

Precision is measured as the proportion of spam profiles accurately predicted against the total number of profiles predicted as spam; in other terms, it is the proportion of flagged profiles that truly are junk profiles, as illustrated in Equation (11):

Precision = TP / (TP + FP). (11)

Recall is the percentage of spam profiles accurately predicted against the total number of real spam profiles, as described in Equation (12):

Recall = TP / (TP + FN). (12)

F-measure is calculated as the harmonic mean of precision and recall, as shown in Equation (13):

F-measure = (2 × Precision × Recall) / (Precision + Recall). (13)

The area under the ROC curve (AUC) is a well-known indicator for assessing classifier consistency. For a random classifier, the AUC value equals 0.5, while AUC equals 1 for a perfect classifier, as described in Equation (14).
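The counting behind Equations (10)-(13) can be sketched directly. The label names and the example predictions below are hypothetical:

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, TN, FP, FN for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                 # Equation (10)
    precision = tp / (tp + fp)                                 # Equation (11)
    recall = tp / (tp + fn)                                    # Equation (12)
    f_measure = 2 * precision * recall / (precision + recall)  # Equation (13)
    return accuracy, precision, recall, f_measure

y_true = ["spam", "spam", "real", "real", "spam", "real"]
y_pred = ["spam", "real", "real", "spam", "spam", "real"]
print(metrics(y_true, y_pred))
```

In practice the same quantities are available from library routines (e.g., scikit-learn's precision, recall, and F1 scores), but the hand computation makes the definitions explicit.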

99 Table 1: Efficiency of each algorithm utilizing user-based features.


98
97 Algorithms Precision (%) Recall (%) F-measure (%)
96 SVM 97.45 98.19 97.32
95 RF 95.56 95.06 95.64
Rate (%)

94 ANN 93.21 92.65 92.73


93
When comparing the SVM method to the ANN algorithm and the RF
92 algorithm, the SVM algorithm has the highest accuracy, recall, and F-
91 measure.
90
89
Precision (%) Recall (%) F-measure (%) ð1 ð1
TP FP 1
Performance metrics AUC = d = TPdFP: ð14Þ
0 P N P:N 0
Figure 5: Performance of artificial intelligence algorithms using user-based features.

Figure 6: Performance of artificial intelligence algorithms using content-based features.

5.6. Rumor Detection in Twitter. In our context, a rumor is any information posted on Twitter that many people believe to be true but that contradicts the news tweets of authenticated news outlets. This assumption is the basis of our methodology [14]: "Twitter's authenticated TV network accounts would have credible proof compared with the innocent unverified user accounts [20]." The procedure a verified news source follows before posting facts provides the basis for this premise. News agencies verify material before it is released [7, 9]. They are held accountable for the facts they share, strive to uphold their integrity, and publish correct information as quickly as possible, taking into account that the news affects a broad user base. Twitter verifies the identities of news channels and blocks fraudulent profiles that impersonate them, so facts from an authenticated media outlet account can be trusted. Figure 4 gives the flow chart for the algorithm [13, 15]. To detect disinformation, the tweets are split into two sets, news and public, according to whether or not their source is a news outlet account. Tweets from news channels are classified as news tweets; all other tweets belong to the public set. Both sets are subjected to semantic and sentiment analysis. Finally, every pair in the cross product of the public and news tweet sets is classified as a match or a mismatch.
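The splitting and pairing steps described above can be sketched as follows. This is only an assumed reconstruction: the `Tweet` record, the verified-account set, and the keyword-based `sentiment` scorer are hypothetical stand-ins (a real system would use Twitter account metadata and an NLP polarity model), not artifacts of the paper.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Tweet:
    author: str
    text: str

# Hypothetical handles of authenticated news outlet accounts (assumption).
VERIFIED_NEWS_ACCOUNTS = {"@bbcnews", "@reuters"}

def split_tweets(tweets):
    """Partition tweets into news and public sets by source account."""
    news = [t for t in tweets if t.author in VERIFIED_NEWS_ACCOUNTS]
    public = [t for t in tweets if t.author not in VERIFIED_NEWS_ACCOUNTS]
    return news, public

def sentiment(text):
    # Placeholder polarity scorer: +1 if the text affirms, -1 if it negates.
    return -1 if "not" in text.lower() else 1

def label_pairs(news, public):
    """Classify each (news, public) pair in the cross product as a
    match (same polarity) or mismatch (opposite polarity)."""
    return [
        "match" if sentiment(n.text) == sentiment(p.text) else "mismatch"
        for n, p in product(news, public)
    ]

tweets = [
    Tweet("@bbcnews", "The bridge has reopened."),
    Tweet("@alice", "The bridge has reopened!"),
    Tweet("@bob", "The bridge has not reopened."),
]
news, public = split_tweets(tweets)
print(label_pairs(news, public))  # ['match', 'mismatch']
```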
The mismatch ratio, which represents the extent to which the press and the public differ, is then measured according to Equation (15).

Mismatch Ratio = N / K, (15)

where N is the number of public tweets with the contrary polarity and K is the total number of public tweets.

A topic is classified as a rumor if its mismatch ratio is larger than a threshold value (say 25%): in that case, information that conflicts with information from verified sources is believed by the public and continues to be spread.
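Equation (15) and the threshold rule can be sketched as below; the tweet counts are toy values for illustration, not data from the paper, and the 25% threshold follows the example given in the text.

```python
def mismatch_ratio(n_contrary, k_total):
    # Equation (15): N contrary-polarity public tweets out of K total.
    return n_contrary / k_total

def is_rumor(n_contrary, k_total, threshold=0.25):
    # A topic is flagged as a rumor when its mismatch ratio
    # exceeds the chosen threshold (25% in the text).
    return mismatch_ratio(n_contrary, k_total) > threshold

# Toy topic: 12 of 40 public tweets contradict the news account's polarity.
print(mismatch_ratio(12, 40))  # 0.3
print(is_rumor(12, 40))        # True
print(is_rumor(5, 40))         # False
```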

Figure 7: Performance with user-based and content-based features of artificial intelligence algorithms.

6. Result and Discussion
The main purpose of this paper is to measure the performance of the spam detection classification models on Twitter. User-based and content-based features are suggested and retrieved from social networking sites to identify spam
on Twitter. We analyze classification performance using artificial neural networks, support vector machines, and random forests. Individual classification trials are conducted on each algorithm. For the trials, 75% of the Twitter data set is picked randomly for training, and the remaining 25% is reserved for classification tests. We use a set of measurements termed precision, recall, and F-measure to evaluate the whole methodological procedure.

All classification algorithms are first developed and validated independently on the user-based features. Each classifier is then independently trained and assessed on the content-based features. Finally, all classifiers are assessed on the combined user-based and content-based features, as shown in Figures 5–7. Table 1 discusses the user-based performance of the classifiers, Table 2 covers the content-based performance, and Table 3 demonstrates the performance with combined user-based and content-based features.

Table 2: Efficiency of each algorithm utilizing content-based features.

Algorithms   Precision (%)   Recall (%)   F-measure (%)
SVM          93.34           93.239       93.11
RF           90.89           90.21        90.42
ANN          89.45           76.90        80.78

When comparing the SVM method to the ANN algorithm and the RF algorithm, the SVM algorithm has the highest precision, recall, and F-measure.

Table 3: Efficiency of each algorithm utilizing user-based features and content-based features.

Algorithms   Precision (%)   Recall (%)   F-measure (%)
SVM          97.43           95.70        94.84
RF           92.47           93.16        91.95
ANN          91.12           86.45        85.09

When comparing the SVM method to the ANN algorithm and the RF algorithm, the SVM algorithm has the highest precision, recall, and F-measure.

7. Conclusion

This paper gives a systematic analysis of essential approaches for identifying fraudulent accounts on online social networking sites (OSNs) such as Facebook. The primary techniques, as well as a broad range of approaches that may be used for detecting fraudulent accounts in OSNs, are addressed. Because of the huge amount of information available on social media platforms, it has become increasingly difficult for consumers to find accurate and useful data in recent years. This paper offers a hybrid collection of features for identifying spam messages on social networking platforms. We have developed an extensive approach for spam identification in the Twitter dataset, employing SVM, ANN, and RF algorithms together with hybrid features, namely user-based and content-based features. The recall, precision, and F-measure obtained with the SVM algorithm are quite effective in our technique. In the future, we plan to broaden our approach to include more types of characteristics and to conduct similar tests on other social media networks that have significant amounts of data.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

Conflict of interest is not applicable in this work.

Acknowledgments

The authors thankfully acknowledge the Deanship of Scientific Research, King Khalid University, Abha, Asir, Kingdom of Saudi Arabia, for funding the project under grant number R.G.P1./74/42.

References

[1] Y. Boshmaf, D. Logothetis, G. Siganos et al., "Íntegro: leveraging victim prediction for robust fake account detection in large scale OSNs," Computers & Security, vol. 61, pp. 142–168, 2016.
[2] A. M. Hemeida, S. Alkhalaf, A. Mady, E. A. Mahmoud, M. E. Hussein, and A. M. Baha Eldin, "Implementation of nature-inspired optimization algorithms in some data mining tasks," Ain Shams Engineering Journal, vol. 11, no. 2, pp. 309–318, 2020.
[3] N. Kasliwal and T. Bachhav, "Detection of fake accounts of Twitter using SVM and NN algorithms," IEEE Transactions on Dependable and Secure Computing, vol. 5, no. 1, pp. 37–48, 2019.
[4] S. D. P. Reddy, "Fake profile identification using machine learning," International Research Journal of Engineering and Technology (IRJET), vol. 6, no. 12, pp. 1145–1150, 2019.
[5] A. Zubiaga, K. Aker, M. Bontcheva, and R. P. Liakata, "Detection and resolution of rumours in social media: a survey," ACM Computing Surveys (CSUR), vol. 51, no. 2, pp. 1–36, 2018.
[6] P. Kshirsagar and D. S. Akojwar, "Classification and prediction of epilepsy using FFBPNN with PSO," in IEEE International Conference on Communication Networks, Gwalior, 2015.
[7] A. M. Al-Zoubi and H. Faris, "Spam profiles detection on social networks using computational intelligence methods: the effect of the lingual context," in Proceedings of the 19th International Conference on World Wide Web, pp. 851–860, Raleigh, NC, April 2010.
[8] H. Faris and I. Aljarah, "Improving email spam detection using content based feature engineering approach," in Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), pp. 1–6, Aqaba, Jordan, October 2017.
[9] A. el Azab, M. A. Mahmood, and A. el-Aziz, Fraud News Detection for Online Social Networks: Web Usage Mining Techniques and Application across Industries, IGI Global, 2017.
[10] E. Caldeira, G. Brandao, and A. C. M. Pereira, "Fraud analysis and prevention in e-commerce transactions," in Web Congress (LA-WEB), 2014 9th Latin American, pp. 42–49, Minas Gerais, Brazil, 2014.

[11] P. Kshirsagar, S. Akojwar, and N. D. Bajaj, "A hybridised neural network and optimisation algorithms for prediction and classification of neurological disorders," International Journal of Biomedical Engineering and Technology, vol. 28, no. 4, p. 307, 2018.
[12] P. Kshirsagar and S. Akojwar, "Novel approach for classification and prediction of non linear chaotic databases," in 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 514–518, Chennai, India, 2016.
[13] P. Kshirsagar and S. Akojwar, “Classification & detection of
neurological disorders using ICA & AR as feature extractor,”
International Journal of Scientific Engineering and Science
(IJSES), vol. 1, no. 1, 2015.
[14] J. Jiang, C. Wilson, X. Wang et al., “Understanding latent
interactions in online social networks,” in Proceedings of the
10th ACM SIGCOMM Conference on Internet Measurement,
ACM, pp. 369–382, Melbourne, Australia, 2019.
[15] M. Praveena, R. Asha Deepika, and C. Sai Raghavendhar,
“Analysis on prediction of heart disease using data mining
techniques,” Journal of Advanced Research in Dynamical and
Control Systems, vol. 10, no. 2, pp. 126–136, 2018.
[16] A. Kundu, S. Panigrahi, S. Sural, and A. K. Majumdar, "BLAST-SSAHA hybridization for credit card fraud detection," IEEE Transactions on Dependable and Secure Computing, vol. 6, no. 4, pp. 309–315, 2018.
[17] C. Buntain and J. Golbeck, “Automatically identifying fake
news in popular twitter threads,” in Proceedings of the IEEE
International Conference on Smart Cloud, pp. 208–215, New
York, 2017.
[18] S. Tschiatschek, A. Singla, M. Gomez Rodriguez, A. Merchant,
and A. Krause, “Fake news detection in social networks via
crowd signals,” in Proceedings of the World Wide Web Confer-
ences, pp. 517–524, France, 2018.
[19] A. Munther, “A preliminary performance evaluation of
Kmeans, KNN and EM unsupervised machine learning
methods for network flow classification,” International Journal
of Electrical and Computer Engineering (IJECE), vol. 6, no. 2,
p. 778, 2016.
[20] S. Abu-Nimeh, T. Chen, and O. Alzubi, “Malicious and spam
posts in online social networks,” Computer, vol. 44, no. 9,
pp. 23–28, 2011.
