(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 14, No. 12, 2023

A Sophisticated Deep Learning Framework of Advanced Techniques to Detect Malicious Users in Online Social Networks

Sailaja Terumalasetti¹, Reeja S R²
School of Computer Science and Engineering, VIT-AP University, Amaravati, India

Abstract—Malicious user detection is an active area of cybersecurity research because of the emerging risks of data breaches and cyberattacks. Malicious users have the potential to harm a system by engaging in unauthorized actions or stealing sensitive data. This paper proposes the dual-powered CLM technique (convolutional neural networks and LSTM) with an optimization technique, a methodology for distinguishing malicious user behavior that combines LSTM and CNN and then applies an optimization technique to enhance the results. A genetic algorithm is used to improve the model's ability to detect changing and nuanced malicious behavior by fine-tuning its parameters. Due to the rising risk of data breaches and cyber-attacks, malicious user identification in OSNs (Online Social Networks) is a significant research topic in cybersecurity. The proposed technique seeks to identify anomalous user behavior patterns by analyzing the vast quantities of data generated by digital systems with CLM and optimizing detection accuracy with genetic algorithms. The performance of the proposed methodology was measured on TwiBot-20, a public social media bot dataset comprising user activity data. The results showed that, in comparison to conventional machine learning algorithms such as SVM and RF, which obtained 92.3% and 88.9% accuracy respectively, our technique achieved a higher accuracy of 98.7%. Moreover, the other metrics were assessed, and the proposed technique outperformed the traditional machine learning algorithms in each case.

Keywords—Online social networks; malicious user behavior; convolutional neural networks; long short-term memory; genetic algorithm

ABBREVIATIONS
Acronym   Definition
OSN       Online Social Networks
CNN       Convolutional Neural Network
LSTM      Long Short-Term Memory
GA        Genetic Algorithm
CLM       Convolutional Neural Network and LSTM
NLP       Natural Language Processing
TP        True Positive
FP        False Positive
TN        True Negative
FN        False Negative
Acc       Accuracy
Prec      Precision
Rc        Recall
F1s       F1-Score

I. INTRODUCTION
Online social networks have become an indispensable element of our everyday life. Platforms like Facebook, Twitter, Instagram, and LinkedIn have transformed the way we interact, exchange data, and associate with people.

For cybersecurity professionals, identifying malicious users in online social networks (OSNs) is a difficult task. With the rise of social media, people can now readily interact with friends and family, discuss their views and opinions, and even conduct business online. OSNs have become a crucial part of the contemporary era, promoting connectivity and information sharing. However, because these platforms are open, they are prone to various sorts of misuse, including malicious user activity. This research deals with the crucial theme of detecting and mitigating malicious user behavior. OSNs are virtual communities that allow individuals to associate and communicate with one another on a certain topic or just "hang out" [1].

With billions of users worldwide, OSNs have become an indispensable component of modern civilization. Individuals are increasingly using OSN sites due to the rapid growth of Web 2.0 technology. The rise of malicious individuals in online social networks, on the other hand, has become a substantial concern for both users and researchers. Criminal hackers frequently exploit social media to spread spam and malware, which is known as social malware. These destructive users will not "fit" into any of these classifications because they have mutual friends and interests and develop very large social networks. The emergence of harmful users in online social networks is a growing source of concern for users. According to one assessment, the number of fraudulent social media profiles generated grew by 100% in the first half of 2020. According to another survey, the number of social media phishing attacks grew by 500% in the first quarter of 2021. These statistics emphasize the importance of detecting and preventing fraudulent users in online social networks.

Security is of utmost importance in the present era, since the majority of our private and sensitive data is stored digitally. Malicious users have the potential to harm the system by engaging in unauthorized actions or stealing sensitive information [3]. Access restrictions, intrusion
detection systems, and firewalls are examples of traditional security measures that can help prevent attacks to some extent but are not particularly effective when it comes to malicious user detection. Machine learning algorithms have been used to analyze user activity and detect anomalies. However, the accuracy of these algorithms is largely determined by the quality and volume of training data.

A malicious user uses a computer system or network with the intent to cause harm, steal data, or disrupt normal operations [1]. Malicious users may have numerous motives, including financial gain, retaliation, or political involvement. They may use a variety of strategies to accomplish their goals, including malware, phishing, social engineering, and exploiting vulnerabilities in software and hardware.

Analyzing user behavior is one way to identify malicious users. By keeping an eye on user activity patterns and detecting abnormalities, it may be possible to flag suspicious behavior for further investigation. CNN and LSTM networks are examples of machine learning techniques that can be used to automatically analyze large datasets of user behavior and identify patterns that may be indicative of harmful conduct [2]. By searching for the ideal set of hyperparameters, genetic algorithms (GAs) may be employed to improve the performance of the model.

Malicious users pose a severe threat to individuals, governments, and organizations. They are able to steal private information, jeopardize the security of systems, and harm a company's reputation and brand. Therefore, it is essential to have effective techniques for identifying and mitigating the actions of harmful users [3]. The rise in these daily threats over the past ten years is the main cause for concern for data security. Fig. 1 illustrates the trend of threats over the past decade.

[Figure: bar chart "Frequency of Threats"; horizontal axis: threats in billions (0 to 300); one bar per year from 2012 to 2022]
Fig. 1. Frequency threats in real-time.

A unique approach is proposed that combines the dual-powered CLM (convolutional neural networks and LSTM) with an optimization technique. The combination of deep learning and evolutionary computation gives the technique the adaptive capabilities needed to safeguard OSNs. The proposed method is evaluated on a user activity dataset from an OSN, and the results are compared with those of conventional machine learning techniques [4].

The motivation for the proposed CLM and optimization method is to identify hazardous users in order to improve security and defend against cyberattacks. Exploiting system vulnerabilities, gaining unauthorized access, stealing sensitive data, and interrupting system operations can harm people and companies. Firewalls and antivirus software do not always stop complex attacks, so modern methods are needed to detect and prevent them [13].

A. Organisation of the Paper
The paper is organised as follows: Section II - Literature Review, Section III - Proposed Methodology, Section IV - Experimental Evaluations and Results, Section V - Conclusion and References.

II. LITERATURE REVIEW
CNNs are a type of deep learning neural network frequently used for processing images and videos. They have been shown to be highly effective at solving challenging computer vision problems, including segmentation, object identification, and image classification. The core principle of CNNs is to extract features from images using convolutional filters and then to classify or identify objects using these features [5]. CNNs have revolutionised the field of computer vision and enabled a variety of applications, from self-driving cars to medical imaging.

CNNs have also become significantly more popular in voice and image recognition tasks. They capture spatial and temporal patterns in data because they are built on the notions of local connectivity and shared weights. In a CNN model, data inputs such as images or data sequences are passed through numerous layers of convolution, pooling, and activation functions. Fully connected layers then take the output of these layers and classify it into several categories [7].

The LSTM variant of the recurrent neural network (RNN) addresses the vanishing gradient problem that affects regular RNNs [6]. The vanishing gradient problem occurs when gradients shrink as they propagate over time, making it challenging to train the network on long sequences.
This problem is resolved by LSTM, which has a particular form of memory cell that can store information for longer. Three gates govern the cell: the input gate, the forget gate, and the output gate. The forget gate regulates the retention of previous data, the input gate controls the flow of new information into the cell, and the output gate regulates the cell's output.

LSTM has been shown to be effective in a wide range of applications, including speech recognition, machine translation, and NLP (natural language processing). It has also been used for anomaly detection and time-series prediction tasks, where it can discover temporal relationships and long-term trends in data [8].

The Genetic Algorithm (GA) is a heuristic optimization method based on natural selection and evolution. It is used to address optimization problems that require determining the optimal parameter combination for a given objective function. The GA generates a population of candidate solutions, known as chromosomes. Each chromosome is composed of a series of genes that represent various parameters of the problem being optimized. These parameters can include any form of data, including numerical values, Boolean values, and text. The GA then evaluates the fitness value of each chromosome in the population using the objective function. The fitness value measures how well the chromosome solves the problem. The GA then chooses the population's top chromosomes to serve as the parents of the following generation.

The next generation is produced by applying genetic operators such as crossover, mutation, and selection to the parents. Crossover exchanges genes between two chromosomes to produce new offspring, whereas mutation alters certain genes in a chromosome at random. In selection, the population's fittest chromosomes are chosen to serve as the parents of the following generation [27] [29].

Using the concepts of natural selection and evolution, GA is a powerful optimization technique that can uncover the best solutions to challenging problems. It is widely used across several disciplines, including computer science, engineering, and finance.

The importance of malicious user detection is growing because of security theft and data privacy concerns. Information in the digital world has to be secure. The identification of malicious users is an important part of cybersecurity [9] [30]. User behaviour analysis (UBA) is a technology that employs machine learning and data analytics to detect abnormal conduct that might indicate a malicious user. This literature review looks at some current studies on detecting illegitimate users using UBA.

Han et al. [2] proposed a machine learning-based approach to identifying illicit behaviour based on host process data. The authors analysed user behaviours and identified abnormal behaviour using big data. The study showed that UBA can be an effective method for detecting harmful activities. Ranjan and Kumar [6] introduced a user behaviour analysis system that uses data analytics and machine learning to distinguish between malicious and genuine users. The authors analysed user behavioural data using multiple machine-learning methods to identify unusual behaviours. The study demonstrated that UBA can be a useful method for detecting malicious users. Tanuja et al. [12] proposed a machine learning technique for identifying fraudulent social network users. The authors analysed user activity data using multiple machine-learning methods to identify abnormal conduct that may indicate a deceitful user. Statistical analysis has also been used to identify anomalous user behaviours that may point to a malicious user and to lessen their negative impacts [10]. Several patents pertaining to the detection of malicious users are accessible on Google Patents, including a framework for mobile advanced persistent threat detection, a deep learning method for detecting covert channels in the domain name system, and a technique for detecting insider and masquerade attacks by identifying malicious user behaviour [11] [12].

III. PROPOSED METHODOLOGY
A. System Model
This section describes the system model for malicious user detection through user behavior using the CLM and optimization technique. Fig. 2 gives an overview of the system. The classification model comprises the data collection and preprocessing module, the CLM and optimization technique, and the evaluation module.

Fig. 2. Schematic diagram of proposed methodology.

The problem addressed by malicious user detection with the CLM and optimization technique is to develop an artificial intelligence model that can accurately identify malicious users from user behavior information collected from a wide variety of sources. The model
should be able to handle large datasets, noisy data, and a wide variety of malicious behavior types, including network attacks, system intrusions, and user impersonation. The objective is to develop a prototype that can be deployed in a real-world setting to detect and prevent malicious user behavior before it can cause damage to users or systems. The system aims to provide a reliable and precise methodology for identifying malicious users by scrutinizing their behavioral patterns. The system attempts to capture both the spatial and temporal aspects of user behavior data by using CNN and LSTM neural networks [17] [18]. The Genetic Algorithm is also used to improve the model's performance and optimize its parameters.

The system's input is a dataset of user behavior that comprises elements such as login patterns, session length, transaction history, and other relevant data. The data is pre-processed in order to normalize and encode it for the neural networks. The CNN module of the classifier extracts spatial features from the input data, while the LSTM component captures the temporal relationships [26]. The CNN and LSTM model is then trained using the training dataset [19] [20]. The Genetic Algorithm is used to optimize the model: it examines various combinations of hyperparameters to find the set of parameters that maximizes the detection accuracy [23].

Algorithm 1: Data Preprocessing
Initialize
BEGIN
    Step 1: Load the dataset
    Step 2: Handle the missing values
            Replace with mean or median values
    Step 3: Normalize the features
    Step 4: Split the dataset
            1. Training dataset
            2. Testing dataset
    Step 5: Feature selection and feature extraction
    Step 6: Handle the time series data
    Step 7: Data augmentation
    Step 8: Finalize the pre-processed dataset
END
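Steps 2 to 4 of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `preprocess`, the use of mean imputation and min-max scaling, the 80/20 split, and the toy data are all hypothetical choices. The sketch follows the order given in Algorithm 1 (normalize, then split); in practice the normalization statistics are often computed on the training split only.

```python
import numpy as np

def preprocess(features, labels, test_fraction=0.2, seed=0):
    """Impute, normalize, and split a feature matrix (Steps 2 to 4 of Algorithm 1)."""
    x = np.asarray(features, dtype=float).copy()
    y = np.asarray(labels)

    # Step 2: replace missing values (NaN) with the mean of their column.
    col_mean = np.nanmean(x, axis=0)
    missing = np.isnan(x)
    x[missing] = np.take(col_mean, np.nonzero(missing)[1])

    # Step 3: min-max normalize each feature to [0, 1]; constant columns map to 0.
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    x = (x - lo) / span

    # Step 4: shuffle the rows and split them into training and testing sets.
    order = np.random.default_rng(seed).permutation(len(x))
    n_test = int(round(test_fraction * len(x)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Hypothetical toy data: each row is a user, each column a behavioural feature.
raw = [[1.0, 200.0], [np.nan, 100.0], [3.0, np.nan], [5.0, 400.0], [7.0, 300.0]]
x_train, y_train, x_test, y_test = preprocess(raw, [0, 1, 0, 1, 0])
```

Median imputation, which Algorithm 1 also allows, would only require swapping `np.nanmean` for `np.nanmedian`.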
1) Data collection: The data collection and preparation module is responsible for gathering user behavior data and converting it into a format that can be used by the AIMDS model. This module collects data from many sources, such as network traffic logs, user input logs, and system logs, and then pre-processes the data to eliminate noise, missing values, and other abnormalities [16].

a) Dataset acquisition: A large-scale dataset containing user behaviour information is obtained from a reliable internet platform. The dataset includes a variety of features such as user activities, timestamps, and session information.

b) Data Pre-processing: The dataset is pre-processed to remove excessive or redundant features, handle missing values, and normalize the data [14]. Pre-processing may include feature selection, data cleaning, and categorical variable encoding.

c) Data preparation: The pre-processed data is then prepared for model training. The data is split into training, validation, and testing sets. The training set is used to train the model, the validation set is used to fine-tune the hyperparameters, and the testing set is used to assess the performance of the model.

Preprocessing the input data first entails filtering and normalizing the user behavior data to remove noise and insignificant data. Once the preprocessed data has been passed to the feature extraction layer, that layer extracts useful features that may be used for further analysis [21]. Algorithm 1 gives an overview of the sequence of preprocessing steps that the dataset undergoes after it has been assembled.

2) Algorithm implementation: The CLM and optimization technique model is responsible for assessing the pre-processed user behavior data and determining whether or not a certain user is acting maliciously. This model is made up of two key parts: the CNN and LSTM layers, which extract features from user behavior data, and the genetic algorithm, which optimizes the parameters of the CLM (convolutional neural networks and LSTM) model to enhance its accuracy [24] [25].

Detecting malicious user behavior using the dual-powered CLM technique with an optimization technique involves several algorithms, formulas, and techniques.

a) Architecture: The CLM and optimization model architecture is designed around the three algorithms. The CNN layer extracts spatial features from the data, the LSTM layer captures the temporal dynamics of user behaviour [15], and the GA layer optimizes the model's hyperparameters.

b) Training: The CLM and optimization model is trained using the prepared data. The model is trained on the training set and then validated on the validation set. During the training phase, the loss function is minimized using optimization techniques such as stochastic gradient descent or Adam optimization.

c) Hyperparameter tuning: The hyperparameters of the model are optimized using the GA. The GA explores the hyperparameter space for the combination of hyperparameters that maximizes the model's performance. The GA's fitness function is based on evaluation measures such as Acc, Prec, Rc, and F1s.

d) CLM Algorithm: Algorithm 2 gives the details of initializing the convolution layer parameters and applying the
activation function. The mathematical formulation of the CNN component involves convolution and pooling operations. Let X denote the input data, C the convolutional layer output, and P the pooling layer output. Eq. (1) and Eq. (2) give the desired outcome.

$$C = \mathrm{ReLU}(\mathrm{Conv}(X)) \quad (1)$$

$$P = \mathrm{MaxPool}(C) \quad (2)$$

Algorithm 2: CLM
BEGIN
    Initialize CNN parameters
        f  = filter size
        n  = number of filters
        d  = dropout rate
        fz = filter sizes
    Define CNN
        input_layer = Input(shape=input_shape)
        conv_layers = []
        for f in fz:
            conv_layer = Conv1D(filters=n, kernel_size=f, activation='relu')(input_layer)
            pool_layer = MaxPooling1D(pool_size=x)(conv_layer)
            conv_layers.append(pool_layer)
        merged_layer  = Concatenate(axis=1)(conv_layers)
        flatten_layer = Flatten()(merged_layer)
        dropout_layer = Dropout(d)(flatten_layer)
        output_layer  = Dense(num_classes, activation='softmax')(dropout_layer)
    Compile and train the model
END

e) Optimization algorithm: Algorithm 3 describes the genetic algorithm: initialization of the population, evaluation of fitness, and probabilistic selection to find the best solution. The fitness function in the genetic algorithm measures the quality of each potential solution (chromosome). The fitness value is determined by the problem's objective and can be a combination of metrics such as accuracy, precision, recall, or F1-score. The fitness function directs the genetic algorithm's selection, crossover, and mutation processes.

Algorithm 3: Genetic Algorithm
BEGIN
    GA(Ft, Ft_th, s, f, m)
        Ft:    fitness function that assigns an evaluation score
        Ft_th: fitness threshold
        s:     number of hypotheses to be included in the population
        f:     fraction of the population to be replaced
        m:     mutation rate
    Step 1: Initialization
            Define population size
    Step 2: Evaluation
            Compute fitness
            Calculate fitness score
    Step 3: Selection
            The probability Pr(h_i) of selecting hypothesis h_i is
            $$\Pr(h_i) = \frac{Ft(h_i)}{\sum_{j=1}^{s} Ft(h_j)}$$
    Step 4: Crossover
            Select pairs of hypotheses from P
            For each pair, produce offspring by applying crossover
    Step 5: Mutation
            Choose members with uniform probability
    Step 6: Update
    Step 7: Evaluate
            Retrieve the best solution
END

3) Evaluation and detection: The evaluation module is responsible for establishing that the CLM and optimization technique model is accurate and successful at detecting harmful user behavior. This module typically tests the model's performance on a test set and compares its accuracy, precision, recall, and F1 score to those of other machine learning models such as SVM and Random Forest.

a) Evaluation: The efficacy of the CLM and optimization technique is assessed using the testing set. The assessment metrics used include Acc, Prec, Rc, and F1s. The results are compared to other methodologies to assess the efficacy of the proposed methodology.

b) Malicious user detection: The trained CLM and optimization technique model is applied to detect malicious users based on their conduct. The model accepts data on user behaviour as input and produces the probability of the individual being malicious. A threshold on the output probability is defined to classify users as malicious or non-malicious. The methodology's architecture is depicted in Fig. 3.
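The genetic algorithm loop of Algorithm 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `genetic_algorithm`, the discrete search space, single-point crossover, and the `toy_fitness` function are hypothetical. In the setting described in the paper, the fitness function would instead train the CLM model with the candidate hyperparameters and return a validation metric such as accuracy.

```python
import random

def genetic_algorithm(fitness, gene_choices, pop_size=20, replace_frac=0.6,
                      mutation_rate=0.1, fitness_threshold=None,
                      generations=30, seed=0):
    """Fitness-proportionate GA over discrete genes; fitness must be positive."""
    rng = random.Random(seed)
    n_genes = len(gene_choices)  # assumed to be at least 2
    # Step 1: initialise a random population of chromosomes.
    population = [[rng.choice(c) for c in gene_choices] for _ in range(pop_size)]
    n_children = 2 * int(replace_frac * pop_size / 2)  # offspring come in pairs

    for _ in range(generations):
        # Step 2: evaluate every chromosome with the fitness function.
        scores = [fitness(h) for h in population]
        if fitness_threshold is not None and max(scores) >= fitness_threshold:
            break
        # Step 3: select survivors with probability proportional to fitness.
        survivors = rng.choices(population, weights=scores, k=pop_size - n_children)
        # Step 4: single-point crossover on fitness-weighted parent pairs.
        children = []
        while len(children) < n_children:
            a, b = rng.choices(population, weights=scores, k=2)
            cut = rng.randrange(1, n_genes)
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Step 6: survivors and offspring form the next population (as copies).
        population = [list(h) for h in survivors + children]
        # Step 5: with probability mutation_rate, resample one random gene of a member.
        for h in population:
            if rng.random() < mutation_rate:
                i = rng.randrange(n_genes)
                h[i] = rng.choice(gene_choices[i])
    # Step 7: return the fittest chromosome of the final population.
    return max(population, key=fitness)

# Hypothetical search space: number of filters, kernel size, dropout rate.
choices = [[32, 64, 128], [3, 5, 7], [0.2, 0.3, 0.5]]

def toy_fitness(h):
    """Stand-in for a validation score; peaks at (64, 5, 0.3)."""
    filters, kernel, dropout = h
    return 1.0 / (1.0 + abs(filters - 64) / 64 + abs(kernel - 5) + abs(dropout - 0.3))

best = genetic_algorithm(toy_fitness, choices)
```

The sketch has no elitism, so the best chromosome of one generation can be lost in the next; keeping the top candidate unchanged is a common variation when each fitness evaluation is as expensive as training a network.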


Fig. 3. Architecture to detect malicious user.

IV. EXPERIMENTAL EVALUATIONS AND RESULTS
A. Dataset Description
The TwiBot-20 dataset, specifically designed for social media bots, serves as a substantial and comprehensive benchmark for detecting Twitter bots. Its purpose is to address the difficulties posed by a small dataset size and to accurately reflect both actual people and Twitter bots found in the real world. The collection comprises 229,573 users, 33,488,192 tweets, 8,723,736 user property items, and 455,958 follow relationships. It includes a comprehensive range of automated accounts and authentic users to depict the real Twitter community more accurately. The dataset contains three types of user information, which may be used both for classifying individual users into two categories and for developing community-aware methods. The three modalities are semantic information, property information, and neighborhood information. The TwiBot-20 dataset is accessible for academic research and is hosted by the Bot Repository [22]. This benchmark is one of the most extensive collections of Twitter bot detection data available. It serves as a useful tool for training and assessing the proposed model, which aims to identify harmful users in online social networks, specifically in the context of Twitter bot identification.

To achieve optimal performance in identifying harmful user activity, it is vital to conduct experiments and carefully tune the settings. The properties of the dataset, the kind of malicious activity, and the computational resources available for training and optimization all play a role in the selection of parameters. Tuning these parameters efficiently frequently requires iterative refinement based on performance data and domain expertise.

B. Experimental Results
Numerous indicators may be used to measure the success of a system built to identify harmful user behaviour using CLM and optimization techniques [28] [31]. The frequently used assessment metrics are considered in the following subsections.

C. Accuracy
Accuracy assesses the overall efficacy of the model's predictions. It is the ratio of correctly identified cases (both malicious and non-malicious) to the total number of cases in the dataset. A higher accuracy suggests better performance. Eq. (3) can be used to evaluate the accuracy. Fig. 4 compares the present model with the previous models.

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$

[Figure: bar chart "Performance of Accuracy"; vertical axis 0 to 100; bars for AIMDS, SVM, and RF with data labels 95, 89, and 91]
Fig. 4. Accuracy comparison.

D. Precision
Precision is the proportion of correctly recognized harmful users among all instances predicted to be malicious [22]. It is the ratio of TP (malicious users accurately predicted) to the total of TP and FP (non-malicious users wrongly categorized as malicious). A higher precision
suggests that there are fewer false positives. Eq. (4) is used to evaluate the precision. Fig. 5 compares the present model with previous models.

$$\mathrm{Prec} = \frac{TP}{TP + FP} \quad (4)$$

E. Recall
Recall (Rc), also known as sensitivity or the true positive rate, measures the fraction of actual malicious users correctly recognized by the model. It is the ratio of TP to the total of TP and FN (malicious users mistakenly categorized as non-malicious). A higher recall means that there are fewer false negatives. Eq. (5) is used to evaluate the recall. Fig. 5 compares the present model with previous models.

$$\mathrm{Rc} = \frac{TP}{TP + FN} \quad (5)$$

[Figure: chart "PERFORMANCE" showing PRECISION and RECALL for CLM TECHNIQUE, RF, and SVM]
Fig. 5. Comparison of precision and recall.

F. F1 Score
The F1 score (F1s) combines precision and recall into a single statistic that balances their respective trade-offs. It provides a broad evaluation of the model's performance and is the harmonic mean of precision and recall. A higher F1 score suggests a better balance of precision and recall. Eq. (6) evaluates the F1 score.

$$\mathrm{F1s} = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rc}}{\mathrm{Prec} + \mathrm{Rc}} \quad (6)$$

Table I and Fig. 6 summarize the overall performance of the CLM and optimization technique alongside the traditional algorithms. The experimental study of the CLM and optimization technique used a combination of CNN, LSTM, and genetic algorithms (GA) to assess user behaviour in order to identify malicious users. The performance of the proposed strategy was assessed on the TwiBot-20 dataset, which consists of user behaviour data gathered from an online platform. The results show how well the CLM technique (convolutional neural networks and LSTM) and optimization technique perform in correctly classifying malicious users based on their behaviour patterns. Fig. 7 gives a comparison of the evaluation metrics.

The proposed methodology consistently outperforms existing methodologies and traditional models, as demonstrated by the assessment measures. The genetic algorithm's ability to adapt is a crucial factor in achieving enhanced performance through hyperparameter optimization and feature selection. Conventional models may have difficulty capturing the ever-changing and dynamic aspects of user behavior, while the proposed model excels at identifying intricate patterns.

[Figure: chart "Comparison" with series CLM Technique, SVM, and RF; vertical axis 87 to 96; horizontal axis 0 to 5]
Fig. 6. Performance of proposed approach.

TABLE I. PERFORMANCE EVALUATION
Metric (%)    Proposed Technique    SVM    RF
Accuracy      95                    89     91
Precision     94                    88     92
Recall        93                    90     89
F1 Score      93                    89     90


Fig. 7. Comparison of evaluation of metrics. (Chart title: "Comparison of Evaluation Metrics"; series: AIMDS, RF, SVM; axes: Accuracy, Precision, Recall, F1 Score.)

The proposed model performs strongly; nonetheless, it has
limitations. Its efficacy may vary depending on the distinguishing
features of different social platforms and on the characteristics of
the malicious activities. Furthermore, future research could explore
the interpretability of the model, explicitly addressing concerns
about the opaque nature of deep learning models.

V. CONCLUSION

This paper proposed a novel architecture, the CLM technique
(convolutional neural networks and LSTM) combined with an optimization
technique, to detect harmful user behavior through user behavior
analysis. The combination of CNN, LSTM networks, and genetic algorithms
(GA) produced promising results in reliably detecting fraudulent users.
The CNN captured spatial patterns in the TwiBot-20 dataset efficiently,
while LSTM networks captured temporal dependencies and sequential
patterns in user behavior sequences. The genetic algorithm helped
optimize the model parameters and improve model performance. On the
TwiBot-20 dataset, the CLM and optimization technique surpassed
conventional machine learning algorithms, including SVM and Random
Forest, in terms of accuracy (Acc), precision (Prec), recall (Rc), and
F1-score (F1s). This demonstrates the utility of deep learning and
genetic algorithms for identifying harmful user behavior. Overall, the
technique provides a strong foundation for identifying fraudulent user
behavior using deep learning methods and paves the way for future
research in deep learning, genetic algorithms, and user behavior
analysis aimed at more advanced and accurate detection systems.

User behavior data alone may not be sufficient to give a complete
picture of harmful activity. The detection system's accuracy may be
improved by additional data sources such as network traffic statistics,
device information, and contextual data. Future work might therefore
investigate the integration of multiple data modalities to increase
detection capacity. The prospects for detecting harmful user behavior
in online social networks are broad and promising. Ongoing research in
these areas will not only improve current models but also aid in the
design of online security systems that are more ethical, transparent,
and user-friendly. This line of work aims to tackle the complex issues
presented by malicious user behavior in the digital domain through a
multidisciplinary approach that encompasses computer science, social
sciences, and ethics.
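The role of the genetic algorithm in tuning model parameters can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the search space (`learning_rate`, `lstm_units`, `conv_filters`), the population settings, and the `toy_fitness` function, which stands in for the validation score of a trained CNN-LSTM model.

```python
import random

# Hypothetical search space; names and value ranges are assumptions.
SEARCH_SPACE = {
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "lstm_units": [32, 64, 128, 256],
    "conv_filters": [16, 32, 64, 128],
}


def toy_fitness(ind):
    """Stand-in for validation accuracy: counts genes matching an arbitrary target."""
    target = {"learning_rate": 0.001, "lstm_units": 128, "conv_filters": 64}
    return sum(ind[k] == target[k] for k in target)


def random_individual(rng):
    # One candidate configuration: a random value for every hyperparameter.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {k: rng.choice([a[k], b[k]]) for k in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    # With probability `rate`, resample a gene from its allowed values.
    return {
        k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
        for k, v in ind.items()
    }


def run_ga(fitness, generations=30, pop_size=20, elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = population[:elite] + children  # elitism keeps the best
    return max(population, key=fitness)


best = run_ga(toy_fitness)
print(best, toy_fitness(best))
```

In practice, the expensive step is the fitness evaluation, since each candidate requires training and validating a model; the selection, crossover, and mutation logic stays the same.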


[16] Terumalasetti, S. (2022, August). A Comprehensive Study on Review of [24] Kim, J., Park, M., Kim, H., Cho, S., & Kang, P. (2019). Insider threat
AI Techniques to Provide Security in the Digital World. In 2022 Third detection based on user behavior modeling and anomaly detection
International Conference on Intelligent Computing Instrumentation and algorithms. Applied Sciences, 9(19), 4018.
Control Technologies (ICICICT) (pp. 407-416). IEEE. [25] Qiu, J., Shen, X., Guo, Y., Yao, J., & Fang, R. (2019, August). Detecting
[17] Wu, X., Sun, Y. E., Du, Y., Xing, X., Gao, G., & Huang, H. (2020). An malicious users in online dating application. In 2019 5th International
efficient malicious user detection mechanism for crowdsensing system. Conference on Big Data Computing and Communications
In Wireless Algorithms, Systems, and Applications: 15th International (BIGCOM) (pp. 255-260). IEEE.
Conference, WASA 2020, Qingdao, China, September 13–15, 2020, [26] Kiran, K., Manjunatha, C., Harini, T. S., Shenoy, P. D., & Venugopal,
Proceedings, Part I 15 (pp. 507-519). Springer International Publishing. K. R. (2019, March). Identification of anomalous users in Twitter based
[18] Sarker, I. H., Kayes, A. S. M., Badsha, S., Alqahtani, H., Watters, P., & on user behaviour using artificial neural networks. In 2019 IEEE 5th
Ng, A. (2020). Cybersecurity data science: an overview from machine International Conference for Convergence in Technology (I2CT) (pp. 1-
learning perspective. Journal of Big data, 7, 1-29. 5). IEEE.
[19] Wanda, P., Hiswati, M. E., & Jie, H. J. (2020). DeepOSN: Bringing deep [27] Hong, T., Choi, C., & Shin, J. (2018). CNN‐based malicious user
learning as malicious detection scheme in online social network. IAES detection in social networks. Concurrency and Computation: Practice
International Journal of Artificial Intelligence, 9(1), 146. and Experience, 30(2), e4163.
[20] Mou, G., & Lee, K. (2020). Malicious bot detection in online social [28] Yu, J., Wang, K., Li, P., Xia, R., Guo, S., & Guo, M. (2017). Efficient
networks: arming handcrafted features with deep learning. In Social trustworthiness management for malicious user detection in big data
Informatics: 12th International Conference, SocInfo 2020, Pisa, Italy, collection. IEEE Transactions on Big Data, 8(1), 99-112.
October 6–9, 2020, Proceedings 12 (pp. 220-236). Springer [29] Saracino, A., Sgandurra, D., Dini, G., & Martinelli, F. (2016). Madam:
International Publishing. Effective and efficient behavior-based android malware detection and
[21] Samokhvalov, D. I. (2020). Machine learning-based malicious users' prevention. IEEE Transactions on Dependable and Secure
detection in the VKontakte social network. Труды института Computing, 15(1), 83-97.
системного программирования РАН, 32(3), 109-117. [30] Khan, M. U. S., Ali, M., Abbas, A., Khan, S. U., & Zomaya, A. Y.
[22] Rabbani, M., Wang, Y. L., Khoshkangini, R., Jelodar, H., Zhao, R., & (2016). Segregating spammers and unsolicited bloggers from genuine
Hu, P. (2020). A hybrid machine learning approach for malicious experts on twitter. IEEE Transactions on Dependable and Secure
behaviour detection and recognition in cloud computing. Journal of Computing, 15(4), 551-560.
Network and Computer Applications, 151, 102507. [31] Khan, M. U. S., Ali, M., Abbas, A., Khan, S. U., & Zomaya, A. Y.
[23] https://botometer.osome.iu.edu/bot-repository/datasets.html [Dataset]. (2016). Segregating spammers and unsolicited bloggers from genuine
experts on twitter. IEEE Transactions on Dependable and Secure
Computing, 15(4), 551-560.
