Optimizing YouTube Spam Detection With Ensemble Deep Learning Techniques

The document discusses the optimization of YouTube spam detection using ensemble deep learning techniques, specifically focusing on CNN, LSTM, Attention, and a hybrid CNN-LSTM model. The proposed stacking classifier achieved an accuracy of 93.5%, outperforming traditional spam detection methods and enhancing user experience on the platform. The study emphasizes the need for effective spam removal algorithms to protect content creators and users from malicious activities in YouTube comments.

Uploaded by

Sandeep N

Optimizing YouTube Spam Detection with Ensemble Deep Learning Techniques


2024 14th International Conference on Cloud Computing, Data Science & Engineering (Confluence) | 979-8-3503-4483-7/24/$31.00 ©2024 IEEE | DOI: 10.1109/CONFLUENCE60223.2024.10463326

A. Ilavendhan, SCOPE, Vellore Institute of Technology, Chennai, India
Srinivasa Narayanan A., SCOPE, Vellore Institute of Technology, Chennai, India
N. Janani, SCOPE, Vellore Institute of Technology, Chennai, India
Abstract—Spam comments are a pressing problem on social media platforms. On YouTube they degrade content quality and user experience, and scams such as Ponzi schemes and phishing are on the rise. Unwanted self-promotion and redirection to malicious sites, which lead to financial and information loss, are the targets of detection here; the aim is to save content creators' time by simplifying filtering and to make automatic spam removal more efficient. Although tools such as Google Safe Browsing and the YouTube team's blocker are in use, they block only basic links and fail to protect users in real-time scenarios. We propose a stacking classifier, an ensemble approach to spam detection that combines several deep learning techniques (LSTM, CNN, Attention, and a hybrid LSTM-CNN model) and produces the final classification with a DNN as the meta classifier. Our stacking approach achieves an accuracy of 93.5%, and a comparative study with the individual models is also presented.

Keywords—Deep Learning, Ensemble Techniques, spam, classifier, Stacking.

I. INTRODUCTION

In the vast expanse of cyberspace, where billions of users flock to the ever-evolving digital landscape of YouTube for entertainment, information, and connection, a lurking shadow casts doubt on the authenticity and security of the platform. As the YouTube community continues to grow, the volume of user-generated content, especially in the form of comments, continues to expand too. This shadow takes the form of YouTube comment scams and nefariously designed traps that exploit trust and curiosity for malicious gain. While comments provide an avenue for engagement, discussion, and feedback, they also attract an ever-increasing menace: comment spam. On YouTube, comment spam is a persistent and ubiquitous issue that can negatively affect viewers' experiences, harm a channel's reputation, and even encourage illegal activity. It is therefore crucial to identify and eliminate comment spam to preserve the reliability and usability of this cherished platform. Developers have used a variety of strategies and algorithms to combat this issue, but spam comments on YouTube persist. This research focuses on ensemble approaches that combine CNN, LSTM, Attention, and a CNN-LSTM hybrid model to prevent fraudulent YouTube comments. These state-of-the-art neural network topologies advance spam detection through their powerful text-processing capabilities.

In this study, we investigate CNN, LSTM, Attention, and CNN-LSTM hybrid ensemble approaches for YouTube comment spam detection. We examine the subtleties of comment spam, looking at how it changes over time and the different strategies spammers use. We then investigate how these neural network models enhance the efficiency of our ensemble strategy. The design, implementation, and assessment of an ensemble-based comment spam detection system created specifically for YouTube are the main focus of our study. We compare the output of our ensemble approach with other state-of-the-art techniques using precision, recall, F1-score, and accuracy. By leveraging deep learning models, we aim to increase efficiency and save time for content creators, who often remove spam comments manually. The proposed algorithm aims to enhance user experience and provide a better and safer platform.

This paper centers on combining deep learning techniques with ensembling and measuring how accurately the combination classifies spam while reducing false positives. The rest of the paper is structured as follows. Section II reviews the contributions of other researchers on this issue. Section III briefly explains the experimental framework and methodology used to obtain the results, followed by the recorded results and comparative analysis in Section IV. The next section describes the approaches followed by different researchers on various platforms and analyzes their efficiency and drawbacks.

II. COMPREHENSIVE REVIEW OF EXISTING APPROACHES

Various deep learning techniques have been implemented in spam classifiers on different platforms. In [1], the authors used a TF-IDF (term frequency-inverse document frequency) model together with a Random Forest classification algorithm. The model outperformed other algorithms in terms of accuracy. It gave a precision of 0.98 and an F-measure of approximately 0.97 on the SMS dataset extracted from the UCI ML Repository.
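For readers unfamiliar with the TF-IDF weighting used in [1], the following is a minimal sketch of the textbook formula, assuming whitespace tokenization and an unsmoothed logarithmic inverse document frequency. The function name `tf_idf` and the toy messages are hypothetical and are not taken from [1] or from any dataset discussed in this paper; library implementations typically apply smoothing and normalization that differ from this version.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy messages, used only to exercise the function.
docs = [
    "win a free prize now",
    "free entry win cash",
    "see you at lunch",
]
weights = tf_idf(docs)
```

In this toy corpus, "prize" occurs in one message and "free" in two, so "prize" receives the larger weight in the first message; a term present in every message would receive a weight of zero.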

In [2], Xuemin Chen et al. proposed a Hidden Markov Model (HMM) for detection, which considers word order information and not just the form of the word. The model also addresses a weakness of traditional feature extraction algorithms such as BoW and TF-IDF, which fail when term frequency is low and then give inaccurate results. When compared with other models such as naïve Bayes, Latent Dirichlet Allocation (LDA), Support Vector Machines (SVM), Long Short-Term Memory (LSTM), and CNN, the HMM method achieved an accuracy of 95.9%, the highest after CNN's 97.9% on the UCI English dataset. The model achieved a higher accuracy on the Chinese dataset, attributed to the varied forms of English words.

In [3], the authors proposed a different approach: semi-supervised novelty detection with a one-class SVM (OC-SVM) for spam detection, trained on ham data. This makes the method usable when labeled spam data is lacking. A one-class classification model, once fitted to the required data, identifies whether a sample belongs to the desired class or is an outlier. The ham data is pre-processed, then integer encoded and embedded into low-dimensional vectors, which yields a dataset ready for training the OC-SVM model. The OC-SVM outperformed the best supervised learning models using binary, frequency, and TF-IDF features, with a spam detection rate of 100%. The OC-SVM achieved an overall accuracy of 98%, beating the best binary model (97.2%), the best frequency-based model (97.4%), and the best TF-IDF model (97.3%).

A vanilla Transformer-based model was proposed in [4] to tackle spam detection in SMS. The authors reviewed and compared the performance of many traditional machine learning models as well as CNN and LSTM, among which the CNN-LSTM solution performed best, with 98.3% accuracy and an F1-score of 0.914. The authors proposed a modified form of the regular transformer with the introduction of memory, a list of trainable parameters that acts as a substitute for the output sequence embedding. A sigmoid function is used as the final activation function; it transforms the output of the linear layers after the decoders to produce a binary result classifying the message as spam or ham. The transformer model, when compared with other typical spam classifiers, achieved the best accuracy of 98.92%. The proposed model also outperformed the other models on the UtkMI's Twitter dataset, with scores of 97.06%, 0.8746, 0.8576, and 0.8600 for accuracy, precision, recall, and F1-score respectively.

In [5], one of the most widely used DL models, the Recurrent Neural Network (RNN), and its variants such as LSTM were applied and proved extremely effective compared with the approaches tried in earlier research. In [6], an attention-based sequential model originally designed for translation was also highly successful in English-French and English-German translation.

In [7], the research focuses on obtaining hyperparameters for ensemble models, mostly Random Forest and XGBoost, and improves predictive accuracy by mining decision trees on email headers to distinguish between complete and partial spam. With over 90% accuracy, it combines PCA and CFS for feature selection, and it is further enhanced by NB, PSO, and GDTPNLP approaches. An ensemble model combines multiple different prediction models and derives one final prediction from them. In [8], the researchers proposed an ensemble approach with two models, soft voting and hard voting. Hard voting takes the class predicted by most of the models, while soft voting averages the predicted class probabilities. The results show that ESM-S achieved the highest accuracy, 95.06%.

Seungwoo Choi et al. [9] worked on an algorithm, specifically for TED Talks videos, that identifies comments carrying information about the contents of the video. However, the result proved inadequate for processing the sentiments and feelings behind opinions expressed on digital platforms.

The authors of [10] conducted a comprehensive study on YouTube comment spam detection, using the YouTube API to collect original comments from the 5 most popular videos. These comments were manually classified as either spam or ham using a collaborative classifying tool. In the data preprocessing phase, they used techniques such as the Bag of Words model (BoW) and term frequency (TF). BoW is used only for initial data preprocessing. The limitations of BoW and TF are explained in [11]. They partitioned their dataset into 70% for training and 30% for testing, and the test was run again on a fresh dataset to confirm the earlier result. All ten classifiers achieved over 90% accuracy, with less than 5% false positives. Notably, classifiers such as CART, SVM-L, SVM-R, RF, LR, and NB-B demonstrated an exceptional level of accuracy, reaching a 99.9% confidence level. A subsequent study [12] aimed to improve on the results in [10] by introducing an Artificial Neural Network (ANN) classification model. The authors used the same dataset, but the comments were paraphrased and presented more technically, without altering their content. On a YouTube spam dataset, the research in [13] examined 20 machine learning techniques. The gradient boosting technique LightGBM surpassed all others with 94% accuracy, outperforming the Random Forest classifier by 7% and Logistic Regression by 1%. LightGBM's success was largely due to its capacity to prioritize the information gained from data samples with large gradients, suggesting the possibility of further improvement using dynamic datasets from the Google API. DL models handle context and semantic analysis better while providing more or less the same accuracy. Our method improves on the NLP processing difficulty and limited adaptability of these ML models, making it more efficient and precise.
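To make the difference between the hard and soft voting schemes described for [8] concrete, the following is a small sketch in plain Python. It assumes each base classifier outputs a probability per class; the function names and the probability values are hypothetical and are not taken from [8].

```python
from collections import Counter


def hard_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(prob_rows):
    """Average each class probability across classifiers and return the
    index of the class with the highest mean probability."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    means = [
        sum(row[c] for row in prob_rows) / n_models
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=means.__getitem__)


# Hypothetical outputs of three classifiers for one comment,
# given as [P(ham), P(spam)]; class 0 is ham and class 1 is spam.
probs = [[0.55, 0.45], [0.60, 0.40], [0.05, 0.95]]

# Each classifier's own label is its most probable class.
labels = [max(range(len(row)), key=row.__getitem__) for row in probs]

hard_result = hard_vote(labels)   # two of three classifiers say ham
soft_result = soft_vote(probs)    # mean P(spam) is 0.6, so spam
```

With these values the two schemes disagree: two weakly confident ham votes win under hard voting, while one highly confident spam prediction dominates the averaged probabilities under soft voting.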

In [14], Roy et al. used deep learning methods to separate spam from genuine SMS messages by combining LSTM and CNN. They compared their strategy with well-established machine learning techniques such as LR, Stochastic Gradient Descent, Gradient Boosting, and Random Forest. According to the results, the CNN-LSTM models outperformed the other approaches. Additionally, they suggested two methods for SMS spam identification: one that combined an MLP network with a pre-trained "BERT" language model for 98.37% accuracy, and a hybrid CNN-LSTM classifier for 94.37% accuracy.

Across the different platforms surveyed above, spam detection reaches approximately 90-94% accuracy; for YouTube spam, however, only 89-90% has been recorded. Our objective is to improve the accuracy and decrease false positives by combining deep learning approaches with ensembling techniques. This process is explained in detail in Section III of this paper.

III. METHODOLOGY

This section discusses the overall approach and method used to classify YouTube comments: DL methods such as LSTM, CNN, a CNN-LSTM combination, and Attention models are analyzed individually, and their results are compared with those obtained by combining them.

Fig. 1. Architectural diagram for processes involved in the classification.

The dataset is taken from the comment sections of 5 different music artists. It was obtained directly from Kaggle, already labeled as spam or ham, and provides 1.6k+ comments to work with. First, the raw dataset is preprocessed using NLP techniques to make it ready for implementation. In the implementation phase, all DL techniques introduced above and discussed below are applied to the dataset, and values such as F1-score, accuracy, and precision are recorded for a comparative analysis against an ensembled, stacked model. In this way, we can judge whether the approach and objective of this paper have succeeded or failed in addressing this issue. The processes followed are elaborated below, starting with Data Selection in part (A).

A. Data Selection

A dataset from Kaggle has been used in this paper. It consists of a spam/ham collection from 5 popular global artists including Katy Perry, LMFAO, Psy, and others. The dataset is approximately balanced, with 1005 spam and 750 ham comments. Since it is pre-labelled, supervised learning is employed here. The dataset is a CSV file with comment id, author, date, content, and label as columns. For future work, procuring the comments using the YouTube API would

provide a richer but unbalanced dataset.

TABLE I. SPAM/HAM DATASET FOR 5 YOUTUBE ARTISTS FROM UCI REPOSITORY

Dataset     YouTube ID     Spam   Ham
Psy         9bZkp7q19f0    175    175
KatyPerry   CevxZvSJLk8    175    175
LMFAO       KQ6zr6kCPj8    236    202
Eminem      uelHwf8o7_U    245    203
Shakira     pRpeEdMmmQ0    174    196

The dataset is now selected and labeled, and it is made ready for NLP operations in section (B).

B. Data Pre-Processing

This is an essential process that makes the dataset ready for the different techniques. It removes spaces, special symbols, periods, capitalization, and paragraph breaks to normalize the text. The process follows standard NLP steps such as stemming, tokenization, and lemmatization before the text leaves the elimination stage.

C. Feature Extraction

This step converts raw data into numeric values that can be processed. Two major techniques are used here:

• TF-IDF: Words are given weights based on how important they are to the document and to the overall dataset, which transforms raw text into numerical values. It highlights terms that occur frequently in the given spam data but rarely in legitimate messages, and which are therefore likely indicators of spam.

• Word embeddings (such as Word2Vec and GloVe): Here, Word2Vec-Google News is used. Word embeddings use dense vector representations of words to capture semantic links. They encapsulate semantic information and can be used to represent text as input to the algorithms. Specifically, they generate a low-dimensional dense vector that retains this collected information and also helps identify nuisance texts that are spam.

The algorithms also expect vector inputs, which are obtained with Count Vectorizer and the Tokenizer API from TensorFlow Keras. Here, 80% training, 10% validation, and 10% test samples are used.

A. Implementation of various DL models and other techniques

We now separate the data to be used for training from the data to be used for testing. This data is used to construct the classifiers. The methods used here are:

• LSTM: Long Short-Term Memory, used for semantic synthesis in text analysis, is a popular deep learning model. It converts the word vectors of a text corpus into sentence representations. With the addition of three gates (an input gate, an output gate, and a forget gate), LSTM addresses the vanishing gradient issue that plagues classic RNNs and improves information retention by better handling sequences of varying length.

• CNN: Convolutional Neural Networks require a systematic strategy to be used for spam identification. The CNN architecture consists of a convolutional layer and a pooling layer followed by fully connected layers, designed to extract local text features and perform binary spam classification. Once trained, the CNN can be used effectively to detect spam in a variety of text data applications.

• Attention: It improves classification precision. By fine-tuning the attention model on comments that have been flagged as spam and ham and using a proper threshold, one can achieve successful categorization and increase the accuracy of spam detection in YouTube comments.

• CNN-LSTM Hybrid: Text comments are first tokenized, preprocessed, and embedded. CNN layers extract local features from the text, while LSTM layers capture sequential patterns. By combining local and sequential information, the model is better able to distinguish between spam and ham.

• Ensembling Techniques: An ensemble takes several base models into account and gives a single predictive model as output. It provides a better result than any of the individual base models.

The dataset is thus divided in the ratio 80:10:10 into training, validation, and test data. The data is processed using the techniques given above, and the individual models and the ensemble model are trained on it.
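As a rough illustration of the 80:10:10 split and of how an ensemble can learn from the outputs of its base models, the following sketch uses only the Python standard library. It is not the implementation used in this study: the function names, the toy spam probabilities standing in for the outputs of the four base models, and the single logistic unit standing in for a meta classifier are all assumptions made for illustration.

```python
import math
import random


def split_80_10_10(items, seed=0):
    """Shuffle and split items into train, validation, and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_meta(meta_features, labels, epochs=200, lr=0.5):
    """Fit a logistic unit on base-model probabilities with per-sample
    gradient descent (a simple stand-in for a DNN meta classifier)."""
    w = [0.0] * len(meta_features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(meta_features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_meta(w, b, x):
    """Return 1 (spam) or 0 (ham) for one row of base-model outputs."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Index split for 100 hypothetical comments.
train_idx, val_idx, test_idx = split_80_10_10(range(100))

# Hypothetical P(spam) from four base models for six held-out comments.
meta_X = [
    [0.9, 0.8, 0.7, 0.9],
    [0.8, 0.9, 0.6, 0.8],
    [0.7, 0.6, 0.9, 0.7],
    [0.2, 0.1, 0.3, 0.2],
    [0.1, 0.3, 0.2, 0.1],
    [0.3, 0.2, 0.1, 0.3],
]
meta_y = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

w, b = train_meta(meta_X, meta_y)
train_predictions = [predict_meta(w, b, x) for x in meta_X]
```

The point of the sketch is that the meta classifier never sees the comment text, only the base-model outputs. Such a meta classifier is usually fitted on predictions for data the base models were not trained on, which is one reason a separate validation split is useful.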

Moving on to the final part of the third section, we describe stacking classification and what it is used for.

B. Stacking Classifiers and Result

Fig. 2. Stacking classifier model

Stacking classifiers are mainly used for batch or real-time processing of comments and improve accuracy by exploiting the strengths of several models. Continuous observation and modification are essential for preserving efficacy as spam patterns change. The pre-trained individual models are stacked together, and further dense and dropout layers are added on top to create the ensemble model. The outcome of this process is the separated ham and spam counts. In this way the predictions of multiple models are combined, which is the central idea of this research. The ensemble approach mitigates the risk of overfitting, a major problem, and is therefore better than choosing the best of the given models; it also gives improved generalization and better performance, as shown in the next section.

The recorded values are discussed in Section IV, which also assesses the success of this approach and gives a comparative analysis of the proposed model with the others.

IV. RESULT

The results are evaluated based on the following parameters:

• Accuracy: A prediction's accuracy is determined by how many of the total predictions were correct. It is a simple statistic for judging overall correctness.

$$\text{Accuracy} = \frac{TP + TN}{\text{Total Predictions}} \tag{1}$$

• Precision: Precision is the proportion of true positives (accurately predicted) among all cases predicted as positive. It assesses how well the model can produce reliable positive predictions.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

• Recall (true positive rate or sensitivity): By evaluating the ratio of true positives to all actual positive cases, recall assesses how well the model catches all genuine positive cases.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

• F1 Measure: This metric is the harmonic mean of precision and recall. It gives a fair evaluation of a model's performance, which is especially helpful when recall and precision have competing objectives.

$$\text{F1 Measure} = \frac{2 \cdot P \cdot R}{P + R} \tag{4}$$

TP - True Positive, TN - True Negative, FP - False Positive, FN - False Negative, P - Precision, R - Recall.

The results are presented in tabular form using the parameters given above. The objective of this study is to show that ensemble methods are more effective than individual DL/transformer techniques. The results obtained with the working specification support this.

Observing Table 2, Fig. 3(a), and Fig. 3(b), we can infer that the stacking model outperforms the individual base models with an accuracy of 93.5%, followed by CNN with 91%, and LSTM and CNN-LSTM with 90% accuracy. Since the precision of the ensemble method for ham identification is lower, the model can be improved by using datasets collected from YouTube through Google's API, which can be accessed from their developer console.

Fig. 3. (a) Comparative analysis of the models for identification of Spam
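The four metrics defined in this section can be computed directly from confusion-matrix counts. The following is a minimal sketch in plain Python; the function names and the counts are hypothetical and are not the counts behind the reported results. The second function applies only the F1 formula, which allows a quick consistency check on a reported precision and recall pair.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from raw counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for the spam class, chosen only for illustration.
m = classification_metrics(tp=85, fp=4, fn=15, tn=96)

# Consistency check on the stacking model's spam row in Table II:
# precision 0.96 and recall 0.85 give an F1 of about 0.90.
stacking_spam_f1 = round(f1_from(0.96, 0.85), 2)
```

Applying `f1_from` to the stacking model's spam precision (0.96) and recall (0.85) from Table II gives approximately 0.90, which agrees with the F1-score reported there.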
Fig. 3. (b) Comparative analysis of the models for identification of Ham

TABLE II. COMPARISON OF RESULTS FOR ALL MODELS

Algorithm   Class   Precision   Recall   F1-Score   Accuracy
Attention   Spam    0.82        0.82     0.82       0.86
            Ham     0.88        0.88     0.88
CNN         Spam    0.81        0.96     0.88       0.91
            Ham     0.97        0.86     0.91
LSTM        Spam    0.82        0.97     0.89       0.90
            Ham     0.98        0.86     0.91
CNN-LSTM    Spam    0.85        0.91     0.88       0.90
            Ham     0.94        0.90     0.92
Stacking    Spam    0.96        0.85     0.90       0.935
            Ham     0.90        0.97     0.94

The results of the individual techniques and their analysis are tabulated above. This research can be refined in further studies with additional quantitative measures; taking into account other factors such as running time and result segregation as well, we conclude that the ensemble method proves more effective than the others.

V. CONCLUSION

The research report on YouTube comment scam detection concludes that these scams pose a significant threat to users. They typically involve fraudulent comments with links to phishing sites, fake giveaways, or scams designed to steal personal information and funds. The application of deep learning algorithms for comment moderation is pivotal in addressing this issue effectively. Neural networks for comment analysis allow for the automatic identification of suspicious patterns, enabling quicker responses and the removal of fraudulent content. The results collected for the individual deep learning techniques indicate that an ensemble combining CNN, LSTM, Attention, and a CNN-LSTM hybrid model is the key to proactively identifying and removing these scams from the platform.

REFERENCES

[1] Sjarif, Nilam & Mohd Azmi, Nurulhuda & Chuprat, Suriayati & Sarkan, Haslina & Yahya, Yazriwati & Sam, Suriani. (2019). SMS Spam Message Detection using Term Frequency-Inverse Document Frequency and Random Forest Algorithm. Procedia Computer Science. 161. 509-515.
[2] Portio Research, Worldwide A2P SMS Markets 2014–2017: Understanding and Analysis of Application-to-Person Text Messaging Markets Worldwide; Portio Research Limited: Chippenham, UK, 2014.
[3] Yerima, Suleiman & Bashar, Abul. (2022). Semi-supervised novelty detection with one class SVM for SMS spam detection. 10.1109/IWSSIP55020.2022.9854496.
[4] Liu, Xiaoxu & Lu, Haoye & Nayak, Amiya. (2021). A Spam Transformer Model for SMS Spam Detection. IEEE Access. PP. 1-1. 10.1109/ACCESS.2021.3081479.
[5] P. K. Roy, J. P. Singh, and S. Banerjee, "Deep learning to filter SMS spam," Future Gener. Comput. Syst., vol. 102, pp. 524–533, Jan. 2020.
[6] T. B. Brown et al., "Language models are few-shot learners," 2020, arXiv:2005.14165. [Online].
[7] Prof. R. S. Thakur and Ms. Rachana Mishra, "Analysis of Random Forest and Naïve Bayes for Spam Mail using Feature Selection Categorization," 2013.
[8] H. Oh, "A YouTube Spam Comments Detection Scheme Using Cascaded Ensemble Machine Learning Model," in IEEE Access, vol. 9, pp. 144121-144128, 2021, doi: 10.1109/ACCESS.2021.3121508.
[9] Choi, Seungwoo & Segev, Aviv. (2016). Finding informative comments for video viewing. 2457-2465. 10.1109/BigData.2016.7840882.
[10] S. Sharmin and Z. Zaman, "Spam detection in social media employing machine learning tools for text mining," in Proc. 13th Int. Conf. Signal-Image Technol. Internet-Based Syst. (SITIS), Dec. 2017, pp. 137–142, doi: 10.1109/SITIS.2017.32.
[11] A. O. Abdullah, M. A. Ali, M. Karabatak, and A. Sengur, "A comparative analysis of common Youtube comment spam filtering techniques," in Proc. 6th Int. Symp. Digit. Forensic Secur. (ISDFS), Mar. 2018, pp. 1–5, doi: 10.1109/ISDFS.2018.8355315.
[12] E. Poche, N. Jha, G. Williams, J. Staten, M. Vesper, and A. Mahmoud, "Analyzing user comments on Youtube coding tutorial videos," in Proc. IEEE/ACM 25th Int. Conf. Program Comprehension (ICPC), May 2017, pp. 196–206, doi: 10.1109/ICPC.2017.26.
[13] J. Gracia Betty, R. Harivarthini, O. Deepthi, R. Pari, and P. Maharajan, "YouTube Video Spam Comment Detection Using Light Gradient Boosting Machine," 2023 5th International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2023, pp. 1650-1656, doi: 10.1109/ICIRCA57980.2023.10220615.
[14] Roy, P.K.; Singh, J.P.; Banerjee, S. Deep learning to filter SMS Spam. Future Gener. Comput. Syst. 2020, 102, 524–533.
