Utilizing Ensemble Learning For Detecting Multi-Modal Fake News
ABSTRACT The spread of fake news has become a critical problem in recent years due to the extensive use of
social media platforms. False stories can go viral quickly, reaching millions of people before they can be
debunked, e.g., a false story claiming that a celebrity has died when he/she is still alive. Therefore, detecting
fake news is essential for maintaining the integrity of information and controlling misinformation, social
and political polarization, media ethics, and security threats. From this perspective, we propose an ensemble
learning-based detection of multi-modal fake news. First, it exploits a publicly available dataset, Fakeddit,
consisting of over 1 million samples of fake news. Next, it leverages Natural Language Processing (NLP)
techniques for preprocessing textual information of news. Then, it gauges the sentiment from the text of each
news item. After that, it generates embeddings for the text and images of the corresponding news by leveraging
Visual Bidirectional Encoder Representations from Transformers (V-BERT). Finally, it passes the
embeddings to the deep learning ensemble model for training and testing. The 10-fold evaluation technique
is used to check the performance of the proposed approach. The evaluation results are significant and
outperform the state-of-the-art approaches with performance improvements of 12.57%, 9.70%, 18.15%,
12.58%, 0.10, and 3.07 in accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC),
and Odds Ratio (OR), respectively.
INDEX TERMS Ensemble learning, convolutional neural network, multi-modal fake news, classification,
boosted CNN, bagged CNN.
of news and circulating information [3]. The accessibility of news and information on the Internet is very low-cost and convenient. However, spreading fake news on these carriers is straightforward and effortless [4]. Fake news can lead to false assumptions that drastically affect our society. Consequently, it is critical to design an automated fake news detection system.
Many researchers are actively developing new and better methods for identifying and combating the spread of misinformation. Some of the key research areas and trends in this field include deep learning approaches, e.g., Convolutional Neural Network (CNN); linguistic features, e.g., sentiment analysis, topic modeling, and stylometric analysis; source-based approaches, e.g., analyzing the domain name, social media presence, or history of the news source; and ensemble approaches, e.g., combining linguistic, source-based, and deep learning models to create a more robust and accurate detection system. Recent research has identified the issues of the said problem and proposed different solutions, e.g., pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) [5], OpenAI GPT [6], and ELMo [7] have shown their effectiveness in alleviating feature engineering efforts. However, the problem still requires significant performance improvement.
From this perspective, this paper proposes an ensemble learning-based detection of multi-modal fake news (ELD-FN). It first exploits a publicly available dataset, Fakeddit, a novel multi-modal dataset consisting of over 1 million samples from multiple categories of fake news. Second, it leverages Natural Language Processing (NLP) techniques for preprocessing textual information of news. Third, it gauges the sentiment from the text of each news item. Fourth, it generates embeddings for the text and images of the corresponding news by leveraging V-BERT [8]. Finally, it passes the embeddings to the deep learning ensemble model for training and testing. The 10-fold evaluation technique is used to check the performance of ELD-FN. The evaluation results are significant and outperform the state-of-the-art approaches with performance improvements of 12.57%, 9.70%, 18.15%, 12.58%, 0.10, and 3.07 in accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and Odds Ratio (OR), respectively.
The main contributions made in this paper are as follows.
• The proposed approach integrates news sentiment as a crucial feature and employs ensemble learning to identify multi-modal fake news.
• It is evident from the evaluation results that ELD-FN is significant and outperforms the baseline approaches with performance improvements of 12.57%, 9.70%, 18.15%, 12.58%, 0.10, and 3.07 in accuracy, precision, recall, F1-score, MCC, and OR, respectively.
The organization of the rest of the paper is as follows. Section II discusses the research background. Section III describes the details of ELD-FN. Section IV describes the evaluation methods for ELD-FN, the obtained results, and their threats to validity. Section V summarizes the paper and suggests future work.

II. RELATED WORK
Although extensive research on fake news detection has been performed [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], most research is conducted on textual data or uni-modal features. However, the two most relevant studies [24], [25] proposed deep learning-based solutions for detecting fake news. The proposed approach (ELD-FN) differs from these baseline approaches as it not only works on multi-modal features but also considers the sentiments involved in the textual information of news.
Most of the state-of-the-art fake news classification approaches can be categorized as follows: 1) fake news classification approaches for single-modality and 2) fake news classification approaches for multi-modality.

A. FAKE NEWS CLASSIFICATION APPROACHES FOR SINGLE-MODALITY
The fake news classification approaches for single-modality can be further divided into two categories based on the text/image features.

1) SINGLE-MODALITY BASED CLASSIFICATION APPROACHES USING TEXTUAL FEATURES
Textual features can be divided into generic and latent categories. Usually, traditional machine learning algorithms utilize generic textual features. These algorithms analyze text based on linguistic levels such as lexicon, syntax, discourse, and semantics. Previous research has compiled a detailed table summarizing these features [10]. In contrast, latent textual features consist of the embeddings extracted from textual data of news at the word, sentence, or document level. Latent vectors are constructed from the textual news data and are then used as input for classifiers, e.g., SVM.
Recurrent neural networks (RNNs) are potent in modeling and analyzing sequential data. For example, Ma et al. used RNNs to capture relevant information over time by learning hidden layer representations [11]. Meanwhile, Chen et al. proposed a CNN-based approach for the classification [12]. Moreover, they introduced a novel Attention-Residual Network (ARC) to acquire long-range features. Ma et al. introduced a Generative Adversarial Network (GAN)-based model that employs a Generator network based on Gated Recurrent Units (GRU) to generate contentious instances. Furthermore, a Discriminator network based on RNNs is designed to identify essential features [13].
RNN-based models have proven very effective in classifying fake news detection datasets. However, the RNN-based models prioritize the recent input sequence, whereas the essential features are not necessarily located at the end of the sequence. Yu et al. proposed a CNN-based approach that resolves this issue. The proposed technique does not prioritize recent input sequences. This approach applies feature extraction based
on the relationship of the essential features [14]. Vaibhav et al. utilized a graph-based approach for classifying news articles [15]. For this purpose, they used Graph Neural Networks, such as Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT), to create graph embeddings for fake news detection.
Wu et al. utilized multi-task learning techniques to classify and detect fake news. Moreover, the stance classification task optimizes shared layers concurrently, improving news representations [16]. Cheng et al. utilized an LSTM model to classify the textual news data. They used a variational autoencoder to extract essential textual features at the tweet level.
Some researchers have assumed that complex and multi-dimensional news are not accessible initially. The accessibility of only text-based news depends on the popularity [17]. Qian et al. developed a text-based model that utilizes word/sentence level data from legitimate articles to produce user feedback for early detection [18]. This addressed the scarcity of user reviews as an auxiliary source of information: the generated feedback was used along with word/sentence level information from real articles in the classification process [18]. Giachanou et al. investigated the influence of emotional cues. They proposed an LSTM model that integrates emotional signals extracted from claim texts to differentiate between true and false news [19].

2) SINGLE-MODALITY BASED CLASSIFICATION APPROACHES USING IMAGE FEATURES
As multimedia becomes more prevalent in social networks, news now contains text and visual information such as images and videos that convey rich meaning. However, textual feature-based approaches face challenges in effectively capturing visual information because of the heterogeneity between text and image data. Consequently, many researchers have proposed image-based approaches for detecting fake news.
Classical image-based models utilized basic numerical features of images [20], [26], such as image count, popularity [27], and type to identify fake news. For impaired images, complex forensics features were extracted. Furthermore, post and user-based features were integrated to identify fake news [28]. However, it was evident that basic numerical features are inadequate to describe complex visual information of the news images.
Deep learning models such as CNNs have proven effective in capturing visual features in news images. Many studies have shown that feature extraction from CNN models can be used in visual recognition tasks to generate generic image representation [29].
Building on the success of CNNs, recent studies have utilized pre-trained deep CNNs like VGG19 [30], [31] to obtain generic visual representations [32], [33]. Researchers suggested multi-domain visual neural models to capture the inherent traits of fabricated news images more effectively. These multi-domain models merged frequency and pixel domain visual data to differentiate between genuine and fabricated news based on visual characteristics [34]. Poor quality is a common trait in fake news images. The poor quality feature and image semantics are visible in the frequency and pixel domains. The quality feature is extracted by a CNN model, and the semantics of the images are extracted by a CNN-RNN model.

B. FAKE NEWS CLASSIFICATION APPROACHES FOR MULTI-MODALITY
Word-based and image-based information are both important in detecting fake news. As social networks often contain both types of information, combining them can improve performance. This section discusses the different multi-modal approaches for fake news detection, categorized based on the different perspectives they adopt.

1) PROBLEMS IN MULTI-MODALITY
Several studies have explored using visual information to complement textual information in detecting fake news. These studies typically use text-based and image-based encoders to extract textual and visual features, respectively. Furthermore, these feature vectors construct an overall feature vector for each news item. For example, Wang et al. proposed event classification as an additional task to enhance the generalizing ability of the model for event-invariant multi-modal features [32]. Other researchers, such as Singhal et al., use a combination of text-based and image-based features. They utilize BERT and XLNet pre-trained models for encoding text-based and image-based data, respectively [35]. However, these approaches are proven to be limited in effectively detecting multi-modal fake news because of their limited ability to capture complex cross-modal correlations. More advanced multi-modal techniques are needed to improve the performance of fake news detection.

2) FLEXIBILITY IN MULTI-MODALITY
Some studies have recognized that irrelevant images are a common characteristic of multi-modal fake news and have focused on measuring the consistency between the text and visual components in detection. One approach by Zhou and Zafarani [36] used an image captioning model to generate sentences from images and then measured the similarity between those sentences and the original text. However, this approach was constrained by the discrepancies that existed between the training data of the image captioning model and the real news corpus. Another approach by Xue et al. projected the visual and textual features into a shared feature space and computed the similarities between the resulting multi-modal features. However, they encountered difficulties capturing multi-modal inconsistencies because of the semantic gap between the two types of features [37].
Ghorbanpour et al. [38] proposed the Fake-News-Revealer (FNR) method, which uses a Vision-transformer [39] and BERT [5] to extract image and text features, respectively. The model extracted textual and visual features separately and determined their similarities by loss.

3) IMPROVEMENT IN MULTI-MODALITY
Several researchers have proposed different approaches for fake news detection using multi-modal data. Jin et al. utilized an RNN model and applied an attention mechanism to combine information extracted from textual, visual, and social context data [40]. Zhang et al. [41] used a multi-channel CNN with an attention mechanism to combine multi-modal information, while Song et al. [42] proposed the co-attention transformer to model the bidirectional enhancement between images and text. Qian et al. developed a Hierarchical Multi-modal Contextual Attention Network (HMCAN), which was designed to collectively capture multi-modal context data and the hierarchical semantics of text [43]. Wu et al. introduced the Multi-modal Co-Attention Network (MCAN) that extracts spatial-domain and frequency-domain features from the image and text, and fuses visual and textual features using multiple co-attention layers [44]. Other researchers have also utilized Graph Convolutional Networks (GCN) and entity-centric cross-modal interaction to model the relationship between word-based and image-based objects. Finally, Zhang et al. and Laura et al. proposed a BERT-based multi-modal model to encode text-based and image-based information. The model effectively captures the interplay between text and images and employs contrastive learning to enhance multi-modal representations [24], [45]. Qi et al. integrated visual entities to enhance the comprehension of high-level semantics in news images and to model the inconsistencies and mutual enhancements of multi-modal entities [22].
In summary, when performing multi-modal fake news detection, there are three important inductive biases to consider when examining text-image correlations. Firstly, images provide additional information to the text, highlighting the need for multi-modal analysis. Secondly, inconsistencies between text and images can serve as a potential signal for detecting fake news using multiple modalities. Finally, text-based and image-based data can improve performance by identifying essential features.

III. METHODOLOGY
A. OVERVIEW
The overview of ELD-FN is depicted in Fig. 1. The following are the main steps of ELD-FN.
1) First, the publicly available multi-modal dataset (Fakeddit) is collected from Google Drive.¹
2) Next, it leverages NLP techniques, e.g., tokenization, stop-word removal, lowercase conversion, and lemmatization, for preprocessing textual information of news.
3) Then, it computes the sentiment from the text of each news item.
4) After that, it generates embeddings for the text and images of the corresponding news by leveraging V-BERT.
5) Finally, it passes the embeddings to the deep learning ensemble model for training and testing.
¹ https://fakeddit.netlify.app/, accessed on 15-01-2023.

B. PROBLEM DEFINITION
A news item n from a multi-modal news dataset N can be represented as follows:

n = <t, i, s>    (1)

where t is the textual information of n, i is the image of n, and s is the status assigned to n, i.e., whether n is fake or true. ELD-FN suggests the status of a new news item as either true or false, where true represents that the news is real and false represents that the news is fake. Consequently, the automatic classification of a new news item n can be defined as a mapping f:

f : n → c,    c ∈ {true, false},    n ∈ N    (2)

where c is a suggested status from the news status set {true, false}.

C. PREPROCESSING
The news may contain inappropriate and unnecessary text, e.g., English stop-words. Such information is considered an overhead for the machine learning classification algorithms because of processing time and memory utilization. Therefore, preprocessing of news text is essential to make ELD-FN fast and memory efficient. We perform the following preprocessing steps to clean the text of news.

1) TOKENIZATION
Text tokenization breaks down a piece of text into smaller units called tokens. Tokens are individual words, phrases, or other meaningful text elements, which can be analyzed and processed further.

2) SPECIAL CHARACTER REMOVAL
The text of news may contain special characters, e.g., semicolon (;). This step removes the special characters from the list of tokens.

3) STOP-WORD REMOVAL
English text contains words, called stop-words, that carry little meaning on their own but are used to make sentences grammatical. This step removes stop-words from the working list.

4) SPELL CORRECTION AND LOWERCASE CONVERSION
This step identifies and corrects the spelling mistakes in the working list of tokens of news and converts the tokens to lowercase.
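The preprocessing steps can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for the example, spell correction is left out because it needs an external dictionary, and lowercasing is applied before stop-word removal purely so that the stop-word lookup is case-insensitive.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word sample; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to", "and", "that"}


def preprocess(text: str) -> List[str]:
    """Tokenize, strip special characters, lowercase, and drop stop-words."""
    # Tokenization: split the text on whitespace.
    tokens = text.split()
    # Special character removal: keep only letters and digits inside each token.
    tokens = [re.sub(r"[^A-Za-z0-9]", "", tok) for tok in tokens]
    # Lowercase conversion (spell correction is not shown in this sketch).
    tokens = [tok.lower() for tok in tokens]
    # Stop-word removal; tokens emptied by the previous steps are discarded too.
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


print(preprocess("BREAKING: The celebrity is dead; fans are in shock!"))
# ['breaking', 'celebrity', 'dead', 'fans', 'shock']
```

In practice the whitespace split and the stop-word set would likely be replaced by an NLP toolkit's tokenizer and stop-word list.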
where BERT_I is the relevant BERT-shared layer modeling for images, and X_I is the input representation of images.

3) PRE-FEATURE EXTRACTION
The BERT-shared layer is strong enough for feature extraction. It includes a pre-feature extractor to enhance the ability of BERT to learn semantic characteristics. Pre-

Algorithm 1 Ensemble Model
[Listing garbled in extraction: only the line labels "procedure Ensemble Model", "Input:", and "Initialize:" survive.]
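The index terms mention bagged and boosted CNNs. As a generic illustration of the bagging idea only (it is not a reconstruction of Algorithm 1), the sketch below trains several base learners on bootstrap samples and combines them by majority vote. The base learner `train_stump` (a one-feature threshold rule), the helper names, and the toy data are all assumptions made for the example; in ELD-FN the base learners would be CNNs fed with V-BERT embeddings.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]      # (feature value, label); label 1 = fake, 0 = real
Model = Callable[[float], int]


def train_stump(data: Sequence[Sample]) -> Model:
    """Hypothetical weak learner: threshold halfway between the two class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    if not pos or not neg:
        # Degenerate bootstrap sample: always predict the only class seen.
        only = 1 if pos else 0
        return lambda x: only
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean > neg_mean else -1
    return lambda x: 1 if sign * (x - threshold) > 0 else 0


def bagging(data: Sequence[Sample], n_models: int, seed: int = 0) -> List[Model]:
    """Train each base model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [train_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]


def predict(models: Sequence[Model], x: float) -> int:
    """Combine the base models by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


toy = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
ensemble = bagging(toy, n_models=5)
print(predict(ensemble, 0.05), predict(ensemble, 0.95))
```

An odd number of base models avoids ties in the vote. Boosting differs in that later learners are trained with more weight on the samples earlier learners misclassified.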
E. RESULTS
1) RQ1: COMPARISON OF ELD-FN AGAINST BASELINE APPROACHES
Table 2 and Fig. 5 present the evaluation metrics for three different approaches (ELD-FN, FakeNED, MultiFND) based on their accuracy, precision, recall, F1-score, MCC, and OR. The results indicate that the average values of these metrics for ELD-FN, FakeNED, and MultiFND are (88.83%, 93.54%, 90.29%, 91.89%, 0.49, and 17.02), (89.25%, 91.12%, 87.54%, 89.29%, 0.45, and 15.78), and (78.91%, 85.27%, 76.42%, 80.60%, 0.39, and 13.95), respectively.
The F1-score distributions of cross-validation for ELD-FN, FakeNED, and MultiFND are presented in Fig. 6. A beanplot is a visualization that displays a continuous variable's distribution across different groups. The beanplot compares the F1-score distributions by plotting one bean for each approach. The width of a bean represents the density of the data, with wider beans indicating higher density.
The following observations are made from Table 2, Fig. 5, and Fig. 6.
• ELD-FN has an accuracy of 88.83% and the highest precision (93.54%), indicating that it has the highest percentage of correctly classified instances and true positive instances.
• ELD-FN has the highest recall (90.29%) and F1-score (91.89%), indicating that it has the highest ability to correctly identify positive instances and achieve a balance between precision and recall.
• ELD-FN also has the highest MCC (0.49) and OR (17.02), indicating a better correlation between predicted and actual classifications and higher odds of event occurrence than FakeNED and MultiFND. The average results of MCC (0.49 > 0.45 > 0.39) > 0 and OR (17.02 > 15.78 > 13.95) > 1 are true for ELD-FN and confirm its effectiveness.
• The minimum F1-score of ELD-FN is higher than the maximum F1-scores of FakeNED and MultiFND (shown in Fig. 6).
To validate the significant difference in the means of performance (F1-score) for all iterations of ELD-FN, FakeNED, and MultiFND, we perform a single-factor Analysis of Variance (ANOVA). ANOVA is a statistical method used to test whether there is a significant difference in the means of three or more independent groups or samples. It is conducted in Excel with its default settings and presented in Fig. 7. It suggests that F > F_crit and p-value < (α = 0.05) are true for F1-score, and the factor (using different approaches) significantly differs in F1-score.
Moreover, we utilize two re-sampling methods, over-sampling and under-sampling, to tackle the class imbalance within the dataset. Over-sampling involves generating additional samples for the minority class through RandomOverSampler, while under-sampling entails removing surplus records from the majority class in imbalanced datasets using RandomUnderSampler. The findings reveal that employing under-sampling results in accuracy, precision, recall, and F1-score values of 86.12%, 92.54%, 88.76%, and 90.61%, respectively. However, it is important to note that under-sampling diminishes the number of majority class samples, leading to a loss of information. Consequently, the performance of both majority and minority classes in the fine-tuned BERT model declines when under-sampling is applied. In contrast, utilizing the over-sampling technique yields accuracy, precision, recall, and F1-score values of 90.26%, 94.37%, 91.88%, and 93.11%, respectively. This enhancement is attributed to BERT being exposed to a larger dataset, enabling it to learn meaningful patterns more effectively.
The preceding analysis concluded that ELD-FN outperforms the baseline approaches in detecting fake news.

2) RQ2: INFLUENCE OF SENTIMENT ON ELD-FN
The evaluation results of ELD-FN with and without sentiment analysis are presented in Table 3 and Fig. 8. The evaluation results of ELD-FN for the two sentiment settings (enabled/disabled) based on their accuracy, precision, recall, F1-score, MCC, and OR are (88.83%, 93.54%, 90.29%, 91.89%, 0.49, and 17.02) and (88.12%, 90.38%, 89.98%, 90.17%, 0.49, and 17.02), respectively.
From Table 3 and Fig. 8, it is observed that disabling sentiment (i.e., textual features only) reduces precision from 93.54% to 90.38% and F1-score from 91.89% to 90.17%. However, MCC and OR remain the same.
Table 5 presents the relationship between sentiment and news. It shows that 65.84% of negative news are real, whereas only 34.16% of the positive news are real. However, 73.71% of negative news are fake, whereas only 26.29% of the positive news are fake. It means the possibility of spreading fake news is 180.37% = (73.71% - 26.29%) / 26.29% higher if the news is negative. For example, if a fake news article portrays a political figure negatively, it can contribute to a negative sentiment towards that figure among the public and will propagate quickly.
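The 180.37% figure quoted for the sentiment analysis is a relative difference between the two fake-news percentages. As a quick check of that arithmetic, using only the percentages quoted in the text (the function name is an assumption made for illustration):

```python
def relative_difference(larger: float, smaller: float) -> float:
    """Return (larger - smaller) / smaller, expressed as a percentage."""
    return (larger - smaller) / smaller * 100.0


negative_fake = 73.71  # percentage quoted in the text for negative fake news
positive_fake = 26.29  # percentage quoted in the text for positive fake news
print(f"{relative_difference(negative_fake, positive_fake):.2f}%")  # 180.37%
```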
TABLE 5. Relation between sentiment and news.

and (86.51%, 90.22%, 86.19%, 88.21%, 0.48, and 16.92), respectively.
The following observations are made from Table 5 and Fig. 10.
• ELD-FN outperforms CNN and LSTM. The performance enhancement of ELD-FN upon CNN in accuracy, precision, recall, F1-score, MCC, and OR is 2.42%, 1.06%, 5.22%, 3.18%, 0.01, and 0.05, respectively. The performance enhancement of ELD-FN upon LSTM in accuracy, precision, recall, F1-score, MCC, and OR is 2.68%, 3.68%, 4.76%, 4.17%, 0.01, and 0.10, respectively.
• ELD-FN performs better than LSTM because LSTM requires short text and performs sequential processing, which is unnecessary in our case. In contrast, CNN is proven efficient for long text and works better to extract local invariant features.
The preceding analysis concluded that ELD-FN outperforms other classifiers in detecting fake news.

F. THREATS TO VALIDITY
The probability of incorrect labeling of news is the first threat to construct validity. This research assumes that the labels assigned by Nakamura et al. [47] are correct. However, incorrect labeling of data may affect the performance of ELD-FN.
The choice of assessment metrics of ELD-FN is another threat to construct validity. The chosen metrics for detecting news are the most accepted in the literature for the classification task.
The choice of the sentiment analysis repository is the first threat to internal validity. The chosen repository (Section III-E) is public and has good results in computing sentiment. Exploiting other repositories may affect the performance of ELD-FN.
The coding of ELD-FN, FakeNED, and MultiFND is the second threat to internal validity. The coding and the produced results of ELD-FN, FakeNED, and MultiFND are verified to mitigate the threat. However, unknown errors may affect the performance of ELD-FN.
The hyper-parameters setting of ELD-FN is the third threat to internal validity. The hyper-parameters setting for ELD-FN
is mentioned in Section IV-E4. A change in settings may affect the performance of ELD-FN.

V. CONCLUSION AND FUTURE WORK
Automatic fake news detection is crucial to avoid spreading false information that can have serious consequences, ranging from reputational damage to social and political unrest. In some cases, fake news can even incite violence and lead to harm or loss of life. Therefore, the ability to automatically identify and flag false information can help mitigate the threats of fake news. From this perspective, this paper proposes an ensemble deep learning-based detection of fake news. The proposed approach leverages NLP techniques for preprocessing textual information of news, computes the sentiment from the text of each news item, generates embeddings for the text and images of the corresponding news by leveraging V-BERT, and passes the embeddings to the deep learning ensemble model for training and testing. The evaluation results significantly outperform the state-of-the-art approaches in identifying fake news.
In the future, we would like to investigate the need to adapt detection algorithms to new types of media. Fake news is not limited to text-based content, and algorithms must be able to detect false information in images, videos, and audio as well. Moreover, we are interested in improving the interpretability of detection algorithms. Current methods often rely on opaque deep learning models, making it difficult to understand how decisions are being made. Future work could focus on developing more transparent models or tools that help users understand how algorithms arrive at their conclusions.

REFERENCES
[1] S. De Sarkar, F. Yang, and A. Mukherjee, "Attending sentences to detect satirical fake news," in Proc. 27th Int. Conf. Comput. Linguistics, 2018, pp. 3371–3380.
[2] H. Allcott and M. Gentzkow, "Social media and fake news in the 2016 election," J. Econ. Perspect., vol. 31, no. 2, pp. 211–236, May 2017.
[3] A. Moon. (2017). Two-Thirds of American Adults Get News From Social Media: Survey. [Online]. Available: https://uk.reuters.com/article/us-usa-internet-socialmedia/two-thirds-of-american-adults-get-news-from-social-media-survey-idUKKCN1BJ2A8
[4] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, "Fake news detection on social media: A data mining perspective," ACM SIGKDD Explor. Newslett., vol. 19, no. 1, pp. 22–36, 2017.
[5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
[6] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding by generative pre-training," Tech. Rep., 2018.
[7] J. Sarzynska-Wawer, A. Wawer, A. Pawlak, J. Szymanowska, I. Stefaniak, M. Jarkiewicz, and L. Okruszek, "Detecting formal thought disorder by deep contextualized word representations," Psychiatry Res., vol. 304, Oct. 2021, Art. no. 114135.
[8] L. Harold Li, M. Yatskar, D. Yin, C.-J. Hsieh, and K.-W. Chang, "VisualBERT: A simple and performant baseline for vision and language," 2019, arXiv:1908.03557.
[9] S. Afroz, M. Brennan, and R. Greenstadt, "Detecting hoaxes, frauds, and deception in writing style online," in Proc. IEEE Symp. Secur. Privacy, May 2012, pp. 461–475.
[10] X. Zhou, J. Wu, and R. Zafarani, "SAFE: Similarity-aware multi-modal fake news detection," in Proc. Advances in Knowledge Discovery and Data Mining. Cham, Switzerland: Springer, 2020, pp. 354–367.
[11] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, and M. Cha, "Detecting rumors from microblogs with recurrent neural networks," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), 2016, pp. 3818–3824.
[12] Y. Chen, J. Sui, L. Hu, and W. Gong, "Attention-residual network with CNN for rumor detection," in Proc. 28th ACM Int. Conf. Inf. Knowl. Manage., Nov. 2019, pp. 1121–1130.
[13] J. Ma, W. Gao, and K.-F. Wong, "Detect rumors on Twitter by promoting information campaigns with generative adversarial learning," in Proc. World Wide Web Conf., May 2019, pp. 3049–3055.
[14] F. Yu, Q. Liu, S. Wu, L. Wang, and T. Tan, "A convolutional approach for misinformation identification," in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 3901–3907.
[15] V. Vaibhav, R. M. Annasamy, and E. Hovy, "Do sentence interactions matter? Leveraging sentence level representations for fake news classifi-
[18] F. Qian, C. Gong, K. Sharma, and Y. Liu, "Neural user response generator: Fake news detection with collective user intelligence," in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018, pp. 3834–3840.
[19] A. Giachanou, P. Rosso, and F. Crestani, "Leveraging emotional signals for credibility detection," in Proc. 42nd Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Jul. 2019, pp. 877–880.
[20] K. Wu, S. Yang, and K. Q. Zhu, "False rumors detection on Sina Weibo by propagation structures," in Proc. IEEE 31st Int. Conf. Data Eng., Apr. 2015, pp. 651–662.
[21] P. Li, X. Sun, H. Yu, Y. Tian, F. Yao, and G. Xu, "Entity-oriented multi-modal alignment and fusion network for fake news detection," IEEE Trans. Multimedia, vol. 24, pp. 3455–3468, 2022.
[22] P. Qi, J. Cao, X. Li, H. Liu, Q. Sheng, X. Mi, Q. He, Y. Lv, C. Guo, and Y. Yu, "Improving fake news detection by using an entity-enhanced framework to fuse diverse multimodal clues," in Proc. 29th ACM Int. Conf. Multimedia, Oct. 2021, pp. 1212–1220.
[23] C. Song, N. Ning, Y. Zhang, and B. Wu, "A multimodal fake news detection model based on crossmodal attention residual and multichannel convolutional neural networks," Inf. Process. Manage., vol. 58, no. 1, Jan. 2021, Art. no. 102437.
[24] L. D. Sciucca, M. Mameli, E. Balloni, L. Rossi, E. Frontoni, P. Zingaretti, and M. Paolanti, "FakeNED: A deep learning based-system for fake news detection from social media," in Proc. Int. Conf. Image Anal. Process., 2022, pp. 303–313.
[25] I. Segura-Bedmar and S. Alonso-Bartolome, "Multimodal fake news detection," Information, vol. 13, no. 6, p. 284, Jun. 2022. [Online]. Available: https://www.mdpi.com/2078-2489/13/6/284
[26] F. Yang, Y. Liu, X. Yu, and M. Yang, "Automatic detection of rumor on Sina Weibo," in Proc. ACM SIGKDD Workshop Mining Data Semantics, Aug. 2012, pp. 1–7.
[27] Z. Jin, J. Cao, Y. Zhang, J. Zhou, and Q. Tian, "Novel visual and statistical image features for microblogs news verification," IEEE Trans. Multimedia, vol. 19, no. 3, pp. 598–608, Mar. 2017.
[28] C. Boididou, S. Papadopoulos, D.-T. Dang-Nguyen, G. Boato, and Y. Kompatsiaris, "The certh-unitn participation@ verifying multimedia use 2015," MediaEval, vol. 1, p. 2, May 2015.
[29] B. Emek Soylu, M. S. Guzel, G. E. Bostanci, F. Ekinci, T. Asuroglu, and K. Acici, "Deep-learning-based approaches for semantic segmentation of natural scene images: A review," Electronics, vol. 12, no. 12, p. 2730, Jun. 2023. [Online]. Available: https://www.mdpi.com/2079-9292/12/12/2730
[30] Q. S. Hamad, H. Samma, and S. A. Suandi, "Feature selection of pre-trained shallow CNN using the QLESCA optimizer: COVID-19 detection as a case study," Appl. Intell., vol. 53, no. 15, pp. 18630–18652, Feb. 2023, doi: 10.1007/s10489-022-04446-8.
[31] S. R. Shah, S. Qadri, H. Bibi, S. M. W. Shah, M. I. Sharif, and F. Marinello, "Comparing inception v3, VGG 16, VGG 19, CNN, and ResNet 50: A case study on early detection of a Rice disease," Agronomy, vol. 13, no. 6, p. 1633, Jun. 2023. [Online]. Available: https://www.mdpi.com/2073-4395/13/6/1633
[32] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, and J. Gao, "EANN: Event adversarial neural networks for multi-modal fake news detection," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2018, pp. 849–857.
[33] D. Khattar, J. S. Goud, M. Gupta, and V. Varma, "MVAE: Multimodal variational autoencoder for fake news detection," in Proc. World Wide Web Conf., May 2019, pp. 2915–2921.
[34] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[35] S. Singhal, R. R. Shah, T. Chakraborty, P. Kumaraguru, and S. Satoh, "SpotFake: A multi-modal framework for fake news detection," in Proc. IEEE 5th Int. Conf. Multimedia Big Data (BigMM), Sep. 2019, pp. 39–47.
[36] X. Zhou and R. Zafarani, "A survey of fake news: Fundamental theories, detection methods, and opportunities," ACM Comput. Surv., vol. 53, no. 5, pp. 1–40, Sep. 2021.
cation,’’ 2019, arXiv:1910.12203. [37] J. Xue, Y. Wang, Y. Tian, Y. Li, L. Shi, and L. Wei, ‘‘Detecting fake news
[16] L. Wu, Y. Rao, H. Jin, A. Nazir, and L. Sun, ‘‘Different absorption from by exploring the consistency of multimodal data,’’ Inf. Process. Manage.,
the same sharing: Sifted multi-task learning for fake news detection,’’ 2019, vol. 58, no. 5, Sep. 2021, Art. no. 102610.
arXiv:1909.01720. [38] F. Ghorbanpour, M. Ramezani, M. A. Fazli, and H. R. Rabiee, ‘‘FNR:
[17] M. Cheng, S. Nazarian, and P. Bogdan, ‘‘VRoC: Variational autoencoder- A similarity and transformer-based approach to detect multi-modal fake
aided multi-task rumor classifier based on text,’’ in Proc. Web Conf., 2020, news in social media,’’ Social Netw. Anal. Mining, vol. 13, no. 1, pp. 1–15,
pp. 2892–2898. Mar. 2023.
[39] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16×16 words: Transformers for image recognition at scale,” 2020, arXiv:2010.11929.
[40] Z. Jin, J. Cao, H. Guo, Y. Zhang, and J. Luo, “Multimodal fusion with recurrent neural networks for rumor detection on microblogs,” in Proc. 25th ACM Int. Conf. Multimedia, Oct. 2017, pp. 795–816.
[41] H. Zhang, Q. Fang, S. Qian, and C. Xu, “Multi-modal knowledge-aware event memory network for social media rumor detection,” in Proc. 27th ACM Int. Conf. Multimedia, Oct. 2019, pp. 1942–1951.
[42] C. Song, C. Yang, H. Chen, C. Tu, Z. Liu, and M. Sun, “CED: Credible early detection of social media rumors,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 8, pp. 3035–3047, Aug. 2021.
[43] S. Qian, J. Wang, J. Hu, Q. Fang, and C. Xu, “Hierarchical multi-modal contextual attention network for fake news detection,” in Proc. 44th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Jul. 2021, pp. 153–162.
[44] Y. Wu, P. Zhan, Y. Zhang, L. Wang, and Z. Xu, “Multimodal fusion with co-attention networks for fake news detection,” in Proc. IJCNLP, 2021, pp. 2560–2569.
[45] W. Zhang, L. Gui, and Y. He, “Supervised contrastive learning for multimodal unreliable news detection in COVID-19 pandemic,” in Proc. 30th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2021, pp. 3637–3641.
[46] T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano, “Comparing boosting and bagging techniques with noisy and imbalanced data,” IEEE Trans. Syst., Man, Cybern., A, Syst. Hum., vol. 41, no. 3, pp. 552–568, May 2011.
[47] K. Nakamura, S. Levy, and W. Y. Wang, “r/Fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection,” in Proc. Int. Conf. Lang. Resour. Eval., 2020, pp. 1–9.
[48] M. Tausif, S. Dilshad, Q. Umer, M. W. Iqbal, Z. Latif, C. Lee, and R. N. Bashir, “Ensemble learning-based estimation of reference evapotranspiration (ETO),” Internet Things, vol. 24, Feb. 2023, Art. no. 100973. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2542660523002962

MUHAMMAD FAHEEM (Member, IEEE) received the B.Sc. degree in computer engineering from the Department of Computer Engineering, University College of Engineering and Technology, Bahauddin Zakariya University, Multan, Pakistan, in 2010, the M.S. degree in computer science from the Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia (UTM), Johor Bahru, Malaysia, in 2012, and the Ph.D. degree in computer science from the Faculty of Engineering, UTM, in 2021. From 2012 to 2014, he was a Lecturer with the COMSATS Institute of Information and Technology, Pakistan. From 2014 to 2022, he was also an Assistant Professor with the Department of Computer Engineering, Abdullah Gul University, Turkey. He is currently a Researcher with the School of Computing (Innovations and Technology), University of Vaasa, Vaasa, Finland. He has authored several papers in refereed journals and conferences. His research interests include cybersecurity, blockchain, artificial intelligence, smart grids, and smart cities. He served as a reviewer for numerous journals in IEEE, Elsevier, Springer, Wiley, Hindawi, and MDPI.

WAHEED YOUSUF RAMAY received the Ph.D. degree from the University of Science and Technology Beijing (USTB), China. He is currently an Assistant Professor with Air University. His academic and clinical focus is the use of algorithms (deep learning, machine learning, and big data analysis), advanced text analysis techniques, and sentiment analysis.

MUHAMMAD LUQMAN received the bachelor’s degree in computer science from the University of Gujrat, Pakistan, in 2017, and the master’s degree in computer science from Northwestern Polytechnical University, China. He is currently a young Scholar in the field of computer science. His research interests span a wide spectrum, focusing primarily on cutting-edge fields such as artificial intelligence, deep learning, and data mining.

MAJID BASHIR AHMAD received the master’s degree in computer science from COMSATS University Islamabad, Pakistan, in 2014, and the M.S. degree in computer science from The University of Lahore, Pakistan, in 2019. He is currently a Research Scholar in the field of computer science. His research interests include artificial intelligence, machine learning, and data mining.