
Received 20 May 2024; revised 2 September 2024; accepted 5 October 2024. Date of publication 8 October 2024; date of current version 21 October 2024. The review of this article was arranged by Associate Editor Nandha Kumar Thulasiraman.
Digital Object Identifier 10.1109/OJCS.2024.3476378

Polarity Classification of Low Resource Roman Urdu and Movie Reviews Sentiments Using Machine Learning-Based Ensemble Approaches

MUHAMMAD EHTISHAM HASSAN 1, IFFAT MAAB 2, MASROOR HUSSAIN 1, USMAN HABIB 3, AND YUTAKA MATSUO 4

1 Department of Data Science, Computer Engineering, Ghulam Ishaq Khan Institute, Swabi 23640, Pakistan
2 Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo 101-0003, Japan
3 Software Engineering Department, FAST School of Computing, National University of Computer & Emerging Sciences, Islamabad 44000, Pakistan
4 The University of Tokyo, Tokyo 113-8654, Japan
CORRESPONDING AUTHOR: IFFAT MAAB (e-mail: [email protected]).

ABSTRACT The complex linguistic characteristics and limited resources present sentiment analysis in
Roman Urdu as a unique challenge, necessitating the development of accurate NLP models. In this study,
we investigate the performance of prominent ensemble methods on two diverse datasets of UCL and IMDB
movie reviews with Roman Urdu and English dialects, respectively. We perform a comparative examination
to assess the effectiveness of ensemble techniques including stacking, bagging, random subspace, and
boosting, optimized through grid search. The ensemble techniques employ four base learners (Support
Vector Machine, Random Forest, Logistic Regression, and Naive Bayes) for sentiment classification. The
experiment analysis focuses on different N-gram feature sets (unigrams, bigrams, and trigrams), Chi-square
feature selection, and text representation schemes (Bag of Words and TF-IDF). Our empirical findings
underscore the superiority of stacking across both datasets, achieving high accuracies and F1-scores: 80.30%
and 81.76% on the UCL dataset, and 90.92% and 91.12% on the IMDB dataset, respectively. The proposed
approach significantly outperforms baseline approaches on the relevant tasks, improving
accuracy by up to 7% on the UCL dataset.

INDEX TERMS Chi-square, ensemble learning, grid search optimization, low resource language, machine
learning, natural language processing (NLP), n-gram, sentiment analysis.

I. INTRODUCTION
With the profound rise of social media platforms, ease of connectivity, and unprecedented access to information, users actively engage and express their concerns and feedback about products, services, or support systems on media platforms, creating new opportunities and challenges in the digital age [1], [2]. Examples of such platforms include the World Wide Web, social media sites like Facebook and Twitter, as well as blogs, forums, and entertainment websites. The opinions and reviews shared on these platforms serve as a key resource for discerning customer sentiments [3].

Sentiment analysis (SA) can be considered a key form of non-topical text analysis, with several important application domains including politics, news analytics, and marketing, while also posing complex challenges in artificial intelligence [4]. SA involves contextual mining of text and focuses on textual subjectivity, such as opinions, emotions, and attitudes in the source material [5]. By employing a combination of machine learning (ML) classifiers and NLP techniques, SA seeks to analyze the polarity of sentiments expressed by users in written text [6].

Researchers have adopted various feature engineering techniques and ML algorithms to address diverse challenges and enhance the accuracy of NLP tasks [7]. The adopted techniques include bag of words (BOW), term frequency and inverse document frequency (TF-IDF), word
© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see
VOLUME 5, 2024 https://creativecommons.org/licenses/by-nc-nd/4.0/ 599
HASSAN ET AL.: POLARITY CLASSIFICATION OF LOW RESOURCE ROMAN URDU AND MOVIE REVIEWS SENTIMENTS

embeddings, sentiment lexicons, neural networks, and ML classifiers comprising Support Vector Machines (SVM) and Naive Bayes (NB) [8], [9]. Deep neural networks (DNN), including recurrent neural networks (RNN) and Long Short-Term Memory networks (LSTM), have been employed for various complex text classification tasks such as SA [10], [11], with LSTM addressing issues like the vanishing and exploding gradient problems in RNN [12]. However, employing DNN architectures necessitates extensive hyperparameter tuning and large numbers of training parameters, which become challenging, particularly with large datasets.

Extensive research on SA focuses on European languages and English, benefiting from their rich linguistic resources. Digital platforms predominantly support English and Roman scripts [14], [20], encouraging users to write reviews in their native languages, such as Urdu, Hindi, Arabic, etc., using Roman script [21]. SA in languages like Roman Urdu faces limitations due to resource constraints, including the absence of standardized corpora. Urdu, the native and official language of Pakistan, is spoken in many South Asian states due to its historical and cultural roots [22]. The adoption of Roman Urdu, which transcribes Urdu using the English alphabet, on social media platforms serves users of diverse age groups, enhancing communication comfort. Consequently, the development of robust SA models for low-resource Roman Urdu is vital for comprehending user emotions and sentiments.

In this research, we present a novel approach to sentiment analysis in low-resource Roman Urdu, leveraging machine learning and ensemble learning techniques. We investigate the performance of ensemble approaches optimized using grid search on both the UCL Roman Urdu and IMDB datasets. Our study focuses on improving sentiment classification performance, recognizing the complexity of linguistic characteristics and limited resources in Roman Urdu. The main contributions of our study are as follows:
- Developed a robust approach to SA in low-resource Roman Urdu and English dialects, leveraging ensemble classifiers optimized using grid search to enhance performance.
- Devised a standardization method for word variations in Roman Urdu text.
- Applied Chi-square feature selection to identify and select the statistically significant features in the N-gram sets, i.e., unigrams, bigrams, trigrams, and their combination.

The subsequent sections of the article are organized as follows: Section II provides an overview of previous research in the subject domain. Section III discusses the proposed methodology for sentiment polarity classification. The experimental setup adopted in this study is detailed in Section IV, and Section V discusses the results acquired from the proposed approach. Finally, Section VI concludes the work, highlighting future directions.

II. LITERATURE REVIEW
Sentiment analysis (SA) has gained importance in recent times due to the proliferation of user-generated text, presenting practical applications across various domains. SA seeks to identify and extract sentiment-related information from data sources, essential for knowledge collection and decision-making [23]. Numerous studies on SA have been conducted that utilize supervised, unsupervised, and semi-supervised techniques; however, there exists limited work on Roman Urdu SA. Table 1 outlines the previous work on Roman Urdu SA. This section summarizes the past research on Roman Urdu SA and discusses effective approaches adopted by practitioners across various domains for sentiment analysis.

Mehmood et al. [13] utilized machine learning (ML) approaches (NB, SVM, LR, KNN, and DT) with N-gram features for Roman Urdu SA. They created a Roman Urdu corpus comprising 779 reviews spanning five distinct domains: movie, politics, drama, mobile reviews, and miscellaneous. The study findings show that the NB and LR classifiers outperform the other ML classifiers employed in the study. In another study on Roman Urdu SA [14], SVM was applied to reviews sourced from an e-commerce website (Daraz.pk). They used a vector space model and a TF-IDF weighting scheme to represent the reviews. Mehmood et al. [15] proposed a deep learning (DL) model to analyze emotions and attitudes expressed in Roman Urdu, utilizing a dataset comprising 10,021 sentences related to various genres. They established a manually annotated benchmark corpus for SA in Roman Urdu and applied rule-based, N-gram, and Recurrent Convolutional Neural Network (RCNN) models to classify sentiments. The RCNN model achieves high performance compared to the rule-based and N-gram models, with an accuracy of 65.2% for binary classification. Another study by Chandio et al. [16] employed a fine-tuned SVM utilizing a Roman Urdu stemmer to classify sentiments. The study adopted bag of words (BOW) and TF-IDF schemes and introduced the largest Roman Urdu e-commerce dataset (RUECD).

In [17], three neural embeddings (Word2vec, GloVe, and FastText) were introduced for Roman Urdu to enhance NLP tasks. To establish a performance baseline for Roman Urdu SA, the study employed ML classifiers (SVM, NB, and LR) and DL models (RCNN and RNN), and proposed a multi-channel hybrid approach. The hybrid approach combining CNN and RCNN with pre-trained embeddings outperforms the ML and DL methods on the Roman Urdu corpus of 3,241 sentiments categorized into positive, negative, and neutral classes. A deep learning approach, CNN-LSTM, was adopted by Khan et al. [18] with diverse word embeddings for SA in both Roman Urdu and English dialects. They employed machine learning classifiers alongside the proposed DL architecture. They evaluated the performance of the DL and ML classifiers on four datasets, where SVM with Word2Vec continuous bag of words (CBOW) demonstrated improved performance on Roman Urdu SA. Another study by Chandio et al. [19] applies the deep recurrent architecture RU-BiLSTM for SA in Roman Urdu, combining bidirectional LSTM with word embeddings and an attention mechanism. The proposed DL model outperforms the baseline models on two Roman Urdu datasets, i.e., RUSA-19 and RUECD.
TABLE 1. Prior Research Work on Roman Urdu Sentiment Analysis

Xie et al. [24] proposed a maximum entropy model to extract emotion words from Wikipedia and a corpus using probabilistic latent semantic analysis. A fuzzy logic-based technique for exhibiting the polarity acquired through training sets was presented by Dragoni and Petrucci [25]. The approach they adopted makes use of potential conceptual domain overlaps to develop a general model that can determine the polarity of texts in arbitrary domains. A new approach for employing machine learning methods on the movie reviews dataset was proposed by Pang and Lee [26], in which text classification algorithms were employed on the subjective parts of the documents using a minimum-cuts formulation to determine the polarity of sentiments. The adopted approach, in the initial step, separated the objective and subjective words belonging to the documents and passed the remaining words to the next step. To extract the results, NB and SVM classifiers were applied in the proposed approach.

In article [27], the authors reported designing and building a system for summarising movie reviews and ratings for mobile platforms. The results of applying sentiment classification to movie reviews were used to determine the rating in the proposed approach. Recognizing the significance of identifying product features for feature-based summarization, the authors proposed employing Latent Semantic Analysis (LSA) for identifying these features. Another study by Khan et al. [28] employed rule-based, ML-based, and DL-based approaches for Urdu SA on a multi-class dataset to establish baseline results. They manually annotated the Urdu dataset comprising 9,312 reviews into positive, negative, and neutral classes. Text representation schemes of N-gram, FastText, and BERT word embeddings were adopted in the study. The proposed fine-tuned multilingual BERT outperformed the other baseline classifiers used in the research, achieving an F1-score of 81.49%.

A. RESEARCH GAP
Prior research focusing on the resource-deprived Roman Urdu language shows a notable absence of extensive application of ensemble approaches and feature selection for sentiment polarity classification. This study seeks to address the gap by presenting a broader approach that employs machine learning and ensemble techniques to effectively classify sentiments in Roman Urdu text.

III. METHODOLOGY
In this study, we investigate the performance of machine learning and ensemble classifiers for binary classification of sentiments in both Roman Urdu and English dialects, utilizing two distinct datasets, i.e., UCL and IMDB movie reviews. Sample sentences from the UCL Roman Urdu dataset are presented in Table 2. We employed five ensemble approaches in the experiment analysis, namely bagging, random subspace, stacking, boosting, and majority voting, to enhance classification performance by combining predictions from multiple base learners. The base learners utilized in our study include SVM, Random Forest (RF), LR, and NB, chosen for their competitive performance on various NLP tasks [17], [29]. Our experimental setup comprises the exploration of various word-level N-gram feature sets (unigrams, bigrams, trigrams, and unigram + bigram), along with Chi-square feature selection to identify significant features, and two text representation schemes, BOW and TF-IDF. Hyperparameter optimization is conducted through grid search with 10-fold cross-validation to ensure robust model performance. The entire process of the proposed approach can be completed in various steps as depicted in Fig. 1. These steps are elaborated upon in detail in the subsequent subsections, providing a comprehensive overview of the experimental methodology.

A. PREPROCESSING MODULE
The following preprocessing steps are incorporated in this study:
- Removal of sentences written entirely in English from the UCL dataset.
- Conversion of text to lower case.
- Removal of digits, special characters, and punctuation.
- Elimination of ASCII (American Standard Code for Information Interchange) control characters and HTML (Hypertext Markup Language) tags.
- Stop-word removal
  – Common words occurring in the text, both Roman Urdu and English, such as “the”, “is”, “in”, “for”, “to”, “at”, etc., which are articles and prepositions that add


TABLE 2. Sample Sentences Roman Urdu UCL Corpus

FIGURE 1. Proposed framework for sentiment polarity classification.

no useful meaning to the sentiments, were removed from the text. Additionally, we compiled a separate list of stop words specific to Roman Urdu comprising terms like “aur”, “ap”, “hun”, and “hai”.

B. STANDARDIZATION OF ROMAN URDU TEXT
Roman Urdu is the Latin script of Urdu. To date, to the best of our knowledge, there is no standardization of the Roman Urdu script, resulting in multiple variations of the same word due to differences in spelling. Consider the example of the word “bekaar”, which translates to ‘useless’ in English. The word can appear in various forms such as “bekaaar” or “bekar”, all conveying the same negative sentiment but with different spellings. These variations can lead to inconsistency in sentiment analysis (SA) results, as the algorithms employed for SA may not efficiently interpret and identify them. To address this challenge, we devise a Python program to standardize these word variations.

Our approach involves creating a mapping dictionary that pairs each standard Roman Urdu word with its variations. The mapping dictionary, comprising 2,500 entries, was manually curated from the vocabulary of the UCL dataset. It primarily groups Roman Urdu words with different variations in spelling conveying similar meanings. Prioritizing word variations as the primary criterion, we also considered phonetic similarity (e.g., “acha” and “accha”, both meaning “good” in English) and contextual relevance (e.g., “zabardast” and “bekaar”), ensuring that the standardized forms accurately reflect semantic consistency. The pseudocode of the standardization is given in Algorithm 1. The program systematically replaces each variation in the UCL dataset with its standardized counterpart using a mapping dictionary and regular expressions. For example, variations of the Roman Urdu word “bakwaas”, which translates to “rubbish” in English, like “bakwaaas”, “bakwaass”, and “bakwass”, are systematically mapped to “bakwaas”. Similarly, variations of “jhoota” (liar) such as “jhota”, “jhooty”, and “jhooti” are replaced with “jhoota”. To standardize the text in the IMDB dataset, lemmatization was performed to reduce words to their root forms using the Natural Language Toolkit [30].

C. FEATURE EXTRACTION
In SA, feature extraction and representation using feature vectors are the key steps. These vectors enable the acquisition of classification models to determine labels for unseen



classes [31]. Our study explores TF-IDF, BOW, and N-gram models for sentiment polarity classification.

Algorithm 1: Pseudocode of Standardizing Roman Urdu Word Variations
1: Input: Roman Urdu dataset RU_D, word mapping dictionary W_M_D
2: Output: Normalized Roman Urdu dataset with standardized words
3: for each text t in dataset RU_D do
4:   for each standard word sw and its variations var in W_M_D do
5:     for each variation v in var do
6:       pattern ← create_regex(v)
7:       t ← replace_with_regex(t, pattern, sw)
8:     end for
9:   end for
10: end for
11: return normalized Roman Urdu dataset RU_D

1) BAG-OF-WORDS (BOW)
The BOW technique represents a document as a collection of words, without considering grammar, syntax, or word ordering [32]. BOW models the data in text form by considering the frequency of word occurrences in the corresponding text document. A vector of fixed length is utilized, with each entry mapping to a word present in a pre-defined dictionary. If a word in a sentence appears in the pre-defined dictionary, its corresponding entry in the vector indicates its frequency in the document. However, if the word is not in the dictionary, its entry is typically set to 0. The BOW vocabulary size is determined by the document's word count.

2) TERM FREQUENCY AND INVERSE DOCUMENT FREQUENCY (TF-IDF)
The TF-IDF text representation scheme assigns weights to individual terms within a document based on their term frequency (TF) and inverse document frequency (IDF). TF-IDF emphasizes terms with higher weights as more significant within the document [33]. These two indicators, TF and IDF, determine the weighted score of a single word in each document using the formula provided below:

    TF-IDF_{j,k} = TF_{j,k} × IDF_j    (1)

where TF_{j,k} denotes the term frequency of word w_k in document d_j, and IDF_j denotes the inverse of the document frequency df_j, where df_j is the document frequency of word w_j. Inverse document frequency assigns significance to words that occur rarely across the documents.

D. N-GRAM
The N-gram approach is a key text representation scheme extensively utilized in text categorization tasks. In this work, we utilize word-level unigram (uni), bigram (bi), trigram (tri), and unigram + bigram (uni + bi) representations from N-gram models as feature sets. The word order within a sentence's vector representation is captured by the N-gram model. The vector representations are very effective, especially in sentiment analysis, where they provide a thorough comprehension of the linguistic complexities found within the text. For example, consider a sentence expressed in Roman Urdu, “Inho ne har field mein apna loha manwaya hai”, which in English translates to “They have proven their mettle in every field”. Through trigram analysis (N = 3) after stop-word removal, distinct combinations like ‘Inho har field’, ‘har field apna’, ‘field apna loha’, and ‘apna loha manwaya’ are generated. This N-gram scheme extracts meaningful sequences of N tokens from the text while maintaining the sentence's sequence.

E. FEATURE SELECTION
In this study, we applied Chi-square feature selection to reduce the vast array of textual features into a manageable subset that best captures the underlying patterns of the data. The chi-square test was utilized on word-level N-grams to select the top k features based on statistical significance. Employing feature selection gives several advantages, including improved interpretability, reduced risk of overfitting, enhanced comprehensibility, and model generalization [34]. The χ² statistic is calculated using the following formula:

    χ̃²(feat) = Σ_{i=1}^{C} (n_i / N) × χ²(feat, class_i)    (2)

In (2), χ̃²(feat) represents the chi-square statistic for the feature feat. It is computed by summing over all sentiment classes (C) and multiplying the proportion of occurrences of feature feat within each class (n_i / N) by the corresponding chi-square score χ²(feat, class_i), which measures the relationship between the feature and the sentiment class.

F. MACHINE LEARNING CLASSIFIERS
This section presents the ML classifiers used in the study to classify Roman Urdu and movie reviews. The learning algorithms take input features extracted using the techniques outlined in Section III.

1) SUPPORT VECTOR MACHINE
SVMs are ML models capable of handling both classification and regression tasks [35]. SVMs are applied to segregate data points by incorporating decision boundaries with hyperplanes in a multidimensional feature space. During training, SVM selects the points near the decision boundary, termed support vectors, which influence the positioning of the hyperplane. In this study, we have applied SVM for the binary classification of Roman Urdu and English reviews into positive and negative classes.

2) LOGISTIC REGRESSION
LR is a linear classification algorithm primarily utilized for binary classification tasks, especially those related to NLP [17].
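Algorithm 1 can be realized in a few lines of Python. The sketch below is illustrative only: the `word_map` entries are a hypothetical excerpt, not the paper's actual 2,500-entry dictionary, and the `create_regex` step is approximated with `re.escape` plus word boundaries.

```python
import re

# Hypothetical excerpt of the mapping dictionary (the paper's version
# comprises 2,500 manually curated entries from the UCL vocabulary).
word_map = {
    "bakwaas": ["bakwaaas", "bakwaass", "bakwass"],
    "jhoota": ["jhota", "jhooty", "jhooti"],
    "acha": ["accha", "achaa"],
}

def standardize(text, mapping):
    """Replace every spelling variation with its standard form (Algorithm 1)."""
    for standard, variations in mapping.items():
        for variation in variations:
            # create_regex(v): match the variation as a whole word only
            pattern = r"\b" + re.escape(variation) + r"\b"
            # replace_with_regex(t, pattern, sw)
            text = re.sub(pattern, standard, text)
    return text

reviews = ["ye film bakwaass hai", "wo jhooty insan hai"]
normalized = [standardize(r, word_map) for r in reviews]
# normalized == ["ye film bakwaas hai", "wo jhoota insan hai"]
```

The word-boundary anchors prevent a variation from being rewritten inside a longer word; in a full implementation, the mapping would be applied to every sentence of the UCL dataset before feature extraction.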


The LR model for binary output variables employs a logistic function to approximate probabilities, providing insights into the relationship between multiple variables [36]. This logistic function, a fundamental component of LR, calculates the probability that a given feature vector belongs to the positive class using the sigmoid function applied to the linear product of the weight parameters (w) and the input variables (X). The logistic regression equation is expressed as follows:

    P(c = 1 | d) = g(d) = 1 / (1 + e^(−wᵀd))    (3)

where P(c = 1 | d) is the probability that document d belongs to class c, and w denotes the feature-weight parameter to be estimated.

3) NAÏVE BAYES CLASSIFIER
NB is a probabilistic ML model with applications in text classification tasks [37]. NB classifiers are derived from Bayes' theorem but operate under a firm assumption of feature independence. The Multinomial Naïve Bayes model [38] estimates the class probabilities for a given document d, which is represented by the feature vector (f_1, f_2, ..., f_n). Under the assumption of independence, the probability of observing features f_1 to f_n for a particular class c can be calculated as the product of individual probabilities, as provided below:

    P(f_1, f_2, ..., f_n | c) = Π_{1≤i≤n} P(f_i | c)    (4)

This formulation simplifies the computation of posterior probabilities when NB is employed to classify new instances. Multinomial Naïve Bayes performs well for easily countable data, such as word counts in text, and permits each feature distribution to be multinomial.

4) RANDOM FOREST
RF is an ML technique that employs a large number of decision trees to achieve more stable and accurate predictions. RF has been widely utilized in NLP tasks based on its significant performance [39]. This approach addresses the inherent limitations of individual decision trees, which often exhibit high variance and low bias, leading to sub-optimal classification performance. RF integrates two key techniques: random feature selection and bagging, both renowned in the realm of machine learning [40].

G. ENSEMBLE TECHNIQUES
Ensemble learning methods utilize a combination of several classifiers to enhance classification accuracy by aggregating their predictions. This approach compensates for individual weaknesses and improves overall performance [41]. In our experimental setup, we employ five ensemble approaches, namely bagging, random subspace, stacking, boosting, and majority voting.

FIGURE 2. Illustration of boosting ensemble technique.

1) BAGGING
Bagging, or bootstrap aggregating [42], is an ensemble learning algorithm that combines multiple similar base learning algorithms, aimed at improving accuracy and exhibiting high predictive performance. The algorithm combines classifiers that are trained on several training sets acquired by applying the bootstrap sampling technique to the initial training set. Bootstrap sampling involves uniform sampling with replacement from the original dataset, keeping the sizes of the training sets identical to that of the original training set. The base estimators used in our experiment, including SVM, NB, RF, and LR, are trained using this technique.

2) BOOSTING
Boosting [43] is an extensively used ensemble technique employed to improve the performance of weak classifiers and produce a more robust classification model. In this technique, classifiers are trained utilizing various sampling distributions derived from the training data. Weak learning algorithms can be employed to produce a single, more reliable classification model. We employed the AdaBoost ensemble technique in our experiment to classify sentiments. AdaBoost enhances the boosting algorithm by emphasizing instances that are challenging to learn. In the initial stage, equal weights are assigned to each pattern within the training set. However, as the ensemble learning progresses, these weights undergo adjustments, increasing for misclassified instances and decreasing for those classified correctly. The boosting ensemble process is presented in Fig. 2.

3) STACKING
Stacking [44], or stacked generalization, is an ensemble combination technique that employs a two-staged structure. In this approach, a meta-classifier is trained to combine the predictions of heterogeneous models of different types. The base learners are trained on the complete dataset, whereas the meta-learning algorithm is trained on the output of these base classifiers. To train the meta-classifier, a new dataset comprising the outputs from the base learners is employed. This dataset differs from the one used to train the base learners, aiming to minimize overfitting. In this study, we utilize SVM, NB, LR, and RF as base classifiers in the stacking ensemble approach adopted for sentiment polarity classification. Fig. 3 illustrates the stacking technique adopted in the study.
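The pipeline of TF-IDF features, Chi-square selection, and a stacked ensemble of SVM, RF, LR, and NB with a Logistic Regression meta-classifier can be sketched with scikit-learn. This is an illustrative sketch, not the paper's exact configuration: the toy corpus, k = 10, cv = 2, and all hyperparameter values are placeholder assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Toy labeled corpus (placeholder for the UCL/IMDB data): 1 = positive, 0 = negative
texts = [
    "film zabardast hai", "acting acha hai", "story bahut acha",
    "ye film bakwaas hai", "acting bekaar hai", "story bilkul bakwaas",
]
labels = [1, 1, 1, 0, 0, 0]

# Base learners mirroring the paper's choice of SVM, RF, LR, and NB
base_learners = [
    ("svm", LinearSVC()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", MultinomialNB()),
]

model = Pipeline([
    # Unigram + bigram TF-IDF features
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # Chi-square selection of the top-k features (k is illustrative)
    ("chi2", SelectKBest(chi2, k=10)),
    # Stacking: an LR meta-classifier combines the base learners' predictions
    ("stack", StackingClassifier(estimators=base_learners,
                                 final_estimator=LogisticRegression(),
                                 cv=2)),
])

model.fit(texts, labels)
print(model.predict(["ye story zabardast hai"]))
```

In the paper's setup, a grid search (e.g., scikit-learn's `GridSearchCV` with 10-fold cross-validation) would wrap this pipeline to tune values such as the meta-classifier's regularization strength, as described in Section III-H.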



TABLE 3. Hyperparameters for Ensemble Classifiers

TABLE 4. Statistics of UCL and IMDB Datasets

FIGURE 3. Illustration of stacking ensemble technique.

4) RANDOM SUBSPACE
The random subspace method [45] is an ensemble technique that employs feature-based partitioning to acquire diversity in the base learners. Multiple base learners are trained on randomly selected feature subspaces and then combined in this ensemble scheme to improve predictive performance. The weak base classifiers trained on random samples of the feature space are combined to produce a robust classifier, similar to bagging. This study employs SVM, LR, NB, and RF as base estimators for the random subspace ensemble.

5) VOTING
Voting is another ensemble learning approach in which instances undergo classification by multiple learning algorithms through combined voting. One of the fundamental voting schemes utilized is majority voting, where the output class label is determined by the predictions of more than half of the employed base classifiers. Consequently, the output of the ensemble is based on the class having the most votes. In this study, we employed the majority voting scheme alongside hard voting, a specific instance of majority voting where each classifier in the ensemble contributes equally to the final decision. Hard voting aggregates the predictions made by each classifier and selects the class label with the most votes. This approach offers increased robustness, improved generalization, and a reduction in overfitting based on a collective decision-making process.

H. HYPERPARAMETER OPTIMIZATION
The grid search technique is adopted for hyperparameter optimization of the ensemble classifiers. The hyperparameter tuning is performed on configurations obtained by combining feature extraction schemes (BOW and TF-IDF), N-gram feature sets, and Chi-square feature selection. Predefined sets of hyperparameters are utilized to conduct the grid searches to identify the optimal values. This systematic approach assists in fine-tuning the models effectively and improving their predictive capabilities on both the UCL and IMDB datasets. Table 3 summarizes the hyperparameters for each ensemble method used in the experiment analysis. For bagging and random subspace, the n_estimators hyperparameter determines the number of base estimators in the ensemble; three different values are tested for the ensemble, i.e., 10, 20, and 50. In the stacking ensemble, the hyperparameter final_estimator__C represents the regularization strength of the meta-classifier (Logistic Regression) used for combining the predictions of the base learners. We experimented with three values: 0.1, 1.0, and 10.0. For AdaBoost, the n_estimators hyperparameter controls the maximum number of base estimators to be sequentially trained. The AdaBoost ensemble classifier is evaluated on the values 50, 100, and 150.

IV. EXPERIMENT
This section outlines the details of the datasets adopted in the study and the procedure for the experiment carried out to classify the sentiments.

A. EXPERIMENTAL DATASET
In this research, we have evaluated the performance of the proposed architecture on two datasets, with statistics depicted in Table 4. The experiment was performed on only positive and negative sentiments from both the UCL and IMDB datasets to ensure consistency across both datasets and enhance model performance in binary sentiment classification. Moreover, the adopted approach avoids data imbalance in the UCL dataset, where neutral reviews have a significantly different distribution compared to positive and negative reviews. The details of the datasets are presented in the following subsections.

1) UCL ROMAN URDU DATASET
The UCL Roman Urdu corpus comprises 20,228 sentences that are categorized into three classes: positive, negative, and neutral [15], [46]. The complete dataset comprises 6,013 positive sentences, 5,286 negative sentences, and 8,929 neutral sentences.

2) IMDB DATASET
The 50K IMDB movie reviews sentiment analysis dataset used in the experiment analysis comprises English text only and was obtained from the Kaggle platform [47]. The dataset is labeled and has 50,000 records with two attributes, i.e., review

VOLUME 5, 2024 605


HASSAN ET AL.: POLARITY CLASSIFICATION OF LOW RESOURCE ROMAN URDU AND MOVIE REVIEWS SENTIMENTS

TABLE 5. Classification Accuracy Results of ML Classifiers on UCL Dataset

TABLE 6. Classification Accuracy Results of ML Classifiers on IMDB Dataset

and sentiment. The acquired dataset is balanced with 25,000 The tailored approach to feature selection enhances the clas-
sentiments having ‘positive’ polarity and 25,000 having ‘neg- sification performance capturing the informative and relevant
ative’ polarity. features to train the models leading to optimal accuracy re-
sults. The base classifiers are trained on the selected features
of the training sets. The ensemble classifiers leverage the
B. EXPERIMENTAL PROCEDURE insights derived from the trained base classifiers to enhance
To establish robust models for classifying sentiment polarity, classification performance and predict sentiment labels on the
feature extraction schemes of BOW and TF-IDF, utilizing testing set. The performance metrics of accuracy and the F1
N-grams, are applied to both the training and testing sets. Both score are adopted to evaluate the classification performance of
UCL and IMDB datasets are split into training and testing the ensemble classifiers.
sets with an 80:20 ratio, ensuring a representative distribution Hyperparameter optimization is performed using grid
of classes in both sets. The chi-square test is used to select search with 10-fold cross validation to fine-tune the param-
the most informative and discriminative features across vari- eters of each ensemble classifier on the UCL and IMDB
ous N-gram configurations to reduce the word-level N-grams datasets. The optimal value of the n_estimators hyper-
in the UCL and IMDB datasets. In this research, a series parameter, set to 50, gives the best classification results for
of experiments were conducted using features ranging from both bagging and random subspace methods. For the stacking
the top 200 to 4000 words based on significant Chi-square classifier, the optimal value for the final_estimator__C
test score values for the UCL dataset, and from the top 500 hyperparameter is determined to be 0.1. Additionally, the Ad-
to 6000 words for the IMDB dataset. For optimal results, aBoost classifier achieves optimal performance with a setting
the top 1000 and 4000 word-level features are selected for of 150 for the n_estimators hyperparameter. The major-
the UCL and IMDB datasets, respectively, based on signif- ity voting aggregates the predictions from LR, RF, SVM, and
icant Chi-square test score values on the training sets. The NB classifiers to produce ensemble predictions. All the exper-
difference in the selected features reflects the distinct char- iments are conducted within the Google Colaboratory (Colab)
acteristics of each dataset. The large IMDB dataset has a notebook environment, leveraging scikit learn packages for
more diverse vocabulary size compared to the UCL dataset. machine learning classifiers.
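The feature-extraction step described above (TF-IDF or BOW over N-grams, followed by Chi-square top-k selection) can be sketched with scikit-learn. The toy Roman Urdu corpus, its labels, and the small k value below are illustrative stand-ins, not the paper's data or settings:

```python
# Sketch: TF-IDF over unigram+bigram features, then Chi-square top-k selection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Tiny illustrative corpus (Roman Urdu), 1 = positive, 0 = negative.
corpus = [
    "film acha tha",
    "movie boring thi",
    "story bohat achi hai",
    "acting buri thi",
]
labels = [1, 0, 1, 0]

# BOW would use CountVectorizer instead; TF-IDF is shown here.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigram + bigram
X = vectorizer.fit_transform(corpus)

# Keep the k features with the highest Chi-square scores
# (the paper selects k=1000 for UCL and k=4000 for IMDB).
selector = SelectKBest(chi2, k=5)
X_selected = selector.fit_transform(X, labels)
print(X_selected.shape)  # (4, 5)
```

The same selector, fitted on the training split only, would then be applied to transform the test split, mirroring the train/test procedure described above.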

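The grid search with 10-fold cross-validation used for hyperparameter tuning can be sketched as follows; the synthetic data and the candidate value grid are assumptions for illustration (the text reports 50 as the optimum n_estimators for bagging and random subspace, and 150 for AdaBoost):

```python
# Sketch: tuning an ensemble's n_estimators via grid search with 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the vectorized, feature-selected review data.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# n_estimators controls the number of base estimators in the ensemble;
# the candidate values here are illustrative.
param_grid = {"n_estimators": [10, 50, 150]}

search = GridSearchCV(
    BaggingClassifier(random_state=0),
    param_grid,
    cv=10,                # 10-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_)
```

An analogous grid over `final_estimator__C` would tune the stacking meta-classifier, since scikit-learn exposes nested estimator parameters with the double-underscore syntax.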


TABLE 7. Classification Accuracy Results of Ensemble Algorithms on UCL Dataset with (a) Unigrams, Bigrams, and Trigrams (b) Unigram + Bigram Feature Set

TABLE 8. Classification Accuracy Results of Ensemble Algorithms on IMDB Dataset with (a) Unigrams, Bigrams, and Trigrams (b) Unigram + Bigram Feature Set
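The ensemble configurations evaluated in Tables 7 and 8 (LR, SVM, NB, and RF as base classifiers; logistic regression with C = 0.1 as the stacking meta-classifier; hard voting over the same base set) can be sketched with scikit-learn. The synthetic dense features are a stand-in for the vectorized reviews, and GaussianNB substitutes for the multinomial NB typically used on text counts:

```python
# Sketch: stacking and majority (hard) voting over LR, SVM, NB, and RF.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", LinearSVC(max_iter=5000)),
    ("nb", GaussianNB()),       # GaussianNB stands in for multinomial NB here
    ("rf", RandomForestClassifier(random_state=0)),
]

# Stacking: base predictions feed a logistic-regression meta-classifier
# (final_estimator__C = 0.1 is the optimum reported in the text).
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(C=0.1))
stack.fit(X, y)

# Hard voting: each classifier casts one equal vote; the majority class wins.
vote = VotingClassifier(estimators=base, voting="hard")
vote.fit(X, y)
print(stack.score(X, y), vote.score(X, y))
```

In the study's setting, both ensembles would be fitted on the selected training features and scored on the held-out 20% test split.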

V. RESULTS AND DISCUSSION
This section presents the results achieved by the machine learning classifiers and the proposed optimized ensemble classifiers on the sentiment datasets.

A. PERFORMANCE OF THE BASE CLASSIFIERS
Table 5 presents the accuracy values of the machine learning classifiers adopted in the study using different N-gram feature sets and weighting schemes (BOW and TF-IDF) on the UCL Roman Urdu dataset. The performance is evaluated both without feature selection and with Chi-square selection on the N-gram features. The unigram features consistently yield the highest accuracy across the classifiers with both the BOW and TF-IDF schemes. NB achieves the highest accuracy of 75.84% with unigrams and the BOW weighting scheme, without feature selection on N-grams. LR achieves an accuracy of 75.39% and SVM an accuracy of 73.14% on the unigram and BOW scheme. Similar to the UCL results, comparatively better performance on unigram features is observed on IMDB, as depicted in Table 6. The classifiers SVM (87.88%), LR (87.01%), and NB (85.52%) perform better on unigram features with the TF-IDF weighting scheme and no feature selection on the N-gram sets. Compared to UCL, an increase in the classifiers' performance can be observed with the combination of unigram and bigram features, where SVM achieves a significant accuracy of 87.96%. RF has low classification performance compared to the other base classifiers, achieving at best an accuracy of 73.49% on UCL with the merged unigram and bigram feature set and the BOW scheme.

The performance of the machine learning classifiers improves significantly with Chi-square feature selection on the UCL dataset. LR achieves 76.76% accuracy with the unigram and BOW feature extraction scheme after Chi-square feature selection (top k = 1000), compared to 75.39% without feature selection. The performance difference is more pronounced when Chi-square feature selection is employed on a combination of N-gram features (i.e., unigram + bigram) compared to no feature selection on the same combinations. NB achieves the highest accuracy of 79.17% using Chi-square feature selection on the merged unigram and bigram features with the TF-IDF weighting scheme. LR performs comparatively better than SVM, with an accuracy of 78.53% on the merged unigram and bigram feature set with the BOW scheme and Chi-square feature selection. Similarly improved results of the base classifiers can be observed on the IMDB corpus with the top 4000 N-grams selected using Chi-square: SVM demonstrates high performance on IMDB, achieving an accuracy of 89.91% with features selected through Chi-square (top k = 4000) on the combined unigram and bigram feature set with the TF-IDF text representation scheme.

B. PERFORMANCE OF THE PROPOSED ENSEMBLE CLASSIFIERS
Table 7 presents detailed classification results, in terms of accuracy, for the UCL Roman Urdu dataset. The results are obtained with the ensemble algorithms, further optimized through grid search. The stacking ensemble approach gives consistently high performance across multiple combinations on UCL, achieving an impressive accuracy of 80.30%. This performance is attained using the combined unigram and bigram feature set (i.e., unigram + bigram) with TF-IDF representation and the Chi-square top k = 1000 features. Notably, stacking leverages all base classifiers, including LR, SVM, NB, and RF, with logistic regression serving as the meta-classifier. Random subspace with SVM as the base estimator achieves the second-best accuracy of 80.22% on the merged unigram and bigram features with the TF-IDF weighting scheme. Majority voting, another ensemble technique employing all base classifiers, also demonstrates competitive performance, achieving accuracies ranging from 55.77% to 79.46%. Across both the bagging and random subspace methods, LR and NB consistently yield strong results, particularly with TF-IDF representation, showcasing the effectiveness of this feature extraction technique in capturing meaningful textual information. Boosting (AdaBoost) demonstrates comparatively lower performance on the UCL dataset, suggesting limitations in effectively leveraging weak learners to boost overall performance.

The consistent performance of the stacking ensemble is also observed on the IMDB dataset, as shown in Table 8, where it achieves the highest accuracy of 90.92%. This result is obtained with the combination of unigram and bigram features (i.e., unigram + bigram) and TF-IDF representation using the Chi-square top k = 4000 features. Similar to the UCL dataset, random subspace with SVM as the base estimator achieves the second-highest accuracy of 90.74% on the merged unigram and bigram features with the TF-IDF scheme. Majority voting, which incorporates all base classifiers, exhibits competitive accuracy results ranging from 89.29% to 89.79%. Across the bagging and random subspace methods, LR consistently demonstrates strong performance, particularly with TF-IDF representation. Considering the performance of the text representation schemes on both datasets, BOW generally produces lower accuracy than TF-IDF, although both techniques show consistent trends across the ensemble methods. TF-IDF representation tends to capture the semantic relevance of words more effectively, resulting in higher classification accuracy across the various ensemble techniques.

The stacking ensemble method achieves the highest F1-scores of 81.76% on the UCL dataset and 91.12% on the IMDB dataset, utilizing the combined unigram and bigram feature set (i.e., unigram + bigram) with the TF-IDF scheme, as depicted in Fig. 4.

FIGURE 4. Performance comparison of the proposed ensemble classifiers across N-gram features.

C. COMPARATIVE EVALUATION

TABLE 9. Comparison of the Proposed Approach Performance With Closely Related Works

The comparison of our proposed approach on both datasets (i.e., Roman Urdu UCL and IMDB) with other baseline algorithms is presented in Table 9. The comparison analysis shows that our proposed ensemble classifiers, especially stacking, outperform the existing techniques, achieving a high accuracy and F1-score of 80.30% and 81.76%, respectively, on the UCL dataset. Similarly, our proposed approach maintains superior performance on the IMDB dataset, achieving a remarkable accuracy of 90.92% and an impressive F1-score of 91.12%.

VI. CONCLUSION AND FUTURE DIRECTIONS
This research provides valuable insights into the performance of ensemble learning techniques for sentiment analysis, focusing on the low-resource Roman Urdu UCL dataset and the IMDB dataset. A standardization approach was implemented to normalize the Roman Urdu dataset, reducing complexity by standardizing word variations. Our study demonstrates the effectiveness of the stacking ensemble method in classifying sentiments in both Roman Urdu and English, achieving maximum accuracies of 80.30% and 90.92%, respectively. These empirical findings underscore the robustness and versatility of ensemble techniques in sentiment analysis tasks across diverse datasets and languages. Furthermore, our study highlights ensemble learning's potential in addressing the challenges of sentiment analysis in low-resource languages, offering insights applicable to related tasks such as emotion classification, sarcasm detection, and fake news identification.

REFERENCES
[1] A. Lytos, T. Lagkas, P. Sarigiannidis, and K. Bontcheva, "The evolution of argumentation mining: From models to social media and emerging tools," Inf. Process. Manage., vol. 56, no. 6, 2019, Art. no. 102055.
[2] M. Al-Smadi, M. Al-Ayyoub, Y. Jararweh, and O. Qawasmeh, "Enhancing aspect-based sentiment analysis of Arabic hotels' reviews using morphological, syntactic and semantic features," Inf. Process. Manage., vol. 56, no. 2, pp. 308–319, 2019.
[3] S. Al-Dabet, S. Tedmori, and A.-S. Mohammad, "Enhancing Arabic aspect-based sentiment analysis using deep learning models," Comput. Speech Lang., vol. 69, 2021, Art. no. 101224.
[4] O. Araque, G. Zhu, and C. A. Iglesias, "A semantic similarity-based perspective of affect lexicons for sentiment analysis," Knowl.-Based Syst., vol. 165, pp. 346–359, 2019.
[5] A. Kumar, K. Srinivasan, W.-H. Cheng, and A. Y. Zomaya, "Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data," Inf. Process. Manage., vol. 57, no. 1, 2020, Art. no. 102141.
[6] B. Zhang, X. Xu, X. Li, X. Chen, Y. Ye, and Z. Wang, "Sentiment analysis through critic learning for optimizing convolutional neural networks with rules," Neurocomputing, vol. 356, pp. 21–30, 2019.


[7] V. K. Vijayan, K. Bindu, and L. Parameswaran, "A comprehensive study of text classification algorithms," in Proc. 2017 Int. Conf. Adv. Comput. Commun. Inform., 2017, pp. 1109–1113.
[8] X. Wang, J. Wang, Y. Yang, and J. Duan, "Labeled LDA-kernel SVM: A short Chinese text supervised classification based on Sina Weibo," in Proc. 4th Int. Conf. Inf. Sci. Control Eng., 2017, pp. 428–432.
[9] M. S. Haydar, M. Al Helal, and S. A. Hossain, "Sentiment extraction from Bangla text: A character level supervised recurrent neural network approach," in Proc. 2018 Int. Conf. Comput. Commun. Chem. Mater. Electron. Eng., 2018, pp. 1–4.
[10] A. Yadav and D. K. Vishwakarma, "Sentiment analysis using deep learning architectures: A review," Artif. Intell. Rev., vol. 53, no. 6, pp. 4335–4385, 2020.
[11] A. Mahmoud and M. Zrigui, "Deep neural network models for paraphrased text classification in the Arabic language," in Proc. 24th Int. Conf. Appl. Natural Lang. to Inf. Syst., Salford, U.K., Springer, 2019, pp. 3–16.
[12] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in Proc. Int. Conf. Mach. Learn., PMLR, 2013, pp. 1310–1318.
[13] K. Mehmood, D. Essam, and K. Shafi, "Sentiment analysis system for Roman Urdu," in Proc. 2018 Comput. Conf. Intell. Comput., Springer, 2019, pp. 29–42.
[14] F. Noor, M. Bakhtyar, and J. Baber, "Sentiment analysis in e-commerce using SVM on Roman Urdu text," in Proc. 2nd Int. Conf. Emerg. Technol. Comput., London, U.K., Springer, 2019, pp. 213–222.
[15] Z. Mahmood et al., "Deep sentiments in Roman Urdu text using recurrent convolutional neural network model," Inf. Process. Manage., vol. 57, no. 4, 2020, Art. no. 102233.
[16] B. Chandio et al., "Sentiment analysis of Roman Urdu on e-commerce reviews using machine learning," CMES-Comput. Model. Eng. Sci., vol. 131, pp. 1263–1287, 2022.
[17] F. Mehmood, M. U. Ghani, M. A. Ibrahim, R. Shahzadi, W. Mahmood, and M. N. Asim, "A precisely xtreme-multi channel hybrid approach for Roman Urdu sentiment analysis," IEEE Access, vol. 8, pp. 192740–192759, 2020.
[18] L. Khan, A. Amjad, K. M. Afaq, and H.-T. Chang, "Deep sentiment analysis using CNN-LSTM architecture of English and Roman Urdu text shared in social media," Appl. Sci., vol. 12, no. 5, 2022, Art. no. 2694.
[19] B. A. Chandio, A. S. Imran, M. Bakhtyar, S. M. Daudpota, and J. Baber, "Attention-based RU-BiLSTM sentiment analysis model for Roman Urdu," Appl. Sci., vol. 12, no. 7, 2022, Art. no. 3641.
[20] A. J. Dueppen, M. L. Bellon-Harn, N. Radhakrishnan, and V. Manchaiah, "Quality and readability of English-language internet information for voice disorders," J. Voice, vol. 33, no. 3, pp. 290–296, 2019.
[21] M. A. Qureshi et al., "Sentiment analysis of reviews in natural language: Roman Urdu as a case study," IEEE Access, vol. 10, pp. 24945–24954, 2022.
[22] M. Z. Asghar, A. Sattar, A. Khan, A. Ali, F. Masud Kundi, and S. Ahmad, "Creating sentiment lexicon for sentiment analysis in Urdu: The case of a resource-poor language," Expert Syst., vol. 36, no. 3, 2019, Art. no. e12397.
[23] D. Alessia, F. Ferri, P. Grifoni, and T. Guzzo, "Approaches, tools and applications for sentiment analysis implementation," Int. J. Comput. Appl., vol. 125, no. 3, pp. 26–33, 2015.
[24] X. Xie, S. Ge, F. Hu, M. Xie, and N. Jiang, "An improved algorithm for sentiment analysis based on maximum entropy," Soft Comput., vol. 23, no. 2, pp. 599–611, 2019.
[25] M. Dragoni and G. Petrucci, "A fuzzy-based strategy for multi-domain sentiment analysis," Int. J. Approx. Reasoning, vol. 93, pp. 59–73, 2018.
[26] B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proc. 42nd Annu. Meeting Assoc. Comput. Linguistics, 2004, pp. 271–278.
[27] C.-L. Liu, W.-H. Hsaio, C.-H. Lee, G.-C. Lu, and E. Jou, "Movie rating and review summarization in mobile environment," IEEE Trans. Syst., Man, Cybern., Part C (Appl. Rev.), vol. 42, no. 3, pp. 397–407, May 2012.
[28] L. Khan, A. Amjad, N. Ashraf, and H.-T. Chang, "Multi-class sentiment analysis of Urdu text using multilingual BERT," Sci. Rep., vol. 12, no. 1, 2022, Art. no. 5436.
[29] A. Jain and V. Jain, "Efficient framework for sentiment classification using apriori based feature reduction," EAI Endorsed Trans. Scalable Inf. Syst., vol. 8, no. 31, 2021, Art. no. e3.
[30] E. Loper and S. Bird, "NLTK: The natural language toolkit," in Proc. ACL Workshop Effective Tools Methodol. Teach. Natural Lang. Process. Comput. Linguistics, 2002, pp. 63–70.
[31] P. C. Lane, D. Clarke, and P. Hender, "On developing robust models for favourability analysis: Model choice, feature sets and imbalanced data," Decis. Support Syst., vol. 53, no. 4, pp. 712–718, 2012.
[32] G. Hackeling, Mastering Machine Learning With Scikit-Learn. Birmingham, U.K.: Packt Publishing Ltd., 2017.
[33] W. Zhang, T. Yoshida, and X. Tang, "A comparative study of TF*IDF, LSI and multi-words for text classification," Expert Syst. Appl., vol. 38, no. 3, pp. 2758–2765, 2011. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417410008626
[34] A. Madasu and S. Elango, "Efficient feature selection techniques for sentiment analysis," Multimedia Tools Appl., vol. 79, pp. 6313–6335, 2020.
[35] U. Ali et al., "Automatic cancerous tissue classification using discrete wavelet transformation and support vector machine," J. Basic Appl. Sci. Res., vol. 6, no. 7, pp. 15–23, 2016.
[36] B. Gaye, D. Zhang, and A. Wulamu, "A tweet sentiment classification approach using a hybrid stacked ensemble technique," Information, vol. 12, no. 9, 2021, Art. no. 374. [Online]. Available: https://www.mdpi.com/2078-2489/12/9/374
[37] G. Singh, B. Kumar, L. Gaur, and A. Tyagi, "Comparison between multinomial and Bernoulli Naïve Bayes for text classification," in Proc. 2019 Int. Conf. Automat. Comput. Technol. Manage., 2019, pp. 593–596.
[38] A. M. Kibriya, E. Frank, B. Pfahringer, and G. Holmes, "Multinomial Naive Bayes for text categorization revisited," in Proc. 2004 Adv. Artif. Intell., 2004, pp. 488–499.
[39] H. Chen, L. Wu, J. Chen, W. Lu, and J. Ding, "A comparative study of automated legal text classification using random forests and deep learning," Inf. Process. Manage., vol. 59, no. 2, 2022, Art. no. 102798.
[40] P. Jiang, H. Wu, J. Wei, F. Sang, X. Sun, and Z. Lu, "RF-DYMHC: Detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features," Nucleic Acids Res., vol. 35, no. suppl_2, pp. W47–W51, 2007.
[41] J. Kazmaier and J. H. van Vuuren, "The power of ensemble learning in sentiment analysis," Expert Syst. Appl., vol. 187, 2022, Art. no. 115819.
[42] L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996.
[43] R. E. Schapire, "The strength of weak learnability," Mach. Learn., vol. 5, no. 2, pp. 197–227, 1990.
[44] D. H. Wolpert, "Stacked generalization," Neural Netw., vol. 5, no. 2, pp. 241–259, 1992.
[45] T. K. Ho, "The random subspace method for constructing decision forests," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp. 832–844, Aug. 1998.
[46] Z. Sharf and S. U. Rahman, "Performing natural language processing on Roman Urdu datasets," Int. J. Comput. Sci. Netw. Secur., vol. 18, no. 1, pp. 141–148, 2018.
[47] A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for sentiment analysis," in Proc. 49th Annu. Meeting Assoc. Comput. Linguistics: Hum. Lang. Technol., 2011, pp. 142–150.
[48] M. Ghorbani, M. Bahaghighat, Q. Xin, and F. Özen, "ConvLSTMConv network: A deep learning approach for sentiment analysis in cloud computing," J. Cloud Comput., vol. 9, no. 1, pp. 1–12, 2020.
[49] N. S. M. Nafis and S. Awang, "An enhanced hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification," IEEE Access, vol. 9, pp. 52177–52192, 2021.

MUHAMMAD EHTISHAM HASSAN received the M.S. degree in engineering management from NUST-EME, Rawalpindi, Pakistan, in 2018. He is currently a Doctoral Researcher with the Ghulam Ishaq Khan Institute (GIKI), Khyber Pakhtunkhwa, Pakistan, where he is with the Department of Data Science and part of the Data Engineering Management Analysis (DEMA) Research Group. He is the Coordinator of the International Collegiate Programming Contest (ICPC), Asia Topi Region. His research interests include natural language processing, machine learning, deep learning, large language models (LLMs), and data science.



IFFAT MAAB received a master's degree in computer engineering from the Ghulam Ishaq Khan Institute for Engineering Sciences and Technology, Khyber Pakhtunkhwa, Pakistan, in 2016, and the Ph.D. degree in technology management for innovation from the Department of Technology Management for Innovation, Graduate School of Engineering, University of Tokyo, Japan, in 2024. She was a Lecturer with the Ghulam Ishaq Khan Institute for Engineering Sciences and Technology, Khyber Pakhtunkhwa, Pakistan, and a Researcher with Stuttgart University, Stuttgart, Germany. She is currently a Project Researcher with the Digital Content and Media Sciences Research Division, National Institute of Informatics (NII), Tokyo. Her research interests encompass machine learning, Big Data analysis, natural language processing (NLP), computer vision, and deep learning.

MASROOR HUSSAIN is currently a Professor of data science and the Head of the Computer Engineering and Data Science Department, Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute (GIKI). His research interests include adaptive meshing, data mining, data warehousing, finite element methods, neural networks, and parallel computing. He is also the Regional Contest Director (RCD) of the International Collegiate Programming Contest (ICPC), Asia Topi Region. He is a Project Director with expertise in high-performance computing and deep neural networks, and has supervised projects in the areas of sentiment analysis and fake news detection. He possesses vast experience in parallel programming and has supervised M.S. and Ph.D. students in this area. During his undergraduate studies, he participated in many national programming and software competitions, such as Softec, Procom, and Softcom, as the representative of NUCES-FAST Lahore. He was the recipient of the 1st Prize for speed programming at SoftCom in November 2001.

USMAN HABIB received the M.S. degree in telematics: communication network and networked services from the Norwegian University of Science and Technology (NTNU), Trondheim, Norway, in 2010, and the Ph.D. degree in engineering sciences from the ICT Department, Technical University of Vienna, Vienna, Austria, in 2016. He was with the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Swabi, Khyber Pakhtunkhwa, Pakistan, and the COMSATS University Abbottabad Campus. He is currently an Associate Professor and the Head of the Software Engineering Department with the FAST School of Computing, National University of Computer and Emerging Sciences (NUCES), Islamabad, Pakistan. Since 2006, he has accumulated more than eighteen years of teaching and research experience and has successfully completed various industrial projects while serving in academia. He actively engages in research and has authored numerous conference and journal publications. His research interests include machine learning, data analytics, pattern recognition, security, and medical image processing.

YUTAKA MATSUO is currently a Professor with the Department of Technology Management for Innovation, Graduate School of Engineering, University of Tokyo, Tokyo, Japan. He is also the Chairman of the Japan Deep Learning Association, an outside Director with SoftBank Group, Chairman of the Cabinet Office's AI Strategy Council, and an Expert Member of the Council for the Realization of New Capitalism. His research interests include artificial intelligence, deep learning, and web mining.
