ABSTRACT The complex linguistic characteristics and limited resources of Roman Urdu make sentiment analysis in the language a unique challenge, necessitating the development of accurate NLP models. In this study, we investigate the performance of prominent ensemble methods on two diverse datasets, UCL and IMDB movie reviews, in the Roman Urdu and English dialects, respectively. We perform a comparative examination to assess the effectiveness of ensemble techniques including stacking, bagging, random subspace, and boosting, optimized through grid search. The ensemble techniques employ four base learners (Support Vector Machine, Random Forest, Logistic Regression, and Naive Bayes) for sentiment classification. The experiment analysis focuses on different N-gram feature sets (unigrams, bigrams, and trigrams), Chi-square feature selection, and text representation schemes (Bag of Words and TF-IDF). Our empirical findings underscore the superiority of stacking across both datasets, achieving high accuracies and F1-scores: 80.30% and 81.76% on the UCL dataset, and 90.92% and 91.12% on the IMDB dataset, respectively. The proposed approach significantly outperforms baseline approaches on the relevant tasks, improving accuracy by up to 7% on the UCL dataset.
INDEX TERMS Chi-square, ensemble learning, grid search optimization, low resource language, machine
learning, natural language processing (NLP), n-gram, sentiment analysis.
embeddings, sentiment lexicons, neural networks, and ML classifiers comprising Support Vector Machines (SVM) and Naive Bayes (NB) [8], [9]. Deep neural networks (DNN), including Recurrent Neural Networks (RNN) and Long Short-Term Memory networks (LSTM), have been employed for various complex text classification tasks such as SA [10], [11], with LSTM addressing issues like the vanishing and exploding gradient problems in RNN [12]. However, employing DNN architectures necessitates extensive hyperparameter tuning and large numbers of training parameters, which become challenging, particularly with large datasets.

Extensive research on SA focuses on European languages and English, benefiting from their rich linguistic resources. Digital platforms predominantly support English and Roman scripts [14], [20], encouraging users to write reviews in their native languages, such as Urdu, Hindi, and Arabic, using Roman script [21]. SA in languages like Roman Urdu faces limitations due to resource constraints, including the absence of standardized corpora. Urdu, the native and official language of Pakistan, is spoken in many South Asian states due to its historical and cultural roots [22]. The adoption of Roman Urdu, which transcribes Urdu using the English alphabet, on social media platforms serves users of diverse age groups, enhancing communication comfort. Consequently, the development of robust SA models for low-resource Roman Urdu is vital for comprehending user emotions and sentiments.

In this research, we present a novel approach to sentiment analysis in low-resource Roman Urdu, leveraging machine learning and ensemble learning techniques. We investigate the performance of ensemble approaches optimized using grid search on both the UCL Roman Urdu and IMDB datasets. Our study focuses on improving sentiment classification performance, recognizing the complexity of linguistic characteristics and limited resources in Roman Urdu. The main contributions of our study are as follows:
• Developed a robust approach to SA in low-resource Roman Urdu and English dialects, leveraging ensemble classifiers optimized using grid search to enhance performance.
• Devised a standardization method for word variations in Roman Urdu text.
• Applied Chi-square feature selection to identify and select the statistically significant features in the N-gram sets, i.e., unigrams, bigrams, trigrams, and their combination.

The subsequent sections of the article are organized as follows: Section II provides an overview of previous research in the subject domain. Section III discusses the proposed methodology for sentiment polarity classification. The experimental setup adopted in this study is detailed in Section IV, and Section V discusses the results acquired from the proposed approach. Finally, Section VI concludes the work, highlighting future directions.

II. LITERATURE REVIEW
Sentiment analysis (SA) has gained importance in recent times due to the proliferation of user-generated text, presenting practical applications across various domains. SA seeks to identify and extract sentiment-related information from data sources, essential for knowledge collection and decision-making [23]. Numerous studies on SA have been conducted that utilize supervised, unsupervised, and semi-supervised techniques; however, there exists limited work on Roman Urdu SA. Table 1 outlines the previous work on Roman Urdu SA. This section summarizes the past research on Roman Urdu SA and discusses effective approaches adopted by practitioners across various domains for sentiment analysis.

Mehmood et al. [13] utilized a machine learning (ML) approach (NB, SVM, LR, KNN, and DT) with N-gram features for Roman Urdu SA. They created a Roman Urdu corpus comprising 779 reviews spanning five distinct domains: movie, politics, drama, mobile reviews, and miscellaneous. The study findings show that the NB and LR classifiers outperform the other ML classifiers employed in the study. In another study on Roman Urdu SA [14], SVM was applied to reviews sourced from an e-commerce website (Daraz.pk). They used a vector space model and a TF-IDF weighting scheme to represent the reviews. Mehmood et al. [15] proposed a deep learning (DL) model to analyze emotions and attitudes expressed in Roman Urdu, utilizing a dataset comprising 10,021 sentences related to various genres. They established a manually annotated benchmark corpus for SA in Roman Urdu and applied rule-based, N-gram, and Recurrent Convolutional Neural Network (RCNN) models to classify sentiments. The RCNN model achieves higher performance than the rule-based and N-gram approaches, with an accuracy of 65.2% for binary classification. Another study by Chandio et al. [16] employed a fine-tuned SVM utilizing a Roman Urdu stemmer to classify sentiments. The study adopted bag of words (BOW) and TF-IDF schemes and introduced the largest Roman Urdu e-commerce dataset (RUECD).

In [17], three neural embeddings (Word2vec, GloVe, and FastText) were introduced for Roman Urdu to enhance NLP tasks. To establish a performance baseline for Roman Urdu SA, the study employed ML classifiers (SVM, NB, and LR), DL models (RCNN and RNN), and proposed a multi-channel hybrid approach. The hybrid approach combining CNN and RCNN with pre-trained embeddings outperforms the ML and DL methods on a Roman Urdu corpus with 3,241 sentiments categorized into positive, negative, and neutral classes.

A deep learning approach, CNN-LSTM, was adopted by Khan et al. [18] with diverse word embeddings for SA in both Roman Urdu and English dialects. They employed machine learning classifiers alongside the proposed DL architecture and evaluated the performance of the DL and ML classifiers on four datasets, where SVM and Word2Vec continuous bag of words (CBOW) demonstrated improved performance on Roman Urdu SA. Another study by Chandio et al. [19] applies a deep recurrent architecture, RU-BiLSTM, for SA in Roman Urdu, combining bidirectional LSTM with word embeddings and an attention mechanism. The proposed DL model outperforms the baseline models on two Roman Urdu datasets, i.e., RUSA-19 and RUECD.
600 VOLUME 5, 2024
TABLE 1. Prior Research Work on Roman Urdu Sentiment Analysis
Xie et al. [24] proposed a maximum entropy model to extract emotion words from Wikipedia and a corpus using probabilistic latent semantic analysis. A fuzzy logic-based technique for representing the polarity acquired through training sets was presented by Dragoni and Petrucci [25]. The approach they adopted makes use of potential conceptual domain overlaps to develop a general model that can determine the polarity of texts in arbitrary domains. A new approach for employing machine learning methods on the movie reviews dataset was proposed by Pang and Lee [26], in which text classification algorithms were employed on the subjective parts of the documents using a minimum-cuts formulation to determine the polarity of sentiments. In its initial step, the adopted approach separated the objective and subjective words in the documents and passed the remaining words to the next step. To extract the results, NB and SVM classifiers were applied in the proposed approach.

In article [27], the authors reported designing and building a system for summarising movie reviews and ratings for mobile platforms. The results of applying sentiment classification to movie reviews were used to determine the rating in the proposed approach. Recognizing the significance of identifying product features for feature-based summarization, the authors proposed employing Latent Semantic Analysis (LSA) for identifying these features. Another study by Khan et al. [28] employed rule-based, ML-based, and DL-based approaches for Urdu SA on a multi-class dataset to establish baseline results. They manually annotated the Urdu dataset comprising 9,312 reviews into positive, negative, and neutral classes. Text representation schemes of N-gram, FastText, and BERT word embeddings were adopted in the study. The proposed fine-tuned multilingual BERT outperformed the other baseline classifiers used in the research, achieving an F1-score of 81.49%.

A. RESEARCH GAP
Prior research focusing on the resource-deprived Roman Urdu language shows a notable absence of extensive application of ensemble approaches and feature selection for sentiment polarity classification. This study seeks to address the gap by presenting a broader approach that employs machine learning and ensemble techniques to effectively classify sentiments in Roman Urdu text.

III. METHODOLOGY
In this study, we investigate the performance of machine learning and ensemble classifiers for binary classification of sentiments in both Roman Urdu and English dialects, utilizing two distinct datasets, i.e., UCL and IMDB movie reviews. Sample sentences from the UCL Roman Urdu dataset are presented in Table 2. We employed five ensemble approaches in the experiment analysis, namely bagging, random subspace, stacking, boosting, and majority voting, to enhance classification performance by combining predictions from multiple base learners. The base learners utilized in our study include SVM, Random Forest (RF), LR, and NB, chosen for their competitive performance on various NLP tasks [17], [29]. Our experimental setup comprises the exploration of various word-level N-gram feature sets (unigrams, bigrams, trigrams, and unigram + bigram), along with Chi-square feature selection to identify significant features, and two text representation schemes, BOW and TF-IDF. Hyperparameter optimization is conducted through grid search with 10-fold cross-validation to ensure robust model performance. The entire process of the proposed approach can be completed in various steps as depicted in Fig. 1. These steps are elaborated upon in detail in the subsequent subsections, providing a comprehensive overview of the experimental methodology.

A. PREPROCESSING MODULE
The following preprocessing steps are incorporated in this study:
• Removal of sentences written entirely in English from the UCL dataset.
• Conversion of text to lower case.
• Removal of digits, special characters, and punctuation marks.
• Elimination of ASCII (American Standard Code for Information Interchange) control characters and HTML (Hypertext Markup Language) tags.
• Stop-words removal
– Common words occurring in the text, both Roman Urdu and English, such as “the”, “is”, “in”, “for”, “to”, “at”, etc., which are articles and prepositions that add no useful meaning to the sentiments, were removed from the text. Additionally, we compiled a separate list of stop words specific to Roman Urdu comprising terms like “aur”, “ap”, “hun”, and “hai”.

B. STANDARDIZATION OF ROMAN URDU TEXT
Roman Urdu is the Latin script of Urdu. To date, to the best of our knowledge, there is no standardization of Roman Urdu script, resulting in multiple variations of the same word due to differences in spelling. Consider the example of the word “bekaar”, which translates to ‘useless’ in English. The word can appear in various forms such as “bekaaar” or “bekar”, all conveying the same negative sentiment but with different spellings. These variations can lead to inconsistency in sentiment analysis (SA) results, as the algorithms employed for SA may not efficiently interpret and identify these variations. To address this challenge, we devise a Python program to standardize these variations of words.

Our approach involves creating a mapping dictionary that pairs each standard Roman Urdu word with its variations. The mapping dictionary, comprising 2,500 entries, was manually curated from the vocabulary of the UCL dataset. It primarily groups Roman Urdu words with different variations in spellings conveying similar meanings. Prioritizing word variations as the primary criterion, we also considered phonetic similarity (e.g., “acha” and “accha”, both meaning “good” in English) and contextual relevance (e.g., “zabardast” and “bekaar”), ensuring that the standardized forms accurately reflect semantic consistency. The pseudocode of the standardization is given in Algorithm 1. The program systematically replaces each variation in the UCL dataset with its standardized counterpart using a mapping dictionary and regular expressions. For example, variations of the Roman Urdu word “bakwaas”, which translates to “rubbish” in English, like “bakwaaas”, “bakwaass”, and “bakwass”, are systematically mapped to “bakwaas”. Similarly, variations of “jhoota” (liar) such as “jhota”, “jhooty”, and “jhooti” are replaced with “jhoota”. To standardize the text in the IMDB dataset, lemmatization was performed to reduce words to their root forms using the Natural Language Toolkit [30].

C. FEATURE EXTRACTION
In SA, feature extraction and representation using feature vectors are the key steps. These vectors enable the acquisition of classification models to determine labels for unseen
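As an illustration of the standardization step described above, the mapping-dictionary replacement can be sketched as follows. This is a minimal sketch, not the authors' Algorithm 1: the dictionary below is a tiny illustrative subset (the actual mapping has 2,500 manually curated entries), and the function name and word-boundary regex strategy are assumptions for demonstration.

```python
import re

# Illustrative subset of a standard-form -> variations mapping
# (the paper's dictionary has ~2,500 manually curated entries).
VARIATION_MAP = {
    "bakwaas": ["bakwaaas", "bakwaass", "bakwass"],
    "jhoota": ["jhota", "jhooty", "jhooti"],
    "acha": ["accha", "achaa"],
}

def standardize(text: str) -> str:
    """Replace each known spelling variation with its standard form."""
    for standard, variations in VARIATION_MAP.items():
        # Sort longer variations first so a shorter variation never
        # partially shadows a longer one inside the alternation.
        alternation = "|".join(sorted(variations, key=len, reverse=True))
        text = re.sub(r"\b(" + alternation + r")\b", standard, text)
    return text

print(standardize("ye drama bakwaaas hai aur hero jhooty hai"))
# -> "ye drama bakwaas hai aur hero jhoota hai"
```

Word-boundary anchors (`\b`) keep the replacement from corrupting longer words that merely contain a variation as a substring, which matters when the mapping grows to thousands of entries.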
4) RANDOM SUBSPACE
The random subspace method [45] is an ensemble technique that employs feature-based partitioning to acquire diversity in the base learners. Multiple base learners are trained on randomly selected feature subspaces and then combined in this ensemble scheme to improve the predictive performance. The weak base classifiers trained on random samples of the feature space are combined to produce a robust classifier, similar to bagging. This study employs SVM, LR, NB, and RF as base estimators for the random subspace ensemble.

5) VOTING
Voting is another ensemble learning approach in which instances undergo classification by multiple learning algorithms through combined voting. One of the fundamental voting schemes utilized is majority voting, where the output class label is determined by the predictions of more than half of the employed base classifiers. Consequently, the output of the ensemble is based on the class having the most votes. In this study, we employed the majority voting scheme alongside hard voting, a specific instance of majority voting where each classifier in the ensemble contributes equally to the final decision. Hard voting aggregates the predictions made by each classifier and selects the class label with the most votes. This approach offers increased robustness, improved generalization, and a reduction in overfitting based on a collective decision-making process.

H. HYPERPARAMETER OPTIMIZATION
The grid search technique is adopted for hyperparameter optimization of the ensemble classifiers. The hyperparameter tuning is performed on configurations obtained by combining feature extraction schemes (BOW and TF-IDF), N-gram feature sets, and Chi-square feature selection. Predefined sets of hyperparameters are utilized to conduct the grid searches to identify the optimal values. This systematic approach assists in fine-tuning the models effectively and improving their predictive capabilities on both the UCL and IMDB datasets. Table 3 summarizes the hyperparameters for each ensemble method used in the experiment analysis. For bagging and random subspace, the n_estimators hyperparameter determines the number of base estimators in the ensemble; three different values are tested for the ensemble, i.e., 10, 20, and 50. In the stacking ensemble, the hyperparameter final_estimator__C represents the regularization strength of the meta-classifier (Logistic Regression) used for combining the predictions of the base learners. We experimented with three values: 0.1, 1.0, and 10.0. For AdaBoost, the n_estimators hyperparameter controls the maximum number of base estimators to be sequentially trained. The AdaBoost ensemble classifier is evaluated on the values 50, 100, and 150.

IV. EXPERIMENT
This section outlines the details of the datasets adopted in the study and the procedure for the experiment carried out to classify the sentiments.

A. EXPERIMENTAL DATASET
In this research, we have evaluated the performance of the proposed architecture on two datasets with statistics depicted in Table 4. The experiment was performed on only positive and negative sentiments from both the UCL and IMDB datasets to ensure consistency across both datasets and enhance model performance in binary sentiment classification. Moreover, the adopted approach avoids data imbalance in the UCL dataset, where neutral reviews have a significantly different distribution compared to positive and negative reviews. The details of the datasets are presented in the following subsections.

1) UCL ROMAN URDU DATASET
The UCL Roman Urdu corpus comprises 20,228 sentences that are further categorized into three classes: positive, negative, and neutral [15], [46]. The complete dataset comprises 6,013 positive sentences, 5,286 negative sentences, and 8,929 neutral sentences.

2) IMDB DATASET
The 50K IMDB movie reviews dataset is a sentiment analysis dataset used in the experiment analysis, comprising English text only and obtained from the Kaggle platform [47]. The dataset is labeled and has 50,000 records with two attributes, i.e., review and sentiment. The acquired dataset is balanced, with 25,000 sentiments having ‘positive’ polarity and 25,000 having ‘negative’ polarity.

B. EXPERIMENTAL PROCEDURE
To establish robust models for classifying sentiment polarity, feature extraction schemes of BOW and TF-IDF, utilizing N-grams, are applied to both the training and testing sets. Both the UCL and IMDB datasets are split into training and testing sets with an 80:20 ratio, ensuring a representative distribution of classes in both sets. The chi-square test is used to select the most informative and discriminative features across various N-gram configurations to reduce the word-level N-grams in the UCL and IMDB datasets. In this research, a series of experiments were conducted using features ranging from the top 200 to 4000 words based on significant Chi-square test score values for the UCL dataset, and from the top 500 to 6000 words for the IMDB dataset. For optimal results, the top 1000 and 4000 word-level features are selected for the UCL and IMDB datasets, respectively, based on significant Chi-square test score values on the training sets. The difference in the selected features reflects the distinct characteristics of each dataset: the large IMDB dataset has a more diverse vocabulary than the UCL dataset. This tailored approach to feature selection enhances the classification performance, capturing the informative and relevant features to train the models and leading to optimal accuracy results. The base classifiers are trained on the selected features of the training sets. The ensemble classifiers leverage the insights derived from the trained base classifiers to enhance classification performance and predict sentiment labels on the testing set. The performance metrics of accuracy and the F1-score are adopted to evaluate the classification performance of the ensemble classifiers.

Hyperparameter optimization is performed using grid search with 10-fold cross-validation to fine-tune the parameters of each ensemble classifier on the UCL and IMDB datasets. The optimal value of the n_estimators hyperparameter, set to 50, gives the best classification results for both the bagging and random subspace methods. For the stacking classifier, the optimal value for the final_estimator__C hyperparameter is determined to be 0.1. Additionally, the AdaBoost classifier achieves optimal performance with a setting of 150 for the n_estimators hyperparameter. The majority voting aggregates the predictions from the LR, RF, SVM, and NB classifiers to produce ensemble predictions. All the experiments are conducted within the Google Colaboratory (Colab) notebook environment, leveraging scikit-learn packages for the machine learning classifiers.
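The TF-IDF and Chi-square selection steps described above can be sketched with scikit-learn (the library the experiments use). This is a hedged sketch: the toy corpus, its labels, and k = 5 are illustrative stand-ins for the preprocessed reviews and the paper's k = 1000 (UCL) and k = 4000 (IMDB) settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Toy labelled corpus standing in for the preprocessed reviews.
texts = [
    "film acha tha", "drama bakwaas hai",
    "acha acting zabardast", "story bekaar bakwaas",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# TF-IDF over word-level unigrams + bigrams (the paper's
# best-performing feature set).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# Keep the k features with the highest Chi-square scores relative
# to the class labels; chi2 accepts the non-negative TF-IDF matrix.
selector = SelectKBest(chi2, k=5)
X_selected = selector.fit_transform(X, labels)
print(X_selected.shape)  # (4, 5)
```

In the actual procedure, the vectorizer and selector would be fitted on the 80% training split only and then applied to the held-out 20% test split, so that the Chi-square scores never see test labels.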
TABLE 8. Classification Accuracy Results of Ensemble Algorithms on IMDB Dataset with (a) Unigrams, Bigrams, and Trigrams (b) Unigram + Bigram Feature Set
TABLE 9. Comparison of the Proposed Approach Performance With Closely Related Works
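The stacking configuration and grid search described in the methodology can be sketched as follows. This is a hedged sketch rather than the authors' code: the data are synthetic, GaussianNB stands in for the NB variant used on text features (a multinomial NB would reject the negative synthetic features), and cross-validation is reduced from the paper's 10 folds to 3 to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic binary data standing in for the selected N-gram features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Stacking: SVM, RF, LR, and NB as base learners, with logistic
# regression as the meta-classifier combining their predictions.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Grid search over the meta-classifier's regularization strength,
# addressed via the nested parameter name final_estimator__C
# (the paper tunes {0.1, 1.0, 10.0} with 10-fold CV).
grid = GridSearchCV(stack, {"final_estimator__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

The double-underscore parameter naming is how scikit-learn routes grid-search values into a nested estimator, which is why the hyperparameter is reported as `final_estimator__C` in the tables.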
with the Chi-square feature selection employed on a combination of N-gram features (i.e., unigram + bigram) compared to no feature selection on the same combinations. NB achieves the highest accuracy of 79.17% using Chi-square feature selection on the merged unigram and bigram features with the TF-IDF weighting scheme. LR achieves comparatively better performance than SVM, with an accuracy of 78.53% on the merged unigram and bigram feature set with the BOW scheme and Chi-square feature selection. Similarly improved results of the base classifiers can also be observed on the IMDB corpus with the top 4000 N-grams selected using Chi-square. SVM demonstrates high performance on IMDB, achieving an accuracy of 89.91%, utilizing features selected through Chi-square (top k = 4000) on the combined unigram and bigram feature set with the TF-IDF text representation scheme.

B. PERFORMANCE OF THE PROPOSED ENSEMBLE CLASSIFIERS
Table 7 presents detailed classification results for the UCL Roman Urdu dataset in terms of accuracy. The results are obtained by incorporating ensemble algorithms which are further optimized through grid search. The stacking ensemble approach gives consistently high performance across multiple combinations on UCL, achieving an impressive accuracy of 80.30%. This performance is attained using the combined unigram and bigram feature set (i.e., unigram + bigram) with TF-IDF representation and the Chi-square top k = 1000 features. Notably, stacking leverages all base classifiers, including LR, SVM, NB, and RF, with logistic regression serving as the meta-classifier. Random subspace with SVM as base estimator achieves the second-best accuracy of 80.22% on merged unigram and bigram features with the TF-IDF weighting scheme. Majority voting, another ensemble technique employing all base classifiers, also demonstrates competitive performance, achieving accuracies ranging from 55.77% to 79.46%. Across both the bagging and random subspace methods, LR and NB consistently yield strong results, particularly with TF-IDF representation, showcasing the effectiveness of the feature extraction technique in capturing meaningful textual information. Boosting (AdaBoost) demonstrates comparatively lower performance across the UCL dataset, suggesting limitations in effectively leveraging weak learners to boost overall performance.

The consistent performance of the stacking ensemble is also observed on the IMDB dataset, as shown in Table 8, achieving the highest accuracy of 90.92%. This significant performance is achieved with the combination of unigram and bigram features (i.e., unigram + bigram) and TF-IDF representation using the Chi-square top k = 4000 features. Similar to the UCL dataset, random subspace with SVM as base estimator achieves the second-highest accuracy of 90.74% on merged unigram and bigram features with the TF-IDF scheme. Majority voting, which incorporates all base classifiers, exhibits competitive accuracy results ranging from 89.29% to 89.79%. Across the bagging and random subspace methods, LR consistently demonstrates strong performance, particularly with TF-IDF representation. Considering the performance of the text representation schemes on both datasets, BOW generally produces lower accuracy than TF-IDF. However, both techniques showcase consistent trends across ensemble methods. TF-IDF representation tends to capture the semantic relevance of words more effectively, resulting in higher classification accuracy across various ensemble techniques.

The stacking ensemble method achieves the highest F1-scores of 81.76% on the UCL dataset and 91.12% on the IMDB dataset, utilizing the combined unigram and bigram feature set (i.e., unigram + bigram) with the TF-IDF scheme, as depicted in Fig. 4.

C. COMPARATIVE EVALUATION
The comparison of our proposed approach on both datasets (i.e., Roman Urdu UCL and IMDB) with other baseline algorithms is presented in Table 9. The comparison analysis shows that our proposed ensemble classifiers, especially stacking, outperform other existing techniques, achieving a high accuracy and F1-score of 80.30% and 81.76%, respectively, on the UCL dataset. Similarly, our proposed approach also maintains superior performance on the IMDB dataset, achieving a remarkable accuracy of 90.92% and an impressive F1-score of 91.12%.

VI. CONCLUSION AND FUTURE DIRECTIONS
This research provides valuable insights into the performance of ensemble learning techniques for sentiment analysis, focusing on the low-resource Roman Urdu UCL and IMDB datasets. A standardization approach was implemented to normalize the Roman Urdu dataset, reducing complexity by standardizing word variations.

Our study demonstrates the effectiveness of the stacking ensemble method in classifying sentiments in both Roman Urdu and English, achieving maximum accuracies of 80.30% and 90.92%, respectively. These empirical findings underscore the robustness and versatility of ensemble techniques in sentiment analysis tasks across diverse datasets and languages. Furthermore, our study highlights ensemble learning’s potential in addressing the challenges of sentiment analysis in low-resource languages, offering insights applicable to diverse linguistic domains like sentiment and emotion classification, sarcasm detection, and fake news identification.

REFERENCES
[1] A. Lytos, T. Lagkas, P. Sarigiannidis, and K. Bontcheva, “The evolution of argumentation mining: From models to social media and emerging tools,” Inf. Process. Manage., vol. 56, no. 6, 2019, Art. no. 102055.
[2] M. Al-Smadi, M. Al-Ayyoub, Y. Jararweh, and O. Qawasmeh, “Enhancing aspect-based sentiment analysis of Arabic hotels’ reviews using morphological, syntactic and semantic features,” Inf. Process. Manage., vol. 56, no. 2, pp. 308–319, 2019.
[3] S. Al-Dabet, S. Tedmori, and A.-S. Mohammad, “Enhancing Arabic aspect-based sentiment analysis using deep learning models,” Comput. Speech Lang., vol. 69, 2021, Art. no. 101224.
[4] O. Araque, G. Zhu, and C. A. Iglesias, “A semantic similarity-based perspective of affect lexicons for sentiment analysis,” Knowl.-Based Syst., vol. 165, pp. 346–359, 2019.
[5] A. Kumar, K. Srinivasan, W.-H. Cheng, and A. Y. Zomaya, “Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data,” Inf. Process. Manage., vol. 57, no. 1, 2020, Art. no. 102141.
[6] B. Zhang, X. Xu, X. Li, X. Chen, Y. Ye, and Z. Wang, “Sentiment analysis through critic learning for optimizing convolutional neural networks with rules,” Neurocomputing, vol. 356, pp. 21–30, 2019.
[7] V. K. Vijayan, K. Bindu, and L. Parameswaran, “A comprehensive study of text classification algorithms,” in Proc. 2017 Int. Conf. Adv. Comput. Commun. Inform., 2017, pp. 1109–1113.
[8] X. Wang, J. Wang, Y. Yang, and J. Duan, “Labeled LDA-kernel SVM: A short Chinese text supervised classification based on Sina Weibo,” in Proc. 4th Int. Conf. Inf. Sci. Control Eng., 2017, pp. 428–432.
[9] M. S. Haydar, M. Al Helal, and S. A. Hossain, “Sentiment extraction from Bangla text: A character level supervised recurrent neural network approach,” in Proc. 2018 Int. Conf. Comput. Commun. Chem. Mater. Electron. Eng., 2018, pp. 1–4.
[10] A. Yadav and D. K. Vishwakarma, “Sentiment analysis using deep learning architectures: A review,” Artif. Intell. Rev., vol. 53, no. 6, pp. 4335–4385, 2020.
[11] A. Mahmoud and M. Zrigui, “Deep neural network models for paraphrased text classification in the Arabic language,” in Proc. 24th Int. Conf. Appl. Natural Lang. Inf. Syst., Salford, U.K., Springer, 2019, pp. 3–16.
[12] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in Proc. Int. Conf. Mach. Learn., PMLR, 2013, pp. 1310–1318.
[13] K. Mehmood, D. Essam, and K. Shafi, “Sentiment analysis system for Roman Urdu,” in Proc. 2018 Comput. Conf. Intell. Comput., Springer, 2019, pp. 29–42.
[14] F. Noor, M. Bakhtyar, and J. Baber, “Sentiment analysis in e-commerce using SVM on Roman Urdu text,” in Proc. 2nd Int. Conf. Emerg. Technol. Comput., London, U.K., Springer, 2019, pp. 213–222.
[15] Z. Mahmood et al., “Deep sentiments in Roman Urdu text using recurrent convolutional neural network model,” Inf. Process. Manage., vol. 57, no. 4, 2020, Art. no. 102233.
[16] B. Chandio et al., “Sentiment analysis of Roman Urdu on e-commerce reviews using machine learning,” CMES-Comput. Model. Eng. Sci., vol. 131, pp. 1263–1287, 2022.
[17] F. Mehmood, M. U. Ghani, M. A. Ibrahim, R. Shahzadi, W. Mahmood, and M. N. Asim, “A precisely xtreme-multi channel hybrid approach for Roman Urdu sentiment analysis,” IEEE Access, vol. 8, pp. 192740–192759, 2020.
[18] L. Khan, A. Amjad, K. M. Afaq, and H.-T. Chang, “Deep sentiment analysis using CNN-LSTM architecture of English and Roman Urdu text shared in social media,” Appl. Sci., vol. 12, no. 5, 2022, Art. no. 2694.
[19] B. A. Chandio, A. S. Imran, M. Bakhtyar, S. M. Daudpota, and J. Baber, “Attention-based RU-BiLSTM sentiment analysis model for Roman Urdu,” Appl. Sci., vol. 12, no. 7, 2022, Art. no. 3641.
[20] A. J. Dueppen, M. L. Bellon-Harn, N. Radhakrishnan, and V. Manchaiah, “Quality and readability of English-language internet information for voice disorders,” J. Voice, vol. 33, no. 3, pp. 290–296, 2019.
[21] M. A. Qureshi et al., “Sentiment analysis of reviews in natural language: Roman Urdu as a case study,” IEEE Access, vol. 10, pp. 24945–24954, 2022.
[30] E. Loper and S. Bird, “NLTK: The natural language toolkit,” in Proc. ACL Workshop Effective Tools Methodol. Teach. Natural Lang. Process. Comput. Linguistics, 2002, pp. 63–70.
[31] P. C. Lane, D. Clarke, and P. Hender, “On developing robust models for favourability analysis: Model choice, feature sets and imbalanced data,” Decis. Support Syst., vol. 53, no. 4, pp. 712–718, 2012.
[32] G. Hackeling, Mastering Machine Learning With Scikit-Learn. Birmingham, U.K.: Packt Publishing Ltd., 2017.
[33] W. Zhang, T. Yoshida, and X. Tang, “A comparative study of TF*IDF, LSI and multi-words for text classification,” Expert Syst. Appl., vol. 38, no. 3, pp. 2758–2765, 2011. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417410008626
[34] A. Madasu and S. Elango, “Efficient feature selection techniques for sentiment analysis,” Multimedia Tools Appl., vol. 79, pp. 6313–6335, 2020.
[35] U. Ali et al., “Automatic cancerous tissue classification using discrete wavelet transformation and support vector machine,” J. Basic Appl. Sci. Res., vol. 6, no. 7, pp. 15–23, 2016.
[36] B. Gaye, D. Zhang, and A. Wulamu, “A tweet sentiment classification approach using a hybrid stacked ensemble technique,” Information, vol. 12, no. 9, 2021, Art. no. 374. [Online]. Available: https://www.mdpi.com/2078-2489/12/9/374
[37] G. Singh, B. Kumar, L. Gaur, and A. Tyagi, “Comparison between multinomial and Bernoulli Naïve Bayes for text classification,” in Proc. 2019 Int. Conf. Automat. Comput. Technol. Manage., 2019, pp. 593–596.
[38] A. M. Kibriya, E. Frank, B. Pfahringer, and G. Holmes, “Multinomial Naive Bayes for text categorization revisited,” in Proc. 2004 Adv. Artif. Intell., 2004, pp. 488–499.
[39] H. Chen, L. Wu, J. Chen, W. Lu, and J. Ding, “A comparative study of automated legal text classification using random forests and deep learning,” Inf. Process. Manage., vol. 59, no. 2, 2022, Art. no. 102798.
[40] P. Jiang, H. Wu, J. Wei, F. Sang, X. Sun, and Z. Lu, “RF-DYMHC: Detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features,” Nucleic Acids Res., vol. 35, no. suppl_2, pp. W47–W51, 2007.
[41] J. Kazmaier and J. H. van Vuuren, “The power of ensemble learning in sentiment analysis,” Expert Syst. Appl., vol. 187, 2022, Art. no. 115819.
[42] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996.
[43] R. E. Schapire, “The strength of weak learnability,” Mach. Learn., vol. 5, no. 2, pp. 197–227, 1990.
[44] D. H. Wolpert, “Stacked generalization,” Neural Netw., vol. 5, no. 2, pp. 241–259, 1992.
[45] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp. 832–844, Aug. 1998.
[46] Z. Sharf and S. U. Rahman, “Performing natural language processing on Roman Urdu datasets,” Int. J. Comput. Sci. Netw. Secur., vol. 18, no. 1,
[22] M. Z. Asghar, A. Sattar, A. Khan, A. Ali, F. Masud Kundi, and S. pp. 141–148, 2018.
Ahmad, “Creating sentiment lexicon for sentiment analysis in Urdu: [47] A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C.
The case of a resource-poor language,” Expert Syst., vol. 36, no. 3, 2019, Potts, “Learning word vectors for sentiment analysis,” in Proc. 49th
Art. no. e12397. Annu. Meeting Assoc. Comput. Linguistics: Hum. Lang. Technol., 2011,
[23] D. Alessia, F. Ferri, P. Grifoni, and T. Guzzo, “Approaches, tools and pp. 142–150.
applications for sentiment analysis implementation,” Int. J. Comput. [48] M. Ghorbani, M. Bahaghighat, Q. Xin, and F. Özen, “ConvLSTMConv
Appl., vol. 125, no. 3, pp. 26–33, 2015. network: A deep learning approach for sentiment analysis in cloud
[24] X. Xie, S. Ge, F. Hu, M. Xie, and N. Jiang, “An improved algorithm for computing,” J. Cloud Comput., vol. 9, no. 1, pp. 1–12, 2020.
sentiment analysis based on maximum entropy,” Soft Comput., vol. 23, [49] N. S. M. Nafis and S. Awang, “An enhanced hybrid feature selec-
no. 2, pp. 599–611, 2019. tion technique using term frequency-inverse document frequency and
[25] M. Dragoni and G. Petrucci, “A fuzzy-based strategy for multi-domain support vector machine-recursive feature elimination for sentiment clas-
sentiment analysis,” Int. J. Approx. Reasoning, vol. 93, pp. 59–73, sification,” IEEE Access, vol. 9, pp. 52177–52192, 2021.
2018.
[26] B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” in Proc. 42nd Annu. Meeting Assoc. Comput. Linguistics, 2004, pp. 271–278.
[27] C.-L. Liu, W.-H. Hsaio, C.-H. Lee, G.-C. Lu, and E. Jou, “Movie rating and review summarization in mobile environment,” IEEE Trans. Syst., Man, Cybern., Part C (Appl. Rev.), vol. 42, no. 3, pp. 397–407, May 2012.
[28] L. Khan, A. Amjad, N. Ashraf, and H.-T. Chang, “Multi-class sentiment analysis of Urdu text using multilingual BERT,” Sci. Rep., vol. 12, no. 1, 2022, Art. no. 5436.
[29] A. Jain and V. Jain, “Efficient framework for sentiment classification using apriori based feature reduction,” EAI Endorsed Trans. Scalable Inf. Syst., vol. 8, no. 31, 2021, Art. no. e3.

MUHAMMAD EHTISHAM HASSAN received the M.S. degree in engineering management from NUST-EME, Rawalpindi, Pakistan, in 2018. He is currently a Doctoral Researcher with the Ghulam Ishaq Khan Institute (GIKI), Khyber Pakhtunkhwa, Pakistan, where he is with the Department of Data Science and is part of the Data Engineering Management Analysis (DEMA) Research Group. He is the Coordinator of the International Collegiate Programming Contest (ICPC), Asia Topi Region. His research interests include natural language processing, machine learning, deep learning, large language models (LLMs), and data science.