DAQAS - Deep Arabic Question Answering System Based On Duplicate Question Detection and Machine Reading Comprehension
Article history: Received 14 February 2023; Revised 23 July 2023; Accepted 6 August 2023; Available online 12 August 2023.

Keywords: Question answering systems; Neural networks; Transformers; Natural language processing; Duplicate answer detection

Abstract

As of late, various deep learning techniques and methods have shown their superiority to feature-based and shallow learning techniques in the field of open-domain question-answering systems (OpenQAS). However, only a few works adopted these techniques to build Arabic OpenQAS that can extract exact answers from large information sources (e.g., Wikipedia). In addition, no available Arabic OpenQAS integrated a module to identify duplicate questions to accelerate response time and reduce computation cost. In this paper, we propose an Arabic OpenQAS (named DAQAS) based on deep learning methods. It consists of three components: (1) Dense Duplicate Question Detection, which returns answers to questions that have already been answered; (2) a Retriever based on BM25 and query expansion by neural text generation; and (3) a Reader able to extract exact answers given a question and the retrieved passages that probably contain the answer. All components of our system integrate deep learning models, especially transformer-based techniques, which have scored state-of-the-art results in different NLP fields. We performed several experiments with publicly available question answering datasets to show the effectiveness of our system. DAQAS obtained promising results, scoring 21.77% Exact Match and 54.71% F1 when using only the top 5 retrieved passages.

© 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction

Question Answering (QA) systems aim to provide precise answers to users' questions. Building a QA system is challenging since it needs to find answers in a huge amount of both structured and unstructured data. Approaches to QA can be divided into two broad classes according to the answer source: Knowledge-Based Question Answering (KBQA) and Document-Based Question Answering (DBQA). KBQA systems apply specific techniques to extract answers from structured data such as knowledge bases, while DBQA systems aim at extracting answers from unstructured data such as raw textual documents. Typically, a DBQA system can be designed to answer restricted- or open-domain questions. Restricted-domain QA systems concentrate on a specialized field and employ particular linguistic resources to enhance the system's performance. Open-domain QA systems rely on huge amounts of text from the web or collections of documents like Wikipedia to answer questions from varied areas. Traditional open-domain DBQA systems were commonly built as pipelines comprising various modules such as question processing, passage retrieval, and answer processing (Abouenour et al., 2012; Kurdi et al., 2014; Bekhti and Al-Harbi, 2013; Hamza et al., 2021). With the recent advances in neural network techniques, modern open-domain DBQA systems (OpenQAS) follow a new structure, which combines traditional Information Retrieval (IR) methods with machine reading comprehension (MRC) models. The objective of MRC is to construct models that are able to read a passage of text and answer comprehension questions (Zeng et al., 2020; Mozannar et al., 2019). Fig. 1 illustrates the new structure of OpenQAS. The advancements in OpenQAS in English and some Latin-script languages are highly promising. This is

* Corresponding author.
E-mail addresses: [email protected] (H. Alami), [email protected] (A. El Mahdaouy), [email protected] (A. Benlahbib), [email protected] (N. En-Nahnahi), [email protected] (I. Berrada), [email protected] (S.E.A. Ouatik).
Peer review under responsibility of King Saud University. Production and hosting by Elsevier.
https://doi.org/10.1016/j.jksuci.2023.101709
H. Alami, A. El Mahdaouy, A. Benlahbib et al. Journal of King Saud University – Computer and Information Sciences 35 (2023) 101709
Fig. 1. New structure of open-domain DBQA.

due to the introduction and usage of deep learning techniques, namely transformers (Vaswani et al., 2017), BERT (Devlin et al., 2018), GPT-X (Radford et al., 2018, 2020), and T5 (Raffel et al., 2020), in the field of natural language processing. Despite that, few works related to Arabic OpenQAS have integrated neural network techniques in their pipeline to extract exact answers to natural Arabic questions (Mozannar et al., 2019; Malhas and Elsayed, 2022). In this paper, we propose an Arabic OpenQAS based on neural network approaches. It comprises three main modules: 1) Dense Duplicate Question Detection, 2) Retriever, and 3) Reader. First, we build a Dense Duplicate Question Detection module to answer previously answered questions. This module gives the system the ability to answer previously answered questions as quickly as possible (Hamza et al., 2020). This component is also designed to index a large amount of previously answered questions, i.e., it can store a huge number of exact answers. As far as we know, we are the first to propose a dense duplicate question detection module in an Arabic OpenQAS. Second, following the architecture of neural OpenQAS (Retriever + Reader), we build a Retriever that aims at retrieving passages relevant to a given question, using Arabic Wikipedia as the information source. At its core, this component is an IR system: queries are constructed from questions, then each query is expanded with a context generated by neural text generation techniques (AraGPT2-base, AraGPT2-large (Antoun et al., 2021), and mT5-small (Xue et al., 2020)). The new query is then passed to a BM25 retrieval model to retrieve the top k relevant passages. Third and last, an extractive model is trained to predict the answer span (i.e., the start and end of an answer) from the retrieved passages. We fine-tuned different BERT-based encoders proposed for the Arabic language (Devlin et al., 2018; Antoun et al., 2020; Lan et al., 2020; Abdul-Mageed et al., 2021) with the objective of answer span prediction. The final answer is then chosen according to the retriever and reader scores. To evaluate our proposed system, we build a large dataset for Arabic OpenQA by combining existing datasets, namely ARCD (Mozannar et al., 2019), Arabic SQuAD (Mozannar et al., 2019), MLQA (Lewis et al., 2020), XQuAD (Artetxe et al., 2020), and TyDi QA (Clark et al., 2020). We performed various evaluations on each module to show the effectiveness of our system. The proposed system (DAQAS) obtained promising results, scoring 21.77% Exact Match and 54.71% F1 when using only the top 5 retrieved passages.

The rest of the paper is organized as follows: Section 2 discusses related work in the field of Arabic question answering systems. Section 3 describes DAQAS, our proposed Arabic open-domain question answering system. Section 4 presents the obtained results and performance evaluations of the proposed system. Section 5 concludes and provides future work.

2. Related work

Various Arabic QAS have been proposed in the literature. In the next paragraphs, we describe various deep learning text representations that reshaped the NLP field, along with existing end-to-end Arabic QAS.

2.1. Deep learning text representation

Recently, impressive progress has been made in various natural language processing tasks such as machine translation, text classification, and OpenQAS. Numerous factors contributed to these successes, including 1) the advances in computing resources and hardware; 2) the development of neural network-based methods that significantly surpass the performance of previous techniques; 3) the availability of large volumes of data designed to train these systems; and 4) the advancement in neural word representations, which allow mapping words from their textual representation into a continuous and distributed vector representation. Considering that word representation is one of the main factors in these advancements, we discuss the most well-known word representations, including Word2Vec (Mikolov et al., 2013a; Mikolov et al., 2013b), ELMo (Peters et al., 2018), BERT (Devlin et al., 2018), GPT-X models (Radford et al., 2019; Brown et al., 2020), and T5 (Raffel et al., 2020), in the following:

2.1.1. Word2Vec

These representations capture a large number of syntactic and semantic relationships between words, and the size of the generated vectors is generally between 100 and 300. The idea behind this model is that each word representation is extracted from its context (past words, future words). The authors (Mikolov et al., 2013a; Mikolov et al., 2013b) proposed two architectures: (1) CBOW; (2) Continuous Skip-gram.
In the CBOW architecture, the training objective is to maximize the average log probability of each word given its surrounding context of size c:

\frac{1}{V} \sum_{t=1}^{V} \log P(w_t \mid w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c})   (1)

Fig. 2. The overall architecture of Embeddings from Language Models.
[1] https://sourceforge.net/projects/jirs/.
[2] http://users.dsic.upv.es/ybenajiba/downloads.html.
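As a toy numeric illustration of the CBOW objective in Eq. (1), the sketch below predicts a centre word from the averaged embeddings of its context words with a full softmax. The vocabulary, embedding dimension, and corpus are made up for illustration; a real Word2Vec implementation trains these tables with negative sampling or hierarchical softmax.

```python
import math
import random

random.seed(0)
vocab = ["the", "cat", "sat", "on", "mat"]
dim = 4
# input (context) and output (centre-word) embedding tables, randomly initialized
emb_in = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
emb_out = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

def log_p_center(center, context):
    """log P(w_t | context): averaged context embedding, then softmax over vocab."""
    h = [sum(emb_in[c][i] for c in context) / len(context) for i in range(dim)]
    scores = {w: sum(h[i] * emb_out[w][i] for i in range(dim)) for w in vocab}
    z = sum(math.exp(s) for s in scores.values())
    return scores[center] - math.log(z)

# Average log-likelihood over a toy corpus with context size c = 1,
# i.e. the quantity maximized in Eq. (1).
corpus = ["the", "cat", "sat", "on", "the", "mat"]
c = 1
ll = [log_p_center(corpus[t], corpus[t - c:t] + corpus[t + 1:t + 1 + c])
      for t in range(c, len(corpus) - c)]
avg_ll = sum(ll) / len(ll)
print(round(avg_ll, 3))  # negative; training would push it toward 0
```

Training would update both embedding tables by gradient ascent on this average log-likelihood.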
Fig. 5. A diagram of the T5 framework. For every task we consider (text classification, question answering, and machine translation), the model generates text, thus treating every text processing problem as a text-to-text problem.
choices. IDRAAQ achieved a precision of 0.13 and a c@1 of 0.21. The c@1 is a simple measure to assess non-response introduced in (Peñas and Rodrigo, 2011).

JAWEB (Kurdi et al., 2014) is based on AQUASYS (Bekhti and Al-Harbi, 2013). It consists of four modules: 1) User Interface; 2) Question Analyzer; 3) Passage Retrieval; and 4) Answer Extractor. The system handles questions that expect named-entity answers (person, location, time, etc.). As an extension of AQUASYS, JAWEB provides a web interface to the system, which is additional support for Arabic language presentation in web browsers. The system was tested with an expansion of the corpus presented in (Bekhti and Al-Harbi, 2013). JAWEB scored 100% recall and 80% precision.

LEMAZA (Azmi and Alshenaifi, 2017) is built to answer Arabic why-questions. It is composed of four components: transforming the input question into a query; preprocessing the document collection with the same method used for the why-questions; retrieving candidate passages related to the input question; and extracting the answer. The system applies Rhetorical Structure Theory to extract answers. The experiments were conducted on 110 why-questions using 700 documents compiled from open-source Arabic corpora. LEMAZA achieved about 72.7% recall, 79.2% precision, and 78.7% c@1.

SOQAL (Mozannar et al., 2019) is the first Arabic OpenQAS that adopted a neural approach. It is composed of two main components: a retriever and a reader. The retriever aims at retrieving spans of text that are most related to the user's questions. It uses hierarchical TF-IDF to first retrieve a set of documents related to the question and then extract the passages that most probably contain the answer. The reader is a neural reading comprehension model based on BERT. It is trained to extract the answer given the question and the passage that likely contains the answer. To evaluate the system, Arabic Wikipedia was used as the information source. In addition, the authors proposed two new datasets: 1) ARCD, which contains 1,395 open-domain questions constructed from Arabic Wikipedia articles; and 2) Arabic SQuAD, which is based on the translation of the SQuAD dataset proposed in (Rajpurkar et al., 2016). When considering the top 5 answers, SOQAL achieved 20.7%, 42.5%, and 51.7% in exact match, F1, and sentence match scores, respectively.

The authors in (Malhas and Elsayed, 2022) built the first machine reading comprehension system on the Holy Qur'an. It is a restricted-domain system which aims to extract an answer given a Qur'anic passage and a question in Modern Standard Arabic. They introduced CL-AraBERT, an AraBERT model further pre-trained on a large Classical Arabic dataset. They leveraged cross-lingual transfer learning by fine-tuning CL-AraBERT with Arabic SQuAD and ARCD (Mozannar et al., 2019) prior to fine-tuning the model with QRCD. The latter is a new dataset proposed by the authors for extractive machine reading comprehension on the Holy Qur'an. Furthermore, they introduced a new metric, partial average precision, that integrates partial matching to evaluate performance over both multi-answer and single-answer questions. To evaluate the proposed system, they used two experimental setups with the QRCD dataset: 1) splitting the dataset into a 75% training set and a 25% testing set; and 2) performing 5-fold cross-validation (CV). The proposed system outperformed the baseline fine-tuned AraBERT reader by 6.12 and 3.75 points in partial average precision, in the train-test split and CV setups, respectively.

We summarize the characteristics of the aforementioned systems in Table 1. It is worth noting that only one Arabic OpenQAS adopted a neural approach.

Table 1. A summary of existing Arabic OpenQAS. The table presents for each system its name, adopted approach, number of test questions, and the information source used.

| System | Approach | # Test Questions | Information Source |
| QARAB (Hammo et al., 2002) | Traditional | 113 | Al-Raya newspaper |
| ArabiQA (Benajiba et al., 2007b) | Traditional | 200 | 11,000 Arabic Wikipedia articles |
| QASAL (Brini et al., 2009) | Traditional | 43 | Google search engine |
| AQUASYS (Bekhti and Al-Harbi, 2013) | Traditional | 80 | 150,000 tagged tokens (ANERcorp + ANERgazet) |
| IDRAAQ (Abouenour et al., 2012) | Traditional | 160 | QA4MRE@CLEF 2012 test documents |
| JAWEB (Kurdi et al., 2014) | Traditional | - | Extension of (ANERcorp + ANERgazet) |
| LEMAZA (Azmi and Alshenaifi, 2017) | Traditional | 110 | 700 documents from open-source Arabic corpora |
| SOQAL (Mozannar et al., 2019) | Neural | 702 | Arabic Wikipedia |
| (Malhas and Elsayed, 2022) | Neural | 348 or 5-fold CV | Qur'an |

3. Proposed system

In this section, we present the main modules of the DAQAS system: the Dense Duplicate Question Detection module, the Retriever module, and the Reader module. The dense duplicate question detection module searches whether a question has a duplicate. If a duplicate exists, the system returns the answer of the duplicate question; otherwise, the system passes the question to the retriever module, which retrieves passages relevant to the question, using Arabic Wikipedia as the information source. Finally, the reader module extracts the answer from the relevant passages. The flowchart of our Arabic OpenQAS is depicted in Fig. 6.
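The control flow just described (duplicate check first, then retriever and reader) can be sketched as follows. The three module functions are stand-ins with names of our choosing, not the paper's: a cache lookup for the duplicate detector, and stubs for the retriever and reader.

```python
from typing import Optional

# offline store of previously answered questions (hypothetical content)
KNOWN_QA = {"who invented the telephone?": "Alexander Graham Bell"}

def detect_duplicate(question: str) -> Optional[str]:
    """Stand-in for the dense duplicate question detection module."""
    return KNOWN_QA.get(question)

def retrieve(question: str, k: int = 5) -> list:
    """Stand-in for BM25 + query expansion over Wikipedia passages."""
    return [f"passage_{i}" for i in range(k)]

def read(question: str, passages: list) -> str:
    """Stand-in for the BERT-based extractive reader."""
    return f"span extracted from {passages[0]}"

def answer(question: str) -> str:
    cached = detect_duplicate(question)
    if cached is not None:         # duplicate found: return the stored answer
        return cached
    passages = retrieve(question)  # otherwise: retriever, then reader
    return read(question, passages)

print(answer("who invented the telephone?"))  # -> Alexander Graham Bell
```

The point of the early cache lookup is that a hit skips retrieval and reading entirely, which is where the response-time saving comes from.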
3.1. Dense duplicate question detection module

We propose a new Dense Duplicate Question Detection module inspired by the dense passage retrieval method (Karpukhin et al., 2020). The aim of this module is twofold. First, it reduces the response time for questions that already have answers stored in an offline data source. Second, it provides trusted exact answers to previously known questions. To train the model, two dense encoders, E_Input and E_Known, are used to map input questions and known questions into d-dimensional vectors. These vectors constitute the representation of the special token [CLS] computed by a BERT-based model, including mBERT (Devlin et al., 2018), AraBERT (Antoun et al., 2020), GigaBERT (Lan et al., 2020), and MARBERT and ARBERT (Abdul-Mageed et al., 2021). The similarity score between the input and known questions, q_1 and q_2, is defined by the following equation:

sim(q_1, q_2) = \exp(E_{Input}(q_1) \cdot E_{Known}(q_2))   (5)

This similarity score is then fed to a softmax layer to predict whether the question pair is duplicate or not. The model is trained to optimize the cross-entropy loss.

After the training phase, we compute the embeddings of previously answered questions using the E_Known encoder. These embeddings are indexed offline with FAISS (Johnson et al., 2019). FAISS [3] is an extremely efficient, open-source library for similarity search and clustering of dense vectors, which can easily be applied to billions of vectors. Thus, the data source that contains known questions can be scaled to contain billions of questions. Given an input question q_Input, we compute its embedding using E_Input and use a threshold to retrieve the top k duplicate question candidates that probably have the same answer as the input question. Finally, these retrieved questions and the posed question are passed to a BERT-based duplicate question classifier to determine a list of scores. The duplicate question candidate with the highest score is considered a duplicate of the input question, and its known answer is returned as the final answer. If no duplicate question is detected, the system passes the question to the retriever module. Fig. 7 illustrates the process of selecting the final duplicate question.

3.2. Retriever module

This module aims at extracting from the source documents the most relevant passages given an input question. It consists of four stages: source documents preparation, query formulation, query expansion, and relevant passages retrieval.

3.2.1. Source documents preparation

We used the Arabic Wikipedia dump [4] from Sept. 20, 2020, as the source documents where the system can find answers to questions. We divided each document into several disjoint passages, which serve as our elementary retrieval units. Each passage contains a text block of 100 words, following (Wang et al., 2019; Karpukhin et al., 2020). The total number of obtained passages is 2,189,238. We prefixed every passage with the title of the Wikipedia document that contains the passage and the particular token [SEP].

[3] https://github.com/facebookresearch/faiss.
[4] https://dumps.wikimedia.org/arwiki/20201120/.

3.2.2. Query formulation and passage preprocessing

We present the main preprocessing steps we applied for both query formulation and passage preprocessing. The preprocessing pipeline includes the following steps. 1) Diacritics removal: indexing text with diacritical marks is computationally expensive since a large number of word forms must be considered, so diacritics removal is computationally highly effective. Also, noting that retrieval is usually tolerant of ambiguity, we removed all diacritics from the text (Sanderson, 1994; Darwish and Magdy, 2014). 2) Kashida removal: kashidas are simple word elongation characters; thus, they are typically removed. 3) Letter normalization: we apply letter normalization since it decreases the vocabulary size and is thus computationally efficient. 4) Segmentation and tokenization: Arabic word segmentation consists of breaking words into their prefix(es), stem, and suffix(es). Tokenization is splitting sentences into tokens based on spaces and punctuation. We used the Farasa segmenter (Abdelali et al., 2016) to segment and tokenize questions and passages. 5) Stopwords removal: stopwords carry out various tasks in sentences, but they are ineffective for retrieval. Arabic stopwords can be attached to prefixes and suffixes; therefore, we apply segmentation before stopwords removal.

3.2.3. Query expansion

In order to improve the performance of the retriever, a pre-trained language model was fine-tuned to generate new contexts relevant to a question (Mao, 2020). The generated contexts were appended to the initial query to add semantic information and thus facilitate retrieving passages relevant to the query. Three contexts were taken as the generation targets: 1) the title of the Wikipedia page containing the answer; queries augmented by a valid generated title have a better chance of retrieving a relevant passage; 2) the answer to the question, which is naturally helpful for retrieving passages that contain the answer itself; still, generating the answer to a question directly is challenging, so the performance of the retriever may diminish with the performance of the answer generator; 3) the windowed passages, which form the context of an answer; they are extracted by taking a window of 10 words before and after the answer in the full context.

Three pre-trained language models were fine-tuned to generate the three contexts discussed above: mT5-small (Xue et al., 2020), AraGPT2-base, and AraGPT2-large (Antoun et al., 2021). The AraGPT2 architecture follows the GPT2 (Radford et al., 2019) architecture but is pre-trained on large Arabic corpora. The model optimizes the CLM objective, i.e., it maximizes the probability of a word given the previous words in a sentence. Eq. (6) presents the CLM objective:
p(s) = \prod_{i=1}^{n} p(w_i \mid w_1, \ldots, w_{i-1})   (6)

Table 2. An example of title, answer, and windowed passage generation.

(2019) for testing. Fig. 8 shows the size of Arabic samples from different datasets.
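Eq. (6) above can be illustrated numerically: the probability of a sentence is the product of each word's probability given the preceding words. The toy bigram table below is made up for illustration; AraGPT2 estimates these conditionals with a transformer over the full left context rather than a single previous word.

```python
# p(next | previous): hypothetical bigram probabilities
bigram = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.4,
}

def sentence_prob(words):
    """p(s) = prod_i p(w_i | w_1, ..., w_{i-1}), truncated here to bigrams."""
    prob = 1.0
    prev = "<s>"
    for w in words:
        prob *= bigram.get((prev, w), 1e-6)  # unseen pairs get a tiny probability
        prev = w
    return prob

p = sentence_prob(["the", "cat", "sat"])
print(p)  # 0.5 * 0.2 * 0.4 = 0.04
```

Training with the CLM objective amounts to maximizing the log of this product over the training corpus.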
Fig. 9. Illustration of the positions of duplicate questions in the retrieved top 5 similar questions.

Fig. 10. Box-plots of similarity scores between input questions and their duplicate questions according to their positions in the top 5 retrieved questions.

Fig. 12. Precision, Recall, and filtered ratio obtained using the third quartile of the ranks of various retrieved duplicate questions as threshold.

Fig. 13. Precision, Recall, and filtered ratio obtained using the upper limit of the ranks of various retrieved duplicate questions as threshold.
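A threshold analysis of the kind reported in Figs. 12 and 13 can be sketched as follows. The data, threshold, and the reading of "filtered ratio" as the fraction of questions sent on to the retriever are our assumptions for illustration, not values from the paper.

```python
# (best candidate similarity, is it a true duplicate?) per input question
candidates = [(4.1, True), (3.2, True), (2.9, False), (1.2, False), (3.5, False)]

def threshold_stats(pairs, threshold):
    """Precision/recall over questions answered from the cache at a threshold."""
    kept = [dup for score, dup in pairs if score >= threshold]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / sum(dup for _, dup in pairs)
    # assumed meaning: share of questions NOT answered from the cache
    filtered_ratio = 1 - len(kept) / len(pairs)
    return precision, recall, filtered_ratio

prec, rec, filt = threshold_stats(candidates, threshold=3.0)
print(prec, rec, filt)  # toy data: precision 2/3, recall 1.0, filtered 0.4
```

Raising the threshold trades recall (fewer true duplicates caught) for precision (fewer wrong cached answers returned).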
Table 5. Ratio of exact answers within retrieved passages.

| % of exact answers | Top 5 | Top 10 | Top 15 |
| Without Aug | 25.27 | 31.52 | 35.88 |
| Title Augmentation | 27.74 | 36.02 | 41.25 |
| Answer Augmentation | 22.88 | 29.77 | 33.48 |
| Windowed Passage Aug | 23.02 | 29.05 | 33.04 |
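The retrieval step behind these augmentation experiments, BM25 scoring of an expanded query against passages, can be sketched as below. This is an illustrative stand-alone implementation of the standard Okapi BM25 formula, not the paper's code, and the corpus, query, and "generated" context are made up.

```python
import math

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one document for a tokenized query."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n = len(corpus)
    score = 0.0
    for term in query_tokens:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc_tokens.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_tokens) / avgdl))
    return score

corpus = [["capital", "of", "france", "is", "paris"],
          ["berlin", "is", "a", "large", "city"]]
query = ["capital", "france"]
generated_context = ["paris"]          # e.g. a generated page title
expanded = query + generated_context   # the query expansion step

ranked = sorted(range(len(corpus)),
                key=lambda i: bm25_score(expanded, corpus[i], corpus),
                reverse=True)
print(ranked[0])  # index of the top-ranked passage
```

The expansion helps because the generated context contributes extra matching terms ("paris" here) that the raw question does not contain.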
Table 6. Obtained results with the full ARCD dataset as a test set according to Exact Match, F1, and sentence match scores.

Table 7. Obtained results with the ARCD test set according to Exact Match, F1, and sentence match scores.

Table 8. DAQAS performance evaluation with the ARCD test set.

Table 9. Impact of the Duplicate Question Detection module on DAQAS performance.

Table 4. Precision and Recall scores of the passage retrieval module with different query augmentation contexts.
Table 10. Step-by-step example of the different modules of DAQAS.
We evaluated the reader module with three main measures, namely exact match, F1 score, and sentence match. We trained various BERT-based models to predict answer spans with the Arabic SQuAD (Mozannar et al., 2019), MLQA (Lewis et al., 2020), XQuAD (Artetxe et al., 2020), and TyDi QA (Clark et al., 2020) datasets. Then we tested the models on the full ARCD (Mozannar et al., 2019) dataset and on the ARCD test set. Tables 6 and 7 present the obtained results, which show that the GigaBERT model achieves the best performance on the test set. Hence, this is the model that we use in our system.

We evaluate the performance of the overall DAQAS system with the ARCD test set; the query is expanded with the generated title only, using the AraGPT2-medium model, and the reader is based on the GigaBERT model. The performance is evaluated with the top 5, 10, and 15 passages returned by the retriever module. The results are presented in Table 8.

To evaluate the impact of the duplicate question detection module, we randomly select 10% of the questions from the ARCD test set. Then, we encode these questions using the known question encoder and index the resulting encodings with FAISS.
Fig. 17. Illustration of the Wikipedia article from which the answer was extracted.
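The exact match and token-level F1 measures used in this evaluation can be sketched in the style of the SQuAD evaluation script. The normalization below (lowercasing, stripping punctuation, collapsing whitespace) is a simplification of what such scripts typically do, not a copy of the authors' code.

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)

def f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between answers."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris.", "paris"))            # True
print(round(f1("in Paris France", "Paris"), 2))  # 0.5
```

F1 rewards partial overlap with the gold span, which is why it is substantially higher than exact match in the reported results.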
Table 9 shows the obtained results. It is obvious that incorporating a duplicate question detection module improves the performance of the overall system.

A step-by-step example of the outputs of the different modules is presented in Table 10. In addition, in Fig. 17 we provide an illustration of the Wikipedia article from which the answer was extracted.

5. Conclusion and perspectives

In this paper, we proposed a new Arabic OpenQAS, DAQAS. The system is composed of three main components: 1) the Duplicate Question Detection component, which aims to search for and return answers to duplicate input questions; 2) the Retriever component, which performs information retrieval to retrieve passages relevant to input questions; and 3) the Reader component, which extracts answer spans from relevant passages. As far as we know, we are the first to integrate a duplicate question detection system in the pipeline of an Arabic OpenQAS. All the components apply deep learning techniques such as BERT, GPT, and T5 to improve the system's performance. Our system combines various techniques, including text classification, IR, information extraction, and more. We performed various experiments to show the effectiveness of our system. Our system scored about 54.71% F1 when retrieving the top 5 relevant passages. Future work will aim to improve the retrieval and answer extraction components, hence improving the performance of the overall DAQAS system.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467.
Abdelali, A., Darwish, K., Durrani, N., Mubarak, H., 2016. Farasa: A fast and furious segmenter for Arabic. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pp. 11-16.
Abdul-Mageed, M., Elmadany, A.A., Nagoudi, E.M.B., 2021. ARBERT & MARBERT: Deep bidirectional transformers for Arabic. arXiv:2101.01785.
Abouenour, L., Bouzoubaa, K., Rosso, P., 2012. IDRAAQ: New Arabic question answering system based on query expansion and passage retrieval. In: CLEF (Online Working Notes/Labs/Workshop).
Antoun, W., Baly, F., Hajj, H., 2020. AraBERT: Transformer-based model for Arabic language understanding. In: LREC 2020 Workshop Language Resources and Evaluation Conference, 11-16 May, p. 9.
Antoun, W., Baly, F., Hajj, H., 2021. AraGPT2: Pre-trained transformer for Arabic language generation. In: Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine (Virtual), pp. 196-207.
Benajiba, Y., Rosso, P., Lyhyaoui, A., 2007b. Implementation of the ArabiQA question answering system's components. In: Proc. Workshop on Arabic Natural Language Processing, 2nd Information Communication Technologies Int. Symposium, ICTIS-2007, Fez, Morocco, April, pp. 3-5.
Brini, W., Ellouze, M., Trigui, O., Mesfar, S., Belguith, L.H., Rosso, P., 2009. Factoid and definitional Arabic question answering system. In: Post-Proc. NOOJ-2009, Tozeur, Tunisia, June 8-10.
Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D., 2020. Language models are few-shot learners. arXiv:2005.14165.
Clark, J.H., Palomaki, J., Nikolaev, V., Choi, E., Garrette, D., Collins, M., Kwiatkowski, T., 2020. TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Trans. Assoc. Comput. Linguistics 8, 454-470.
Darwish, K., Magdy, W., 2014. Arabic information retrieval. Found. Trends Inf. Retr. 7, 239-342. https://doi.org/10.1561/1500000031.
Devlin, J., Chang, M.W., Lee, K., Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
Hammo, B., Abu-Salem, H., Lytinen, S.L., Evens, M., 2002. QARAB: A question answering system to support the Arabic language. In: Proceedings of the Workshop on Computational Approaches to Semitic Languages, SEMITIC@ACL 2002, Philadelphia, PA, USA. https://doi.org/10.3115/1118637.1118644.
Hamza, A., Alaoui Ouatik, S.E., Zidani, K.A., En-Nahnahi, N., 2020. Arabic duplicate questions detection based on contextual representation, class label matching, and structured self attention. J. King Saud Univ. Comput. Inf. Sci. https://doi.org/10.1016/j.jksuci.2020.11.032.
Hamza, A., En-Nahnahi, N., Zidani, K.A., El Alaoui Ouatik, S., 2021. An Arabic question classification method based on new taxonomy and continuous distributed representation of words. J. King Saud Univ. Comput. Inf. Sci. 33, 218-224. https://doi.org/10.1016/j.jksuci.2019.01.001.
Johnson, J., Douze, M., Jégou, H., 2019. Billion-scale similarity search with GPUs. IEEE Trans. Big Data. https://doi.org/10.1109/TBDATA.2019.2921572.
Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., Yih, W.t., 2020. Dense passage retrieval for open-domain question answering. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6769-6781. https://doi.org/10.18653/v1/2020.emnlp-main.550.
Kurdi, H., Alkhaider, S., Alfaifi, N., 2014. Development and evaluation of a web based question answering system for Arabic language. Comput. Sci. Inf. Technol. (CS & IT) 4, 187-202.
Lan, W., Chen, Y., Xu, W., Ritter, A., 2020. GigaBERT: Zero-shot transfer learning from English to Arabic. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Lewis, P.S.H., Oguz, B., Rinott, R., Riedel, S., Schwenk, H., 2020. MLQA: Evaluating cross-lingual extractive question answering. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7315-7330. https://doi.org/10.18653/v1/2020.acl-main.653.
Lin, C.Y., 2004. ROUGE: A package for automatic evaluation of summaries. In: Text Summarization Branches Out, Association for Computational Linguistics, Barcelona, Spain, pp. 74-81.
Malhas, R., Elsayed, T., 2022. Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT. Inf. Process. Manage. 59, 103068. https://doi.org/10.1016/j.ipm.2022.103068.
Mao, Y., 2020. Generation-augmented retrieval for open-domain question answering. arXiv:2009.08553.
Mikolov, T., Chen, K., Corrado, G., Dean, J., 2013a. Efficient estimation of word representations in vector space. In: 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings. URL: http://arxiv.org/abs/1301.
21.
3781.
Artetxe, M., Ruder, S., Yogatama, D., 2020. On the cross-lingual transferability of
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J., 2013. Distributed
monolingual representations. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R.
representations of words and phrases and their compositionality. In:
(Eds.), Proceedings of the 58th Annual Meeting of the Association for
Advances in Neural Information Processing Systems (NIPS), pp. 3111–3119.
Computational Linguistics. Association for Computational Linguistics, pp.
Mozannar, H., Maamary, E., Hajal, K.E., Hajj, H.M., 2019. Neural arabic question
4623–4637. https://doi.org/10.18653/v1/2020.acl-main.421.
answering. In: El-Hajj, W., Belguith, L.H., Bougares, F., Magdy, W., Zitouni, I.
Azmi, A.M., Alshenaifi, N.A., 2017. Lemaza: An arabic why-question answering
(Eds.), Proceedings of the Fourth Arabic Natural Language Processing Workshop,
system. Natural Lang. Eng. 23, 877–903. https://doi.org/10.1017/
WANLP@ACL 2019, Florence, Italy, August 1, 2019, Association for
S1351324917000304.
Computational Linguistics. pp. 108–118. https://doi.org/10.18653/v1/w19-
Bekhti, S., Al-Harbi, M., 2013. Aquasys: A question-answering system for arabic, in:
4612.
WSEAS International Conference. In: Proceedings. Recent Advances in
OpenAI, 2023. Gpt-4 technical report. arXiv:2303.08774.
Computer Engineering Series, WSEAS, pp. 19–27.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z.,
Benajiba, Y., Rosso, P., Benedíruiz, J.M., 2007a. Anersys: An arabic named entity
Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M.,
recognition system based on maximum entropy. In: International Conference
Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S., 2019.
on Intelligent Text Processing and Computational Linguistics. Springer, pp. 143–
Pytorch: An imperative style, high-performance deep learning library. In:
153.
13
H. Alami, A. El Mahdaouy, A. Benlahbib et al. Journal of King Saud University – Computer and Information Sciences 35 (2023) 101709
Wallach, Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. Silberztein, M., Váradi, T., Tadić, M., 2012. Open source multi-platform NooJ for NLP.
(Eds.), 2020, Advances in Neural Information Processing Systems 32: Annual In: Proceedings of COLING 2012: Demonstration Papers, The COLING 2012
Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8- Organizing Committee, Mumbai, India. pp. 401–408. URL: https://www.aclweb.
14 December 2019, Vancouver, BC, Canada, pp. 8024–8035. URL: http://papers. org/anthology/C12-3050.
nips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep- Soriano, J.M.G., Montes-y-Gómez, M., Arnal, E.S., Pineda, L.V., Rosso, P., 2005.
learning-library. Language independent passage retrieval for question answering. In: Gelbukh, A.
Peñas, A., Rodrigo, Á., 2011. A simple measure to assess non-response. In: Lin, D., F., de Albornoz, A., Terashima-Marín, H. (Eds.), MICAI 2005: Advances in
Matsumoto, Y., Mihalcea, R. (Eds.), The 49th Annual Meeting of the Association Artificial Intelligence, 4th Mexican International Conference on Artificial
for Computational Linguistics: Human Language Technologies, Proceedings of Intelligence. Springer, pp. 816–823. https://doi.org/10.1007/11579427_83.
the Conference, 19–24 June, 2011, Portland, Oregon, USA, The Association for Taylor, W.L., 1953. ‘‘cloze procedure”: A new tool for measuring readability.
Computer Linguistics. pp. 1415–1424. URL: https://www.aclweb.org/ Journalism Quart. 30, 415–433. https://doi.org/10.1177/107769905303000401.
anthology/P11-1142/. arXiv:https://doi.org/10.1177/107769905303000401.
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L., Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł.,
2018. Deep contextualized word representations. In: North American Polosukhin, I., 2017. Attention is all you need. In: Advances in Neural
Association for Computational Linguistics (NAACL), pp. 2227–2237. Information Processing Systems (NIPS), pp. 5998–6008.
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., 2018. Improving language Wang, Z., Ng, P., Ma, X., Nallapati, R., Xiang, B., 2019. Multi-passage BERT: A globally
understanding by generative pre-training. Technical Report. OpenAI. normalized BERT model for open-domain question answering. In: Proceedings
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., 2019. Language of the 2019 Conference on Empirical Methods in Natural Language Processing
models are unsupervised multitask learners. OpenAI blog 1, 9. and the 9th International Joint Conference on Natural Language Processing
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China,
P.J., 2020. Exploring the limits of transfer learning with a unified text-to-text pp. 5878–5882. https://doi.org/10.18653/v1/D19-1599. URL: https://www.
transformer. J. Mach. Learn. Res. 21, 140:1–140:67. URL: http://jmlr.org/papers/ aclweb.org/anthology/D19-1599.
v21/20-074.html. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T.,
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P., 2016. SQuAD: 100,000+ questions for Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y.,
machine comprehension of text. In: Empirical Methods in Natural Language Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M., Lhoest, Q., Rush, A., 2020.
Processing (EMNLP), pp. 2383–2392. Transformers: State-of-the-art natural language processing. In: Proceedings of
Salton, G., 1971. The Smart Retrieval System-experiments in Automatic Document the 2020 Conference on Empirical Methods in Natural Language Processing:
Processing. Englewood Cliffs. System Demonstrations, Association for Computational Linguistics, Online. pp.
Sanderson, M., 1994. Word sense disambiguation and information retrieval. In: 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6,
Croft, W.B., van Rijsbergen, C.J. (Eds.), Proceedings of the 17th Annual https://doi.org/10.18653/v1/2020.emnlp-demos.6.
International ACM-SIGIR Conference on Research and Development in Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.,
Information Retrieval. Dublin, Ireland, 3–6 July 1994 (Special Issue of the 2020. mt5: A massively multilingual pre-trained text-to-text transformer. CoRR
SIGIR Forum), ACM/Springer. pp. 142–151. https://doi.org/10.1007/978-1- abs/2010.11934. URL: https://arxiv.org/abs/2010.11934, arXiv:2010.11934.
4471-2099-5_15. Zeng, C., Li, S., Li, Q., Hu, J., Hu, J., 2020. A survey on machine reading
Seelawi, H., Mustafa, A., Al-Bataineh, H., Farhan, W., Al-Natsheh, H.T., 2019. NSURL- comprehension–tasks, evaluation metrics and benchmark datasets. Appl. Sci.
2019 shared task 8: Semantic question similarity in arabic. CoRR abs/ 10. URL: https://www.mdpi.com/2076-3417/10/21/7640, https://doi.org/10.
1909.09691. URL: http://arxiv.org/abs/1909.09691, arXiv:1909.09691. 3390/app10217640.
14