
Soft Comput (2018) 22:213–230
https://doi.org/10.1007/s00500-016-2328-2

METHODOLOGIES AND APPLICATION

QAPD: an ontology-based question answering system in the physics domain

Asad Abdi¹ · Norisma Idris¹ · Zahrah Ahmad²

Published online: 3 September 2016
© Springer-Verlag Berlin Heidelberg 2016

Abstract The tremendous development in information technology has led to an explosion of data and motivated the need for powerful yet efficient strategies for knowledge discovery. Question answering (QA) systems make it possible to ask questions and retrieve answers using natural language queries. In an ontology-based QA system, the knowledge-base data, where the answers are sought, have a structured organization; question-answer retrieval over an ontology knowledge base therefore provides a convenient way to obtain knowledge. In this paper we present QAPD, an ontology-based QA system applied to the physics domain, which integrates natural language processing, ontologies and information retrieval technologies to provide informative answers to users. The system allows users to retrieve information from formal ontologies using input queries formulated in natural language. We propose an inferring schema mapping method, which uses a combination of semantic and syntactic information together with attribute-based inference to transform users' questions into ontological knowledge base queries. In addition, a novel domain ontology for the physics domain, called EAEONT, is presented; relevant standards and regulations have been utilized extensively during the ontology building process. The original characteristic of the system is the strategy used to fill the gap between users' expressiveness and formal knowledge representation. The system has been developed and tested on the English language, using an ontology modeling the physics domain. The performance level achieved enables the use of the system in real environments.

Keywords Question answering · Ontology modeling · Natural language interface · Information retrieval

Communicated by V. Loia.

✉ Asad Abdi
[email protected]

Norisma Idris
[email protected]

Zahrah Ahmad
[email protected]

¹ Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia
² Physics Division, Centre for Foundation Studies in Science, University of Malaya, 50603 Kuala Lumpur, Malaysia

1 Introduction

In recent years, the amount of information held in knowledge repositories has increased rapidly, and users need search engines to obtain information conveniently. However, for a specific question a search engine collects a large amount of information, much of which may be redundant or irrelevant, so users have to spend too much time seeking useful data. Users often have specific questions in mind for which they hope to get answers. They would like the answers to be short and precise, and they prefer to express their questions without any query formulation rules or knowledge of a specific query language. A question answering (QA) system is an essential technology for tackling this problem: it can provide answers to user queries in concise form (Bertola and Patti 2015; Lu et al. 2012). A QA system locates, extracts, and represents a specific answer to a user question expressed in natural language (Abacha and Zweigenbaum 2015; Pavlić et al. 2015; Peral et al. 2014). The difficulty of identifying and verifying the correct answer makes question answering more difficult than the common information retrieval task performed by search engines.

A QA system aims to find the correct answers to users' questions in both nonstructured (e.g., newswire collections) and structured collections of data (e.g., databases). In an ontology-based QA system, the data where answers are discovered have a structured organization defined by an ontology. An ontology describes a conceptual representation of concepts and their relationships within a specific domain, and can be considered a knowledge base with a more sophisticated form than a relational database. In our work, we propose a physical-concepts ontology, EAEONT, to create a knowledge base as the kernel of our proposed system. We organize knowledge in the form of domain theories (such as 'Electricity' and 'Electromagnetism'). The following points motivated us to propose the EAEONT ontology: (1) a prospective EAEONT ontology can work as a common vocabulary for the domain, which can be employed as a representative of the domain at the initial development stages of our proposed method; (2) an EAEONT ontology can be used as an external semantic resource for many domain-specific information system applications, including information extraction/retrieval systems and natural language interfaces, just like more general conceptual ontologies; (3) an EAEONT domain ontology can also be employed as a compressed information source for researchers in the domain, with its explanations of the concepts, their properties, and their relations in a hierarchical structure.

In a QA system, a user expresses a question in natural language and the system answers the user's query using several processes such as question analysis, information retrieval and answer extraction. An ontological knowledge base can play a key role as background knowledge for question answering and information retrieval. In order to obtain a correct answer for a user's query from an ontological knowledge base, the user's question in natural language needs to be precisely mapped to the query statement of the ontological knowledge base. However, this is a difficult task involving complex semantic annotation and knowledge representation. In addition, language is complex and ambiguous, and the same question can be asked using various expressions, for example, "What is the unit of Resistance?", "Please tell me the unit of Resistance?" and "Show me what is the unit of the Resistance?". According to previous research, there are two major obstacles that ontological QA systems must address. The first is to understand the natural language query and represent it in a formal way. The second is to translate this formal representation into a correct query adapted to the formal underlying ontology query schema.

The first challenge can be addressed using natural language processing (NLP) technology, while the second is essential for an ontology-based QA system to obtain an answer. The second challenge, translating the user's question into a formal ontology schema, requires an appropriate mapping between the user's query in natural language and the constrained vocabulary in an ontological knowledge base.

In this paper, we explain QAPD, an interactive natural language interface for querying ontology databases. Our ontology-driven QA system, QAPD, is designed to answer questions about the "Electricity" and "Electromagnetism" domains. QAPD is an ontology-based QA system which takes an arbitrary English-language sentence (the user's query) and an ontology as inputs, and returns answers drawn from the available ontological knowledge base. QAPD tackles the aforementioned challenges from a perspective based on modeling the various ways a user asks for information in the domain.

There are three main phases in QAPD: the users' query database (UQD), inferring schema mappings (ISM) and ontology construction. In the users' query database stage, users' questions from the given domains ("Electricity" and "Electromagnetism") are collected. These questions are then analyzed and categorized into clusters, where each cluster includes various users' questions asking for the same information in the domain. Each cluster is associated with a structured query language (SQL) statement which allows accessing the information requested by the queries in the cluster. As a result, the users' query database includes a set of pairs <query, SQL>. In the inferring schema mappings phase, the user's query in natural language is processed in order to obtain the corresponding cluster in the users' query database; finally, the user's requirement is returned using the corresponding SQL from the ontological knowledge base. In the ontology construction phase, only information about "Electricity" and "Electromagnetism" has been selected to populate our domain ontology.

The main contributions of the current work can be summarized as follows. The EAEONT ontology is the first explicit ontology developed for the electricity and electromagnetism domain; it can contribute to the implementation of this as well as other information-based systems, acting as a generic vocabulary for the electricity and electromagnetism domain. Furthermore, to the best of our knowledge, QAPD is the first interactive natural language interface to the electricity and electromagnetism domain. We developed QAPD for querying a knowledge base, and we propose an approach, ISM, for mapping a natural language question into an ontological knowledge base query. We also collected a set of rules and query patterns for inferring mappings between a user's query in natural language and the corresponding cluster in the query database.

The structure of this paper is as follows. Section 2 provides a short overview of previous systems. Section 3 introduces the system architecture, ontology construction and users' query database. Section 4 discusses the performance analysis and presents the results of the analysis. Finally, in Sect. 5, we summarize the work discussed and the progress of the project.

2 Brief review of literature

The main task of a question answering (QA) system is to extract concise answers to a natural language query. There is a big difference between question answering and information retrieval (IR): a QA system is considered more complicated than an IR system and needs extensive NLP techniques to prepare a concise and accurate answer to a question in natural language. IR systems provide a set of documents that relate to the user's information need, but do not present the exact correct answer. In an IR system, the relevant documents are obtained by matching the keywords from the user's query with a set of keywords from the documents; the main aim of a QA system, in contrast, is to provide a textual answer instead of a set of documents.

Question answering systems include three main components. The first is information retrieval. IR aims to reduce the number of source documents that must be analyzed in the next steps; in other words, the main task of this step is to find a small set of documents that are globally similar to the user query. The procedure compares the user question with each source document, and the source documents with high similarity scores are taken as the sources for the question. This step therefore reduces the search space by removing source documents that do not have significant similarity to the user question. The second component is question analysis, whose objective is to understand the user query and produce a formal query out of the NL question. The third component is answer extraction, which involves selecting the answer based on the user's need.

Question answering systems are categorized into two main groups: closed domain (also called restricted domain or structured data) and open domain (unstructured data). A closed-domain question answering system needs pre-defined knowledge sources, such as a domain ontology and a relational database management system (RDBMS) (Dalmas and Webber 2007; Dragoni et al. 2012). According to the literature, there are two ways to query a database (Li and Jagadish 2014): (1) the structured query method, such as SQL, and (2) the natural language query method, in which the user expresses his/her information need as a natural language question without any terminology restrictions. Although the structured query method is powerful, it is difficult and not easy for a naïve user, whereas the natural language query method is friendly to use and allows users to express complex queries in natural language. Keyword-based search is widely used in the natural language query method (Aijun 2006; Kalaivani and Duraiswamy 2012), but it sometimes yields incorrect answers because identical concepts can carry various meanings. Semantic search is employed to solve this problem by understanding the intention of the user and the meaning of the concepts in the search query.

In a closed-domain QA system, the user's question is translated into a database query, which is then applied to the database to answer the question. In contrast, in an open-domain QA system, the user query is compared with documents or passages; documents or passages with high similarity scores are considered relevant, answer candidates are extracted from the retrieved passages, and the top-ranking candidate is chosen as the answer. The main problem in closed-domain or structured-data QA is the mapping between the user query and the schema structure of the database (Dalmas and Webber 2007). Open-domain or unstructured QA has two major issues. First, answer redundancy: the answer can appear in different places in the source text. Second, in closed-domain or structured QA there is a strong connection between the data in the database and the process of mapping the user's question into a database query; this strong connection is employed to identify and remove ambiguity in user questions before database access. For instance, if a closed-domain QA system knows that the object of the verb "help" can be either a title or a noun, where titles and nouns are in different database fields, it can help the user disambiguate whether the object of the question "Which police help the blind man?" was meant to be the title or the noun. In open-domain QA there is no such strong connection between the query and the data in the source text, so ambiguity is identified after data access.

Ontology (Damljanovic et al. 2010) is a technology employed to model domain knowledge at a high level and improve the query time of a question answering system. Many systems simply use an ontology as a mechanism to support query expansion in information retrieval. In this section we describe some of these systems. Asiaee et al. (2015) proposed an ontology-based QA system, OntoNLQA. It includes five main components: linguistic pre-processing, entity recognition, ontology element matching, semantic association discovery, and query formulation and answer retrieval. The first two components identify entities in the user query; these major words are nouns and verbs, and correspond to the concepts and relationships in the domain. The third maps the identified entities into classes and properties of the ontology. The last two components find the semantic relationships between the entities, which are then translated into SPARQL to query the data. Xie et al. (2015) proposed a question answering system based on an ontology whose data were extracted from a "Natural Language Processing" course. The system contains four main stages: the ontology knowledge base, a question analysis module, an answer extraction module, and a standard answers' extension module. The system extracts the main keywords using word segmentation and part-of-speech tagging.

The system converts the user query into a Jena query statement over the ontology to obtain the precise answer.

Besbes et al. (2015) proposed a question answering framework based on an ontology. This framework performs three main steps: question processing, information retrieval (from the web) and answer extraction. The framework takes a user question in natural language and, in a second step, searches for the user query in a question database. If a similar question exists in the database, the framework directly returns the answer; otherwise, it performs question processing and then reformulates the question. The reformulated question is submitted to a search engine, and an algorithm is applied to the retrieved information to extract the relevant passages and finally the relevant answer. Lee et al. (2007) proposed an ontology-based QA system in which sixteen types of queries are defined; for each query type, a corresponding inferring approach was defined and implemented. Lopez et al. (2007) present an ontology-based question answering system, AquaLog. It processes input questions and categorizes them into 23 groups; if the input question falls into one of these groups, the system will process it correctly. Vargas-Vera et al. (2003) present an ontology-based QA system, AQUA, which combines natural language processing (NLP), ontologies, logic and information retrieval (IR). They introduce a similarity algorithm as a key feature of AQUA; its main task is to determine the similarity between the concepts and relationships in the query and the concepts and relations in the ontology structure.

Raj (2013) proposed a QA system for a particular domain based on ontological information. This system includes four main components: question analysis, which analyzes the user's question; a second component used to collect relevant documents; a third component that processes the collected documents; and a last component that provides the concise answer. Several NLP techniques are used for processing the user's question, the documents and the extracted answers, and the ontology is used for mapping the query and determining relationships. Küçük et al. (2010) present an ontology-based QA system, PQONT. This system includes three main stages: the first stage is the power quality database, used to store the data; the second stage, the query handler, transforms the user's query into an SQL command, applies the SQL command to the database and returns the answer to the query interface; finally, the query interface displays the user question and the answer. The ontological knowledge base was constructed for the electrical power quality domain. Damljanovic et al. (2010) introduced an ontology-based QA system, FREyA, which combines syntactic parsing with ontological knowledge. It uses ontological knowledge to understand the user's question and translate the NL query into a set of ontology concepts, and finally uses the output of syntactic parsing to provide a more accurate answer.

3 The QAPD system architecture

In this section, a QA system for a structured knowledge source is presented. An overview of our proposed system is shown in Fig. 1. QAPD is a question answering system which takes queries expressed in natural language, a query database and an ontology as input, and returns answers drawn from the knowledge base, which instantiates the input ontology with domain-specific information. The architecture of the QAPD system consists of five main components: the ontology construction component, the user interface component, the question processing component, the answer extraction component and the users' query database component. We describe each of these components in the subsequent sections.

3.1 User interface component

User's query—users interact with the QAPD system in a user-friendly environment where no knowledge of computers or knowledge base terms is required. The interaction with our system is via suitable visual forms, buttons, and menus.

Answer—the required answer to the user's query is shown in text format. The user can accept the answer or, if more information is needed, submit a new query to the system.

3.2 Question processing component

This component performs a basic linguistic analysis of the user's query for further processing. It consists of several functions: tokenization, part-of-speech tagging, stemming, stop word removal and annotation.

Tokenisation—this process splits the user's query into a sequence of primitive units, called tokens, that are treated as single logical units.

Stemming—this reduces each word to its stem form. It is useful to identify words that belong to the same stem (e.g., went and gone both come from the verb go). This process obtains the root of each word using the lexical database WordNet. WordNet is a lexical database for English developed at Princeton University (Miller and Charles 1991). It includes 121,962 unique words, 99,642 synsets (each synset is a lexical concept represented by a set of synonymous words) and 173,941 word senses.

Part-of-speech (POS) tagging—in this process a part-of-speech tagger assigns to each word its morphological category (noun, verb, adjective, adverb, preposition, pronoun or conjunction). The result of this function is sent to the keyword extraction and title word extraction steps. We used the English part-of-speech tagger developed by Tsuruoka and Tsujii (2005) at the University of Tokyo.
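For illustration, the steps above can be approximated with a few lines of Python. This is a minimal sketch only: the paper uses WordNet-based stemming and the Tsuruoka and Tsujii (2005) tagger, whereas the sketch substitutes NLTK's tokenizer, lemmatizer and tagger, and the function name is our own.

# Minimal sketch of Sect. 3.2 preprocessing, assuming NLTK as a stand-in
# toolkit (one-time setup: nltk.download('punkt'), nltk.download('wordnet'),
# nltk.download('averaged_perceptron_tagger')).
import nltk
from nltk.stem import WordNetLemmatizer

def preprocess(question: str):
    tokens = nltk.word_tokenize(question)                       # tokenisation
    tags = nltk.pos_tag(tokens)                                 # POS tagging
    lemmatizer = WordNetLemmatizer()
    roots = [lemmatizer.lemmatize(t.lower()) for t in tokens]   # stemming
    return [(tok, root, tag) for (tok, tag), root in zip(tags, roots)]

print(preprocess("What is the unit of Resistance?"))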

Fig. 1 The architecture of the QAPD

Stop word removal—stop words are meaningless words such as articles, prepositions and conjunctions (van Rijsbergen 1986). Removing such words speeds up system processing and improves the performance of the method. Using stop word removal, words that are very common within a user's query, and are therefore considered noisy terms, are removed. Their removal is effective before carrying out a natural language processing task, and can improve accuracy and the time required for comparisons by saving memory space and thus increasing processing speed (Paul and Jamal 2015). Such removal is usually performed by word filtering with the aid of a stop word list. In our work, the stop words were extracted from the English stop word list (http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop). Table 1 shows some of these stop words as they may appear in a sentence.

Table 1 Examples of stop words

"about, above, across, after, afterward, again, against, all, almost, alone,"
"already, also, although, always, am, among, amongst, amount, an,"
"another, any, anyhow, anyone, anything, anyway, anywhere, are, around, as,"
"along, and, in, the, of, …"

Ontology lexicon—this is employed to create a mapping between words in the user's question and ontology concepts. The ontology lexicon includes entity names, relation names and properties; this information is directly extracted from the data source.

Annotation—this function uses Algorithm 1 to label the user's query with the ontology concepts. The corresponding annotation process is shown in Algorithm 1.

Algorithm 1. Entity Detection

Input: User's query, ontology lexicon (OL);
Output: User's query labelled with the ontology concepts;
1: Identify the list of candidate words (noun, verb, adjective and adverb);
2: Let W be a word of the user's query;
3: Let RW be the root of word W, obtained using stemming;
4: Let L be the length of the user's query;
5: Set l = 0;
6: For each W:
   i.   l = l + 1;
   ii.  Get RW;
   iii. Look for RW in the ontology lexicon;
   iv.  If RW is in OL, assign the corresponding label to W and jump to step 6;
   v.   otherwise, repeat step 6 but with the WordNet synonyms of the current word;
        /* "two or more words are synonyms if one may substitute for another in a text
        without changing the meaning of the text." */
7: Jump to step 6; iterate until l ≥ L;
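As an illustrative sketch, Algorithm 1 can be rendered in Python as follows, assuming the ontology lexicon is available as a dictionary mapping word roots to ontology labels and that WordNet synonyms are looked up via NLTK; the names and the toy lexicon are ours, not the authors' implementation.

# Sketch of Algorithm 1 (entity detection) under the assumptions above.
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

def detect_entities(candidate_words, ontology_lexicon):
    lemmatizer = WordNetLemmatizer()
    labels = {}
    for w in candidate_words:                      # step 6: for each W
        rw = lemmatizer.lemmatize(w.lower())       # RW, the root of W
        if rw in ontology_lexicon:                 # steps iii-iv
            labels[w] = ontology_lexicon[rw]
            continue
        # step v: retry with the WordNet synonyms of the current word
        synonyms = {lemma for syn in wordnet.synsets(rw)
                    for lemma in syn.lemma_names()}
        for s in synonyms:
            if s in ontology_lexicon:
                labels[w] = ontology_lexicon[s]
                break
    return labels

lexicon = {"unit": "relation", "voltage": "Electric-element(concept)"}
print(detect_entities(["unit", "potential_difference"], lexicon))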

3.3 Answer extraction component (ARC)

The answer extraction component includes two main phases. The input user's query is processed by the inferring schema mapping (ISM), which tries to identify the semantic relationship between the user's query and the query patterns included in the query database. When the inferring schema mapping determines the corresponding pattern in the query database, it returns an SQL command that allows the expected answer to be obtained.

3.3.1 Inferring schema mapping (ISM)

Human language technologies cover a broad range of activities with the final goal of enabling people to communicate with machines using natural communication skills (Lloret 2012). All these activities face a common challenge: they have to deal with natural language, which is not a trivial issue. When dealing with natural language text it is necessary to analyze and process it in order to be able to understand the meaning behind it (Varile and Zampolli 1997). For instance, an ambiguous text might represent several meanings, or people can express the same meaning using sentences with different word content.

The inferring schema mapping (ISM) is the backbone of our question answering system. The ISM component is called after the user's query has been analyzed by the question processing component. ISM acts as an intermediate language for mapping natural language queries to an ontology-compliant query. In the ISM phase, the user's query is reformulated using the corresponding domain-specific concepts in order to obtain the requested information from the knowledge base. The ISM tries to infer semantic connections between a new user's question and the pre-defined set of domain query patterns from the query database. The overall process of applying semantic and syntactic analysis in the ISM phase is as follows.

3.3.1.1 N-gram measure An n-gram (Abdi et al. 2015a; Lin 2004) is a sequence of n items from a text, where an item refers to a word. An n-gram of size 1, 2 or 3 is referred to as a "unigram", "bigram" or "trigram", respectively; larger sizes are simply called "n-grams" (e.g., six-gram). The n-gram measure computes the similarity between two texts based on the number of n-grams co-occurring in them. The N-gram match ratio is calculated as follows:

$$C_n = \frac{\sum_{S \in \{\text{Query pattern}\}} \sum_{N\text{-gram} \in S} \mathrm{Count}_{\mathrm{match}}(N\text{-gram})}{\sum_{S \in \{\text{Query pattern}\}} \sum_{N\text{-gram} \in S} \mathrm{Count}(N\text{-gram})} \qquad (1)$$

where N is the length of the N-gram, Count_match(N-gram) is the total number of N-grams co-occurring in a query pattern and a new user's question, and Count(N-gram) is the number of N-grams in the query pattern. The N-gram match ratio (Eq. 1) is used to compare a user's query and the query patterns in the query database as follows:

$$N\text{-gram}(m, n) = e^{\sum_{n=1}^{m} W_n \times \log C_n} \qquad (2)$$

where m ≥ n, n and m range from 1 to 4, and W_n = 1/(m − n + 1).
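A minimal sketch of Eqs. (1) and (2) in Python follows, assuming both texts are given as token lists; the function names are illustrative.

# Sketch of the n-gram match ratio (Eq. 1) and combined score (Eq. 2).
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def match_ratio(pattern, query, n):
    """Eq. 1: fraction of the pattern's n-grams also found in the query."""
    pat, qry = ngrams(pattern, n), set(ngrams(query, n))
    return sum(g in qry for g in pat) / len(pat) if pat else 0.0

def ngram_score(pattern, query, m=4):
    """Eq. 2: exp(sum over n of W_n * log C_n), W_n = 1/(m - n + 1)."""
    total = 0.0
    for n in range(1, m + 1):
        c_n = match_ratio(pattern, query, n)
        if c_n == 0.0:            # log(0) is undefined; score collapses to 0
            return 0.0
        total += math.log(c_n) / (m - n + 1)
    return math.exp(total)

print(ngram_score("what is the unit of".split(),
                  "what is the unit of voltage".split()))   # 1.0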

3.3.1.2 The Jaccard measure (Jaccard 1912) The Jaccard measure is the most popular measure for calculating text similarity based on the vector space model (VSM). The following steps are performed to measure the semantic similarity between a user's query and a query pattern in the query database.

1. The word set (Abdi et al. 2015b). Given two queries U_q and P_q, a "word set" is produced using the distinct words from the pair of sentences. Let WS = {W_1, W_2, ..., W_N} denote the word set, where N is the number of distinct words in it. The word set of two queries is obtained as follows:
   1.1 The two queries are taken as input.
   1.2 Looping over each word W from U_q, the following tasks are undertaken:
       a. Determine the root of the word (denoted RW).
       b. If RW appears in WS, continue the loop with the next word from U_q; otherwise, go to step (c).
       c. If RW does not appear in WS, add RW to WS and then continue the loop with the next word from U_q.
       d. Conduct the same process for P_q.

2. Dice measure (Vani and Gupta 2014). The similarity between two words w_1 and w_2, based on their synonyms, is defined as follows:

$$\mathrm{Sim}_{\mathrm{Dice}}(w_1, w_2) = \begin{cases} \dfrac{2 \cdot |\mathrm{syns}(w_1) \cap \mathrm{syns}(w_2)|}{|\mathrm{syns}(w_1)| + |\mathrm{syns}(w_2)|}, & \text{if } w_1 \neq w_2 \\ 1, & \text{if } w_1 = w_2 \end{cases} \qquad (3)$$

where syns(w) is the set of words (synonyms) annotated to w based on the lexical database WordNet, and |syns(w)| represents the cardinality of the set syns(w).

3. Creating the semantic vector. The semantic vector is created using the word set (see step 1) and the corresponding query. Each cell of the semantic vector corresponds to a word in the word set, so its dimension equals the number of words in the word set.

4. Weighting each cell of the semantic vector. Each cell is weighted using the computed semantic similarity between the words from the word set and the corresponding query. As an example:
   a. If the word w from the word set appears in U_q, the weight of w in the semantic vector is set to 1; otherwise, go to the next step.
   b. If the sentence U_q does not contain w, compute the similarity score between w and the words of U_q using the Dice measure (see step 2).
   c. If similarity values exist, the weight of w in the semantic vector is set to the highest similarity value; otherwise, go to the next step.
   d. If there is no similarity value, the weight of w in the semantic vector is set to 0.

5. The semantic vector is created for each of the two queries, and the semantic similarity measure is computed from the two semantic vectors. The following equation is used to calculate the semantic similarity between sentences:

$$\mathrm{Sim}_{\mathrm{Jaccard}}(U_q, P_q) = \frac{\sum_{j=1}^{m} w_{1j} \cdot w_{2j}}{\sum_{j=1}^{m} w_{1j}^2 + \sum_{j=1}^{m} w_{2j}^2 - \sum_{j=1}^{m} w_{1j} \cdot w_{2j}} \qquad (4)$$

where U_q = (w_{11}, w_{12}, ..., w_{1m}) and P_q = (w_{21}, w_{22}, ..., w_{2m}) are the semantic vectors of the queries U_q (user's query) and P_q (query pattern in the query database), respectively; w_{pj} is the weight of the jth word in vector p, and m is the number of words.

3.3.1.3 Attribute-based inference (ABI) (Zayaraz 2015) Given two questions U_q (user's question) and P_q (query pattern in the query database), let User_query = {UW_1, UW_2, ..., UW_n} represent all ontology attributes, ontology entities and ontology relationships from the user's question, and let Pattern_query = {PW_1, PW_2, ..., PW_n} represent all ontology attributes, ontology entities and ontology relationships from a query pattern in the query database. The ABI procedure positively weights a user's query that includes ontology attributes, entities or relationships equivalent to the ones within the query pattern; two attributes are equivalent when they are expressed in the same manner or using a synonymous word. The final weight is calculated using Eq. (5).

$$\mathrm{ABI}_{\mathrm{weight}} = \frac{\sum_{a_i \in U_q,\, b_j \in P_q} \mathrm{Eql}(a_i, b_j)}{|U_q|} \qquad (5)$$

where U_q and P_q include the attributes, entities and relationships that appear in the user's query and each query pattern, respectively. The following formula is used to calculate the value of Eql(a_i, b_j):

$$\mathrm{Eql}(a_i, b_j) = \begin{cases} 1, & \text{if } a_i = b_j \text{ or } a_i \text{ is a paraphrase of } b_j \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$
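The semantic pipeline of Eqs. (3)-(6) can be sketched compactly in Python; WordNet synonym sets are stubbed in as plain dictionaries, and all names are illustrative rather than the authors' code. The final ISM coefficient, stated as Eq. (7) just below, simply averages the three evidence scores.

# Compact sketch of Eqs. (3)-(7) under the assumptions above.
def sim_dice(w1, w2, syns):
    """Eq. 3: synonym-overlap similarity between two words."""
    if w1 == w2:
        return 1.0
    s1, s2 = syns.get(w1, {w1}), syns.get(w2, {w2})
    return 2.0 * len(s1 & s2) / (len(s1) + len(s2))

def semantic_vector(query_words, word_set, syns):
    """Steps 3-4: weight 1 if the word occurs, else its best Dice score."""
    return [1.0 if w in query_words
            else max((sim_dice(w, q, syns) for q in query_words), default=0.0)
            for w in word_set]

def sim_jaccard(u, p):
    """Eq. 4: Jaccard similarity of the two semantic vectors."""
    dot = sum(a * b for a, b in zip(u, p))
    den = sum(a * a for a in u) + sum(b * b for b in p) - dot
    return dot / den if den else 0.0

def abi_weight(user_items, pattern_items, paraphrases=frozenset()):
    """Eq. 5: share of the user's ontology items matched in the pattern;
    a match (Eq. 6) is equality or a known paraphrase pair."""
    hits = sum(1 for a in user_items
               if a in pattern_items
               or any((a, b) in paraphrases for b in pattern_items))
    return hits / len(user_items) if user_items else 0.0

def ism_coefficient(abi, jaccard, ngram):
    """Eq. 7 (below): simple average of the three evidence scores."""
    return (abi + jaccard + ngram) / 3.0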

The final ISM coefficient is computed as the sum of all weights divided by the number of inference measures. We empirically optimized a threshold using the user query patterns from the query database: if the final ISM coefficient is higher than the threshold (λ), the answer is constructed using the corresponding query pattern.

$$\mathrm{ISM}_{\mathrm{coefficient}} = \frac{\mathrm{ABI}_{\mathrm{weight}} + \mathrm{Sim}_{\mathrm{Jaccard}}(U_q, P_q) + N\text{-gram}(m, n)}{3} \qquad (7)$$

3.3.1.4 Query simplification techniques The following steps present an example of a query handled by the QAPD system.

1. The QAPD system takes the user's question expressed in natural language as input:
   "What is the unit of voltage?"
2. The question processing component processes the natural language (NL) question, applying functions such as stemming, POS tagging, annotation, tokenization and stop word removal. The NL question is labeled with additional information and ontology concepts. The output of the question processing component is as follows:
   "What[wp, wh-element] is[vbz, Aux.v|Be] unit[noun, relation] voltage[noun, Electric-element(concept)]?"
3. The general form of the query pattern and the SQL statement returned by the ISM approach:
   "What + [Be/Have] + [Relation] + [Concept/Attribute]",
   "SELECT ?attrib From table Where RELATION = [?relation] AND Elec-elem = [?Electric-element(concept)]".
4. Incorporating the original NL query entities into the SQL statement:
   "SELECT ?attrib From table Where RELATION = "unit" AND Elec-elem = "voltage"".

3.4 Ontology construction component

The knowledge base of the proposed system is domain specific. Storing the ontology is necessary to retrieve relevant and correct answers to users' questions.

3.4.1 EAEONT ontology

Significant interoperability between physics-based models needs a common, unambiguous understanding and a standardized description of the physical concepts and physical laws governing physical objects. A structure, often called an ontology, can be used for a standardized description of the physical concepts and the relationships between them. In other words, an ontology is a formal explanation of the concepts (often called classes) in a domain, the relationships between the concepts, the properties (often called slots) of each concept explaining its different features and attributes, and restrictions (often called facets) on slots.

An ontology together with a set of individual instances of its concepts makes a knowledge base. It is created to share a common understanding of domain knowledge, which is the main goal in developing ontologies, and to enable reuse of domain knowledge. In general, a knowledge base is a centralized repository for information: a public library, a database of related information about a particular subject (Tomiyama 1994). An ontology is often used by people, databases and applications.

We propose the physical concept ontology, EAEONT, to build a knowledge base as the kernel of the QAPD system. We organize knowledge in the form of domain theories (such as Electricity and Electromagnetism). Our domain knowledge structure is implemented in terms of an ontology represented in a database. The data sources we use were originally provided by a textbook (NEXUS BESTARI SPM 4.5, http://www.mphonline.com/books/nsearchdetails.aspx?&pcode=9789835961168), lecture notes and course materials mostly available online (Cullity and Graham 2011). Most of these data sources are presented in unstructured form (as simple text), and our goal is to extract structure from their text; we have done substantial theoretical research on how this might be done.

The ontology created in our system aims to provide a conceptualized explanation of the physics domain, mainly covering electricity and electromagnetism. From our system's perspective, the instances of an entity can exist independently, whereas the instances of relationships, attributes and physical laws have to be attached to instances of an entity. In the following, the physical concept ontology and the methodology for integrating domain theories using the ontology are described. To create the EAEONT ontology we follow the six steps discussed in Noy and McGuinness (2001): step 1, determine the domain and scope of the ontology; step 2, consider reusing existing ontologies; step 3, enumerate important terms in the ontology; step 4, define the classes and subclasses; step 5, define the properties of classes (slots); step 6, create instances.

• Entity

An entity is a thing with a distinct and independent existence, such as "Resistance", "Voltage" or "Current"; in other words, an entity indicates an atomic physical object. In order to explain the relationships among entities, we define an entity with its "Name" and "Domain", where "Name" denotes the name of the entity and "Domain" indicates the domain theory ("Electricity" or "Electromagnetism") that the entity belongs to. A formal definition of an entity is as follows.

Name := <entity_Name>
    <entity_Name> := a symbol or name which is unique in the domain.
Domain := <domain_name>
    <domain_name> := a symbol to indicate the domain of an entity.

An example of an entity is as follows.

Name := 'Resistance'
Domain := 'Electricity'

• Relation

A relation explains the interactions between entities. We define a relation between two entities with its "Name" and "Head", where "Name" denotes the name of the relation and "Head" indicates the name of an entity. In addition, we need to define a relation with references to entities; for this purpose we use the "Has Connections" slot and the "Connection" predicate. The "Connection" predicate includes three parameters: the first is the name of the defined relation, and the other parameters are the reference names of the entities. A formal definition of a relation is as follows.

Name := <relation_Name>
    <relation_Name> := a symbol or name which is unique in the knowledge base.
Head := <head_name>
    <head_name> := a symbol or name to indicate an entity.
Has Connections := <relation_1>, <relation_2>, …, <relation_n>
    <relation> := Connection(relation_Name, entity_Name1, entity_Name2).

An example of a relation is as follows.

Name := 'Ris_vol_cur'
Head := 'Resistance'
Has Connections: Connection1('Ris_vol_cur', 'Resistance', 'current'),
                 Connection2('Ris_vol_cur', 'Resistance', 'voltage')

• Attribute

An attribute is defined as a property, characteristic or additional information of an entity. We define an attribute with its "Name" and "Head", where "Name" denotes the name of the attribute and "Head" indicates the name of an entity. In addition to these two descriptions, we use "Declarations" to describe additional information such as the unit of the entity, its values, and its definitional relations with other entities.

Name := <attribute_Name>
    <attribute_Name> := a symbol or name which is unique in the knowledge base.
Head := <head_name>
    <head_name> := a symbol to indicate an entity.
Declarations := <description_1>, <description_2>, …, <description_n>
    <description_1> := Has Description(…),
    <description_2> := Has Unit(…),
    <description_3> := Has Value(…),
    ….

An example of an attribute is as follows.

Name := 'Ris_Attribute'
Head := 'Resistance'
Declarations :=
    Has Description('the resistance, R, of a conductor is defined as the ratio of the potential difference, V, across the conductor to the current, I, flowing through it.'),
    Has Unit('ohm' or 'Ω'),
    Has Description('1 Ω is the resistance of a conductor when a potential difference of 1 volt applied across it produces a current of 1 A through it.').

• Physical law

A physical law represents a simple relationship among entities and their attributes. We define a physical law with its "Name" and "Head", where "Name" denotes the name of the physical law and "Head" indicates the name of an entity. In addition to these two descriptions, we use "Entities" and "Expression": "Entities" lists the entities related to the defined law, and "Expression" defines the relationship among the entities using a mathematical equation.

Name := <law_Name>
    <law_Name> := a symbol or name which is unique in the knowledge base.
Head := <head_name>
    <head_name> := a symbol to indicate an entity.
Entities := <entity_Name1>, <entity_Name2>, <entity_Name3>, …
Expression := <equation>

An example of a physical law is as follows.

Name := 'Ohm's law'
Head := 'Resistance'
Entities := 'Voltage', 'Current'
Expression := Resistance = Voltage / Current.

Figure 2 displays a sample of the extracted ontology.

3.5 Users' query database component

In this work, a domain ontology was created to provide a common vocabulary for the selected domains, Electricity and Electromagnetism. On the basis of the ontology, a set of query patterns predicted to be asked by users in the domain was collected, along with their corresponding SQL commands. For a new user's question, we use the ISM approach to determine the corresponding query pattern, and then use the SQL command of the selected query pattern to produce a complete query that answers the user's question.
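A small sketch of one such query-database cluster follows: semantically equivalent patterns share a single SQL template whose generic concepts are filled in from the annotated user query. The table and column names follow the Sect. 3.3.1.4 example and are illustrative, not the actual schema.

# Sketch of a query-database cluster and its instantiation.
cluster = {
    "patterns": ["what is the unit of <Concept>",
                 "please tell me the unit of <Concept>"],
    "sql": "SELECT attrib FROM table "
           "WHERE RELATION = '{relation}' AND Elec_elem = '{concept}'",
}

def instantiate(cluster, relation, concept):
    """Replace the generic concepts with the instances found in the query."""
    return cluster["sql"].format(relation=relation, concept=concept)

print(instantiate(cluster, "unit", "voltage"))
# SELECT attrib FROM table WHERE RELATION = 'unit' AND Elec_elem = 'voltage'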

Fig. 2 Sample extracted ontology

There are several types of question. We can distinguish the affirmative/negative query type, i.e., questions that need a yes or no as an answer, such as "is volt the unit of voltage?". The largest set of questions are those introduced by a "wh-word", such as "what, who, when, where, are there any, does anybody/anyone or how many"; imperative queries (show, give, tell, …) are also treated as wh-questions. The process of creating the users' query database contains three main steps:

1. We generate a set of questions according to the ontology domain. For this purpose, 10 human experts were asked to query our representation of the physics domain (Electricity and Electromagnetism). The ontology, together with a list of real entities and attributes extracted from our data source, was shown to them so that they could generate questions for any kind of data they were interested in. The questions include real data instances for the concepts defined in the ontology (e.g., "what is the unit of resistance?", "is ohm the unit of the resistance?" and "what is a transformer?").
2. This step aims to identify and categorize the entity instances, and then replace them by their corresponding ontology concepts. To do this, we identify all named entities in the set of questions. For example, the question "what is the unit of voltage?" is annotated as "what + [Be/Have] + [Relation] + [Concept/Attribute]". Once this process is finished, we delete all duplicated questions.
3. Finally, the set of questions is classified into clusters or categories according to their semantic equivalence. Two questions are semantically identical when they convey the same information, talk about the same idea or involve the same ontological concepts. For example, the questions "what is the unit of voltage?" and "what is the unit of potential difference?" belong to the same semantic category. Each cluster determines how to deal with its elements, what inference process is required and what kind of answer can be expected.

Once the questions are categorized, we associate an SQL statement with each cluster. This statement makes it possible to obtain the answer for any of the questions in the cluster. The SQL statement includes generic concepts (e.g., Electric_element); therefore, before applying the SQL command to the database, these general concepts must be replaced by the original data instances determined in the user's query.

It is worth noting that different questions may be represented by the same cluster, since in natural language there can be various ways of asking for a specific ontology concept. For example, the questions "what is the unit of voltage?" and "what is the unit of potential difference?" request the unit of voltage in two ways ("voltage", "potential difference"). Thus, we identified and stored the various ways of mentioning each ontology concept in a question. This knowledge is useful for the ISM approach in order to detect paraphrases of the concepts that appear in users' questions.

Figure 3 displays the process of creating the query database. As a result, the query database is made up of several clusters; each cluster includes several semantically identical questions and their corresponding SQL statement used to obtain the answer to the user's question. Furthermore, Table 2 presents a set of query pattern samples.

Fig. 3 Query database creation

4 Experiments

QAPD permits a user who has a question in mind and knows something about the domain to ask questions about the domain concepts. The main objective is to provide a system and a friendly environment for interacting with computers in natural language. The system does not require the user to learn any programming language or to know the structure of the knowledge base, but s/he has to have at least some idea of the contents of the domain. Our system, QAPD, was applied to ontology-based question answering. In this section, we carry out experimental studies to evaluate QAPD. We conducted two experiments on sets of users' questions covering the various question types, such as the affirmative/negative query type, wh-questions and imperative queries. In the first experiment, we measured the performance of the system against human judgment; in the second experiment, we compared the performance of the system with existing systems.

Table 2 Query pattern sample

What → [Verb] + [Concept/Attribute] | [Be/Have] + [Relation] + [Concept/Attribute] | [Aux V.] + [Concept/Attribute] + [Verb] | [Be] + [Concept/Attribute]
Yes/No → [Be/Have] + [Relation] + [Concept/Attribute] + [Concept/Attribute] | [Be/Have] + [Concept/Attribute] + [Relation] + [Concept/Attribute] | [Concept/Attribute] + [Verb] + [Concept/Attribute]
Noun = unit, value, …
Verb = calculate, define, compute, invent, …
Be = am, is, are, was, were, been, being
Have = have, has, had, having
Aux V. = am, is, are, have, has, can, was, do, does, did, may, …
Noun + Aux.V = has_unit, …
Aux V. + Verb = is_defined, …
Relation = Noun | Verb | Aux V. | Noun + Aux.V | Aux V. + Verb
Concept = voltage, resistance, current, …
Attribute = ohm, volt, ampere, …

4.1 Dataset

In this section, we describe the data used throughout our experiments. To assess the performance of QAPD, we used a collection of users' queries together with the corresponding query pattern and correct answer for each question. We asked 10 human experts, none of whom had been involved in the query database creation (see Sect. 3.5), to generate questions for the system. Our experiments were performed on 3750 queries, randomly divided into two separate datasets: in the first experiment, 2625 queries (the training dataset) were used for parameter tuning (the threshold λ); in the second experiment, the performance of the proposed method was evaluated on the remaining queries (the test dataset).

4.2 Evaluation metrics

In order to evaluate the performance of QAPD, we used three standard measures: precision (denoted Prec), recall (denoted Rec) and F-measure (denoted F1).

Precision and recall are computed using Eqs. (8) and (9). Precision denotes what portion of the answers identified by the system are correct answers (Manning et al. 2008); recall denotes what portion of the correct answers are identified by the system (Manning et al. 2008). These measures can be interpreted with the help of Table 3: TP ("true positive") counts answers identified as correct that are indeed correct; FP ("false positive") counts answers identified as correct that are in fact incorrect; FN ("false negative") counts correct answers that the system failed to identify; and TN ("true negative") counts incorrect answers that the system correctly rejected.

Table 3 Description of TP, TN, FP, FN

     Correct answer   Detected as correct answer
TP   Yes              Yes
FN   Yes              No
FP   No               Yes
TN   No               No

$$\mathrm{Prec} = \frac{TP}{TP + FP} \qquad (8)$$

$$\mathrm{Rec} = \frac{TP}{TP + FN} \qquad (9)$$

There is an anti-correlation between precision and recall (Manning et al. 2008): recall drops when precision rises, and vice versa. In other words, a system tuned for recall will get lower precision and a system tuned for precision will get lower recall. To take the two metrics into consideration together, a single measure, called the F-score, is used. The F-score is a statistical measure that merges both precision and recall and is calculated as follows:

$$F_1 = \frac{1}{\delta \cdot \frac{1}{P} + (1 - \delta) \cdot \frac{1}{R}} = \frac{(\gamma^2 + 1)\,\mathrm{Prec} \cdot \mathrm{Rec}}{\gamma^2\,\mathrm{Prec} + \mathrm{Rec}} \qquad (10)$$

where γ² = (1 − δ)/δ, δ ∈ [0, 1] and γ ∈ [0, ∞). If a large value (γ > 1) is assigned to γ, precision has more priority; if a small value (γ < 1) is assigned, recall has more priority; if γ = 1, precision and recall are assumed to have equal priority in computing the F-score. The F-score for γ = 1 is computed as follows:

$$F_1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}} \qquad (11)$$

where Prec is precision and Rec is recall.

4.3 Experiment 1: evaluation of the system against human judgment

4.3.1 Parameter setting

QAPD requires one parameter to be determined before use: a threshold (λ) (see Sect. 3.3.1.3) has to be set to identify the corresponding query pattern in the query database. The threshold in the current experiment was found using the training data. We ran our system on the training dataset and evaluated QAPD for each λ between 0.1 and 0.9 with a step of 0.1. Table 4 presents the experimental results obtained with the various λ values, evaluated in terms of precision, recall and F-measure. Analyzing the results, we find that the best performance is achieved with λ = 0.6, which produced scores of 0.8128 (precision), 0.6819 (recall) and 0.7416 (F-measure); the best values in Table 4 are marked in boldface. As a result, on the current dataset we obtain the best result when using 0.6 as the λ value, and we therefore recommend this λ value for use on the testing data.

Table 4 Performance of the QAPD against various thresholds (λ) on the training dataset

Weighting (λ)   Precision   Recall   F-measure
0.1             0.6231      0.5382   0.5775
0.2             0.6314      0.5341   0.5787
0.3             0.6406      0.5761   0.6066
0.4             0.6527      0.5935   0.6217
0.5             0.6869      0.5801   0.6290
0.6             0.8128      0.6819   0.7416
0.7             0.7218      0.6923   0.7067
0.8             0.7434      0.7095   0.7261
0.9             0.7561      0.6355   0.6906
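The parameter search of Sect. 4.3.1 can be sketched as a simple sweep over λ, selecting the value with the best F-measure computed from TP/FP/FN counts as in Eqs. (8), (9) and (11); evaluate() below is a placeholder standing in for a run of QAPD over the training queries at a given threshold.

# Sketch of the threshold search, assuming evaluate(lam) -> (TP, FP, FN).
def prf(tp, fp, fn):
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def tune_threshold(evaluate):
    best_lam, best_f1 = None, -1.0
    for lam in (i / 10 for i in range(1, 10)):   # lambda = 0.1 ... 0.9
        tp, fp, fn = evaluate(lam)
        _, _, f1 = prf(tp, fp, fn)
        if f1 > best_f1:
            best_lam, best_f1 = lam, f1
    return best_lam, best_f1   # Table 4 reports lambda = 0.6 as the best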

4.3.2 Influence of the N-gram measure, Jaccard measure and attribute-based inference on inferring schema mapping (ISM)

To examine the effect of the n-gram measure, the Jaccard measure and attribute-based inference on our system, QAPD, we applied the system to the current dataset using three different tests:

1. Test 1—NJA: calculate ISM_coefficient using the n-gram measure (NG), Jaccard measure (JM) and attribute-based inference (ABI).
2. Test 2—NGA: calculate ISM_coefficient using the n-gram measure (NG) and attribute-based inference (ABI).
3. Test 3—JMA: calculate ISM_coefficient using the Jaccard measure (JM) and attribute-based inference (ABI).

We aim to determine which combination (NJA, NGA or JMA) should be chosen to calculate ISM_coefficient. Table 5 displays the results obtained in terms of recall, precision and F-measure for the different tests. We compared QAPD(NJA) with QAPD(NGA) and QAPD(JMA), setting λ = 0.6 in our experiment. The results of the three ISM_coefficient variants on the test dataset, shown in Table 5, indicate that the combined measure, NJA, outperforms both the NGA and JMA measures. From this table, it can be seen that the performance of QAPD(NJA) is better than that of QAPD(JMA) and QAPD(NGA) in terms of F-measure. Based on these results, we used the combined measure (NJA) to calculate ISM_coefficient between the user's query and the query patterns in the query database.

Table 5 Performance of the QAPD against various tests (NJA, NGA, JMA)

System        Precision   Recall   F-measure
QAPD(NJA)     0.86        0.81     0.83
QAPD(NGA)     0.65        0.57     0.61
QAPD(JMA)     0.57        0.45     0.50

4.3.3 Performance analysis

In this experiment, we aim to evaluate to what extent the QAPD system satisfies user expectations about the range of questions the system should be able to answer. To confirm the aforementioned results (Table 4), we validated our system by measuring the performance of QAPD in answering users' questions on the unused dataset, the testing data. We applied QAPD to the testing dataset with the λ value fixed at 0.6. The evaluation metric values are presented in Table 6.

Table 6 Performance of the QAPD with the threshold λ = 0.6 on the testing dataset

QAPD evaluation
Precision: 0.87
Recall: 0.67
F-measure: 0.75
Correct answers (successful queries): 992 of 1125 (88 %)
Answer failures (the system returns a wrong answer): 97 of 1125 (9 %)
Pattern failures (query pattern not defined in the query database): 36 of 1125 (3 %)

According to the results presented in Table 6, QAPD obtained 87 % precision, 67 % recall and 75 % F-measure. Of the 1125 questions in total, 992 were handled correctly by the system, i.e., approximately 88 % of the total; this is a good result. The analysis shows that 97 of the 1125 questions, 9 % of the total, presented an answer failure (the system returned a wrong answer). Furthermore, 3 % of the total questions (36 of the 1125) could not be answered because the system presented a pattern failure (there was no corresponding pattern in the query database for the user's question).

4.4 Experiment 2: comparison with related systems

In this section, the performance of our system is compared with other well-known or recently proposed methods. In particular, to evaluate our method on the dataset, we selected the following systems: CRO-QA (Zayaraz 2015), OVQAF (Besbes et al. 2015), AQUEOS (Toti 2014), ACQA (Hu et al. 2012) and IQAO (Xu and Li 2010). These systems use different data sources in their experiments, which makes a direct comparison between their published evaluation results impossible; in addition, they used different evaluation measures. Therefore, we re-examined the mentioned approaches on the same dataset. The evaluation metric values are reported in Table 7 and Fig. 4.

Table 7 Performance comparison between QAPD and other systems

System    Precision   Recall   F-score
QAPD      0.8510      0.6400   0.7306
OVQAF     0.7623      0.6310   0.6905
AQUEOS    0.7218      0.6385   0.6776
CRO-QA    0.6927      0.5525   0.6147
ACQA      0.6116      0.5553   0.5821
IQAO      0.5700      0.4800   0.5211

4.4.1 Detailed comparison

Compared with the precision and F-measure values of the other systems, our system achieves a significant improvement. Table 8 shows the improvement of QAPD for the metrics; it is clear that QAPD obtains the highest F-measure values and outperforms all the other systems. We use the relative improvement, ((Our system − Other system)/(Other system)) × 100, for the comparison; in Table 8, "+" means that QAPD improves on the related system. Table 8 shows that, among the existing systems, OVQAF gives the best results compared to AQUEOS, CRO-QA, ACQA and IQAO.
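The relative-improvement formula just stated is a one-liner; the following sketch checks it against the Table 7 precision values of QAPD (0.8510) and OVQAF (0.7623), reproducing the first entry of Table 8.

# Relative improvement of Sect. 4.4.1.
def relative_improvement(ours, other):
    return (ours - other) / other * 100.0

print(round(relative_improvement(0.8510, 0.7623), 4))   # ≈ +11.6358 (%)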

Fig. 4 Performance QAPD OVQAF AQUEOS CRO-QA ACQA IQAO


comparison between QAPD and 0.90
other systems 0.80

Evaluation metrics values


0.70
0.60
0.50
0.40
0.30
0.20
0.10
0.00
Precision Recall F_measure
Evaluation metrics

Table 8 Performance
Metrics QAPD improvement (%)
evaluation compared between
the QAPD and other systems OVQAF AQUEOS CRO-QA ACQA IQAO

Precision +11.6358 +17.8997 +22.8579 +39.1455 +49.2982


F_measure +5.8087 +7.8174 +18.8543 +25.5099 +40.1861

Table 9 Median values and standard deviation of systems on dataset

System    Precision             F-measure
          Median    SD          Median    SD
QAPD      0.7366    4.61E-02    0.6725    5.22E-02
OVQAF     0.6858    2.30E-01    0.6229    8.25E-02
AQUEOS    0.6758    1.03E-01    0.6003    1.01E-01
CRO-QA    0.5998    1.03E-01    0.5233    1.02E-01
ACQA      0.5748    1.10E-01    0.4978    1.06E-01
IQAO      0.5648    1.03E-01    0.4876    1.02E-01
4.4.2 Statistical significance test

In order to statistically compare the performance of QAPD with the other systems, we use a nonparametric statistical significance test, Wilcoxon's matched-pairs signed-rank test, to determine the significance of our results. The statistical significance test has been conducted at the 5 % significance level. Six groups, corresponding to the six systems (1. OVQAF, 2. AQUEOS, 3. CRO-QA, 4. ACQA, 5. IQAO, 6. QAPD), have been created for the dataset. Two groups are compared at a time, one corresponding to the QAPD system and the other corresponding to some other system considered in this paper. Each group consists of the precision and F-measure values for the dataset produced by the corresponding system.

The median values and standard deviation (SD) of precision and F-measure of each system for the dataset are presented in Table 9. As is evident from Table 9, the median values of precision and F-measure for the QAPD system on the dataset are better than those for the other systems. To establish that this goodness is statistically significant, Table 10 reports the P values produced by Wilcoxon's matched-pairs signed-rank test for the comparison of two groups (one group corresponding to QAPD and another group corresponding to some other system) at a time. As the null hypothesis, it is assumed that there are no significant differences between the median values of the two groups, whereas the alternative hypothesis is that there is a significant difference in the median values of the two groups. It is clear from Table 10 that the P values are much less than 0.05 (5 % significance level). For example, Wilcoxon's matched-pairs signed-rank test between the systems QAPD and OVQAF provides a P value of 0.036 (precision), which is very small. This is strong evidence against the null hypothesis, indicating that the better median values of the performance metrics produced by QAPD are statistically significant and have not occurred by chance. Similar results are obtained for all other systems compared to the QAPD system, establishing the significant superiority of the proposed system. From the statistical results, we observe that our QAPD system significantly outperforms the other systems.

A visual comparison of statistical significance is provided in Fig. 5. This figure shows the median values of precision and F-measure obtained by each system on the dataset. It can be observed that the precision and F-measure values of QAPD are noticeably better than those of the other systems.
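The test itself is standard; the sketch below shows how such a paired comparison can be run with SciPy. The per-group score vectors are hypothetical stand-ins, since the raw values behind Tables 9 and 10 are not reproduced here.

import numpy as np
from scipy.stats import wilcoxon

qapd  = np.array([0.68, 0.71, 0.66, 0.69, 0.72, 0.65, 0.70, 0.67])  # hypothetical
other = np.array([0.63, 0.65, 0.60, 0.64, 0.66, 0.58, 0.62, 0.61])  # hypothetical

print("medians:", np.median(qapd), np.median(other))   # cf. Table 9
print("SDs:", qapd.std(ddof=1), other.std(ddof=1))     # cf. Table 9

stat, p = wilcoxon(qapd, other)   # matched-pairs signed-rank test
print("P value:", round(p, 4))    # p < 0.05 rejects the null hypothesis (cf. Table 10)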


Table 10 P values produced by Wilcoxon's matched-pairs signed rank test by comparing QAPD with other systems

Dataset                                                             OVQAF   AQUEOS   CRO-QA   ACQA    IQAO
Comparing medians of Precision metric of QAPD with other systems    0.036   0.018    0.005    0.005   0.005
Comparing medians of F-measure metric of QAPD with other systems    0.028   0.009    0.005    0.005   0.005

Fig. 5 Median values of different systems on dataset: bar chart of the precision and F-measure medians (scale 0.00-0.80) for QAPD, OVQAF, AQUEOS, CRO-QA, ACQA and IQAO

In addition, according to the statistical significance test, QAPD is more stable than the other systems. For further evidence, we direct the reader's attention to the values of the standard deviation (SD) in Table 9.

4.5 Discussion

According to the results presented in Table 6, the system answered 992 questions (i.e., a precision rate of 87 %). The system is able to capture the semantics of the user's question and employ it in the answer retrieval. QAPD is very easy to follow and requires minimal text processing cost. Furthermore, the system QAPD can be applied to other domains and languages; we only need an ontology model and a query database for a new system to work on a different domain. The system also fails to answer the rest of the queries (e.g., 97 Answer Failure and 36 Pattern Failure). In this section, we explain the common errors we encountered during our experiment. We distinguish these failures as follows.

1. The concept, attribute or relation does not exist in the ontology. In some cases, the noun/term in the NL query is not in the ontology and ISM cannot find the noun/term, or even a similarity with any of the terms in the ontology; therefore, a failure occurs. For example, the question ("what is the unit of e.m.f?") asks for a concept, "e.m.f", that is not in the ontology. To tackle this problem, we can add the concept, attribute or relation to the ontology.
2. A user's query without any corresponding query pattern. The query asks for a concept/attribute; although the concept/attribute is in our ontology, there is no corresponding query pattern in the query database to answer the user's query. For example, the question ("1 KWH is equal to how many Joule?") asks for two attributes ("KWH" and "Joule"); although the two attributes and the relationship between them can be found in our ontology, without a corresponding query pattern the system is not able to answer the user's question. To solve this problem, we can add a new query pattern to the user query database.
3. In some cases, the user's query belongs to one of the clusters in the query database, but the system returns a wrong query pattern. On the other hand, in some cases a set of query patterns is obtained; the reason is that some query patterns are very similar and it is difficult to distinguish between them using the current ISM approach.
4. A query can be a combination of two relationships or two user questions, for instance, "What is transformer and how many types are there?". In this case, the ISM cannot map the user's query onto a single query pattern. To overcome this drawback, the user's question can be divided into several questions, as sketched below.
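A rough sketch of that last fix: split a compound question on coordinating conjunctions before pattern matching. The heuristic below is an assumption for illustration, not the paper's implementation.

import re

def split_compound_question(question):
    # Split on " and " only when the right-hand side opens a new question
    # (a wh-word or "how"), as in the transformer example above.
    parts = re.split(r"\s+and\s+(?=(?:what|how|who|where|when|why|which)\b)",
                     question, flags=re.IGNORECASE)
    return [p.strip(" ?") + "?" for p in parts if p.strip()]

print(split_compound_question("What is transformer and how many types are there?"))
# -> ['What is transformer?', 'how many types are there?']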
5 Conclusion and future work


In this paper, we present QAPD, an ontology-based QA system developed for answering questions related to the physics domain ("Electricity" and "Electromagnetism"). The key component of ontology-based QA systems is the ability to capture the semantics of the user's question and employ it in the answer retrieval. The ISM approach is used to answer any new user question using a corresponding query pattern in the query database. This approach uses three similarity metrics to calculate the ISM_coefficient: n-gram (NG), Jaccard measure (JM) and attribute-based inference (ABI). We analyzed the influence of the three similarity metrics on our system, QAPD, and, based on the results shown in Table 5, selected the best combination of them. Furthermore, in this project we created a users' query database. For this purpose, we collected a set of users' questions and then grouped them manually based on several features. Finally, we assigned to each user question a corresponding query pattern and SQL statement for answer retrieval. The system QAPD can be applied to other domains and languages; we only need an ontology model and a query database for a new system to work on a different domain.
The evaluation of QAPD was conducted over 3750 queries that comprise a wide variety of questions. QAPD is very easy to follow and requires minimal text processing cost. Initially, a parameter of QAPD was optimized over 2625 questions (the training dataset). Later, we used the remaining questions (the test dataset) to assess the performance of QAPD using recall, precision and F-measure. The results show that QAPD is able to obtain 87 % precision, 67 % recall and 76 % F-measure.
This paper presents the following suggestions for future work. Our method used WordNet as the main semantic knowledge base for the calculation of semantic similarity between words. The comprehensiveness of WordNet is determined by the proportion of words in the text that are covered by its knowledge base. However, the main criticism of WordNet concerns its limited word coverage for calculating semantic similarity between words. Obviously, this disadvantage has a negative effect on the performance of our proposed method. One solution is that, in addition to WordNet, other knowledge resources, such as Wikipedia and other large corpora, should be used. Furthermore, the system QAPD cannot cover all kinds of users' questions. In particular, it cannot provide an answer to a complex question where a user's query combines two or more questions, or asks about two or more concepts or relationships. On the other hand, the experiment revealed that there are some common failures that we encountered during our evaluation. We would like to improve and complete the ISM approach and also produce more complex query patterns in the future.
ISM approach and also produce more complex query pattern Küçük D, Salor Ö, İnan T, Çadırcı I, Ermiş M (2010) PQONT: a domain
in future. ontology for electrical power quality. Adv Eng Inf 24:84–95
Lee S-M, Ryu P, Choi K-S (2007) Ontology-based question answering
Acknowledgements This work is supported by the GPLAODQA system. In: The 6th international semantic web conference
Project (P.No: RACE CR009- 2014). Li F, Jagadish HV (2014) NaLIR: an interactive natural language inter-
face for querying relational databases. In: Proceedings of the 2014

123
230 A. Abdi et al.

ACM SIGMOD international conference on management of data. Toti D (2014) AQUEOS: a system for question answering over seman-
ACM, pp 709–712 tic data. In: Intelligent networking and collaborative systems
Lin R (2004) A package for automatic evaluation of summaries. In: Text (INCoS), 2014 international conference on. IEEE, pp 716–719
summarization branches out: proceedings of the ACL-04 work- Tsuruoka Y, Tsujii Ji (2005) Bidirectional inference with the easiest-first
shop, pp 74–81 strategy for tagging sequence data. In: Proceedings of the confer-
Lloret E (2012) Text summarisation based on human language tech- ence on human language technology and empirical methods in
nologies and its applications. Proces Leng Nat 48:119–122 natural language processing. Association for Computational Lin-
Lopez V, Uren V, Motta E, Pasin M (2007) AquaLog: an ontology-driven guistics, pp 467–474
question answering system for organizational semantic intranets van Rijsbergen CJ (1986) (invited paper) A new theoretical framework
Web Semantics: science. Serv Agents World Wide Web 5:72–105 for information retrieval. In: Proceedings of the 9th annual inter-
Lu W, Cheng J, Yang Q (2012) Question answering system based on national ACM SIGIR conference on research and development in
web. In: Proceedings of the 2012 fifth international conference on information retrieval. ACM, pp 194–200
intelligent computation technology and automation. IEEE Com- Vani K, Gupta D Using K-means cluster based techniques in external
puter Society, pp 573–576 plagiarism detection. In: Contemporary computing and informat-
Manning CD, Raghavan P, Schütze H (2008) Introduction to informa- ics (IC3I), 2014 international conference on, 2014. IEEE, pp
tion retrieval, vol 1. Cambridge University Press, Cambridge 1268–1273
Miller GA, Charles WG (1991) Contextual correlates of semantic sim- Vargas-Vera M, Motta E, Domingue J (2003) An ontology-driven
ilarity. Lang Cogn Process 6:1–28 question answering system (AQUA), new directions in question
Noy NF, McGuinness DL (2001) Ontology development 101: a guide to answering. MIT Press, Cambridge
creating your first ontology. Stanford knowledge systems labora- Varile GB, Zampolli A (1997) Survey of the state of the art in human
tory technical report KSL-01-05 and Stanford medical informatics language technology, vol 13. Cambridge University Press, Cam-
technical report SMI-2001-0880 bridge
Paul M, Jamal S (2015) An improved SRL based plagiarism detection Xie X, Song W, Liu L, Du C, Wang H (2015) Research and implemen-
technique using sentence. Rank Proc Comput Sci 46:223–230 tation of automatic question answering system based on ontology.
Pavlić M, Han ZD, Jakupović A (2015) Question answering with a In: Control and decision conference (CCDC), 2015 27th Chinese.
conceptual framework for knowledge-based system development IEEE, pp 1366–1370
“Node of Knowledge”. Expert Syst Appl 42:5264–5286 Xu J, Li Y (2010) Design and implementation of intelligent question
Peral J, Ferrández A, De Gregorio E, Trujillo J, Maté A, Ferrández LJ answering system based on ontology. In: 2010 Second inter-
(2014) Enrichment of the phenotypic and genotypic data ware- national conference on computational intelligence and natural
house analysis using question answering systems to facilitate the computing, pp 213–216
decision making process in cereal breeding programs. Ecol Inform Zayaraz G (2015) Concept relation extraction using Naïve Bayes clas-
26:203–216 sifier for ontology-based question answering systems. J King Saud
Raj P (2013) Architecture of an ontology-based domain-specific Univ Comput Inf Sci 27:13–24
natural language question answering system. arXiv preprint
arXiv:1311.3175
Tomiyama T (1994) From general design theory to knowledge-intensive
engineering. Artif Intell Eng Des Anal Manuf 8:319–333
