Sjanta is a natural language question answering system developed by Md. Arafat Rahman to address the need for systems that can answer questions in everyday language with precise answers and context. Sjanta takes natural language questions as input, analyzes them to determine the answer type, retrieves relevant documents, extracts candidate answers from the documents, and generates the final answer. It consists of four main modules: question analysis, document retrieval, extracting answer candidates, and answer generation. The system is being implemented using keyword-based information retrieval tools and techniques in Java.
As users struggle to navigate the wealth of online information now available, the need for automated question answering systems becomes more urgent. We need systems that allow a user to ask a question in everyday language and receive an answer quickly and succinctly, with sufficient context to validate the answer. Current search engines can return ranked lists of documents, but they do not deliver answers to the user. A question answering system addresses this problem: it tries to find the exact and precise answer to a natural language question. Sjanta is a question answering system that addresses this problem.
Introduction
There is a large amount of textual data on a variety of digital media such as digital archives, the Web and the hard drives of our personal computers. Efficiently locating information on these media has become one of the most important challenges of the last decade.
Search engines are used to locate the documents that are related to a user's information need. Natural language questions are the best way of expressing this need, but search engines cannot use such questions directly. A natural language question is first transformed into a query, which is a set of keywords that describe the user's information need. After the query is entered into a search engine, the search engine retrieves a set of documents ranked according to their relevance to the query. To find the desired information, the user then has to read through the returned document set.
However, in many situations a user wants a particular piece of information rather than a document set. Question Answering (QA), which is a kind of Information Retrieval, addresses this problem. The benefit of Question Answering systems is two-fold:
1) They take natural language questions rather than queries.
2) They return explicit answers rather than sets of documents.
Question Answering is the task of returning a particular piece of information in response to a natural language question. The aim of a question answering system is to present the needed information directly, instead of documents containing potentially relevant information.
Motivation
Inspiration for this research project came from the fact that much research has been put into QA over the last decade, along with a trend towards open advancement of question answering. Being both an interesting interdisciplinary research area and one with practical applications, question answering has gained public attention in recent years. The best-known example of a QA system is probably IBM Watson, which won a televised Jeopardy! competition. Other well-known examples are Apple's Siri and Google Now. However, no existing QA system addresses the problem fully. In particular, context-aware question answering, i.e., answering a question with respect to previous context, is not addressed properly. Building an efficient QA system therefore remains a challenging research problem. We would like to investigate it and propose an open-domain question answering system that takes advantage of Web data to answer both factoid and non-factoid questions.
Proposed System: Sjanta
We would like to build a system, named Sjanta, that takes natural language (or natural language-like) questions and answers them accordingly.
System Description
Like a typical QA system, Sjanta consists of several modules in its framework. Sjanta has four modules:
1. Question Analysis
2. Document Retrieval
3. Extracting Answer Candidates
4. Answer Generation
Figure: Different modules of Sjanta QAS. A question (for example, "Who is the president of Bangladesh?") passes through the four modules in order, and an answer (for example, "Abdul Hamid") is returned.
Each module is briefly described below using an example natural language question: Who is the president of Bangladesh?
A. Question Analysis:
This module extracts the focus of the question and analyzes what is being asked, i.e., the answer type of the question. A question may be categorized into one of two general types:
1. Factoid questions: The answer to such a question is fixed and consists of a single phrase. For example, the example question (Who is the president of Bangladesh?) is a factoid question and requires a named entity as its answer.
2. Non-factoid questions: These are descriptive questions whose answers may vary in both content and length. For example, "Describe the BFS graph traversal technique." is a non-factoid question.
The question analysis module also extracts keywords from the question for further processing. For example, the words president and Bangladesh will be extracted as keywords from the example question.
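The question analysis step described above can be sketched as follows. This is only a minimal illustration and not Sjanta's actual implementation: the class name QuestionAnalyzer, the stop-word list and the set of cue words that mark a question as descriptive are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of question analysis; not Sjanta's actual code. */
final class QuestionAnalyzer {

    enum QuestionType { FACTOID, NON_FACTOID }

    // Small illustrative stop-word list (assumption; a real system would use a fuller one).
    private static final Set<String> STOP_WORDS = Set.of(
            "who", "what", "when", "where", "which", "why", "how", "is", "are",
            "was", "were", "the", "a", "an", "of", "in", "on", "to", "do", "does");

    // Leading words assumed to signal a descriptive (non-factoid) question.
    private static final Set<String> DESCRIPTIVE_CUES = Set.of("describe", "explain", "why");

    /** Lower-cases the question and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String question) {
        List<String> tokens = new ArrayList<>();
        for (String token : question.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Classifies by the first word only: a deliberately crude heuristic. */
    static QuestionType classify(String question) {
        List<String> tokens = tokenize(question);
        if (!tokens.isEmpty() && DESCRIPTIVE_CUES.contains(tokens.get(0))) {
            return QuestionType.NON_FACTOID;
        }
        return QuestionType.FACTOID;
    }

    /** Keeps every token that is not a stop word, in question order. */
    static List<String> extractKeywords(String question) {
        List<String> keywords = new ArrayList<>();
        for (String token : tokenize(question)) {
            if (!STOP_WORDS.contains(token)) {
                keywords.add(token);
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        String question = "Who is the president of Bangladesh?";
        System.out.println(classify(question) + " " + extractKeywords(question));
    }
}
```

For the example question this sketch yields the type FACTOID and the keywords president and bangladesh. Classifying on the first word alone is a simplification; a parser-based analysis would be needed to identify the focus and answer type reliably.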
B. Document Retrieval:
The document retrieval module searches for related documents (or passages) using the extracted focus of the question and its keywords. For example, Wikipedia pages that contain information about the president of Bangladesh may be retrieved in this phase.
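As a rough illustration of keyword-based retrieval, the sketch below ranks an in-memory list of documents by how many of the question keywords each one contains. It is a stand-in written with the Java standard library only and does not use any retrieval library; the class name KeywordRetriever, the scoring rule and the toy corpus are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of keyword-overlap retrieval; not Sjanta's actual code. */
final class KeywordRetriever {

    /** Counts how many distinct keywords occur as whole words in the document. */
    static int score(String document, List<String> keywords) {
        Set<String> words = new HashSet<>(
                Arrays.asList(document.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        int hits = 0;
        for (String keyword : keywords) {
            if (words.contains(keyword)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns up to {@code limit} documents with at least one keyword, best match first. */
    static List<String> retrieve(List<String> documents, List<String> keywords, int limit) {
        List<String> matches = new ArrayList<>();
        for (String document : documents) {
            if (score(document, keywords) > 0) {
                matches.add(document);
            }
        }
        // Stable sort: documents with equal scores keep their original order.
        matches.sort(Comparator.comparingInt((String d) -> score(d, keywords)).reversed());
        return new ArrayList<>(matches.subList(0, Math.min(limit, matches.size())));
    }

    public static void main(String[] args) {
        // Toy corpus built around the example question in the text.
        List<String> corpus = List.of(
                "Dhaka is the capital of Bangladesh.",
                "Breadth-first search visits a graph level by level.",
                "Abdul Hamid is the president of Bangladesh.");
        System.out.println(retrieve(corpus, List.of("president", "bangladesh"), 2));
    }
}
```

A full-text index would replace the linear scan in a real system; the sketch only shows the idea that documents sharing more keywords with the question are ranked higher.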
C. Extracting Answer Candidates (EAC):
The EAC module searches the retrieved documents for sentences or phrases that may answer the question. It also ranks the candidate answers.
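One simple way to picture candidate extraction for a "who" question is sketched below. It treats runs of capitalized words as candidate names, which is a naive substitute for real named-entity recognition, and scores each candidate by the number of question keywords found in its sentence. The class name CandidateExtractor, the regular expression and the scoring rule are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of answer-candidate extraction; not Sjanta's actual code. */
final class CandidateExtractor {

    // Naive stand-in for named-entity recognition: one or more capitalized words in a row.
    private static final Pattern CAPITALIZED_RUN =
            Pattern.compile("\\b[A-Z][a-z]+(?:\\s+[A-Z][a-z]+)*");

    /** Returns candidate phrases ordered by descending score. */
    static List<String> rankCandidates(List<String> sentences, List<String> keywords) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (String sentence : sentences) {
            // Sentence score: number of keywords it contains (substring match).
            String lower = sentence.toLowerCase(Locale.ROOT);
            int hits = 0;
            for (String keyword : keywords) {
                if (lower.contains(keyword)) {
                    hits++;
                }
            }
            Matcher matcher = CAPITALIZED_RUN.matcher(sentence);
            while (matcher.find()) {
                String candidate = matcher.group();
                // A phrase that is itself a question keyword cannot be the answer.
                if (keywords.contains(candidate.toLowerCase(Locale.ROOT))) {
                    continue;
                }
                scores.merge(candidate, hits, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of(
                "Dhaka is the capital of Bangladesh.",
                "Abdul Hamid is the president of Bangladesh.");
        System.out.println(rankCandidates(sentences, List.of("president", "bangladesh")));
    }
}
```

With the two sample sentences, Abdul Hamid is ranked above Dhaka because its sentence contains both keywords. The capitalization heuristic would also pick up ordinary sentence-initial words, so it should be read as a placeholder for a proper entity tagger.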
D. Answer Generation:
The answer generation module generates the final answer based on the ranking of the answer candidates. For example, the phrase Abdul Hamid will be generated as the answer to the question Who is the president of Bangladesh?
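A minimal sketch of this last step is given below, assuming each candidate carries a score and the sentence it was found in. The highest-scoring candidate is returned together with its source sentence, so that the user has some context to validate the answer. The names AnswerGenerator and Candidate and the output format are hypothetical.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of answer generation; not Sjanta's actual code. */
final class AnswerGenerator {

    /** A candidate phrase with its score and the sentence it was extracted from. */
    static final class Candidate {
        final String phrase;
        final int score;
        final String sourceSentence;

        Candidate(String phrase, int score, String sourceSentence) {
            this.phrase = phrase;
            this.score = score;
            this.sourceSentence = sourceSentence;
        }
    }

    /** Picks the highest-scoring candidate; the first one wins on ties. */
    static Optional<Candidate> best(List<Candidate> candidates) {
        Candidate top = null;
        for (Candidate candidate : candidates) {
            if (top == null || candidate.score > top.score) {
                top = candidate;
            }
        }
        return Optional.ofNullable(top);
    }

    /** Formats the best candidate with its supporting sentence, or a fallback message. */
    static String generate(List<Candidate> candidates) {
        return best(candidates)
                .map(c -> c.phrase + " (source: \"" + c.sourceSentence + "\")")
                .orElse("No answer found");
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("Dhaka", 1, "Dhaka is the capital of Bangladesh."),
                new Candidate("Abdul Hamid", 2, "Abdul Hamid is the president of Bangladesh."));
        System.out.println(generate(candidates));
    }
}
```

Returning the supporting sentence alongside the phrase is a design choice of this sketch; for non-factoid questions the generated answer would more likely be one or more whole passages.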
Implementation plan
There are many approaches to building a QA system; the machine learning approach, the complex pattern matching approach and the keyword-based approach are some well-known ones. Since the keyword-based approach is the most intuitive, we would like to investigate it here.
We will use open source Information Retrieval tools such as the Stanford NLP parser and Lucene, together with tools of our own, to perform the tasks of the different modules. Java will be the primary language for building the Sjanta QA system.
Conclusion
Since the amount of data on the Web is growing very fast, it is becoming difficult for users to locate the information they need within a short time. It has therefore become necessary to build an autonomous system that makes finding information easier. A question answering system is such a system: it makes the job of finding precise and exact information easier.