GoWeb - A Semantic Search Engine For The Life Science Web
1 Background
With the tremendous growth of the World Wide Web, search engines became key
tools to find documents. Search engines retrieve documents for a user’s keywords
from a large index and rank them by various criteria. While such keyword-based
search is fast and powerful for retrieving individual documents, it is far from the vision
of answering a user's questions by "understanding" the user's query and the answers
in the documents, as put forward already in the early 1960s [1].
Consider e.g. a biomedical researcher, who might ask questions such as the
following: Which model organisms are used to study the Fgf8 protein? Which
processes are osteoclasts involved in? What are common histone modifications?
Which diseases are associated with wnt signaling? Which functions does Rag
C have? Which disease can be linked to fever, anterior mediastinal mass, and
central necrosis? What is the role of PrnP in mad cow disease?
The Web holds answers to these questions, but classical keyword-based search
is not suitable to answer them, since the keywords are required to appear literally
in text. However, documents do contain statements such as "wnt signalling
is linked to cancer" or "we studied fgf8 expression in zebrafish development". If
there is background knowledge that cancer is a disease and that zebrafish is a
model organism, then the above questions can be answered.
The use of such knowledge is at the heart of the semantic web, which promotes
the use of formal statements and reasoning to deliver advanced services
not yet available on the Web [2]. To facilitate machine-readability and knowledge
processing, a set of standards, query languages, and the semantic stack
was proposed by the W3C. The stack comprises at its base unique identifiers
and XML as a common markup language. On top of XML, it defines the
Resource Description Framework (RDF) to capture subject-predicate-object triples.
Furthermore, there are the modelling language RDFS and the query language
SPARQL. The basic class definitions and triples of RDF are extended at the
next level by the Web Ontology Language (OWL), which provides description logics
as a modelling language, and by a rule layer [3].
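To make these layers concrete, the following minimal sketch (in Python, using the rdflib library) encodes the background knowledge that cancer is a disease as an RDF triple and answers one of the introductory questions with SPARQL; the example.org URIs and the linkedTo predicate are invented for illustration.

from rdflib import Graph, Namespace, RDFS

EX = Namespace("http://example.org/bio#")

g = Graph()
# Background knowledge: cancer is a disease.
g.add((EX.Cancer, RDFS.subClassOf, EX.Disease))
# Statement extracted from a document: "wnt signalling is linked to cancer".
g.add((EX.WntSignalling, EX.linkedTo, EX.Cancer))

# Which diseases are associated with wnt signalling?
query = """
    PREFIX ex: <http://example.org/bio#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?disease WHERE {
        ex:WntSignalling ex:linkedTo ?disease .
        ?disease rdfs:subClassOf ex:Disease .
    }
"""
for row in g.query(query):
    print(row.disease)  # -> http://example.org/bio#Cancer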
Beyond the expressiveness of OWL, markup for vocabularies and metadata
emerged, such as the Simple Knowledge Organization System (SKOS) [4], Dublin
Core1, Friend of a Friend (FOAF) [5] and the Semantically-Interlinked Online
Communities project (SIOC) [6]. Additionally, there are formats to embed semantic
annotations within web documents, such as embedded RDF (eRDF),
Microformats2 or RDFa [7].
All of the above standards serve to formally represent knowledge and to
facilitate reasoning over it; they require explicit statements of knowledge.
As a consequence, the amount of such structured data is still small in
comparison to the unstructured data. Thus, there are essentially two approaches
to support semantic search: those that search structured documents and reason
over them, and those that search unstructured documents, extract knowledge
from them, and reason over it. The knowledge extraction step of the latter uses
combinations of natural language processing, information retrieval, text-mining,
and ontologies.
Table 1 summarises a number of semantic search engines, which work on
structured or unstructured documents. The former comprise Swoogle [8], the
Semantic Web Search Engine (SWSE) [9], WikiDB [10], Sindice [11], Watson [12],
Falcons [13], and CORESE [14]. They include existing RDF repositories and
crawl the internet for formal statements, e.g. OWL files. A search retrieves a
list of results with URIs. For SWSE and Falcons the result is enriched with a
description and a filtering mechanism for result types. CORESE uses conceptual
graphs for matching a query to its databases. WikiDB differs slightly
from the others in that it extracts the formal knowledge implicit in meta tags of
Wikipedia pages and converts it into RDF, offering querying with SPARQL.
As mentioned, the above systems are limited by the availability of structured
documents, a problem addressed by approaches such as Semantic MediaWiki
[15] and large efforts such as Freebase [16], which provides an environment to
author formal statements.
1 dublincore.org
2 microformats.org
Table 1. Comparison of semantic search engines. Columns (left to right): type of
documents; structured/text-mining techniques; number of documents; ontologies;
result type; highlighting; clustering of results; scientifically evaluated. Numeric
entries refer to the table's legend codes.

Swoogle rdf 1 Mio 9 20 yes
SWSE rdf 1 Mio 9 14 20 yes
Sindice rdf 1 Mio 9 20 yes
Watson rdf 1 Mio 9 20 yes
Falcons rdf 1 Mio 9 14 20 yes yes
CORESE rdf 1 Mio 9 20 yes
WikiDB rdf 1 Mio 9 20
Hakia txt 4 Bio 10 15 21 yes
START txt 4 Bio 10 22 yes
Ask.com txt 4 Bio 10 23
BrainBoost txt 4 Bio 10 24 yes
AnswerBus txt 4 Bio 10 25 yes
Cuil txt 4,8 Bio 10 15 21 yes
Clusty txt 5 Bio 10 16 23,26 yes
Carrot txt 5 Bio 11 16 23,26 yes yes
PowerSet wiki 4,8 Mio 10 15 23,25 yes
QuAliM wiki/txt 4,8 Mio 11,10 22 yes
GoWeb txt 2,3 6,7,8 Bio 11 17 23,27 yes yes
askMedline xml 3 Mio 12 28 yes
EAGLi xml 2 4,6 Mio 12 18 22,28 yes yes
GoPubMed xml 2,3 6,7,8 Mio 12 17 23,27,28 yes yes
ClusterMed xml 3 5 Mio 12 16 26,28 yes yes
iHOP xml 3 6,7 Mio 12 19 24,28 yes yes
EBIMed xml 2,3 6,7 Mio 12 17 24,27 yes yes
XplorMed xml 3 5,6 Mio 12 17 21,28 yes yes
Textpresso xml 2 6 Mio 13 17 28 yes yes
Chilibot xml 7 Mio 12 24 yes yes
The second class of tools works on unstructured text and therefore does not
suffer from this limit. These systems can be distinguished by the document
source they work on (Web, biomedical literature, wikis), their use of background
knowledge in the form of ontologies, and their use of text-mining techniques
such as stemming, concept identification, and deep/shallow parsing.
Hakia, START [17], Ask.com, BrainBoost (Answers.com), AnswerBus [18],
Cuil3, Clusty4, and Carrot5 are engines that work on Web documents. Hakia,
START and AnswerBus use natural language processing to understand documents,
while Cuil, Clusty and Carrot cluster search results and aim to label the
clusters with phrases, which are offered as related queries. Cuil, Clusty and Carrot
are not semantic search engines in a strict sense, since these phrases are
not part of an ontology or vocabulary. However, they have the benefit of being
generally applicable, and Cuil offers definitions for phrases where available.
Ask.com uses ExpertRank, an algorithm for computing query-specific communities
and ranking in real time, to identify relevant pages [19]; it also includes
structured knowledge to generate answers. BrainBoost is a meta-search engine.
It uses the proprietary AnswerRank algorithm, which applies machine learning
and natural language processing to rank answers extracted from the top websites.
Wikipedia is a valuable resource for answering questions, and hence some engines
are specifically applied to it. PowerSet, for example, applies natural language
processing to Wikipedia in a manner similar to Hakia. QuAliM [20] uses a
pattern-based approach for sentence analysis and implements semantic type
checking for answers as well as a fallback mechanism using web search.
The above tools are intended to be broadly applicable and, as a result, do
not cover the biomedical domain well. Searching, for example, for a protein such
as Fgf8, PowerSet and Hakia do not offer an answer for model organisms. They
offer information on the protein, but are not able to find zebrafish as a model
organism.
Engines such as askMedline, EAGLi [21], GoPubMed [22], ClusterMed, iHOP
[23], EBIMed [24], XplorMed [25], Textpresso [26] and Chilibot [27] address this
by processing biomedical literature, either in full text (Textpresso) or as abstracts
available in the PubMed literature database. With a focused domain, these engines
can use background knowledge. GoPubMed and EBIMed, for example, use the Gene
Ontology and the Medical Subject Headings (MeSH); XplorMed filters by eight MeSH
categories and extracts topic keyword co-occurrences; Chilibot extracts relations
and generates hypotheses; iHOP uses genes and proteins as hyperlinks
between sentences and abstracts; EAGLi and askMedline accept questions as
input for the search.
Finally, besides all of the automated approaches, Google, Yahoo! and Mi-
crosoft use humans to answer questions in their services Google Answers, Yahoo!
Answers and MSN Live Search QnA.
3 www.cuil.com
4 clusty.com
5 www.carrot-search.com
Closely related to semantic search is semantic hyperlinking, as implemented
in the Conceptual Open Hypermedia Service (COHSE). COHSE annotates a
given web page with concepts and offers services based on the identified concepts
([19], [20]).
None of the above systems combines the simplicity of keyword search over
the vast amounts of Web documents with the use of biomedical background
knowledge to filter large keyword results with biomedical ontologies. Here, we
address this by introducing the GoWeb search engine. GoWeb issues queries to
Yahoo! and semantically indexes the returned snippets with ontology terms,
which are then offered as filters on the results. To demonstrate the power of this
approach in question answering, we evaluate GoWeb on three benchmarks with
questions on gene/function, symptom/disease, and protein/disease relationships
and compare it to existing solutions.
2.1 Algorithm
The search is executed by a traditional keyword-based search service; we use
the Yahoo! Search BOSS service6. The result of a submitted search is a list of
textual summaries for web documents, called snippets. Next, GoWeb uses entity
recognition techniques to map concepts from the background knowledge to the
snippets.

6 developer.yahoo.com/search/boss/

Fig. 1. GoWeb screenshot for the example query Fgf8. On the left are the semantic
filters. On the right are the search results, with the query field and summary on top.
The algorithm for the identification of ontological concepts in text is based
on the GoPubMed algorithms [22]. For the identification of protein and gene
names we use the approach of Hakenberg et al. [29], which achieved the best
results in the gene identification task of BioCreAtIvE 2 (Critical Assessment
of Information Extraction systems in Biology) in 2007. Further entity
recognition services can be integrated into GoWeb; currently, the OpenCalais
service [30] is used to identify names and places.
The entities identified in each result, together with the matched keywords,
form the basis of GoWeb's co-occurrence-based semantic filtering mechanism.
The filter uses the part-of and is-a relationships of GO and the tree structure
of MeSH. These relations are used to induce the relevant search results for each
concept from the background knowledge. The induction over all search results
of a query is also used to select important concepts; such top concepts are
selected for the entire background knowledge and for each subcategory.
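The following sketch illustrates this induction and filtering step with a toy ontology and naive substring matching; the concept names and the matcher are illustrative stand-ins for the GoPubMed-based concept identification described above.

PARENTS = {  # is-a / part-of edges: child -> parents
    "zebrafish": ["model organism"],
    "mouse": ["model organism"],
    "breast cancer": ["neoplasm"],
}
ALL_CONCEPTS = set(PARENTS) | {p for ps in PARENTS.values() for p in ps}

def ancestors(concept):
    """All concepts reachable via is-a/part-of, including the concept itself."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS.get(c, []))
    return seen

def annotate(snippet):
    """Naive dictionary matching, followed by induction up the ontology."""
    text = snippet.lower()
    direct = {c for c in ALL_CONCEPTS if c in text}
    induced = set()
    for c in direct:
        induced |= ancestors(c)  # 'zebrafish' also counts as 'model organism'
    return induced

def filter_by_concept(snippets, concept):
    """The semantic filter: keep results annotated with the chosen concept."""
    return [s for s in snippets if concept in annotate(s)]

snippets = ["We studied fgf8 expression in zebrafish development.",
            "Wnt signalling is linked to breast cancer."]
print(filter_by_concept(snippets, "model organism"))  # first snippet only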
2.2 Architecture
The workflow of GoWeb is as follows. The user submits a query through the
search form on the GoWeb website. The server pre-processes the query and
sends a search request to the search service, which returns the first results.
These first results are annotated, highlighted (concepts and keywords), rendered
and sent to the user.

The user can already browse these first results. Once they are processed,
the server starts fetching the remaining results, up to 1000 in total, and then
annotates all fetched results. To reduce the response time, fetching and annotation
are done in parallel. The annotation information is then used to induce a tree
representation and the top concepts of the ontological background knowledge for
the submitted query (the result tree). This information is rendered and sent to
the user interface using AJAX technologies such as JSON to reduce the required
time and bandwidth. An overview is given in Figure 2.
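One way to realise this overlap of fetching and annotation is sketched below; fetch_batch and annotate are hypothetical stand-ins for the BOSS search calls and the entity recognition step.

from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_batch(offset, size=50):
    """Placeholder for fetching one page of results from the search service."""
    return [f"snippet {offset + i}" for i in range(size)]

def annotate(snippet):
    """Placeholder for mapping background-knowledge concepts to a snippet."""
    return (snippet, {"some concept"})

def fetch_and_annotate(max_results=1000, batch=50):
    annotated = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Fetch batches in parallel, up to the 1000-result limit ...
        futures = [pool.submit(fetch_batch, off, batch)
                   for off in range(0, max_results, batch)]
        # ... and annotate each batch as soon as it arrives.
        for fut in as_completed(futures):
            annotated.extend(annotate(s) for s in fut.result())
    return annotated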
If the user selects a concept in the result tree by clicking on it, a request is
made to update the presented documents. This involves filtering the result set
and re-ranking it; see Figure 3 for an illustration. The new ranking is based on
the found concepts, the keywords and the original ranking.
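A minimal sketch of such a re-ranking step follows, under the assumption that each result carries its original rank, its annotated concepts and a keyword-hit count; the weighting is illustrative, not GoWeb's actual parameters.

def rerank(results, selected_concept):
    """results: list of (original_rank, concepts, keyword_hits) tuples."""
    def score(result):
        original_rank, concepts, keyword_hits = result
        concept_bonus = 1 if selected_concept in concepts else 0
        # Prefer concept matches, then keyword hits, then the original order.
        return (concept_bonus, keyword_hits, -original_rank)
    return sorted(results, key=score, reverse=True)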
Once the user decides to open a web page, GoWeb offers to highlight the
page with the keywords and concepts from the background knowledge. This is
done with a proxy-based solution. The server checks whether the page is
annotatable, i.e. whether its content is HTML-based. The GoWeb server then
fetches the site, analyzes the content, adds the annotations and sends the result
to the user. If the content is not processable by the proxy, the user is
automatically forwarded to the original content.
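The proxy step could look like the following sketch, assuming the requests library and naive string replacement; a production version would parse the HTML rather than rewrite it textually.

import re
import requests

def highlight_page(url, concepts):
    """Fetch a page; return highlighted HTML, or None to forward to the original."""
    resp = requests.get(url, timeout=10)
    if "text/html" not in resp.headers.get("Content-Type", ""):
        return None  # not annotatable, e.g. a PDF: forward to the original
    html = resp.text
    for label in concepts:
        # Wrap each occurrence of a known concept label in a <mark> tag.
        html = re.sub(re.escape(label), f"<mark>{label}</mark>",
                      html, flags=re.IGNORECASE)
    return html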
Fig. 2. General workflow for GoWeb showing the main components and the interactions
between the external services.
Fig. 3. Workflow for a request containing a concept selected from the result tree in the
user browser.
3 Results
The goal of GoWeb is to use ontologies and text-mining in semantic web search
to answer questions. Here, we give some examples and evaluate the question
answering capabilities of GoWeb on three benchmarks. In the introduction we
raised questions such as the following: Which model organisms are used to study
the Fgf8 protein? Which processes are osteoclasts involved in? What are common
histone modifications? Which diseases are associated with wnt signaling?
Answers to these questions can be found using GoWeb. For example, Fgf8 is
studied in mice and zebrafish; osteoclasts are involved in bone resorption; common
histone modifications are methylation and acetylation; and the wnt signaling
pathway is associated with neoplasms such as breast cancer, tumors or leukemia.
These answers were obtained directly with GoWeb using simple keyword searches
and the induced background knowledge. For example, the answer to the first
question can be found as follows: submit the query Fgf8; the answer is directly
shown as listed concepts in the organisms part of the background knowledge
(see also Figure 1). To retrieve the corresponding search results, click on the
organism. This simple strategy can be generalised to support semi-automatic
question answering. Next, we demonstrate this using three independent benchmarks.
Table 2. Comparison of Google, GoPubMed and GoWeb on the symptoms and diseases
benchmark (cases 5–12, 14–19, 22 and 25). Of these 16 cases, Google solved 8,
GoPubMed 7 and GoWeb 12.
For example, for case study number 28, GoWeb finds 126 articles for the
query "ANCA haematuria haemoptysis". Under diseases one can find the MeSH
concept "Churg-Strauss Syndrome". A click on the concept in the tree retrieves
three snippets containing the concept. The resulting snippets are:
The GoWeb system performs better than GoPubMed because the underlying
search engine has a larger repository of documents. Additionally, it can also
index the full text if it is available on the web. The MEDLINE search underlying
all PubMed-based search engines, such as GoPubMed, covers only abstracts.
This is consistent with the fact that the MEDLINE search often returns no or
only one article abstract.
Linking proteins and diseases is a key task for many molecular biomedical
researchers. The third benchmark for GoWeb is therefore based on the questions
of the TREC Genomics Track 2006 [36], whose results comprise a benchmark
focused on passage retrieval for question answering. It is based on full-text
documents from the biomedical literature. For the year 2006 there were 28
questions. With GoWeb one can answer 22 of these 28 questions (78.6%). In 13
of these cases the semantic filter helped to reduce the result set. Table 3
summarises all questions.
For GoWeb, the questions were transformed into keywords. The complete
listing of questions and keywords is available in Table 5. A question was marked
as successfully handled if there was a snippet that contained a valid answer.
The second aspect addressed with this benchmark was to show the capabilities
of the filtering feature. Filtering by background knowledge helps to reduce large
result sets to a smaller set of relevant documents. Filtering was marked as
applied if the answers were found by using the filtering feature.
The answers to the first questions (160–164) are shown in Table 6. They
also demonstrate what kind of textual evidence GoWeb can provide as answers.
The answer to question 160 (What is the role of PrnP in mad cow disease?), for
instance, was found by submitting the keywords and selecting the MeSH concept
"Encephalopathy, Bovine Spongiform" (mad cow disease is a synonymous label
for the concept) as semantic filter.
Table 3. Summary of the TREC Genomics 2006 answering capabilities of GoWeb
(questions 160–187). GoWeb answered all ten questions 160–169, applying the
semantic filter for five of them; six of the ten questions 170–179, applying the
filter for three; and six of the eight questions 180–187, applying the filter for
five. In total, 22 of the 28 questions were answered, with the filter applied in
13 cases.
After filtering, the selected answer appeared among the first of the remaining
relevant results; its given number, 378, corresponds to its original position. This
demonstrates that without the filter the answer would normally not have been
found [37]. For question 161 the keywords were specific enough, which is
reflected in the answers' original ranks of first and second position.
There are two main reasons why GoWeb is not able to find an answer for
every question in the benchmark. The first is that a question is too complex:
the answer is too long to be formulated in a single sentence or snippet. Question
171, for example, actually contains two questions. The second reason is that the
question domain is not sufficiently modelled in the background knowledge. For
question 178, for instance, skin biology has no corresponding concept and is too
general to be mentioned directly in text.
References
1. Green, B., Wolf, A., Chomsky, C., Laughery, K.: BASEBALL: An Automatic
Question Answerer. In: Computers and Thought. McGraw-Hill (1963) 207–216
2. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American
284(5) (May 2001) 34–43
3. Antoniou, G., van Harmelen, F.: A Semantic Web Primer (Cooperative Information
Systems). The MIT Press (April 2004)
4. Miles, A., Bechhofer, S.: SKOS Simple Knowledge Organization System – Ref-
erence. W3C Working Draft (June 2008) URL: www.w3.org/TR/2008/WD-skos-
reference-20080609/.
5. Brickley, D., Miller, L.: FOAF Vocabulary Specification 0.91 (November 2007)
URL: http://xmlns.com/foaf/spec/.
6. Berrueta, D., Brickley, D., Decker, S., Fernández, S., Görn, C., Harth, A., Heath,
T., Idehen, K., Kjernsmo, K., Miles, A., Passant, A., Polleres, A., Polo, L.: SIOC
Core Ontology Specification (July 2008) URL: http://rdfs.org/sioc/spec/.
7. Adida, B., Birbeck, M.: RDFa primer – Bridging the Human and Data Webs. W3C
Working Draft (June 2008) URL: www.w3.org/TR/2008/WD-xhtml-rdfa-primer-
20080620/.
8. Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R.S., Peng, Y., Reddivari, P., Doshi,
V., Sachs, J.: Swoogle: a search and metadata engine for the semantic web. In:
CIKM ’04: Proceedings of the thirteenth ACM international conference on Infor-
mation and knowledge management, New York, NY, USA, ACM (2004) 652–659
9. Harth, A., Umbrich, J., Decker, S.: Multicrawler: A pipelined architecture for
crawling and indexing semantic web data. In Cruz, I.F., Decker, S., Allemang,
D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L., eds.: International
Semantic Web Conference. Lecture Notes in Computer Science, Springer (2006)
258–271
10. Clements, M.: WikiDB URL: http://www.kennel17.co.uk/testwiki/WikiDB.
11. Tummarello, G., Delbru, R., Oren, E., Cyganiak, R.: Sindice.com: A semantic
web search engine. Presentation, Digital Enterprise Research Institute National
University of Ireland, Galway (November 2007)
12. d’Aquin, M., Baldassarre, C., Gridinoc, L., Angeletou, S., Sabou, M., Motta, E.:
Characterizing Knowledge on the Semantic Web with Watson. In: Workshop on
Evaluation of Ontologies and Ontology-based tools, 5th International EON Work-
shop, collocated with the International Semantic Web Conference (ISWC’07), Bu-
san, Korea. (2007)
13. Cheng, G., Ge, W., Qu, Y.: Falcons: Searching and browsing entities on the se-
mantic web. In: World Wide Web Conference, Beijing, China (April 2008)
14. Dieng-Kuntz, R., Corby, O.: Conceptual graphs for semantic web applications.
In: International Conference on Conceptual Structures (ICCS). Volume 3596 of
LNCS., Springer (2005)
15. Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., Studer, R.: Semantic
Wikipedia. In: WWW ’06: Proceedings of the 15th international conference on
World Wide Web, New York, NY, USA, ACM (2006) 585–594
16. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collabora-
tively created graph database for structuring human knowledge. In: SIGMOD ’08:
Proceedings of the 2008 ACM SIGMOD international conference on Management
of data, New York, NY, USA, ACM (2008) 1247–1250
17. Katz, B., Borchardt, G., Felshin, S.: Natural language annotations for question
answering. In: Proceedings of the 19th International FLAIRS Conference (FLAIRS
2006), Melbourne Beach, FL (May 2006)
18. Zheng, Z.: Answerbus question answering system. In: Proceedings of the second
international conference on Human Language Technology Research, San Francisco,
CA, USA, Morgan Kaufmann Publishers Inc. (2002) 399–404
19. Yang, T.: Large scale internet search at ask.com. In: First International Conference
on Scalable Information Systems, INFOSCALE. (2006) Keynote.
20. Kaisser, M.: The QuALiM Question Answering Demo: Supplementing Answers
with Paragraphs drawn from Wikipedia. In: Proceedings of the 46th Annual Meet-
ing of the Association for Computational Linguistics, ACL 2008, Columbus, Ohio
(2008)
21. Gobeill, J., Ehrler, F., Tbahriti, I., Ruch, P.: Vocabulary-driven passage retrieval
for question-answering in genomics. In: The Sixteenth Text REtrieval Conference
(TREC 2007) Notebook. (2007)
22. Doms, A., Schroeder, M.: GoPubMed: exploring PubMed with the Gene Ontology.
Nucleic Acids Res 33(Web Server issue) (Jul 2005) W783–W786
23. Good, B.M., Kawas, E.A., Kuo, B.Y.L., Wilkinson, M.D.: iHOPerator: user-
scripting a personalized bioinformatics Web, starting with the iHOP website. BMC
Bioinformatics 7 (2006) 534
24. Rebholz-Schuhmann, D., Kirsch, H., Arregui, M., Gaudan, S., Riethoven, M.,
Stoehr, P.: EBIMed–text crunching to gather facts for proteins from Medline.
Bioinformatics 23(2) (Jan 2007) e237–e244
25. Perez-Iratxeta, C., Pérez, A., Bork, P., Andrade, M.: Update on XplorMed: A
web server for exploring scientific literature. Nucleic Acids Res 31(13) (Jul 2003)
3866–8
26. Müller, H.M., Kenny, E.E., Sternberg, P.W.: Textpresso: an ontology-based infor-
mation retrieval and extraction system for biological literature. PLoS Biol 2(11)
(Nov 2004) e309
27. Chen, H., Sharp, B.: Content-rich biological network constructed by mining
PubMed abstracts. BMC Bioinformatics 5 (Oct 2004) 147
28. Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J., Davis, A.,
Dolinski, K., Dwight, S., Eppig, J., Harris, M., Hill, D., Issel-Tarver, L., Kasarskis,
A., Lewis, S., Matese, J., Richardson, J., Ringwald, M., Rubin, G., Sherlock, G.:
Gene Ontology: tool for the unification of biology. the Gene Ontology Consortium.
Nat Genet 25(1) (May 2000) 25–9
29. Hakenberg, J., Royer, L., Plake, C., Strobelt, H., Schroeder, M.: Me and my
friends: Gene mention normalization with background knowledge. In: Proceedings
2nd BioCreAtIvE Challenge Evaluation Workshop. Number 2, Madrid (April 2007)
30. ClearForest: Calais: Connect. Everything. Web service, provided by ClearForest,
a Thomson Reuters Company (2008) URL: opencalais.com.
31. Blaschke, C., Leon, E.A., Krallinger, M., Valencia, A.: Evaluation of BioCreAtIvE
assessment of task 2. BMC Bioinformatics 6 Suppl 1 (2005) S16
32. Tang, H., Ng, J.H.K.: Googling for a diagnosis–use of Google as a diagnostic aid:
internet based study. BMJ 333(7579) (Dec 2006) 1143–1145
33. Taubert, M.: Use of Google as a diagnostic aid: bias your search. BMJ 333(7581)
(Dec 2006) 1270; author reply 1270
34. Twisselmann, B.: Use of Google as a diagnostic aid: summary of other responses.
BMJ 333(7581) (Dec 2006) 1270–1271
35. Wentz, R.: Use of Google as a diagnostic aid: is Google like 10,000 monkeys? BMJ
333(7581) (Dec 2006) 1270; author reply 1270
36. Hersh, W., Cohen, A.M., Roberts, P., Rekapalli, H.K.: TREC 2006 Genomics
Track overview. In: The Fifteenth Text REtrieval Conference (TREC 2006)
Proceedings. (2006)
37. Granka, L.A., Joachims, T., Gay, G.: Eye-tracking analysis of user behavior in www
search. In: SIGIR ’04: Proceedings of the 27th annual international ACM SIGIR
conference on Research and development in information retrieval, New York, NY,
USA, ACM (2004) 478–479
38. Tomuro, N., Lytinen, S.L.: Selecting features for paraphrasing question sentences.
In: Proceedings of the Workshop on Automatic Paraphrasing at Natural Language
Processing Pacific Rim Symposium (NLPRS). (2001) 55–62
39. Ely, J., Osheroff, J., Gorman, P., Ebell, M., Chambliss, M., Pifer, E., Stavri, P.: A
taxonomy of generic clinical questions: classification study. BMJ 321(7258) (Aug
2000) 429–32
Table 4. Overview of the GoWeb results for the symptoms and diseases benchmark.