
Semantic Web

(CS1145)
Department Elective (Final Year)
Department of Computer Science & Engineering
UNIT I
THE BASICS OF SEMANTIC WEB
From Traditional Web to Semantic Web
What is the WWW?
The World Wide Web (WWW): the collection of interlinked documents made available over the Internet.
How are we using the Web? (Usage)
Search: e.g., a keyword search for "SOAP" (the protocol) also returns pages about soaps.
Integration: e.g., restaurant selection; composition: e.g., airline ticketing.
Web data mining: e.g., a crawler (agent) that collects the take-off rate from the air traffic control tower of Atlanta international airport, or that collects stock information (prices) every 10 minutes.
The Internet can be viewed as a huge distributed database.
Drawbacks of Internet usage
Web search
Web servers - integration
Web data mining
The Internet is constructed in such a way that its documents
only contain enough information for the computers to present
them, not to understand them.
Semantic Web
The Semantic Web is an extension of the current Web in which
information is given well-defined meaning, better enabling
computers and people to work in cooperation. . . . a web of data
that can be processed directly and indirectly by machines.
Tim Berners-Lee, James Hendler, Ora Lassila
... the idea of having data on the Web defined and linked in a way
that it can be used by machines not just for display purposes, but
for automation, integration, and reuse of data across various
applications.
W3C Semantic Web Activity
Summarizing Semantic Web in general
(Machine readable view)
The current Web is made up of many Web documents
(pages).
Any given Web document, in its current form (HTML tags
and natural text), only gives the machine instructions about
how to present information in a browser for human eyes.
Therefore, machines have no idea about the meaning of the
document they are presenting; in fact, every single document
on the Web looks exactly the same to machines.
Machines have no way to understand the documents and
cannot make any intelligent decisions about these documents.
Developers cannot process the documents on a global scale
(and search engines will never deliver satisfactory
performance).
One possible solution is to modify the Web documents, and
one such modification is to add some extra data to these
documents; the purpose of this extra information is to enable
the computers to understand the meaning of these documents.
Assuming that this modification is feasible, we can then
construct tools and agents running on this new Web to process
the documents on a global scale; this new Web is called the
Semantic Web.
Metadata
Metadata is "data about data": it is data that describes
information resources.
Metadata is a systematic method for describing resources and
thereby improving access to them. In the Web world, systematic
means structured and, furthermore, structured data implies
machine readability and understandability.
If the metadata of each Web document had its own unique structure,
it would simply not be possible for an automated agent to process
the metadata in a uniform and global way; this is why metadata
standards are needed.
Metadata provides the essential link between the page content
and content meaning.
A standard is a set of agreed-on criteria for describing data. For
instance, a standard may specify that each metadata record should
consist of a number of predefined elements representing some
specific attributes of a resource (in this case, the Web document),
and each element can have one or more values. This kind of
standard is called a metadata schema.
Dublin Core (DC) is one such standard. It was developed in
the March 1995 Metadata Workshop sponsored by the Online
Computer Library Center (OCLC) and the National Center for
Supercomputing Applications (NCSA). It has 13 elements
(subsequently increased to 15), which are called the Dublin Core
Metadata Element Set (DCMES); it is proposed as the minimum
number of metadata elements required to facilitate the discovery
of document-like objects in a networked environment such as
the Internet.
Metadata Considerations
Embedding the Metadata in Your Page
Dublin Core elements can be embedded as <meta> tags in the <head> section of the page.
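As a minimal sketch of what such embedded metadata might look like (all element values below are made-up placeholders; Python is used only to print the resulting <head> section):

    # A sketch only: embedding Dublin Core elements as <meta> tags in the
    # <head> of an HTML page. All element values are illustrative placeholders.

    dc_record = {
        "DC.title":       "Digital Cameras: SLR vs. Point-and-Shoot",
        "DC.creator":     "A. Author",
        "DC.subject":     "camera, digital camera, SLR",
        "DC.description": "An overview of digital camera types.",
        "DC.date":        "2010-01-01",
        "DC.format":      "text/html",
    }

    meta_tags = "\n".join(
        f'  <meta name="{name}" content="{value}">'
        for name, value in dc_record.items()
    )
    print(f"<head>\n{meta_tags}\n</head>")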
Using a text-parsing crawler to create Metadata
Once the crawler reaches a page and finds that it does not have any
metadata, it attempts to discover some meaningful information by
scanning through the text and creates some metadata for the page
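A minimal sketch of this idea, assuming a deliberately naive strategy: treat the most frequent non-stop-words of the page text as DC.subject keywords. The stop-word list, word-length cutoff, and number of keywords are arbitrary choices made only for illustration.

    from collections import Counter
    import re

    STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on"}

    def guess_dc_subject(page_text: str, top_n: int = 5) -> str:
        """Very naive metadata generation: take the most frequent
        non-stop-words of the page as DC.subject keywords."""
        words = re.findall(r"[a-z]+", page_text.lower())
        counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
        return ", ".join(word for word, _ in counts.most_common(top_n))

    text = "Nikon and Canon both make SLR cameras; an SLR camera uses a mirror ..."
    print(guess_dc_subject(text))   # e.g. "slr, nikon, canon, both, make"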
Using Metadata Tools to Add Metadata to Existing Pages
A tool such as DCdot (http://www.ukoln.ac.uk/metadata/dcdot/) can be used to generate DC metadata for the page you submit.
The problem with this solution is that you have to visit the Web pages one by one to generate the metadata, and the metadata that is generated is only DC metadata.
In addition, the generated metadata cannot really be added to the page itself, because you normally do not have access to it; you need to find some other place to store it.
(Figure: the DC metadata generated by DCdot.)
Search Engine For The Traditional Web
Search engines: Google, Yahoo!, AltaVista, Lycos.
The indexation process is carried out by a special piece of software called a spider or crawler.
To kick off the process, the main control component of a search engine provides the crawler with a seed URL.
Building the index table
Indexation process
The quality of the generated index table to a large extent
decides the quality of a query result.
The indexation process is conducted by a special piece of software,
usually called a spider or crawler.
A crawler visits the Web to collect literally everything it can
find, constructing the index table during its journey.
To initially kick off the process, the main control component of
a search engine will provide the crawler with a seed URL (or a set
of seed URLs). After receiving the seed URL, the crawler begins its
journey by accessing this URL: it downloads the page pointed to by
this URL and does the following:
Step 1: Build an index table for every single word on this
page.
Step 2: From the current page, find the first link (which is
again a URL pointing to another page) and crawl to this
link, meaning to download the page pointed to by this link
Step 3: After downloading this page, start reading each
word on this page, and add them all to the index table.
Step 4: Go to step 2, until no unvisited link exists
When a word is added, there are two possibilities:
1. The current word from this new page has never been added to the
index table, in which case a new entry is created for it; or
2. It already exists in the index table, in which case the existing
entry is updated to include the current page.
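The sketch below puts this indexation process together under several simplifying assumptions: an in-memory dictionary serves as the index table, the seed URL is hypothetical, and there is no robots.txt handling or politeness delay.

    # A minimal, in-memory sketch of the indexation process described above.
    import re
    import urllib.parse
    import urllib.request
    from collections import defaultdict
    from html.parser import HTMLParser

    class PageParser(HTMLParser):
        """Collects the text words and the outgoing links of one page."""
        def __init__(self):
            super().__init__()
            self.words, self.links = [], []
        def handle_data(self, data):
            self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed_url, max_pages=10):
        index = defaultdict(dict)          # word -> {url: occurrence count}
        frontier, visited = [seed_url], set()
        while frontier and len(visited) < max_pages:
            url = frontier.pop()           # take the most recently found link
            if url in visited:
                continue
            visited.add(url)
            try:
                html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except Exception:
                continue                   # unreachable page: skip it
            parser = PageParser()
            parser.feed(html)
            # Steps 1 and 3: add every word on this page to the index table.
            for word in parser.words:
                index[word][url] = index[word].get(url, 0) + 1
            # Steps 2 and 4: follow the links found on this page.
            for link in parser.links:
                frontier.append(urllib.parse.urljoin(url, link))
        return index

    # index = crawl("https://example.org/")   # hypothetical seed URL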

Google, one of the most popular search engines, at best can
index about 4 billion to 5 billion Web pages, representing
only about 1% of the World Wide Web.
In the search engine presented here, we store the number of times the word
appears on a page. Based on this number, our engine is able to assign a
weight to the page: the more often the word appears on the page, the
more relevant this page is.
Conducting The Search
The next step is to remember where the word appears. It could
appear in the title of the page, near the top of the page, in
subheadings, in links, or in the metadata tags. Clearly, these
different locations of a given word signify the different levels of
importance of the word.
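One simple way to model this, purely as an illustration, is to give each location an importance multiplier and combine it with the occurrence count stored in the index table; the multipliers below are arbitrary assumptions, not values used by any real search engine.

    # A sketch of the weighting idea: the same word counts more when it
    # appears in more "important" locations on the page.
    LOCATION_WEIGHT = {
        "title": 10.0,
        "heading": 5.0,
        "metadata": 4.0,
        "link": 3.0,
        "body": 1.0,
    }

    def page_score(occurrences):
        """occurrences: list of (location, count) pairs for one word on one page."""
        return sum(LOCATION_WEIGHT.get(loc, 1.0) * count for loc, count in occurrences)

    # The word "camera" appears once in the title and six times in the body text.
    print(page_score([("title", 1), ("body", 6)]))   # 16.0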
Metadata can play a unique role in guiding how the crawler
should build the index table.
Another example is when the page owner decides to use formal
classification schemes in the Subject element, indicating the
general topic of the page (Entertainment, Business, Education,
etc.).
The crawling order just described is essentially depth-first; the counterpart of depth-first search is breadth-first search, in which the crawler visits all the links collected from the current page before moving any deeper.
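In the crawler sketch above, the only thing that changes between the two strategies is how the next URL is taken from the frontier, as the fragment below illustrates (the seed URL is again hypothetical).

    from collections import deque

    frontier = deque(["https://example.org/"])   # hypothetical seed URL

    # Depth-first: treat the frontier as a stack.
    next_url = frontier.pop()        # take the most recently discovered link

    # Breadth-first: treat the frontier as a queue.
    frontier.append("https://example.org/")
    next_url = frontier.popleft()    # take the earliest discovered link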
The Web is made up of billions of Web pages.
Most Web pages have three different kinds of code: HTML/XHTML
tags (required), CSS tags (optional), and JavaScript (optional).
All these tags in a Web page just tell the machines how to display the
page; they do not tell the machines what they mean.
When a crawler reaches a given page, it has no way to know what this
page is all about. In fact, it is not even able to tell whether the page is a
personal page or a corporate Web site.
Thus, the crawler can only collect keywords, turning all search engines
into essentially keyword matchers.
Search Engine For The Semantic Web:
A Hypothetical Example
A Hypothetical Usage Of The Traditional Search Engine
Example sites: www.cheapCameras.com and www.buyItHere.com
Building A Semantic Web Search Engine
Step 1: Build a common language.
It is a standard way to express the meaning/knowledge of a
specific domain.
It is a common understanding (a shared vocabulary and its meaning)
agreed on by different parties over the Web.
Step 2: Mark up the pages.
Step 3: Build a much smarter crawler.
Step 1:
Camera is a root concept in this vocabulary; we can also call it a root
term or a root class. Camera has two subconcepts: Digital camera and
Film camera. Also, Digital camera has two subclasses: SLR camera and
Point-and-Shoot camera.
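One possible way to write this vocabulary down is as an OWL class hierarchy; the sketch below uses the third-party rdflib library, and the namespace URI is an assumption chosen only to echo the mySimpleCamera.owl example used later.

    # A sketch of the camera vocabulary as an OWL class hierarchy (rdflib).
    from rdflib import Graph, Namespace, RDF, RDFS, OWL

    CAM = Namespace("http://www.example.org/mySimpleCamera.owl#")   # assumed URI
    g = Graph()
    g.bind("cam", CAM)

    # Declare the root class and its descendants as OWL classes.
    for cls in ("Camera", "DigitalCamera", "FilmCamera", "SLRCamera", "PointAndShootCamera"):
        g.add((CAM[cls], RDF.type, OWL.Class))

    # The subclass relations described above.
    g.add((CAM.DigitalCamera, RDFS.subClassOf, CAM.Camera))
    g.add((CAM.FilmCamera, RDFS.subClassOf, CAM.Camera))
    g.add((CAM.SLRCamera, RDFS.subClassOf, CAM.DigitalCamera))
    g.add((CAM.PointAndShootCamera, RDFS.subClassOf, CAM.DigitalCamera))

    print(g.serialize(format="turtle"))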
Step 2:
The pages are marked up against this shared vocabulary, and the search engine relies on an inference engine to reason over the markup.
Using the Semantic Web Search Engine
Step 1: Select your semantics (choose the vocabulary file whose semantics you want to search against).
Step 2: Select the index table and return the results.
If the markupURL field of a document points to NULL, discard this document.
If the field is not NULL but it does not point to mySimpleCamera.owl, then discard this document.
If the markupURL field is not NULL and it does point to mySimpleCamera.owl, then include this document in the candidate set.
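This filtering rule can be sketched as follows; the layout of an index record, in particular the markupURL field, is assumed here purely for illustration.

    # A sketch of the candidate-set filtering rule described above.
    SELECTED_ONTOLOGY = "mySimpleCamera.owl"

    def candidate_set(records):
        """Keep only documents whose markupURL points to the selected
        ontology; discard documents with no markup at all."""
        candidates = []
        for record in records:
            markup_url = record.get("markupURL")        # may be None (NULL)
            if markup_url is None:
                continue                                # no markup: discard
            if not markup_url.endswith(SELECTED_ONTOLOGY):
                continue                                # marked up against another vocabulary: discard
            candidates.append(record)
        return candidates

    records = [
        {"url": "http://www.cheapCameras.com/", "markupURL": "http://www.cheapCameras.com/mySimpleCamera.owl"},
        {"url": "http://www.buyItHere.com/",    "markupURL": None},
    ]
    print([r["url"] for r in candidate_set(records)])   # only cheapCameras.com survives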
Web Page Markup Problem
One possible solution is for the W3C to find the resources to implement a
convincing application, so as to persuade the public to mark up their pages.
Another solution is to do automatic markup by running a smart
crawler. This crawler is different from the search-engine crawler
in that its task is not to build an index table but to automatically
mark up the Web pages it visits. At the current stage, this
still seems quite hard to accomplish, given that Web pages
are not machine readable.
Common Vocabulary Problem
One of the difficulties is that for a single domain, there
could exist several of these files, each of which tries to
capture the common terminologies and their relations in
the given domain. Then we have the problem of
overlapping semantics.
Query-building Problem
The search engine would present the user with the vocabulary
files, and the user would decide which file defines the
semantics that he or she prefers.
How good is this solution?
What if the user is not quite clear about the semantics of
the concept he or she is searching for?
What if the concept shows up in several definition files and
each of them looks fairly close?
Other issues of performance and scalability
