Semantic Web
(CS1145)
Department Elective (Final Year)
Department of Computer Science & Engineering
UNIT I
THE BASICS OF SEMANTIC WEB
From Traditional Web to Semantic Web
What is WWW?
World Wide Web/Internet
How are we using the Internet? (Usage)
Search: e.g., a search for "SOAP" (the protocol) also returns pages about soaps.
Integration: e.g., restaurant selection; Composition: e.g., airline ticketing.
Web data mining: e.g., a crawler (agent) that reads the take-off rate from the air
traffic control tower of Atlanta international airport, or one that collects
stock information (prices) every 10 minutes.
The Internet is, in effect, a huge distributed database.
Drawbacks of Internet usage
Web search
Web servers: integration
Web data mining
The Internet is constructed in such a way that its documents
only contain enough information for the computers to present
them, not to understand them.
Semantic Web
The Semantic Web is an extension of the current Web in which
information is given well-defined meaning, better enabling
computers and people to work in cooperation. . . . a web of data
that can be processed directly and indirectly by machines.
Tim Berners-Lee, James Hendler, Ora Lassila
... the idea of having data on the Web defined and linked in a way
that it can be used by machines not just for display purposes, but
for automation, integration, and reuse of data across various
applications.
W3C Semantic Web Activity
Summarizing the Semantic Web in general
(machine-readable view)
The current Web is made up of many Web documents
(pages).
Any given Web document, in its current form (HTML tags
and natural text), only gives the machine instructions about
how to present information in a browser for human eyes.
Therefore, machines have no idea about the meaning of the
document they are presenting; in fact, every single document
on the Web looks exactly the same to machines.
Machines have no way to understand the documents and
cannot make any intelligent decisions about these documents.
Developers cannot process the documents on a global scale
(and search engines will never deliver satisfactory
performance).
One possible solution is to modify the Web documents, and
one such modification is to add some extra data to these
documents; the purpose of this extra information is to enable
the computers to understand the meaning of these documents.
Assuming that this modification is feasible, we can then
construct tools and agents running on this new Web to process
the document on a global scale; and this new Web is now
called the Semantic Web.
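To make this idea concrete, here is a minimal sketch (the page, the added statements, and the property names are all invented for illustration, not part of any standard): a program sees a plain HTML page only as markup plus opaque text, but once a few machine-readable statements are attached to the page, even a very simple agent can start answering questions about it.

```python
# Hypothetical illustration: the same page content with and without extra
# machine-readable data. Nothing here is part of any standard.

plain_page = "<html><body><h1>Nikon D70</h1><p>A 6.1 MP digital SLR camera.</p></body></html>"
# To a program, plain_page is just markup plus opaque text: it says how to
# display the page, not what the page is about.

# Extra data attached to the page as simple (property, value) statements;
# the property names are made up for this example.
extra_data = {
    "type": "digital camera",
    "brand": "Nikon",
    "resolution_megapixels": 6.1,
}

def is_camera_page(statements):
    # A trivial "agent": it can answer a question that plain_page alone
    # could never answer for it.
    return statements.get("type") == "digital camera"

print(is_camera_page(extra_data))  # True
```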
Metadata
Data about data; that is, data that describes information
resources.
Metadata is a systematic method for describing resources and
thereby improving their access. In the Web world, systematic
means structured and, furthermore, structured data implies
machine readability and understandability.
Without an agreed-on standard, the metadata of each Web document has its
own unique structure, and it is simply not possible for an automated agent
to process these metadata in a uniform and global way.
Metadata provides the essential link between the page content
and content meaning.
A standard is a set of agreed-on criteria for describing data. For
instance, a standard may specify that each metadata record should
consist of a number of predefined elements representing some
specific attributes of a resource (in this case, the Web document),
and each element can have one or more values. This kind of
standard is called a metadata schema.
Dublin Core (DC) is one such standard. It was developed in
the March 1995 Metadata Workshop sponsored by the Online
Computer Library Center (OCLC) and the National Center for
Supercomputing Applications (NCSA). It has 13 elements
(subsequently increased to 15), which are called Dublin Core
Metadata Element Set (DCMES); it is proposed as the minimum
number of metadata elements required to facilitate the discovery
of document-like objects in a networked environment such as
the Internet
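As an informal sketch (the record values below are invented; only the element names come from DCMES), a DC metadata record can be pictured as a fixed set of predefined elements, each holding one or more values:

```python
# The 15 elements of the Dublin Core Metadata Element Set (DCMES).
DCMES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# A hypothetical DC record for a Web page; each element may hold one or more values.
record = {
    "title": ["Introduction to the Semantic Web"],
    "creator": ["J. Smith", "A. Kumar"],      # multiple values are allowed
    "subject": ["Semantic Web", "metadata"],
    "date": ["2006-03-01"],
    "language": ["en"],
}

# Because every record follows the same schema, an agent can check and
# process records from different sites in a uniform way.
unknown = set(record) - DCMES_ELEMENTS
assert not unknown, f"non-DC elements found: {unknown}"
```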
Metadata Considerations
Embedding the Metadata in Your Page
<meta> tags in the <head> section
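A small sketch of what this looks like in practice (the helper function and the sample record are hypothetical); Dublin Core metadata is commonly embedded using <meta> names such as DC.title and DC.creator:

```python
# Minimal sketch (hypothetical helper): emit Dublin Core metadata as <meta>
# tags for the <head> section of an HTML page, using "DC.<element>" names.
from html import escape

def dc_meta_tags(record):
    tags = []
    for element, values in record.items():
        for value in values:
            tags.append(f'<meta name="DC.{element}" content="{escape(str(value))}">')
    return "\n".join(tags)

record = {"title": ["Introduction to the Semantic Web"], "creator": ["J. Smith"]}
print(dc_meta_tags(record))
# <meta name="DC.title" content="Introduction to the Semantic Web">
# <meta name="DC.creator" content="J. Smith">
```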
Using a text-parsing crawler to create Metadata
Once the crawler reaches a page and finds that it does not have any
metadata, it attempts to discover some meaningful information by
scanning through the text and creates some metadata for the page
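A rough sketch of this idea, with invented heuristics (a real text-parsing crawler is far more sophisticated): use the <title> tag as the title, and take the most frequent non-trivial words as a crude guess at the subject.

```python
# Hypothetical sketch: derive simple metadata for a page that carries none,
# by scanning its text. Real text-parsing crawlers use far better heuristics.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on"}

def guess_metadata(html_text):
    # Use the <title> tag (if any) as the title.
    m = re.search(r"<title>(.*?)</title>", html_text, re.IGNORECASE | re.DOTALL)
    title = m.group(1).strip() if m else ""

    # Strip the remaining tags and count the most frequent non-trivial words.
    text = re.sub(r"<[^>]+>", " ", html_text).lower()
    words = [w for w in re.findall(r"[a-z]+", text) if w not in STOP_WORDS and len(w) > 3]
    keywords = [w for w, _ in Counter(words).most_common(5)]

    return {"title": [title] if title else [], "subject": keywords}

page = "<html><head><title>Camera Reviews</title></head><body>Nikon camera reviews and camera tests.</body></html>"
print(guess_metadata(page))
# {'title': ['Camera Reviews'], 'subject': ['camera', 'reviews', 'nikon', 'tests']}
```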
Using Metadata Tools to Add Metadata to Existing Pages
http://www.ukoln.ac.uk/metadata/dcdot/
The problem with this solution is that you have to visit the Web pages one
by one to generate the metadata, and the metadata that is generated is only
DC metadata
Also, the generated metadata often cannot really be added to the page itself,
because you normally do not have access to that page; you need to find some
other place to store it (see the sketch below)
DCdot can be used to generate DC metadata for the
page you submit
The DC metadata generated by DCdot
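Because the generated metadata often cannot be embedded in pages you do not control, one possibility (a sketch only; the file name and record format are made up) is to keep it in an external store keyed by the page URL:

```python
# Hypothetical sketch: keep generated DC metadata outside the pages
# themselves, in a simple JSON store keyed by URL.
import json
from pathlib import Path

STORE = Path("dc_metadata_store.json")  # made-up file name

def save_metadata(url, record):
    store = json.loads(STORE.read_text()) if STORE.exists() else {}
    store[url] = record
    STORE.write_text(json.dumps(store, indent=2))

def load_metadata(url):
    if not STORE.exists():
        return None
    return json.loads(STORE.read_text()).get(url)

save_metadata("http://example.org/page1.html",
              {"title": ["Example Page"], "creator": ["J. Smith"]})
print(load_metadata("http://example.org/page1.html"))
```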
Search Engine For The Traditional Web
Search Engines: Google, Yahoo, AltaVista, Lycos
Indexation process: spider or crawler
Building the index table
Indexation process
The quality of the generated index table to a large extent
decides the quality of a query result.
The indexation process is conducted by a special piece of software
usually called a spider, or crawler.
A crawler visits the Web to collect literally everything it can
find, constructing the index table during its journey.
To initially kick off the process, the main control component of
a search engine will provide the crawler with a seed URL (a set
of seed URLs), and the crawler, after receiving the seed URL, will
begin its journey by accessing this URL:
it downloads the page pointed to by this URL
and does the following:
Step 1: Read every single word on this page and add each of them
to the index table.
Step 2: From the current page, find the first link (which is
again a URL pointing to another page) and crawl to this
link, meaning to download the page pointed to by this link
Step 3: After downloading this page, start reading each
word on this page, and add them all to the index table.
Step 4: Go to step 2, until no unvisited link exists
Two possibilities when adding a word (handled in the sketch below):
1. the current word from this new page has never been added to the
index table, so a new entry is created for it, or
2. it already exists in the index table, so the current page is simply
added to that word's entry
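Putting the four steps and the two possibilities together, here is a minimal single-process sketch of the indexation loop (the seed URL and helper names are hypothetical, and everything is heavily simplified; a real crawler must also handle robots.txt, politeness delays, deduplication, and scale):

```python
# Minimal sketch of a crawler building an inverted index table.
# Hypothetical and heavily simplified: no robots.txt, no politeness delay,
# no error handling beyond the bare minimum.
import re
from collections import defaultdict
from urllib.parse import urljoin
from urllib.request import urlopen

def crawl(seed_url, max_pages=50):
    index_table = defaultdict(set)   # word -> set of URLs containing it
    to_visit, visited = [seed_url], set()

    while to_visit and len(visited) < max_pages:
        url = to_visit.pop()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url).read().decode("utf-8", errors="ignore")
        except Exception:
            continue

        # Steps 1 and 3: add every word on this page to the index table.
        text = re.sub(r"<[^>]+>", " ", html).lower()
        for word in re.findall(r"[a-z]+", text):
            # Two possibilities: the word is new (defaultdict creates a new
            # entry), or it already exists (the URL is added to its entry).
            index_table[word].add(url)

        # Steps 2 and 4: follow the links found on this page to unvisited pages.
        for link in re.findall(r'href="([^"#]+)"', html):
            to_visit.append(urljoin(url, link))

    return index_table

# index = crawl("http://example.org/")   # hypothetical seed URL
# print(sorted(index)[:10])
```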