
SEMANTIC WEB

UNDERSTANDING IN BRIEF

INTRODUCTION
WEB OF DOCUMENTS VS. WEB OF DATA


A Walk Through a Brief History of the World Wide Web
1969: ARPANET (Advanced Research Projects Agency Network) is launched.
In 1980, Tim Berners-Lee built ENQUIRE, a personal database of people and software models and a way to play with hypertext; each new page of information in ENQUIRE had to be linked to an existing page. [Image: the WWW's historical logo, designed by Robert Cailliau.]
In 1990, Berners-Lee built all the tools necessary for a working Web: HTTP 0.9, HTML, the first web browser (which was also an editor), the first HTTP server software (CERN httpd), the first web server (http://info.cern.ch), and the first web pages, which described the project itself. [Image: the NeXTcube used by Tim Berners-Lee at CERN became the first web server.]


How big is the Web?
As per http://www.worldwidewebsize.com/, the Indexed Web contained at least 4.84 billion pages (Thursday, 25 February 2016).
Early estimates suggested that the deep web is 400 to 550 times larger than the surface web.
Since more information and sites are always being added, it can be assumed that the deep web is growing at a rate that cannot be quantified.


Understanding Information in the WWW
What is important, and how do you know?
What is information, and what is advertisement?
What does the information mean?
How credible or trustworthy is the information?
What is redundant?


Understanding the Importance of Meaning
SEMANTICS: the part of linguistics concerned with the sense and meaning of language or of the symbols of a language.
It is the study of the interpretation of signs or symbols as used by agents or communities within particular circumstances and contexts.
Semantics asks how the sense and meaning of complex concepts can be derived from simple concepts based on the rules of syntax.
The semantics of a message depends on its context and pragmatics.
(Pragmatic: dealing with things sensibly and realistically in a way that is based on practical rather than theoretical considerations.)


Understanding the Importance of Meaning
SYNTAX: in grammar, the study of the principles and processes by which sentences are constructed in a particular language.
In formal languages, syntax is just a set of rules by which well-formed expressions can be created from a fundamental set of symbols (the alphabet).
In computer science, syntax defines the normative structure of data.


Understanding the Importance of Meaning
CONTEXT: the surrounding expressions (concepts) of an expression; it represents the expression's relationship with those surrounding expressions (concepts) and further related elements.
Context denotes all elements of any sort of communication that define the interpretation of the communicated content, e.g.:
General contexts: place, time, interrelation of actions in a message.
Personal or social contexts: the relation between the sender and receiver of a message.
PRAGMATICS: reflects the intention with which language is used to communicate a message.
In linguistics, pragmatics denotes the study of applying language in different situations; it also denotes the intended purpose of the speaker.
Pragmatics studies the ways in which context contributes to meaning.

The limits of the Web
Traditional keyword-based search leads to many irrelevant results.
Example: from the simple term "Jaguar" it is not clear whether the user means the car, the animal, or the operating system (Mac OS X Jaguar).
POLYSEMY: a search returns the result you wanted alongside other results that have a different meaning but the same or a similar name.


Problem 1: Information Retrieval
Jaguar (animal): Panthera onca.
Traditional keyword-based search doesn't find all results.
Synonyms and metaphors are not always addressed properly, which leads to undesired results.
[Figure: documents A, B and C (HTML) and source D (API/XML), connected only by untyped links.]
Primary objects: documents.
Degree of structure in the data: fairly low.
Semantics of the contents: implicit.
Designed for: human consumption.

Problem 2: Information Extraction
Identifying content written in other languages, e.g. Japanese or Bengali.
Pictures give search engines no information about what they show.
Example: Google identifies the caption or name embedded in a picture and uses it as a reference keyword.


Problem 2: Information Extraction (Cont.)
[Figure: the same documents A, B and C (HTML) and source D (API/XML), connected by untyped links, with question marks between the "things" they mention: are two documents talking about the same thing?]


Problem 2: Information Extraction (Cont.)
Can only be solved, correctly by a human agent
Heterogeneous distribution and order of information.
Software agent does not have sufficient:
Knowledge of contexts
World knowledge and
Experience
To solve problem
Hence it will not be able to solve the problem without explicit
semantic available.
Implicit knowledge, i.e. information doesnt have specified explicitly
but must be derived via logical deductions from available information.

Ankur Biswas 4/4/2016 13


Problem 3: Maintenance
The more complex and voluminous a website is, the more complicated the maintenance of its only weakly structured data becomes.
Problems:
Syntactic (link) consistency errors: you have linked your web page to another page with related content, but that page has since moved elsewhere while the link to the old address still exists (HTTP 404 error: file/page not found).
Semantic (link) consistency errors: even more dangerous; the content at the hyperlinked destination keeps changing.
Correctness: it is hard to maintain correctness over time in an automated manner.
Timeliness: tracking changes over time is genuinely hard.


Problem 4: Personalization
Adapting the presented information content to personal requirements:
Users normally password-protect their details, so it becomes hard to access any such information.
Problems:
Where do we get the required (personal) information from?
Personalization vs. data security.


INTRODUCTION TO
SEMANTIC WEB TECHNOLOGIES
THE VISION OF THE SEMANTIC WEB


The vision of the Semantic Web
The Semantic Web concept was first introduced in the 1990s by Tim Berners-Lee, the inventor of the World Wide Web.
Precondition: content can be read and interpreted correctly (understood) by machines.
Existing approaches: natural language processing and the technologies of traditional information retrieval (search engines).
The Semantic Web approach: natural-language web content will be explicitly annotated with semantic metadata.
Semantic metadata encode the meaning (semantics) of the content and can be read and interpreted correctly by machines.

How Can We Achieve the Semantic Web?
The Original Vision
Instead of publishing information to be consumed by humans, publish machine-processable data and metadata using terms/languages that can be understood by machines.
Build machines (agents) that will search for, query, integrate, etc., this data.
Make sure all agents understand your terms/languages.


The Semantic Web and Linked Data Vision Today
The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the web.
The Semantic Web is about two things:
It is about common formats for the integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents.
It is also about language for recording how the data relates to real-world objects.
That allows a person, or a machine, to start off in one database and then move through an unending set of databases which are connected not by wires but by being about the same thing.


Semantic Web Technology Stack
Most applications use only a subset of the stack.
Querying allows fine-grained data access.
Standardized information exchange is key.
Formats are necessary but not too important.
The Semantic Web is based on the Web.


Basic Layer of the Semantic Web Technology Stack
The foundation of the stack is the World Wide Web; hence we rely on all of the Web's existing technologies.
The semantic version of Wikipedia is DBpedia.
Because Wikipedia uses templates, its data is already somewhat structured; DBpedia extracts data from the Wikipedia infoboxes.
DBpedia uses the machine-readable language RDF: it stores and publishes the results in RDF and a few other formats.
It also hosts a community effort to define extractors for the data that can be used well beyond Wikipedia.
It provides a number of services around the extracted data, such as DBpedia Mobile, a SPARQL endpoint, a faceted browser, a number of mappings to external ontologies, an ontology itself, etc. (A small sketch of querying that endpoint follows below.)
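The slides do not prescribe any client code for the DBpedia SPARQL endpoint; as one hedged illustration, the Python SPARQLWrapper library (an assumption, not something the deck mandates) can send a query to it. The dbo:birthPlace property and the Amitav_Ghosh resource are used purely as an example; whether a given fact is present depends on the live dataset.

```python
# A minimal sketch, assuming the SPARQLWrapper package and the public
# DBpedia endpoint; the property and resource below are illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?place WHERE {
        <http://dbpedia.org/resource/Amitav_Ghosh> dbo:birthPlace ?place .
    }
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["place"]["value"])   # e.g. a DBpedia resource URI for the birth place
```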


Semantic Web Technologies
A set of technologies and frameworks that enable the Web of Data:
Resource Description Framework (RDF).
A variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples); the same graph can be serialized in any of them (see the sketch below).
Notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL).
All are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.
A specialized query language (SPARQL) plays a role similar to SQL, but it works by matching and extracting graph patterns.
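As a hedged illustration of the interchange formats listed above, the rdflib Python library (an assumption; the slides name no toolkit) can serialize one and the same graph in several syntaxes. The ex: vocabulary is hypothetical.

```python
# A minimal sketch: one RDF graph, three serializations.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab/")      # hypothetical vocabulary
book = URIRef("http://isbn/000651409X")           # the book used later in the slides

g = Graph()
g.add((book, EX.title, Literal("The Glass Palace")))
g.add((book, EX.year, Literal(2000)))

print(g.serialize(format="xml"))      # RDF/XML
print(g.serialize(format="turtle"))   # Turtle
print(g.serialize(format="nt"))       # N-Triples
```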


Application in the Web of Data
Linked Data
Linked Open Data (LOD) denotes publicly available (RDF) data on the Web, identified via URIs and accessible via HTTP.
[Figure: the Linked Data cloud.]
Web of Data: more than 31 billion facts and more than 500 million links (October 2011).


What is so special about the BBC Music website?
Information is dynamically aggregated from external, publicly available data (Wikipedia, MusicBrainz, ...).
No screen scraping.
No specialized API.
Data is available as Linked Open Data.
Data access is via simple HTTP requests.
Data is always up to date without manual interaction.

How to build such a site 1.
Site editors roam the Web for new facts (and may discover further links while roaming).
They update the site manually.
And the site soon gets out of date.


How to build such a site 2.
Editors roam the Web for new data published on Web sites.
Scrape the sites with a program to extract the information, i.e., write some code to incorporate the new data.
Easily gets out of date again.


How to build such a site 3.
Editors roam the Web for new data available via APIs.
Understand those APIs: input and output arguments, datatypes used, etc.
Write some code to incorporate the new data.
Easily gets out of date again.


The choice of the BBC
Use external, public datasets: Wikipedia, MusicBrainz, ...
They are available as data, not as APIs or data hidden inside a Web site.
The data can be extracted using, e.g., HTTP requests or standard queries.


It's all documented


Search Engines: Document Retrieval
General problems:
Correct interpretation of the query string: somehow the context of the user has to be considered, e.g. what the user queried just before a specific query, or their usual preferences.
Correct identification of entities.
Automatic disambiguation.
Usability.
Personalization.


Intelligent Agents in the Semantic Web
[Figure: the World Wide Web and the Semantic Web compared. On the WWW, the user reaches the web documents through a presentation service (e.g. Firefox) and a retrieval service (e.g. Google); on the Semantic Web, a personal assistant mediates between the user and intelligent infrastructure services that work over the web documents.]

3 Generations of Web Documents
1st Generation: Static Web Pages (HTML / CSS).
2nd Generation: Interactive and Dynamic Web Pages (JavaScript / applets, database access, template-based generation).
3rd Generation: Virtual and Adaptive Web Pages (netbots, information extraction, presentation planning, user model, machine learning, online layout).


Toolbox for the Semantic Web
Standardized languages to express the semantics of information content on the Web (XML/XSD, RDF(S), OWL, RIF).
Tools for embedding semantic information in the Web (RDFa, GRDDL, ...).
Contributions from various fields of computer science: artificial intelligence, linguistics, cryptography, databases, theoretical computer science, computer architecture, software engineering, systems theory, computer networks.


Basic Architecture of Semantic Web - I
Uniform Different types of
resource identifiers all
constructed according to
uniform schema.
Resource Whatever may be
identified by URI
Identifier To distinguish one
resource from another

Ankur Biswas 4/4/2016 34


Uniform Resource Identifier (URI)
A Uniform Resource Identifier (URI) defines a simple and extensible schema for the worldwide unique identification of abstract or physical resources.
Resources can be any object with a clear identity (according to the context of the application), e.g. web pages, books, locations, persons, relations among objects, abstract concepts, etc.
The concept of a URI is already established in various domains, e.g.:
The Web: URL (uniform resource locator), URN (uniform resource name), PURL (persistent uniform resource locator).
Books and publications: ISBN, ISSN.
Digital Object Identifier (DOI).


Uniform Resource Identifier (URI)
A URI combines:
Address (locator): the Uniform Resource Locator (URL, RFC 1738) denotes where a resource can be found on the Web by stating its primary access mechanism; it might change during the resource's lifetime.
Identity (name): the Uniform Resource Name (URN, RFC 2141) is a persistent identifier for a web resource and remains unchanged during its life cycle.

URI generic syntax:
URI = scheme://[userinfo@]host[:port][/path][?query][#fragment]
Scheme: e.g. http, ftp, mailto.
Userinfo: e.g. username and password.
Host: e.g. a domain name or an IPv4/IPv6 address.
Port: e.g. :80 stands for the HTTP port.
Path: e.g. a path in the file system of the WWW server.
Query: e.g. parameters to be passed over to applications.
Fragment: e.g. determines a specific fragment of a document.
(A small sketch of splitting a URI into these components follows below.)
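As a hedged illustration of the generic syntax above, Python's standard urllib.parse module (an assumption; the slides reference no code) splits a URI into exactly these components. The URI itself is hypothetical.

```python
# A minimal sketch of dissecting a URI into scheme, userinfo/host/port,
# path, query and fragment with the standard library.
from urllib.parse import urlparse

uri = "http://user:[email protected]:80/docs/project.html?lang=en#history"  # hypothetical URI
parts = urlparse(uri)

print(parts.scheme)    # 'http'
print(parts.netloc)    # 'user:[email protected]:80'  (userinfo, host and port)
print(parts.hostname)  # 'info.cern.ch'
print(parts.port)      # 80
print(parts.path)      # '/docs/project.html'
print(parts.query)     # 'lang=en'
print(parts.fragment)  # 'history'
```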


Data on the Web is not enough
We need a proper infrastructure for a real Web of Data:
data is available on the Web, accessible via standard Web technologies;
data are interlinked over the Web;
i.e., data can be integrated over the Web.
This is where Semantic Web technologies come in.
We will use a simplistic example to introduce the main Semantic Web concepts.

The rough structure of data integration
Map the various data onto an abstract data representation, making the data independent of its internal representation.
Merge the resulting representations.
Start making queries on the whole: queries that are not possible on the individual data sets.


We start with a book...



A simplified bookstore dataset (dataset A)

Books:
ID: ISBN 0-00-651409-X | Author: id_xyz | Title: The Glass Palace | Publisher: id_qpr | Year: 2000

Authors:
ID: id_xyz | Name: Ghosh, Amitav | Homepage: http://www.amitavghosh.com

Publishers:
ID: id_qpr | Publisher's name: Harper Collins | City: London


1st: we export our data as a set of relations
[Figure: dataset A as a graph. The node http://isbn/000651409X carries the title "The Glass Palace" and the year 2000, points to a publisher node (Harper Collins, London), and has an a:author arc to an author node with a:name "Ghosh, Amitav" and a:homepage http://www.amitavghosh.com.]


Some notes on exporting the data
The relations form a graph:
the nodes refer to the real data or contain some literal;
how the graph is represented in the machine is immaterial for now.
Data export does not necessarily mean physical conversion of the data:
relations can be generated on the fly at query time,
via SQL bridges,
by scraping HTML pages,
by extracting data from Excel sheets,
etc.
One can export only part of the data.


Same book in French



Another bookstore dataset (dataset F), kept as a spreadsheet (columns A-D):

Row 1 (headers): ID | Titre | Traducteur | Original
Row 2: ISBN 2020386682 | Le Palais des Miroirs | $A12$ | ISBN 0-00-651409-X

Row 6 (headers): ID | Auteur
Row 7: ISBN 0-00-651409-X | $A11$

Row 10 (header): Nom
Row 11: Ghosh, Amitav
Row 12: Besse, Christianne

2nd: export your second set of data
[Figure: dataset F as a graph. The node http://isbn/2020386682 carries the title "Le palais des miroirs" and has an f:traducteur arc to a node with f:nom "Besse, Christianne"; the node http://isbn/000651409X has an f:auteur arc to a node with f:nom "Ghosh, Amitav".]


3rd: start merging your data
[Figure: the graphs of dataset A and dataset F placed side by side. Both contain the node http://isbn/000651409X (same URI!), which is where the two graphs can be joined.]


3rd: start merging your data (cont.)
[Figure: the merged graph. http://isbn/000651409X carries the title "The Glass Palace", the year 2000, the publisher node (Harper Collins, London), and both a:author and f:auteur arcs to the node with a:name "Ghosh, Amitav" and a:homepage http://www.amitavghosh.com; http://isbn/2020386682 ("Le palais des miroirs") points to it via f:original and to the node with f:nom "Besse, Christianne" via f:traducteur.]


Start making queries
The user of dataset F can now ask queries like:
"give me the title of the original"
(well, "donnez-moi le titre de l'original").
This information is not in dataset F,
but it can be retrieved by merging with dataset A!
(A small sketch of this merge-then-query step follows below.)
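As a hedged illustration of the map-merge-query steps, here is a minimal sketch using the rdflib Python library (an assumption; the slides name no toolkit). The a:/f: vocabularies and namespace URIs stand in for the slides' illustration and are not a real published dataset.

```python
# Map: each dataset is expressed as an RDF graph (here as inline Turtle).
from rdflib import Graph

english = Graph().parse(data="""
    @prefix a: <http://example.org/a/> .
    <http://isbn/000651409X> a:title "The Glass Palace" ;
        a:author [ a:name "Ghosh, Amitav" ;
                   a:homepage <http://www.amitavghosh.com> ] .
""", format="turtle")

french = Graph().parse(data="""
    @prefix f: <http://example.org/f/> .
    <http://isbn/2020386682> f:titre "Le Palais des Miroirs" ;
        f:original <http://isbn/000651409X> .
""", format="turtle")

# Merge: the union of the two triple sets; the shared URI ties them together.
merged = english + french

# Query: "the title of the original", which neither dataset can answer alone.
q = """
    PREFIX a: <http://example.org/a/>
    PREFIX f: <http://example.org/f/>
    SELECT ?title WHERE {
        <http://isbn/2020386682> f:original ?book .
        ?book a:title ?title .
    }
"""
for row in merged.query(q):
    print(row.title)   # -> The Glass Palace
```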


However, more can be achieved
We feel that a:author and f:auteur should be the same, but an automatic merge does not know that!
Let us add some extra information to the merged data:
a:author is the same as f:auteur;
both identify a Person;
"Person" is a term that a community may have already defined:
a Person is uniquely identified by his/her name and, say, homepage,
and it can be used as a category (type) for certain kinds of resources.
(One possible RDF encoding of this extra "glue" is sketched below.)
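The slides leave the encoding of the extra statements open; one hedged possibility is to state them with community vocabularies such as OWL and FOAF, here via rdflib. The a:/f: namespaces are again hypothetical stand-ins.

```python
# A minimal sketch of the extra "glue" statements as RDF triples.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDFS, FOAF

A = Namespace("http://example.org/a/")   # stand-in for the slides' a: vocabulary
F = Namespace("http://example.org/f/")   # stand-in for the slides' f: vocabulary

glue = Graph()
glue.add((A.author, OWL.equivalentProperty, F.auteur))  # a:author "same as" f:auteur
glue.add((A.author, RDFS.range, FOAF.Person))           # whatever a:author points to is a Person
glue.add((F.auteur, RDFS.range, FOAF.Person))           # likewise for f:auteur
```

With a query engine or reasoner that understands these terms, a question phrased with f:auteur can also reach facts that were stated with a:author.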


3rd revisited: use the extra knowledge
[Figure: the merged graph again, now enriched with the extra statements. The a:author and f:auteur arcs both lead to the "Ghosh, Amitav" node, the f:traducteur arc leads to the "Besse, Christianne" node, and both person nodes carry an r:type arc to http://foaf/Person.]


Start making richer queries!
The user of dataset F can now query:
"donnez-moi la page d'accueil de l'auteur de l'original"
(well, "give me the home page of the original's author").
This information is in neither dataset F nor dataset A,
but it was made available by:
merging dataset A and dataset F, and
adding three simple extra statements as glue.


Combine with different datasets
Using, e.g., the Person term, the dataset can be combined with other sources.
For example, data in Wikipedia can be extracted using dedicated tools:
the DBpedia project already extracts the infobox information from Wikipedia.


Merge with Wikipedia data
[Figure: the merged book graph extended with DBpedia data. The author node is linked via w:reference to http://dbpedia.org/../Amitav_Ghosh, which carries a foaf:name, a w:born_in arc to http://dbpedia.org/../Kolkata (with w:lat and w:long), and w:author_of arcs to http://dbpedia.org/../The_Glass_Palace (tied to the book via w:isbn), http://dbpedia.org/../The_Hungry_Tide and http://dbpedia.org/../The_Calcutta_Chromosome.]


Search Engines: Fact Retrieval
Query string: "International Space Station - 17th March 2016"
What is the International Space Station?
Is it orbiting on 17th March 2016?
How do we compute the position of the satellite on the said date?
External data to be considered:
constellation data,
planet data,
satellite data.


RDF
RDF stands for:
Resource: pages, dogs, ideas, ... everything that can have a URI.
Description: attributes, features, and relations of the resources.
Framework: model, languages and syntaxes for these descriptions.
RDF is a triple model, i.e. every piece of knowledge is broken down into
(subject, predicate, object).


RDF
The statement "doc.html has for author Ankur and has for theme Research" is broken down into two statements:
doc.html has for author Ankur
doc.html has for theme Research
which become the triples
(doc.html, author, Ankur)
(doc.html, theme, Research)
each of the form (subject, predicate, object).


RDF is also a graph model used to link the descriptions of resources:
RDF triples can be seen as the arcs of a graph (vertex, edge, vertex).
[Figure: the node Doc.html has an Author arc to "Ankur" and a Theme arc to "Research".]
(A short sketch of these two triples in code follows below.)
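As a hedged illustration only, the two doc.html triples can be written down with the rdflib Python library (an assumption; the ex: vocabulary and the document URI are invented for the example).

```python
# A minimal sketch: the slide's two triples, seen both as triples and as graph arcs.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab/")      # hypothetical vocabulary
doc = URIRef("http://example.org/doc.html")      # hypothetical document URI

g = Graph()
g.add((doc, EX.author, Literal("Ankur")))         # (doc.html, author, Ankur)
g.add((doc, EX.theme, Literal("Research")))       # (doc.html, theme, Research)

# Each (subject, predicate, object) triple is one vertex-edge-vertex arc of the graph.
for s, p, o in g:
    print(s, p, o)
```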


Resource Description Framework (RDF)
Another triple model:

Subject | Predicate | Object
Renee Miller | Teaches | CSC433
Renee Miller | Lives in | Toronto

In RDF the subject and predicate are URIs, and the object is a URI or a literal:
<http://cs.toronto.edu/~miller> <http://xmlns.com/foaf/spec/#term_based_near> <http://dbpedia.org/resource/Toronto>
<http://cs.toronto.edu/~miller> <http://xmlns.com/foaf/spec/#term_based_near> "Toronto"

[Figure: the resource bb:renee-j-miller with rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto. foaf stands for the Friend of a Friend vocabulary.]

A Simple RDF Example (in RDF/XML)
(The graph above: bb:renee-j-miller has rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto.)

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/spec/#"
         xmlns:bb="http://data.bibbase.org/ontology/">
  <rdf:Description rdf:about="http://.../author/renee-j-miller/">
    <rdf:type rdf:resource="http://xmlns.com/foaf/spec/#term_Person"/>
    <foaf:name xml:lang="en">Renee J. Miller</foaf:name>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Toronto"/>
  </rdf:Description>
</rdf:RDF>


A Simple RDF Example (in Turtle)
(The same graph as above.)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/spec/#> .
@prefix bb: <http://data.bibbase.org/ontology/> .

<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renee J. Miller"@en ;
    foaf:based_near <http://dbpedia.org/resource/Toronto> .


A Simple RDF Example (in RDFa)
(The same graph as above, embedded in HTML.)

<p about="http://.../author/renee-j-miller">The author
  <span property="foaf:name" lang="en">Renee J. Miller</span>
  lives in the city
  <span rel="foaf:based_near"
        resource="http://.../Toronto">Toronto</span>
</p>


SPARQL
SPARQL stands for SPARQL Protocol and RDF Query Language.
It is the standard query language for RDF data, proposed by the W3C.
It is based on matching graph patterns against RDF graphs.
The simplest kind of graph pattern is a triple pattern:
a triple pattern is like an RDF triple, but with the option of a variable in the subject, predicate or object position.


Example Dataset

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/spec/#> .
@prefix bb: <http://data.bibbase.org/ontology/> .

<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renee J. Miller"@en ;
    foaf:based_near [ rdf:type foaf:Place ;
                      foaf:name "Toronto" ] .


Example SPARQL Query

SELECT ?name
WHERE { ?x foaf:name ?name .
        ?x rdf:type foaf:Person .
        ?x foaf:based_near ?y .
        ?y foaf:name "Toronto" .
}

Result:
?name
"Renee J. Miller"@en
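As a hedged illustration, the example dataset and query above can be run locally with the rdflib Python library (an assumption; the slides do not name a SPARQL engine).

```python
# A minimal sketch: load the example Turtle data and run the example query.
from rdflib import Graph

data = """
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/spec/#> .

<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renee J. Miller"@en ;
    foaf:based_near [ rdf:type foaf:Place ;
                      foaf:name "Toronto" ] .
"""

query = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/spec/#>
SELECT ?name
WHERE { ?x foaf:name ?name .
        ?x rdf:type foaf:Person .
        ?x foaf:based_near ?y .
        ?y foaf:name "Toronto" .
}
"""

g = Graph()
g.parse(data=data, format="turtle")
for row in g.query(query):
    print(row.name)   # -> Renee J. Miller
```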


Example SPARQL Query (cont.)
[Figure: a further example query and its result.]


SPARQL 1.0 allows
Extraction of data as:
URIs, blank nodes, typed and untyped literals,
RDF subgraphs.
Exploration of the data via queries for unknown relations.
Execution of complex join operations over heterogeneous databases in a single query.
Transformation of RDF data from one vocabulary to another.
Construction of new RDF graphs based on query subgraphs (CONSTRUCT queries; see the sketch below).
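As a hedged illustration of vocabulary transformation and graph construction, a SPARQL CONSTRUCT query can be run with the rdflib Python library (an assumption); the ex: vocabulary below is invented for the example, while foaf:name is the real FOAF property.

```python
# A minimal sketch: rewrite ex:name statements as foaf:name in a new graph.
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/vocab/> .
    <http://example.org/people/1> ex:name "Ankur" .
""", format="turtle")

construct = """
PREFIX ex:   <http://example.org/vocab/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { ?person foaf:name ?name . }
WHERE     { ?person ex:name   ?name . }
"""

new_graph = g.query(construct).graph        # the newly constructed RDF graph
print(new_graph.serialize(format="turtle"))
```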


SPARQL 1.1 (in progress) allows
Additional query features:
aggregate functions, subqueries, negation, projected expressions, property paths, ...
Logical entailment regimes for RDF, RDFS, OWL (Direct and RDF-Based Semantics) and RIF Core.
Updates of RDF graphs through a full data manipulation language.
Discovery of information about a SPARQL service.
Federated queries distributed over different SPARQL endpoints.


SPARQL usage in practice
SPARQL is usually used over the network.
Separate documents define the protocol and the result formats:
the SPARQL Protocol for RDF, with HTTP and SOAP bindings;
SPARQL results in XML or JSON formats.
Big datasets often offer SPARQL endpoints using this protocol.
Typical example: the SPARQL endpoint to DBpedia.
(A bare-HTTP sketch of the protocol follows below.)
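As a hedged illustration of the protocol (the slides show no client code), a SPARQL endpoint can be queried with a plain HTTP GET: the query text travels in the query parameter and the Accept header asks for JSON results. The public DBpedia endpoint and the Python requests library are assumptions of this sketch.

```python
# A minimal sketch of the SPARQL Protocol over plain HTTP.
import requests

endpoint = "https://dbpedia.org/sparql"
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Semantic_Web> rdfs:label ?label .
} LIMIT 5
"""

response = requests.get(
    endpoint,
    params={"query": query},                                  # SPARQL Protocol: query parameter
    headers={"Accept": "application/sparql-results+json"},    # ask for the JSON result format
)
for binding in response.json()["results"]["bindings"]:
    print(binding["label"]["value"])
```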


SPARQL as a unifying point
[Figure: applications talk to SPARQL endpoints; behind the endpoints, a SPARQL processor works over a triple store database, over an RDF graph mapped from a relational database via SQL, and over RDF extracted from unstructured text and HTML/XML/XHTML with NLP techniques.]
Based on a presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

Other Semantic Web Technologies
Web Ontology Language (OWL):
a family of knowledge representation languages for authoring ontologies for the Web.
RDF Schema (RDFS):
the RDF Vocabulary Description Language (http://www.w3.org/TR/rdf-schema/);
describes how to use RDF to describe RDF vocabularies.
Other RDF vocabularies:
Simple Knowledge Organization System (SKOS):
designed for the representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
FOAF (Friend of a Friend):
a machine-readable ontology describing persons, their activities and their relations to other people and objects.


ONTOLOGIES
EXISTENCE OR BEING


Ontologies
An ontology is a formal, explicit, shared specification of a conceptualization of a domain (Gruber, 1993).
Conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest, and the relationships that hold among them. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose.
The term "ontology" is borrowed from philosophy, where ontology is a systematic account of existence (what things exist, how they can be differentiated from each other, etc.).
Today the word "ontology" is used as a synonym for a shared knowledge base.


Ontologies: Components & Models
Classes, relations & instances.
Classes represent concepts.
Classes are described by attributes; attributes are name-value pairs.
Informal description: "The address contains the name, title and place of address of a person."
Semi-formal description:
Address
  First name <string>
  Family name <string>
  Street <string>
  PIN Code <int>
  City <string>
(One way to make this description fully formal is sketched below.)
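As a hedged illustration (not part of the slides), the Address description can be turned into a small formal vocabulary with RDF Schema, here written with the rdflib Python library; the ex: namespace and property names are invented for the example.

```python
# A minimal sketch: the Address class, its attributes as properties, and one instance.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

EX = Namespace("http://example.org/addressbook/")   # hypothetical namespace

g = Graph()
g.add((EX.Address, RDF.type, RDFS.Class))            # the class (concept)

# Each attribute becomes a property with a domain and a datatype range.
for prop, datatype in [
    (EX.firstName,  XSD.string),
    (EX.familyName, XSD.string),
    (EX.street,     XSD.string),
    (EX.pinCode,    XSD.integer),
    (EX.city,       XSD.string),
]:
    g.add((prop, RDF.type, RDF.Property))
    g.add((prop, RDFS.domain, EX.Address))
    g.add((prop, RDFS.range, datatype))

# An instance of the class:
g.add((EX.addr1, RDF.type, EX.Address))
g.add((EX.addr1, EX.city, Literal("Kolkata")))
```
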
Learning Ontologies



Very Large Ontologies
Recently there has been a lot of work on developing very large ontologies that capture various areas of human knowledge, and on deploying this knowledge in applications such as search engines or question answering.
Example: Watson, IBM's question-answering system that beat humans in the quiz show Jeopardy! (http://www-03.ibm.com/innovation/us/watson/index.html).


5-Star Open Data by Tim Berners-Lee
Tim Berners-Lee, the inventor of the Web and initiator of Linked Data, suggested a 5-star deployment scheme for Open Data. Here we give an example for each of the stars and explain the costs and benefits that come along with it.


BY EXAMPLE
★ Make your stuff available on the Web (whatever format) under an open license.
★★ Make it available as structured data (e.g., Excel instead of an image scan of a table).
★★★ Make it available in a non-proprietary open format (e.g., CSV as well as Excel).
★★★★ Use URIs to denote things, so that people can point at your stuff.
★★★★★ Link your data to other data to provide context.


What are the costs & benefits of ★ Web data?
As a consumer:
You can look at it.
You can print it.
You can store it locally (on your hard drive or on a USB stick).
You can enter the data into any other system.
You can change the data as you wish.
You can share the data with anyone you like.
As a publisher:
It's simple to publish.
You do not have to explain repeatedly to others that they can use your data.


What are the costs & benefits of ★★ Web data?
As a consumer, you can do everything you can do with ★ Web data, and additionally:
You can directly process it with proprietary software to aggregate it, perform calculations, visualize it, etc.
You can export it into another (structured) format.
As a publisher:
It's still simple to publish.


What are the costs & benefits of ★★★ Web data?
As a consumer, you can do everything you can do with ★★ Web data, and additionally:
You can manipulate the data in any way you like, without the need to own any proprietary software package.
As a publisher:
You might need converters or plug-ins to export the data from the proprietary format.
It's still rather simple to publish.


What are the costs & benefits of ★★★★ Web data?
As a consumer, you can do everything you can do with ★★★ Web data, and additionally:
You can link to it from any other place (on the Web or locally).
You can bookmark it.
You can reuse parts of the data.
You may be able to reuse existing tools and libraries, even if they only understand parts of the pattern the publisher used.
Understanding the structure of an RDF graph of data can be more effort than tabular (Excel/CSV) or tree (XML/JSON) data.
You can combine the data safely with other data: URIs are a global scheme, so if two things have the same URI then it's intentional, and if so, that's well on its way to being 5-star data!
As a publisher:
You have fine-granular control over the data items and can optimize their access (load balancing, caching, etc.).
Other data publishers can now link into your data, promoting it to 5 stars!
You typically invest some time slicing and dicing your data.
You'll need to assign URIs to data items and think about how to represent the data.
You need to either find existing patterns to reuse or create your own.


What are the costs & benefits of ★★★★★ Web data?
As a consumer, you can do everything you can do with ★★★★ Web data, and additionally:
You can discover more (related) data while consuming the data.
You can directly learn about the data schema.
You now have to deal with broken data links, just like 404 errors in web pages.
Presenting data from an arbitrary link as fact is as risky as letting people include content from any website in your pages: caution, trust and common sense are all still necessary.
As a publisher:
You make your data discoverable.
You increase the value of your data.
Your own organization will gain the same benefits from the links as the consumers.
You'll need to invest resources to link your data to other data on the Web.
You may need to repair broken or incorrect links.

Applications
Data integration (e.g., see the Optique project, http://www.optique-project.eu/).
E-government (e.g., open data).
E-commerce.
Tourism.
Medicine.
Biology.
Earth observation (see the work of my group in the projects TELEIOS, http://www.earthobservatory.eu/, and LEO, http://www.linkedeodata.eu/).


References:
Books:
Antoniou, Grigoris, and Frank van Harmelen. A Semantic Web Primer. Cambridge, MA: The MIT Press, 2004.
Segaran, Toby, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O'Reilly Media, Inc., 2009.
Davies, John, Dieter Fensel, and Frank van Harmelen. Towards the Semantic Web: Ontology-Driven Knowledge Management. Chichester: Wiley, 2003.

Scientific papers:
Maedche, Alexander. Ontology Learning for the Semantic Web. Vol. 665. Springer Science & Business Media, 2012.
Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in Different Topical Domains." The Semantic Web - ISWC 2014. Springer International Publishing, 2014. 245-260.

Video lectures & slides:
Video lectures on the Semantic Web by Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam, Germany.
www.cs.toronto.edu/~oktie/slides/web-of-data-intro.pdf
https://www.w3.org/2010/Talks/0622-SemTech-IH/

Websites:
http://dbpedia.org/snorql/
http://5stardata.info/en/

Thank You