
Chap1 - Introduction To DSS and XML

This document provides an introduction to semistructured data and XML. It defines unstructured, structured, and semistructured data, explaining that semistructured data lies between unstructured and structured data and includes data formats like XML. The document then discusses XML in more detail, covering XML syntax rules, elements, attributes, namespaces, and parsers. It provides examples of XML code and definitions of key XML concepts.

Uploaded by

raniach

Introduction to Semistructured Data and XML

Semistructured Data
3rd year LMD
Department of computer science
University of Laghouat
Academic year 2023/2024
Different types of Data
o Digital data can be classified into three forms:

Unstructured
Semi-structured
Structured
Unstructured Data
o Unstructured data does not conform to a data model and is
not in a form that a computer program can use easily.

o Videos, audio, and binary data files usually have no specific
structure, so they are classified as unstructured data.
Structured Data
o Structured data is data that has a predefined data model
(schema).

o Structured data is generally tabular data, represented by
columns and rows in a database.

o In structured data, every row in a table has the same set of
columns.
Semi-Structured Data
o Semi-structured data lies between unstructured and
structured data.

o Semi-structured data does not have a rigid, predefined
structure like a relational database with tables.

o Semi-structured data uses tags or other markers to separate
the elements and to provide a hierarchy of records and fields
that describes the data.

o It does not conform to a fixed data model, but it still has
some structure.

o For example, emails, JSON documents, and XML files and other
markup-language documents are all forms of semi-structured data.
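To make the idea of "some structure, but no fixed schema" concrete, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The element names and values (people, person, email, phone, city) are invented for this illustration; the point is only that two records of the same kind may carry different fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: both records are <person> elements, but they do not
# share the same set of fields, which a relational table would require.
DATA = """
<people>
  <person><name>Amina</name><email>amina@example.org</email></person>
  <person><name>Karim</name><phone>555-0100</phone><city>Laghouat</city></person>
</people>
"""

def fields_per_record(xml_text):
    """Return, for each record, the list of field names it contains."""
    root = ET.fromstring(xml_text)
    return [[child.tag for child in person] for person in root]

records = fields_per_record(DATA)
print(records)
```

The tags still let a program find each field by name, even though no schema says in advance which fields a record has.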
XML – A Solution for Semi-structured Data Management
o XML stands for eXtensible Markup Language.
o XML is a format for storing data, although it looks a lot
like HTML.
o XML does not define tag names, so document authors
must invent their own set of tags.
o With XML, you can exchange data between
incompatible systems.
Basic difference between XML and HTML

o HTML focuses on how data is displayed in a browser,
whereas XML focuses on how data is represented.
o XML describes what is in the document. In other
words, XML is concerned with how information is
organized, not how it is displayed.
o XML describes what the data is, while HTML
determines how to display the data to the end
user.
XML Syntax Rules (Well-Formed XML)
o XML documents must contain exactly one root element
that is the parent of all other elements.
o XML tags are case sensitive.
o An element must have both an opening and a
closing tag, unless it is an empty element.
o All XML elements must be properly nested.
o Attribute values must always be quoted.
<?xml version="1.0" encoding="UTF-8"?>
<!-- bibliography.xml 15 oct 2010 -->
<!DOCTYPE bibliography SYSTEM "bibliography.dtd" >
<bibliography>
  <book key="Michard01" lang="fr">
    <title>XML langage et applications</title>
    <author>Alain Michard</author>
    <year>2001</year>
    <publisher>Eyrolles</publisher>
    <isbn>2-212-09206-7</isbn>
  </book>
  <book key="Zeldman03" lang="en">
    <title>Designing with web standards</title>
    <author>Jeffrey Zeldman</author>
    <year>2003</year>
    <publisher>New Riders</publisher>
    <isbn>0-7357-1201-8</isbn>
  </book>
  ...
</bibliography>
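The well-formedness rules can be tried out with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the tiny test documents are made up for this example. Each document breaks at most one of the rules listed above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

checks = {
    "single root, properly nested": is_well_formed("<a><b>text</b></a>"),
    "two root elements": is_well_formed("<a/><b/>"),
    "case mismatch in tags": is_well_formed("<Title>x</title>"),
    "improper nesting": is_well_formed("<a><b></a></b>"),
    "unquoted attribute value": is_well_formed("<a key=v/>"),
    "empty element": is_well_formed('<a key="v"/>'),
}

for rule, ok in checks.items():
    print(f"{rule}: {'well formed' if ok else 'rejected'}")
```

Only the first and last documents are accepted; the parser stops at the first error in each of the others.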
XML declaration
o An XML document can optionally start with an XML declaration.

o The code between <?xml and ?> is called the XML
declaration.

o If an XML document has an XML declaration, that
declaration must be the first thing in the document.

o The declaration contains information for the XML
processor (the program reading the XML). Here, version="1.0"
indicates that the document conforms to version 1.0 of the
XML standard.

o The encoding declaration names the character encoding used in
the document. When no encoding is declared, UTF-8 is the default.
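The "first thing in the document" rule is strict: even a single newline before the declaration is an error. A small sketch, again assuming Python's xml.etree.ElementTree; the parses helper and the note element are hypothetical names used only here.

```python
import xml.etree.ElementTree as ET

def parses(xml_bytes):
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False

# The declaration is optional...
with_declaration = parses(b'<?xml version="1.0" encoding="UTF-8"?><note/>')
without_declaration = parses(b"<note/>")

# ...but if present, nothing may come before it, not even a newline.
declaration_not_first = parses(b'\n<?xml version="1.0" encoding="UTF-8"?><note/>')

print(with_declaration, without_declaration, declaration_not_first)
```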
Comments
o XML comments are syntactically similar to HTML comments.

o Just as in HTML, they begin with <!-- and end with the first
occurrence of -->.

o The double hyphen -- must not appear anywhere inside the
comment until the closing -->.

o Comments may appear anywhere in the character data of a
document. They may also appear before or after the root
element.

o Comments may not appear inside a tag or inside another
comment.
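These comment rules can be checked the same way as the other syntax rules. The following sketch assumes Python's xml.etree.ElementTree; the parses helper and the sample documents are invented for the illustration.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Comments before the root, inside character data, and after the root.
allowed_places = parses(
    "<!-- before root --><a>text<!-- inside content --></a><!-- after root -->"
)

# A double hyphen inside the comment body is not allowed.
double_hyphen = parses("<a><!-- two -- hyphens --></a>")

# A comment may not sit inside a tag.
inside_tag = parses("<a <!-- not allowed --> ></a>")

print(allowed_places, double_hyphen, inside_tag)
```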
<!DOCTYPE>
o Syntax: <!DOCTYPE root-element SYSTEM "URI_of_DTD">

o The <!DOCTYPE> declaration allows you to specify a DTD for
an XML document.

o The SYSTEM variant specifies the URI location of a DTD for
private use in the document.
Processing Instructions
o Syntax: <?target attribute1="value" attribute2="value" ... ?>

o A processing instruction begins with <? and ends with ?>.

o It is used to provide information to the application processing
an XML document. Such information may include
instructions on how to process the document, how to
display the document, and so forth.

o The most common processing instruction, xml-stylesheet, is
used to attach stylesheets to documents. It always appears
before the root element:

o <?xml-stylesheet type="text/xsl" href="biblio.xsl"?>
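As a rough sketch of how an application sees a processing instruction, the code below uses Python's standard xml.dom.minidom module, which keeps processing instructions as nodes in the document tree. The bibliography root element is borrowed from the earlier example; the variable names are arbitrary.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="biblio.xsl"?><bibliography/>'
)

# The processing instruction is the first child of the document,
# placed before the root element.
pi = doc.firstChild
is_pi = pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE
target = pi.target   # the name right after "<?"
data = pi.data       # everything else, as one uninterpreted string

print(is_pi, target, data)
```

Note that the parser hands over type and href as a single string: they look like attributes, but interpreting them is left to the application named by the target.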


Elements in XML
o Elements look like this: <element></element>

o Elements can contain text, other elements, or a combination
of both. An element with no content is empty and can be
written <element/>.

o An element name must start with a letter or an underscore.

o Element names can contain letters, digits, hyphens,
underscores, and periods, plus colons when namespaces are used
(more on namespaces later).

o Element names cannot contain spaces.
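One quick way to test a candidate element name is to let a parser try it. This is only a sketch, assuming Python's xml.etree.ElementTree; the helper is_usable_element_name and the sample names are hypothetical. Names containing a colon are left out because they also need a namespace declaration.

```python
import xml.etree.ElementTree as ET

def is_usable_element_name(name):
    """Write the name as an empty element and see whether the parser
    accepts it and reports the same name back."""
    try:
        return ET.fromstring(f"<{name}/>").tag == name
    except ET.ParseError:
        return False

candidates = ["book", "_note", "chapter-1.2", "1book", "-book", "my book"]
results = {name: is_usable_element_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {'ok' if ok else 'rejected'}")
```

The first three names are accepted; the ones that start with a digit or a hyphen, or that contain a space, are rejected.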


Attributes
o In the element start tag you can add more information
about the element in the form of attributes:
<element attribute="value"></element>

o An attribute is a name-value pair.

o The attribute value is always quoted, using either single or
double quotes.

o Attribute names are subject to the same restrictions as
element type names.

o Example: <price currency="Euro">
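A short sketch of how a program reads attributes, assuming Python's xml.etree.ElementTree. The price element comes from the example above; the content 12.50 and the unit attribute are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Double and single quotes are equivalent for attribute values.
price = ET.fromstring('<price currency="Euro">12.50</price>')
price_single = ET.fromstring("<price currency='Euro'>12.50</price>")

currency = price.get("currency")
currency_single = price_single.get("currency")

# Asking for an attribute that is absent returns the supplied default.
unit = price.get("unit", "n/a")

print(currency, currency_single, unit, price.attrib)
```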


Entity References
o Entity references are used as substitutions for specific
characters in XML.
o A common use for entity references is to denote document
symbols that might otherwise be mistaken for markup by an
XML processor.
o XML predefines five entity references for you, which are
substitutions for basic markup symbols. However, you can
define as many entity references as you like in your own
DTD.
o Entity references always begin with an ampersand & and
end with a semicolon ; .
o The five predefined entity references in XML are:
   &lt;     <    less than
   &gt;     >    greater than
   &amp;    &    ampersand
   &apos;   '    apostrophe
   &quot;   "    quotation mark
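The sketch below shows entity references doing their job in both directions, assuming Python's xml.sax.saxutils.escape for writing and xml.etree.ElementTree for reading. The condition element and the sample text are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text that would be mistaken for markup if written as-is.
raw = "if a < b & c > d"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)

# The parser turns the entity references back into the characters.
round_trip = ET.fromstring(f"<condition>{escaped}</condition>").text

# All five predefined entity references in one element.
all_five = ET.fromstring("<t>&lt; &gt; &amp; &apos; &quot;</t>").text

print(escaped)
print(round_trip)
print(all_five)
```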
CDATA Sections
o Syntax: <![CDATA[ ... ]]>

o You can define special marked sections of character data, or
CDATA, which the XML processor will not attempt to interpret
as markup.

o Anything that is included inside a CDATA marked section is
treated as plain text.

o CDATA marked sections begin with the characters <![CDATA[
and end with the characters ]]>.

o Note that entity references are not recognized inside a CDATA
marked section; they are kept as literal text.
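A minimal sketch of what a parser returns for a CDATA section, assuming Python's xml.etree.ElementTree. The code element and its content are made up; the content deliberately contains <, && and an entity reference.

```python
import xml.etree.ElementTree as ET

snippet = ET.fromstring(
    "<code><![CDATA[if (a < b && c) { print('&lt;'); }]]></code>"
)

# The markers <![CDATA[ and ]]> are dropped; everything between them
# comes back unchanged, including the characters "&lt;".
content = snippet.text
print(content)
```

Outside a CDATA section the same text would have needed &lt; and &amp; for the special characters, and the &lt; in the string would have been turned into <.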
XML Namespaces
o An XML namespace is a collection of XML element and
attribute names identified by a URI.

o XML namespaces provide a way to avoid element name
conflicts.

o They help us to distinguish identically named elements from
one another.

o Namespaces are similar to packages in Java.


Declaring Namespaces
Namespaces are declared as an attribute of an element.

You specify an XML namespace through one of two reserved
attributes:

o You can specify a default XML namespace URI using the
xmlns attribute.

o You can specify a non-default XML namespace URI using the
xmlns:prefix attribute, where prefix is a unique prefix
associated with this XML namespace.
How Do I Declare a Default Namespace?
o A namespace declared without a prefix becomes the default
namespace for the element on which it is declared and for
all of its descendants.

o The xmlns attribute has the following syntax:
xmlns="namespaceURI"

<BOOK xmlns="www.book.com" ISBN="....">
  <TITLE>Creepy Crawlies</TITLE>
  <PRICE currency="US Dollar">22.95</PRICE>
</BOOK>
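To see what a default namespace does to names, the sketch below parses the BOOK example with Python's xml.etree.ElementTree, which reports each name as {namespaceURI}localname. The ISBN value "...." is kept exactly as elided in the example.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<BOOK xmlns="www.book.com" ISBN="....">'
    '<TITLE>Creepy Crawlies</TITLE>'
    '<PRICE currency="US Dollar">22.95</PRICE>'
    '</BOOK>'
)

# BOOK and its children are all in the default namespace.
tags = [book.tag] + [child.tag for child in book]

# Unprefixed attributes are NOT placed in the default namespace,
# and the xmlns declaration itself is not reported as an attribute.
attribute_names = list(book.attrib)

print(tags)
print(attribute_names)
```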
How Do I Declare an Explicit Namespace?
The xmlns:prefix attribute has the following syntax:
xmlns:prefix="namespaceURI"

<BOOKS>
  <b:BOOK xmlns:b="www.book.com"
          xmlns:m="urn:Finance:Money">
    <b:TITLE>Creepy Crawlies</b:TITLE>
    <b:PRICE m:currency="US Dollar">22.95</b:PRICE>
  </b:BOOK>
</BOOKS>
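The prefix is only a local shorthand; what identifies the namespace is the URI. The sketch below, assuming Python's xml.etree.ElementTree, searches the BOOKS example using different prefixes (bk and money, chosen arbitrarily here) mapped to the same URIs.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    '<BOOKS>'
    '<b:BOOK xmlns:b="www.book.com" xmlns:m="urn:Finance:Money">'
    '<b:TITLE>Creepy Crawlies</b:TITLE>'
    '<b:PRICE m:currency="US Dollar">22.95</b:PRICE>'
    '</b:BOOK>'
    '</BOOKS>'
)

# Prefixes used for searching; they need not match those in the document.
ns = {"bk": "www.book.com", "money": "urn:Finance:Money"}

title = books.find("bk:BOOK/bk:TITLE", ns).text
price = books.find("bk:BOOK/bk:PRICE", ns)

# A prefixed attribute is reported with its namespace URI.
currency = price.get("{urn:Finance:Money}currency")

# BOOKS has no prefix and no default namespace applies to it.
root_tag = books.tag

print(title, price.text, currency, root_tag)
```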
Example
<doc xmlns:m="http://www.w3.org/1998/Math/MathML"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>My Document</title>
  <body xmlns="http://www.w3.org/1999/xhtml">
    <p xlink:href="alternate.xml">I am a paragraph containing some mathematics:
      <m:math><m:mi>x</m:mi></m:math>
    </p>
  </body>
</doc>

o The 'doc' element has no namespace.

o The 'title' element has no namespace.

o The 'body' element declares XHTML as the default namespace and is itself in that namespace.

o The 'p' element is in the default XHTML namespace.

o The 'href' attribute is in the namespace associated with the 'xlink' prefix declared on the
'doc' element.

o The 'math' element is in the namespace associated with the 'm' prefix declared on the
'doc' element.
Well-Formed XML Documents & Valid XML Documents
o An XML document with correct syntax is called "well
formed".

o A "valid" XML document must be well formed. In addition, it
must conform to a document type definition or schema.

o There are two kinds of document definition that can
be used with XML:

   o DTD - The original Document Type Definition

   o XML Schema - An XML-based alternative to DTD
XML Parsers
o An XML parser is a software library or package that provides
interfaces for client applications to work with an XML
document.

o The XML parser reads the XML and gives programs a way
to use its content.

o A parser's first task is to check an XML document's syntax and
make sure the document is well formed.

o The second task for some parsers is to verify that an XML
document is valid according to the rules of a DTD or schema.

o Parsers that perform the validation step are called validating
parsers, while parsers that don't are called non-validating
parsers.
END.
