XML Hierarchical (Tree) Data Model
The XML data model's basic object is the XML document
The 2 main structural elements used to construct an XML
document are elements and attributes
Additional concepts: entities, identifiers and references
Complex elements (internal tree nodes) are constructed
from other elements hierarchically
Simple elements (leaf tree nodes) contain data values
There is no limit on the level of nesting of elements
Identify complex and simple elements in <projects>
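The distinction between complex and simple elements can be sketched in a few lines of Python. This is only an illustration: the <projects> document below and its element names (project, name, number, worker, ssn, hours) are assumptions, since the original figure is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical <projects> document; element names are assumptions
# made for illustration, not the contents of the original figure.
PROJECTS_XML = """
<projects>
  <project>
    <name>ProductX</name>
    <number>1</number>
    <worker>
      <ssn>123456789</ssn>
      <hours>32.5</hours>
    </worker>
  </project>
</projects>
"""

def classify_elements(root):
    """Return (complex_tags, simple_tags) for the tree rooted at root."""
    complex_tags, simple_tags = set(), set()
    for elem in root.iter():
        # An element with child elements is an internal (complex) node;
        # an element without children is a leaf (simple) node with a data value.
        if len(elem) > 0:
            complex_tags.add(elem.tag)
        else:
            simple_tags.add(elem.tag)
    return complex_tags, simple_tags

root = ET.fromstring(PROJECTS_XML)
complex_tags, simple_tags = classify_elements(root)
print("complex:", sorted(complex_tags))
print("simple:", sorted(simple_tags))
```

Here projects, project and worker are internal tree nodes, while name, number, ssn and hours are leaves.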
Slide 27- 1
XML Hierarchical (Tree) Data Model
Figure: A complex XML element called <projects>
XML DTD
Figure: An XML DTD file called projects
Main types of XML documents
I. Data-centric XML documents:
have many small data items that follow a specific
structure and hence may be extracted from a
structured database.
II. Document-centric XML documents:
contain large amounts of text (e.g., articles, books)
They contain few or no structured data
elements
III. Hybrid XML documents:
they have parts that contain structured data and
other parts that are mostly textual or unstructured
If an XML document conforms to a specific
Schema/DTD, then it is considered
structured data
XML allows for documents that do not conform
to any Schema/DTD.
These are considered semistructured
data or schemaless XML documents
For a schemaless document, the value of the standalone
attribute in the XML declaration is yes
XML DTD and XML Schema
Limitations of XML DTD
First, the data types in DTD are not very general.
Second, DTD has its own special syntax and so it
requires specialized processors.
It would be advantageous to specify XML schema
documents using the syntax rules of XML itself so
that the same processors for XML documents can
process XML schema descriptions.
Third, all DTD elements are always forced to follow
the specified ordering of the document so
unordered elements are not permitted.
This is why XML Schema was developed.
XML Schema
Figure: XML schema file called company
XML Schema
The XML schema language is a standard for
specifying the structure of an XML document
It uses the same syntax rules as regular XML
Storing data in native XML format has been
proposed as an alternative to relational databases
The previous XML schema file company would
specify the structure of the COMPANY database,
if it were stored in a native XML system
XML schema is based on the tree data model
(elements, attributes), borrowing concepts from
database and object models (keys, references,
identifiers)
XML Schema
1. Schema Descriptions and XML Namespaces
It is necessary to identify the specific set of XML
schema language elements (tags) by a file stored at
a Web site location.
The second line in our example specifies the file used in
this example: "http://www.w3.org/2001/XMLSchema".
Each such definition is called an XML namespace.
The namespace name is assigned to the prefix xsd (XML
schema description) using the attribute xmlns (XML
namespace), and this prefix is used on
all XML schema commands (tag names).
When we write xsd:element, we refer to the
definition of the element tag in this Web file
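How the xsd prefix resolves to the namespace can be seen with Python's standard xml.etree.ElementTree parser, which expands every prefixed tag to {namespace}localname. The two tiny schema fragments below are assumptions written for this sketch; they are not the company schema from the figures.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Two hypothetical schema fragments that differ only in the prefix chosen.
SCHEMA_XSD_PREFIX = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company"/>
</xsd:schema>"""

SCHEMA_XS_PREFIX = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="company"/>
</xs:schema>"""

def expanded_tags(xml_text):
    """Return the namespace-expanded tag of every element in the document."""
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()]

tags_xsd = expanded_tags(SCHEMA_XSD_PREFIX)
tags_xs = expanded_tags(SCHEMA_XS_PREFIX)
print(tags_xsd)

# Searching by prefix requires telling the parser which namespace it stands for.
root = ET.fromstring(SCHEMA_XSD_PREFIX)
company_decl = root.find("xsd:element", {"xsd": XSD_NS})
print(company_decl.get("name"))
```

Both fragments yield the same expanded tags, which suggests that the prefix itself is only a local abbreviation and that the namespace name is what identifies the tag definitions.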
XML Schema
2. Annotations, documentation, language used:
The xsd:annotation and xsd:documentation tags
are used for providing comments and other
descriptions in the XML document.
The attribute xml:lang of the xsd:documentation
element specifies the language being used, e.g.,
“en”
XML Schema
3. Elements and types:
Next, we specify the root element of our XML
schema. In XML schema, the name attribute of the
xsd:element tag specifies the element name, which
is called company for the root element in our
example.
The structure of the company root element is
specified by xsd:complexType.
This is further specified to be a sequence of
departments, employees and projects, using the
xsd:sequence tag
XML Schema
4. First-level elements in the company database:
These elements are named employee, department, and
project, and each is specified in an xsd:element tag.
If a tag has only attributes and no further sub-elements
or data within it, it can be ended with the slash
symbol (/>) directly
(instead of a separate matching end tag)
These are called Empty Elements.
Examples: the xsd:element elements named
department and project
XML Schema
5. Specifying element type and minimum and
maximum occurrences:
The attributes type, minOccurs and maxOccurs in the
xsd:element tag are used for specifying the type and
lower and upper bounds on the number of occurrences
of each element. (ER: min/max, DTD: +,*,?)
If we specify a type attribute in an xsd:element,
the structure of the element will be described separately,
typically using the xsd:complexType element of XML
Schema.
Example: employee, department, project elements
If we don’t specify a type attribute in an xsd:element,
the structure of the element will be described directly
following the tag. Example: company (root) element
If minOccurs and maxOccurs are omitted, the default is exactly one occurrence.
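The type and occurrence attributes can be read back from a schema with a short Python sketch. The fragment below is an assumption modeled on the description above, not the schema from the figures; in particular, the project declaration deliberately omits minOccurs and maxOccurs to show the default. The helper occurrence_bounds is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical fragment; bounds on project are omitted on purpose.
SCHEMA_XML = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" type="Department"
                     minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="employee" type="Employee"
                     minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="project" type="Project"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

def occurrence_bounds(element_decl):
    """Return (min, max) occurrences; max is None when unbounded."""
    lower = int(element_decl.get("minOccurs", "1"))   # default: 1
    upper_raw = element_decl.get("maxOccurs", "1")    # default: 1
    upper = None if upper_raw == "unbounded" else int(upper_raw)
    return lower, upper

schema = ET.fromstring(SCHEMA_XML)
# Path to the sequence inside the root element's complex type.
path = "/".join(XSD + tag for tag in ("element", "complexType", "sequence"))
sequence = schema.find(path)

bounds = {}
for decl in sequence:
    bounds[decl.get("name")] = (decl.get("type"), occurrence_bounds(decl))
    print(decl.get("name"), bounds[decl.get("name")])
```

In ER terms, (0, None) corresponds to a (0, N) min/max constraint and to * in a DTD, while (1, 1) is the default of exactly one occurrence.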
XML Schema
6. Specifying Keys:
For specifying primary keys, the tag xsd:key is used.
For specifying foreign keys, the tag xsd:keyref is used.
The xsd:unique tag specifies elements that correspond to
unique attributes in a relational database that are not
primary keys.
Such uniqueness constraints can be given a name and must
also specify xsd:selector and xsd:field tags to identify the
element type that contains the unique element and the
element name within it that is unique, via the xpath attribute
When specifying a foreign key:
(1) the attribute refer of the xsd:keyref tag specifies the
referenced primary key
(2) the tags xsd:selector and xsd:field specify the
referencing element type and foreign key
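What xsd:key and xsd:keyref ask a validator to check can be approximated by hand. The sketch below is a simplification under stated assumptions: the instance document and its element names are hypothetical, the selector and field are plain child-element paths rather than full xpath expressions, and the helper functions are illustrative names, not part of any XML Schema library.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the second employee references a
# department number that does not exist.
COMPANY_XML = """
<company>
  <department><departmentNumber>5</departmentNumber></department>
  <department><departmentNumber>4</departmentNumber></department>
  <employee>
    <employeeSSN>123456789</employeeSSN>
    <employeeDepartmentNumber>5</employeeDepartmentNumber>
  </employee>
  <employee>
    <employeeSSN>987654321</employeeSSN>
    <employeeDepartmentNumber>7</employeeDepartmentNumber>
  </employee>
</company>
"""

def field_values(root, selector, field):
    """Text of `field` in every element matched by `selector` (like xsd:selector/xsd:field)."""
    return [elem.findtext(field) for elem in root.findall(selector)]

def key_violations(values):
    """A key needs a value in every selected element, and values must be unique."""
    seen, bad = set(), []
    for value in values:
        if value is None or value in seen:
            bad.append(value)
        seen.add(value)
    return bad

def keyref_violations(referencing, referenced):
    """Each referencing (foreign key) value must match some referenced key value."""
    keys = set(referenced)
    return [v for v in referencing if v is not None and v not in keys]

root = ET.fromstring(COMPANY_XML)
dept_keys = field_values(root, "department", "departmentNumber")
emp_refs = field_values(root, "employee", "employeeDepartmentNumber")

print(key_violations(dept_keys))
print(keyref_violations(emp_refs, dept_keys))
```

The department numbers form a valid key, while the reference to department 7 is the kind of dangling foreign key an xsd:keyref constraint is meant to reject.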
XML Schema
7. Specifying the structures of complex elements via
complex types:
Complex elements in our example are Department,
Employee, Project, and Dependent, which use the tag
xsd:complexType. We specify each of these as a
sequence of subelements corresponding to the database
attributes of each entity type by using the xsd:sequence
and xsd:element tags of XML schema. Each element is
given a name and type via the corresponding attributes name
and type of xsd:element.
We can also specify minOccurs and maxOccurs
attributes if we need to change the default of exactly one
occurrence. For (optional) database attributes where null is
allowed, we need to specify minOccurs = 0, whereas for
multivalued database attributes we need to specify
maxOccurs = “unbounded” on the corresponding element.
XML Schema
8. Composite (compound) attributes:
Composite attributes from an ER Schema are also
specified as complex types in the XML schema, as
illustrated by the Address, Name, Worker, and
WorksOn complex types.
Another option is that these could have been
directly embedded within their parent elements.
XML Documents and Databases
Approaches to Storing/Retrieving XML Documents
(1) Using a DBMS to store the documents as text:
We can use a relational or object DBMS to store whole XML
documents as text fields within the DBMS records or objects. This
approach can be used if the DBMS has a special module for
document processing, and would work for storing schemaless and
document-centric XML documents.
(2) Using a DBMS to store the document contents as data
elements:
This approach would work for storing a collection of documents that
follow a specific XML DTD or XML schema. Since all the documents
have the same structure, we can design a relational (or object)
database to store the leaf-level data elements within the XML
documents. We need mapping algorithms to design a database
schema that is compatible with the XML document structure.
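Approach (2) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a general mapping algorithm: the <projects> document, its element names, and the single project table are assumptions chosen so that each leaf-level element maps to one column.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical documents that all follow the same structure.
PROJECTS_XML = """
<projects>
  <project><name>ProductX</name><number>1</number><location>Bellaire</location></project>
  <project><name>ProductY</name><number>2</number><location>Sugarland</location></project>
</projects>
"""

# Relational schema designed to match the document structure:
# one row per <project>, one column per leaf-level element.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT, location TEXT)"
)

root = ET.fromstring(PROJECTS_XML)
rows = [
    (int(p.findtext("number")), p.findtext("name"), p.findtext("location"))
    for p in root.findall("project")
]
conn.executemany(
    "INSERT INTO project (number, name, location) VALUES (?, ?, ?)", rows
)
conn.commit()

stored = conn.execute(
    "SELECT number, name, location FROM project ORDER BY number"
).fetchall()
print(stored)
```

Approach (1) would instead store the whole document string in a single text column, which keeps schemaless documents intact but leaves their contents opaque to ordinary SQL queries.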
XML Documents and Databases
Approaches to Storing/Retrieving XML Documents
(3) Designing a specialized system for storing native XML data:
A new type of database system based on the hierarchical (tree)
model could be designed and implemented (a native XML DBMS).
The system would include specialized indexing and querying
techniques, and would work for all types of XML documents.
(4) Creating or publishing customized XML documents from
pre-existing relational databases:
Because there are enormous amounts of data already stored in
relational databases, parts of these data may need to be formatted
as documents for exchanging or displaying over the Web. This
approach uses separate software to handle the conversions needed.
Extracting XML Documents from
Relational Databases
Issues arising when converting (flat) relational data
into XML documents (hierarchical tree model)
Suppose that an application needs to extract XML
documents for student, course, and grade
information from the university database.
The data needed for these documents is contained
in the database attributes of the entity types
course, section, and student as shown below
(part of the main ER), and the relationships s-s
and c-s between them.
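The conversion from flat rows to a tree with COURSE as the root can be sketched as follows. The rows, the column layout, and the element names are hypothetical; they stand in for the result of joining course, section, and student data, and the function rows_to_course_tree is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat join result:
# (course number, course name, section number, student name, grade)
rows = [
    ("CS101", "Intro to CS", 1, "Smith", "A"),
    ("CS101", "Intro to CS", 1, "Brown", "B"),
    ("CS101", "Intro to CS", 2, "Lee", "A"),
    ("MA201", "Calculus", 1, "Smith", "C"),
]

def rows_to_course_tree(rows):
    """Nest flat rows as course -> section -> student under a <courses> root."""
    root = ET.Element("courses")
    courses, sections = {}, {}
    for course_no, course_name, section_no, student, grade in rows:
        course = courses.get(course_no)
        if course is None:  # first row for this course: create its subtree
            course = ET.SubElement(root, "course")
            ET.SubElement(course, "number").text = course_no
            ET.SubElement(course, "name").text = course_name
            courses[course_no] = course
        section = sections.get((course_no, section_no))
        if section is None:  # first row for this section of the course
            section = ET.SubElement(course, "section")
            ET.SubElement(section, "number").text = str(section_no)
            sections[(course_no, section_no)] = section
        student_elem = ET.SubElement(section, "student")
        ET.SubElement(student_elem, "name").text = student
        ET.SubElement(student_elem, "grade").text = grade
    return root

tree = rows_to_course_tree(rows)
print(ET.tostring(tree, encoding="unicode"))
```

Note that the student Smith appears under two different courses: choosing COURSE as the root forces student data to be repeated in the tree, whereas a view rooted at STUDENT would repeat course data instead.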
Figure: ER schema diagram for the UNIVERSITY database
Subset of the UNIVERSITY database
schema
In general, most documents extracted from a
database will use only a subset of the entire
database schema.
Here we use the following subset:
Figure: Hierarchical (tree) view with COURSE as the root
Figure: XML schema document with COURSE as the root
Figure: Hierarchical (tree) view with STUDENT as the root
Figure: XML schema document with STUDENT as the root