UNIT 5 part 01

XML (Extensible Markup Language) is a flexible markup language used to describe and share structured data across various platforms and applications. It allows for the creation of custom tags, ensuring data is both human-readable and machine-readable, and supports data interchange, documentation, and configuration options. XML's hierarchical structure and strict formatting enable efficient data processing and integrity, making it essential for modern web-connected applications.

Uploaded by

tanishamahavar32
Copyright
© All Rights Reserved

UNIT -5

XML and Data warehousing


What is XML (Extensible Markup Language)?
XML (Extensible Markup Language) is used to describe data. The XML standard is
a flexible way to create information formats and electronically share structured data
via the public internet, as well as via corporate networks.
XML is a markup language based on Standard Generalized Markup Language
(SGML) used for defining markup languages.
XML's primary function is to create formats for data that is used to encode
information for documentation, database records, transactions and many other types
of data. XML data may be used for creating different content types that are generated
by building dissimilar types of content -- including web, print and mobile content --
that are based on the XML data.
Like Hypertext Markup Language (HTML), which was also originally based on the SGML
standard, XML documents are stored as plain-text files (encoded in Unicode, UTF-8 by
default, of which ASCII is a subset) and can be edited using any text editor.

What is XML used for?


XML's primary function is to provide a "simple text-based format for representing
structured information," according to the World Wide Web Consortium (W3C), the
standards body for the web, including for the following:

• underlying data formats for applications such as those in Microsoft Office;
• technical documentation;
• configuration options for application software;
• books;
• transactions; and
• invoices.
XML enables sharing of structured information among and between the
following:

• programs and programs;
• programs and people; and
• systems, both locally and across networks.
How does XML work?
XML works by providing a predictable data format. XML is strict on formatting;
if the formatting is off, programs that process or display the encoded data will
return an error.

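For an XML document to be considered well-formed -- that is, conforming to
XML syntax and able to be read and understood by an XML parser -- it must
follow the XML syntax rules: a single root element, properly nested and closed
tags, and quoted attribute values. A document is valid, which is a stronger
property, when it is well-formed and also conforms to a DTD or schema. All XML
documents consist of elements; an element acts as a container for data. The
beginning and end of an element are identified by opening and closing tags,
with other elements or plain data within.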

XML works by providing properly formatted data that can be reliably processed
by programs designed to handle XML inputs. For example, technical
documentation may include a <warning> element similar to that shown in the
following snippet of XML code:

<warning>
  <para>
    <emphasis type="bold">May cause serious injury</emphasis>
    Exercise extreme caution as this procedure could result in serious injury or
    death if precautions are not taken.
  </para>
</warning>
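
As a rough illustration of how a program might process the snippet above, the following Python sketch parses it with the standard library module xml.etree.ElementTree. The function name summarize_warning is an assumption made for this example, not something the source defines.

```python
import xml.etree.ElementTree as ET

# The <warning> snippet from the text, held as a string.
WARNING_XML = """
<warning>
  <para>
    <emphasis type="bold">May cause serious injury</emphasis>
    Exercise extreme caution as this procedure could result in serious injury or
    death if precautions are not taken.
  </para>
</warning>
"""

def summarize_warning(xml_text):
    """Return (headline, emphasis type, body text) for a <warning> element."""
    root = ET.fromstring(xml_text.strip())
    emphasis = root.find("./para/emphasis")
    headline = emphasis.text.strip()
    # Text that follows </emphasis> is stored as that element's "tail".
    body = " ".join(emphasis.tail.split())
    return headline, emphasis.get("type"), body

headline, style, body = summarize_warning(WARNING_XML)
print(f"[{style}] {headline}: {body}")
```

Because the markup is predictable, the program can separate the emphasized headline from the explanatory text without guessing where one ends and the other begins.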

XML development goals


The core design goals of XML development prioritize simplicity, generality, and
usability across the internet for data sharing and transport, allowing for easy creation,
processing, and readability of documents and data structures.
Here's a more detailed breakdown of the XML development goals:
• Simplicity and Clarity: XML documents should be easy to read and understand,
both for humans and machines.
• Generality and Extensibility: XML is designed to support a wide variety of
applications and data structures, allowing for easy expansion and adaptation.
• Internet Usability: XML is designed to be easily used over the internet for data
exchange and sharing.
• Compatibility: XML should be compatible with SGML (Standard Generalized
Markup Language).
• Easy Program Processing: It should be easy to write programs that process
XML documents.
• Minimal Optional Features: The number of optional features in XML should be
kept to a minimum, ideally zero.
• Human-Readable: XML documents should be reasonably clear and human-readable.
• Quick Design: The XML design should be prepared quickly.
• Formal and Concise Design: The design of XML should be formal and concise.
• Easy Creation: XML documents should be easy to create.
• Data Sharing and Transport: XML facilitates the sharing and transport of data
between different systems and applications.
• Platform Independence: XML stores data in a plain text format, making it software
and hardware independent.
• Content and Presentation Separation: XML separates content from presentation,
allowing the same XML document to be displayed in various ways.
• Unicode Support: XML has strong support for different human languages via
Unicode.
• Machine-Readable Context Information: XML contains machine-readable
context information, which is useful for processing and interpreting data.

Comparison or Difference between HTML & XML

Parameter | HTML | XML
Purpose | Markup language used for creating web pages | Markup language used for storing and transporting data
Presentation | Designed to define the structure and presentation of web content | Designed to define the structure of data, with no predefined presentation semantics
Tags | Contains predefined tags for structuring web content | Allows the creation of custom tags based on the specific data being represented
Semantics | Provides semantic meaning to web content elements | No inherent semantics; meaning is defined by the user or the application
Document Type | Must adhere to predefined document type definitions (DTDs) or schemas | Does not have strict document type requirements
Data Interchange | Primarily used for displaying web content in browsers | Used for storing and exchanging data between different systems
Extensibility | Limited extensibility with predefined tag structure | Highly extensible; allows the creation of custom tags and structures
Validation | HTML documents can be validated against predefined DTDs or HTML5 specifications | XML documents can be validated against XML schemas or Document Type Definitions (DTDs)
Popular Applications | Web development, creating web pages and web applications | Data storage, data interchange, configuration files, data representation in various domains
Examples | <h1>Heading</h1> <p>Paragraph</p> | <person><name>John Doe</name><age>30</age></person>

Business importance of using XML


XML has become the basis of virtually all web-connected applications. It has greatly
enhanced the way businesses share information and communicate with each other.
XML is also an excellent way to store data. As XML allows sharing of information
between different computer platforms and different applications, businesses are
opting to convert a lot of their data into XML format.
For instance, suppose company A is buying something from company B. Company A
would need to send a purchase order to company B. For that, both companies require
a common formatting practice. XML provides both a language for describing that
formatting convention and a suitable way to send the purchase order data.
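
A minimal sketch of the purchase order exchange, assuming a made-up format: the element and attribute names (purchaseOrder, item, sku, quantity, sender, recipient) are hypothetical and do not come from any real purchase order standard. One side builds the XML and the other parses it with Python's standard library.

```python
import xml.etree.ElementTree as ET

def build_purchase_order(sender, recipient, items):
    """Build a <purchaseOrder> element; items is a list of (sku, quantity) pairs."""
    order = ET.Element("purchaseOrder", {"sender": sender, "recipient": recipient})
    for sku, quantity in items:
        item = ET.SubElement(order, "item", {"sku": sku})
        ET.SubElement(item, "quantity").text = str(quantity)
    return order

# Company A serializes the order to text and sends it.
order = build_purchase_order("Company A", "Company B",
                             [("WIDGET-1", 10), ("BOLT-7", 250)])
order_xml = ET.tostring(order, encoding="unicode")

# Company B parses the same text and reads the data back out.
received = ET.fromstring(order_xml)
total_units = sum(int(item.findtext("quantity")) for item in received.iter("item"))
print(order_xml)
print("Total units ordered:", total_units)
```

Both sides only need to agree on the element names and their nesting; the platforms and programming languages on each end can differ.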
Here are the important reasons why businesses use this globally accepted web
format.
• XML is one of the most cost-effective platforms for publishing documents on the
Internet
• Easy, hassle-free retrieval and archiving of documents
• XML has flexibility, scalability and versatility
• It is a unique cross-platform web publishing format
• XML can represent complex data structures
• It is used to interchange information
• XML allows automated web publishing
• Data can be protected with companion standards such as XML Signature and
XML Encryption

XML documents can be used and shared in different ways. You can use them to send
survey forms or online applications to companies, and also to send output to print on
publishing systems. XML handles large documents and international text well because
it is built on Unicode. Its high versatility and good compatibility across major
platforms allow data to be shared easily. As data is arranged systematically and in
a more orderly way in XML format, it allows greater scalability and flexibility.
If your documents are in PDF, HTML, Text, Excel or Word format, you can convert
them to XML format, for example by outsourcing the task to a professional document
conversion company. Through XML conversion your documents will be well-formed
and can be used across various platforms.

XML Full Form, Features, Benefits and Limitations

Key Features of XML

o XML is a structured format that allows users to organize the information
within a file in any way they prefer.
o For those who are familiar with HTML, XML will look familiar: it uses the same
tag syntax, but with stricter and more clearly defined rules.
o XML allows a specific structure to be required for data, meaning you can dictate
in a separate schema file, which can itself be written in XML, how an XML data
file must be organized.
o Applications can check the schema definition to determine the type of data to
import.

Advantages of XML

o XML is easy to read and write, making it accessible for anyone to understand.
o It allows for easy preservation of backward and forward compatibility.
o XML is an international standard with Unicode support, so it can represent text in any human language.
o It's platform-independent, meaning it's resistant to technological changes.
o XML can be updated incrementally.

Drawbacks of XML

o Implementing namespace support in an XML parser can be challenging.
o XML can become complex when trying to structure a large amount of data
manually.
o Compared to JSON, XML requires more markup to represent the same data.
o Maintaining XML node relations requires additional effort.
o XML's hierarchical model does not map directly onto relational database tables.
XML structure
This section gives general tips for handling XML documents during data
masking, including editing XML structures and using XPath to target data
precisely. When XML files are created or modified, the masked data must still
meet the format requirements of the systems that consume it.

Understanding XML structure


An XML document is both human-readable and machine-readable, which
allows it to serve as a common medium for information exchange across diverse
systems.

Definitions
Prolog (optional): The prolog appears at the beginning of the XML document
and contains metadata about the document itself, such as the XML version and
the character encoding (e.g., <?xml version="1.0" encoding="UTF-8"?>).

Elements: Elements are the building blocks of XML documents, denoted by
tags. An element can contain text, other elements, or a mix of both. Elements
are used to encase data points in a document, and typically consist of a start
tag, content, and an end tag (e.g., <name>John Doe</name>).

Attributes: Attributes provide additional information about elements. They are
included within the start tag of an element and always come as name/value
pairs (e.g., <postcode id="12345"/>).

Root Element: Every XML document must contain a single root element that
encases all other elements. The root element provides a container for all data in
the document to enforce a hierarchical structure.
Hierarchical structure
XML documents are inherently hierarchical, a feature that allows them to
represent complex data structures effectively.

Parent and child elements: Elements nested within other elements create
parent-child relationships. This structure allows XML to represent complex
data relationships naturally (e.g., a Person element might contain FirstName,
LastName, and ContactDetails as child elements).

Sibling elements: Elements that are at the same level of the hierarchy and share
the same parent are called siblings. Sibling elements often represent similar
types of data or repeated elements in a list (e.g., multiple Person elements within
a People root element).
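
The parent, child, and sibling relationships can be made visible with a short Python sketch that walks a document tree. The People document below follows the element names mentioned in the text; the Email element, the sample values, and the outline helper are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """<People>
  <Person>
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <ContactDetails><Email>john@example.com</Email></ContactDetails>
  </Person>
  <Person>
    <FirstName>Jane</FirstName>
    <LastName>Roe</LastName>
    <ContactDetails><Email>jane@example.com</Email></ContactDetails>
  </Person>
</People>"""

def outline(element, depth=0):
    """Return one indented line per element, showing the parent-child nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(PEOPLE_XML)
siblings = [child.tag for child in root]  # the Person elements share one parent
print("\n".join(outline(root)))
print("Children of the root:", siblings)
```

Each level of indentation in the printed outline corresponds to one step down the hierarchy, and the two Person entries at the same depth are siblings.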

Use of XML in data masking


Masking operations on XML files typically involve modifying the content of
elements or attributes to obfuscate sensitive data while maintaining the
structural integrity of the document. Using XML's hierarchical nature, you can
selectively apply masking rules to specific parts of the document without
disrupting its overall format, to keep the masked data useful for testing or
development purposes.

This structured approach not only helps in maintaining the logical grouping of
data but also ensures that data masking can be done efficiently and effectively,
targeting only those elements that contain sensitive information.

XML example
<Person>
  <First_Name>John</First_Name>
  <Last_Name>Doe</Last_Name>
  <DOB>1968-11-24</DOB>
  <State></State>
  <Postcode id=""/>
</Person>
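
A minimal masking sketch for a record like the one above, using the limited XPath subset supported by Python's xml.etree.ElementTree. The masking rules (replace names with X characters, keep only the birth year, zero out the postcode) and the filled-in State and Postcode values are assumptions chosen for illustration; real masking requirements will differ.

```python
import xml.etree.ElementTree as ET

# Same shape as the example record, with sample values filled in (hypothetical).
PERSON_XML = """<Person>
  <First_Name>John</First_Name>
  <Last_Name>Doe</Last_Name>
  <DOB>1968-11-24</DOB>
  <State>NSW</State>
  <Postcode id="2000"/>
</Person>"""

def mask_person(xml_text):
    """Mask sensitive fields while leaving the document structure unchanged."""
    root = ET.fromstring(xml_text)
    # Element content: overwrite each name with X characters of the same length.
    for path in ("./First_Name", "./Last_Name"):
        for element in root.findall(path):
            element.text = "X" * len(element.text or "")
    # Keep the year of birth but hide the month and day.
    for dob in root.findall("./DOB"):
        dob.text = (dob.text or "")[:4] + "-01-01"
    # Attribute value: select <Postcode> elements that carry an id attribute.
    for postcode in root.findall("./Postcode[@id]"):
        postcode.set("id", "0000")
    return ET.tostring(root, encoding="unicode")

masked = mask_person(PERSON_XML)
print(masked)
```

Only the targeted elements and the one attribute change; State, the element order, and the nesting stay intact, so the masked file remains usable for testing.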

Applications of XML and structure of an XML document

What is XML?
Extensible Markup Language (XML) lets you define and store data in a shareable
manner. XML supports information exchange between computer systems such as
websites, databases, and third-party applications. Predefined rules make it easy to
transmit data as XML files over any network because the recipient can use those
rules to read the data accurately and efficiently.
Why is XML important?
Extensible Markup Language (XML) is a markup language that provides rules to
define any data. Unlike programming languages, XML cannot perform computing
operations by itself. Instead, any programming language or software can process
XML for structured data management.
For example, consider a text document with comments on it. The comments might
give suggestions like these:

• Make the title bold
• This sentence is a header
• This word is the author
Such comments improve the document’s usability without affecting its content.
Similarly, XML uses markup symbols to provide more information about any data.
Other software, like browsers and data processing applications, use this information
to process structured data more efficiently.
XML tags

You use markup symbols, called tags in XML, to define data. For example, to
represent data for a bookstore, you can create tags such as <book>, <title>, and
<author>. Your XML document for a single book would have content like this:
<book>
  <title>Learning Amazon Web Services</title>
  <author>Mark Wilkins</author>
</book>
Tags bring sophisticated data coding to integrate information flows across different
systems.
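
The same book record can also be produced by a program instead of typed by hand. The sketch below uses Python's standard xml.etree.ElementTree; the helper name make_book is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

def make_book(title, author):
    """Create a <book> element with <title> and <author> children."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "author").text = author
    return book

book = make_book("Learning Amazon Web Services", "Mark Wilkins")
# Serialize the element tree to a text string that any XML-aware system can read.
print(ET.tostring(book, encoding="unicode"))
```

Building the tree through an API rather than by string concatenation means the library takes care of closing tags and escaping special characters such as & and <.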
What are the benefits of using XML?
Support inter business transactions

When a company sells a good or service to another company, the two businesses
need to exchange information like cost, specifications, and delivery schedules. With
Extensible Markup Language (XML), they can share all the necessary information
electronically and close complex deals automatically, without any human
intervention.
Maintain data integrity

XML lets you transfer data along with the data’s description, preventing the loss of
data integrity. You can use this descriptive information to do the following:

• Verify data accuracy
• Automatically customize data presentation for different users
• Store data consistently across multiple platforms
Improve search efficiency

Computer programs like search engines can sort and categorize XML files more
efficiently and precisely than other types of documents. For example, the
word mark can be either a noun or a verb. Based on XML tags, search engines can
accurately categorize mark for relevant search results. Thus, XML helps computers
to interpret natural language more efficiently.
Design flexible applications

With XML, you can conveniently upgrade or modify your application design. Many
technologies, especially newer ones, come with built-in XML support. They can
automatically read and process XML data files so that you can make changes without
having to reformat your entire database.

What are the applications of XML?


Extensible Markup Language (XML) is the underlying technology in thousands of
applications, ranging from common productivity tools like word processing to book
publishing software and even complex application configuration systems.
Data transfer

You can use XML to transfer data between two systems that store the same data in
different formats. For example, your website stores dates in MM/DD/YYYY format,
but your accounting system stores dates in DD/MM/YYYY format. You can transfer
the data from the website to the accounting system by using XML. Your developers
can write code that automatically converts the following:

• Website data to XML format
• XML data to accounting system data
• Accounting system data back to XML format
• XML data back to website data
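
The first two conversion steps can be sketched in Python as follows. The order and date element names, the sample order ID, and the choice of ISO 8601 (YYYY-MM-DD) as the date format inside the XML are all assumptions for this example; the text does not say which format the XML should carry.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

def website_to_xml(order_id, us_date):
    """Website data (MM/DD/YYYY) -> XML carrying an unambiguous ISO 8601 date."""
    parsed = datetime.strptime(us_date, "%m/%d/%Y").date()
    order = ET.Element("order", {"id": order_id})
    ET.SubElement(order, "date").text = parsed.isoformat()
    return ET.tostring(order, encoding="unicode")

def xml_to_accounting(xml_text):
    """XML -> accounting system record with a DD/MM/YYYY date."""
    order = ET.fromstring(xml_text)
    parsed = datetime.strptime(order.findtext("date"), "%Y-%m-%d").date()
    return {"id": order.get("id"), "date": parsed.strftime("%d/%m/%Y")}

xml_text = website_to_xml("A-1001", "03/04/2025")  # 4 March 2025 on the website
record = xml_to_accounting(xml_text)
print(xml_text)
print(record)
```

Putting a neutral date format in the XML means neither system has to know the other's local convention; the reverse direction would mirror these two functions.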
Web applications

XML gives structure to the data that you see on webpages. Other website
technologies, like HTML, work with XML to present consistent and relevant data to
website visitors. For example, consider an e-commerce website that sells clothes.
Instead of showing all clothes to all visitors, the website uses XML to create
customized webpages based on user preferences. It shows products from specific
brands by filtering the <brand> tag.
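
A rough sketch of filtering on the <brand> tag, assuming a small hypothetical catalog; the catalog, product, and name elements and the brand names are invented for this example. The lookup uses the XPath subset available in Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
  <product><name>Rain jacket</name><brand>Northwind</brand></product>
  <product><name>Denim shirt</name><brand>Contoso</brand></product>
  <product><name>Wool scarf</name><brand>Northwind</brand></product>
</catalog>"""

def products_for_brand(xml_text, brand):
    """Return the names of products whose <brand> child equals the given brand."""
    catalog = ET.fromstring(xml_text)
    # The predicate [brand='...'] keeps products whose <brand> text matches.
    # This simple form assumes the brand name contains no quote characters.
    matches = catalog.findall(f"./product[brand='{brand}']")
    return [product.findtext("name") for product in matches]

print(products_for_brand(CATALOG_XML, "Northwind"))
```

A web application could run a query like this on the server and then render the matching products as HTML for the visitor.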
Documentation

You can use XML to specify the structural information of any technical document.
Other programs then process the document structure to present it flexibly. For
example, there are XML tags for a paragraph, an item in a numbered list, and a
heading. Using these tags, other types of software automatically prepare the
document for uses such as printing and webpage publication.
Data type

Many programming languages support XML as a data type. With this support, you
can easily write programs in other languages that work directly with XML files.
What are the components of an XML file?
An Extensible Markup Language (XML) file is a text-based document that you can
save with the .xml extension. You can write XML similar to other text files. To
create or edit an XML file, you can use any of the following:

• Text editors like Notepad or Notepad++
• Online XML editors
• Web browsers
Any XML file includes the following components.
XML document

An XML document is the complete unit of XML: an optional prolog followed by exactly
one root element that contains all the other content. There is no special <xml></xml>
wrapper tag; element names beginning with "xml" are reserved by the XML specification.
XML declaration

An XML document begins with some information about XML itself. For example,
it might mention the XML version that it follows. This opening is called an XML
declaration. Here's an example.
<?xml version="1.0" encoding="UTF-8"?>
XML elements

All the other tags you create within an XML document are called XML elements.
XML elements can contain these features:

• Text
• Attributes
• Other elements
Every XML document has a single outermost element, which is called the root element.
For example, consider the XML file below.
<InvitationList>
  <family>
    <aunt>
      <name>Christine</name>
      <name>Stephanie</name>
    </aunt>
  </family>
</InvitationList>
<InvitationList> is the root element; family and aunt are other element names.
XML attributes

XML elements can have other descriptors called attributes. You can define your own
attribute names and write the attribute values within straight quotation marks as
shown below.
<person age="22"></person>
XML content

The data in XML files is also called XML content. For example, in the XML file,
you might see data like this.
<friend>
  <name>Charlie</name>
  <name>Steve</name>
</friend>
The data values Charlie and Steve are the content.
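
To show how a program sees these components, the following Python sketch reads the declaration, root element, an attribute, and the content from one small document. The document combines pieces of the examples above into a single hypothetical file; placing the age attribute on <friend> is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<InvitationList>
  <friend age="22">
    <name>Charlie</name>
    <name>Steve</name>
  </friend>
</InvitationList>"""

# Parse from bytes so the parser applies the encoding named in the declaration.
root = ET.fromstring(DOC.encode("utf-8"))

friend = root.find("friend")                         # a child element of the root
names = [name.text for name in friend.findall("name")]  # the XML content
print("Root element:", root.tag)
print("Attributes of <friend>:", friend.attrib)
print("Content:", names)
```

The declaration is consumed by the parser and does not appear in the tree; what the program works with is the root element, its descendants, their attributes, and their text.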
What is an XML schema?
An Extensible Markup Language (XML) schema is a document that describes some
rules or limits on the structure of an XML file. You can describe these constraints in
several different ways, like these:

• Grammatical rules to determine the order of elements
• Boolean (yes/no) conditions that the content must satisfy
• Data types for the content in XML files
• Constraints for data integrity
For example, an XML schema for bookstores might impose constraints like these:

1. A book element will have the attributes title and author.
2. The book element will be nested under a category element that has a name attribute.
3. The price of a book will be a separate element nested under book.
To meet these constraints, we will write the XML file as shown below. Note that
attributes are separated by spaces, not commas, and use straight quotation marks.
<category name="Technology">
  <book title="Learning Amazon Web Services" author="Mark Wilkins">
    <price>$20</price>
  </book>
</category>
XML schemas enforce consistency in how different software applications create and
use XML files. Some industries implement XML schemas that are specific to their
operations to reduce complexity in writing XML code for interbusiness data transfer.
For example, Scalable Vector Graphics (SVG) is an XML specification for
describing computer graphics-related data. Software developers write XML files so
that they meet such industry specifications.
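
Real schemas are usually written in a schema language such as XSD or DTD and checked by a validating parser. Python's standard library does not include an XSD validator, so the sketch below is only a hand-written stand-in that checks the three bookstore constraints from the example directly; the function name check_bookstore and the sample documents are assumptions.

```python
import xml.etree.ElementTree as ET

def check_bookstore(xml_text):
    """Return a list of constraint violations (an empty list means all rules hold)."""
    problems = []
    category = ET.fromstring(xml_text)
    # Constraint 2: books live under a <category> element with a name attribute.
    if category.tag != "category" or "name" not in category.attrib:
        problems.append("root must be <category> with a name attribute")
    for book in category.iter("book"):
        # Constraint 1: each <book> has title and author attributes.
        for required in ("title", "author"):
            if required not in book.attrib:
                problems.append(f"<book> is missing the {required} attribute")
        # Constraint 3: the price is a separate element nested under <book>.
        if book.find("price") is None:
            problems.append("<book> is missing a nested <price> element")
    return problems

GOOD = ('<category name="Technology">'
        '<book title="Learning Amazon Web Services" author="Mark Wilkins">'
        '<price>$20</price></book></category>')
BAD = '<category><book title="Untitled"/></category>'

print(check_bookstore(GOOD))
print(check_bookstore(BAD))
```

A schema expresses rules like these declaratively, so every application that exchanges the data can apply the same checks without each one writing its own code.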
What is an XML parser?
An Extensible Markup Language (XML) parser is software that can process or read
XML documents to extract the data within them. XML parsers also check the syntax
or rules of the XML file and can validate it against a particular XML schema.
Because XML is a strict markup language, a parser will stop processing the file if
it finds a syntax (well-formedness) error, and a validating parser will also report
schema violations. For example, the XML parser will give errors if any of these
conditions are true:

• A closing tag or end tag is missing
• Attribute values don’t have quotation marks
• A schema condition has not been met (when the parser validates against a schema)
Software applications use XML parsers to transform XML files into native data
types. They can thus focus on the application logic without having to go into the
details of the XML itself.
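
The first two error conditions can be reproduced with Python's built-in parser, as a minimal sketch. The sample documents and the parse_status helper are invented for this example. Note that xml.etree.ElementTree is a non-validating parser, so it checks syntax only and does not test schema conditions.

```python
import xml.etree.ElementTree as ET

def parse_status(xml_text):
    """Return "ok" if the text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return "ok"
    except ET.ParseError as error:
        return f"error: {error}"

samples = {
    "well-formed": '<book id="1"><title>XML</title></book>',
    "missing end tag": '<book id="1"><title>XML</book>',
    "unquoted attribute": '<book id=1><title>XML</title></book>',
}

for label, text in samples.items():
    print(f"{label}: {parse_status(text)}")
```

The parser refuses both broken documents outright instead of guessing what the author meant, which is the strictness the text describes.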

How is XML different from HTML?


HyperText Markup Language (HTML) is the language used in most webpages. A
web browser processes the HTML documents and displays them as a multimedia
page. The World Wide Web Consortium (W3C) is the international community that
develops protocols and guidelines to ensure the long-term growth of the web. W3C
established both the HTML and Extensible Markup Language (XML) standards that
website developers implement for consistency and quality.
XML vs. HTML

While HTML and XML files look very similar, there are some key differences.
Purpose

The purpose of HTML is to present and display data. However, XML stores and
transports data.
Tags

HTML has predefined tags, but users can create and define their own tags in XML.
Syntax rules

There are some minor yet important differences between HTML and XML syntax.
For example, XML is case sensitive, but HTML is not. XML parsers treat <Book> and
<book> as different element names, so they will give an error if an element opened
as <Book> is closed with </book>.
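
Case sensitivity is easy to confirm with a small Python sketch using the standard library parser; the is_well_formed helper and the sample element names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<Book>Dune</Book>"))   # start and end tags match exactly
print(is_well_formed("<Book>Dune</book>"))   # tags differ only in case

# <Book> and <book> are different names, so a lowercase lookup finds nothing.
library = ET.fromstring("<library><Book>Dune</Book></library>")
print(library.find("book"), library.find("Book").text)
```

An HTML parser would accept a tag pair that differs only in case; an XML parser reports a mismatched tag.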
XHTML Full Form

What is the full form of XHTML?


The full form of XHTML is Extensible Hypertext Markup Language. It is a cross
between two markup languages, HTML and XML. XHTML is almost the same as
HTML, but it is stricter: XHTML is HTML reformulated as an XML application.
All major web browsers support it.

While XHTML is about the same as HTML, it is more stringent in syntax and case
sensitivity, so it is much more important to write the code properly. Unlike HTML,
which is handled by a lenient HTML-specific parser, XHTML files must be
well-formed and can be interpreted using standard XML parsers.

A brief history of XHTML


The W3C (World Wide Web Consortium) created XHTML. Web designers use XHTML
to migrate from HTML to XML. XHTML enforces consistent markup, which makes it
easier for developers to move to XML.

• The first document type in the XHTML family is XHTML 1.0, published as a
W3C Recommendation on 26 January 2000.
• The second is XHTML 1.1, published as a W3C Recommendation on 31 May 2001.
• The third is XHTML5, the XML serialization of HTML5.

Components of an XHTML document

XHTML documents consist of three sections, listed as follows:

DOCTYPE
DOCTYPE is used to declare a DTD.

Head
The head section is used to declare the title and other metadata.
Body
The body is used to provide the content of web pages. It is composed of several tags.

To construct an XHTML web page, it is essential to declare a Document Type
Definition (DTD). XHTML 1.0 defines three DTDs:

1. Transitional DTD: allows presentational markup, so it suits pages that must
work in older browsers without CSS support.
2. Strict DTD: used when the page contains only structural markup, with
presentation left to CSS.
3. Frameset DTD: used when XHTML pages include frames.
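
Because XHTML is well-formed XML, a generic XML parser can read it. The Python sketch below parses a small page that has the three sections (DOCTYPE, head, body) with the Strict DTD declared; the page content is invented for this example. XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so lookups must name that namespace.

```python
import xml.etree.ElementTree as ET

XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample page</title></head>
  <body><p>Hello, XHTML.<br /></p></body>
</html>"""

# Map a short prefix (chosen for this example) to the XHTML namespace.
NS = {"x": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML_DOC)
title = root.findtext("x:head/x:title", namespaces=NS)
# Strip the "{namespace}" part that ElementTree puts in front of each tag name.
sections = [child.tag.split("}")[1] for child in root]
print("Title:", title)
print("Sections under <html>:", sections)
```

The same page written as loose HTML, for example with an unclosed <br>, would be rejected by this parser, which is why XHTML markup has to be written carefully.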

XHTML Benefits

• Every XHTML element must be closed and properly nested, which produces clean
code.
• Cleaner markup can require less bandwidth, which can reduce website costs.
• Web pages are developed in association with CSS.
• Pages are easy to deliver to wireless devices, braille readers, and other
specialized web environments because the formatting is correct.
• It supports the ongoing development of web technology.
Data Warehouse

The data warehouse, the Internet, and large-scale technological development have
led to the explosive growth of data in today’s world. Corporate decision makers, on
the other hand, want to examine the relationship between data, tap into the hidden
features of data, and analyze and explore deeper levels of data.

However, an enterprise usually runs multiple databases, and sharing data between
them is difficult. Integrating these databases poses great challenges, especially
for the consolidation and storage of big data. Operational data may be scattered
across Microsoft SQL Server or Oracle databases. The purpose of the data warehouse
is to extract data, often hundreds of gigabytes, from these multiple databases,
then transform, clean, and process it into the required format, and finally load
it into the warehouse.
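
The extract, transform, clean, and load steps can be sketched in a few lines of Python. Everything concrete here is hypothetical: the two source row layouts, the column names, and the cleaning rule (drop rows with a missing amount) are assumptions, and plain lists stand in for real SQL Server and Oracle connections.

```python
from datetime import datetime

# Hypothetical rows as they might arrive from two operational databases.
sql_server_rows = [
    {"CustomerID": 1, "Amount": "120.50", "OrderDate": "03/04/2025"},
    {"CustomerID": 2, "Amount": None, "OrderDate": "03/05/2025"},
]
oracle_rows = [{"cust_id": 3, "amt": 80.0, "order_dt": "2025-03-06"}]

def transform(row, id_key, amount_key, date_key, date_format):
    """Map one source row to the warehouse format, or None if it fails cleaning."""
    amount = row[amount_key]
    if amount is None:
        return None  # cleaning step: reject rows with a missing amount
    order_date = datetime.strptime(row[date_key], date_format).date()
    return {"customer_id": row[id_key], "amount": float(amount),
            "order_date": order_date.isoformat()}

def run_etl():
    """Extract from both sources, transform and clean each row, load the result."""
    warehouse = []
    sources = [
        (sql_server_rows, "CustomerID", "Amount", "OrderDate", "%m/%d/%Y"),
        (oracle_rows, "cust_id", "amt", "order_dt", "%Y-%m-%d"),
    ]
    for rows, *mapping in sources:
        for row in rows:
            record = transform(row, *mapping)
            if record is not None:
                warehouse.append(record)
    return warehouse

warehouse = run_etl()
print(warehouse)
```

After the run, rows from both sources share one column layout and one date format, which is what lets analysts query them together.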

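IBM researchers Barry Devlin and Paul Murphy introduced the data warehouse concept
in the late 1980s. A widely used definition, usually attributed to Bill Inmon,
describes a data warehouse as a subject-oriented, integrated, relatively stable
(non-volatile) collection of data that reflects historical changes (time-variant)
and is used to support management decision making.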

1. By its nature, a data warehouse supports management decision making and the
analysis of business data, which makes it different from the operational
database of the enterprise.

2. A data warehouse efficiently integrates and manages multiple heterogeneous
data sources in a single repository, organized as historical data; data in a
data warehouse does not need transactional modification.

After the emergence of data warehouses, the information needs of businesses have
moved from relational databases to a decision support system. This decision support
system is actually what we call Business Intelligence (BI).
Data Mart


Compared to a data warehouse, a data mart can be understood as a “small data
warehouse”. It does not depend on heterogeneous databases but only on a single
instance of an operational database, so its data coverage is narrower. A data
mart targets a specific business operation (such as sales or production), which
lets its users find the data they need quickly. To build a data mart, you only
need to design and create the database tables, populate them with relevant data,
and decide who can access the dataset.
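
A minimal sketch of the first two steps (create the tables, populate them) using an in-memory SQLite database from Python's standard library. The sales table, its columns, and the sample rows are hypothetical. The third step, deciding who can access the dataset, is not modeled here because SQLite has no user accounts; a server database would handle it with its own permission system.

```python
import sqlite3

def build_sales_mart(rows):
    """Create an in-memory sales data mart and load it with (region, product, amount) rows."""
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    mart.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    mart.commit()
    return mart

mart = build_sales_mart([
    ("North", "Widget", 120.0),
    ("North", "Bolt", 30.0),
    ("South", "Widget", 75.5),
])

# A typical data mart question: total sales per region.
totals = mart.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Because the mart holds only sales data, a query like this stays simple and fast compared with running it against the whole warehouse.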
Data Lake

Much like water flowing in its natural state, data flows from multiple source
systems into the lake. Users can then obtain, validate, and manage the data, and
perform other BI tasks outside of the data lake. A data lake can evolve to
provide the following features:

• It imports all data from the source systems, with no data loss.
• The data is stored in its original state, without conversion.
• The data lake schema accurately satisfies the data analysis requirements.
• It provides data locking, control, and management.

Operational Data Store (ODS)

An operational data store (ODS), sometimes called operational data storage, is a
database for transactional processing data. Data in an ODS is mainly raw data,
and it is typically moved on to a data warehouse or data mart for further
processing. In an ODS you can query data, but you see only the latest state of
business operations.

What Is an Operational Data Store?

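An operational data store is a cost-effective answer to the non-volatile nature of
data warehouses: it holds current, frequently updated data rather than history. An
ODS does not require the same type of transformations as a data warehouse. Since
an ODS stores only structured data, the data remains in its existing schema. In
keeping data largely untransformed it resembles a data lake, although a data lake
defers the schema to read time (schema-on-read), whereas a data warehouse applies
its schema when data is written (schema-on-write).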
In this sense, the ODS acts as a repository that stores a snapshot of an organization's
most current data, making it easier for users to diagnose problems before searching
through component systems. For example, an ODS allows service representatives to
immediately query a transaction to answer:

• Where is the customer’s package currently located?
• Why is the transaction not going through?
• What steps can I take to further troubleshoot this problem?

Since the staging area receives operational data from transactional sources in near-
real-time, the burden is offloaded from the transactional systems by only providing
access to current data that is being queried. This makes an ODS the ideal solution
for those looking for a 360-degree view of information connected to current data
records to make faster business decisions.
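
The snapshot idea can be sketched with a small Python class that keeps only the most recent record per transaction. The class name, the event fields (transaction_id, status, location), and the sample events are hypothetical; a real ODS would be a database fed by the source systems, not an in-memory dictionary.

```python
class OperationalDataStore:
    """Keeps only the latest known state of each transaction (no history)."""

    def __init__(self):
        self._current = {}

    def ingest(self, event):
        # A newer event for the same transaction overwrites the older one.
        self._current[event["transaction_id"]] = event

    def lookup(self, transaction_id):
        """Return the current record for a transaction, or None if unknown."""
        return self._current.get(transaction_id)

ods = OperationalDataStore()
for event in [
    {"transaction_id": "T1", "status": "shipped", "location": "Depot A"},
    {"transaction_id": "T2", "status": "payment_failed", "location": None},
    {"transaction_id": "T1", "status": "in_transit", "location": "Depot B"},
]:
    ods.ingest(event)

# A service representative asks: where is the package for T1 right now?
print(ods.lookup("T1"))
print(ods.lookup("T2"))
```

The overwrite in ingest is the key difference from a data warehouse, which would keep every historical event instead of replacing it.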

Benefits of an Operational Data Store

How can your business benefit from an operational data store? Here are five
compelling reasons why you should consider an ODS to offer your business the
speed, scale, and agility it needs in a snapshot glance.

Cost-effective
An ODS is much cheaper to build and implement than data warehouses and data
lakes. While prices vary dramatically based on operating requirements and use cases,
an ODS typically costs about a tenth of what businesses can expect to pay for an on-
premise data warehouse.

Rapid querying
Since an operational data store collects only current data, querying is simplified by
bypassing the need for multi-level joins. This is particularly helpful when locating
data to answer pressing transactional questions on the fly.
Better data quality
Since an ODS acts as a staging area, it can configure the data into one consistent
format. This improves the overall data quality before being sent into the data
warehouse, where it will be used for strategic decision-making.

Faster tactical decision making


An ODS provides time-sensitive business data that would be impossible to locate
when embedded in disparate source systems. Since an ODS extracts real-time
operational data, it simplifies the reporting process and greatly improves efficiency
by consolidating that information in a snapshot repository.

Faster time to market


A next-generation ODS can reduce manual schema mapping to just a single click.
With a microservices architecture, organizations are enabled to bring new services
to market faster.

Disadvantages of an Operational Data Store

Traditional ODS solutions typically suffer from high latency because they are based
on either relational databases or disk-based NoSQL databases. These systems simply
cannot handle large amounts of data and provide high performance simultaneously.

The limited scalability of traditional systems also leads to performance issues when
multiple users access the data store at the same time. As such, traditional ODS
solutions cannot provide real-time API services for accessing systems of record.
