UNIT 5 part 01
XML works by providing properly formatted data that can be reliably processed
by programs designed to handle XML inputs. For example, technical
documentation may include a <warning> element similar to that shown in the
following snippet of XML code:
<warning>
  <para>
    <emphasis type="bold">May cause serious injury</emphasis>
    Exercise extreme caution as this procedure could result in serious injury or
    death if precautions are not taken.
  </para>
</warning>
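As a minimal sketch of how a program might process such a snippet, the Python below uses the standard library's xml.etree.ElementTree module to pull out the emphasized headline and the body text. The function name summarize_warning is an illustrative assumption, not part of any standard.

```python
import xml.etree.ElementTree as ET

WARNING_XML = """
<warning>
  <para>
    <emphasis type="bold">May cause serious injury</emphasis>
    Exercise extreme caution as this procedure could result in serious injury or
    death if precautions are not taken.
  </para>
</warning>
"""

def summarize_warning(xml_text):
    """Return (headline, body) extracted from a <warning> element."""
    warning = ET.fromstring(xml_text)
    emphasis = warning.find("./para/emphasis")
    headline = emphasis.text.strip()
    # Text that follows </emphasis> inside <para> is stored as the element's tail.
    body = " ".join((emphasis.tail or "").split())
    return headline, body

headline, body = summarize_warning(WARNING_XML)
print(headline)
print(body)
```

Because the tags name each part of the content, the program does not have to guess which sentence is the headline.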
XML documents can be used and shared in different ways. You can use them to send
survey forms or online applications to companies, and also to send output to
publishing systems for printing. Large files and non-printable characters are handled
well with XML. XML conversion frees up a lot of data storage space, and its high
versatility and good compatibility across major platforms allow data to be shared
easily. Because data is arranged systematically and in a more orderly way in XML
format, it allows greater scalability and flexibility.
If your documents are in PDF, HTML, text, Excel, or Word format, you can convert
them to XML format by outsourcing the task to a professional document
conversion company. Through XML conversion your documents will be well-formed
and can be used on various platforms.
Advantages of XML
o XML is easy to read and write, making it accessible for anyone to understand.
o It allows for easy preservation of backward and forward compatibility.
o XML is an international standard, making it compatible with any language.
o It's platform-independent, meaning it's resistant to technological changes.
o XML can be updated incrementally.
Drawbacks of XML
Definitions
Prolog (optional): The prolog appears at the beginning of the XML document
and contains metadata about the document itself, such as the XML version and
the character encoding (e.g., <?xml version="1.0" encoding="UTF-8"?>).
Root Element: Every XML document must contain a single root element that
encases all other elements. The root element provides a container for all data in
the document to enforce a hierarchical structure.
Hierarchical structure
XML documents are inherently hierarchical, a feature that allows them to
represent complex data structures effectively.
Parent and child elements: Elements nested within other elements create
parent-child relationships. This structure allows XML to represent complex
data relationships naturally (e.g., a Person element might contain FirstName,
LastName, and ContactDetails as child elements).
Sibling elements: Elements that are at the same level of the hierarchy and share
the same parent are called siblings. Sibling elements often represent similar
types of data or repeated elements in a list (e.g., multiple Person elements within
a People root element).
This structured approach not only helps in maintaining the logical grouping of
data but also ensures that data masking can be done efficiently and effectively,
targeting only those elements that contain sensitive information.
XML example
<Person>
  <First_Name>John</First_Name>
  <Last_Name>Doe</Last_Name>
  <DOB>1968-11-24</DOB>
  <State></State>
  <Postcode id=""/>
</Person>
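The parent, child, and sibling relationships in this example, and the idea of masking only sensitive elements, can be sketched in Python with the standard xml.etree.ElementTree module. The choice of which tags count as sensitive and the mask string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """
<Person>
  <First_Name>John</First_Name>
  <Last_Name>Doe</Last_Name>
  <DOB>1968-11-24</DOB>
  <State></State>
  <Postcode id=""/>
</Person>
"""

# Hypothetical choice of which child elements hold sensitive data.
SENSITIVE_TAGS = {"Last_Name", "DOB"}

def mask_person(xml_text, sensitive=SENSITIVE_TAGS, mask="***"):
    """Replace the text of sensitive child elements and return the root."""
    person = ET.fromstring(xml_text)      # <Person> is the parent element
    for child in person:                  # its children are siblings of one another
        if child.tag in sensitive and child.text:
            child.text = mask
    return person

masked = mask_person(PERSON_XML)
child_tags = [child.tag for child in masked]
print(child_tags)
print(ET.tostring(masked, encoding="unicode"))
```

Only the elements named in SENSITIVE_TAGS change; the rest of the hierarchy is left untouched.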
What is XML?
Extensible Markup Language (XML) lets you define and store data in a shareable
manner. XML supports information exchange between computer systems such as
websites, databases, and third-party applications. Predefined rules make it easy to
transmit data as XML files over any network because the recipient can use those
rules to read the data accurately and efficiently.
Why is XML important?
Extensible Markup Language (XML) is a markup language that provides rules to
define any data. Unlike programming languages, XML cannot perform computing
operations by itself. Instead, any programming language or software can be used
to process XML for structured data management.
For example, consider a text document with comments on it. The comments might
give suggestions like these:
You use markup symbols, called tags in XML, to define data. For example, to
represent data for a bookstore, you can create tags such as <book>, <title>, and
<author>. Your XML document for a single book would have content like this:
<book>
  <title> Learning Amazon Web Services </title>
  <author> Mark Wilkins </author>
</book>
Tags bring sophisticated data coding to integrate information flows across different
systems.
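A program can also create this markup instead of having it typed by hand. The following is a minimal Python sketch that builds the bookstore record using xml.etree.ElementTree; the helper name make_book is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

def make_book(title, author):
    """Build a <book> element with <title> and <author> children."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "author").text = author
    return book

book = make_book("Learning Amazon Web Services", "Mark Wilkins")
book_xml = ET.tostring(book, encoding="unicode")
print(book_xml)
```

The serialized output has no extra whitespace, which is fine for machines; the indentation seen in hand-written examples is only for human readers.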
What are the benefits of using XML?
Support inter-business transactions
When a company sells a good or service to another company, the two businesses
need to exchange information like cost, specifications, and delivery schedules. With
Extensible Markup Language (XML), they can share all the necessary information
electronically and close complex deals automatically, without any human
intervention.
Maintain data integrity
XML lets you transfer data along with the data’s description, preventing the loss of
data integrity. You can use this descriptive information to do the following:
Computer programs like search engines can sort and categorize XML files more
efficiently and precisely than other types of documents. For example, the
word mark can be either a noun or a verb. Based on XML tags, search engines can
accurately categorize mark for relevant search results. Thus, XML helps computers
to interpret natural language more efficiently.
Design flexible applications
With XML, you can conveniently upgrade or modify your application design. Many
technologies, especially newer ones, come with built-in XML support. They can
automatically read and process XML data files so that you can make changes without
having to reformat your entire database.
You can use XML to transfer data between two systems that store the same data in
different formats. For example, your website stores dates in MM/DD/YYYY format,
but your accounting system stores dates in DD/MM/YYYY format. You can transfer
the data from the website to the accounting system by using XML. Your developers
can write code that automatically converts the website data to XML and then
converts the XML data to the accounting system's format.
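The date example can be sketched in Python as two small conversion steps. The element name orderDate and the use of the ISO 8601 form (YYYY-MM-DD) inside the XML are assumptions chosen for illustration; the source does not prescribe them.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def website_date_to_xml(us_date):
    """Wrap an MM/DD/YYYY website date in XML, stored in ISO 8601 form."""
    parsed = datetime.strptime(us_date, "%m/%d/%Y")
    order_date = ET.Element("orderDate")        # hypothetical element name
    order_date.text = parsed.strftime("%Y-%m-%d")
    return ET.tostring(order_date, encoding="unicode")

def xml_to_accounting_date(xml_text):
    """Read the XML date and return it in DD/MM/YYYY form."""
    iso_date = ET.fromstring(xml_text).text
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%d/%m/%Y")

xml_payload = website_date_to_xml("03/04/2024")
accounting_date = xml_to_accounting_date(xml_payload)
print(xml_payload)
print(accounting_date)
```

Storing one unambiguous form in the XML means neither system needs to know how the other one displays dates.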
XML gives structure to the data that you see on webpages. Other website
technologies, like HTML, work with XML to present consistent and relevant data to
website visitors. For example, consider an e-commerce website that sells clothes.
Instead of showing all clothes to all visitors, the website uses XML to create
customized webpages based on user preferences. It shows products from specific
brands by filtering the <brand> tag.
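Filtering on the <brand> tag might look like the following Python sketch. The catalog layout, the product names, and the brand names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the structure and values are illustrative only.
CATALOG_XML = """
<catalog>
  <product><name>Rain jacket</name><brand>Northwind</brand></product>
  <product><name>Denim jeans</name><brand>Contoso</brand></product>
  <product><name>Wool scarf</name><brand>Northwind</brand></product>
</catalog>
"""

def products_for_brand(xml_text, brand):
    """Return the names of products whose <brand> matches the given brand."""
    catalog = ET.fromstring(xml_text)
    return [
        product.findtext("name")
        for product in catalog.findall("product")
        if product.findtext("brand") == brand
    ]

northwind_items = products_for_brand(CATALOG_XML, "Northwind")
print(northwind_items)
```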
Documentation
You can use XML to specify the structural information of any technical document.
Other programs then process the document structure to present it flexibly. For
example, there are XML tags for a paragraph, an item in a numbered list, and a
heading. Using these tags, other types of software automatically prepare the
document for uses such as printing and webpage publication.
Data type
Many programming languages support XML as a data type. With this support, you
can easily write programs in other languages that work directly with XML files.
What are the components of an XML file?
An Extensible Markup Language (XML) file is a text-based document that you can
save with the .xml extension. You can write XML just as you write other text files.
To create or edit an XML file, you can use any plain-text editor.
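An XML file does not need literal <xml></xml> tags to mark its beginning and end;
names starting with "xml" are reserved by the XML specification. The optional XML
declaration opens the file, and a single root element encloses the rest. The content
as a whole is called an XML document, and the declaration and root element are the
first things software reads when it processes XML code.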
XML declaration
An XML document begins with some information about XML itself. For example,
it might mention the XML version that it follows. This opening is called an XML
declaration. Here's an example.
<?xml version="1.0" encoding="UTF-8"?>
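As a small sketch, Python's xml.etree.ElementTree can write the declaration automatically when a document is serialized. The element name note and its text are assumptions made for illustration, and the xml_declaration argument requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

note = ET.Element("note")                 # hypothetical root element
note.text = "Meeting at 10:00"

# xml_declaration=True prepends the declaration to the serialized bytes.
serialized = ET.tostring(note, encoding="UTF-8", xml_declaration=True)
first_line = serialized.split(b"\n", 1)[0]
print(first_line.decode("ascii"))

# A parser reads the declaration first and then the root element.
parsed = ET.fromstring(serialized)
print(parsed.tag, parsed.text)
```

ElementTree may write the version and encoding values in single quotes rather than double quotes; both forms are valid XML.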
XML elements
All the other tags you create within an XML document are called XML elements.
XML elements can contain these features:
Text
Attributes
Other elements
All XML documents begin with a primary tag, which is called the root element.
For example, consider the XML file below.
<InvitationList>
  <family>
    <aunt>
      <name>Christine</name>
      <name>Stephanie</name>
    </aunt>
  </family>
</InvitationList>
<InvitationList> is the root element; family and aunt are other element names.
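To see the root element and the elements nested inside it, the file can be walked with Python's xml.etree.ElementTree, as in this minimal sketch.

```python
import xml.etree.ElementTree as ET

INVITATION_XML = """
<InvitationList>
  <family>
    <aunt>
      <name>Christine</name>
      <name>Stephanie</name>
    </aunt>
  </family>
</InvitationList>
"""

root = ET.fromstring(INVITATION_XML)
root_tag = root.tag

# iter() visits the root first and then every nested element in document order.
all_tags = [element.tag for element in root.iter()]

# Passing a tag name to iter() selects only matching elements at any depth.
names = [element.text for element in root.iter("name")]

print(root_tag)
print(all_tags)
print(names)
```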
XML attributes
XML elements can have other descriptors called attributes. You can define your own
attribute names and write the attribute values within quotation marks as shown
below.
<person age="22">
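A program reads attributes by name. The Python sketch below uses a self-closing version of the person element so that the snippet is a complete document; the extra city attribute and its value are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring('<person age="22"/>')

age_text = person.get("age")       # attribute values are always read as strings
age = int(age_text)                # convert explicitly when a number is needed
missing = person.get("height")     # get() returns None for an absent attribute

person.set("city", "Pune")         # hypothetical attribute added by the program
print(age, missing, person.attrib)
```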
XML content
The data in XML files is also called XML content. For example, in the XML file,
you might see data like this.
<friend>
<name>Charlie</name>
<name>Steve</name>
</friend>
The data values Charlie and Steve are the content.
What is an XML schema?
An Extensible Markup Language (XML) schema is a document that describes some
rules or limits on the structure of an XML file. You can describe these constraints in
several different ways, like these:
While HTML and XML files look very similar, there are some key differences.
Purpose
The purpose of HTML is to present and display data. However, XML stores and
transports data.
Tags
HTML has predefined tags, but users can create and define their own tags in XML.
Syntax rules
There are some minor yet important differences between HTML and XML syntax.
For example, XML is case sensitive, but HTML is not. XML parsers will give errors
if you open an element with <Book> and close it with </book>.
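Case sensitivity is easy to check with a parser. In this Python sketch, the helper is_well_formed (a name chosen for illustration) reports whether xml.etree.ElementTree can parse a string.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Start and end tags match exactly, so this parses.
matching = is_well_formed("<book><title>XML basics</title></book>")

# <Book> and </book> differ only in case, which XML treats as a mismatch.
mismatched = is_well_formed("<Book><title>XML basics</title></book>")

print(matching, mismatched)
```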
XHTML Full Form
XHTML stands for Extensible HyperText Markup Language. It is largely the same as
HTML, but because it is more stringent in syntax and case sensitivity, it is much
more important to build the code properly. Unlike HTML, which is read by a lenient
HTML-specific parser, XHTML files must be well-formed and are interpreted using
standard XML parsers.
The first document type in the XHTML family is XHTML 1.0, published as a W3C
Recommendation on 26 January 2000.
The second document type is XHTML 1.1, published as a W3C Recommendation on
31 May 2001.
The third document type is XHTML5, the XML serialization of the HTML5
standard.
DOCTYPE
DOCTYPE is used to declare a DTD.
Head
The head section declares the title and other associated metadata.
Body
The body is used to provide the content on web pages. It is composed of several tags.
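The three parts can be seen together in a minimal XHTML page. Because XHTML is XML, a standard XML parser can read it, as in this Python sketch. The page content is made up for illustration, and the namespace prefix x is an arbitrary local choice.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative page: DOCTYPE first, then head and body inside the html root.
XHTML_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Sample page</title>
  </head>
  <body>
    <p>Hello, XHTML.</p>
  </body>
</html>
"""

ns = {"x": XHTML_NS}                       # map a local prefix to the namespace
page = ET.fromstring(XHTML_DOC)
title = page.findtext("x:head/x:title", namespaces=ns)
paragraphs = [p.text for p in page.findall("x:body/x:p", ns)]

print(title)
print(paragraphs)
```

The parser does not download the DTD named in the DOCTYPE; it only checks that the page is well-formed.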
XHTML Benefits
XHTML requires every element to be closed, which produces clean, properly
nested code.
It requires less bandwidth, which can reduce website costs.
It works together with CSS to build web pages.
Its correct formatting makes pages simple to deliver to wireless devices, braille
readers, and other specialized web environments.
It supports the continued development of web technology.
Data Warehouse
The data warehouse, the Internet, and large-scale technological development have
led to the explosive growth of data in today’s world. Corporate decision makers, on
the other hand, want to examine the relationship between data, tap into the hidden
features of data, and analyze and explore deeper levels of data.
However, when an enterprise runs multiple databases, sharing data between them is
difficult, and integrating them poses great challenges, especially in terms of
consolidating and storing big data. Operational data may be scattered across
Microsoft SQL Server and Oracle databases. The purpose of the data warehouse is to
extract data from these multiple databases, often hundreds of gigabytes of it, then
transform, clean, and process it into the required format, and finally store it in
the warehouse.
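The extract, transform, clean, and store steps can be sketched on a very small scale in Python with the standard sqlite3 module. The in-memory databases stand in for real operational systems, and every table name, column name, and row here is an assumption made for illustration.

```python
import sqlite3

def make_source(rows):
    """Create a stand-in operational database holding raw order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

# Two hypothetical operational sources with inconsistent formatting.
sales_db = make_source([(" alice ", "10.50"), ("BOB", "7")])
web_db = make_source([("Alice", "4.50"), ("carol", None)])

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (source TEXT, customer TEXT, amount REAL)"
)

def extract_transform_load(source_name, source_conn, target_conn):
    """Copy rows into the warehouse in one consistent format."""
    rows = source_conn.execute("SELECT customer, amount FROM orders")
    for customer, amount in rows:
        if amount is None:                      # clean: skip rows with no amount
            continue
        target_conn.execute(
            "INSERT INTO fact_orders VALUES (?, ?, ?)",
            (source_name, customer.strip().title(), float(amount)),
        )

extract_transform_load("sales", sales_db, warehouse)
extract_transform_load("web", web_db, warehouse)

totals = dict(warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_orders "
    "GROUP BY customer ORDER BY customer"
))
print(totals)
```

Once the names are normalized, rows for the same customer from different sources can be combined in a single query.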
According to IBM researchers (Barry Devlin and Paul Murphy) “A data warehouse
is a subject-oriented, integrated, relatively stable collection of data that reflects
historical changes, used to support management decision making.”
After the emergence of data warehouses, the information needs of businesses have
moved from relational databases to a decision support system. This decision support
system is actually what we call Business Intelligence (BI).
Data Mart
Compared to a data warehouse, a data mart can be understood as a “small data
warehouse”. It does not depend on heterogeneous databases, only on a single
instance of an operational database, so its data coverage is narrower. A data mart
targets a specific business operation, such as sales or production, so its users can
find the data they need quickly. To build a data mart, you only need to design and
create database tables, populate them with relevant data, and decide who can
access the dataset.
Data Lake
Much like flowing water in its natural state, data flows from multiple source systems
into this lake. Users can then obtain, validate, and manage the data and perform
other BI tasks outside of the data lake. The data lake can evolve to implement the
following features:
Imports all data from source systems, no data loss from source systems.
The data is stored in its original state without converting the original data.
The data lake schema accurately satisfies the data analysis requirements.
It has a data lock, control and management.
An operational data store (ODS) is a database for transactional processing data.
Data in an ODS is mainly raw data, and it is always moved to a data warehouse or
data mart for further processing. In an ODS you can query data, but you can access
only the latest developments in business operations.
Because the ODS acts as a staging area that receives operational data from
transactional sources in near real time, it takes query load off the transactional
systems by providing access only to the current data being queried. This makes an
ODS the ideal solution for those looking for a 360-degree view of information
connected to current data records to make faster business decisions.
How can your business benefit from an operational data store? Here are several
compelling reasons why you should consider an ODS to give your business the
speed, scale, and agility it needs.
Cost-effective
An ODS is much cheaper to build and implement than data warehouses and data
lakes. While prices vary dramatically based on operating requirements and use cases,
an ODS typically costs about a tenth of what businesses can expect to pay for an on-
premise data warehouse.
Rapid querying
Since an operational data store collects only current data, querying is simplified by
bypassing the need for multi-level joins. This is particularly helpful when locating
data to answer pressing transactional questions on the fly.
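The idea of holding only current data can be sketched in Python with sqlite3: each new event overwrites the previous state for the same key, so a lookup needs no joins. The table name, column names, and events are assumptions made for illustration.

```python
import sqlite3

ods = sqlite3.connect(":memory:")
ods.execute(
    "CREATE TABLE current_orders (order_id INTEGER PRIMARY KEY, status TEXT)"
)

def apply_event(conn, order_id, status):
    """Store the latest status, replacing any earlier row for the same order."""
    conn.execute(
        "INSERT OR REPLACE INTO current_orders VALUES (?, ?)",
        (order_id, status),
    )

def current_status(conn, order_id):
    """Answer a transactional question with a single-table lookup."""
    row = conn.execute(
        "SELECT status FROM current_orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None

# Hypothetical stream of events arriving from a transactional system.
for event in [(1, "placed"), (2, "placed"), (1, "shipped")]:
    apply_event(ods, *event)

row_count = ods.execute("SELECT COUNT(*) FROM current_orders").fetchone()[0]
print(current_status(ods, 1), row_count)
```

History is deliberately not kept here; in the arrangement described above, historical analysis belongs in the data warehouse.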
Better data quality
Since an ODS acts as a staging area, it can configure the data into one consistent
format. This improves the overall data quality before the data is sent to the data
warehouse, where it will be used for strategic decision-making.
Traditional ODS solutions typically suffer from high latency because they are based
on either relational databases or disk-based NoSQL databases. These systems simply
cannot handle large amounts of data and provide high performance simultaneously.
The limited scalability of traditional systems also leads to performance issues when
multiple users access the data store at the same time. As such, traditional ODS
solutions cannot provide real-time API services for accessing systems of record.