20IT403 DATABASE MANAGEMENT SYSTEMS
Digital Material - Unit V
Department: IT
Batch/Year: 2020-24/II
Created by:
Ms.R.Asha
AP/IT
Date: 25.05.2022
1. TABLE OF CONTENTS
1. Contents
2. Course Objectives
3. Pre Requisites
4. Syllabus
5. Course Outcomes
6. Lecture Plan
7. Lecture Notes
8. Assignments
9. Assessment Schedule
2. COURSE OBJECTIVES
To learn how to efficiently design and implement various database objects and entities.
3. PRE REQUISITES
4. SYLLABUS
DATABASE MANAGEMENT SYSTEMS
Query Processing Overview – Algorithms for SELECT and JOIN operations – Query
optimization using Heuristics and Cost Estimation
5. COURSE OUTCOMES
CO6: Design and deploy an efficient and scalable data storage node for varied kinds
of application requirements.
6. CO-PO/PSO MAPPING

7. LECTURE PLAN

Sl. No. | Topic | No. of Periods | Proposed Date | Actual Date | CO | Taxonomy Level | Mode of Delivery
1 | Distributed database implementation | 1 | 18.05.2022 | 18.05.2022 | CO5 | K2 | PPT / Online lecture
2 | Concurrent transactions - concurrency control | 1 | 19.05.2022 | 19.05.2022 | CO5 | K2 | PPT / Online lecture
3 | Lock based | 1 | 19.05.2022 | 19.05.2022 | CO5 | K2 | PPT / Online lecture
4 | Time stamping | 1 | 21.05.2022 | 21.05.2022 | CO5 | K3 | PPT / Online lecture
5 | Validation based | 1 | 24.05.2022 | 24.05.2022 | CO5 | K3 | PPT / Online lecture
6 | NoSQL | 1 | 24.05.2022 | 24.05.2022 | CO6 | K2 | PPT / Online lecture
7 | NoSQL categories | 1 | 25.05.2022 | 25.05.2022 | CO6 | K2 | PPT / Online lecture
8 | Designing an enterprise database | 1 | | | | | PPT / Online lecture
8. ACTIVITY BASED LEARNING
Crossword puzzle - Down:
1. Type of distributed database
2. Set of instructions which forms a logical unit
3. One of the characteristics of a distributed database
5. Way to achieve availability in a distributed database
8. One of the object-oriented concepts
11. Process of accessing the information from the storage medium
9. LECTURE NOTES
Unit V
DISTRIBUTED DATABASES
A distributed database is a database in which not all storage devices are attached to
a common processor. It may be stored in multiple computers, located in the same
physical location; or may be dispersed over a network of interconnected computers.
Reliability: In a distributed database system, if one system fails or stops working
for some time, another system can complete the task.
Availability: In a distributed database system, availability is preserved even if a
server fails, since another system is available to serve the client request.
Architectural Models
Some of the common architectural models are –
Client - Server Architecture for DDBMS
Peer - to - Peer Architecture for DDBMS
Multi - DBMS Architecture
Fragmentation: The system partitions the relation into several fragments, and
stores each fragment at a different site.
Data Replication: If relation r is replicated, a copy of relation r is stored in two
or more sites. In the most extreme case, we have full replication, in which a copy
is stored in every site in the system.
Transparency
The user of a distributed database system should not be required to know where the
data are physically located nor how the data can be accessed at the specific local
site. This characteristic, called data transparency, can take several forms:
Fragmentation transparency. Users are not required to know how a relation
has been fragmented.
Replication transparency. Users view each data object as logically unique. The
distributed system may replicate an object to increase either system performance
or data availability. Users do not have to be concerned with what data objects have
been replicated, or where replicas have been placed.
Location transparency. Users are not required to know the physical location of
the data. The distributed database system should be able to find any data as
long as the data identifier is supplied by the user transaction.
DISTRIBUTED TRANSACTIONS
There are two types of transaction that we need to consider.
Local transactions are those that access and update data in only one local
database;
Global transactions are those that access and update data in several local
databases.
System Structure
Each site has its own local transaction manager, whose function is to ensure the ACID
properties of those transactions that execute at that site. The various transaction
managers cooperate to execute global transactions. To understand how such a
manager can be implemented, consider an abstract model of a transaction system in
which each site contains two subsystems: the transaction manager, which manages
the execution of those transactions (or subtransactions) that access data stored at
that site, and the transaction coordinator, which coordinates the execution of the
various transactions (both local and global) initiated at that site.
Concurrency control schemes for distributed databases include:
• Locking protocols
• Timestamp-based protocols
• Validation-based protocols
Locking Protocols
Single Lock-Manager Approach
• In the single lock-manager approach, the system maintains a single lock manager
that resides in a single chosen site, say Si. All lock and unlock requests are made
at site Si.
• When a transaction needs to lock a data item, it sends a lock request to Si. If the
lock can be granted, a message is sent to the site at which the lock request was
initiated; otherwise the request is delayed until it can be granted. The transaction
can read the data item from any one of the sites at which a replica of the data
item resides; in the case of a write, all the sites where a replica resides must be
involved in the writing.
Advantages:
• Simple implementation. This scheme requires two messages for handling lock
requests and one message for handling unlock requests.
• Simple deadlock handling. Since all lock and unlock requests are made at one
site, the deadlock-handling algorithms can be applied directly.
Disadvantages:
• Bottleneck. The site Si becomes a bottleneck, since all requests must be processed
there.
• Vulnerability. If the site Si fails, the concurrency controller is lost. Either processing
must stop, or a recovery scheme must be used so that a backup site can take over
lock management from Si.
Distributed Lock Manager
• A compromise between the advantages and disadvantages can be achieved
through the distributed-lock-manager approach, in which the lock-manager
function is distributed over several sites.
• When a transaction wishes to lock a data item Q that is not replicated and
resides at site Si , a message is sent to the lock manager at site Si requesting a
lock (in a particular lock mode).
• If data item Q is locked in an incompatible mode, then the request is
delayed until it can be granted. Once it has determined that the lock request
can be granted, the lock manager sends a message back to the initiator
indicating that it has granted the lock request.
Advantage:
• Simple implementation, and the lock-manager work is distributed over several
sites, reducing the extent to which any single site becomes a bottleneck.
Disadvantage:
• Deadlock handling is more complex, since the lock and unlock requests are no
longer made at a single site: there may be intersite deadlocks even when there is
no deadlock within a single site.
Primary Copy
• When a system uses data replication, we can choose one of the replicas as the
primary copy. For each data item Q, the primary copy of Q must reside in precisely
one site, which we call the primary site of Q; a lock request for Q is sent to the
lock manager at the primary site.
Advantage:
• Concurrency control for replicated data is handled as for unreplicated data, which
allows for a simple implementation.
Disadvantage:
• If the primary site of Q fails, Q is inaccessible, even though other sites
containing a replica may be accessible.
Majority Protocol
The majority protocol works this way:
• If data item Q is replicated in n different sites, then a lock-request message must
be sent to more than one-half of the n sites in which Q is stored.
• Each lock manager determines whether the lock can be granted immediately. As
before, the response is delayed until the request can be granted.
• The transaction does not operate on Q until it has successfully obtained a lock on
a majority of the replicas of Q.
• Writes are performed on all replicas, requiring all sites containing replicas to be
available. However, the major benefit of the majority protocol is that it can be
extended to deal with site failures.
Advantage:
The protocol also deals with replicated data in a decentralized manner, thus avoiding
the drawbacks of central control.
Disadvantages:
Implementation. The majority protocol is more complicated to implement than are
the previous schemes.
It requires at least 2(n/2 + 1) messages for handling lock requests and at least (n/2
+ 1) messages for handling unlock requests.
Deadlock handling. In addition to the problem of global deadlocks due to the use of
a distributed-lock-manager approach, it is possible for a deadlock to occur even if
only one data item is being locked.
As an illustration, consider a system with four sites and full replication. Suppose that
transactions T1 and T2 wish to lock data item Q in exclusive mode. Transaction T1
may succeed in locking Q at sites S1 and S3, while transaction T2 may succeed in
locking Q at sites S2 and S4. Each then must wait to acquire the third lock; hence, a
deadlock has occurred. We can avoid such deadlocks with relative ease, by requiring
all sites to request locks on the replicas of a data item in the same predetermined
order.
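The idea can be sketched in Python as follows (request_lock is a hypothetical
stand-in for the lock-request message exchange; in the real protocol the response
is simply delayed until the lock is granted):

def acquire_majority_lock(item, replica_sites, request_lock):
    # replica_sites: ids of the n sites holding a replica of `item`.
    # request_lock(site, item) -> bool stands in for the lock-request
    # message; True means that site's lock manager granted the lock.
    needed = len(replica_sites) // 2 + 1      # strict majority of n
    granted = []
    for site in sorted(replica_sites):        # same predetermined order at
        if request_lock(site, item):          # every site prevents deadlock
            granted.append(site)
        if len(granted) >= needed:
            return granted                    # may now operate on `item`
    raise RuntimeError("majority not reached; release locks and retry")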
Biased Protocol
The biased protocol is another approach to handling replication.
The difference from the majority protocol is that requests for shared locks are
given more favorable treatment than requests for exclusive locks.
Shared locks.
When a transaction needs to lock data item Q, it simply requests a lock on Q from
the lock manager at one site that contains a replica of Q.
Exclusive locks.
When a transaction needs to lock data item Q, it requests a lock on Q from the
lock manager at all sites that contain a replica of Q. As before, the response to the
request is delayed until it can be granted.
Advantage:
The biased protocol imposes less overhead on read operations than does the majority
protocol. This saving is especially significant in common cases in which the frequency
of reads is much greater than the frequency of writes.
Disadvantage:
Additional overhead is imposed on write operations. The biased protocol also shares
the majority protocol's disadvantage of complexity in handling deadlock.
Quorum Consensus Protocol
The quorum consensus protocol is a generalization of the majority protocol. It
assigns read and write operations on an item x two integers, called the read quorum
Qr and the write quorum Qw, that must satisfy the following conditions, where S is
the total weight of all sites at which x resides:
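In the standard quorum-consensus protocol, the two conditions are:

Qr + Qw > S
2 * Qw > S

The first condition ensures that any read quorum and any write quorum have at
least one site in common, so every read encounters at least one up-to-date replica;
the second ensures that any two write quorums overlap, so two writes on the same
item cannot proceed concurrently on disjoint sets of replicas.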
Timestamp-Based Protocols
A timestamp-based protocol in a DBMS is an algorithm which uses the system time or
a logical counter as a timestamp to serialize the execution of concurrent transactions.
A timestamp-based protocol ensures that every pair of conflicting read and write
operations is executed in timestamp order.
The older transaction is always given priority in this method. It uses system time to
determine the time stamp of the transaction. This is the most commonly used concurrency
protocol.
Lock-based protocols manage the order between conflicting transactions at the time
they execute, whereas timestamp-based protocols manage conflicts as soon as an
operation is created.
Example:
Suppose there are three transactions T1, T2, and T3.
T1 entered the system at time 0010.
T2 entered the system at time 0020.
T3 entered the system at time 0030.
Priority will be given to transaction T1, then to transaction T2, and lastly to
transaction T3.
The principal idea behind the timestamp-based concurrency control protocols is that each
transaction is given a unique timestamp that the system uses in deciding the serialization
order. Our first task, then, in generalizing the centralized scheme to a distributed scheme is
to develop a scheme for generating unique timestamps.
Generation of Timestamps
There are two primary methods for generating unique timestamps, one centralized and
one distributed.
The node can use a logical counter or its own local clock for this purpose. While this
scheme is easy to implement, failure of the node would potentially block all transaction
processing in the system.
In the distributed scheme, each node generates a unique local timestamp by
using either a logical counter or the local clock.
We obtain the unique global timestamp by concatenating the unique local
timestamp with the node identifier, which also must be unique
We may still have a problem if one node generates local timestamps at a rate
faster than that of the other nodes.
In such a case, the fast node's logical counter will be larger
than that of other nodes. Therefore, all timestamps generated by the fast node
will be larger than those generated by other nodes. What we need is a
mechanism to ensure that local timestamps are generated fairly across the
system.
There are two solution approaches for this problem.
1. Keep the clocks synchronized by using a network time protocol. The protocol
periodically communicates with a server to find the current time. If the local time
is ahead of the time returned by the server, the local clock is slowed down,
whereas if the local time is behind the time returned by the server it is speeded
up, to bring it back in synchronization with the time at the server. Since all nodes
are approximately synchronized with the server, they are also approximately
synchronized with each other.
2. We define within each node Ni a logical clock (LCi), which generates the unique
local timestamp. The logical clock can be implemented as a counter that is
incremented after a new local timestamp is generated. To ensure that the various
logical clocks are synchronized, we require that a node Ni advance its logical clock
whenever a transaction Ti with timestamp < x, y > visits that node and x is greater
than the current value of LCi. In this case, node Ni advances its logical clock to the
value x + 1. As long as messages are exchanged regularly, the logical clocks will be
approximately synchronized.
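A minimal Python sketch of this scheme (node ids and the tie-breaking rule follow
the description above; class and method names are our own):

class TimestampGenerator:
    def __init__(self, node_id: int):
        self.node_id = node_id        # must be unique across nodes
        self.counter = 0              # the logical clock LCi

    def new_timestamp(self):
        # Global timestamp <x, y>: local counter concatenated with node id.
        ts = (self.counter, self.node_id)
        self.counter += 1             # incremented after a timestamp is issued
        return ts

    def observe(self, ts):
        # When a transaction with timestamp <x, y> visits this node and
        # x exceeds LCi, advance the logical clock to x + 1.
        x, _ = ts
        if x > self.counter:
            self.counter = x + 1

# Timestamps compare counter-first, with the node id as a tie-breaker,
# giving a total order even when counters collide.
n1, n2 = TimestampGenerator(1), TimestampGenerator(2)
t1 = n1.new_timestamp()               # (0, 1)
t2 = n1.new_timestamp()               # (1, 1) -- n1 is the "fast" node
n2.observe(t2)                        # n2 advances its clock to 2
t3 = n2.new_timestamp()               # (2, 2)
assert t1 < t2 < t3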
Distributed Timestamp Ordering
The timestamp ordering protocol can be easily extended to a parallel or
distributed database setting.
Suppose one node has a time significantly lagging the others, and a transaction
T1 gets its timestamp at that node n1. Suppose the transaction T1 fails a
timestamp test on a data item di because di has been updated by a transaction T2
with a higher timestamp; T1 would be restarted with a new timestamp, but if the
time at node n1 is not synchronized, the new timestamp may still be old enough
to cause the timestamp test to fail, and T1 would be restarted repeatedly until the
time at n1 advances ahead of the timestamp of T2.
Advantages:
Schedules are serializable, just as with 2PL protocols.
Transactions never wait for locks, which eliminates the possibility of deadlocks.
Disadvantages:
Starvation is possible if the same transaction is repeatedly restarted and aborted.
Distributed Validation
The protocol is based on three timestamps:
• The start timestamp, StartTS(Ti).
• The validation timestamp, TS(Ti), which is used as the serialization order.
• The finish timestamp, FinishTS(Ti), which identifies when the writes of a
transaction have completed.
Steps in the Validation-Based Protocol
1. Validation is done locally at each node, with timestamps assigned as described
below.
3. The validation test for a transaction Ti looks at all transactions Tj with TS(Tj) <
TS(Ti), to check if Tj either finished before Ti started, or has no conflicts with Ti. The
assumption is that once a particular transaction enters the validation phase, no
transaction with a lower timestamp can enter the validation phase. The assumption
can be ensured in a centralized system by assigning the timestamps in a critical
section, but cannot be ensured in a distributed setting.
A key problem in the distributed setting is that a transaction Tj may enter the
validation phase after a transaction Ti, but with TS(Tj) < TS(Ti). It is too late for
Ti to be validated against Tj. However, this problem can be easily fixed by rolling back
any transaction if, when it starts validation at a node, a transaction with a later
timestamp had already started validation at that node.
4. The start and finish timestamps are used to identify transactions Tj whose writes
would definitely have been seen by a transaction Ti. These timestamps must be
assigned locally at each node, and must satisfy StartTS(Ti) ≤ TS(Ti) ≤ FinishTS(Ti).
Each node uses these timestamps to perform validation locally.
5. When used in conjunction with 2PC, a transaction must first be validated and then
enter the prepared state. Writes cannot be committed at the database until the
transaction enters the committed state in 2PC. Suppose a transaction Tj reads an
item updated by a transaction Ti that is in the prepared state and is allowed to
proceed using the old value of the data item (since the value generated by Ti has not
yet been written to the database). Then, when transaction Tj attempts to validate, it
will be serialized after Ti and will surely fail validation if Ti commits. Thus, the read
by Tj may as well be held until Ti commits and finishes its writes. The above
behavior is the same as what would happen with locking, with write locks acquired at
the time of validation.
The validation-based protocol is performed in the following three phases:
Read Phase
In the Read Phase, the data values from the database can be read by a
transaction but the write operation or updates are only applied to the local data
copies, not the actual database.
Validation Phase
In Validation Phase, the data is checked to ensure that there is no violation of
serializability while applying the transaction updates to the database.
Write Phase
In the Write Phase, the updates are applied to the database if the validation is
successful; otherwise, the updates are not applied and the transaction is rolled back.
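A compact Python sketch of the validation test described in step 3 above
(transaction objects carrying read/write sets and the three timestamps are
assumptions of this sketch):

def validate(ti, older):
    # Ti passes if every older transaction Tj (TS(Tj) < TS(Ti)) either
    # finished before Ti started, or wrote nothing that Ti read.
    for tj in older:
        finished_early = (tj.finish_ts is not None
                          and tj.finish_ts < ti.start_ts)
        if finished_early:
            continue                           # Tj's writes predate Ti's reads
        if tj.write_set & ti.read_set:         # overlapping items => conflict
            return False                       # Ti fails validation: roll back
    return True                                # Ti may enter its write phase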
NOSQL DATABASES
Organizations that handle vast amounts of data often found traditional relational
systems unsuitable, for two main reasons:
1) SQL systems offer too many services (powerful query language, concurrency
control, etc.), which such applications may not need; and
2) A structured data model such as the traditional relational model may be too
restrictive.
Some of the organizations that were faced with these data management and storage
applications decided to develop their own systems:
BigTable
Google developed a proprietary NOSQL system known as BigTable, which is used in
many of Google's applications that require vast amounts of data storage, such as
Gmail, Google Maps, and Web site indexing.
DynamoDB
Amazon developed a NOSQL system called DynamoDB, offered through Amazon's
cloud services. This innovation led to the category known as key-value data stores or
sometimes key-tuple or key-object data stores.
Cassandra
Facebook developed a NOSQL system called Cassandra, which is now open source
and known as Apache Cassandra.
This NOSQL system uses concepts from both key-value stores and column-based
systems.
Other software companies started developing their own solutions and making them
available to users who need these capabilities—for example, MongoDB and
CouchDB, which are classified as document-based NOSQL systems or document
stores.
Some NOSQL systems, such as OrientDB, combine concepts from many of the
categories discussed above.
NOSQL characteristics related to distributed databases and distributed systems
include:
• Scalability
• Availability, replication, and eventual consistency
• Replication models (master-slave and master-master)
• Sharding of files
Scalability
There are two kinds of scalability in distributed systems: horizontal and vertical.
Horizontal scalability, which is generally used in NOSQL systems, expands the
distributed system by adding more nodes for data storage and processing as the
volume of data grows. Vertical scalability, by contrast, expands the storage and
computing power of the existing nodes.
Availability, Replication, and Eventual Consistency
Many applications that use NOSQL systems require continuous system availability.
To accomplish this, data is replicated over two or more nodes in a transparent
manner, so that if one node fails, the data is still available on other nodes.
Replication improves data availability and can also improve read performance,
because read requests can often be serviced from any of the replicated data
nodes.
Replication Models
Two major replication models are used in NOSQL systems: master-slave replication,
in which all writes are applied at a master copy and then propagated to the slave
copies, and master-master replication, which allows reads and writes at any of the
replicas, possibly leading to temporary inconsistencies that must be reconciled.
Sharding of Files
Sharding (horizontal partitioning) distributes a file's records across multiple nodes so
that both the data and the access load are spread over those nodes. To achieve this,
most systems use one of two techniques: hashing or range partitioning on object keys.
The majority of accesses to an object will be by providing the key value rather than
by using complex query conditions.
In hashing, a hash function h(K) is applied to the key K, and the location of the
object with key K is determined by the value of h(K).
In range partitioning, the location is determined via a range of key values; for
example, location i would hold the objects whose key values K are in the range Kimin
≤ K ≤ Kimax. In applications that require range queries, where multiple objects
within a range of key values are retrieved, range partitioning is preferred.
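A small Python sketch contrasting the two techniques (the hash function, node
count, and range boundaries are illustrative choices):

import bisect, hashlib

NUM_NODES = 4

def hash_node(key: str) -> int:
    # Location of the object is determined by h(K) mod the number of nodes.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_NODES

RANGE_BOUNDS = ["g", "n", "t"]   # node 0: keys < "g"; node 1: < "n"; node 2: < "t"; node 3: the rest

def range_node(key: str) -> int:
    # Location is determined by which key range the key falls into.
    return bisect.bisect_right(RANGE_BOUNDS, key)

# A range query for keys between "h" and "m" touches only node 1 under range
# partitioning, but may touch every node under hashing.
print(range_node("h"), range_node("m"))                  # both 1
print({hash_node(k) for k in ["h", "i", "j", "k", "l", "m"]})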
NOSQL characteristics related to data models and query languages
In some systems, users can specify a partial schema to improve storage efficiency,
but most NOSQL systems do not require a schema.
As there may not be a schema to specify constraints, any constraints on the data
would have to be programmed in the application programs that access the data
items.
There are various languages for describing semistructured data, such as JSON
(JavaScript Object Notation) and XML (Extensible Markup Language).
JSON is used in several NOSQL systems, but other methods for describing
semistructured data can also be used.
Less Powerful Query Languages: Many applications that use NOSQL systems
may not require a powerful query language such as SQL, because search (read)
queries in these systems often locate single objects in a single file based on their
object keys. NOSQL systems typically provide a set of functions and operations as a
programming API (application programming interface), so reading and writing the
data objects is accomplished by calling the appropriate operations by the programmer.
In many cases, the operations are called CRUD operations, for Create, Read,
Update, and Delete.
In other cases, they are known as SCRUD because of an added Search (or Find)
operation.
Some NOSQL systems also provide a high-level query language, but it may not have
the full power of SQL; only a subset of SQL querying capabilities would be provided.
In particular, many NOSQL systems do not provide join operations as part of the
query language itself; the joins need to be implemented in the application programs.
Versioning:
Some NOSQL systems provide storage of multiple versions of the data items, with
the timestamps of when the data version was created.
Categories of NoSQL Databases:
NoSQL databases can be divided into four types, as follows.
1. Document Database:
The document database stores data in the form of documents. This implies that
data is grouped into files that make it easier to be recognized when it is required
for building application software.
2. Key-Value Database:
In a key-value database, data is stored in an organized manner with the help of
associative (key, value) pairing. A typical example of this type is Amazon's Dynamo
database.
3. Column-Oriented Database:
This type of database stores data in the form of columns that segregates
information into homogenous categories.
This allows the user to access only the desired data without having to retrieve
unnecessary information.
When it comes to data analytics in social media networking sites, the column-
oriented database works very efficiently by showcasing data that is prevalent in
the search results.
4. Graph Database:
Data is stored in the form of a graph, with related elements such as edges and
nodes.
Data points are placed in such a manner that nodes are related to edges and thus,
a network or connection is established between several data points.
This way, one data point leads to the other without the user having to retrieve
individual data points. In the case of software development, this type of database
works well since connected data points often lead to networked data storage.
This, in turn, makes the functioning of software highly effective and organized. An
example of the graph NoSQL database is Amazon Neptune.
Hybrid NOSQL systems: These systems have characteristics from two or more of
the above four categories.
Document-Based NOSQL Systems and MongoDB
Although the documents in a collection should be similar, they can have different
data elements (attributes), and new documents can have new data elements that
do not exist in any of the current documents in the collection.
The system basically extracts the data element names from the self-describing
documents in the collection, and the user can request that the system create
indexes on some of the data elements.
In our example, the collection is capped; this means it has upper limits on its
storage space (size) and number of documents (max). The capping parameters
help the system choose the storage options for each collection.
For our example, we will create another document collection called worker to
hold information about the EMPLOYEEs who work on each project; for example:
db.createCollection("worker", { capped : true, size : 5242880, max : 2000 } )
Each document in a collection has a unique ObjectId field, called _id, which is
automatically indexed in the collection unless the user explicitly requests no index
for the _id field.
The value of ObjectId can be specified by the user, or it can be system- generated
if the user does not specify an _id field for a particular document.
User-generated ObjectIds can have any value specified by the user as long as it
uniquely identifies the document; these ids are thus similar to primary keys in
relational systems.
A collection does not have a schema. The structure of the data fields in
documents is chosen based on how documents will be accessed and used, and the
user can choose a normalized design (similar to normalized relational tuples) or a
denormalized design (similar to XML documents or complex objects).
MongoDB CRUD Operations
MongoDB has several CRUD operations, where CRUD stands for create, read,
update, and delete.
Documents can be created and inserted into their collections using the insert
operation, whose format is: db.<collection_name>.insert(<document(s)>)
The parameters of the insert operation can include either a single document or an
array of documents.
The delete operation is called remove, and the format is:
db.<collection_name>.remove(<condition>)
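As an illustration of these operations from a program, here is a hedged sketch
using the PyMongo driver rather than the mongo shell (it assumes a local mongod
and the worker collection created above; database name and field values are
invented for the example):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["company"]                    # database name is illustrative

# insert: a single document, or insert_many for an array of documents
db.worker.insert_one({"Ename": "Smith", "ProjectNo": 5, "Hours": 32.5})
db.worker.insert_many([{"Ename": "Wong", "ProjectNo": 5, "Hours": 7.5},
                       {"Ename": "Zelaya", "ProjectNo": 10, "Hours": 10.0}])

# remove: delete the documents that match a condition
db.worker.delete_many({"ProjectNo": 10})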
Key-Value Stores
The key is a unique identifier associated with a data item and is used to locate
this data item rapidly.
The value is the data item itself, and it can have very different formats for
different key-value storage systems.
In some cases, the value is just a string of bytes or an array of bytes, and the
application using the key-value store has to interpret the structure of the data
value.
In other cases, some standard formatted data is allowed; for example, structured
data rows (tuples) similar to relational data, or semistructured data using JSON or
some other self-describing data format.
Different key-value stores can thus store unstructured, semistructured, or
structured data items.
The main characteristic of key-value stores is the fact that every value (data item)
must be associated with a unique key, and that retrieving the value by supplying
the key must be very fast.
DynamoDB Overview
The DynamoDB system is an Amazon product and is available as part of Amazon's
AWS/SDK platforms (Amazon Web Services/Software Development Kit).
It can be used as part of Amazon's cloud computing services, for the data storage
component.
DynamoDB data model.
The basic data model in DynamoDB uses the concepts
of tables, items, and attributes. A table in DynamoDB does not have a schema;
it holds a collection of self-describing items.
Each item will consist of a number of (attribute, value) pairs, and attribute
values can be single-valued or multivalued.
So basically, a table will hold a collection of items, and each item is a self-
describing record (or object).
DynamoDB also allows the user to specify the items in JSON format, and the
system will convert them to the internal storage format of DynamoDB.
When a table is created, it is required to specify a table name and a primary
key; the primary key will be used to rapidly locate the items in the table.
Thus, the primary key is the key and the item is the value for the DynamoDB
key-value store.
The primary key attribute must exist in every item in the table.
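A hedged sketch of this model using the boto3 SDK for Python (the table and
attribute names are illustrative, and AWS credentials are assumed to be
configured):

import boto3

dynamodb = boto3.resource("dynamodb")

# Creating a table requires only a name and a primary key; items are
# otherwise self-describing, so no further schema is declared.
table = dynamodb.create_table(
    TableName="Project",
    KeySchema=[{"AttributeName": "ProjectId", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "ProjectId", "AttributeType": "N"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# Each item is a set of (attribute, value) pairs; values may be multivalued.
table.put_item(Item={"ProjectId": 1, "Pname": "ProductX",
                     "Workers": ["Smith", "Wong"]})
item = table.get_item(Key={"ProjectId": 1})["Item"]   # fast lookup by key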
Column-Based or Wide Column NOSQL Systems
The Google distributed storage system for big data, known as BigTable, is a well-
known example of this class of NOSQL systems, and it is used in many Google
applications that require large amounts of data storage, such as Gmail.
Big-Table uses the Google File System (GFS) for data storage and distribution.
An open source system known as Apache Hbase is somewhat similar to Google
Big-Table, but it typically uses HDFS (Hadoop Distributed File System) for data
storage.
Hbase can also use Amazon's Simple Storage Service (known as S3) for data
storage.
Hbase data model. The data model in Hbase organizes data using the concepts
of namespaces, tables, column families, column qualifiers, columns, rows, and data
cells.
Tables and Rows. Data in Hbase is stored in tables, and each table has a table
name. Data in a table is stored as self-describing rows. Each row has a unique
row key
Column Families, Column Qualifiers, and Columns. A table is associated with
one or more column families. Each column family will have a name, and the
column families associated with a table must be specified when the table is
created and cannot be changed later.
When the data is loaded into a table, each column family can be associated with
many column qualifiers, but the column qualifiers are not specified as part of
creating a table.
So the column qualifiers make the model a self-describing data model because the
qualifiers can be dynamically specified as new rows are created and inserted into
the table.
Cells. A cell holds a basic data item in Hbase. The key (address) of a cell is
specified by a combination of (table, rowid, columnfamily, columnqualifier,
timestamp).
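To make the cell addressing concrete, here is a minimal plain-Python sketch (not
the HBase API) of a store keyed by that combination; keeping the timestamp in the
key is what lets the store retain multiple versions of a cell:

import time

store = {}   # cell key: (table, rowid, columnfamily, columnqualifier, timestamp)

def put(table, rowid, colfam, qualifier, value):
    store[(table, rowid, colfam, qualifier, time.time_ns())] = value

def get_latest(table, rowid, colfam, qualifier):
    versions = [(k[4], v) for k, v in store.items()
                if k[:4] == (table, rowid, colfam, qualifier)]
    return max(versions)[1] if versions else None

put("EMPLOYEE", "row1", "Name", "Fname", "John")
put("EMPLOYEE", "row1", "Name", "Fname", "Johnny")       # a newer version
print(get_latest("EMPLOYEE", "row1", "Name", "Fname"))   # -> Johnny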
Graph Databases and Neo4j
The data model in Neo4j organizes data using the concepts of nodes and
relationships.
Both nodes and relationships can have properties, which store the data items
associated with nodes and relationships. Nodes can have labels; the nodes that
have the same label are grouped into a collection that identifies a subset of the
nodes in the database graph for querying purposes.
A node can have zero, one, or several labels. Relationships are directed; each
relationship has a start node and end node as well as a relationship type, which
serves a similar role to a node label by identifying similar relationships that have
the same relationship type.
Properties can be specified via a map pattern, which is made of one or more
"name : value" pairs enclosed in curly brackets; for example, {Lname : 'Smith',
Fname : 'John', Minit : 'B'}.
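A hedged sketch using the official Neo4j Python driver (the bolt URI and
credentials are placeholders), creating two labeled nodes with property maps like
the one above, joined by a directed, typed relationship:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))
with driver.session() as session:
    # Two labeled nodes with property maps, plus a relationship that
    # carries its own properties and a relationship type.
    session.run(
        "CREATE (e:EMPLOYEE {Lname: 'Smith', Fname: 'John', Minit: 'B'}) "
        "CREATE (p:PROJECT {Pname: 'ProductX'}) "
        "CREATE (e)-[:WorksFor {Hours: 32.5}]->(p)"
    )
driver.close()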
Designing an Enterprise Database
An enterprise DBMS is one that 100 to 10,000 individuals can access
simultaneously. Businesses and big companies use it to handle their vast data
sets. Such a database allows businesses to increase their productivity. This kind of
database can handle large organizations with thousands of employees and busy
web servers with hundreds of thousands of people accessing them simultaneously
online.
• Warehouse Database:
A warehouse database holds data taken from several different databases for a
particular subject area.
• Parallel query:
Several users can issue queries at the same time, and all the queries are
answered simultaneously.
• Multi-process support:
Several processes can be handled by splitting the workload among them.
• Clustering features:
More than one server can be combined to serve a single database. Often one
server is not sufficient to handle the data volume, which is where this feature
comes into play.
10. ASSIGNMENT
1. Given the following relation EMP and the predicates p1: SAL > 23000, p2: SAL <
23000
In General, the traditional approach to disaster protection does not apply to online
transaction processing. An alternate approach to disaster protection is to have two or
more sites actively back one another up. During normal operation, each site stores a
replica of the data and each carries part of the telecommunications and
computational load. In an emergency, all work is shifted to the surviving sites. For
failsafe protection, the sites must be independent of each other's failures. They
should be geographically separate, part of different power grids, switching stations,
and so on. Geographic diversity gives protection from fires, natural disasters, and
sabotage.
A finance company gives a good example of this approach. The company has traders
of notes, loans, and bonds located in ten American cities. To allow delivery to the
banks for processing on a "same-day" basis, deals must be consummated by 1 PM.
Traders do not start making deals until about 11 AM because interest rates can
change several times during the morning. In a three hour period the traders move
about a billion dollars of commercial paper.
To guard against lost data and lost work, two identical data centers were installed,
each a six-processor system with 10 spindles of disc. The data centers are about 500
km apart. Each center supports half the terminals, although each trading floor has a
direct path to both centers. Each data center stores the whole database. As is
standard with Tandem, each data center duplexes its discs, so the database is stored
on four sets of discs; they call this the "quad" database architecture. Hence, there is
redundancy of communications, processing, and data.
15. CONTENT BEYOND SYLLABUS
The basic object in XML is the XML document. Two main structuring concepts are used to
construct an XML document: elements and attributes. As in HTML, elements are identified
in a document by their start tag and end tag. The tag names are enclosed between angled
brackets <...>, and end tags are further identified by a slash, </...>.
Complex elements are constructed from other elements hierarchically, whereas simple
elements contain data values. A major difference between XML and HTML is that XML tag
names are defined to describe the meaning of the data elements in the document, rather
than to describe how the text is to be displayed. This makes it possible to process the data
elements in the XML document automatically by computer programs. Also, the XML tag
(element) names can be defined in another document, known as the schema document, to
give a semantic meaning to the tag names that can be exchanged among multiple users. In
HTML, all tag names are predefined and fixed; that is why they are not extendible.
DOCUMENT TYPE DEFINITION (DTD):
The document type definition (DTD) is an optional part of an XML
document. The main purpose of a DTD is much like that of a schema:
to constrain and type the information present in the document.
However, the DTD does not in fact constrain types in the sense of basic types like integer or
string. Instead, it constrains only the appearance of sub elements and attributes within an
element. The DTD is primarily a list of rules for what pattern of sub-elements may appear
within an element.
Example of a DTD
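A minimal sketch of such a DTD, using element names from the university schema
described below (illustrative rather than the exact example):

<!DOCTYPE university [
  <!ELEMENT university ( (department|course|instructor)+ )>
  <!ELEMENT department ( dept_name, building, budget )>
  <!ELEMENT course ( course_id, title, dept_name, credits )>
  <!ELEMENT instructor ( IID, name, dept_name, salary )>
  <!ELEMENT course_id ( #PCDATA )>
  <!ELEMENT title ( #PCDATA )>
  <!ELEMENT dept_name ( #PCDATA )>
  <!ELEMENT credits ( #PCDATA )>
  <!ELEMENT building ( #PCDATA )>
  <!ELEMENT budget ( #PCDATA )>
  <!ELEMENT IID ( #PCDATA )>
  <!ELEMENT name ( #PCDATA )>
  <!ELEMENT salary ( #PCDATA )>
]>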
Thus, in the DTD, a university element consists of one or more course, department, or
instructor elements; the | operator specifies "or" while the + operator specifies "one or
more." Although not shown here, the * operator is used to specify "zero or more," while
the ? operator is used to specify an optional element (that is, "zero or one"). The course
element contains subelements course id, title, dept name, and credits (in that order).
Similarly, department and instructor have the attributes of their relational schema defined as
subelements in the DTD. Finally, the elements course id, title, dept name, credits, building,
budget, IID, name, and salary are all declared to be of type #PCDATA. The keyword #PCDATA
indicates text data; it derives its name, historically, from "parsed character data."
Two other special type declarations are empty, which says that the element has no
contents, and any, which says that there is no constraint on the sub elements of the
element; that is, any elements, even those not mentioned in the DTD, can occur as
sub elements of the element. The absence of a declaration for an element is
equivalent to explicitly declaring the type as any.
XML SCHEMA
XML Schema defines a number of built-in types such as string, integer, decimal, date,
and boolean. In addition, it allows user-defined types; these may be simple types
with added restrictions, or complex types constructed using constructors such as
complexType and sequence.
Note that any namespace prefix could be used in place of xs; thus we could replace
all occurrences of "xs:" in the schema definition with "xsd:" without changing the
meaning of the schema definition. All types defined by XML Schema must be
prefixed by this namespace prefix. The first element is the root element university,
whose type is specified to be University Type, which is declared later. The example
then defines the types of elements department, course, instructor, and teaches. Note
that each of these is specified by an element with tag xs:element, whose body
contains the type definition.
XQUERY
XPath allows us to write expressions that select items from a tree-structured XML
document. XQuery permits the specification of more general queries on one or more
XML documents. The typical form of a query in XQuery is known as a FLWR
expression, which stands for the four main clauses of XQuery and has the following
form:
FOR <variable bindings to individual nodes (elements)>
LET <variable bindings to collections of nodes (elements)>
WHERE <qualifier conditions>
RETURN <query result specification>
There can be zero or more instances of the FOR clause, as well as of the LET clause,
in a single XQuery. The WHERE clause is optional but can appear at most once, and
the RETURN clause must appear exactly once. Let us illustrate these clauses with the
following simple example of an XQuery.
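The example itself can be sketched as follows (the document name and path
expressions are illustrative assumptions chosen to match the explanation below):

LET $d := doc("company.xml")
FOR $x IN $d/company/project[projectNumber = 5]/projectWorker,
    $y IN $d/company/employee
WHERE $x/hours gt 20.0 AND $y/ssn = $x/ssn
RETURN <res> { $y/employeeName/firstName, $y/employeeName/lastName,
               $x/hours } </res>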
1. Variables are prefixed with the $ sign. In the above example, $d, $x, and $y are
variables.
2. The LET clause assigns a variable to a particular expression for the rest of the
query. In this example, $d is assigned to the document file name. It is possible to
have a query that refers to multiple documents by assigning multiple variables in
this way.
3. The FOR clause assigns a variable to range over each of the individual items in a
sequence. In our example, the sequences are specified by path expressions. The
$x variable ranges over elements that satisfy the path expression.
16. ASSESSMENT SCHEDULE
Assessment 1: 20.9.2021
Assessment 2: 22.10.2021
17. REFERENCES
1. Raghu Ramakrishnan, Gehrke, "Database Management Systems", McGraw Hill, 3rd Edition,
2014.
2. Plunkett T., B. Macdonald, "Oracle Big Data Handbook", McGraw Hill, First Edition, 2013.
3. Gupta G. K., "Database Management Systems", Tata McGraw Hill Education Private Limited,
New Delhi, 2011.
4. C. J. Date, A. Kannan, S. Swamynathan, "An Introduction to Database Systems", Eighth
Edition, Pearson Education, 2015.
5. Maqsood Alam, Aalok Muley, Chaitanya Kadaru, Ashok Joshi, "Oracle NoSQL Database: Real-
Time Big Data Management for the Enterprise", McGraw Hill Professional, 2013.
6. Thomas Connolly, Carolyn Begg, "Database Systems: A Practical Approach to Design,
Implementation and Management", Pearson, 6th Edition, 2015.
18. MINI PROJECT SUGGESTIONS
1) Insurance Management
The user can add an insurance scheme and finally make the payments for the client
for the added insurance.
2) Inventory Management
The project starts by adding a seller and the details of customers. The user can then
purchase new products from the desired seller and sell them to customers; the
purchasing and selling of products is reflected in the inventory section. The main aim
of the Inventory Management mini DBMS project is to add new products, sell them,
and keep an inventory to manage them.
Disclaimer:
This document is confidential and intended solely for the educational purpose of RMK Group of
Educational Institutions. If you have received this document through email in error, please notify the
system manager. This document contains proprietary information and is intended only to the
respective group / learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender immediately by e-mail if you
have received this document by mistake and delete this document from your system. If you are not
the intended recipient you are notified that disclosing, copying, distributing or taking any action in
reliance on the contents of this information is strictly prohibited.