ADBMS Sem 1 Mumbai University (MSC - CS)
UNIT 1
1. What are the different distributed DBMS issues related to query processing?
1. Full Replication
3. Partial replication
Partial replication means that only some fragments of the
database are replicated.
1. Query Decomposition
Decomposing a high-level query (relational calculus) into an algebraic query
(relational algebra) on global relations.
2. Data Localization
Input: Algebraic query on distributed relations
Purpose:
∗ Apply data distribution information to the algebra operations and
determine which fragments are involved
∗ Substitute global query with queries on fragments
∗ Optimize the global query
4. Local Optimization
Input: Best global execution schedule
∗ Use the centralized optimization techniques
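The data localization step can be illustrated with a minimal sketch. It assumes a hypothetical global relation EMP that is horizontally fragmented on a "dept" column into EMP1, EMP2 and EMP3; the fragment names, sites and column are illustrative assumptions, not part of the notes. The sketch substitutes the global relation with a union of fragments and drops fragments that cannot contain matching rows.

```python
# Hypothetical fragmentation schema (an assumption for illustration):
# the global relation EMP is horizontally fragmented on the "dept" column.
FRAGMENTS = {
    "EMP1": {"site": "S1", "dept": "sales"},
    "EMP2": {"site": "S2", "dept": "hr"},
    "EMP3": {"site": "S3", "dept": "it"},
}


def localize(selected_dept=None):
    """Return the fragments a selection on dept has to touch.

    With no selection, every fragment is involved. With a selection,
    fragments whose defining predicate contradicts it are pruned.
    """
    return sorted(
        name
        for name, info in FRAGMENTS.items()
        if selected_dept is None or info["dept"] == selected_dept
    )


def rewrite(selected_dept=None):
    """Substitute the global relation EMP with a query on fragments."""
    fragments = localize(selected_dept)
    return " UNION ".join(fragments) if fragments else "EMPTY"
```

For example, rewrite("hr") only involves EMP2, so only site S2 has to be contacted, while rewrite() needs the union of all three fragments.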
1. Location transparency
2. Fragmentation transparency
3. Network transparency
1. First, it provides the user with faster results, which makes the application seem
faster to the user.
2. Second, it allows the system to service more queries in the same amount of
time, because each optimized request takes less time than an unoptimized one.
–Fragmentation/replication of relations
–Additional communication costs
–Parallel execution
UNIT 2
1. Explain the need for a commit protocol in a distributed DBMS. Describe the
two-phase commit protocol.
Commit Protocols
1. A transaction that executes at multiple sites must either be committed at all
the sites or aborted at all the sites.
1. Voting phase
1) The coordinator sends a Prepare message along with the transaction to all
participants and asks each one of them to cast its vote for commit or abort.
2) If a participant can commit the transaction, Vote commit is sent to the
coordinator, and if a participant cannot commit, Vote abort is sent to the
coordinator.
2. Commit phase
3) The decision to commit or abort is taken by the coordinator in this phase. If Vote
commit is received from all the participants, then Global commit is sent to all the
participants, and if at least one Vote abort is received, then the coordinator sends
Global abort to all those that voted for commit.
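The coordinator's side of the two phases can be sketched as follows. This is a simplified, single-process illustration: each participant is modeled as a hypothetical callable that returns True for Vote commit and False for Vote abort, and logging, timeouts and failures are left out.

```python
def two_phase_commit(participants):
    """Run a simplified two-phase commit from the coordinator's view.

    participants: dict mapping a participant name to a callable that
    returns True (Vote commit) or False (Vote abort). The callables are
    stand-ins for real sites and are an assumption of this sketch.
    Returns the global decision and the participants it is sent to.
    """
    # Phase 1 (voting): send Prepare and collect one vote per participant.
    votes = {name: bool(can_commit()) for name, can_commit in participants.items()}

    # Phase 2 (decision): commit only if every vote is a commit vote.
    if all(votes.values()):
        return "GLOBAL_COMMIT", sorted(votes)

    # Global abort only needs to reach those that voted for commit.
    return "GLOBAL_ABORT", sorted(name for name, vote in votes.items() if vote)
```

A single Vote abort is enough to abort the whole transaction, which is what gives the all-or-nothing behaviour across sites.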
A deadlock can occur because transactions wait for one another; it occurs when
the WFG contains a cycle.
For a deadlock to occur, each of the following four conditions must hold.
i. Mutual exclusion
ii. Hold and wait
iii. No preemption
iv. Circular wait
A useful tool in analyzing deadlocks is a wait-for graph (WFG). A WFG is a directed graph
that represents the wait-for relationship among transactions. The nodes of this graph
represent the concurrent transactions in the system. An edge Ti -> Tj exists in the
WFG if transaction Ti is waiting for Tj to release a lock on some entity.
In distributed systems, it is not sufficient that each local DBMS
forms a local wait-for graph (LWFG) at its site; it is also necessary to form a global
wait-for graph, which is the union of all the LWFGs.
There are three known methods for handling deadlocks: prevention, avoidance,
and detection and resolution.
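Deadlock detection on a wait-for graph can be sketched as a cycle search. In this minimal example a graph is a dict that maps each waiting transaction to the set of transactions it waits for; this representation and the function names are assumptions made for illustration. It also shows why local graphs alone are not sufficient: the cycle only appears in the union.

```python
def global_wfg(local_graphs):
    """Build the global wait-for graph as the union of local graphs."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def has_deadlock(wfg):
    """Return True if the wait-for graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in wfg.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the path
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(wfg))


# T1 waits for T2 at site 1, T2 waits for T1 at site 2.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}
```

Neither site1 nor site2 contains a cycle on its own, but has_deadlock(global_wfg([site1, site2])) reports the distributed deadlock.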
Read(x)
Read(y)
x<-x+y
Write(x)
commit
Definition-
A transaction is a single logical unit of work which accesses and possibly modifies
the contents of a database. Transactions access data using read and write
operations.
States of Transaction
A transaction must be in one of the following states:
Active: the initial state, the transaction stays in this state while it is
executing.
Partially committed: after the final statement has been executed.
Failed: when the normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has
been restored to its state prior to the start of the transaction.
Committed: after successful completion.
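The states above can be read as a small state machine. The sketch below encodes the transitions of the usual textbook state diagram (the notes list only the states, so the transition table is an assumption) and checks whether a sequence of states is a legal transaction history.

```python
# Allowed transitions, following the usual textbook state diagram
# (an assumption: the notes list the states but not the transitions).
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),
    "aborted": set(),
}


def is_valid_history(states):
    """Check that a sequence of states starts at 'active' and only uses allowed moves."""
    if not states or states[0] != "active":
        return False
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )
```

For instance, a transaction cannot jump directly from active to committed; it has to pass through partially committed first.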
4. What is isolation?
1. In-place update
2. Out-of-place update
Each processor in the shared-nothing system has its own local memory and
local disk.
Processors can communicate with each other through an intercommunication
channel.
Any processor can act as a server to serve the data that is stored on its local
disk.
Because the log record describing a message is forced to stable storage before the message is sent.
1. Round-Robin Partitioning:
- Data is distributed evenly by Informatica among all partitions.
- This partitioning is used where the number of rows to process in each partition
is approximately the same.
2. Hash Partitioning:
- The Informatica server applies a hash function to the partitioning keys to
group data among partitions.
- It is used where rows with the same partitioning key must be processed in the
same partition.
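The difference between the two schemes can be shown with a generic sketch. This is not Informatica's implementation or API; the function names and the use of CRC32 as the hash function are assumptions chosen for illustration.

```python
import zlib


def round_robin(rows, n):
    """Deal rows out to n partitions in turn, giving near-equal sizes."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Send each row to the partition chosen by a hash of its key value.

    Rows with the same key value always land in the same partition.
    CRC32 is used only because it is deterministic across runs.
    """
    parts = [[] for _ in range(n)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[digest % n].append(row)
    return parts


# Hypothetical sample data: six orders from three customers.
rows = [{"cust": c, "amt": i} for i, c in enumerate("abcabc")]
```

Round-robin balances partition sizes but scatters a customer's rows, while hash partitioning keeps each customer's rows together at the cost of possibly uneven partitions.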
1. Serial Schedules –
Schedules in which the transactions are executed non-interleaved are called serial
schedules, i.e., a serial schedule is one in which no transaction starts until the
running transaction has ended.
Example: Consider the following schedule involving two transactions T1 and T2.
T1      T2
R(A)
W(A)
R(B)
W(B)
        R(A)
        R(B)
where R(A) denotes that a read operation is performed on some data item ‘A’.
This is a serial schedule since the transactions execute serially in the order
T1 -> T2.
2. Complete Schedules –
Schedules in which the last operation of each transaction is either abort or
commit are called complete schedules.
Example: Consider the following schedule involving three transactions T1, T2 and
T3.
T1      T2      T3
R(A)
W(A)
        R(B)
        W(B)
commit
        commit
                abort
This is a complete schedule since the last operation performed under every
transaction is either “commit” or “abort”.
3. Recoverable Schedules –
Schedules in which transactions commit only after all transactions whose changes
they read commit are called recoverable schedules. In other words, if some
transaction Tj reads a value updated or written by some other transaction Ti,
then the commit of Tj must occur after the commit of Ti.
Example – Consider the following schedule involving two transactions T1 and T2.
T1      T2
R(A)
W(A)
        W(A)
        R(A)
commit
        commit
This is a recoverable schedule since T1 commits before T2, which makes the value
read by T2 correct.
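The recoverability condition can be checked mechanically. In the sketch below a schedule is a list of (transaction, operation, item) tuples with operations "R", "W" and "C" (commit); this encoding is an assumption for illustration. Aborts are ignored, so the check is a simplification of the full definition.

```python
def is_recoverable(schedule):
    """Check that every transaction commits after the ones it read from.

    schedule: list of (txn, op, item) with op in {"R", "W", "C"}.
    Simplification: aborts are not modeled, so writes are never undone.
    """
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = {}    # reader -> set of transactions it read from
    committed = set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "C":
            # Every transaction this one read from must already be committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True


good = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
bad = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
```

In the second schedule T2 commits a value it read from T1 before T1 has committed, so a later failure of T1 could not be undone for T2.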
Second Phase: (Figure 4) Node N issues either a GLOBAL ABORT (GA) or a GLOBAL
COMMIT (GC) and sends it to node N-1. Node N-1 then enters an ABORT or
COMMIT state and forwards the GA or GC to node N-2, and so on, until the final
decision to commit or abort reaches the coordinator node.
1. First Phase: (in Figure 1) In this phase, when a user wants to COMMIT a
transaction, the coordinator issues a PREPARE message to all the subordinates
[21]. When a subordinate receives the PREPARE message and is willing to COMMIT,
it writes a PREPARE log record, sends a YES VOTE, and enters the PREPARED state;
if it is not willing to COMMIT, it writes an abort record and sends a NO VOTE.
A subordinate sending a NO VOTE doesn’t need to
enter a PREPARED state since it knows that the coordinator will issue an abort. In
this case, the NO VOTE acts like a veto in the sense that only one NO VOTE is
needed to abort the transaction. The following two rules apply to the
coordinator’s decision [7]:
a. If even one participant votes to abort the transaction, the coordinator has to
reach a global abort decision.
b. If all the participants vote to COMMIT, the coordinator has to reach a global
COMMIT decision.
2. Second Phase: (in Figure 2) After the coordinator collects the votes, it has to relay its
decision to the subordinates. If the decision is COMMIT, then the coordinator moves
into the committing state and sends a COMMIT message to all the subordinates
informing them of the COMMIT. When the subordinates receive the COMMIT
message, they move to the committing state and send an acknowledge (ACK)
message to the coordinator. When the coordinator receives the ACK messages, it
ends the transaction. If, on the other hand, the coordinator reaches an ABORT
decision, it sends an ABORT message to all the subordinates. Here, the
coordinator doesn’t need to send an ABORT message to the subordinate(s) that
gave a NO VOTE.
There are two algorithms for this purpose, namely wait-die and wound-wait. Let
us assume that there are two transactions, T1 and T2, where T1 tries to lock a
data item which is already locked by T2. The algorithms are as follows −
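The two rules can be sketched as decision functions. The sketch assumes the standard timestamp-based formulation, in which a smaller timestamp means an older transaction; the function names and return strings are illustrative. The requester plays the role of T1 and the holder the role of T2.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits, a younger requester is aborted (dies).

    Assumes a smaller timestamp means an older transaction.
    """
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester aborts (wounds) the holder, a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"
```

In both schemes it is always the younger transaction that gets aborted, so waits can only go in one direction by age and no cycle of waits can form.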
UNIT 3
1. What are the differences between transient and persistent objects? How is persistence
handled in Object Oriented (OO) database systems?
1. Transient object: It cannot be serialized; its value is not persistent, and it is stored
in the heap.
2. Persistent object: It can be serialized; its value is persistent, as the name implies,
and it is stored in persistent storage (the database).
Temporal databases store temporal data, i.e. data that is time dependent
(time-varying). Typical temporal database scenarios and applications include
time-dependent/time-varying economic data, such as:
Share prices
Exchange rates
Interest rates
Company profits
There are three different forms of time dimensions: user-defined time, valid time,
and transaction time.
Valid time concerns the time when an event is true in the real world.
Transaction time concerns the time when an event was present in the database as
stored data.
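The difference between valid time and transaction time can be illustrated with a minimal sketch of an interest-rate history. The record layout, field names and query function are assumptions for illustration; a real temporal DBMS would manage these dimensions itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RateVersion:
    rate: float
    valid_from: date   # valid time: start of the period the rate applies to
    valid_to: date     # valid time: end of that period (exclusive)
    recorded_on: date  # transaction time: when the row was stored


def rate_as_of(versions, valid_on, known_on):
    """Return the rate for valid_on as the database knew it on known_on."""
    candidates = [
        v for v in versions
        if v.valid_from <= valid_on < v.valid_to and v.recorded_on <= known_on
    ]
    if not candidates:
        return None
    # The most recently recorded version wins (later rows act as corrections).
    return max(candidates, key=lambda v: v.recorded_on).rate
```

Asking the same valid-time question with two different transaction times can give two different answers, which is how a correction made later stays distinguishable from what was originally stored.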
One of the problems with the definition of an HMM or a dynamic belief network
is that the model depends on the time granularity.
The time granularity can either be fixed, for example each day or each thirtieth of
a second, or it can be event-based, where a time step exists when something
interesting occurs.
If the time granularity were to change, for example from daily to hourly, the
conditional probabilities must be changed.
If you do not have an OID, you can specify the object class or attribute name
appended with -oid. For example, if you create the attribute tempID, you can
specify the OID as tempID-oid.
Logical data models add further information to the conceptual model elements. They define the
structure of the data elements and set the relationships between them.
The advantage of the logical data model is that it provides a foundation for the
physical model. However, the modeling structure remains generic.
At this Data Modeling level, no primary or secondary key is defined. At this Data modeling
level, you need to verify and adjust the connector details that were set earlier for relationships.
Supports standard data types and additional data types. | Supports standard data types and new, richer data types.
UNIT 4
1. Difference between active and passive database
The active database is the one that is currently being used by the clients that have
mailboxes in that database. All the transactions for that database are being
generated by the server it's on.
1. Structured data
Structured data is the most familiar kind of data for developers.
It covers all data that can be stored in an SQL database, in tables with rows and
columns. Such data has relational keys and can easily be mapped into pre-designed
fields. Today, it is the most widely processed kind of data in development and the
simplest way to manage information.
But structured data represents only 5 to 10% of all data. So let’s
introduce semi-structured data.
Examples of semi-structured data: CSV files, as well as XML and JSON documents, are
semi-structured documents, and NoSQL databases are also considered semi-structured.
Like structured data, semi-structured data represents only a small part of all data (5 to
10%), so the last data type is the dominant one: unstructured data.
3. Unstructured data
Unstructured data represents around 80% of data. It often includes text and
multimedia content. Examples include e-mail messages, word processing
documents, videos, photos, audio files, presentations, webpages and many other
kinds of business documents. Note that while these sorts of files may have an
internal structure, they are still considered "unstructured" because the data
they contain doesn’t fit neatly in a database.
The least fixed point (lfp or LFP, sometimes also smallest fixed point) of a
function from a partially ordered set to itself is the fixed point which is less than
each other fixed point, according to the set's order. A function need not have a
least fixed point, and cannot have more than one.
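For a monotone function on finite sets ordered by inclusion, the least fixed point can be computed by iterating from the empty set until nothing changes. The sketch below uses graph reachability as the example; the function names and the edge representation are assumptions, and the loop is only guaranteed to stop under the monotone, finite-domain assumption.

```python
def least_fixed_point(step, bottom=frozenset()):
    """Iterate step from the bottom element until the value stops changing.

    Assumes step is monotone over subsets of a finite set; otherwise
    the loop may not terminate or may not yield the least fixed point.
    """
    current = bottom
    while True:
        following = step(current)
        if following == current:
            return current
        current = following


def reachable_from(edges, start):
    """Nodes reachable from start, expressed as a least fixed point.

    edges: dict mapping a node to an iterable of successor nodes.
    """
    def step(found):
        return frozenset({start}) | frozenset(
            target for node in found for target in edges.get(node, ())
        )

    return least_fixed_point(step)
```

This is the same idea that recursive queries in deductive databases rely on: facts are added round by round until a round adds nothing new.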
4. Define XML Schema. What is the difference between XML Schema and XML DTD?
4. XML Schema has a wealth of derived and built-in data types that are not
available in DTD.
5. XML Schema does not allow inline definitions, while DTD does.
Event condition action (ECA) is a short-cut for referring to the structure of active
rules in event driven architecture and active database systems.
The event part specifies the signal that triggers the invocation of the rule
The condition part is a logical test that, if satisfied or evaluates to true,
causes the action to be carried out
The action part consists of updates or invocations on the local data
This structure was used by the early research in active databases which started to
use the term ECA.
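The event, condition and action parts can be sketched as a tiny in-memory rule engine. The class names and the stock example are hypothetical; in a real active database the rule would typically be declared as a trigger.

```python
class Rule:
    """One ECA rule: when event occurs, if condition(data) holds, run action(data)."""

    def __init__(self, event, condition, action):
        self.event = event
        self.condition = condition
        self.action = action


class RuleEngine:
    def __init__(self):
        self.rules = []

    def register(self, rule):
        self.rules.append(rule)

    def signal(self, event, data):
        """Deliver an event and return how many rules fired."""
        fired = 0
        for rule in self.rules:
            if rule.event == event and rule.condition(data):
                rule.action(data)
                fired += 1
        return fired


# Hypothetical rule: reorder an item when its stock drops below 10.
reorders = []
engine = RuleEngine()
engine.register(
    Rule(
        event="update_stock",
        condition=lambda row: row["qty"] < 10,
        action=lambda row: reorders.append(row["item"]),
    )
)
```

Calling engine.signal("update_stock", {"item": "pen", "qty": 3}) fires the rule, while the same event with a quantity of 50 does not, because the condition is not satisfied.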
spatial database
SDBMS focuses on
1.Efficient storage, querying, sharing of large spatial datasets
2.Provides simpler set based query operations
3.Example operations: search by region, overlay, nearest neighbor, distance,
adjacency, perimeter etc.
4.Uses spatial indices and query optimization to speed up queries over large
spatial datasets.
5.SDBMS may be used by applications other than GIS
*Astronomy, Genomics, Multimedia information systems.
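Two of the example operations, search by region and nearest neighbor, can be sketched with a plain linear scan over named points. The data layout and function names are assumptions; an SDBMS would answer the same queries through a spatial index rather than by scanning every point.

```python
import math


def search_by_region(points, xmin, ymin, xmax, ymax):
    """Return the names of all points inside an axis-aligned rectangle."""
    return sorted(
        name
        for name, (x, y) in points.items()
        if xmin <= x <= xmax and ymin <= y <= ymax
    )


def nearest_neighbor(points, query):
    """Return the name of the point closest to query (Euclidean distance)."""
    return min(points, key=lambda name: math.dist(points[name], query))


# Hypothetical sample data: three named locations.
points = {"a": (0, 0), "b": (5, 5), "c": (9, 1)}
```

The linear scan costs time proportional to the number of points per query, which is the cost that spatial indices are meant to avoid on large datasets.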
3. What are the differences among immediate, deferred, and detached execution
of active rule actions?