ADBMS Answer Bank
Module 1 : Storage Systems
______________________________
Databases are stored as files, which contain records. At the physical level, the actual
data is stored in electromagnetic form on some storage device. These storage devices can be
broadly categorized into three types −
Primary Storage − The memory storage that is directly accessible to the CPU
comes under this category. CPU's internal memory (registers), fast memory
(cache), and main memory (RAM) are directly accessible to the CPU, as they are all
placed on the motherboard or CPU chipset. This storage is typically very small,
ultra-fast, and volatile. Primary storage requires continuous power supply in order
to maintain its state. In case of a power failure, all its data is lost.
Secondary Storage − Secondary storage devices are used to store data for future
use or as backup. Secondary storage includes memory devices that are not a part
of the CPU chipset or motherboard, for example, magnetic disks, optical disks
(DVD, CD, etc.), hard disks, flash drives, and magnetic tapes.
Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since
such storage devices are external to the computer system, they are the slowest in
speed. These storage devices are mostly used to take the back up of an entire
system. Optical disks and magnetic tapes are widely used as tertiary storage.
Hard Disk Drives (HDD) utilize magnetic storage technology, with data stored on
spinning disks called platters. These platters, along with read/write heads,
constitute the main components of an HDD. HDDs are known for their relatively
large storage capacities and cost-effectiveness per gigabyte. However, they have
mechanical parts and, as a result, exhibit slower access times and data transfer
rates compared to SSDs. HDDs are often employed in scenarios where massive
data storage is required, such as in servers and desktop computers.
On the other hand, Solid State Drives (SSD) leverage NAND-based flash memory
technology. Unlike HDDs, SSDs have no moving parts, storing data electronically
on NAND flash memory chips. This absence of mechanical components contributes
to faster access times, increased data transfer rates, and enhanced durability.
While historically more expensive per gigabyte, SSD prices have been decreasing,
and they have become increasingly popular for applications requiring high-speed
data access. SSDs are commonly used as the primary storage for operating
systems, applications, and in scenarios where rapid data retrieval is crucial, such
as in gaming systems.
### 1. **Records:**
- **Definition:** Records are units of data within a database or file
that contain information about a particular entity. In a database
context, a record represents a row in a table, and it consists of fields
that hold specific pieces of data.
- **Example:** In a student database, a record might represent an
individual student and include fields such as "Student ID," "Name,"
"Age," and "Grade."
### 2. **Blocking:**
- **Definition:** Blocking refers to the practice of grouping
multiple records together into a block or a cluster when storing data
on disk. This is done to improve data retrieval efficiency.
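Blocking can be made concrete with a small calculation. The sketch below is a minimal illustration with made-up block and record sizes (not tied to any particular DBMS): it computes the blocking factor, i.e., how many whole records fit in one block, and the number of blocks a file needs when records are not split across blocks.

```python
import math

def blocking_factor(block_size_bytes: int, record_size_bytes: int) -> int:
    """Number of whole records that fit in one block (unspanned organization)."""
    return block_size_bytes // record_size_bytes

def blocks_needed(num_records: int, bfr: int) -> int:
    """Blocks required to store the file when records are not split across blocks."""
    return math.ceil(num_records / bfr)

# Hypothetical figures: 4 KB blocks, 100-byte records, 30,000 records.
bfr = blocking_factor(4096, 100)          # 40 records per block
print(bfr, blocks_needed(30_000, bfr))    # 40 750
```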
### 4. **File:**
- **Definition:** A file is a collection of related data stored
together with a file name. Files are used to organize and store data on
a computer's storage media.
Operations on files can be grouped into two categories: Retrieval Operations and Update Operations.
Access Types:
Definition: This metric assesses the types of access operations that the
index can efficiently support. It focuses on the ability of the index to
facilitate quick retrieval of records based on specific criteria.
Example: An index might efficiently support access operations for
records with a specified value in a particular attribute or for records with
an attribute value falling within a specified range of values.
Access Time:
Definition: Access time measures the time required to locate and retrieve
records using the index. It is a crucial metric for assessing the speed and
efficiency of index-based data retrieval operations.
Example: A well-designed index should minimize access time, ensuring
that queries are executed swiftly.
Insertion Time:
Definition: Insertion time refers to the time required to add a new record
to the database when an index is present. It evaluates how the presence
of the index impacts the efficiency of insertion operations.
Example: If an index significantly slows down the insertion of records, it
may affect real-time data processing scenarios.
Deletion Time:
Definition: Deletion time measures the time required to delete a record and update the index structure accordingly.
Space Overhead:
Definition: Space overhead is the additional storage space occupied by the index structure itself, over and above the space needed for the data file.
b. Primary index
c. Secondary index
d. Sparse Index
8. Explain Single-level Ordered Indexes
Single Level Indexing
It is somewhat like the index (or the table of contents) found in a book.
Just as the index of a book contains topic names along with their page
numbers, the index table of a database contains keys and their
corresponding block addresses.
1. Primary Indexing: An index table created using the primary key is known as a primary index. It is defined on
ordered data. Because the index is comprised of primary keys, its entries are unique, not null, and have a
one-to-one relationship with the data blocks.
Ordered Indexing:
Ordered indexing is the traditional way of storing index entries in sorted
order, which gives fast retrieval. Because the indices are stored in a
sorted manner, they are also known as ordered indices.
1. Dense Indexing: In dense indexing, the index table contains an entry for every search-key value in the
database. This makes searching faster but requires much more space. It is like primary indexing, except that it
contains an entry for every search key.
2. Sparse Indexing: Sparse indexing consumes less space than dense indexing, but it is also a bit slower.
Instead of storing an entry for every record, we store a search key that points to a block, and the pointed-to
block contains a group of records. Because we sometimes have to perform a double search (first in the index,
then inside the block), sparse indexing is a bit slower.
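As a rough sketch of how a sparse index is searched, the example below keeps one (key, block-pointer) entry per block, binary-searches those entries to find the candidate block, and then scans that block for the record; this is the double search mentioned above. The data and block layout are invented purely for illustration.

```python
import bisect

# Hypothetical sorted data file split into blocks of 3 records each.
blocks = [
    [(1, "rec1"), (3, "rec3"), (5, "rec5")],
    [(7, "rec7"), (9, "rec9"), (11, "rec11")],
    [(13, "rec13"), (15, "rec15"), (17, "rec17")],
]

# Sparse index: the first (anchor) key of each block and the block it points to.
index_keys = [blk[0][0] for blk in blocks]      # [1, 7, 13]

def sparse_lookup(search_key):
    # First search: find the last anchor key <= search_key.
    pos = bisect.bisect_right(index_keys, search_key) - 1
    if pos < 0:
        return None
    # Second search: scan the pointed-to block for the actual record.
    for key, record in blocks[pos]:
        if key == search_key:
            return record
    return None

print(sparse_lookup(9))    # rec9
print(sparse_lookup(4))    # None (no such key)
```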
1. **Data Ordering:**
- *Description:* Entries in the index are arranged based on the order of
the indexed data, facilitating efficient range queries and ordered retrieval.
2. **Uniqueness:**
- *Description:* The index is based on unique values, ensuring that each
entry is unique and allowing for fast and accurate searches.
3. **Additional Paths:**
- *Description:* The index is created on non-primary key columns,
providing additional paths for efficient retrieval based on different criteria.
4. **Storage Efficiency:**
- *Description:* The index structure is designed to optimize storage
space, often by not having entries for every possible search key value.
5. **Physical Ordering:**
- *Description:* In a clustered index, the physical order of data records
in the table corresponds to the order of entries in the index, potentially
improving retrieval speed for range queries.
7. **Bitwise Representation:**
- *Description:* A bitmap index uses a bitmap to represent the
presence or absence of a particular value in the indexed column, enabling
efficient compression and bitwise operations.
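To make the bitmap idea concrete, here is a tiny sketch in which the column, values, and rows are all made up: one bit-vector is kept per distinct value of a low-cardinality column, so equality and OR predicates reduce to bitwise operations.

```python
# Hypothetical low-cardinality column, e.g. "grade" for 8 rows.
grades = ["A", "B", "A", "C", "B", "A", "C", "A"]

# Build one bitmap (here an int used as a bit-vector) per distinct value.
bitmaps = {}
for row_id, value in enumerate(grades):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)

def rows_of(bitmap: int):
    """Decode a bit-vector back into row ids."""
    return [i for i in range(len(grades)) if bitmap & (1 << i)]

# WHERE grade = 'A'
print(rows_of(bitmaps["A"]))                     # [0, 2, 5, 7]
# WHERE grade = 'A' OR grade = 'C'  -> bitwise OR of the two bitmaps
print(rows_of(bitmaps["A"] | bitmaps["C"]))      # [0, 2, 3, 5, 6, 7]
```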
B Tree
A B-Tree is a specialized m-way tree that is widely used for disk access. A B-Tree of order m can have at
most m-1 keys and m children. One of the main reasons for using a B-Tree is its capability to store a large
number of keys in a single node, and large key values, while keeping the height of the tree relatively small.
A B-Tree of order m has all the properties of an m-way tree. In addition, it has the following
properties.
1. Every node in a B-Tree contains at most m children.
2. Every node in a B-Tree, except the root node and the leaf nodes, contains at least ⌈m/2⌉ children.
It is not necessary that all the nodes contain the same number of children, but each internal node must have at
least ⌈m/2⌉ children (for example, at least 2 children in a tree of order 4).
While performing operations on a B-Tree, a property of the B-Tree may be violated, such as the minimum
number of children a node can have. To maintain the properties of the B-Tree, the tree may split or join nodes.
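A minimal sketch of searching an already-built B-Tree is shown below; insertion with node splitting and joining is omitted for brevity, and the tree contents are invented for illustration.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys in this node
        self.children = children or []    # empty list => leaf node

def btree_search(node, key):
    """Return True if key is present, descending one node per tree level."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                 # reached a leaf without finding the key
        return False
    return btree_search(node.children[i], key)

# A small order-4 style tree (at most 3 keys per node), hand-built for illustration.
root = BTreeNode([20, 40],
                 [BTreeNode([5, 10]), BTreeNode([25, 30]), BTreeNode([45, 50])])
print(btree_search(root, 30), btree_search(root, 22))   # True False
```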
B+ Tree
B+ Tree is an extension of B Tree which allows efficient insertion, deletion and search operations.
In a B-Tree, keys and records can both be stored in internal as well as leaf nodes, whereas in a B+ tree,
records (data) can only be stored in the leaf nodes, while internal nodes can only store key values.
The leaf nodes of a B+ tree are linked together in the form of a singly linked list to make search
queries more efficient.
A B+ tree is used to store large amounts of data that cannot be held in main memory. Because the size of
main memory is always limited, the internal nodes (the keys used to access records) of the B+ tree
are kept in main memory, whereas the leaf nodes are stored in secondary memory.
The internal nodes of B+ tree are often called index nodes. A B+ tree of order 3 is shown in the following
figure.
Advantages of B+ Tree
5. Faster search queries as the data is stored only on the leaf nodes.
13. Difference between B-tree and B+-tree
14. Explain Hashing: static and dynamic techniques
Applications of Hashing
Hashing is applicable in the following areas −
Password verification
Associating filename with their paths in operating systems
Data structures, where key-value pairs are created in which each key is a unique
value, while the values associated with different keys may be the same or different.
Board games such as Chess, tic-tac-toe, etc.
Graphics processing, where a large amount of data needs to be matched and
fetched.
The sections below describe hashing in more detail, and specifically the difference
between two important hashing techniques − static hashing and dynamic hashing.
Delete − Locate the desired location and support deleting data (or a chunk of data)
at that location.
Insertion − Support inserting new data into the data bucket if there is a space
available in the data bucket.
Query − Perform querying to compute the bucket address.
Update − Perform a query to update the data.
The location of the data in memory keeps changing according to the bucket size.
Hence if there is a phenomenal increase in data, then maintaining the bucket
address table becomes a challenge.
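A minimal sketch of static hashing is shown below, assuming a fixed bucket count chosen at design time and simple chaining inside each bucket (both are arbitrary choices for illustration): the same hash function computes the bucket address for insertion, querying, and deletion.

```python
NUM_BUCKETS = 4   # fixed at design time -- the defining trait of static hashing

def bucket_address(key: int) -> int:
    """Hash function: key mod number of buckets."""
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, value):
    buckets[bucket_address(key)].append((key, value))

def query(key):
    return [v for k, v in buckets[bucket_address(key)] if k == key]

def delete(key):
    b = bucket_address(key)
    buckets[b] = [(k, v) for k, v in buckets[b] if k != key]

for emp_id in (101, 102, 105, 109):
    insert(emp_id, f"employee-{emp_id}")

print(bucket_address(105), query(105))   # 1 ['employee-105']
delete(105)
print(query(105))                        # []
```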
Conclusion
Hashing is a computation technique that uses mathematical functions, called hash functions, to calculate the
location (address) of data in memory. We saw that there are two different hashing techniques, namely static
hashing and dynamic hashing. The techniques differ in whether they work on fixed-length or variable-length
data buckets. A proper hashing technique should be selected by considering the amount of data to be handled
and the intended speed of the application.
Module 2 : Query Processing and Optimization
____________________________________________________
1. Explain basic steps in Query Processing
Query Processing in DBMS
Query Processing is the activity performed in extracting data from the database. Query processing takes
several steps to fetch the data from the database. The steps involved are:
1. Parsing and Translation
2. Optimization
3. Evaluation
Suppose a user executes a query. As we have learned, there are various methods of extracting data
from the database. Suppose that, in SQL, a user wants to fetch the records of the employees whose salary is
greater than or equal to 10000 (assuming a relation Employee with a salary attribute). For doing this, a query of
the following form is issued:
SELECT * FROM Employee WHERE salary >= 10000;
To make the system understand the user query, it needs to be translated into the form of relational
algebra. We can bring this query into relational algebra form as:
σ salary >= 10000 (Employee)
After translating the given query, we can execute each relational algebra operation by using different
algorithms. So, in this way, query processing begins its work.
Evaluation
For this, in addition to the relational algebra translation, it is required to annotate the translated relational
algebra expression with the instructions used for specifying and evaluating each operation. Thus, after
translating the user query, the system executes a query evaluation plan.
○ In order to fully evaluate a query, the system needs to construct a query evaluation plan.
○ The annotations in the evaluation plan may refer to the algorithms to be used for the particular index
or the specific operations.
○ Such relational algebra with annotations is referred to as Evaluation Primitives. The evaluation
primitives carry the instructions needed for the evaluation of the operation.
○ Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a
query. The query evaluation plan is also referred to as the query execution plan.
○ A query execution engine is responsible for generating the output of the given query. It takes the
query execution plan, executes it, and finally makes the output for the user query.
Optimization
○ The cost of query evaluation can vary for different types of queries. Although the system is
responsible for constructing the evaluation plan, the user does not need to write the query
efficiently.
○ Usually, a database system generates an efficient query evaluation plan which minimizes its cost.
This task, performed by the database system, is known as Query Optimization.
○ For optimizing a query, the query optimizer should have an estimated cost analysis of each
operation. It is because the overall operation cost depends on the memory allocations to several
operations, execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and produces the output of the
query.
2. Explain Query Optimization in detail
Query optimization is used to access and modify the database in the most efficient way
possible. It is the art of obtaining necessary information in a predictable, reliable, and timely
manner. Query optimization is formally described as the process of transforming a query
into an equivalent form that may be evaluated more efficiently. The goal of query
optimization is to find an execution plan that reduces the time required to process a query.
We must complete two major tasks to attain this optimization target.
The first is to determine the optimal plan to access the database, and the second is to reduce
the time required to execute the query plan.
Optimizer Components
The optimizer is made up of three parts: the transformer, the estimator, and the plan
generator. The figure below depicts those components.
● Query Transformer: The query transformer determines whether it is advantageous to rewrite the
original SQL statement into a semantically equivalent SQL statement at a lower cost for some
statements.
● Estimator: The estimator is the optimizer component that calculates the total cost of a given execution
plan (a rough sketch of how its measures combine follows this component list). To determine the cost, the estimator uses three different measures:
○ Selectivity: The fraction of the rows in the row set that the query picks, with 0 indicating no rows
and 1 indicating all rows. Selectivity is determined by a query predicate, such as WHERE
last_name LIKE 'X%', or by a combination of predicates. As the selectivity value approaches zero, a
predicate becomes more selective, and as the value nears one, it becomes less selective (or more
unselective).
○ Cardinality: The cardinality of an execution plan is the number of rows returned by each
operation. This input is shared by all cost functions and is essential for determining the best
strategy. Cardinality can be calculated from DBMS_STATS table statistics, or after
taking into account the impact of predicates (filter, join, and so on), DISTINCT or GROUP BY
operations, and so on. In an execution plan, the Rows column displays the estimated
cardinality.
○ Cost: This metric represents the number of units of work or resources used. The query
optimizer uses disk I/O, CPU utilization, and memory usage as units of effort. For example, if
the plan for query A has a lower cost than the plan for query B, then the following outcomes
are possible: A executes faster than B, A executes slower than B, or A executes in the same
amount of time as B.
● Plan Generator : The plan generator investigates multiple plans for a query block by experimenting
with various access paths, join methods, and join orders.
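As promised above, here is a rough, simplified sketch of how selectivity, cardinality, and cost fit together. The reduction factor and cost weights are invented for illustration and are not any real optimizer's formulas.

```python
def estimate(total_rows: int, selectivity: float,
             io_cost_per_row: float = 1.0, cpu_cost_per_row: float = 0.01):
    """Toy estimator: selectivity -> cardinality -> cost.

    selectivity: fraction of rows the predicate keeps (0 = none, 1 = all).
    The cost weights are arbitrary illustrative units of work.
    """
    cardinality = total_rows * selectivity            # estimated rows returned
    cost = total_rows * cpu_cost_per_row + cardinality * io_cost_per_row
    return cardinality, cost

# Hypothetical: an employees table with 10,000 rows, and a predicate such as
# last_name LIKE 'X%' assumed to keep about 2% of the rows.
card, cost = estimate(10_000, 0.02)
print(card, cost)    # 200.0  300.0
```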
3. Explain Query Cost Measurement.
4. Explain Cost estimation of selection algorithms in detail.
Cost Estimation
Here, the overall cost of the algorithm is computed by adding the cost of the individual index scans and the cost of
fetching the records in the intersection of the retrieved lists of pointers. We can minimize the cost by
sorting the list of pointers and fetching the records in sorted order. From this, we note the following two points for cost
estimation:
○ We can fetch all selected records of the block using a single I/O operation because each pointer in
the block appears together.
○ The disk-arm movement gets minimized as blocks are read in sorted order.
The cost estimates for the common selection algorithms can be summarized as follows (tS = seek time, tT = block transfer time, br = number of blocks in the file, hi = height of the index, b = number of blocks holding matching records, n = number of matching records):

| Algorithm | Cost | Reason |
| --- | --- | --- |
| Linear Search, Equality on Key | tS + (br/2) * tT | Average case: only one record satisfies the condition, so the scan terminates after reading about half the blocks. |
| Primary B+-tree index, Equality on Key | (hi + 1) * (tT + tS) | Each I/O operation needs one seek and one block transfer; the record is fetched by traversing the height of the tree. |
| Primary B+-tree index, Equality on Non-key (or Comparison) | hi * (tT + tS) + b * tT | One seek and one transfer for each level of the tree, followed by a scan of the b consecutive blocks holding the matching records. |
| Secondary B+-tree index, Equality on Non-key (or Comparison) | (hi + n) * (tT + tS) | One seek per record is required, because each of the n matching records may be on a different block. |
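The formulas in the table above can be evaluated directly once the seek time tS, block transfer time tT, index height hi, number of file blocks br, and number of matching records n are known. The sketch below simply plugs in illustrative (made-up) timing values.

```python
def linear_search_equality_key(tS, tT, br):
    # Average case: the scan stops after about half the blocks.
    return tS + (br / 2) * tT

def primary_btree_equality_key(tS, tT, hi):
    # One seek + one transfer per index level, plus one for the record block.
    return (hi + 1) * (tT + tS)

def secondary_btree_equality_nonkey(tS, tT, hi, n):
    # Each of the n matching records may live in a different block.
    return (hi + n) * (tT + tS)

# Illustrative values: 4 ms seek, 0.1 ms transfer, index height 3,
# 10,000 blocks in the file, 50 matching records.
tS, tT = 4.0, 0.1
print(linear_search_equality_key(tS, tT, 10_000))        # 504.0 ms
print(primary_btree_equality_key(tS, tT, 3))             # 16.4 ms
print(secondary_btree_equality_nonkey(tS, tT, 3, 50))    # approx 217.3 ms
```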
Cost Estimation:
To estimate the cost of the different available execution plans (execution strategies), the query
tree is viewed and studied as a data structure containing a series of basic operations linked
together to perform the query. The cost of each operation present in the query depends on its
selectivity, that is, on the proportion of the input that the select operation passes on to its output. It is also
important to know the expected cardinality of an operation's output, because that cardinality
forms the input to the next operation.
The cost of a query evaluation plan depends upon the following:
Cardinality-
Cardinality is known to be the number of rows that are returned by
performing the operations specified by the query execution plan. The
estimates of the cardinality must be correct as it highly affects all the
possibilities of the execution plan.
Selectivity-
Selectivity refers to the fraction of rows that are selected. The selectivity
of rows from a table (or of a table from the database) depends upon the condition
applied: only rows that satisfy the condition contribute to the selectivity, and the
condition to be satisfied can be anything, depending upon the situation.
Cost-
Cost refers to the amount of work the system must perform to execute the plan.
The measure of cost depends on the work done and the number of resources
used, such as disk I/O, CPU time, and memory.
Q.1 Draw the state transition diagram of a transaction showing the transaction states.
These are the states through which a transaction goes during its lifetime. They tell us the current status of the
transaction and how it will be processed further, and they govern the rules which decide the fate of the transaction:
whether it will commit or abort.
Transactions also use a transaction log. The transaction log is a file maintained by the recovery management component to record all the
activities of the transaction. After the commit is done, the transaction log file is removed.
Active State –
When the instructions of the transaction are running then the transaction is in active state. If all the ‘read and write’
operations are performed without any error then it goes to the “partially committed state”; if any instruction fails, it goes
to the “failed state”.
Partially Committed –
After completion of all the read and write operation the changes are made in main memory or local buffer. If the changes
are made permanent on the DataBase then the state will change to “committed state” and in case of failure it will go to
the “failed state”.
Failed State –
When any instruction of the transaction fails, it goes to the “failed state” or if failure occurs in making a permanent
change of data on Data Base.
Aborted State –
After any type of failure, the transaction goes from the “failed state” to the “aborted state”. Since in the previous states
the changes were made only to the local buffer or main memory, these changes are deleted or rolled back.
Committed State –
It is the state in which the changes have been made permanent on the database; the transaction is complete and
therefore moves to the “terminated state”.
Terminated State –
If there isn’t any roll-back or the transaction comes from the “committed state”, then the system is consistent and ready
for new transaction and the old transaction is terminated.
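The allowed moves between these states can be captured in a small transition table; the sketch below is only an illustration of the diagram described above, with the state names kept as plain strings.

```python
# Legal transitions between transaction states, as described above.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           {"terminated"},
    "aborted":             {"terminated"},
    "terminated":          set(),
}

def move(state: str, next_state: str) -> str:
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

s = "active"
for nxt in ("partially committed", "committed", "terminated"):
    s = move(s, nxt)       # a successful transaction's path
print(s)                   # terminated
```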
The concept of concurrency control comes under transactions in a database management system (DBMS). It is a procedure
in DBMS that helps manage two simultaneous processes so that they execute without conflicts with each
other; such conflicts occur in multi-user systems.
Concurrency can simply be described as executing multiple transactions at a time. It is required to increase time efficiency.
If many transactions try to access the same data, inconsistency arises. Concurrency control is required to maintain
data consistency.
For example, if ATM machines did not support concurrency, multiple persons could not draw money at the same time in
different places. This is where we need concurrency.
Advantages
Control concurrency
The simultaneous execution of transactions over shared databases can create several data integrity and consistency
problems.
For example, if too many people are logging in to ATM machines, serial updates and synchronization in the bank
servers must happen whenever a transaction is done; if not, the database ends up holding wrong information and wrong
data.
Main problems in using Concurrency
The problems which arise while using concurrency are as follows −
Updates will be lost − One transaction makes some changes and another transaction overwrites or deletes that change;
one transaction nullifies the updates of another transaction.
Uncommitted dependency or dirty read problem − A variable is updated in one transaction while, at the same time, another
transaction starts and reads (or deletes) that value before the first transaction has committed or completed its
update. The second transaction therefore sees false values or the previous values of the variable; this is a major
problem.
Inconsistent retrievals − One transaction is updating multiple different variables while another transaction is in the process of
reading or updating those same variables; the problem that occurs is inconsistency of the same variable across different instances.
Locking
A lock guarantees exclusive use of a data item to the current transaction. A transaction first accesses the data item by acquiring a lock,
and after completion of the transaction it releases the lock (a small sketch of the lock compatibility rules follows the list of lock types below).
Types of Locks
● Shared Lock [Transaction can read only the data item values]
● Exclusive Lock [Used for both read and write data item values]
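As mentioned above, the compatibility rules for shared and exclusive locks can be sketched as a minimal lock table (no waiting queues or deadlock handling; conflicting requests are simply refused). This is an illustration, not a production lock manager.

```python
class LockTable:
    def __init__(self):
        # item -> (mode, set of holding transactions), mode is "S" or "X"
        self.locks = {}

    def acquire(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only shared locks are compatible with each other.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False                           # conflict: caller must wait or abort

    def release(self, txn, item):
        mode, holders = self.locks.get(item, (None, set()))
        holders.discard(txn)
        if not holders:
            self.locks.pop(item, None)

lt = LockTable()
print(lt.acquire("T1", "A", "S"))   # True  (read lock granted)
print(lt.acquire("T2", "A", "S"))   # True  (shared with T1)
print(lt.acquire("T3", "A", "X"))   # False (write conflicts with readers)
```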
Time Stamping
A timestamp is a unique identifier created by the DBMS that indicates the relative starting time of a transaction. For whatever
transaction we run, the DBMS stores its starting time, which denotes a specific point in time.
This can be generated using a system clock or logical counter. This can be started whenever a transaction is started.
Here, the logical counter is incremented after a new timestamp has been assigned.
Optimistic
It is based on the assumption that conflict is rare and it is more efficient to allow transactions to proceed without
imposing delays to ensure serializability.
Transaction Properties
A transaction has four properties. These are used to maintain consistency in the database before and after the
transaction.
Atomicity
It states that all operations of the transaction take place at once if not, the transaction is aborted.
There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit and either run to
completion or is not executed at all.
Atomicity involves the following two operations:
● Abort: If a transaction aborts then all the changes made are not visible.
● Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume the following transaction T consists of two parts, T1 and T2. Account A holds Rs 600 and account B holds Rs
300; T transfers Rs 100 from account A to account B.
| T1 | T2 |
| --- | --- |
| Read(A) | |
| A := A - 100 | |
| Write(A) | |
| | Read(B) |
| | B := B + 100 |
| | Write(B) |
After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.
If the transaction T fails after the completion of transaction T1 but before completion of transaction T2, then the amount
will be deducted from A but not added to B. This shows the inconsistent database state. In order to ensure correctness
of database state, the transaction must be executed in entirety.
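The same transfer can be tried with Python's built-in sqlite3 module; the table and amounts below mirror the example above and are otherwise invented. If any statement in the transaction fails, the rollback leaves both balances unchanged, which is exactly the atomicity guarantee.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 600), ("B", 300)])
con.commit()

try:
    con.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")
    con.execute("UPDATE account SET balance = balance + 100 WHERE name = 'B'")
    con.commit()                       # both updates become permanent together
except sqlite3.Error:
    con.rollback()                     # on any failure, neither update survives

print(con.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('A', 500), ('B', 400)]
```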
Consistency
The integrity constraints are maintained so that the database is consistent before and after the transaction.
The execution of a transaction will leave a database in either its prior stable state or a new stable state.
The consistency property of the database states that every transaction sees a consistent database instance.
The transaction is used to transform the database from one consistent state to another consistent state.
For example: The total amount must be maintained before or after the transaction.
Isolation
It shows that the data which is used at the time of execution of a transaction cannot be used by the second transaction
until the first one is completed.
In isolation, if the transaction T1 is being executed and using the data item X, then that data item can't be accessed by
any other transaction T2 until the transaction T1 ends.
The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
The durability property guarantees the permanence of the database's consistent state: once a transaction completes, its
changes are permanent.
They cannot be lost by the erroneous operation of a faulty transaction or by a system failure. When a transaction is
completed, the database reaches a state known as the consistent state, and that consistent state cannot be lost, even
in the event of a system failure.
The recovery subsystem of the DBMS is responsible for the durability property.
a. Schedules
b. Characterizing schedules based on serialization
c. Characterizing schedules based on recoverability ?
A. A schedule, as the name suggests, is a process of lining up transactions and executing them one by one. When there
are multiple transactions running concurrently and the order of operations needs to be set so that
the operations do not overlap each other, scheduling is brought into play and the transactions are timed accordingly.
Various ways of characterizing schedules, based on serializability and on recoverability, are discussed below.
B. Serializability is a concept that is used to ensure that the concurrent execution of multiple transactions does not result
in inconsistencies or conflicts in a database management system. In other words, it ensures that the results of
concurrent execution of transactions are the same as if the transactions were executed one at a time in some order.
A schedule is considered to be serializable if it is equivalent to some serial schedule, which is a schedule where all
transactions are executed one at a time. This means that if a schedule is serializable, it does not result in any
inconsistencies or conflicts in the database.
● Conflict serializability − A schedule is conflict serializable if it can be transformed into a serial schedule by swapping its
non-conflicting operations, i.e., it is conflict-equivalent to some serial schedule.
● View serializability − A schedule is view serializable if it is view-equivalent to some serial schedule; the order of
the transactions may be different.
To check for conflict serializability, we can use the precedence (conflict) graph method, which involves creating a graph where each
transaction is represented by a node and an edge is drawn from Ti to Tj whenever an operation of Ti conflicts with and
precedes an operation of Tj. A schedule is conflict serializable if and only if there are no cycles in this graph.
To check for view serializability, we can use the view equivalence method which involves comparing the results of a
schedule with the results of a serial schedule. If the results are the same, the schedule is considered to be view
serializable.
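Returning to the precedence (conflict) graph test described above, here is a small sketch: it builds an edge Ti → Tj whenever an operation of Ti conflicts with and precedes an operation of Tj, then reports the schedule as conflict serializable only if the graph has no cycle. The encoded schedule is invented for illustration.

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {"R", "W"}, in execution order."""
    edges = set()
    txns = {t for t, _, _ in schedule}
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and (op_i == "W" or op_j == "W"):
                edges.add((ti, tj))          # ti's conflicting op precedes tj's

    # Cycle detection by DFS over the precedence graph.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in txns}

    def has_cycle(u):
        color[u] = GREY
        for a, b in edges:
            if a == u:
                if color[b] == GREY or (color[b] == WHITE and has_cycle(b)):
                    return True
        color[u] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in txns)

# T1 and T2 both read and write X in an interleaved way -> cycle -> not serializable.
s = [("T1", "R", "X"), ("T2", "R", "X"), ("T1", "W", "X"), ("T2", "W", "X")]
print(conflict_serializable(s))   # False
```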
C. Recoverability refers to the ability of a system to restore its state in the event of a failure. The recoverability of a
system is directly impacted by the type of schedule that is used.
A serial schedule is considered to be the most recoverable, as there is only one transaction executing at a time, and it is
easy to determine the state of the system at any given point in time.
A parallel schedule is less recoverable than a serial schedule, as it can be more difficult to determine the state of the
system at any given point in time.
A concurrent schedule is the least recoverable, as it can be very difficult to determine the state of the system at any
given point in time.
● Concurrency control
● Locking protocols
● Timestamp based protocols
● Deadlock prevention, detection and recovery strategies ?
Concurrency control
The concept of concurrency control comes under transactions in a database management system (DBMS). It is a procedure
in DBMS that helps manage two simultaneous processes so that they execute without conflicts with each
other; such conflicts occur in multi-user systems.
Concurrency can simply be described as executing multiple transactions at a time. It is required to increase time efficiency.
If many transactions try to access the same data, inconsistency arises. Concurrency control is required to maintain
data consistency.
For example, if ATM machines did not support concurrency, multiple persons could not draw money at the same time in
different places. This is where we need concurrency.
Advantages
Locking protocols
Lock-Based Protocols -
It is a mechanism in which a transaction cannot read or write data unless the appropriate lock is acquired. This helps in
eliminating the concurrency problem by ensuring that a particular data item is locked for one transaction at a time. The lock is a variable that
denotes which operations can be executed on the particular data item.
● Binary lock: It ensures that the data item can be in either locked or unlocked state
● Shared Lock: A shared lock is also called read only lock because you don’t have permission to update data on
the data item. With this lock data item can be easily shared between different transactions. For example, if two
teams are working on employee payment accounts, they would be able to access it but wouldn’t be able to
modify the data on the payment account.
● Exclusive Lock: With exclusive locks, the data items will not be just read but can also be written
● Simplistic Lock Protocol: This lock protocol allows transactions to obtain a lock on every object before the
operation begins. Transactions may unlock the data items after completing their write operations.
● Pre-claiming locking: This protocol evaluates the operations and builds a list of the necessary data items which
are required to initiate the execution of the transaction. As soon as all the locks are acquired, the execution of the
transaction takes place. When the operations are over, all the locks are released.
● Starvation: It is the condition where a transaction has to wait for an indefinite period for acquiring a lock.
● Deadlock: It is the condition when two or more processes are waiting for each other to get a resource released
Timestamp based protocols
Timestamp-based protocols in DBMS are used to order transactions in ascending order of their creation time.
The creation time is the system time or a logical counter.
The transaction that was created first (the older transaction) is given higher priority over newer
transactions.
For example, if there are two transactions T1 and T2, and T1 enters the system at 008 while T2 enters the system at
009, then T1 is given priority over T2.
The timestamp of the protocol determines the serializability order of the transactions. Timestamp ordering
protocol ensures that any conflicting Read or write operation must follow the timestamp ordering protocols.
Suppose any transaction T tries to perform a Read(X) or Write(X) on item X. In that case, the Basic timestamp
ordering algorithm compares the timestamp of Read(X) and Write(X) with R_TS(X) and W_TS(X) and ensures
that the timestamp ordering protocol is not violated.
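A simplified version of the basic timestamp-ordering test is sketched below (made-up timestamps, no rollback or recovery concerns): a read is rejected if a younger transaction has already written the item, and a write is rejected if a younger transaction has already read or written it.

```python
# Per-item read/write timestamps, as in basic timestamp ordering.
R_TS, W_TS = {}, {}

def read(ts, item):
    if ts < W_TS.get(item, 0):
        return False                       # a younger txn already wrote item: abort
    R_TS[item] = max(R_TS.get(item, 0), ts)
    return True

def write(ts, item):
    if ts < R_TS.get(item, 0) or ts < W_TS.get(item, 0):
        return False                       # violates timestamp order: abort
    W_TS[item] = ts
    return True

# T1 (timestamp 8) and T2 (timestamp 9): the older T1 has priority on conflicts.
print(read(9, "X"))    # True  -> R_TS(X) = 9
print(write(8, "X"))   # False -> T1 cannot write after younger T2's read; T1 aborts
print(write(9, "X"))   # True
```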
Deadlock detection and recovery is the process of detecting and resolving deadlocks in an operating system. A
deadlock occurs when two or more processes are blocked, waiting for each other to release the resources they need.
This can lead to a system-wide stall, where no process can make progress.
There are two main approaches to deadlock detection and recovery:
Prevention: The operating system takes steps to prevent deadlocks from occurring by ensuring that the system is
always in a safe state, where deadlocks cannot occur. This is achieved through resource allocation algorithms such as
the Banker’s Algorithm.
Detection and Recovery: If deadlocks do occur, the operating system must detect and resolve them. Deadlock detection
algorithms, such as the Wait-For Graph, are used to identify deadlocks, and recovery algorithms, such as the Rollback
and Abort algorithm, are used to resolve them. The recovery algorithm releases the resources held by one or more
processes, allowing the system to continue to make progress.
Difference Between Prevention and Detection/Recovery: Prevention aims to avoid deadlocks altogether by carefully
managing resource allocation, while detection and recovery aim to identify and resolve deadlocks that have already
occurred.
Deadlock detection and recovery is an important aspect of operating system design and management, as it affects the
stability and performance of the system. The choice of deadlock detection and recovery approach depends on the
specific requirements of the system and the trade-offs between performance, complexity, and risk tolerance. The
operating system must balance these factors to ensure that deadlocks are effectively detected and resolved.
Deadlock Recovery :
A traditional operating system such as Windows doesn’t deal with deadlock recovery as it is a
time and space-consuming process. Real-time operating systems use Deadlock recovery.
1. Killing the process –
Kill all the processes involved in the deadlock, or kill the processes one by one: after killing
each process, check for deadlock again and keep repeating until the system
recovers from the deadlock. Killing the processes one by one in this way helps the system break out of the deadlock.
2. Resource Preemption –
Resources are preempted from the processes involved in the deadlock, and preempted
resources are allocated to other processes so that there is a possibility of recovering the
system from the deadlock. In this case, the system goes into starvation.
Concurrency control mechanisms ensure that concurrent processes do not access the same data at the same time, which
can lead to inconsistencies and errors. Deadlocks can occur in concurrent systems when
two or more processes are blocked, waiting for each other to release the resources they
need. This can result in a system-wide stall, where no process can make progress.
Deadlock handling is therefore closely tied to managing shared resources and ensuring that concurrent processes do not interfere with each
other.
The Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is based on the Write Ahead Log (WAL) protocol.
Every update operation writes a log record, which is one of the following:
● an undo-only log record, containing only the before image of the updated item;
● a redo-only log record, containing only the after image; or
● an undo-redo log record, containing both the before and after images.
In it, every log record is assigned a unique and monotonically increasing log sequence number (LSN). Every data page
has a page LSN field that is set to the LSN of the log record corresponding to the last update on the page. WAL requires
that the log record corresponding to an update make it to stable storage before the data page corresponding to that
update is written to disk. For performance reasons, each log write is not immediately forced to disk. A log tail is
maintained in main memory to buffer log writes. The log tail is flushed to disk when it gets full. A transaction cannot be
declared committed until the commit log record makes it to disk.
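The write-ahead rule itself can be shown in a few lines. This is a toy model, not ARIES: every update appends a log record with a monotonically increasing LSN, the page remembers the LSN of its last update (pageLSN), and a data page may be written to disk only after the log has been flushed at least up to that LSN.

```python
log, flushed_lsn = [], 0        # in-memory log tail and how far it is on disk
next_lsn = 1

def log_update(page, old, new):
    """Append an update record and stamp the page with its LSN."""
    global next_lsn
    lsn = next_lsn; next_lsn += 1
    log.append({"lsn": lsn, "page": page["id"], "old": old, "new": new})
    page["page_lsn"] = lsn
    return lsn

def flush_log():
    global flushed_lsn
    flushed_lsn = log[-1]["lsn"] if log else flushed_lsn

def flush_page(page):
    # WAL rule: the log must reach stable storage up to pageLSN first.
    if page["page_lsn"] > flushed_lsn:
        flush_log()
    print(f"page {page['id']} written to disk (pageLSN={page['page_lsn']})")

p = {"id": 3, "page_lsn": 0}
log_update(p, old="x=1", new="x=2")
flush_page(p)    # forces the log out before the data page
```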
Once in a while the recovery subsystem writes a checkpoint record to the log. The checkpoint record contains the
transaction table and the dirty page table. A master log record is maintained separately, in stable storage, to store the
LSN of the latest checkpoint record that made it to disk. On restart, the recovery subsystem reads the master log record
to find the checkpoint’s LSN, reads the checkpoint record, and starts recovery from there on.
● Analysis:
The recovery subsystem determines the earliest log record from which the next pass must start. It also scans
the log forward from the checkpoint record to construct a snapshot of what the system looked like at the instant
of the crash.
● Redo:
Starting at the earliest LSN, the log is read forward and each update redone.
● Undo:
The log is scanned backward and updates corresponding to loser transactions are undone.
Shadow Paging is a recovery technique that is used to recover a database. In this technique, the database is
considered to be made up of fixed-size logical units of storage referred to as pages. Pages are mapped onto
physical blocks of storage with the help of a page table, which has one entry for each logical page of the database. This
method uses two page tables, named the current page table and the shadow page table. The entries in the
current page table point to the most recent database pages on disk. The shadow page table is
created when the transaction starts, by copying the current page table; it is then saved on disk,
and the current page table is used for the transaction. Entries in the current page table may be changed during
execution, but the shadow page table is never changed. After the transaction, both tables become identical. This
technique is also known as out-of-place updating.
As an example, suppose two write operations are performed on pages 3 and 5. Before the write on page 3 starts, the
current page table points to the old page 3; the write places the new version of the page in a free block on disk and makes
the current page table point to that block, while the shadow page table still points to the old page. To commit the transaction,
the following steps are performed:
● All the modifications done by the transaction that are present in buffers are transferred to the physical
database.
● The current page table is output to disk.
● The disk address of the current page table is written to the fixed location in stable storage that holds the address of the
shadow page table. This operation overwrites the address of the old shadow page table; with this, the current page table
becomes the shadow page table and the transaction is committed.
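A toy model of the two page tables is sketched below (page contents, sizes, and block numbers are invented): updates go to freshly allocated blocks through the current page table, the shadow page table is left untouched, and commit is the single act of making the current table the new shadow table.

```python
disk = {0: "page0-v1", 1: "page1-v1", 2: "page2-v1"}   # physical blocks
shadow_table = {0: 0, 1: 1, 2: 2}                       # logical page -> block
current_table = dict(shadow_table)                      # copied at txn start
next_block = 3

def write_page(logical_page, new_content):
    """Copy-on-write: the old block stays reachable via the shadow table."""
    global next_block
    disk[next_block] = new_content
    current_table[logical_page] = next_block
    next_block += 1

def commit():
    global shadow_table
    shadow_table = dict(current_table)   # atomically switch the root pointer

write_page(1, "page1-v2")
print(shadow_table[1], current_table[1])   # 1 3  (old state still intact)
commit()
print(disk[shadow_table[1]])               # page1-v2
```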
Advantages :
● Recovery needs neither undo nor redo operations; restoring the shadow page table is enough to return to the last committed state, so recovery is fast.
● There is no overhead of maintaining a log of changes for recovery purposes.
Disadvantages :
● Because updated pages change their location on disk, it is quite difficult to keep related database pages
close together on disk.
● During the commit operation, the blocks still pointed to by the old shadow page table have to be
returned to the collection of free blocks; otherwise they remain allocated but inaccessible.
● The commit of a single transaction requires writing multiple blocks, which decreases execution speed.
● It is difficult to extend this technique to allow multiple transactions to execute concurrently.
● Data fragmentation: the updated data suffers from fragmentation, because the data is divided into pages that may or
may not be in linear order for large sets of related data; hence, complex storage management strategies are
needed.
In Advanced Database Management Systems (ADBMS), recovery refers to the process of restoring a database to a
consistent state after a failure or an abnormal termination. Here are explanations for the recovery concepts you
mentioned:
1. **Write-Ahead Logging:**
Write-Ahead Logging (WAL) is a protocol used to ensure database durability and consistency. In WAL, changes (such
as modifications or additions) to the database are first recorded in a log before being applied to the actual database.
This means that before any data is written to the database, a log entry is made. This protocol guarantees that changes
are logged before the corresponding data is updated in the database itself, ensuring that in the event of a system crash,
the system can recover by using the log to redo or undo changes.
3. **Rollbacks:**
Rollbacks refer to the process of undoing a set of transactions or changes that have not been committed yet. When a
transaction is rolled back, all its changes are reverted, and the database returns to the state it was in before the
transaction began. This is essential in maintaining the integrity of the database, especially when a transaction fails or
needs to be canceled.
4. **Deferred Updates:**
Deferred updates refer to delaying the application of modifications until a certain point in the transaction. In this
approach, changes made by a transaction are recorded in a separate space or buffer and are only applied to the actual
database when the transaction is committed successfully. This ensures that changes are not visible to other
transactions until the entire transaction is completed.
5. **Immediate Updates:**
Immediate updates, in contrast to deferred updates, directly apply changes to the database as the transaction
progresses. This means that changes become immediately visible and permanent once the database system confirms
the success of each operation within the transaction.
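The difference between the two approaches can be seen in a toy buffer model (account names and values invented): with deferred update, writes stay in a private buffer and touch the database only at commit, so an abort has nothing to undo; with immediate update, the database is changed right away and an undo record is kept so the change can be rolled back.

```python
db = {"A": 600, "B": 300}

def run_deferred(updates, commit=True):
    buffer = {}                          # changes invisible until commit
    for key, value in updates:
        buffer[key] = value
    if commit:
        db.update(buffer)                # applied only now

def run_immediate(updates, commit=True):
    undo = []
    for key, value in updates:
        undo.append((key, db[key]))      # old value logged before overwriting
        db[key] = value                  # change applied at once
    if not commit:
        for key, old in reversed(undo):
            db[key] = old                # roll back using the undo records

run_deferred([("A", 500)], commit=False)   # aborted: nothing to undo
run_immediate([("B", 400)], commit=False)  # aborted: undo restores B
print(db)                                   # {'A': 600, 'B': 300}
```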
In ADBMS, these recovery concepts and methods are crucial for maintaining the database's consistency, durability, and
integrity, particularly in the event of system failures, crashes, or other unforeseen events. These mechanisms ensure
that data remains in a valid and reliable state despite such occurrences.
Temporal databases are databases that are designed to store and manage data with a temporal aspect, which means
they can record and query data at different points in time. Here's an explanation of various temporal database concepts
in the context of an Active Database Management System (ADBMS):
Tuple Versioning:
Tuple versioning in an ADBMS refers to maintaining multiple versions of the same tuple over time. Each time a tuple is
updated, a new version is created, allowing users to track changes and access historical data. This is particularly useful
for auditing, historical analysis, and regulatory compliance.
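Tuple versioning can be sketched as keeping a valid-time interval with every version of a tuple (the schema, employee id, and dates below are invented): an update closes the current version and inserts a new one instead of overwriting in place, so historical queries remain possible.

```python
from datetime import date

# Each version of the "salary of employee 7" tuple carries a valid-time interval.
versions = [
    {"emp": 7, "salary": 50_000, "valid_from": date(2022, 1, 1), "valid_to": None},
]

def update_salary(emp, new_salary, change_date):
    for v in versions:
        if v["emp"] == emp and v["valid_to"] is None:
            v["valid_to"] = change_date              # close the current version
    versions.append({"emp": emp, "salary": new_salary,
                     "valid_from": change_date, "valid_to": None})

def salary_as_of(emp, when):
    for v in versions:
        if (v["emp"] == emp and v["valid_from"] <= when
                and (v["valid_to"] is None or when < v["valid_to"])):
            return v["salary"]

update_salary(7, 55_000, date(2023, 6, 1))
print(salary_as_of(7, date(2023, 1, 1)))   # 50000 (historical query)
print(salary_as_of(7, date(2024, 1, 1)))   # 55000 (current version)
```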
Bitemporal Databases:
Bitemporal databases combine both valid time and transaction time information. They track when data was valid in the
real world (valid time) and when it was recorded or modified in the database (transaction time). Bitemporal databases
provide a comprehensive view of data changes and are often used in applications where data correctness and historical
tracking are critical, such as financial systems.
Attribute Versioning:
Attribute versioning is an approach where specific attributes of a tuple can have multiple versions independently of each
other. This means that some attributes may change over time while others remain constant. Attribute versioning is
valuable in scenarios where only certain aspects of an entity evolve or need to be tracked.
A temporal database is a database that needs some aspect of time for the organization of its information. In a temporal
database, each tuple in a relation is associated with time. It stores information about the states of the real world together with time.
Unlike a conventional database, which stores information only about current states, a temporal database also stores information about past states.
Whenever the state of the database changes, the information in the database gets updated. In many fields it is
necessary to store information about past states; for example, a stock database must store information about past
stock prices for analysis. Historical information can also be stored manually in the schema.
Valid Time: The valid time is a time in which the facts are true with respect to the real world.
Transaction Time: The transaction time of the database is the time at which the fact is currently present in the database.
Decision Time: Decision time in the temporal database is the time at which the decision is made about the fact.
Temporal databases are often built on top of relational databases. However, relational databases have some problems supporting
temporal data: they do not provide direct support for complex temporal operations, and standard query operations offer poor support for
performing temporal queries.
It can be used in Factory Monitoring System for storing information about current and past readings of sensors in the
factory.
Healthcare: The histories of the patient need to be maintained for giving the right treatment.
Banking: For maintaining the credit histories of the user.
1. Uni-Temporal Relation: A relation associated with only one aspect of time, either valid time or transaction time, is called a
Uni-Temporal relation.
2. Bi-Temporal Relation: A relation associated with both valid time and transaction time is called a
Bi-Temporal relation. Valid time has two parts, namely start time and end time, and the same holds for transaction time.
3. Tri-Temporal Relation: The relation which is associated with three aspects of time namely Valid time, Transaction
time, and Decision time called as Tri-Temporal relation.
Spatial data support in databases is important for efficiently storing, indexing, and querying data on the basis of spatial
location. For example, suppose that we want to store a set of polygons in a database and to query the database to find
all polygons that intersect a given polygon. We cannot use standard index structures, such as B-trees or hash indices,
to answer such a query efficiently. Efficient processing of the above query requires special-purpose index
structures, such as R-trees.
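Spatial indexes such as R-trees organize objects by bounding rectangles. As a tiny illustration of the underlying test (the objects and their boxes below are invented), a spatial query first filters objects whose bounding boxes intersect the query box and only then runs the expensive exact geometry check on the survivors.

```python
# A bounding box is (min_x, min_y, max_x, max_y).
def boxes_intersect(a, b):
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

# Hypothetical polygons, each reduced to its bounding box for the filter step.
polygons = {
    "park":    (0, 0, 4, 4),
    "lake":    (10, 10, 14, 13),
    "airport": (3, 3, 8, 6),
}

query_box = (2, 2, 5, 5)
candidates = [name for name, box in polygons.items()
              if boxes_intersect(box, query_box)]
print(candidates)   # ['park', 'airport'] -- exact intersection is checked next
```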
Computer-aided design (CAD) data includes spatial information about how objects, such as buildings, cars, or
aircraft, are constructed. Other important examples of computer-aided-design databases are integrated-circuit and
electronic-device layouts.
CAD systems traditionally stored data in memory during editing or other processing, and wrote the data back to a file at
the end of an editing session. The drawbacks of such a scheme include the cost (programming complexity, as well as time
cost) of transforming data from one form to another, and the need to read in an entire file even if only parts of it are
required. For a large design, such as the design of an entire airplane, it may be impossible to hold the complete design in memory.
Designers of object-oriented databases were motivated in large part by the database requirements of CAD systems. Object-oriented
databases represent components of the design as objects, and the connections between the objects indicate how the design
is structured.
Geographic data include road maps, land-usage maps, topographic elevation maps, political maps showing
boundaries, land-ownership maps, and so on. Geographic information systems are special-purpose databases for
storing geographic data. Geographic data differ from design data in certain ways. Maps and satellite images are
typical examples of geographic data. Maps may provide not only location information but also other information
associated with locations, such as elevation, soil type, land usage, and annual rainfall.
Raster data
Vector data
1. Raster data: Raster data consist of pixels, also known as grid cells, in two or more dimensions. For example, satellite
images, digital pictures, and scanned maps.
2. Vector data: Vector data consist of triangles, lines, and various geometric objects in two dimensions, and cylinders,
cuboids, and other polyhedrons in three dimensions. For example, building boundaries and roads.
Microsoft SQL Server: Microsoft SQL Server has supported spatial data since the 2008 version.
CouchDB: This is a document-based database in which spatial data is enabled by a plugin called GeoCouch.
Neo4j database.
Map data : Map data includes different types of spatial features of objects in a map, e.g., an object's shape and its location
within the map. The three basic types of features are points, lines, and polygons (or areas).
Points – Points are used to represent spatial characteristics of objects whose locations correspond to single 2-D
coordinates (x, y, or longitude/latitude) in the scale of particular application. For examples : Buildings, cellular towers, or
stationary vehicles. Moving vehicles and other moving objects can be represented by sequence of point locations that
change over time.
Lines – Lines represent objects having length, such as roads or rivers, whose spatial characteristics can be
approximated by sequence of connected lines.
Polygons – Polygons are used to represent characteristics of objects that have boundary, like states, lakes, or countries.
Attribute data : It is the descriptive data that Geographic Information Systems associate with features in the map. For
example, in a map representing the districts or cities within an Indian state (e.g., Odisha), attributes could be population, largest
city/town, area in square miles, and so on.
Image data : It includes camera created data like satellite images and aerial photographs. Objects of interest, such as
buildings and roads, can be identified and overlaid on these images. Aerial and satellite images are typical examples of
raster data.
Models of Spatial Information : It is divided into two categories :
Field : These models are used to model spatial data that is continuous in nature, e.g. terrain elevation, air quality index,
temperature data, and soil variation characteristics.
Object : These models have been used for applications such as transportation networks, land parcels, buildings, and
other objects that possess both spatial and non-spatial attributes. A spatial application is modeled using either field or
an object based model, which depends on the requirements and the traditional choice of model for the application.
Q.6)
Multimedia databases deal with the storage, retrieval, and management of multimedia data, which can include text,
images, audio, video, and other types of multimedia content. Let's discuss how SQLite and peer-to-peer mobile
databases relate to multimedia databases:
1) SQLite:
SQLite is a widely used relational database management system (RDBMS) that is particularly suitable for embedded
systems, mobile devices, and applications that require a lightweight and self-contained database. It can also be used in
multimedia database applications. Here's how SQLite is relevant to multimedia databases:
- Storage of Metadata: SQLite can be used to store metadata about multimedia files. For example, in a multimedia
database, you can create tables to store information about each multimedia object, such as the title, author, date, file
location, and descriptions. SQLite's structured storage allows for efficient organization and retrieval of metadata.
- Indexing and Search: SQLite supports indexing, which can significantly enhance the performance of multimedia
database queries. You can create indexes on specific attributes (e.g., title, author) to speed up search operations,
allowing users to find multimedia content quickly.
- Query and Retrieval: SQLite provides a powerful SQL query language for searching, filtering, and retrieving
multimedia content based on various criteria. You can use SQL statements to retrieve multimedia files matching certain
conditions, making it suitable for building multimedia database applications.
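A small example with Python's built-in sqlite3 module illustrates the metadata, indexing, and query points above; the table layout, index name, and rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE media (
    id INTEGER PRIMARY KEY,
    title TEXT, author TEXT, kind TEXT,      -- metadata about each object
    created TEXT, file_path TEXT             -- the content itself stays on disk
)""")
con.execute("CREATE INDEX idx_media_author ON media(author)")   # speeds up lookups

con.executemany(
    "INSERT INTO media (title, author, kind, created, file_path) VALUES (?,?,?,?,?)",
    [
        ("Holiday clip", "asha", "video", "2024-05-01", "/media/v1.mp4"),
        ("Podcast ep 1", "ravi", "audio", "2024-05-03", "/media/a1.mp3"),
        ("Beach photo",  "asha", "image", "2024-05-04", "/media/p1.jpg"),
    ],
)

rows = con.execute(
    "SELECT title, kind FROM media WHERE author = ? ORDER BY created", ("asha",)
).fetchall()
print(rows)   # [('Holiday clip', 'video'), ('Beach photo', 'image')]
```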
2) Peer-to-Peer Mobile Database:
A peer-to-peer (P2P) mobile database is a database system designed to operate in a decentralized, peer-to-peer
network, where mobile devices communicate and share data directly with each other. In the context of multimedia
databases, a peer-to-peer mobile database can offer unique advantages:
- Distributed Multimedia Sharing: Peer-to-peer mobile databases can enable users to share multimedia content
directly from their mobile devices to others in the network. For example, users can share photos, videos, or audio files
without relying on a central server.
- Offline Collaboration: In scenarios where network connectivity may be intermittent or unavailable, P2P mobile
databases can allow users to collaborate and share multimedia content even when not connected to the internet. Data
synchronization can occur when devices come into contact with each other.
- Decentralized Data Management: In multimedia databases, users may have multimedia content on their mobile
devices that they want to share or collaborate on. A P2P mobile database allows for distributed and decentralized data
management, ensuring that each user's device acts as a node in the network.
- Redundancy and Resilience: P2P mobile databases can provide redundancy and resilience for multimedia content. If
one device goes offline, others can still access and share the content, making it useful for multimedia applications
requiring data availability and fault tolerance.
Overall, both SQLite and peer-to-peer mobile databases have their respective roles in multimedia database
applications. SQLite is suitable for structured storage and query capabilities, while peer-to-peer mobile databases offer
decentralized, collaborative, and offline-capable solutions for sharing and managing multimedia content among mobile
devices.
A cache is maintained to hold frequently accessed data and transactions so that they are not lost due to connection failure.
As the use of laptops, mobile phones, and PDAs increases, there is an increasing need for databases to reside on the mobile system itself.
Mobile databases are physically separate from the central database server.
Mobile databases resided on mobile devices.
Mobile databases are capable of communicating with a central database server or other mobile clients from remote
sites.
With the help of a mobile database, mobile users must be able to work without a wireless connection due to poor or
even non-existent connections (disconnected).
A mobile database is used to analyze and manipulate data on mobile devices.
Mobile Database typically involves three parties :
Fixed Hosts –
It performs the transactions and data management functions with the help of database servers.
Mobiles Units –
These are portable computers that move around a geographical region that includes the cellular network that these
units use to communicate to base stations.
Base Stations –
These are two-way radios installed at fixed locations that pass communications between the mobile units and the
fixed hosts.
Limitations :
Here, we will discuss the limitation of mobile databases as follows.
Distributed database management has been proposed for various reasons, ranging from organizational decentralization and
economical processing to greater autonomy. Some of these advantages are as follows:
1. Management of distributed data with different levels of transparency –
Network transparency:
This refers to freedom for the user from the operational details of the network. It is of two types:
location transparency and naming transparency.
Replication transparency:
It makes the user unaware of the existence of copies; copies of data may be stored at multiple sites
for better availability, performance, and reliability.
Fragmentation transparency:
It makes the user unaware of the existence of fragments, whether the fragmentation is vertical or
horizontal.
2. Increased Reliability and Availability –
Reliability is defined as the probability that a system is running at a certain time, whereas availability is defined
as the probability that the system is continuously available during a time interval. When the data and DBMS software
are distributed over several sites, one site may fail while other sites continue to operate; only the data that
resides at the failed site becomes inaccessible, and this leads to improvement in reliability and availability.
3. Easier Expansion –
In a distributed environment expansion of the system in terms of adding more data, increasing database sizes or adding
more processor is much easier.
4. Improved Performance –
We can achieve interquery and intraquery parallelism by executing multiple queries at different sites, or by breaking up a
query into a number of subqueries that execute in parallel, which leads to improvement in
performance.
Improved scalability: Distributed databases can be scaled horizontally by adding more nodes to the network. This allows
for increased capacity and performance as data and user demand grow.
Increased availability: Distributed databases can provide increased availability and uptime by distributing the data
across multiple nodes. If one node goes down, the data can still be accessed from other nodes in the network.
Increased flexibility: Distributed databases can be more flexible than centralized databases, allowing data to be stored
in a way that best suits the needs of the application or user.
Improved fault tolerance: Distributed databases can be designed with redundancy and failover mechanisms that allow
the system to continue operating in the event of a node failure.
Improved security: Distributed databases can be more secure than centralized databases by implementing security
measures at the network, node, and application levels.
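The intraquery parallelism described in point 4 can be sketched in Python: a global aggregate is broken into subqueries that run in parallel against separate site fragments (here, separate SQLite files stand in for remote sites, each assumed to hold its fragment of an orders(amount) table), and the partial results are merged.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Each file stands in for the fragment of the `orders` table held at one site.
SITES = ["site_a.db", "site_b.db", "site_c.db"]

def partial_total(db_file: str) -> float:
    # Subquery executed locally at one site: aggregate its own fragment only.
    conn = sqlite3.connect(db_file)
    (total,) = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()
    conn.close()
    return total

# Intraquery parallelism: run the subqueries concurrently, then merge the results.
with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
    grand_total = sum(pool.map(partial_total, SITES))
print("global SUM(amount) =", grand_total)
```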
In a homogeneous distributed database, all sites run identical DBMS software. Such systems may be:
1) Autonomous − Each database is independent and functions on its own. The databases are integrated by a controlling application and use message passing to share data updates.
2) Non-autonomous − Data is distributed across the homogeneous nodes, and a central or master DBMS coordinates data updates across the sites.
In a heterogeneous distributed database, the system may be composed of a variety of DBMSs, such as relational, network, hierarchical, or object-oriented. Such systems may be:
1) Federated − The heterogeneous database systems remain independent but are integrated so that they function as a single database system.
2) Un-federated − The database systems employ a central coordinating module through which the databases are accessed.
A Distributed Database System is a database that is stored at, or divided across, more than one location, which means it is not limited to a single computer system; it is spread over a network of systems. The Distributed Database System is physically present on different systems in different locations. This can be necessary when users from all over the world need to access a specific database. For the user, it should be handled in such a way that it appears to be a single database.
Client-Server Architecture
A common method for spreading database functionality is the client-server architecture. Clients communicate with a central server, which controls the distributed database system. The server is in charge of maintaining data storage, controlling access, and organizing transactions. The architecture can have several clients and servers connected; a client sends a query, and whichever server is available earliest helps resolve it. This architecture is simple to implement because of the centralized server system.
Peer-to-Peer Architecture
Each node in the distributed database system may function as both a client and a server in a peer-to-peer architecture. Each node is linked to the others and works together to process and store data, and each node is responsible for managing its own data and coordinating node-to-node interactions. Because the loss of a single node does not cause the system to collapse, peer-to-peer systems provide decentralized control and high fault tolerance. This design is ideal for distributed systems whose nodes can function independently and have equal capabilities.
Federated Architecture
Multiple independent databases with various types are combined into a single meta−database using a federated
database design. It offers a uniform interface for navigating and exploring distributed data. In the federated design, each
site maintains a separate, independent database, while the virtual database manager internally distributes requests.
When working with several data sources or legacy systems that can't be simply updated, federated architectures are
helpful.
Shared-Nothing Architecture
Data is divided up and spread among several nodes in a shared−nothing architecture, with each node in charge of a
particular portion of the data. Resources are not shared across nodes, and each node runs independently. Due to the
system's capacity to add additional nodes as needed without affecting the current nodes, this design offers great
scalability and fault tolerance. Large−scale distributed systems, such as data warehouses or big data analytics
platforms, frequently employ shared−nothing designs.
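As a rough illustration of the shared-nothing idea, the sketch below hash-partitions rows across independent node-local stores; each "node" is modelled as its own SQLite file and never touches another node's data. The partitioning function, file names, and schema are illustrative assumptions.

```python
import sqlite3

NODES = ["node0.db", "node1.db", "node2.db"]  # each node owns its own storage

def node_for(key: int) -> sqlite3.Connection:
    # Simple hash partitioning: the key alone decides which node owns the row.
    conn = sqlite3.connect(NODES[hash(key) % len(NODES)])
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v TEXT)")
    return conn

def put(key: int, value: str) -> None:
    conn = node_for(key)
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
    conn.commit()
    conn.close()

def get(key: int):
    conn = node_for(key)
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    conn.close()
    return row[0] if row else None

put(42, "order-42")
print(get(42))  # served entirely by the one node that owns key 42
```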
Q.4)EXPLAIN WITH RESPECT TO DISTRIBUTED DATABASES
1)
Distributed databases (DDBs) are databases that are distributed across multiple interconnected computers or nodes,
and they offer several advantages, such as improved data availability and fault tolerance. However, they also introduce
a set of unique design and operational challenges. Let's discuss the key issues related to distributed databases:
Query Processing:
In distributed databases, query processing involves the distribution of queries across multiple nodes and the
coordination of their results. Key considerations include:
a) Query Optimization: Optimizing query execution plans to minimize data transfer and maximize parallel processing.
b) Distributed Query Execution: Coordinating the execution of subqueries on different nodes and merging the results.
c) Data Localization: Exploiting data locality to reduce data transfer costs.
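Data localization (point c) can be sketched as follows: a partition map records which node holds which fragment, and the planner sends a subquery only to the relevant node with the selection pushed down, instead of pulling every fragment and filtering centrally. The partition map, node names, and schema are hypothetical.

```python
# Hypothetical partition map: the customers table is horizontally fragmented by region.
PARTITION_MAP = {"EU": "node_eu", "US": "node_us", "APAC": "node_apac"}

def plan_query(region: str):
    # Data localization: route the subquery only to the node owning the fragment,
    # and push the selection down so that only matching rows cross the network.
    node = PARTITION_MAP[region]
    subquery = "SELECT id, name FROM customers WHERE region = ?"
    return node, subquery

node, subquery = plan_query("EU")
print(f"send to {node}: {subquery!r} with params ('EU',)")
```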
Transaction Management:
Transaction management in distributed databases ensures that multiple operations on the database maintain the ACID
(Atomicity, Consistency, Isolation, Durability) properties. Key aspects include:
a) Two-Phase Commit (2PC): Ensuring that distributed transactions are committed or aborted consistently across all participating nodes (see the sketch after this list).
b) Distributed Lock Management: Coordinating access to shared resources while maintaining isolation and avoiding
deadlocks.
c) Recovery and Logging: Managing logs to recover the database to a consistent state in case of failures.
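A minimal sketch of the two-phase commit idea from point (a): the coordinator first asks every participant to prepare (vote), and only if all vote yes does it send commit, otherwise it sends abort. The Participant class here is a hypothetical stand-in, not a real library API.

```python
class Participant:
    """Hypothetical stand-in for one node taking part in the transaction."""
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit

    def prepare(self) -> bool:
        # Phase 1: vote yes only if this node can make its changes durable.
        return self.can_commit

    def commit(self) -> None:
        print(f"{self.name}: commit")

    def abort(self) -> None:
        print(f"{self.name}: abort")

def two_phase_commit(participants) -> bool:
    # Phase 1 (voting): every participant must vote yes.
    if all(p.prepare() for p in participants):
        # Phase 2 (decision): unanimous yes -> global commit.
        for p in participants:
            p.commit()
        return True
    # Any no vote or failure -> global abort keeps all sites consistent.
    for p in participants:
        p.abort()
    return False

print(two_phase_commit([Participant("node1"), Participant("node2")]))          # True
print(two_phase_commit([Participant("node1"), Participant("node2", False)]))   # False
```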
The following are the main control measures used to provide security of data in databases:
1. Authentication
2. Access control
3. Inference control
4. Flow control
5. Database Security applying Statistical Method
6. Encryption
These are explained below.
Authentication :
Authentication is the process of confirming whether a user is logging in only according to the rights granted to them to perform database activities. A particular user can log in only up to their privilege level and cannot access other sensitive data; the privilege of accessing sensitive data is restricted by authentication.
Authentication tools based on biometrics, such as retina scans and fingerprints, can protect the database from unauthorized or malicious users.
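A minimal sketch of password-based authentication using Python's standard hashlib: passwords are stored only as salted hashes, and a login succeeds only if the supplied password reproduces the stored hash. The in-memory users dict is a stand-in for the DBMS's credential catalog.

```python
import hashlib
import hmac
import os

users = {}  # username -> (salt, password_hash); stand-in for the DBMS catalog

def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    users[username] = (salt, pw_hash)

def authenticate(username: str, password: str) -> bool:
    # Recompute the salted hash and compare in constant time.
    if username not in users:
        return False
    salt, stored = users[username]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored)

register("alice", "s3cret")
print(authenticate("alice", "s3cret"))   # True
print(authenticate("alice", "wrong"))    # False
```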
Access Control :
The security mechanism of a DBMS must include provisions for restricting access to the database by unauthorized users. Access control is done by creating user accounts and having the DBMS control the login process, so that access to sensitive data is possible only for those people (database users) who are allowed to access it, while unauthorized persons are kept out.
The database system must also keep track of all operations performed by a given user throughout the entire login session.
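The account-based access checks and operation tracking described above can be sketched as a small privilege table plus an audit log; the privilege names and helper functions are illustrative, not a real DBMS API.

```python
from datetime import datetime, timezone

# username -> set of granted privileges (illustrative)
privileges = {"alice": {"SELECT", "INSERT"}, "bob": {"SELECT"}}
audit_log = []  # every attempted operation is recorded for the session

def check_access(user: str, operation: str, table: str) -> bool:
    allowed = operation in privileges.get(user, set())
    # Track the operation regardless of whether it was permitted.
    audit_log.append((datetime.now(timezone.utc), user, operation, table, allowed))
    return allowed

if check_access("bob", "INSERT", "salaries"):
    pass  # perform the insert
else:
    print("access denied and recorded in the audit log")
```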
Inference Control :
This method is known as a countermeasure to the statistical database security problem. It is used to prevent the user from completing any inference channel and protects sensitive information from indirect disclosure.
Inferences are of two types: identity disclosure and attribute disclosure.
Flow Control :
This prevents information from flowing in a way that reaches unauthorized users. Pathways along which information flows implicitly, in ways that violate an organization's privacy policy, are called covert channels.
Database Security applying Statistical Method :
Statistical database security focuses on protecting confidential individual values that are stored for statistical purposes and used to retrieve summaries of values based on categories; it does not permit retrieval of individual information.
This allows users to query the database for statistical information, such as the number of employees in the company, but not for detailed confidential or personal information about a specific individual employee.
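One common way to enforce this is to answer only aggregate queries whose underlying group is large enough; the sketch below withholds any statistic computed over fewer than a chosen minimum number of individuals. The threshold and sample data are illustrative.

```python
MIN_QUERY_SET = 5  # illustrative minimum number of individuals per answer

employees = [
    {"dept": "HR", "salary": 52000}, {"dept": "HR", "salary": 61000},
    {"dept": "IT", "salary": 70000}, {"dept": "IT", "salary": 75000},
    {"dept": "IT", "salary": 80000}, {"dept": "IT", "salary": 72000},
    {"dept": "IT", "salary": 69000},
]

def avg_salary(dept: str) -> float:
    group = [e["salary"] for e in employees if e["dept"] == dept]
    if len(group) < MIN_QUERY_SET:
        # Refuse: an average over too few people could reveal individual values.
        raise PermissionError("query set too small; statistic withheld")
    return sum(group) / len(group)

print(avg_salary("IT"))   # allowed: five employees in the group
# avg_salary("HR")        # would raise: only two employees, risk of disclosure
```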
Encryption :
This method is mainly used to protect sensitive data such as credit card numbers and OTPs. The data is encoded using encryption algorithms.
An unauthorized user who tries to access this encoded data will find it very difficult to decode, whereas authorized users are given decryption keys to decode the data.
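A minimal sketch of column-level encryption using the third-party cryptography package (assumed to be installed): the credit-card number is encrypted before storage and can only be recovered by someone holding the key.

```python
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()      # decoding key given only to authorized users
cipher = Fernet(key)

# Encode the sensitive value before it is written to the database.
card_number = "4111 1111 1111 1111"
stored_value = cipher.encrypt(card_number.encode())
print(stored_value)              # unreadable without the key

# An authorized user holding the key can decode it again.
print(cipher.decrypt(stored_value).decode())
```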
Discretionary access control (DAC) is a type of security access control that grants or restricts object access
via an access policy determined by an object’s owner group and/or subjects. DAC mechanism controls are
defined by user identification with supplied credentials during authentication, such as username and
password. DACs are discretionary because the subject (owner) can transfer authenticated objects or
information access to other users. In other words, the owner determines object access privileges.
In DAC, each system object (file or data object) has an owner, and each initial object owner is the subject that
causes its creation. Thus, an object’s access policy is determined by its owner.
A typical example of DAC is the Unix file mode, which defines read, write and execute permissions for each of the three user classes: the file's owner, its group, and all other users (see the sketch below).
Characteristics of DAC include:
● Unauthorized users are blind to object characteristics, such as file size, file name and directory path.
● Object access is determined during access control list (ACL) authorization and based on user identification and/or group membership.
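The Unix file-mode example can be sketched with Python's os and stat modules on a Unix-like system: the file's owner (the subject that created it) decides, at their discretion, which read/write/execute bits to grant to the owner, group, and others.

```python
import os
import stat

path = "report.txt"
with open(path, "w") as f:
    f.write("confidential")          # the creating user becomes the owner

# Owner exercises discretionary control: read/write for the owner,
# read-only for the group, no access for other users (mode 640).
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))                     # 0o640 on a typical Unix system
```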
DAC is easy to implement and intuitive but has certain disadvantages, including: