
Advanced Database Management Systems
______________________________

Module 1 : Indexing and Hashing Techniques


____________________________________________________
1. Explain Disk Storage Devices

Databases are stored as files, which contain records. At the physical level, the actual data is stored in electromagnetic form on a storage device. These storage devices can be broadly categorized into three types −
● Primary Storage − The memory storage that is directly accessible to the CPU comes under this category. The CPU's internal memory (registers), fast memory (cache), and main memory (RAM) are directly accessible to the CPU, as they are all placed on the motherboard or CPU chipset. This storage is typically very small, ultra-fast, and volatile. Primary storage requires a continuous power supply to maintain its state; in case of a power failure, all its data is lost.
● Secondary Storage − Secondary storage devices are used to store data for future use or as backup. Secondary storage includes memory devices that are not part of the CPU chipset or motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.), hard disks, flash drives, and magnetic tapes.
● Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such storage devices are external to the computer system, they are the slowest in speed. These devices are mostly used to back up an entire system. Optical disks and magnetic tapes are widely used as tertiary storage.

Disk storage is a fundamental component in computer systems, providing non-volatile storage for data and enabling the persistence of information even when the power is turned off. Two primary types of disk storage devices are commonly used in computing: Hard Disk Drives (HDD) and Solid State Drives (SSD).

Hard Disk Drives (HDD) utilize magnetic storage technology, with data stored on
spinning disks called platters. These platters, along with read/write heads,
constitute the main components of an HDD. HDDs are known for their relatively
large storage capacities and cost-effectiveness per gigabyte. However, they have
mechanical parts and, as a result, exhibit slower access times and data transfer
rates compared to SSDs. HDDs are often employed in scenarios where massive
data storage is required, such as in servers and desktop computers.

On the other hand, Solid State Drives (SSD) leverage NAND-based flash memory
technology. Unlike HDDs, SSDs have no moving parts, storing data electronically
on NAND flash memory chips. This absence of mechanical components contributes
to faster access times, increased data transfer rates, and enhanced durability.
While historically more expensive per gigabyte, SSD prices have been decreasing,
and they have become increasingly popular for applications requiring high-speed
data access. SSDs are commonly used as the primary storage for operating
systems, applications, and in scenarios where rapid data retrieval is crucial, such
as in gaming systems.

2. Explain with examples

### 1. **Records:**
- **Definition:** Records are units of data within a database or file
that contain information about a particular entity. In a database
context, a record represents a row in a table, and it consists of fields
that hold specific pieces of data.
- **Example:** In a student database, a record might represent an
individual student and include fields such as "Student ID," "Name,"
"Age," and "Grade."

### 2. **Blocking:**
- **Definition:** Blocking refers to the practice of grouping
multiple records together into a block or a cluster when storing data
on disk. This is done to improve data retrieval efficiency.

- **Example:** If each record in a file contains information about a customer, blocking might involve grouping several customer records together in a block on the disk. This way, when the system needs to access one record, it fetches an entire block, improving data retrieval speed.

### 3. **Unspanned and Spanned Records:**


- **Unspanned Records:** In unspanned records, each record
must entirely fit within a single block on the storage medium. If a
record is too large to fit, it cannot be stored in that block.
- **Example:** If a block on disk can only hold 1,000 bytes, and a
record is 1,200 bytes, unspanned records would dictate that the record
cannot be stored in a single block.

- **Spanned Records:** In spanned records, a record can span multiple blocks. This allows larger records to be accommodated, with each block holding a portion of the record.

- **Example:** Using the same example, if a record is 1,200 bytes, a spanned record approach would allow the system to store the record across two blocks, with each block containing a segment of the record.

### 4. **File:**
- **Definition:** A file is a collection of related data stored
together with a file name. Files are used to organize and store data on
a computer's storage media.

- **Example:** An employee database file could include records for each employee, with fields such as "Employee ID," "Name," "Position," and "Salary."
### 5. **File Descriptor:**
- **Definition:** A file descriptor is a data structure or an
identifier that the operating system uses to represent an open file. It
contains information about the file, such as its current position, access
mode, and other attributes.

- **Example:** When a program opens a file, the operating system assigns it a file descriptor. The program can then use this file descriptor to read from or write to the file.

### 6. **Blocking Factor (BFR):**


- **Definition:** Blocking factor (bfr) is the number of records that can be accommodated in one block of storage. For unspanned records it is calculated as bfr = ⌊B / R⌋, where B is the block size and R is the record size.

- **Example:** If a block on disk can hold 2,000 bytes, and each record in a file is 200 bytes, the blocking factor would be 2,000 / 200 = 10. This means that 10 records can be stored in a single block.
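A minimal sketch of this calculation in Python (the block and record sizes below are illustrative):

```python
import math

def blocking_factor(block_size: int, record_size: int) -> int:
    """Number of whole (unspanned) records that fit in one block: bfr = floor(B / R)."""
    return block_size // record_size

def blocks_needed(num_records: int, bfr: int) -> int:
    """Number of blocks required to store num_records at the given blocking factor."""
    return math.ceil(num_records / bfr)

bfr = blocking_factor(2000, 200)
print(bfr)                       # -> 10 records per block
print(blocks_needed(1000, bfr))  # -> 100 blocks for a 1,000-record file
```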
In summary, these concepts are fundamental in the design and
management of databases and file systems, contributing to efficient
data organization, storage, and retrieval.

3. Explain Operation on Files.

Operations on database files can be broadly classified into two categories −

Update Operations
Retrieval Operations

Update operations change the data values by insertion, deletion, or update. Retrieval operations, on the other hand, do not alter the data but retrieve them after optional conditional filtering.

In both types of operations, selection plays a significant role. Other than creation and deletion of a file, several operations can be performed on files.
Open − A file can be opened in one of the two modes, read mode or
write mode. In read mode, the operating system does not allow
anyone to alter data. In other words, data is read only. Files opened
in read mode can be shared among several entities. Write mode
allows data modification. Files opened in write mode can be read but
cannot be shared.
Locate − Every file has a file pointer, which tells the current position
where the data is to be read or written. This pointer can be adjusted
accordingly. Using find (seek) operation, it can be moved forward or
backward.
Read − By default, when files are opened in read mode, the file
pointer points to the beginning of the file. There are options where
the user can tell the operating system where to locate the file
pointer at the time of opening a file. The very next data to the file
pointer is read.
Write − A user can open a file in write mode, which enables them to edit its contents through deletion, insertion, or modification. The file pointer can be located at the time of opening or can be changed dynamically if the operating system allows it.
Close − This is the most important operation from the operating
system’s point of view. When a request to close a file is generated,
the operating system
removes all the locks (if in shared mode),
saves the data (if altered) to the secondary storage media, and
releases all the buffers and file handlers associated with the file.
The organization of data inside a file plays a major role here. The process of locating the file pointer at a desired record inside a file varies based on whether the records are arranged sequentially or clustered.
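A minimal sketch of these operations using Python's built-in file API (the file name and records are illustrative):

```python
# Write: open the file in write mode and add (insert) records.
with open("employees.txt", "w") as f:
    f.write("E101,Asha,Manager,90000\n")
    f.write("E102,Ravi,Clerk,40000\n")
# Close happens automatically at the end of the `with` block: buffers are
# flushed to secondary storage and the file handle (descriptor) is released.

# Open in read mode: data cannot be altered through this handle.
with open("employees.txt", "r") as f:
    f.seek(0)                    # Locate: position the file pointer explicitly
    first_record = f.readline()  # Read: the data just after the pointer is read
    print(first_record.strip())  # -> E101,Asha,Manager,90000
```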

Disadvantages of File system

1. Data redundancy and inconsistency.
2. Difficulty in accessing data.
3. Data isolation.
4. Integrity problems.
5. Atomicity problems.
4. Explain Unordered Files and Ordered Files with average access time.
5. What is indexing?
6. Explain Index Evaluation Metrics

Access Types Supported Efficiently:

Definition: This metric assesses the types of access operations that the
index can efficiently support. It focuses on the ability of the index to
facilitate quick retrieval of records based on specific criteria.
Example: An index might efficiently support access operations for
records with a specified value in a particular attribute or for records with
an attribute value falling within a specified range of values.
Access Time:

Definition: Access time measures the time required to locate and retrieve
records using the index. It is a crucial metric for assessing the speed and
efficiency of index-based data retrieval operations.
Example: A well-designed index should minimize access time, ensuring
that queries are executed swiftly.

Insertion Time:

Definition: Insertion time refers to the time required to add a new record
to the database when an index is present. It evaluates how the presence
of the index impacts the efficiency of insertion operations.
Example: If an index significantly slows down the insertion of records, it
may affect real-time data processing scenarios.

Deletion Time:

Definition: Deletion time measures the time needed to remove a record from the database when an index is present. It evaluates the efficiency of deletion operations with respect to the presence of the index.
Example: An index that slows down deletion operations might be a
concern in situations where frequent data purging is required.

Space Overhead:

Definition: Space overhead evaluates the additional storage requirements imposed by the index structure. It assesses the impact of the index on the overall storage space of the database.
Example: If an index consumes a large amount of storage space, it may
lead to increased storage costs and could be less practical in
resource-constrained environments.

These metrics collectively provide a comprehensive evaluation of the index's performance characteristics, addressing different aspects of data access, modification, and storage. Database administrators use these metrics to make informed decisions about the suitability of specific indexing structures for the requirements of their databases, considering factors such as query patterns, data distribution, and system constraints.
7. Explain with examples
a. ordered index

b. Primary index

c. Secondary index
d. Sparse Index
8. Explain Single-level Ordered Indexes
Single Level Indexing

It is somewhat like the index (or the table of contents) found in a book. The index of a book contains topic names along with their page numbers; similarly, the index table of a database contains keys and their corresponding block addresses.

Single Level Indexing is further divided into three categories:

1. Primary Indexing: The indexing or the index table created using primary keys is known as Primary Indexing. It is defined on ordered data. As the index is composed of primary keys, the entries are unique, not null, and have a one-to-one relationship with the data blocks.

Example:

Characteristics of Primary Indexing:


● Search Keys are unique.
● Search Keys are in sorted order.
● Search Keys cannot be null as it points to a block of data.
● Fast and Efficient Searching.
2. Secondary Indexing: It is a two-level indexing technique used to reduce the mapping size of the
primary index. The secondary index points to a certain location where the data is to be found but the
actual data is not sorted like in the primary indexing. Secondary Indexing is also known as
non-clustered Indexing.

Example:

Characteristics of Secondary Indexing:

● Search Keys are Candidate Keys.


● Search Keys are sorted but actual data may or may not be sorted.
● Requires more time than primary indexing.
● Search Keys cannot be null.
● Faster than clustered indexing but slower than primary indexing.
3. Cluster Indexing: Clustered Indexing is used when there are multiple related records found at one
place. It is defined on ordered data. The important thing to note here is that the index table of
clustered indexing is created using non-key values which may or may not be unique. To achieve faster
retrieval, we group columns having similar characteristics. The indexes are created using these
groups and this process is known as Clustering Index.

Example:

Characteristics of Clustered Indexing:

● Search Keys are non-key values.


● Search Keys are sorted.
● Search Keys cannot be null.
● Search Keys may or may not be unique.
● Requires extra work to create indexing.

Ordered Indexing:
Ordered indexing is the traditional way of storing index entries to give fast retrieval. The indices are stored in sorted order, hence they are also known as ordered indices.

Ordered Indexing is further divided into two categories:

1. Dense Indexing: In dense indexing, the index table contains records for every search key value of the
database. This makes searching faster but requires a lot more space. It is like primary indexing but
contains a record for every search key.

Example:

2. Sparse Indexing: Sparse indexing consumes less space than dense indexing, but it is also a bit slower. An index entry is not kept for every record; instead, a search key is stored per block, and its entry points to that block, which contains a group of records. Because the pointed-to block sometimes has to be scanned after the index lookup (a double search), sparse indexing is a bit slower.
Example:
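A minimal sketch of a sparse-index lookup, assuming records sorted by key and a hypothetical index holding one (key, block number) entry per block:

```python
import bisect

# One entry per block: the smallest key stored in that block (illustrative data).
sparse_index = [(100, 0), (200, 1), (300, 2), (400, 3)]
blocks = {
    0: [(100, "A"), (150, "B")],
    1: [(200, "C"), (250, "D")],
    2: [(300, "E"), (350, "F")],
    3: [(400, "G"), (450, "H")],
}

def lookup(search_key):
    # Step 1: binary-search the sparse index for the last entry whose key <= search_key.
    keys = [k for k, _ in sparse_index]
    pos = bisect.bisect_right(keys, search_key) - 1
    if pos < 0:
        return None
    block_no = sparse_index[pos][1]
    # Step 2: scan the pointed-to block for the exact record (the "double search").
    for key, value in blocks[block_no]:
        if key == search_key:
            return value
    return None

print(lookup(250))  # -> "D"
```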

9. Explain Secondary Index


10. Explain the Properties of Index Types

The following properties are associated with different index types:

1. **Data Ordering:**
- *Description:* Entries in the index are arranged based on the order of
the indexed data, facilitating efficient range queries and ordered retrieval.

2. **Uniqueness:**
- *Description:* The index is based on unique values, ensuring that each
entry is unique and allowing for fast and accurate searches.

3. **Additional Paths:**
- *Description:* The index is created on non-primary key columns,
providing additional paths for efficient retrieval based on different criteria.

4. **Storage Efficiency:**
- *Description:* The index structure is designed to optimize storage
space, often by not having entries for every possible search key value.
5. **Physical Ordering:**
- *Description:* In a clustered index, the physical order of data records
in the table corresponds to the order of entries in the index, potentially
improving retrieval speed for range queries.

6. **Complete Key Mapping:**
- *Description:* In a dense index, every search-key value appearing in the data file has an entry in the index, providing a complete mapping between keys and records.

7. **Bitwise Representation:**
- *Description:* A bitmap index uses a bitmap to represent the
presence or absence of a particular value in the indexed column, enabling
efficient compression and bitwise operations.

Understanding these properties is essential for making decisions about which index type to use based on the characteristics of the data and the specific optimization goals for a given database.
11. Explain Index Update
12. Explain with examples B-Trees and B+-Trees.

B Tree
A B-Tree is a specialized m-way tree that is widely used for disk access. A B-Tree of order m can have at most m-1 keys and m children per node. One of the main reasons for using a B-Tree is its capability to store a large number of keys in a single node, and to store large key values, while keeping the height of the tree relatively small.

A B-Tree of order m has all the properties of an m-way tree. In addition, it satisfies the following properties.

1. Every node in a B-Tree contains at most m children.

2. Every node in a B-Tree, except the root and the leaf nodes, contains at least ⌈m/2⌉ children.

3. The root node must have at least 2 children (unless it is a leaf).

4. All leaf nodes must be at the same level.

It is not necessary that all nodes contain the same number of children, but each internal node must have at least ⌈m/2⌉ children. For example, in a B-Tree of order 4, every internal node other than the root has between 2 and 4 children.

While performing operations on a B-Tree, a property of the B-Tree may be violated, such as the minimum number of children a node can have. To maintain the properties of the B-Tree, the tree may split or join nodes.

B+ Tree
A B+ Tree is an extension of the B-Tree which allows efficient insertion, deletion and search operations.

In a B-Tree, keys and records can both be stored in the internal as well as the leaf nodes. In a B+ tree, by contrast, records (data) can only be stored in the leaf nodes, while internal nodes store only key values.

The leaf nodes of a B+ tree are linked together in the form of a singly linked list to make search queries more efficient.

B+ Trees are used to store large amounts of data that cannot fit in main memory. Because the size of main memory is always limited, the internal nodes (keys used to access records) of the B+ tree are kept in main memory, whereas the leaf nodes are stored in secondary memory.

The internal nodes of B+ tree are often called index nodes. A B+ tree of order 3 is shown in the following
figure.

Advantages of B+ Tree

1. Records can be fetched in an equal number of disk accesses.

2. The height of the tree remains balanced and is smaller compared to a B-Tree.

3. The data stored in a B+ tree can be accessed sequentially as well as directly.

4. Keys are used only for indexing.

5. Search queries are faster because the data is stored only in the leaf nodes.
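A minimal sketch, assuming a small pre-built B+ tree, of how a search descends through index nodes to a leaf and how the linked leaves support a range scan (the node layout and data are illustrative, not a full implementation with insertion/splitting):

```python
import bisect

class Node:
    def __init__(self, keys, children=None, records=None, next_leaf=None):
        self.keys = keys            # sorted keys in this node
        self.children = children    # child nodes (internal/index node)
        self.records = records      # key -> record dict (leaf node)
        self.next_leaf = next_leaf  # link to the next leaf (leaf node)

def find_leaf(node, key):
    """Descend from the root to the leaf that could contain `key`."""
    while node.children is not None:
        # Choose the child whose key range covers `key`.
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def search(root, key):
    return find_leaf(root, key).records.get(key)

def range_scan(root, low, high):
    """Use the leaf-level linked list for sequential access."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k in leaf.keys:
            if low <= k <= high:
                out.append(leaf.records[k])
            elif k > high:
                return out
        leaf = leaf.next_leaf
    return out

# Illustrative B+ tree of order 3: keys only in the internal node, data in leaves.
leaf2 = Node(keys=[30, 40], records={30: "C", 40: "D"})
leaf1 = Node(keys=[10, 20], records={10: "A", 20: "B"}, next_leaf=leaf2)
root = Node(keys=[30], children=[leaf1, leaf2])

print(search(root, 20))          # -> "B"
print(range_scan(root, 15, 35))  # -> ["B", "C"]
```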
13. Difference between B-tree and B+-tree
14. Explain Hashing: static and dynamic techniques

Hashing is a computation technique in which a hash function takes variable-length data as input and produces a shortened, fixed-length value as output. The output is often called a "Hash Code", "Key", or simply "Hash". The memory locations in which the hashed records are stored are called "Data Buckets".

Characteristics of Hashing Technique


Hashing techniques come with the following characteristics −

● The first characteristic is that the hashing technique is deterministic: no matter how many times the function is invoked on the same input, it delivers the same fixed-length result.
● The second characteristic is its unidirectional action: there is no way to use the key to retrieve the original data. Hashing is irreversible.

What are Hash Functions?


Hash functions are mathematical functions that are executed to generate addresses of
data records. Hash functions use memory locations that store data, called ‘Data Buckets’.
Hash functions are used in cryptographic signatures, securing privacy of vulnerable data,
and verifying correctness of the received files and texts. In computation, hashing is used
in data processing to locate a single string of data in an array, or to calculate direct
addresses of records on the disk by requesting its Hash Code or Key.

Applications of Hashing
Hashing is applicable in the following area −

​ Password verification
​ Associating filename with their paths in operating systems
​ Data Structures, where a key-value pair is created in which the key is a unique
value, whereas the value associated with the keys can be either same or different
for different keys.
​ Board games such as Chess, tic-tac-toe, etc.
​ Graphics processing, where a large amount of data needs to be matched and
fetched.

Hashing is also used in Database Management Systems, where very large numbers of records must be searched, queried, and matched for retrieval − for example, the DBMS used in banking or in large public transport reservation software.

The remainder of this answer describes the difference between two important hashing techniques − static hashing and dynamic hashing.

What is Static Hashing?


It is a hashing technique that enables users to lookup a definite data set. Meaning, the
data in the directory is not changing, it is "Static" or fixed. In this hashing technique, the
resulting number of data buckets in memory remains constant.
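A minimal sketch of static hashing in Python, assuming a fixed number of buckets and a simple modulo hash function (overflow records are simply chained inside the bucket):

```python
NUM_BUCKETS = 8  # fixed for the lifetime of the file (static hashing)

def h(key: int) -> int:
    """Hash function: maps a search key to a bucket address."""
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]   # each bucket is a list of (key, record)

def insert(key, record):
    buckets[h(key)].append((key, record))    # overflow simply chains in the bucket

def search(key):
    for k, rec in buckets[h(key)]:
        if k == key:
            return rec
    return None

def delete(key):
    bucket = buckets[h(key)]
    bucket[:] = [(k, r) for k, r in bucket if k != key]

insert(105, "Asha"); insert(113, "Ravi")     # both hash to bucket 1 -> overflow chain
print(search(113))                            # -> "Ravi"
```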

Operations Provided by Static Hashing

Static hashing provides the following operations −

● Delete − Search for the record's address and delete the record at that address (or delete a chunk of records stored at that address) in memory.
● Insertion − While entering a new record using static hashing, the hash function h calculates the bucket address h(K) for the search key K, where the record will be stored.
● Search − A record can be obtained by using the hash function to locate the address of the bucket where the data is stored.
● Update − It supports updating a record once it is traced in the data bucket.

Advantages of Static Hashing

Static hashing is advantageous in the following ways −

● It offers excellent performance for small databases.
● It allows the primary key value to be used as the hash key.

Disadvantages of Static Hashing

Static hashing comes with the following disadvantages −

● It cannot work efficiently with databases that need to scale.
● It is not a good option for large databases.
● Bucket overflow occurs if there is more data and less memory.


What is Dynamic Hashing?


It is a hashing technique that enables users to look up a dynamic data set − that is, a data set which is modified by adding or removing data on demand, hence the name "dynamic" hashing. The set of data buckets grows or shrinks depending on the number of records, so the number of data buckets in memory keeps changing.
Operations Provided by Dynamic Hashing

Dynamic hashing provides the following operations −

​ Delete − Locate the desired location and support deleting data (or a chunk of data)
at that location.
​ Insertion − Support inserting new data into the data bucket if there is a space
available in the data bucket.
​ Query − Perform querying to compute the bucket address.
​ Update − Perform a query to update the data.

Advantages of Dynamic Hashing

Dynamic hashing is advantageous in the following ways −

● It works well with scalable data.
● It can handle addressing a large amount of memory in which the data size is always changing.
● Bucket overflow occurs rarely, or very late.

Disadvantages of Dynamic Hashing

Dynamic hashing comes with the following disadvantage −

​ The location of the data in memory keeps changing according to the bucket size.
Hence if there is a phenomenal increase in data, then maintaining the bucket
address table becomes a challenge.
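A deliberately simplified sketch of the dynamic idea in Python: when a bucket overflows, the number of buckets is doubled and all records are rehashed. This illustrates buckets growing with the data; it is not the directory-based extendible or linear hashing scheme used by real systems:

```python
BUCKET_CAPACITY = 2

class DynamicHash:
    def __init__(self):
        self.num_buckets = 2
        self.buckets = [[] for _ in range(self.num_buckets)]

    def _addr(self, key):
        return key % self.num_buckets

    def insert(self, key, record):
        bucket = self.buckets[self._addr(key)]
        bucket.append((key, record))
        if len(bucket) > BUCKET_CAPACITY:      # overflow -> grow the bucket set
            self._grow()

    def _grow(self):
        """Double the number of buckets and rehash every record."""
        old = [pair for b in self.buckets for pair in b]
        self.num_buckets *= 2
        self.buckets = [[] for _ in range(self.num_buckets)]
        for key, record in old:
            self.buckets[self._addr(key)].append((key, record))

    def search(self, key):
        return next((r for k, r in self.buckets[self._addr(key)] if k == key), None)

dh = DynamicHash()
for k in (1, 3, 5, 7, 9):
    dh.insert(k, f"rec{k}")
print(dh.num_buckets, dh.search(7))   # bucket count has grown; -> 8 rec7
```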


Differences between Static and Dynamic Hashing


Here are some prominent differences by which Static Hashing is different than Dynamic
Hashing −
| Key Factor | Static Hashing | Dynamic Hashing |
| --- | --- | --- |
| Form of Data | Fixed-size, non-changing data. | Variable-size, changing data. |
| Result | The resulting data bucket is of fixed length. | The resulting data bucket is of variable length. |
| Bucket Overflow | Bucket overflow can arise often, depending upon memory size. | Bucket overflow occurs very late or doesn't occur at all. |
| Complexity | Simple | Complex |



Conclusion
Hashing is a computation technique that uses mathematical functions, called hash functions, to calculate the location (address) of data in memory. We saw that there are two different hashing techniques, namely static hashing and dynamic hashing.

Each hashing technique differs in whether it works with a fixed-length or a variable-length set of data buckets. A proper hashing technique should be selected by considering the amount of data to be handled and the intended speed of the application.
Module 2 : Query Processing and Optimization
____________________________________________________
1. Explain basic steps in Query Processing

Query Processing in DBMS
Query Processing is the activity performed in extracting data from the database. In query processing, it
takes various steps for fetching the data from the database. The steps involved are:

1. Parsing and translation

2. Optimization

3. Evaluation

The query processing works in the following way:

Parsing and Translation


Query processing includes certain activities for data retrieval. The user writes queries in a high-level database language such as SQL, and these queries must be translated into expressions that can be used at the physical level of the file system. After this, the actual evaluation of the query, along with a variety of query-optimizing transformations, takes place. SQL (Structured Query Language) is well suited for humans, but it is not suitable as the internal representation of a query inside the system; relational algebra is well suited for that internal representation. When a user executes a query, the parser checks the syntax of the query and verifies the names of the relations and attributes used, in order to generate the internal form of the query. The parser creates a tree of the query, known as a 'parse tree', which is then translated into relational algebra. During this translation, any views used in the query are replaced by their definitions.
The working of query processing can be understood with the following example. Suppose a user executes a query. As we have learned, there are various methods of extracting data from the database. Suppose the user wants to fetch the records of employees whose salary is greater than 10000. In SQL, the following query is used:

select emp_name from Employee where salary>10000;

Thus, to make the system understand the user query, it needs to be translated into relational algebra. This query can be expressed in relational algebra, for example, as:

○ Πemp_name (σsalary>10000 (Employee))

○ σsalary>10000 (Πemp_name, salary (Employee))

After translating the given query, we can execute each relational algebra operation by using different
algorithms. So, in this way, a query processing begins its working.
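A minimal sketch, using plain Python lists of dictionaries as stand-in relations, showing that the two equivalent relational-algebra forms above produce the same result (the sample rows are illustrative):

```python
Employee = [
    {"emp_name": "Asha", "salary": 15000},
    {"emp_name": "Ravi", "salary": 8000},
    {"emp_name": "Meera", "salary": 12000},
]

def select(rows, pred):            # sigma: keep rows satisfying the predicate
    return [r for r in rows if pred(r)]

def project(rows, attrs):          # pi: keep only the listed attributes
    return [{a: r[a] for a in attrs} for r in rows]

# Plan 1: select first, then project emp_name.
plan1 = project(select(Employee, lambda r: r["salary"] > 10000), ["emp_name"])
# Plan 2: project emp_name and salary first, then select, then drop salary.
plan2 = project(select(project(Employee, ["emp_name", "salary"]),
                       lambda r: r["salary"] > 10000), ["emp_name"])

print(plan1 == plan2)  # -> True: both plans return the same answer
```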

Evaluation
In addition to the relational algebra translation, it is required to annotate the translated relational algebra expression with instructions specifying how each operation is to be evaluated. Thus, after translating the user query, the system executes a query evaluation plan.

Query Evaluation Plan

○ In order to fully evaluate a query, the system needs to construct a query evaluation plan.

○ The annotations in the evaluation plan may refer to the algorithms to be used for the particular index
or the specific operations.

○ Such relational algebra with annotations is referred to as Evaluation Primitives. The evaluation
primitives carry the instructions needed for the evaluation of the operation.

○ Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a
query. The query evaluation plan is also referred to as the query execution plan.

○ A query execution engine is responsible for generating the output of the given query. It takes the
query execution plan, executes it, and finally makes the output for the user query.

Optimization

○ The cost of query evaluation can vary for different types of queries. Although the system is responsible for constructing the evaluation plan, the user does not need to write the query efficiently.

○ Usually, a database system generates an efficient query evaluation plan which minimizes its cost. This task is performed by the database system and is known as Query Optimization.

○ For optimizing a query, the query optimizer should have an estimated cost analysis of each
operation. It is because the overall operation cost depends on the memory allocations to several
operations, execution costs, and so on.

Finally, after selecting an evaluation plan, the system evaluates the query and produces the output of the
query.
2. Explain Query Optimization in detail

Query optimization is used to access and modify the database in the most efficient way
possible. It is the art of obtaining necessary information in a predictable, reliable, and timely
manner. Query optimization is formally described as the process of transforming a query
into an equivalent form that may be evaluated more efficiently. The goal of query
optimization is to find an execution plan that reduces the time required to process a query.
We must complete two major tasks to attain this optimization target.

The first is to determine the optimal plan to access the database, and the second is to reduce
the time required to execute the query plan.

Optimizer Components

The optimizer is made up of three parts: the transformer, the estimator, and the plan
generator. The figure below depicts those components.
● Query Transformer: The query transformer determines whether it is advantageous to rewrite the
original SQL statement into a semantically equivalent SQL statement at a lower cost for some
statements.
● Estimator: The estimator is the optimizer component that calculates the total cost of a given execution plan. To determine the cost, the estimator employs three different measures:
○ Selectivity: The query picks a percentage of the rows in the row set, with 0 indicating no rows
and 1 indicating all rows. Selectivity is determined by a query predicate, such as WHERE the
last name LIKE X%, or by a mix of predicates. As the selectivity value approaches zero, a
predicate gets more selective, and as the value nears one, it becomes less selective (or more
unselective).
○ Cardinality: The cardinality of an execution plan is the number of rows returned by each
action. This input is shared by all cost functions and is essential for determining the best
strategy. Cardinality in DBMS can be calculated using DBMS STATS table statistics or after
taking into account the impact of predicates (filter, join, and so on), DISTINCT or GROUP BY
operations, and so on. In an execution plan, the Rows column displays the estimated
cardinality.
○ Cost: This metric represents the number of units of labor or resources used. The query
optimizer uses disc I/O, CPU utilization, and memory usage as units of effort. For example, if
the plan for query A has a lower cost than the plan for query B, then the following outcomes
are possible: A executes faster than B, A executes slower than B or A executes in the same
amount of time as B.
● Plan Generator : The plan generator investigates multiple plans for a query block by experimenting
with various access paths, join methods, and join orders.
3. Explain Query Cost Measurement.
4. Explain Cost estimation of selection algorithms in detail.

Cost Estimation
Here, the overall cost of the algorithm is computed by adding the cost of the individual index scans and the cost of fetching the records in the intersection of the retrieved lists of pointers. We can minimize the cost by sorting the list of pointers and fetching the records in sorted order. So, we note the following two points for cost estimation:

○ We can fetch all selected records of the block using a single I/O operation because each pointer in
the block appears together.

○ The disk-arm movement gets minimized as blocks are read in sorted order.

Cost Estimation Chart for various Selection algorithms

Here, br is the number of blocks in the file, hi denotes the height of the index, b is the number of blocks holding records with the specified search key, n is the number of fetched records, tS is the time for one disk seek, and tT is the time to transfer one block.

| Selection Algorithm | Cost | Why So? |
| --- | --- | --- |
| Linear Search | tS + br * tT | It needs one initial seek with br block transfers. |
| Linear Search, Equality on Key | tS + (br/2) * tT | Average case: only one record satisfies the condition, so the scan terminates as soon as it is found. |
| Primary B+-tree index, Equality on Key | (hi + 1) * (tT + tS) | Each I/O operation needs one seek and one block transfer to fetch the record by traversing the height of the tree. |
| Primary B+-tree index, Equality on Nonkey | hi * (tT + tS) + b * tT | It needs one seek for each level of the tree, and one seek for the first block. |
| Secondary B+-tree index, Equality on Key | (hi + 1) * (tT + tS) | Each I/O operation needs one seek and one block transfer to fetch the record by traversing the height of the tree. |
| Secondary B+-tree index, Equality on Nonkey | (hi + n) * (tT + tS) | It requires one seek per record because each record may be on a different block. |
| Primary B+-tree index, Comparison | hi * (tT + tS) + b * tT | It needs one seek for each level of the tree, and one seek for the first block. |
| Secondary B+-tree index, Comparison | (hi + n) * (tT + tS) | It requires one seek per record because each record may be on a different block. |
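A minimal sketch that evaluates these cost formulas for illustrative parameter values (4 ms seek, 0.1 ms block transfer, and made-up file/index statistics), just to compare the algorithms numerically:

```python
tS, tT = 4.0, 0.1                  # seek time and block-transfer time in ms (illustrative)
br, hi, b, n = 10_000, 3, 20, 50   # blocks in file, index height, matching blocks, matching records

costs = {
    "Linear search":                          tS + br * tT,
    "Linear search, equality on key":         tS + (br / 2) * tT,
    "Primary B+-tree, equality on key":       (hi + 1) * (tT + tS),
    "Primary B+-tree, equality on nonkey":    hi * (tT + tS) + b * tT,
    "Secondary B+-tree, equality on key":     (hi + 1) * (tT + tS),
    "Secondary B+-tree, equality on nonkey":  (hi + n) * (tT + tS),
}

for name, ms in sorted(costs.items(), key=lambda x: x[1]):
    print(f"{name:40s} {ms:8.1f} ms")
```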


5. Explain external merge sort with an example.
6. Explain Join operation algorithms.

1. Nested Loop Join:


Description: For each row in the outer table, the algorithm iterates
through all rows in the inner table to find matching rows based on the
join condition.
Example: If joining a table of employees with a table of departments
based on the department ID, the nested loop join
would iterate through each employee and check for a matching
department in the department table.
2. Block Nested Loop Join:
Description: Similar to the nested loop join, but instead of processing
one row at a time, it processes blocks of rows, reducing the number of
times the inner table is scanned.
Example: If joining large tables, block nested loop join can improve
performance by processing a block of rows at once.
3. Indexed Nested Loop Join:
Description: Utilizes indexes on the join columns to speed up the
join process. The algorithm performs an index lookup for each row
in the outer table to find matching rows in the inner table.
Example: If there's an index on the department ID in the employee
table, the indexed nested loop join would use the index to quickly
locate matching departments.
4. Merge Join:
Description: Requires both input tables to be sorted on the join key. It
then iterates through both tables simultaneously, merging matching
rows.
Example: If joining two tables of ordered timestamps, a merge join
could efficiently combine rows where timestamps match.
5. Hash Join:
Description: Involves creating a hash table on the smaller of the two
tables based on the join key. Then, it probes this hash table for each
row in the larger table to find matches.
Example: If joining a small table of employee IDs with a larger table of
employee details, a hash join would create a hash table on the
employee ID and quickly find matches.
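A minimal sketch of a hash join in Python, assuming two small in-memory tables joined on dept_id (the data is illustrative):

```python
from collections import defaultdict

departments = [  # smaller (build) table
    {"dept_id": 1, "dept_name": "HR"},
    {"dept_id": 2, "dept_name": "Engineering"},
]
employees = [    # larger (probe) table
    {"emp_name": "Asha", "dept_id": 2},
    {"emp_name": "Ravi", "dept_id": 1},
    {"emp_name": "Meera", "dept_id": 2},
]

def hash_join(build, probe, key):
    # Build phase: hash the smaller table on the join key.
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    # Probe phase: for each row of the larger table, look up matching rows.
    result = []
    for row in probe:
        for match in table.get(row[key], []):
            result.append({**match, **row})
    return result

for joined in hash_join(departments, employees, "dept_id"):
    print(joined["emp_name"], joined["dept_name"])
```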

7. Explain cost-based query optimization.


Query optimization is the process of choosing the most efficient or most favorable way of executing an SQL statement. It is an art and a science of applying rules to rewrite the tree of operators invoked by a query in order to produce an optimal plan. A plan is said to be optimal if it returns the answer in the least time or by using the least space.
Cost-Based Optimization:
For a given query and environment, the optimizer assigns a numerical cost to each step of every possible plan and adds these values together to get a cost estimate for the plan (or for the candidate strategy). After calculating the costs of all possible plans, the optimizer chooses the plan with the lowest cost estimate. For that reason, the optimizer is sometimes referred to as the Cost-Based Optimizer. Below are some of the features of cost-based optimization −

1. Cost-based optimization is based on the estimated cost of the query to be optimized.
2. The query can use a lot of paths based on the value of indexes,
available sorting methods, constraints, etc.
3. The aim of query optimization is to choose the most efficient path of
implementing the query at the possible lowest minimum cost in the
form of an algorithm.
4. The cost of executing the algorithm needs to be provided by the
query Optimizer so that the most suitable query can be selected for
an operation.
5. The cost of an algorithm also depends upon the cardinality of the
input.

Cost Estimation:
To estimate the cost of the different available execution plans (execution strategies), the query tree is viewed as a data structure containing a series of basic operations that are linked together to perform the query. The cost of each operation depends on its selectivity, that is, the proportion of its input that forms the output. It is also important to know the expected cardinality of an operation's output, because that cardinality forms the input to the next operation.
The cost of optimization of the query depends upon the following-

Cardinality-
Cardinality is known to be the number of rows that are returned by
performing the operations specified by the query execution plan. The
estimates of the cardinality must be correct as it highly affects all the
possibilities of the execution plan.
Selectivity-
Selectivity refers to the fraction of rows that are selected. The selectivity of a row in a table (or of a table in the database) depends on the condition being applied: a row is selected only if it satisfies the condition, and that condition can be anything, depending on the situation.

Cost-
Cost refers to the amount of work or resources the system spends to execute the query. The measure of cost depends on the work done and the number of resources used, such as disk I/O, CPU time, and memory.

Issues In Cost-Based Optimization:


1. In cost-based optimization, the number of execution strategies that
can be considered is not really fixed. The number of execution
strategies may vary based on the situation.
2. Sometimes, this process is very time-consuming, and it does not always guarantee finding the best optimal strategy.
3. It is an expensive process.
Module 3 : Transactions Management and
Concurrency
____________________________________________________

Q.1 Draw the state transition diagram of a transaction with transaction states?

Transaction States in DBMS

These are the states through which a transaction goes during its lifetime. They describe the current state of the transaction and determine how the transaction will be processed further. These states govern the rules which decide the fate of the transaction − whether it will commit or abort.

Transactions also use a transaction log − a file maintained by the recovery management component to record all the activities of the transaction. After the commit is done, the transaction log file is removed.

These are different types of Transaction States :

Active State –
When the instructions of the transaction are running then the transaction is in active state. If all the ‘read and write’
operations are performed without any error then it goes to the “partially committed state”; if any instruction fails, it goes
to the “failed state”.
Partially Committed –
After completion of all the read and write operation the changes are made in main memory or local buffer. If the changes
are made permanent on the DataBase then the state will change to “committed state” and in case of failure it will go to
the “failed state”.

Failed State –
When any instruction of the transaction fails, it goes to the “failed state” or if failure occurs in making a permanent
change of data on Data Base.

Aborted State –
After having any type of failure the transaction goes from “failed state” to “aborted state” and since in previous states,
the changes are only made to local buffer or main memory and hence these changes are deleted or rolled-back.

Committed State –
It is the state when the changes are made permanent on the Data Base and the transaction is complete and therefore
terminated in the “terminated state”.

Terminated State –
If there isn’t any roll-back or the transaction comes from the “committed state”, then the system is consistent and ready
for new transaction and the old transaction is terminated.
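Since the diagram itself is not reproduced here, a minimal sketch of the same state transitions as a Python transition table (the event names are illustrative):

```python
# transition table: (current_state, event) -> next_state
TRANSITIONS = {
    ("active", "last operation executed"):   "partially committed",
    ("active", "failure detected"):          "failed",
    ("partially committed", "write to db"):  "committed",
    ("partially committed", "failure"):      "failed",
    ("failed", "rollback"):                  "aborted",
    ("committed", "finish"):                 "terminated",
    ("aborted", "finish"):                   "terminated",
}

def next_state(state, event):
    return TRANSITIONS.get((state, event), state)

s = "active"
for e in ("last operation executed", "write to db", "finish"):
    s = next_state(s, e)
print(s)  # -> "terminated" for a successful transaction
```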

Q.2 Explain why concurrency control is needed ?

Concurrency control concept comes under the Transaction in database management system (DBMS). It is a procedure
in DBMS which helps us for the management of two simultaneous processes to execute without conflicts between each
other, these conflicts occur in multi user systems.

Concurrency can simply be described as executing multiple transactions at a time. It is required to increase time efficiency. If many transactions try to access the same data, inconsistency can arise. Concurrency control is required to maintain data consistency.

For example, if we take ATM machines and do not use concurrency, multiple persons cannot draw money at a time in
different places. This is where we need concurrency.

Advantages

● Waiting time will be decreased.


● Response time will decrease.
● Resource utilization will increase.
● System performance & Efficiency is increased.

Control concurrency

The simultaneous execution of transactions over shared databases can create several data integrity and consistency
problems.

For example, if too many people are logging into ATM machines, serial updates and synchronization should happen in the bank servers whenever a transaction is done; otherwise, wrong information and wrong data end up in the database.
Main problems in using Concurrency
The problems which arise while using concurrency are as follows −

Lost updates − One transaction makes some changes and another transaction overwrites or deletes those changes; one transaction nullifies the updates of another transaction.

Uncommitted dependency (dirty read) problem − One transaction updates a variable and, before it commits, another transaction reads or modifies that value. If the first transaction later rolls back, the second transaction has worked with false (uncommitted) values of the variable. This is a major problem.

Inconsistent retrievals − One transaction reads multiple variables while another transaction is in the process of updating them, so the reader sees inconsistent values of the same data at different instants.

Concurrency control techniques

Locking
Lock guaranties exclusive use of data items to a current transaction. It first accesses the data items by acquiring a lock,
after completion of the transaction it releases the lock.

Types of Locks

The types of locks are as follows −

● Shared Lock [Transaction can read only the data item values]

● Exclusive Lock [Used for both read and write data item values]

Time Stamping
Time stamp is a unique identifier created by DBMS that indicates relative starting time of a transaction. Whatever
transaction we are doing it stores the starting time of the transaction and denotes a specific time.

This can be generated using a system clock or logical counter. This can be started whenever a transaction is started.
Here, the logical counter is incremented after a new timestamp has been assigned.

Optimistic
It is based on the assumption that conflict is rare and it is more efficient to allow transactions to proceed without
imposing delays to ensure serializability.

Q.3 Explain properties of transaction ?

Transaction properties
A transaction has four properties. These are used to maintain consistency in a database, before and after the transaction.
Atomicity

It states that all operations of the transaction take place at once; if not, the transaction is aborted.
There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit and either runs to completion or is not executed at all.
Atomicity involves the following two operations:

● Abort: If a transaction aborts then all the changes made are not visible.

● Commit: If a transaction commits then all the changes made are visible.

Example: Let's assume the following transaction T, consisting of T1 and T2. A contains Rs 600 and B contains Rs 300. Transfer Rs 100 from account A to account B.

| T1 | T2 |
| --- | --- |
| Read(A) | |
| A := A - 100 | |
| Write(A) | |
| | Read(B) |
| | B := B + 100 |
| | Write(B) |
After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.
If the transaction T fails after the completion of transaction T1 but before completion of transaction T2, then the amount
will be deducted from A but not added to B. This shows the inconsistent database state. In order to ensure correctness
of database state, the transaction must be executed in entirety.
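A minimal sketch of this transfer using Python's sqlite3 module, in which either both updates commit together or both are rolled back (the table and amounts are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES ('A', 600), ('B', 300)")
con.commit()

try:
    con.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")  # T1
    con.execute("UPDATE account SET balance = balance + 100 WHERE name = 'B'")  # T2
    con.commit()        # both changes become permanent together
except Exception:
    con.rollback()      # on any failure, neither change is applied (atomicity)

print(dict(con.execute("SELECT name, balance FROM account")))  # {'A': 500, 'B': 400}
```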

Consistency

The integrity constraints are maintained so that the database is consistent before and after the transaction.
The execution of a transaction will leave a database in either its prior stable state or a new stable state.
The consistent property of database states that every transaction sees a consistent database instance.
The transaction is used to transform the database from one consistent state to another consistent state.

For example: The total amount must be maintained before or after the transaction.

Total before T occurs = 600 + 300 = 900
Total after T occurs = 500 + 400 = 900
Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then inconsistency will occur.

Isolation

It shows that the data which is used at the time of execution of a transaction cannot be used by the second transaction
until the first one is completed.
In isolation, if the transaction T1 is being executed and using the data item X, then that data item can't be accessed by
any other transaction T2 until the transaction T1 ends.
The concurrency control subsystem of the DBMS enforces the isolation property.

Durability

The durability property states that once a transaction completes, the changes it has made are permanent.
They cannot be lost by the erroneous operation of a faulty transaction or by a system failure. When a transaction is completed, the database reaches a state known as the consistent state. That consistent state cannot be lost, even in the event of a system failure.
The recovery subsystem of the DBMS is responsible for the durability property.

Q.4 Explain following terms wrt transaction processing

a. Schedules
b. Characterizing schedules based on serialization
c. Characterizing schedules based on recoverability ?

A. A schedule, as the name suggests, is a process of lining up transactions and executing them one by one. When there are multiple transactions running concurrently and the order of operations needs to be set so that the operations do not overlap each other, scheduling is brought into play and the transactions are timed accordingly. Here we will discuss the various types of schedules.
B. Serializability is a concept that is used to ensure that the concurrent execution of multiple transactions does not result
in inconsistencies or conflicts in a database management system. In other words, it ensures that the results of
concurrent execution of transactions are the same as if the transactions were executed one at a time in some order.

A schedule is considered to be serializable if it is equivalent to some serial schedule, which is a schedule where all
transactions are executed one at a time. This means that if a schedule is serializable, it does not result in any
inconsistencies or conflicts in the database.

Testing for Serializability


There are several methods for testing if a schedule is serializable, including −

● Conflict serializability − A schedule is conflict serializable if it can be transformed into some serial schedule by swapping adjacent non-conflicting operations, i.e., it is conflict-equivalent to a serial schedule.

● View serializability − A schedule is view serializable if it is view-equivalent to some serial schedule: each transaction reads the same values and the final write of each data item is the same, even though the order of operations may differ.

To check for conflict serializability, we can use the conflict graph method which involves creating a graph where each
transaction is represented by a node and each conflicting operation is represented by an edge. A schedule is
considered to be conflict serializable if there are no cycles in the graph.

To check for view serializability, we can use the view equivalence method which involves comparing the results of a
schedule with the results of a serial schedule. If the results are the same, the schedule is considered to be view
serializable.

C. Recoverability refers to the ability of a system to restore its state in the event of a failure. The recoverability of a
system is directly impacted by the type of schedule that is used.

A serial schedule is considered to be the most recoverable, as there is only one transaction executing at a time, and it is
easy to determine the state of the system at any given point in time.
A parallel schedule is less recoverable than a serial schedule, as it can be more difficult to determine the state of the
system at any given point in time.
A concurrent schedule is the least recoverable, as it can be very difficult to determine the state of the system at any
given point in time.

Q.4 Explain following terms wrt transaction processing

● Concurrency control
● Locking protocols
● Timestamp-based protocols
● Deadlock prevention, detection and recovery strategies?

Concurrency control

Concurrency control concept comes under the Transaction in database management system (DBMS). It is a procedure
in DBMS which helps us for the management of two simultaneous processes to execute without conflicts between each
other, these conflicts occur in multi user systems.
Concurrency can simply be described as executing multiple transactions at a time. It is required to increase time efficiency. If many transactions try to access the same data, inconsistency can arise. Concurrency control is required to maintain data consistency.

For example, if we take ATM machines and do not use concurrency, multiple persons cannot draw money at a time in
different places. This is where we need concurrency.

Advantages

● The advantages of concurrency control are as follows −

● Waiting time will be decreased.

● Response time will decrease.

● Resource utilization will increase.

● System performance & Efficiency is increased.

Locking protocols

Lock-Based Protocols -
It is a mechanism in which a transaction cannot read or write data unless the appropriate lock is acquired. This helps in
eliminating the concurrency problem by locking a particular transaction to a particular user. The lock is a variable that
denotes those operations that can be executed on the particular data item.

The various types of lock include :

● Binary lock: It ensures that the data item can be in either locked or unlocked state
● Shared Lock: A shared lock is also called read only lock because you don’t have permission to update data on
the data item. With this lock data item can be easily shared between different transactions. For example, if two
teams are working on employee payment accounts, they would be able to access it but wouldn’t be able to
modify the data on the payment account.
● Exclusive Lock: With exclusive locks, the data items will not be just read but can also be written
● Simplistic Lock Protocol: this lock protocol allows transactions to get lock on every object at the start of
operation. Transactions are able to unlock the data item after completing the write operations
● Pre-claiming locking: This protocol evaluates the operations and builds a list of the necessary data items which
are required to initiate the execution of the transaction. As soon as the locks are acquired, the execution of
transaction takes place. When the operations are over, then all the locks release.
● Starvation: It is the condition where a transaction has to wait for an indefinite period for acquiring a lock.
● Deadlock: It is the condition when two or more processes are waiting for each other to get a resource released
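A minimal sketch of a lock table that grants shared (S) and exclusive (X) locks according to the usual compatibility rule − S is compatible only with S (the transaction ids and data items are illustrative):

```python
from collections import defaultdict

lock_table = defaultdict(list)   # data_item -> list of (txn_id, mode)

def can_grant(item, txn, mode):
    holders = [(t, m) for t, m in lock_table[item] if t != txn]
    if not holders:
        return True
    # Shared locks are compatible only with other shared locks.
    return mode == "S" and all(m == "S" for _, m in holders)

def request_lock(item, txn, mode):
    if can_grant(item, txn, mode):
        lock_table[item].append((txn, mode))
        return "granted"
    return "wait"                # in a real DBMS the transaction would block here

def release_locks(txn):
    for item in lock_table:
        lock_table[item] = [(t, m) for t, m in lock_table[item] if t != txn]

print(request_lock("X", "T1", "S"))   # granted
print(request_lock("X", "T2", "S"))   # granted: S is compatible with S
print(request_lock("X", "T3", "X"))   # wait: X conflicts with the existing S locks
```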
Timestamp based protocols

Timestamp-based protocols in dbms are used to order the transaction in ascending order of their creation time.
The creation time is the system time or a logical counter.

The transaction which is created first or you can say older transactions are given high priority over new
transactions.

For example, if there are two transactions T1 and T2. T1 enters the system at 008 and T2 enters the system at
009 then T1 is given priority over T2.

1 Basic Timestamp Ordering


When a transaction enters a system timestamp is created for the particular transaction. Suppose if the T1
transaction enters the system then it is assigned a timestamp TS(T1) and after T1 if the T2 transaction enters
the system then T2 is assigned a timestamp TS(T2). According to the Timestamp ordering protocol
TS(T1)<TS(T2) because T1 is an older transaction and T2 is a new transaction created after T1.

The timestamp of the protocol determines the serializability order of the transactions. Timestamp ordering
protocol ensures that any conflicting Read or write operation must follow the timestamp ordering protocols.

Suppose any transaction T tries to perform a Read(X) or Write(X) on item X. In that case, the basic timestamp ordering algorithm compares the timestamp TS(T) of the transaction with the read timestamp R_TS(X) and the write timestamp W_TS(X) of the item, and ensures that the timestamp ordering protocol is not violated.
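A minimal sketch of the basic timestamp-ordering checks, assuming each item keeps R_TS(X) and W_TS(X) and each transaction has a fixed timestamp TS(T):

```python
# per-item timestamps: item -> {"R_TS": ..., "W_TS": ...}
ts = {"X": {"R_TS": 0, "W_TS": 0}}

def read_item(item, TS_T):
    t = ts[item]
    if TS_T < t["W_TS"]:                 # a younger transaction already wrote X
        return "abort"                   # T is rolled back (and restarted with a new TS)
    t["R_TS"] = max(t["R_TS"], TS_T)
    return "read executed"

def write_item(item, TS_T):
    t = ts[item]
    if TS_T < t["R_TS"] or TS_T < t["W_TS"]:   # write would violate timestamp order
        return "abort"
    t["W_TS"] = TS_T
    return "write executed"

print(write_item("X", 9))   # T2 (TS=9) writes X -> write executed
print(read_item("X", 8))    # T1 (TS=8) reads X written by a younger txn -> abort
```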

2 Strict Timestamp Ordering


Strict timestamp ordering is a variation of basic timestamp ordering. It ensures that the schedule is both strict and conflict serializable. In strict timestamp ordering, a transaction T that issues a
Read_item(X) or Write_item(X) such that TS(T) > W_TS(X) has its read or write operation delayed until the
Transaction T‘ that wrote the values of X has committed or aborted.

Deadlock prevention, detection and recovery strategies ?

Deadlock detection and recovery is the process of detecting and resolving deadlocks in an operating system. A
deadlock occurs when two or more processes are blocked, waiting for each other to release the resources they need.
This can lead to a system-wide stall, where no process can make progress.
There are two main approaches to deadlock detection and recovery:
Prevention: The operating system takes steps to prevent deadlocks from occurring by ensuring that the system is
always in a safe state, where deadlocks cannot occur. This is achieved through resource allocation algorithms such as
the Banker’s Algorithm.
Detection and Recovery: If deadlocks do occur, the operating system must detect and resolve them. Deadlock detection
algorithms, such as the Wait-For Graph, are used to identify deadlocks, and recovery algorithms, such as the Rollback
and Abort algorithm, are used to resolve them. The recovery algorithm releases the resources held by one or more
processes, allowing the system to continue to make progress.
Difference Between Prevention and Detection/Recovery: Prevention aims to avoid deadlocks altogether by carefully
managing resource allocation, while detection and recovery aim to identify and resolve deadlocks that have already
occurred.

Deadlock detection and recovery is an important aspect of operating system design and management, as it affects the
stability and performance of the system. The choice of deadlock detection and recovery approach depends on the
specific requirements of the system and the trade-offs between performance, complexity, and risk tolerance. The
operating system must balance these factors to ensure that deadlocks are effectively detected and resolved.
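
As a sketch of the detection side, the wait-for graph mentioned above can be represented as adjacency lists and checked for a cycle with a depth-first search; the transactions and edges in the example are hypothetical.

```python
# Deadlock detection on a wait-for graph: an edge T1 -> T2 means
# "transaction T1 is waiting for a resource held by T2".
def has_deadlock(wait_for):
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in wait_for.get(node, []):
            if nxt in on_stack:          # back edge => cycle => deadlock
                return True
            if nxt not in visited and dfs(nxt):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(t) for t in wait_for if t not in visited)

# Hypothetical example: T1 waits for T2, T2 waits for T3, T3 waits for T1.
print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True
```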

Deadlock Recovery :
A traditional operating system such as Windows doesn’t deal with deadlock recovery as it is a
time and space-consuming process. Real-time operating systems use Deadlock recovery.
1. Killing the process –
Kill all the processes involved in the deadlock, or kill the processes one by one: after killing each process, check for deadlock again and keep repeating until the system recovers from the deadlock. Killing the processes breaks the circular-wait condition.

2. Resource Preemption –
Resources are preempted from the processes involved in the deadlock, and the preempted resources are allocated to other processes so that there is a possibility of recovering the system from the deadlock. In this case, the preempted processes may suffer from starvation.

3. Concurrency Control –
Concurrency control mechanisms are used to prevent data inconsistencies in systems with multiple concurrent processes. These mechanisms ensure that concurrent processes do not access the same data at the same time, which can lead to inconsistencies and errors. Deadlocks can occur in concurrent systems when two or more processes are blocked, waiting for each other to release the resources they need; this can result in a system-wide stall, where no process can make progress. Concurrency control mechanisms can help prevent deadlocks by managing access to shared resources and ensuring that concurrent processes do not interfere with each other.

Q.5 ARIES recovery algorithm in detail ?

Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is based on the Write Ahead Log (WAL) protocol.
Every update operation writes a log record which is one of the following :

● Undo-only log record:


Only the before image is logged. Thus, an undo operation can be done to retrieve the old data.
● Redo-only log record:
Only the after image is logged. Thus, a redo operation can be attempted.
● Undo-redo log record:
Both before images and after images are logged.

In ARIES, every log record is assigned a unique and monotonically increasing log sequence number (LSN). Every data page
has a page LSN field that is set to the LSN of the log record corresponding to the last update on the page. WAL requires
that the log record corresponding to an update make it to stable storage before the data page corresponding to that
update is written to disk. For performance reasons, each log write is not immediately forced to disk. A log tail is
maintained in main memory to buffer log writes. The log tail is flushed to disk when it gets full. A transaction cannot be
declared committed until the commit log record makes it to disk.

Once in a while the recovery subsystem writes a checkpoint record to the log. The checkpoint record contains the
transaction table and the dirty page table. A master log record is maintained separately, in stable storage, to store the
LSN of the latest checkpoint record that made it to disk. On restart, the recovery subsystem reads the master log record
to find the checkpoint’s LSN, reads the checkpoint record, and starts recovery from there on.

The recovery process actually consists of 3 phases:

● Analysis:
The recovery subsystem determines the earliest log record from which the next pass must start. It also scans
the log forward from the checkpoint record to construct a snapshot of what the system looked like at the instant
of the crash.
● Redo:
Starting at the earliest LSN, the log is read forward and each update redone.
● Undo:
The log is scanned backward and updates corresponding to loser transactions are undone.
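
A highly simplified sketch of the redo pass is shown below, using plain Python dictionaries in place of the real ARIES data structures; the variable names (the dirty page table of recLSNs, the page_lsn map, and the redo_action callables) are assumptions made for illustration.

```python
# Simplified ARIES redo pass: reapply updates whose effects may not be on disk.
def redo_pass(log, dirty_page_table, page_lsn, pages):
    """log: ordered list of (lsn, page_id, redo_action); dirty_page_table maps
    page_id -> recLSN; page_lsn holds the pageLSN currently stored on each page."""
    if not dirty_page_table:
        return
    redo_lsn = min(dirty_page_table.values())              # where the redo scan starts
    for lsn, page_id, redo_action in log:
        if lsn < redo_lsn:
            continue                                       # before the scan start point
        if page_id not in dirty_page_table:
            continue                                       # page was flushed before the crash
        if lsn < dirty_page_table[page_id]:
            continue                                       # earlier than the page's recLSN
        if page_lsn.get(page_id, 0) >= lsn:
            continue                                       # update already reflected on the page
        pages[page_id] = redo_action(pages.get(page_id))   # reapply the update
        page_lsn[page_id] = lsn                            # advance the pageLSN
```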

Q.6 Explain Shadow Paging ?

Shadow paging is a recovery technique used to recover a database. In this technique, the database is considered to be made up of fixed-size logical units of storage called pages. Pages are mapped to physical blocks of storage with the help of a page table, which has one entry for each logical page of the database. This method uses two page tables, named the current page table and the shadow page table. The entries in the current page table point to the most recent database pages on disk. The shadow page table is created when the transaction starts, by copying the current page table; the shadow page table is then saved on disk, and the current page table is used for the transaction. Entries in the current page table may be changed during execution, but the shadow page table is never changed. When the transaction commits, the current page table is made the new shadow page table, so the two become identical again. This technique is an example of out-of-place updating.
To understand the concept, consider an example in which two write operations are performed, on page 3 and page 5. Before the write operation on page 3 starts, the current page table points to the old page 3. When the write operation starts, the following steps are performed:

● First, the disk blocks are searched for an available free block.
● After a free block is found, page 3 is copied to it; this copy is denoted Page 3 (New).
● The current page table now points to Page 3 (New) on disk, but the shadow page table still points to the old page 3, because it is never modified.
● The changes are now made to Page 3 (New), which is pointed to by the current page table.

COMMIT Operation : To commit a transaction, the following steps are performed (a toy sketch follows these steps) :

● All the modifications made by the transaction that are still in buffers are transferred to the physical database.
● The current page table is output to disk.
● The disk address of the current page table is written to the fixed location in stable storage that holds the address of the shadow page table. This overwrites the address of the old shadow page table, so the current page table becomes the new shadow page table and the transaction is committed.
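
The following toy sketch models the same steps with Python dictionaries standing in for the disk, the current page table, and the shadow page table; the block names and page contents are invented for illustration.

```python
# Toy model of shadow paging; block ids and contents are hypothetical.
disk = {"b1": "old page 3", "b2": "old page 5"}
shadow_table = {3: "b1", 5: "b2"}            # saved at transaction start, never modified
current_table = dict(shadow_table)           # used and updated by the transaction

# Write to page 3: copy it into a free block and repoint only the current table.
disk["b3"] = "new page 3"
current_table[3] = "b3"

# Commit: the single atomic step is overwriting the root pointer, so that the
# current page table becomes the new shadow page table.
root_page_table = current_table
old_blocks_to_free = [shadow_table[3]]       # "b1" must be returned to the free list
print(root_page_table, old_blocks_to_free)
```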

Advantages :

● This method requires fewer disk accesses to perform an operation.
● Recovery from a crash is inexpensive and quite fast.
● There is no need for operations like Undo and Redo.
● Recovery using this method is faster.

Disadvantages :

● Because updated pages change location on disk, it is quite difficult to keep related database pages close together on disk.
● During the commit operation, the old page versions pointed to by the shadow page table have to be returned to the collection of free blocks; otherwise they are never reclaimed, so garbage collection is required.
● The commit of a single transaction requires multiple block outputs, which decreases execution speed.
● It is difficult to extend this technique to allow multiple transactions to execute concurrently.
● Data fragmentation: the updated data suffers from fragmentation, because related data is divided into pages that may not be stored in linear order on disk; hence, complex storage-management strategies are needed.

Q.7 Explain recovery concept ?

● Write ahead logging


● In place vs shadow updates
● Rollbacks
● Deferred updates
● Immediate updates

In Advanced Database Management Systems (ADBMS), recovery refers to the process of restoring a database to a
consistent state after a failure or an abnormal termination. Here are explanations for the recovery concepts you
mentioned:

1. **Write-Ahead Logging:**
Write-Ahead Logging (WAL) is a protocol used to ensure database durability and consistency. In WAL, changes (such
as modifications or additions) to the database are first recorded in a log before being applied to the actual database.
This means that before any data is written to the database, a log entry is made. This protocol guarantees that changes
are logged before the corresponding data is updated in the database itself, ensuring that in the event of a system crash,
the system can recover by using the log to redo or undo changes.
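
A minimal sketch of the write-ahead rule follows, with simple lists standing in for the log buffer, the stable log, and the data pages on disk; the helper names are assumptions. The only point it makes is the ordering: the log record is forced to stable storage before the data page is written.

```python
# Write-ahead logging: the log record for an update must reach stable
# storage before the corresponding data page is written to disk.
log_buffer, stable_log, stable_pages = [], [], {}

def update(page_id, old_value, new_value):
    log_buffer.append((page_id, old_value, new_value))  # 1. create the log record
    stable_log.extend(log_buffer)                       # 2. force the log to disk
    log_buffer.clear()
    stable_pages[page_id] = new_value                   # 3. only then write the page

update("P7", old_value="x=1", new_value="x=2")
print(stable_log, stable_pages)
```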

2. **In-place vs. Shadow Updates:**


- *In-place Updates:* This method updates the actual database records in their original location. In-place updates
modify the existing data directly, replacing or altering the old values with the new ones.
- *Shadow Updates:* In contrast, shadow updates involve creating and maintaining separate shadow or temporary
copies of the data. Changes are made to the shadow copies, leaving the original data intact until the changes are
committed and applied. Once the changes are verified, the original data is updated with the shadow data.

3. **Rollbacks:**
Rollbacks refer to the process of undoing a set of transactions or changes that have not been committed yet. When a
transaction is rolled back, all its changes are reverted, and the database returns to the state it was in before the
transaction began. This is essential in maintaining the integrity of the database, especially when a transaction fails or
needs to be canceled.

4. **Deferred Updates:**
Deferred updates refer to delaying the application of modifications until a certain point in the transaction. In this
approach, changes made by a transaction are recorded in a separate space or buffer and are only applied to the actual
database when the transaction is committed successfully. This ensures that changes are not visible to other
transactions until the entire transaction is completed.
5. **Immediate Updates:**
Immediate updates, in contrast to deferred updates, directly apply changes to the database as the transaction
progresses. This means that changes become immediately visible and permanent once the database system confirms
the success of each operation within the transaction.

In ADBMS, these recovery concepts and methods are crucial for maintaining the database's consistency, durability, and
integrity, particularly in the event of system failures, crashes, or other unforeseen events. These mechanisms ensure
that data remains in a valid and reliable state despite such occurrences.

Module 4 : Advanced data models


____________________________________________________
Q1 Explain concepts related to temporal databases

Temporal databases are databases that are designed to store and manage data with a temporal aspect, which means they can record and query data at different points in time. Here's an explanation of various temporal database concepts in the context of an Advanced Database Management System (ADBMS):

Tuple Versioning:
Tuple versioning in an ADBMS refers to maintaining multiple versions of the same tuple over time. Each time a tuple is
updated, a new version is created, allowing users to track changes and access historical data. This is particularly useful
for auditing, historical analysis, and regulatory compliance.

Valid Time Databases:


Valid time databases store information about when a fact is true in the real world. It associates a time period with each
tuple, indicating the period during which the data is valid. Queries in valid time databases retrieve data that was true
during a specific time interval, making them suitable for historical data analysis.

Transaction Time Databases:


Transaction time databases record the time when a particular data modification occurred in the database. This
information is essential for auditing and tracking changes made by different transactions. Transaction time databases
help in identifying who made specific changes and when.

Bitemporal Databases:
Bitemporal databases combine both valid time and transaction time information. They track when data was valid in the
real world (valid time) and when it was recorded or modified in the database (transaction time). Bitemporal databases
provide a comprehensive view of data changes and are often used in applications where data correctness and historical
tracking are critical, such as financial systems.

Attribute Versioning:
Attribute versioning is an approach where specific attributes of a tuple can have multiple versions independently of each
other. This means that some attributes may change over time while others remain constant. Attribute versioning is
valuable in scenarios where only certain aspects of an entity evolve or need to be tracked.

Q.2 EXPLAIN TEMPORAL DATABASES?

A temporal database is a database that incorporates some aspect of time in the organization of its information. In a temporal database, each tuple in a relation is associated with time. It stores information about the states of the real world together with time. A conventional database stores only information about the current state: whenever the state of the database changes, the information in the database is simply updated. In many fields, however, it is necessary to store information about past states as well. For example, a stock database must store past stock prices for analysis. Without temporal support, such historical information has to be modelled manually in the schema.

There are various terminologies in the temporal database:

Valid Time: The valid time is a time in which the facts are true with respect to the real world.
Transaction Time: The transaction time of the database is the time at which the fact is currently present in the database.
Decision Time: Decision time in the temporal database is the time at which the decision is made about the fact.
Temporal databases are usually built on top of relational databases. However, relational databases have some limitations for temporal data: they do not provide built-in support for the more complex operations involved, and standard query facilities offer poor support for temporal queries (a small sketch follows).
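
A small sketch using Python's sqlite3 module shows how a valid-time relation can be queried "as of" a date; the emp_salary table, its columns, and the values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row carries an explicit valid-time interval [valid_from, valid_to).
conn.execute("""CREATE TABLE emp_salary (
                    emp_id INTEGER, salary INTEGER,
                    valid_from TEXT, valid_to TEXT)""")
conn.executemany("INSERT INTO emp_salary VALUES (?, ?, ?, ?)",
                 [(1, 40000, "2020-01-01", "2022-01-01"),
                  (1, 50000, "2022-01-01", "9999-12-31")])

# "What was employee 1's salary as of 2021-06-01?"
row = conn.execute("""SELECT salary FROM emp_salary
                      WHERE emp_id = ? AND valid_from <= ? AND ? < valid_to""",
                   (1, "2021-06-01", "2021-06-01")).fetchone()
print(row)   # (40000,)
```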

Applications of Temporal Databases


Finance: maintaining stock price histories.
Factory monitoring: storing information about current and past readings of sensors in a factory.
Healthcare: patient histories must be maintained in order to give the right treatment.
Banking: maintaining the credit histories of users.

Types of Temporal Relation


There are mainly three types of temporal relations:
1. Uni-Temporal Relation: The relation which is associated with valid or transaction time is called Uni-Temporal relation.
It is related to only one time.

2. Bi-Temporal Relation: The relation which is associated with both valid time and transaction time is called a
Bi-Temporal relation. Valid time has two parts namely start time and end time, similar in the case of transaction time.

3. Tri-Temporal Relation: The relation which is associated with three aspects of time namely Valid time, Transaction
time, and Decision time called as Tri-Temporal relation.

Features of Temporal Databases


The temporal database provides built-in support for the time dimension.
A temporal database stores data related to time aspects.
A temporal database contains historical data in addition to current data.
It provides a uniform way to deal with historical data.

Q.3EXPLAIN SPATIAL DATABASES?

Spatial data support in database is important for efficiently storing, indexing and querying of data on the basis of spatial
location. For example, suppose that we want to store a set of polygons in a database and to query the database to find
all polygons that intersect a given polygon. We cannot use standard index structures, such as B-trees or hash indices,
to answer such a query efficiently. Efficient processing of the above query would require special-purpose index
structures, such as R-trees for the task.
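
In place of a real R-tree, the short sketch below shows the filter step such an index supports: comparing axis-aligned bounding boxes to discard polygons that cannot intersect the query polygon. The rectangles are hypothetical, and a real system would follow this with an exact geometric test.

```python
# Filter step of spatial query processing: keep only objects whose
# bounding boxes overlap the query box; exact geometry is tested later.
def boxes_overlap(a, b):
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = a, b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

polygons = {"park": (0, 0, 4, 4), "lake": (10, 10, 14, 13)}   # id -> bounding box
query_box = (3, 3, 6, 6)
candidates = [pid for pid, box in polygons.items() if boxes_overlap(box, query_box)]
print(candidates)   # ['park']
```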

Two types of Spatial data are particularly important:

Computer-aided design (CAD)data, which include spatial information about how objects-such as building, cars, or
aircraft-are constructed. Other important examples of computer-aided-design databases are integrated-circuit and
electronic-device layouts.

CAD systems traditionally stored data in memory during editing or other processing, and wrote the data back to a file at the end of an editing session. The drawbacks of such a scheme include the cost (programming complexity as well as time) of transforming data from one form to another, and the need to read in an entire file even if only parts of it are required. For a large design, such as that of an entire airplane, it may be impossible to hold the complete design in memory. Designers of object-oriented databases were motivated in large part by the database requirements of CAD systems. Object-oriented databases represent components of the design as objects, and the connections between the objects indicate how the design is structured.

Geographic data include road maps, land-usage maps, topographic elevation maps, political maps showing boundaries, land-ownership maps, and so on. Geographic information systems are special-purpose databases for storing geographic data. Geographic data differ from design data in certain ways. Maps and satellite images are typical examples of geographic data. Maps may provide not only location information but also attributes associated with locations, such as elevation, soil type, land use, and annual rainfall.

Types of geographical data :

Raster data
Vector data
1. Raster data: Raster data consist of pixels (also known as grid cells) in two or more dimensions. Examples: satellite images, digital pictures, and scanned maps.

2. Vector data: Vector data consist of points, lines, triangles, and other geometrical objects in two dimensions, and cylinders, cuboids, and other polyhedrons in three dimensions. Examples: building boundaries and roads.

Applications of Spatial databases in DBMS :

Microsoft SQL Server: Spatial data has been supported since the 2008 version of Microsoft SQL Server.
CouchDB: A document-based database in which spatial data support is enabled by a plugin called GeoCouch.
Neo4j database.

Q.5)WHAT IS SPATIAL DATATYPE AND MODEL?


Spatial data is data associated with physical, real-life locations such as towns, cities, and islands. Spatial data is basically of three different types and is widely used in commercial sectors:

Map data : Map data includes different types of spatial features of objects in map, e.g – an object’s shape and location
of object within map. The three basic types of features are points, lines, and polygons (or areas).
Points – Points are used to represent spatial characteristics of objects whose locations correspond to single 2-D
coordinates (x, y, or longitude/latitude) in the scale of particular application. For examples : Buildings, cellular towers, or
stationary vehicles. Moving vehicles and other moving objects can be represented by sequence of point locations that
change over time.
Lines – Lines represent objects having length, such as roads or rivers, whose spatial characteristics can be
approximated by sequence of connected lines.
Polygons – Polygons are used to represent characteristics of objects that have boundary, like states, lakes, or countries.
Attribute data : It is the descriptive data that Geographic Information Systems associate with the features in the map. For example, in a map representing the districts within an Indian state (e.g., Odisha or Maharashtra), attributes might include population, largest city/town, area in square miles, and so on.
Image data : It includes camera created data like satellite images and aerial photographs. Objects of interest, such as
buildings and roads, can be identified and overlaid on these images. Aerial and satellite images are typical examples of
raster data.
Models of Spatial Information : It is divided into two categories :

Field : These models are used to model spatial data that is continuous in nature, e.g. terrain elevation, air quality index,
temperature data, and soil variation characteristics.
Object : These models have been used for applications such as transportation networks, land parcels, buildings, and
other objects that possess both spatial and non-spatial attributes. A spatial application is modeled using either field or
an object based model, which depends on the requirements and the traditional choice of model for the application.

Q.6)
Multimedia databases deal with the storage, retrieval, and management of multimedia data, which can include text,
images, audio, video, and other types of multimedia content. Let's discuss how SQLite and peer-to-peer mobile
databases relate to multimedia databases:

1) SQLite:
SQLite is a widely used relational database management system (RDBMS) that is particularly suitable for embedded
systems, mobile devices, and applications that require a lightweight and self-contained database. It can also be used in
multimedia database applications. Here's how SQLite is relevant to multimedia databases:

- Storage of Metadata: SQLite can be used to store metadata about multimedia files. For example, in a multimedia
database, you can create tables to store information about each multimedia object, such as the title, author, date, file
location, and descriptions. SQLite's structured storage allows for efficient organization and retrieval of metadata.

- Indexing and Search: SQLite supports indexing, which can significantly enhance the performance of multimedia
database queries. You can create indexes on specific attributes (e.g., title, author) to speed up search operations,
allowing users to find multimedia content quickly.

- Query and Retrieval: SQLite provides a powerful SQL query language for searching, filtering, and retrieving
multimedia content based on various criteria. You can use SQL statements to retrieve multimedia files matching certain
conditions, making it suitable for building multimedia database applications.
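
As a minimal illustration of the metadata storage, indexing, and query points above, the sketch below uses Python's built-in sqlite3 module; the table layout, file paths, and values are assumptions.

```python
import sqlite3

conn = sqlite3.connect("media.db")
# Metadata table describing each multimedia object.
conn.execute("""CREATE TABLE IF NOT EXISTS media (
                    id INTEGER PRIMARY KEY,
                    title TEXT, author TEXT, kind TEXT,
                    file_path TEXT, created TEXT)""")
# Index on author to speed up search by that attribute.
conn.execute("CREATE INDEX IF NOT EXISTS idx_media_author ON media(author)")

conn.execute("INSERT INTO media (title, author, kind, file_path, created) "
             "VALUES (?, ?, ?, ?, ?)",
             ("Campus tour", "A. Rao", "video", "/media/tour.mp4", "2023-05-01"))
conn.commit()

# Retrieve all videos by a given author; the query is served by the index above.
for row in conn.execute("SELECT title, file_path FROM media "
                        "WHERE author = ? AND kind = ?", ("A. Rao", "video")):
    print(row)
```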
2) Peer-to-Peer Mobile Database:
A peer-to-peer (P2P) mobile database is a database system designed to operate in a decentralized, peer-to-peer
network, where mobile devices communicate and share data directly with each other. In the context of multimedia
databases, a peer-to-peer mobile database can offer unique advantages:

- Distributed Multimedia Sharing: Peer-to-peer mobile databases can enable users to share multimedia content
directly from their mobile devices to others in the network. For example, users can share photos, videos, or audio files
without relying on a central server.

- Offline Collaboration: In scenarios where network connectivity may be intermittent or unavailable, P2P mobile
databases can allow users to collaborate and share multimedia content even when not connected to the internet. Data
synchronization can occur when devices come into contact with each other.

- Decentralized Data Management: In multimedia databases, users may have multimedia content on their mobile
devices that they want to share or collaborate on. A P2P mobile database allows for distributed and decentralized data
management, ensuring that each user's device acts as a node in the network.

- Redundancy and Resilience: P2P mobile databases can provide redundancy and resilience for multimedia content. If
one device goes offline, others can still access and share the content, making it useful for multimedia applications
requiring data availability and fault tolerance.

Overall, both SQLite and peer-to-peer mobile databases have their respective roles in multimedia database
applications. SQLite is suitable for structured storage and query capabilities, while peer-to-peer mobile databases offer
decentralized, collaborative, and offline-capable solutions for sharing and managing multimedia content among mobile
devices.

Q.7)EXPLAIN MOBILE DATABASES?


A Mobile database is a database that can be connected to a mobile computing device over a mobile network (or
wireless network). Here the client and the server have wireless connections. In today’s world, mobile computing is
growing very rapidly, and it is huge potential in the field of the database. It will be applicable on different-different
devices like android based mobile databases, iOS based mobile databases, etc. Common examples of databases are
Couch base Lite, Object Box, etc.

Features of Mobile database :


Here, we will discuss the features of the mobile database as follows.

A cache is maintained to hold frequent data and transactions so that they are not lost due to connection failure.
With the increasing use of laptops, mobile phones, and PDAs, databases increasingly need to reside on mobile systems.
Mobile databases are physically separate from the central database server.
Mobile databases reside on mobile devices.
Mobile databases are capable of communicating with a central database server or with other mobile clients from remote sites.
With the help of a mobile database, mobile users are able to keep working even with poor or non-existent wireless connections (disconnected operation).
A mobile database is used to analyze and manipulate data on mobile devices.
Mobile Database typically involves three parties :
Fixed Hosts –
It performs the transactions and data management functions with the help of database servers.

Mobiles Units –
These are portable computers that move around a geographical region that includes the cellular network that these
units use to communicate to base stations.

Base Stations –
These are two-way radio installations in fixed locations that relay communications between the mobile units and the fixed hosts.
Limitations :
Here, we will discuss the limitation of mobile databases as follows.

It has limited wireless bandwidth.
Wireless communication speeds are slow.
Mobile devices have limited battery power.
It is less secure.
It is hard to make theft-proof.

Module 5 : Distributed Databases


____________________________________________________

Q.EXPLAIN ADVANTAGES OF DISTRIBUTED DATA BASES?


Distributed databases basically provide us the advantages of distributed computing to the database management
domain. Basically, we can define a Distributed database as a collection of multiple interrelated databases distributed
over a computer network and a distributed database management system as a software system that basically manages
a distributed database while making the distribution transparent to the user.

Distributed database management has been proposed for various reasons, ranging from organizational decentralization and economical processing to greater autonomy. Some of these advantages are as follows:

1. Management of data with different level of transparency –


Ideally, a database should be distribution transparent in the sense of hiding the details of where each file is physically
stored within the system. The following types of transparencies are basically possible in the distributed database
system:

Network transparency:
This refers to freeing the user from the operational details of the network. It is of two types: location transparency and naming transparency.
Replication transparency:
This makes the user unaware of the existence of copies; copies of data may be stored at multiple sites for better availability, performance, and reliability.
Fragmentation transparency:
This makes the user unaware of the existence of fragments, whether vertical or horizontal.
2. Increased Reliability and Availability –
Reliability is defined as the probability that a system is running at a certain time, whereas availability is defined as the probability that the system is continuously available during a time interval. When the data and DBMS software are distributed over several sites, one site may fail while the other sites continue to operate; only the data at the failed site becomes inaccessible. This leads to an improvement in reliability and availability.

3. Easier Expansion –
In a distributed environment expansion of the system in terms of adding more data, increasing database sizes or adding
more processor is much easier.

4. Improved Performance –
We can achieve inter-query and intra-query parallelism by executing multiple queries at different sites, or by breaking up a query into a number of subqueries that execute in parallel, which leads to an improvement in performance.
Improved scalability: Distributed databases can be scaled horizontally by adding more nodes to the network. This allows
for increased capacity and performance as data and user demand grow.
Increased availability: Distributed databases can provide increased availability and uptime by distributing the data
across multiple nodes. If one node goes down, the data can still be accessed from other nodes in the network.
Increased flexibility: Distributed databases can be more flexible than centralized databases, allowing data to be stored
in a way that best suits the needs of the application or user.
Improved fault tolerance: Distributed databases can be designed with redundancy and failover mechanisms that allow
the system to continue operating in the event of a node failure.
Improved security: Distributed databases can be more secure than centralized databases by implementing security
measures at the network, node, and application levels.

Q.2)TYPES OF DISTRIBUTED DATABASES?

Types of Distributed Databases


Distributed databases can be broadly classified into homogeneous and heterogeneous distributed database
environments, each with further sub-divisions, as shown in the following illustration.
Distributed Database Environments
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its properties are −

● The sites use very similar software.



● The sites use identical DBMS or DBMS from the same vendor.

● Each site is aware of all other sites and cooperates with other sites to process user requests.

● The database is accessed through a single interface as if it is a single database.

Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −

1)Autonomous − Each database is independent that functions on its own. They are integrated by a controlling
application and use message passing to share data updates.

2)Non-autonomous − Data is distributed across the homogeneous nodes and a central or master DBMS co-ordinates
data updates across the sites.

Heterogeneous Distributed Databases


In a heterogeneous distributed database, different sites have different operating systems, DBMS products and data
models. Its properties are −

1)Different sites use dissimilar schemas and software.

2)The system may be composed of a variety of DBMSs like relational, network, hierarchical or object oriented.

3)Query processing is complex due to dissimilar schemas.

4)Transaction processing is complex due to dissimilar software.


5)A site may not be aware of other sites and so there is limited co-operation in processing user requests.

Types of Heterogeneous Distributed Databases


1)Federated − The heterogeneous database systems are independent in nature and integrated together so that they
function as a single database system.

2)Un-federated − The database systems employ a central coordinating module through which the databases are
accessed.

Q.3)EXPLAIN DISTRIBUTED DATABASE ARCHITECTURE?

A Distributed Database System is a kind of database that is present or divided in more than one location, which means
it is not limited to any single computer system. It is divided over the network of various systems. The Distributed
Database System is physically present on the different systems in different locations. This can be necessary when
different users from all over the world need to access a specific database. For a user, it should be handled in such a
way that it seems like a single database.

Client−Server Architecture
A common method for spreading database functionality is the client−server architecture. Clients communicate with a
central server, which controls the distributed database system, in this design. The server is in charge of maintaining
data storage, controlling access, and organizing transactions. This architecture has several clients and servers
connected. A client sends a query and the server which is available at the earliest would help solve it. This Architecture
is simple to execute because of the centralised server system.

Peer−to−Peer Architecture
Each node in the distributed database system may function as both a client and a server in a peer−to−peer architecture.
Each node is linked to the others and works together to process and store data. Each node is in charge of managing its
data management and organizing node−to−node interactions. Because the loss of a single node does not cause the
system to collapse, peer−to−peer systems provide decentralized control and high fault tolerance. This design is ideal for
distributed systems with nodes that can function independently and with equal capabilities.
Federated Architecture
Multiple independent databases with various types are combined into a single meta−database using a federated
database design. It offers a uniform interface for navigating and exploring distributed data. In the federated design, each
site maintains a separate, independent database, while the virtual database manager internally distributes requests.
When working with several data sources or legacy systems that can't be simply updated, federated architectures are
helpful.

Shared-Nothing Architecture
Data is divided up and spread among several nodes in a shared−nothing architecture, with each node in charge of a
particular portion of the data. Resources are not shared across nodes, and each node runs independently. Due to the
system's capacity to add additional nodes as needed without affecting the current nodes, this design offers great
scalability and fault tolerance. Large−scale distributed systems, such as data warehouses or big data analytics
platforms, frequently employ shared−nothing designs.
Q.4)EXPLAIN WITH RESPECT TO DISTRIBUTED DATABASES

1)
Distributed databases (DDBs) are databases that are distributed across multiple interconnected computers or nodes,
and they offer several advantages, such as improved data availability and fault tolerance. However, they also introduce
a set of unique design and operational challenges. Let's discuss the key issues related to distributed databases:

DDB Design Issues:


Designing a distributed database involves making several critical decisions, such as data distribution, partitioning, and
replication strategies. Here are some design issues to consider:
a) Data Distribution: How the data should be distributed across multiple nodes to ensure balanced workloads and
minimize data transfer overhead.
b) Data Partitioning: How to divide the database into smaller subsets (partitions) to enable efficient parallel processing.
c) Replication: Deciding which data should be replicated across multiple nodes for redundancy and fault tolerance.
d) Data Integrity: Ensuring data consistency, integrity, and security in a distributed environment.
e) Scalability: Designing the database to be able to handle increasing data volumes and user loads.

Fragmentation, Replication, and Allocation:


a) Fragmentation: Data fragmentation involves dividing a database into smaller, manageable pieces or fragments. There
are horizontal, vertical, and hybrid fragmentation strategies. Horizontal fragmentation divides tables into rows, while
vertical fragmentation divides tables into columns. Fragmentation helps distribute data across multiple nodes efficiently.
b) Replication: Replication involves maintaining multiple copies of the same data on different nodes. Replication
improves fault tolerance, data availability, and load balancing.
c) Allocation: Allocation refers to the process of deciding which fragments or replicas should be stored on which nodes.
Proper allocation ensures that data is evenly distributed and easily accessible.
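
A toy sketch of horizontal fragmentation and allocation follows: rows of a single table are split by a predicate on a region attribute, and each fragment is assigned to a site. The table, regions, and site names are hypothetical.

```python
# Horizontal fragmentation: split the rows of one table by a predicate,
# then allocate each fragment to a site.
customers = [
    {"id": 1, "name": "Asha",  "region": "WEST"},
    {"id": 2, "name": "Bilal", "region": "EAST"},
    {"id": 3, "name": "Chen",  "region": "WEST"},
]

fragments = {"WEST": [], "EAST": []}
for row in customers:
    fragments[row["region"]].append(row)      # fragmentation by region predicate

allocation = {"WEST": "site_mumbai", "EAST": "site_kolkata"}   # fragment -> site
for region, rows in fragments.items():
    print(allocation[region], rows)
```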

Query Processing:
In distributed databases, query processing involves the distribution of queries across multiple nodes and the
coordination of their results. Key considerations include:
a) Query Optimization: Optimizing query execution plans to minimize data transfer and maximize parallel processing.
b) Distributed Query Execution: Coordinating the execution of subqueries on different nodes and merging the results.
c) Data Localization: Exploiting data locality to reduce data transfer costs.

Transaction Management:
Transaction management in distributed databases ensures that multiple operations on the database maintain the ACID
(Atomicity, Consistency, Isolation, Durability) properties. Key aspects include:
a) Two-Phase Commit (2PC): Ensuring that distributed transactions are committed or aborted consistently across all
participating nodes.
b) Distributed Lock Management: Coordinating access to shared resources while maintaining isolation and avoiding
deadlocks.
c) Recovery and Logging: Managing logs to recover the database to a consistent state in case of failures.
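
A minimal sketch of the two-phase commit decision logic, from the coordinator's point of view, is given below; participants are modelled as objects with hypothetical prepare/commit/abort methods, and failure handling (timeouts, logging of the decision) is omitted.

```python
class Participant:
    """Hypothetical participant site; 'vote' is whether it can commit."""
    def __init__(self, name, vote):
        self.name, self.vote = name, vote
    def prepare(self):
        return self.vote            # phase 1: vote "yes" (True) or "no" (False)
    def commit(self):
        print(self.name, "commit")
    def abort(self):
        print(self.name, "abort")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:
            p.commit()                            # phase 2: global commit
        return "commit"
    for p in participants:
        p.abort()                                 # phase 2: global abort
    return "abort"

print(two_phase_commit([Participant("site1", True), Participant("site2", False)]))
# site2 voted "no", so every site aborts -> "abort"
```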

Concurrency Control and Recovery:


a) Concurrency Control: Managing concurrent access to data to avoid conflicts and maintain data consistency.
Techniques like locking, timestamps, and optimistic concurrency control are used.
b) Recovery: Ensuring that the database can recover from failures, such as node crashes or network problems, without
losing data or violating the ACID properties. Recovery protocols like ARIES are often employed.

Module 6 : Access Control and Data Security


____________________________________________________
1. Explain Database security mechanisms in detail
Database security means keeping sensitive information safe and preventing the loss of data. Security of the database is controlled by the Database Administrator (DBA).

The following are the main control measures used to provide security of data in databases:

1. Authentication
2. Access control
3. Inference control
4. Flow control
5. Database Security applying Statistical Method
6. Encryption
These are explained below.

Authentication :
Authentication is the process of confirming that a user logs in only according to the rights provided to him to perform the activities of the database. A particular user can log in only up to his privilege level and cannot access other sensitive data. The privilege of accessing sensitive data is restricted by using authentication.
By using authentication tools based on biometrics, such as retina and fingerprint scans, the database can be protected from unauthorized or malicious users.
Access Control :
The security mechanism of a DBMS must include some provisions for restricting access to the database by unauthorized users. Access control is done by creating user accounts and controlling the login process in the DBMS, so that access to sensitive data is possible only for those people (database users) who are allowed to access such data, while unauthorized persons are kept out.
The database system must also keep track of all operations performed by a given user throughout the entire login session.
Inference Control :
This method is the countermeasure to the statistical database security problem. It is used to prevent the user from completing any inference channel, and it protects sensitive information from indirect disclosure.
Inferences are of two types: identity disclosure and attribute disclosure.
Flow Control :
This prevents information from flowing in a way that lets it reach unauthorized users. Pathways that allow information to flow implicitly, in ways that violate the privacy policy of a company, are called covert channels.
Database Security applying Statistical Methods :
Statistical database security focuses on protecting confidential individual values that are stored and used for statistical purposes, while allowing the retrieval of summaries of values based on categories. Retrieval of individual information is not permitted.
For example, a user may access the database to get statistical information about the number of employees in the company, but not the detailed confidential or personal information about a specific individual employee.
Encryption :
This method is mainly used to protect sensitive data (such as credit card numbers and OTPs). The data is encoded using encryption algorithms.
An unauthorized user who tries to access this encoded data will find it very difficult to decode, while authorized users are given decryption keys to decode the data. A short sketch follows below.
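
The sketch encrypts a sensitive value with symmetric encryption using the third-party cryptography package's Fernet interface; key management (how the key is stored and distributed) is outside the scope of this sketch.

```python
# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice, kept in a key-management system
cipher = Fernet(key)

card_number = b"4111-1111-1111-1111"
token = cipher.encrypt(card_number)  # store this ciphertext in the database
print(token)
print(cipher.decrypt(token))         # authorized users holding the key can decode it
```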

2. Explain Discretionary Access control (DAC)

Discretionary access control (DAC) is a type of security access control that grants or restricts object access
via an access policy determined by an object’s owner group and/or subjects. DAC mechanism controls are
defined by user identification with supplied credentials during authentication, such as username and
password. DACs are discretionary because the subject (owner) can transfer authenticated objects or
information access to other users. In other words, the owner determines object access privileges.

In DAC, each system object (file or data object) has an owner, and each initial object owner is the subject that

causes its creation. Thus, an object’s access policy is determined by its owner.
A typical example of DAC is the Unix file mode, which defines read, write and execute permissions using three bits each for the owning user, the group, and others; a small illustration follows below.
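
In that example, the owner of a file can set its permission bits at their own discretion; the file name below is hypothetical.

```python
import os
import stat

open("report.txt", "w").close()          # create a file owned by the current user

# The owner grants themselves read/write, and read-only access to group and others.
os.chmod("report.txt",
         stat.S_IRUSR | stat.S_IWUSR |   # owner: rw-
         stat.S_IRGRP |                  # group: r--
         stat.S_IROTH)                   # others: r--
print(oct(os.stat("report.txt").st_mode & 0o777))   # 0o644 on Unix-like systems
```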

DAC attributes include:

● User may transfer object ownership to another user(s).

● User may determine the access type of other users.

● After several attempts, authorization failures restrict user access.

● Unauthorized users are blind to object characteristics, such as file size, file name and directory path.

● Object access is determined during access control list (ACL) authorization and based on user

identification and/or group membership.

DAC is easy to implement and intuitive but has certain disadvantages, including:

● Inherent vulnerabilities (Trojan horse)

● ACL maintenance or capability

● Grant and revoke permissions maintenance

● Limited negative authorization power

3. Explain W.R.T. Access control and data security

**a. Mandatory Access Control (MAC):**


Mandatory Access Control (MAC) is a security model that plays a crucial
role in access control and data security. In this model, access permissions
are determined by a central authority based on predefined security labels
and the sensitivity of data. MAC enforces strict access controls, ensuring
that only authorized entities can access specific resources or
information. This model is particularly effective in environments where
there are stringent confidentiality requirements. By adhering to
predetermined security policies, MAC contributes significantly to
preventing unauthorized access and maintaining data security.

**b. Role-based Access Control (RBAC):**


Role-based Access Control (RBAC) is a security model that simplifies
access management and enhances data security. In RBAC, access
permissions are associated with roles, and users are assigned roles based
on their responsibilities within an organization. This approach
streamlines the assignment of permissions by linking them to specific
roles, making it easier to manage and control access rights. RBAC is
instrumental in preventing unauthorized access and ensuring data
security by aligning permissions with organizational roles.
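
A minimal sketch of the RBAC idea follows: permissions attach to roles, users are assigned roles, and an access check consults the union of the user's role permissions. All role, user, and permission names are hypothetical.

```python
# Role-based access control: user -> roles -> permissions.
role_permissions = {
    "clerk":   {"read_record"},
    "manager": {"read_record", "update_record"},
}
user_roles = {"asha": {"clerk"}, "bilal": {"manager"}}

def is_allowed(user, permission):
    # The check passes if any of the user's roles carries the permission.
    return any(permission in role_permissions.get(r, set())
               for r in user_roles.get(user, set()))

print(is_allowed("asha", "update_record"))   # False
print(is_allowed("bilal", "update_record"))  # True
```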

**c. SQL Injection:**


SQL Injection poses a significant threat to access control and data
security in the realm of database management. It is an attack method
where malicious SQL code is injected into input fields or parameters,
exploiting vulnerabilities in an application's handling of user inputs. If left
unchecked, SQL Injection can compromise the confidentiality, integrity,
and availability of a database. To mitigate this risk, robust input
validation, parameterized queries, and prepared statements are essential.
These measures help prevent unauthorized access and ensure the
security of sensitive database information.
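
The sketch below contrasts string-built SQL with a parameterized query using Python's sqlite3 module; the users table and its contents are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 'secret')")

user_input = "' OR '1'='1"   # a classic injection payload

# Vulnerable: the payload becomes part of the SQL text and matches every row.
vulnerable = "SELECT * FROM users WHERE name = '%s'" % user_input
print(conn.execute(vulnerable).fetchall())        # returns the admin row

# Safe: the driver binds the value as data, never as SQL.
safe = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,))
print(safe.fetchall())                            # []
```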

**d. Introduction to Statistical Database Security:**


Statistical Database Security addresses the challenge of protecting
sensitive statistical information during data analysis. While releasing
aggregated data is essential for various purposes, preserving individual
data confidentiality is equally crucial. Techniques such as differential
privacy and secure multi-party computation are integral components of
statistical database security. These methods allow organizations to
provide useful aggregate information while safeguarding the privacy and
security of individual records, ensuring a balance between data utility and
confidentiality.
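
As a sketch of the differential-privacy idea mentioned above, a count query can be answered with Laplace noise added, so that the presence or absence of any single individual has little effect on the released value; the epsilon value and the count are illustrative, and NumPy's Laplace sampler is used for brevity.

```python
import numpy as np

def noisy_count(true_count, epsilon=0.5, sensitivity=1):
    # Laplace mechanism: noise drawn with scale = sensitivity / epsilon.
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical statistical query: "how many employees earn above a threshold?"
print(noisy_count(42))   # e.g. 43.1 -- useful in aggregate, protective of individuals
```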

**e. Introduction to Flow Control:**


Flow control, within the context of access control and data security,
focuses on managing the movement of data within a system or network.
Effective flow control mechanisms, including firewalls, intrusion
detection systems, and encryption, play a vital role in preventing
unauthorized access and data breaches. By regulating the transmission of
data, these measures contribute to maintaining the integrity and
confidentiality of information during its transfer across networks. Flow
control is a fundamental aspect of overall data security, ensuring that
data is transmitted securely and only accessible by authorized entities.
