DBMS Que
QUE. WHAT IS DATA INDEPENDENCE? Difference between Physical and Logical Data Independence
Data independence is the ability to modify the schema at one level without requiring the programs and applications built on it to be rewritten. Data is separated from the programs, so changes made to the data do not affect program execution or the application.
We know the main purpose of the three levels of data abstraction is to achieve data independence. If the
database changes and expands over time, it is very important that the changes in one level should not
affect the data at other levels of the database. This would save time and cost required when changing the
database.
Difference Between Physical and Logical Data Independence

Physical Data Independence | Logical Data Independence
It is mainly concerned with how data is stored within the system. | It is mainly concerned with changing the definition or structure of the data.
In most cases, a change at the physical level does not necessitate a change at the application program level. | If new fields are added to or removed from the database, changes must be made in the application software.
It may or may not be necessary to make adjustments at the internal level to enhance performance. | Whenever the logical structures of the database are modified, there must be significant modifications at the logical level.
6) Integrity constraints: Data stored in databases must satisfy integrity constraints. For example, consider a database schema consisting of the various educational programs offered by a university, such as B.Tech, M.Tech, B.Sc, M.Sc, BCA, and MCA, together with a schema of students enrolled in these programs. The DBMS ensures that a student can be enrolled only in one of the programs offered in the schema, and not in anything out of the blue. Hence, database integrity is preserved.
Apart from the above-mentioned features, a database management system also provides the following:
• Multiple user interfaces
• Data scalability, expandability and flexibility: we can change the schema of the database, and all dependent schemas are updated accordingly.
• Overall, the time for developing an application is reduced.
• Security: simplifies data storage, as it is possible to assign security permissions allowing restricted access to data.
QUE4. WHAT IS DATA MODEL? DIFFERENT TYPES OF DATA MODELS:
Data models define how the logical structure of a database is modeled. Data Models are fundamental entities to
introduce abstraction in a DBMS. Data models define how data is connected to each other and how they are
processed and stored inside the system.
The earliest data models were flat data models, where all the data was kept in the same plane. These early data models were not very scientific, hence they were prone to introducing lots of duplication and update anomalies.
1) Relational Data Model: This type of model designs the data in the form of rows and columns within a table. Thus, a relational model uses tables for representing data and the relationships between them. Tables are also called relations. This model was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used model, primarily in commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and the relationships among them. These objects are known as entities, and a relationship is an association among these entities. This model was designed by Peter Chen and published in a 1976 paper. It is widely used in database design. A set of attributes describes each entity. For example, student_name and student_id describe the 'student' entity. A set of entities of the same type is known as an 'entity set', and a set of relationships of the same type is known as a 'relationship set'.
3) Object-based Data Model: An extension of the ER model with notions of functions, encapsulation, and object identity as well. This model supports a rich type system that includes structured and collection types. In the 1980s, various database systems following the object-oriented approach were developed. Here, the objects are the data items carrying their properties.
4) Semistructured Data Model: This type of data model differs from the other three data models explained above. The semistructured data model allows data specifications at places where individual data items of the same type may have different attribute sets. The Extensible Markup Language (XML) is widely used for representing semistructured data. Although XML was initially designed for adding markup information to text documents, it gained importance because of its application in the exchange of data.
QUE5. WHAT IS A VIEW?
Writing complex queries and securing database access is very challenging for any database developer or database administrator. SQL queries sometimes become very complicated with joins, GROUP BY clauses, and other referential dependencies, so such queries can be simplified by presenting the data through a virtual table.
Views act as a proxy or virtual table created from the original table. Views simplify SQL queries and allow secure access to the underlying tables. Views in a DBMS can be visualized as virtual tables formed from the original tables of the database.
HOW A VIEW CAN BE CREATED: The following statement defines the syntax of a view:
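The general SQL form is sketched below; view_name, the column list, and the underlying query are placeholders:

CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;

For example, assuming a Student table like the one shown in the candidate key example below, a view exposing only non-confidential columns could be defined as:

-- virtual table over Student; the underlying table is not copied
CREATE VIEW student_public AS
SELECT StudID, FirstName, LastName
FROM Student;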
CANDIDATE KEY in SQL is a set of attributes that uniquely identify tuples in a table. A candidate key is a super key with no repeated attributes. The primary key should be selected from the candidate keys. Every table must have at least one candidate key. A table can have multiple candidate keys but only a single primary key.
Candidate key example: In the given table, StudID, Roll No, and Email are candidate keys, each of which uniquely identifies a student record in the table (a declaration sketch follows the table).
StudID | Roll No | First Name | LastName | Email
1 | 11 | Tom | Price | [email protected]
2 | 12 | Nick | Wright | [email protected]
3 | 13 | Dana | Natan | [email protected]
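As a minimal sketch of how these candidate keys could be declared (column names adapted to valid SQL identifiers; one candidate key is chosen as the primary key and the rest are declared UNIQUE):

CREATE TABLE Student (
    StudID    INT PRIMARY KEY,              -- candidate key chosen as primary key
    RollNo    INT NOT NULL UNIQUE,          -- remaining candidate key
    FirstName VARCHAR(50),
    LastName  VARCHAR(50),
    Email     VARCHAR(100) NOT NULL UNIQUE  -- remaining candidate key
);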
A foreign key is a column or group of columns in a relational database table that provides a link between data in
two tables. It acts as a cross-reference between tables because it references the primary key of another table,
thereby establishing a link between them.
The majority of tables in a relational database system adhere to the foreign key concept. In complex databases
and data warehouses, data in a domain must be added across multiple tables, thus maintaining a relationship
between them. The concept of referential integrity is derived from foreign key theory.
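As a minimal sketch in SQL (the Customer and Orders tables here are hypothetical):

CREATE TABLE Customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100)
);

CREATE TABLE Orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    -- cross-reference: every order must point to an existing customer,
    -- which is how referential integrity is enforced
    FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
);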
QUE.7 DEFINE EXTERNAL SORT. EXTERNAL SORT-MERGE ALGORITHM:
An external sorting algorithm is an algorithm that can handle massive amounts of information. Users utilize it
when the data that needs sorting doesn’t fit into a computer’s primary memory (usually the random access
memory [RAM]). In such a case, you must place the information in an external memory device (usually a hard
disk drive [HDD]). External sorting algorithms typically use a hybrid sort-merge strategy, which allows a computer
to sort data into chunks small enough to fit in the RAM. Each chunk is read, sorted, and written out to a
temporary file. Once the entire mass gets sorted, the outputs get merged to form a single larger file.
External merge sort
The external merge sort is a technique in which the data is stored in intermediate files; each intermediate file is sorted independently, and the sorted files are then combined or merged to produce fully sorted data.
For example, consider 10,000 records that have to be sorted using the external merge sort method. Suppose main memory can hold 500 records at a time and each block holds 100 records, so memory can hold 5 blocks at once. The records are therefore read in runs of 500, each run is sorted in memory and written to an intermediate file, giving 20 sorted runs, and these runs are then merged to produce the final sorted output.
• First, the operating system is asked to make sure that all pages of the new copy of the database have been
written out to disk. (Unix systems use the flush command for this purpose.)
• After the operating system has written all the pages to disk, the database system updates the pointer db-
pointer to point to the new copy of the database; the new copy then becomes the current copy of the
database. The old copy of the database is then deleted.
Figure below depicts the scheme, showing the database state before and after the update.
The transaction is said to have been committed at the point where the updated db-pointer is written to disk.
Que.10 Explain view serializability and blind write:
DBMS view serializability is a method to determine whether a given schedule is view serializable or not. To prove that a given schedule is view serializable, we test whether the schedule is view equivalent to a serial schedule. Since no transactions execute concurrently in a serial schedule, a serial schedule will certainly not leave the database in an inconsistent state. Conversely, the database can be left in an inconsistent state by a non-serial schedule, because multiple transactions execute concurrently in the database server. By testing that a given non-serial schedule is view serializable, we ensure that it is a consistent schedule.
Blind write: In computing, a blind write occurs when a transaction writes a value without reading it. Any view serializable schedule that is not conflict serializable must contain a blind write. A blind write is simply when a transaction writes without reading, i.e. a transaction has a WRITE(Q) with no READ(Q) before it, so the transaction writes to the database "blindly", without reading the previous value.
Que. Explain ACID properties of a transaction:
ACID is an acronym that stands for atomicity, consistency, isolation, and durability.
Together, these ACID properties ensure that a set of database operations (grouped together in a transaction)
leave the database in a valid state even in the event of unexpected errors.
Atomicity :
Atomicity guarantees that all of the commands that make up a transaction are treated as a single unit and either succeed or fail together. This is important because, in the case of an unwanted event like a crash or power outage, we can be sure of the state of the database: the transaction has either completed successfully or, if any part of it failed, been rolled back.
In the money transfer example discussed below, money is deducted from the source account, and if any anomaly occurs, the changes are discarded and the whole transaction fails.
Consistency :
Consistency guarantees that changes made within a transaction are consistent with database constraints. This
includes all rules, constraints, and triggers. If the data gets into an illegal state, the whole transaction fails.
Going back to the money transfer example, let’s say there is a constraint that the balance should be a positive
integer. If we try to overdraw money, then the balance won’t meet the constraint. Because of that, the
consistency of the ACID transaction will be violated and the transaction will fail.
Isolation :
Isolation ensures that all transactions run in an isolated environment. That enables running transactions
concurrently because transactions don’t interfere with each other.
For example, let’s say that our account balance is $200. Two transactions for a $100 withdrawal start at the same
time. The transactions run in isolation which guarantees that when they both complete, we’ll have a balance of
$0 instead of $100.
Durability :
Durability guarantees that once the transaction completes and changes are written to the database, they are
persisted. This ensures that data within the system will persist even in the case of system failures like crashes or
power outages.
The ACID characteristics of transactions are what allow developers to perform complex, coordinated updates and
sleep well at night knowing that their data is consistent and safely stored.
• Example: Atomicity: Money needs to both be removed from one account and added to the other, or the transaction will be aborted. Removing money from one account without adding it to the other would leave the data in an inconsistent state (a minimal SQL sketch of such a transfer follows this list).
• Consistency: Consider a database constraint that an account balance cannot drop below zero dollars. All
updates to an account balance inside of a transaction must leave the account with a valid, non-negative
balance, or the transaction should be aborted.
• Isolation: Consider two concurrent requests to transfer money from the same bank account. The final
result of running the transfer requests concurrently should be the same as running the transfer requests
sequentially.
• Durability: Consider a power failure immediately after a database has confirmed that money has been
transferred from one bank account to another. The database should still hold the updated information
even though there was an unexpected failure.
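As a minimal SQL sketch of such a transfer as one atomic transaction (the accounts table and its columns are hypothetical; BEGIN follows PostgreSQL-style syntax, while other systems use START TRANSACTION):

-- transfer $100 from account 1 to account 2 as one atomic unit
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
-- if either UPDATE fails, or a constraint such as CHECK (balance >= 0)
-- is violated, the transaction is rolled back and no partial change survives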
Que.11 Explain deadlock
A deadlock in a DBMS is the undesirable condition that arises when a process waits indefinitely for a resource that is held by another process. To understand the deadlock concept better, consider a transaction T1 that holds a lock on a few rows in the table Employee and needs to update some rows in another table Salary. There also exists another transaction T2 that holds a lock on the table Salary and needs to update a few rows in the Employee table, which are already locked by transaction T1. In this situation, both transactions wait for each other to release their locks, and the processes end up waiting for each other to release the resources. As a result, neither task gets completed, and this situation is known as deadlock.
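A minimal sketch of this scenario as two concurrent sessions (table and column names are hypothetical; FOR UPDATE row locking follows common dialects such as PostgreSQL and MySQL):

-- Session 1 (transaction T1)
BEGIN;
SELECT * FROM Employee WHERE emp_id = 1 FOR UPDATE; -- locks the Employee row
SELECT * FROM Salary   WHERE emp_id = 1 FOR UPDATE; -- blocks, waiting for T2

-- Session 2 (transaction T2), running at the same time
BEGIN;
SELECT * FROM Salary   WHERE emp_id = 1 FOR UPDATE; -- locks the Salary row
SELECT * FROM Employee WHERE emp_id = 1 FOR UPDATE; -- blocks, waiting for T1

-- T1 and T2 now wait on each other forever; the DBMS must detect the
-- cycle and abort one of the transactions to break the deadlock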
Deadlock Detection :
1. If resources have a single instance –
In this case for Deadlock detection, we can run an algorithm to check for the cycle in the Resource Allocation
Graph. The presence of a cycle in the graph is a sufficient condition for deadlock.
In the diagram referenced above, resource R1 and resource R2 have single instances, and there is a cycle R1 → P1 → R2 → P2. So, deadlock is confirmed.
2. If there are multiple instances of resources –
Detection of a cycle is a necessary but not a sufficient condition for deadlock detection; in this case, the system may or may not be in deadlock, depending on the situation.
Deadlock Recovery :
A traditional operating system such as Windows doesn't deal with deadlock recovery, as it is a time- and space-consuming process. Real-time operating systems use deadlock recovery.
• This protocol is useful and gives a greater degree of concurrency if the probability of conflicts is low. That is because the serializability order is not pre-decided and relatively few transactions will have to be rolled back.
• Validation Test for Transaction Tj
i. For all Ti with TS(Ti) < TS(Tj), one of the following conditions must hold:
o finish(Ti) < start(Tj)
o start(Tj) < finish(Ti) < validation(Tj), and the set of data items written by Ti does not intersect with the set of data items read by Tj.
If either condition holds, validation succeeds and Tj can be committed; otherwise, validation fails and Tj is aborted.
ii. Justification: either the first condition is satisfied, so there is no overlapped execution, or the second condition is satisfied and
o the writes of Tj do not affect the reads of Ti, since they occur after Ti has finished its reads;
o the writes of Ti do not affect the reads of Tj, since Tj does not read any item written by Ti.
iii. Schedule Produced by Validation (example of a schedule produced using validation)
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or
write data until it acquires an appropriate lock on it.
Locks are of two kinds:
• Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock: allowing more than one transaction to write to the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed (see the sketch after this list).
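A minimal sketch of the two lock kinds using explicit row-locking clauses (the accounts table is hypothetical; FOR SHARE and FOR UPDATE follow PostgreSQL syntax):

-- shared (read) lock: other transactions may also read-lock this row
SELECT balance FROM accounts WHERE account_id = 1 FOR SHARE;

-- exclusive (write) lock: no other transaction may read-lock or
-- write-lock this row until the current transaction ends
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;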
There are four types of lock protocols available:
i. Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write' operation is
performed. Transactions may unlock the data item after completing the ‘write’ operation.
ii. Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
iii. Two-Phase Locking (2PL)
X | W-timestamp = 10
Now let transaction T1 (having timestamp 15) perform a read operation on the newly created value X; the entry for the variable then becomes
X | W-timestamp = 10, R-timestamp = 15
2. Media Failure
This one is very risky! Media failures are caused by a head crash or unreadable media. These failures are considered among the most serious because entire data loss is possible. Media failures usually leave database systems unavailable for several hours until recovery is complete, especially in applications with large devices and high transaction volume.
The best way to prevent this type of database failure is to protect your data with adequate malware protection and to back up your data frequently.
3. Application Software Errors
When the resource limit is exceeded, bad input occurs, logical or internal errors arise, or other factors related to the application software are compromised, transactions can fail, giving way to database failure.
It is generally recommended that application software errors be minimized in the software code during conception and the software engineering process. It is better for developers to put mechanisms and controls into place during the design of the architecture and the coding operation than to try to remedy mistakes later. Aside from failures, malicious code may exploit known vulnerabilities, especially those associated with a particular programming software tool or a known human coding error.
Que.17 Explain set operations with examples in DBMS
SQL set operators are used to combine the results obtained from two or more queries into a single result. The
queries which contain two or more subqueries are known as compounded queries.
There are four major types of SQL set operators, namely:
• Union
• Union all
• Intersect
• Minus
Here is a summary table of the operators covered below.
Union | Combines the results of two or more SELECT statements, removing duplicates.
Union All | Combines all results of two or more SELECT statements, including duplicates.
Intersect | Returns only the common records obtained from two or more SELECT statements.
Minus | Returns only those records which are exclusive to the first table.
SELECT column_name
FROM table_name_1
SET OPERATOR
SELECT column_name
FROM table_name_2
. . .
Parameters:
The different parameters used in the syntax are :
• SET OPERATOR: Mention the type of set operation you want to perform from { Union, Union all, Intersect,
Minus}
• column_name: Mention the column name on which you want to perform the set operation and want in
the result set
• FROM table_name_1: Mention the first table name from which the column has to be fetched
• FROM table_name_2: Mention the second table name from which the column has to be fetched
All of the above-mentioned parameters are mandatory. You may also use WHERE, GROUP BY, and HAVING clauses based on your requirements.
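As a minimal worked example (the two tables and their rows are hypothetical):

-- employees_delhi contains (1, 'Asha'), (2, 'Ravi')
-- employees_mumbai contains (2, 'Ravi'), (3, 'Meena')

SELECT emp_id, emp_name FROM employees_delhi
UNION
SELECT emp_id, emp_name FROM employees_mumbai;
-- returns (1, 'Asha'), (2, 'Ravi'), (3, 'Meena'): the duplicate is removed

SELECT emp_id, emp_name FROM employees_delhi
INTERSECT
SELECT emp_id, emp_name FROM employees_mumbai;
-- returns only the common row (2, 'Ravi')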
Que.18 Explain how hash join works in a query in DBMS
The name hash join comes from the use of a hash function. The hash join is useful for medium to large inputs, but it is not efficient for very small ones. A hash join requires at least one equi-join condition (=), and it supports all join types (left/right, semi, anti join). The hash join is the only physical join operator that needs memory. It consists of 2 phases:
1. Build (or blocking) phase
2. Probe (or non-blocking) phase
How it works
The hash join is one of the three available join algorithms for joining two tables. It finds matching rows in two tables using a hash table. Other join algorithms exist, such as the nested loop join, but for larger inputs the hash join is generally more efficient than the nested loop join.
Build phase
In the first phase, the server creates a hash table in memory. Rows from one of the inputs are stored in this hash table, with the join attributes used as the hash table keys. This input is called the build input. Let's assume countries is designated as the build input. The hash join condition involves countries.country_id, which belongs to the build input, so it is used as the key in the hash table. Once all the rows are stored in the hash table, the build phase is completed.
Probe phase
During the probe phase, the server reads rows from the probe input (persons in our illustration). For each row, the server probes the hash table, using the value of persons.country_id as the lookup key. For each match, a joined row is sent to the client. In the end, the server has scanned each input only once, using constant-time lookups to find matching rows between the two inputs.
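A minimal sketch of a query that an optimizer can typically execute as a hash join (the persons and countries tables follow the illustration above; the column names person_name and country_name are assumptions):

-- equi-join on country_id: a natural candidate for a hash join
SELECT p.person_name, c.country_name
FROM persons p
JOIN countries c ON p.country_id = c.country_id;

Many systems (for example PostgreSQL) will show whether a hash join plan was chosen when the same query is prefixed with EXPLAIN.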