DBMS Notes
DATA - Data refers to the raw facts, details or distinct pieces of information that are collected, stored
and processed. It can be in various forms such as numbers, text, images or any other format.
INFORMATION - Information refers to processed and organized data that has context, relevance and
purpose. It results from the interpretation and analysis of raw facts, making it meaningful and valuable
for decision making.
Example - If a train is coming to a particular station, then the visual display of the train's status on the
station board showing train no., platform and arrival time is the information.
DBMS - A DBMS is an organized way of managing, retrieving and storing data from a collection of logically
related information, i.e. a software application that captures, stores and analyzes the required data.
It provides protection and security to the database. In the case of multiple users, it also maintains data
consistency.
Characteristics of DBMS -
It can provide a clear and logical view of the process that manipulates data.
It contains ACID properties which maintain data in a healthy state in case of failure.
It can view the database from different viewpoints according to the requirements of the user.
COMPONENTS OF DBMS
Database -> Database is an organised collection of structured information or data typically stored
electronically inside the system.
For Example :
The college Database organizes the data about the admin, staff, students and faculty etc.
Using the database, you can easily retrieve, insert, and delete the information.
RDBMS - An RDBMS is the same as a DBMS, but it provides these functionalities with relational integrity.
File System : A File Management system is a DBMS that allows access to single files or tables at a time.
In a File System, data is directly stored in a set of files. It contains flat files that have no relation to other
files (when only one table is stored in a single file, then this file is known as a flat file).
DBMS: A Database Management System (DBMS) is application software that allows users to efficiently
define, create, maintain and share databases.
Defining a database involves specifying the data types, structures and constraints of the data to be
stored in the database.
Creating a database involves storing the data on some storage medium that is controlled by DBMS.
Maintaining a database involves updating the database whenever required to evolve and reflect
changes in the miniworld and also generating reports for each change.
Sharing a database involves allowing multiple users to access the database. DBMS also serves as an
interface between the database and end users or application programs. It provides controlled access to
the data and ensures that data is consistent and correct by defining rules on them.
An application program accesses the database by sending queries or requests for data to the DBMS. A
query causes some data to be retrieved from the database.
Data sharing: The file system does not allow sharing of data or sharing is too complex. Whereas in
DBMS, data can be shared easily due to a centralized system.
Data concurrency: Concurrent access to data means more than one user is accessing the same data at
the same time. Anomalies occur when changes made by one user get lost because of changes made by
another user. The file system does not provide any procedure to stop anomalies, whereas DBMS
provides a locking system to prevent them.
Data searching: For every search operation performed on the file system, a different application program
has to be written, while DBMS provides inbuilt searching operations. The user only has to write a small
query to retrieve data from the database.
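For instance, assuming the students table used in the DML example later in these notes, a single small query replaces a custom search program:
SELECT name FROM students WHERE age > 18;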
Data integrity: There may be cases when some constraints need to be applied to the data before
inserting it into the database to make sure of its validity and accuracy. The file system does not provide
any procedure to check these constraints automatically, whereas DBMS maintains data integrity by
enforcing user-defined constraints on data by itself.
System crashing: In some cases, systems might have crashed due to various reasons. It is a bane in the
case of file systems because once the system crashes, there will be no recovery of the data that’s been
lost. A DBMS has a recovery manager which retrieves the data, making it another advantage over
file systems.
Data security: A file system provides a password mechanism to protect the data, but how long can a
password stay protected? No one can guarantee that. This doesn't happen in the case of DBMS: a DBMS
has specialized features that help shield its data.
Interfaces: It provides multiple user interfaces, such as a graphical user interface and an application
program interface.
Properties of a Relation :
Each relation must have a unique name by which it is identified in the database.
The data in a relation must be atomic, i.e. each cell of a relation must contain exactly one value.
Properties of a row/tuple: each row/tuple must be unique, and the order of rows and columns is insignificant.
RELATIONAL INTEGRITY = Relational integrity ensures that there is a relationship among the tables
of the database and that the data within them follows certain unique constraints and rules, which help
to maintain accuracy and reliability of the data throughout the database lifecycle.
1. Entity Integrity
Entity integrity ensures that each table has a primary key and that the primary key is unique and not
null. This guarantees that each record within a table can be uniquely identified.
Primary Key Constraint: Ensures that each record has a unique identifier that is not null. For example,
in a table of employees, the employee ID might be a primary key.
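A minimal SQL sketch of entity integrity (table and column names assumed):
CREATE TABLE employees (
    employee_id INT PRIMARY KEY, -- unique and NOT NULL, so every record is identifiable
    name VARCHAR(100)
);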
2. Referential Integrity
Referential integrity ensures that relationships between tables remain consistent. It ensures that a
foreign key in one table matches a primary key in another table, and that the foreign key values are valid.
Foreign Key Constraint: Ensures that a value in one table (child) must match a value in another table
(parent) or be null. For example, in an order table, the customer ID should match a valid ID in the
customer table.
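A minimal SQL sketch of referential integrity (names assumed):
CREATE TABLE customers (
    customer_id INT PRIMARY KEY
);
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id) -- child value must match a parent key or be NULL
);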
3. Domain Integrity
Domain integrity ensures that all values in a column adhere to the defined domain constraints, such
as data type, format and range.
Check Constraint: Ensures that the values in a column meet a specific condition. For example, a check
constraint might ensure that the age column in a table only contains positive numbers.
Data Type Constraint: Ensures that values in a column are of a specified data type, such as integer,
string, date, etc.
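A minimal SQL sketch of domain integrity (names assumed):
CREATE TABLE persons (
    person_id INT PRIMARY KEY,
    age INT CHECK (age > 0), -- check constraint: only positive ages allowed
    joined DATE -- data type constraint: only valid dates accepted
);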
ADVANTAGES OF DBMS :
1. Data Independence - This allows changing the structure of the data without changing the structure of
any of the application programs.
2. Sharing of Data - Multiple users can access the data from the same database simultaneously.
3. Redundancy Control - It enforces duplication and redundancy checks by integrating all the data into a
single database.
4. Integrity Constraints - These constraints allow the database to store records in a refined manner.
5. Backup and Recovery Facilities - It provides the feature of automatically creating data backups and
restoring them when required, in case of data loss.
DBMS ARCHITECTURE :
The DBMS design depends upon its architecture. The basic client/server architecture is used to deal with
a large number of PCs, web servers, database servers and other components that are connected with
networks.
The client/server architecture consists of many PCs and workstations which are connected via the
network.
DBMS architecture depends upon how users are connected to the database to get their request done.
The 2-Tier architecture is the same as basic client-server. In the two-tier architecture, applications on the
client end can directly communicate with the database at the server side. For this interaction, APIs like
ODBC and JDBC are used.
The user interfaces and application programs are run on the client-side.
The server side is responsible for providing functionalities like query processing and transaction
management.
To communicate with the DBMS, client-side application establishes a connection with the server side.
3-Tier Architecture
The 3-Tier architecture contains another layer between the client and server. In this architecture, client
can't directly communicate with the server.
The application on the client-end interacts with an application server which further communicates with
the database system.
End user has no idea about the existence of the database beyond the application server. The database
also has no idea about any other user beyond the application.
DATA INDEPENDENCE :
Data independence refers to the characteristic of being able to modify the schema at one level of the
database system without altering the schema at the next higher level.
Logical data independence refers to the characteristic of being able to change the conceptual schema
without having to change the external schema.
Logical data independence is used to separate the external level from the conceptual view.
If we do any changes in the conceptual view of the data, then the user view of the data would not be
affected.
Physical data independence can be defined as the capacity to change the internal schema without having
to change the conceptual schema.
If we do any changes in the storage size of the database system server, then the Conceptual structure of
the database will not be affected.
Physical data independence is used to separate conceptual levels from the internal levels.
1. DDL (Data Definition Language) - It consists of the commands which deal with the creation and
definition of the database schema: tables, indexes, constraints, etc.
2. DML (Data Manipulation Language) - It consists of the commands that deal with the manipulation
of the data present in the database.
3. DCL (Data Control Language) - It consists of the commands which deal with the rights,
permissions and other controls of the database system.
4. TCL (Transaction Control Language) - It consists of the commands that mainly deal with the transactions
inside the database.
DDL :
TRUNCATE: Removes all records from a table, but does not delete the table itself.
DML :
Example: INSERT INTO students (id, name, age) VALUES (1, 'John Doe', 20);
UPDATE: Modifies existing records in a table.
DCL :
GRANT: Gives a user access privileges to the database.
Example: GRANT SELECT ON students TO user1;
TCL :
Example: COMMIT;
Example: ROLLBACK;
SAVEPOINT: Sets a point within a transaction to which you can later roll back.
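A short TCL sketch reusing the students table from the DML example above (values assumed):
INSERT INTO students (id, name, age) VALUES (2, 'Jane Roe', 21);
SAVEPOINT sp1;
UPDATE students SET age = 22 WHERE id = 2;
ROLLBACK TO SAVEPOINT sp1; -- undoes only the UPDATE; the INSERT survives
COMMIT; -- makes the surviving changes permanent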
QUERY OPTIMIZATION - Query optimization is the phase that identifies the query-evaluation plan with the
least evaluation cost.
This phase comes into the picture when there are many methods or algorithms to execute the same task.
Advantages of query optimization:
1. The output is produced faster.
AGGREGATION -> aggregation refers to a property or function that allows for the computation of a single
value from a collection of values. Aggregation is commonly used in SQL to perform calculations on
multiple rows of data to return a single result.
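For example, a sketch assuming an employees table with department and salary columns:
SELECT department, COUNT(*) AS num_employees, AVG(salary) AS avg_salary
FROM employees
GROUP BY department; -- many rows collapse into a single result row per department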
Atomicity -> Atomicity ensures that each transaction is treated as a single, indivisible unit of work. This
means that a transaction must either be fully completed or fully aborted; partial transactions are not
allowed.
1. Physical level - The physical level is the lowest level of abstraction, which describes how data is
physically stored in the database. It includes details about data storage structures such as files, indices,
and the implementation of data access methods.
example - layout of the data stored on disk , data compression methods and indexing techniques.
2. Logical level - The logical level is a higher level of abstraction than the physical level. It describes what
data is stored in the database and the relationships among those data. It represents the entire database
as a collection of logical structures.
example - It includes the schema of the database: table definitions, columns, data types, constraints
and relationships.
3. View Level - The view level is the highest level of abstraction, describing only a part of the entire
database. This level allows different users to have customized views of the database that hide the
complexities of the other levels.
example - A sales department might have a view that includes customer names and orders, while the
HR department might have a view that includes employee details without exposing sensitive information
like salaries.
E-R MODEL -> The Entity Relationship Model is a diagrammatic representation of the database design where
you represent real world objects as entities and mention the relationships among them.
ENTITY -> An entity is a real world object having some characteristics called attributes.
ENTITY TYPE -> It is a collection of entities having the same attributes.
ENTITY SET -> An entity set is a collection of all the entities of a particular entity type, such as a set of
employees, a set of companies, a set of people, etc.
RELATIONSHIP -> It can be defined as the relation between the entities that has something to do with
each other or a link between two or more tables in a database.
ONE TO ONE : One record of a table is related to exactly one record of another table (e.g., a person and their passport).
MANY TO ONE : Many records of a table relate to one record of another table (e.g., many employees work in one department).
ONE TO MANY : One record of a table relates to many records of another table (e.g., one customer places many orders).
SELF REFERENCING : A record of a table relates to another record of the same table (e.g., an employee and their manager), as sketched below.
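A minimal sketch of a self-referencing relationship (names assumed):
CREATE TABLE employee (
    emp_id INT PRIMARY KEY,
    manager_id INT,
    FOREIGN KEY (manager_id) REFERENCES employee(emp_id) -- a row links to another row of the same table
);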
Optimistic Approach : Involves Versioning , Validation (read validation and write validation) ,
commit/rollback.
ATOMICITY -> Atomicity requires that a transaction is either completely successful or completely undone.
CONSISTENCY -> Consistency means that the data must meet all the validation rules throughout the
database lifecycle.
ISOLATION -> The main goal of isolation is concurrency control, which means that the intermediate state of a
transaction is invisible to other transactions until that transaction is complete.
DURABILITY -> Durability means that once a transaction has been committed, its changes persist whatever
the scenario (e.g., power failure or crash).
NORMALIZATION : The way of organizing the data inside the database in order to maintain data
consistency and remove duplication. It involves decomposing the database into smaller tables and
defining relationships among them in order to make the database more flexible.
2NF : A relation is in 2NF if it is in 1NF and no non-prime attribute depends on a proper subset of any candidate key (no partial dependency).
3NF : A relation is in 3NF if it is in 2NF and has no transitive dependency of a non-prime attribute on a candidate key.
BCNF : BCNF comes into the picture when your table is already in 3NF but there is more than one
candidate key; in order to minimize the complexity of the table, we further divide the tables in such a
manner that every determinant is a candidate key.
DATA PARTITIONING is the process of dividing a logical database into smaller units
in order to improve maintainability, performance and manageability.
CHECKPOINT is a mechanism where all the previous logs are removed from the system
and permanently stored on the storage disk.
A trigger in a database is a set of SQL statements that automatically "fires" or executes when a specified
event occurs. Triggers are useful for:
Maintaining Data Integrity: Enforcing complex business rules and ensuring data consistency across
multiple tables.
Auditing and Logging: Keeping track of changes in data for audit purposes. For example, logging updates,
deletions, and insertions.
Synchronous Replication: Ensuring that changes in one table are automatically replicated in another
table.
Complex Validation: Implementing complex data validation rules that cannot be enforced by simple
constraints.
Cascading Actions: Automatically performing actions like cascading deletes or updates in related tables.
The Employee and Employee_Backup tables and the trigger exercise (reconstructed from fragments; EmpID in Employee and the old/new salary columns are assumptions):
CREATE TABLE Employee (
    EmpID INT,
    Name VARCHAR(100),
    Position VARCHAR(50),
    Salary DECIMAL(10, 2)
);
CREATE TABLE Employee_Backup (
    EmpID INT,
    OldSalary DECIMAL(10, 2),
    NewSalary DECIMAL(10, 2),
    UpdateDate TIMESTAMP
);
Exercise: Write a trigger which will insert the updated salary and the old salary into the table named Employee_Backup. A possible PL/SQL sketch (Oracle syntax):
CREATE OR REPLACE TRIGGER trg_salary_backup
AFTER UPDATE OF Salary ON Employee
FOR EACH ROW
BEGIN
    -- log the old and new salary whenever a salary changes
    INSERT INTO Employee_Backup (EmpID, OldSalary, NewSalary, UpdateDate)
    VALUES (:OLD.EmpID, :OLD.Salary, :NEW.Salary, SYSTIMESTAMP);
END;
CURSOR : A cursor is a database object which helps to manipulate data row by row and stores it as a
result set.
There are two types of cursor in DBMS
IMPLICIT : Created automatically by the DBMS when DML statements (INSERT, UPDATE, DELETE) are executed.
EXPLICIT : Declared by the programmer to process, row by row, the result of a query that returns multiple rows.
Aggregation
In aggregation, the relation between two entities is treated as a single entity. In aggregation, relationship
with its corresponding entities is aggregated into a higher level entity.
For example: the Center entity and the Course entity it offers act as a single entity in a relationship with
another entity, Visitor. In the real world, if a visitor visits a coaching center, he will never enquire about
the Course alone or about the Center alone; instead he will enquire about both.
In the database, every entity set or relationship set can be represented in tabular form.
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE forms individual tables.
In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the column of STUDENT table. Similarly,
COURSE_NAME and COURSE_ID form the column of COURSE table and so on.
In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attributes of
their entities.
In the student table, a hobby is a multivalued attribute, so it is not possible to represent multiple values
in a single column of the STUDENT table. Hence we create a table STUD_HOBBY with columns
STUDENT_ID and HOBBY. Using both columns, we create a composite key.
A composite attribute is represented by its components.
In the given ER diagram, student address is a composite attribute. It contains CITY, PIN, DOOR#, STREET,
and STATE. In the STUDENT table, these attributes are merged in as individual columns.
In the STUDENT table, Age is the derived attribute. It can be calculated at any point of time by calculating
the difference between current date and Date of Birth.
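A sketch of computing the derived Age attribute (MySQL syntax; table and column names assumed):
SELECT STUDENT_ID, TIMESTAMPDIFF(YEAR, DATE_OF_BIRTH, CURDATE()) AS AGE
FROM STUDENT; -- Age is derived from Date of Birth, so it need not be stored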
-----------------------------------------
Using these rules, you can convert the ER diagram to tables and columns and assign the mapping
between the tables. Table structure for the given ER diagram is as below:
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of
the query. It uses operators to perform queries.
Relational Calculus
There is an alternate way of formulating queries known as Relational Calculus. Relational calculus is a
non-procedural query language. In a non-procedural query language, the user is not concerned with the
details of how to obtain the end results. The relational calculus tells what to do but never explains how
to do it. Most commercial relational languages are based on aspects of relational calculus, including SQL,
QBE and QUEL.
It is based on predicate calculus, a name derived from a branch of symbolic logic. A predicate is a
truth-valued function with arguments. On substituting values for the arguments, the function results in
an expression called a proposition, which can be either true or false. Relational calculus is a tailored
version of a subset of the predicate calculus used to communicate with the relational database.
Universal Quantifier: The universal quantifier, denoted by ∀, is read as "for all", which means that in a
given set of tuples all tuples satisfy a given condition.
Existential Quantifier: The existential quantifier, denoted by ∃, is read as "there exists", which means that
in a given set of tuples there is at least one occurrence whose value satisfies a given condition.
Before using the concept of quantifiers in formulas, we need to know the concept of Free and Bound
Variables.
A tuple variable t is bound if it is quantified, i.e. it appears within (∃t) or (∀t); a variable that is not
bound is said to be free.
It is a non-procedural query language which is based on finding tuple variables, also known as range
variables, for which a predicate holds true. It describes the desired information without giving a
specific procedure for obtaining that information. The tuple relational calculus is used to select the
tuples in a relation.
Notation: { T | P(T) }, where T is a tuple variable and P(T) is a predicate (condition) over T.
For example: { T.name | AUTHOR(T) AND T.article = 'database' }
Output: This query selects tuples from the AUTHOR relation. It returns a tuple with 'name' from
AUTHOR who has written an article on 'database'.
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal
Quantifiers (∀).
For example: { R | ∃T ∈ AUTHOR (T.article = 'database' AND R.name = T.name) }
Output: This query will yield the same result as the previous one.
The second form of relational calculus is known as domain relational calculus. In domain relational
calculus, the filtering variable uses the domain of attributes. Domain relational calculus uses the same
operators as tuple calculus. It uses logical connectives ∧ (and), ∨ (or) and ¬ (not), and the Existential (∃)
and Universal (∀) quantifiers.
Notation: { < a1, a2, ..., an > | P(a1, a2, ..., an) }
where a1, a2, ..., an are attributes and P is the condition on them.
For example: { < article, page, subject > | ∈ javatpoint ∧ subject = 'database' }
Output: This query will yield the article, page, and subject from the relation
javatpoint, where the subject is 'database'.
Functional Dependency
A functional dependency X → Y between two sets of attributes means that the value of X determines
the value of Y.
The left side of the FD is known as the determinant; the right side is known as the dependent.
Example:
ID → Name,
Name → DOB
INFERENCE RULES :
-> An inference rule is a type of assertion. It can be applied to a set of FDs (functional
dependencies) to derive other FDs.
-> Using the inference rules, we can derive additional functional dependencies from
the initial set.
Reflexive rule: If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Transitive rule: If X → Y and Y → Z then X → Z
Union rule says, if X determines Y and X determines Z, then X must also determine
Y and Z.
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (augmentation of 1 with X)
4. XY → YZ (augmentation of 2 with Y)
5. X → YZ (transitivity of 3 and 4)
Decomposition rule is also known as the project rule. It is the reverse of the union rule: if X → YZ,
then X → Y and X → Z.
Proof:
1. X → YZ (given)
2. YZ → Y (reflexivity)
3. X → Y (transitivity of 1 and 2); similarly X → Z.
Normalization
A large database defined as a single relation may result in data duplication. This
repetition of data may result in:
-> Making relations very large.
-> It isn't easy to maintain and update data as it would involve searching many
records in relation.
What is Normalization?
-> Normalization divides the larger table into smaller tables and links them using
relationships.
-> The normal form is used to reduce redundancy from the database table.
Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new tuple
into a relationship due to lack of data.
Deletion Anomaly: The delete anomaly refers to the situation where the deletion of
data results in the unintended loss of some other important data.
Updation Anomaly: The update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.
Normalization works through a series of stages called normal forms. The normal
forms apply to individual relations. A relation is said to be in a particular normal
form if it satisfies the constraints of that form.
It states that an attribute of a table cannot hold multiple values. It must hold only
single-valued attributes.
First normal form disallows the multi-valued attribute, composite attribute, and
their combinations.
A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key.
Example : In a relation with composite key (StudentId, ProjectId), the following partial
dependencies violate 2NF:
StudentId → StudentName
ProjectId → ProjectName
A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency.
3NF is used to reduce the data duplication. It is also used to achieve the data
integrity.
A relation is in third normal form if it holds at least one of the following conditions
for every non-trivial functional dependency X → Y.
Conditions :
1. X is a super key.
2. Y is a prime attribute, i.e. each element of Y is part of some candidate key.
For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more than one
department.
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
Example :
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities; there is no relationship between COURSE and HOBBY.
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining
should be lossless.
5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
Example :
In the above table, John takes both Computer and Math classes for Semester 1 but he
doesn't take the Math class for Semester 2. In this case, the combination of all these fields is
required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be taking it, so we would leave Lecturer and Subject as NULL. But all three
columns together act as the primary key, so we can't leave the other two columns
blank.
So to make the above table into 5NF, we can decompose it into three relations P1,
P2 & P3:
P1 :
P2 :
P3 :
Transaction
Example: Suppose an employee of a bank transfers Rs 500 from X's account (initial
balance Rs 4000) to Y's account. This small transaction contains several low-level tasks:
Operations of Transaction:
Read(X): Read operation is used to read the value of X from the database and
stores it in a buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the
buffer.
The same applies for Y.
The first operation reads X's value from database and stores it in a buffer.
The second operation will decrease the value of X by 500. So buffer will contain
3500.
The third operation will write the buffer's value to the database. So X's final value
will be 3500.
But it may be possible that, because of a hardware, software or power failure, the
transaction fails before finishing all the operations in the set.
For example: If in the above transaction, the debit transaction fails after executing
operation 2 then X's value will remain 4000 in the database which is not acceptable
by the bank.
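The transfer written as one atomic SQL transaction (MySQL-style syntax; an accounts table is assumed):
START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE holder = 'X'; -- debit X: 4000 -> 3500
UPDATE accounts SET balance = balance + 500 WHERE holder = 'Y'; -- credit Y
COMMIT; -- if any step fails before this point, ROLLBACK restores X's balance to 4000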
States of Transaction
Schedule
1. Serial Schedule
For example: Suppose there are two transactions T1 and T2 which have some
operations. If it has no interleaving of operations, then there are the following two
possible outcomes:
Execute all the operations of T1 followed by all the operations of T2.
Execute all the operations of T2 followed by all the operations of T1.
2. Non-serial Schedule
It contains many possible orders in which the system can execute the individual
operations of the transactions.
3. Serializable schedule
The serializability of schedules is used to find non-serial schedules that allow the
transaction to execute concurrently without interfering with one another.
It identifies which schedules are correct when executions of the transaction have
interleaving of their operations.
A non-serial schedule will be serializable if its result is equal to the result of its
transactions executed serially.
Conflicting Operations
Example:
Conflict Equivalent
A schedule that is view serializable but not conflict serializable contains blind writes.
1. Initial Read
An initial read of both schedules must be the same. Suppose two schedule S1 and S2. In schedule S1, if
a transaction T1 is reading the data item A, then in S2, transaction T1 should also read A.
2. Updated Read
In schedule S1, if Ti is reading A which is updated by Tj then in S2 also, Ti should read A which is updated
by Tj.
3. Final Write
A final write must be the same in both schedules. In schedule S1, if transaction T1 performs the final
write on a data item A, then in S2 the final write on A should also be done by T1.
What is Anomaly?
Anomaly means inconsistency in the pattern from the normal form. In Database Management System
(DBMS), anomaly means the inconsistency occurring in the relational table during the operations
performed on the relational table.
There can be various reasons for anomalies to occur in the database. For example, if there is a lot of
redundant data present in our database then DBMS anomalies can occur. If a table is constructed in a
very poor manner then there is a chance of database anomaly. Due to database anomalies, the integrity
of the database suffers.
The other reason for database anomalies is that all the data is stored in a single table. So, to remove
the anomalies of the database, normalization is performed, where the splitting of the table and the
joining of the table (different types of join) occur.
We will see the anomalies present in a table by the different examples:
When we update some rows in the table, and if it leads to the inconsistency of the table then this
anomaly occurs. This type of anomaly is known as an updation anomaly. In the above table, if we want to
update the address of Ramesh then we will have to update all the rows where Ramesh is present. If
during the update we miss any single row, then there will be two addresses of Ramesh, which will lead to
an inconsistent and wrong database.
If we delete some rows from the table and if any other information or data which is required is also
deleted from the database, this is called the deletion anomaly in the database. For example, in the above
table, if we want to delete the department number ECT669 then the details of Rajesh will also be
deleted since Rajesh's details are dependent on the row of ECT669. So, there will be deletion anomalies
in the table.
Insertion Anomaly
An insertion anomaly occurs when a new tuple cannot be inserted into the table because some required
data is missing (for example, a new employee cannot be added until a department is assigned to them).
In a multi-user system, multiple users can access and use the same database at one time, which is known
as the concurrent execution of the database. It means that the same database is executed
simultaneously on a multi-user system by different users.
While working on the database transactions, there occurs the requirement of using the database by
multiple users for performing different operations, and in that case, concurrent execution of the
database is performed.
The thing is that the simultaneous execution that is performed should be done in an interleaved manner,
and no operation should affect the other executing operations, thus maintaining the consistency of the
database. Thus, on making the concurrent execution of the transaction operations, there occur several
challenging problems that need to be solved.
In a database transaction, the two main operations are READ and WRITE. There is a need
to manage these two operations in the concurrent execution of transactions, because if these operations
are not performed in a controlled interleaved manner, the data may become inconsistent. The following
problems occur with the concurrent execution of operations:
The problem occurs when two different database transactions perform the read/write operations on the
same database items in an interleaved manner (i.e., concurrent execution) that makes the values of the
items incorrect hence making the database inconsistent.
For example:
Consider two transactions TX and TY in the below diagram performing read/write operations on account
A where the available balance in account A is $300:
Also known as the Inconsistent Retrievals Problem: it occurs when, within a transaction, two different
values are read for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account A, having an
available balance = $300. The diagram is shown below:
Recoverability of Schedule
Sometimes a transaction may not execute completely due to a software issue, system crash or hardware
failure. In that case, the failed transaction has to be rolled back. But some other transaction may also have
used the value produced by the failed transaction, so we also have to roll back those transactions.
The above table 1 shows a schedule with two transactions. T1 reads and writes the value of A and
that value is read and written by T2. T2 commits, but later on T1 fails. Due to the failure, we have to
roll back T1. T2 should also be rolled back because it read the value written by T1, but T2 can't be rolled
back because it has already committed. This type of schedule is known as an irrecoverable schedule.
A cascading rollback occurs in database systems when a transaction (T1) causes a failure and a rollback
must be performed. Other transactions dependent on T1's actions must also be rolled back due to T1's
failure, thus causing a cascading effect.
Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point from where it can't go
any further. If a few transactions or processes are damaged, this is called a transaction failure.
Logical errors : If a transaction cannot complete due to some code error or an internal error condition,
then the logical error occurs.
Syntax error : It occurs when the DBMS itself terminates an active transaction because the database
system is not able to execute it. For example, The system aborts an active transaction, in case of
deadlock or resource unavailability.
LOG-BASED RECOVERY
The log is a sequence of records. Log of each transaction is maintained in some stable storage so that if
any failure occurs, then it can be recovered from there.
If any operation is performed on the database, then it will be recorded in the log.
But the process of storing the logs should be done before the actual transaction is applied in the
database.
Let's assume there is a transaction to modify the City of a student. The following logs are written for this
transaction.
<Tn, Start>
When the transaction modifies the City from 'Noida' to 'Bangalore', another log record is written to the
file:
<Tn, City, 'Noida', 'Bangalore'>
When the transaction is finished, then it writes another log to indicate the end of the transaction.
<Tn, Commit>
The deferred modification technique occurs if the transaction does not modify the database until it has
committed.
In this method, all the logs are created and stored in the stable storage, and the database is updated
when a transaction commits.
The Immediate modification technique occurs if database modification occurs while the transaction is
still active.
In this technique, the database is modified immediately after every operation. It follows an actual
database modification.
When the system is crashed, then the system consults the log to find which transactions need to be
undone and which need to be redone.
If the log contains the records <Ti, Start> and <Ti, Commit>, then the transaction Ti
needs to be redone.
If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>,
then the transaction Ti needs to be undone.
Checkpoint
The checkpoint is a type of mechanism where all the previous logs are removed from the system and
permanently stored in the storage disk.
The checkpoint is like a bookmark. During the execution of transactions, such checkpoints are marked,
and as the transaction executes, log files are created from the steps of the transaction.
When execution reaches the checkpoint, the transaction's updates are written into the database, and the
entire log file up to that point is removed. The log file is then updated with the new steps of the
transaction until the next checkpoint, and so on.
The checkpoint is used to declare a point before which the DBMS was in the consistent state, and all
transactions were committed.
In the following manner, a recovery system recovers the database from this failure:
The recovery system reads log files from the end to start. It reads log files from T4 to T1.
The transaction is put into the redo state if the recovery system sees a log with <Tn, Start> and <Tn,
Commit>, or just <Tn, Commit>. All the transactions in the redo-list are redone and their logs are kept.
For example: In the log file, transactions T2 and T3 will have <Tn, Start> and <Tn, Commit>. The T1
transaction will have only <Tn, Commit> in the log file, because T1 started before the checkpoint and
committed after it was crossed. Hence the recovery system puts T1, T2 and T3 into the redo list.
The transaction is put into undo state if the recovery system sees a log with <Tn, Start> but no commit or
abort log found. In the undo-list, all the transactions are undone, and their logs are removed.
For example: Transaction T4 will have only <Tn, Start>. So T4 will be put into the undo list, since this
transaction is not yet complete and failed midway.
Deadlock in DBMS
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to
give up locks. Deadlock is said to be one of the most feared complications in DBMS as no task ever gets
finished and is in waiting state forever.
For example: In the student table, transaction T1 holds a lock on some rows and needs to update some
rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the grade table and
needs to update the rows in the Student table held by Transaction T1.
Deadlock Avoidance
When a database is stuck in a deadlock state, it is better to avoid the deadlock rather than aborting
or restarting the database transactions, as that is a waste of time and resources.
Deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method like "wait
for graph" is used for detecting the deadlock situation but this method is suitable only for the smaller
database. For the larger database, deadlock prevention method can be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect
whether the transaction is involved in a deadlock or not. The lock manager maintains a wait-for
graph to detect deadlock cycles in the database.
Wait-for graph : This is a suitable method for deadlock detection. In this method, a graph is created based
on the transactions and their locks. If the created graph has a cycle or closed loop, then there is a deadlock.
The wait-for graph is maintained by the system for every transaction which is waiting for some
data held by the others. The system keeps checking the graph for any cycle.
The wait-for graph for the above scenario is shown below:
Deadlock Prevention
Deadlock prevention method is suitable for a large database. If the resources are allocated in such a way
that deadlock never occurs, then the deadlock can be prevented.
The database management system analyzes the operations of the transaction to determine whether they
can create a deadlock situation. If they can, the DBMS never allows that transaction to be executed.
Wait-Die scheme
In this scheme, if a transaction requests for a resource which is already held with a conflicting lock by
another transaction then the DBMS simply checks the timestamp of both transactions. It allows the
older transaction to wait until the resource is available for execution.
Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and Tj has held some resource, then Ti is allowed
to wait until the data-item is available for execution.
That means if the older transaction is waiting for a resource which is locked by the younger transaction,
then the older transaction is allowed to wait for resource until it is available.
Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and has held some resource, and Tj is waiting for it,
then Tj is killed and restarted later with a random delay but with the same timestamp.
Wound-Wait scheme
In the wound-wait scheme, if the older transaction requests a resource which is held by the younger
transaction, then the older transaction forces the younger one to abort and release the resource.
After a small delay, the younger transaction is restarted but with the same timestamp.
If the older transaction has held a resource which is requested by the Younger transaction, then the
younger transaction is asked to wait until older releases it.
DEADLOCK : Deadlock is a state where two transactions each wait for a resource which is locked or
held by the other transaction.
Deadlock can be prevented by making all the transactions acquire their locks at the same instance of
time.
If a deadlock occurs anyway, the only way to cure it is to abort one of the transactions.
SHARED LOCK : A shared lock allows more than one transaction to read the data items.
SIMPLISTIC LOCK PROTOCOL : It is the simplest way of locking the data during a transaction. Simplistic
lock-based protocols allow every transaction to get a lock on the data before an insert, delete or update
on it, and to unlock the data item after completing the transaction.
The two-phase locking protocol divides the execution phase of the transaction into three parts.
In the first part, when the execution of the transaction starts, it seeks permission for the locks it
requires.
In the second part, the transaction acquires all the locks. The third phase starts as soon as the
transaction releases its first lock.
In the third phase, the transaction cannot demand any new locks. It only releases the acquired locks.
TIME STAMPING PROTOCOL -> The basic idea of the time stamping protocol is to decide the order between
the transactions before they enter the system, so that in case of conflict during execution we can
resolve the conflict using this ordering.
The reason we call it a time stamp and not just a stamp is that the value of the system's clock is taken
when a transaction enters the system (so it will always be unique).
Time stamp with transaction -> With each transaction ti we associate a time stamp denoted by T.S.(ti);
if a new transaction tj enters after ti, then T.S.(ti) < T.S.(tj) and the system ensures that in the resultant
conflict serializable schedule ti will execute before tj.
Time stamp with data item -> For each data item Q, the protocol maintains two values:
Write-Time-Stamp, W.T.S(Q) : the largest time stamp of any transaction that executed write(Q).
Read-Time-Stamp, R.T.S(Q) : the largest time stamp of any transaction that executed read(Q).
If T.S.(ti) >= W.T.S(Q) - the read operation of ti can be allowed, and R.T.S(Q) is set to max(R.T.S(Q), T.S.(ti)).
If T.S.(ti) < R.T.S(Q) - the value of Q that ti is producing was needed previously, and the system
assumed that the value would never be produced; hence reject and roll back.
If T.S.(ti) < W.T.S(Q) - ti is attempting to write an obsolete value of Q; reject and roll back. Otherwise
the write is performed and W.T.S(Q) = max(T.S.(ti), W.T.S(Q)).
There is the possibility of a dirty read; therefore irrecoverable schedules and cascading rollbacks are possible.
Thomas Write Rule : It is a modification of the time stamping protocol which can generate a schedule
that is not conflict serializable but is view serializable.
It modifies the time stamping protocol in the obsolete-write case: when T.S.(ti) < W.T.S(Q) and ti requests
a write, instead of rolling back ti we can simply ignore the write, as the final value would in any case be
that of the transaction with W.T.S(Q).
LOCK BASED PROTOCOL -> To achieve consistency, isolation is the most important idea. Locking is the
simplest idea to achieve isolation: first obtain a lock on a data item, then perform the desired operation,
and then unlock it.
To provide better concurrency along with isolation we use different modes of locking.
Shared Mode : Denoted by lock-S(Q). The transaction can perform a read operation; any other transaction
can also obtain the same lock on the data item at the same time (hence "shared").
Exclusive Mode : Denoted by lock-X(Q). The transaction can perform both read and write operations; no
other transaction can obtain either a shared or an exclusive lock while this lock is held.
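A sketch of the two modes in SQL (Oracle/PostgreSQL-style syntax; the accounts table is assumed):
LOCK TABLE accounts IN SHARE MODE; -- shared lock: other transactions may still read
SELECT * FROM accounts WHERE holder = 'X' FOR UPDATE; -- exclusive row lock for read/write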
If we unlock too early, inconsistency will occur; and if we do not unlock at all, concurrency will be
poor.
So we require that transactions follow a set of rules for locking and unlocking data items. A schedule
is said to be legal under a protocol if it can be generated using the rules of that
protocol.
2 Phase Locking Protocol -> This protocol states that each transaction in a schedule passes through two
phases: i. the Growing Phase and ii. the Shrinking Phase.
In the growing phase, a transaction can only acquire locks but cannot release any lock.
In the shrinking phase, a transaction can only release locks but cannot acquire any lock.
A transaction can perform read and write operations in both phases.
The schedule will be conflict serializable as well as view serializable, and the order in which the
transactions reach their lock points is the order of serializability.
TYPES OF 2 PL :
Conservative 2PL : There is no growing phase, i.e. the transaction first acquires all the locks, reaching
its lock point immediately.
If all the locks are not available, the transaction releases the locks acquired so far and waits.
Here we must know in advance which data items will be required during execution.
Rigorous 2PL : It is an improved version of the 2PL protocol where we try to ensure recoverability and
cascadelessness.
-> Rigorous 2PL requires that all the locks must be held until the transaction commits, i.e. there is no
shrinking phase in the life of a transaction.
It suffers from deadlock and inefficiency: inefficiency because no other transaction can access a data
item locked by another transaction until that transaction has committed.
Strict 2PL :
-> In the shrinking phase, unlocking of exclusive locks is not allowed, but unlocking of shared locks can
be done.
-> All the properties are the same as those of rigorous 2PL, but it provides better concurrency.
GRAPH BASED PROTOCOL -> There are various models that can give additional information about how the
data will be accessed, each differing in the amount of information provided.
Prerequisite :
-> The idea is to have prior knowledge of the order in which the database items will be accessed.
We impose a partial ordering -> on the set of all data items D = { d1, d2, d3, ....... dn }: if di -> dj, then any
transaction accessing both di and dj must access di before dj.
-> The partial ordering may arise from the logical or physical organization of the data, or may be
imposed only for concurrency control.
-> After partial ordering, the set of all data items D is viewed as a directed acyclic graph (DAG).
Multiple Granularity
KEY POINTS :
It can be defined as hierarchically breaking up the database into blocks which can be locked.
The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy can be
graphically represented as a tree.
The first (highest) level represents the entire database. The second level represents nodes of type area;
the database consists of exactly these areas.
The area consists of children nodes which are known as files. No file can be present in more than one
area.
Finally, each file contains child nodes known as records. The file has exactly those records that are its
child nodes. No record is present in more than one file.
Hence, the levels of the tree starting from the top level are as follows:
Database
Area
File
Record
Intention-Shared (IS): It indicates explicit locking at a lower level of the tree, but only with shared locks;
at least one descendant will be locked in shared (S) mode.
Intention-Exclusive (IX): It indicates explicit locking at a lower level with exclusive or shared locks;
at least one descendant will be locked in exclusive (X) mode.
Shared & Intention-Exclusive (SIX): The node is locked in shared mode, and some node below it is
locked in exclusive mode by the same transaction; that is, the node holds a shared lock and at least one
of its descendants holds an exclusive lock.
To determine whether the relation \( R(P, Q, R, S, T) \) is in 2NF, we first need to understand what 2NF
entails and then examine the relation and its functional dependencies (FDs).
1. It is in 1NF (First Normal Form), which means that all the attributes contain only atomic (indivisible)
values.
2. It has no partial dependency; that is, no non-prime attribute (attribute that is not part of a
candidate key) is dependent on a part of any candidate key.
- To determine the candidate keys, we need to find the attribute closure and ensure all attributes can
be derived.
- \( PQ \to R \): \( R \) is non-prime, and \( PQ \) is part of the candidate key \( \{P, Q, S\} \). This is
a partial dependency.
- \( S \to T \): \( T \) is non-prime, and \( S \) is part of the candidate key \( \{P, Q, S\} \). This is a
partial dependency.
Converting to 2NF
1. Remove the partial dependency \( PQ \to R \) by creating a new relation \( R1(P, Q, R) \).
2. Remove the partial dependency \( S \to T \) by creating a new relation \( R2(S, T) \).
3. Remaining Attributes: Create a new relation with the remaining attributes that are part of the
candidate key: \( R3(P, Q, S) \).
The resulting relations are:
1. \( R1(P, Q, R) \)
2. \( R2(S, T) \)
3. \( R3(P, Q, S) \)
- \( R3 \): \( PQS \) is the candidate key, and there are no partial dependencies. (Here, \( P, Q, S \) are
collectively the candidate key, and there are no non-prime attributes to consider.)
PL/SQL Cursor
When an SQL statement is processed, Oracle creates a memory area known as context area. A cursor is
a pointer to this context area. It contains all information needed for processing the statement. In
PL/SQL, the context area is controlled by Cursor. A cursor contains information on a select statement
and the rows of data accessed by it.
A cursor is used by a program to fetch and process the rows returned by an SQL statement,
one at a time. There are two types of cursors:
The implicit cursors are automatically generated by Oracle while an SQL statement is executed, if you
don't use an explicit cursor for the statement.
These are created by default to process the statements when DML statements like INSERT, UPDATE,
DELETE etc. are executed.
The Explicit cursors are defined by the programmers to gain more control over the context area. These
cursors should be defined in the declaration section of the PL/SQL block. It is created on a SELECT
statement which returns more than one row.
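A minimal explicit-cursor sketch in PL/SQL, reusing the Employee table from the trigger example above:
DECLARE
    CURSOR emp_cur IS SELECT Name, Salary FROM Employee; -- explicit cursor declared on a SELECT
    v_name Employee.Name%TYPE;
    v_salary Employee.Salary%TYPE;
BEGIN
    OPEN emp_cur;
    LOOP
        FETCH emp_cur INTO v_name, v_salary; -- process one row at a time
        EXIT WHEN emp_cur%NOTFOUND;
        DBMS_OUTPUT.PUT_LINE(v_name || ': ' || v_salary);
    END LOOP;
    CLOSE emp_cur;
END;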
PL/SQL Trigger
A trigger is invoked by the Oracle engine automatically whenever a specified event occurs. A trigger is
stored in the database and invoked repeatedly whenever the specific condition matches.
Triggers are stored programs which are automatically executed or fired when some event occurs.
Advantages of Triggers
Auditing