Unit 3 NEP DBMS
Introduction
A database anomaly is a flaw in a database that results from poor planning and from storing
everything in a flat database. Anomalies occur when there is too much redundancy in the database.
Poor table design can also leave related data scattered over various tables, so that any new change
must be made in many places. It is also possible that the information is only half present: it is
there in one table but missing in another one. Let's understand this with an example.
Example
Assume a manufacturing company stores employee details in a table called employee,
which has four attributes:
• Emp_id for employee's id.
• Emp_name for employee's name.
• Emp_address for employee's address.
• Emp_dept for the department details in which the employee works.
The table will look like this. The table given below is not normalized. We will see how
problems arise when a table is not normalized.
Types of Anomalies
Insert anomaly
If a tuple is inserted into a referencing relation and its referencing attribute value is not
present in the referenced attribute, the insertion into the referencing relation is not allowed.
Example
Assume that a new employee is joining the company under training and is not yet assigned to any
department. Then we cannot insert the data into the table if the emp_dept field doesn't allow
nulls.
Update anomaly
If a tuple is updated in a referenced relation and the referenced attribute value is used by
a referencing attribute in a referencing relation, the update of the tuple in the referenced
relation is not allowed.
Example
In the given table, we have two rows for an employee named Rick, and he belongs to two
different departments of the company. If we need to update Rick's address, we must update the
same address in two rows. Otherwise, the data will become inconsistent.
If we update the address in one row but not in the other, then
according to the database, Rick will have two different addresses, which is not correct and
leaves the data inconsistent.
Delete anomaly
If a tuple is deleted from a referenced relation and the referenced attribute value is used by
a referencing attribute in a referencing relation, the deletion of the tuple from the referenced
relation is not allowed.
Example
Assume that the company closes the department D890. Deleting the rows that have
emp_dept as D890 would also delete the information of employee Maggie, since she is assigned
only to this department.
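The update and delete anomalies described above can be reproduced on a tiny in-memory table. This is only a sketch: the employee ids, addresses, and the department codes D001 and D002 are invented for illustration, and only Rick, Maggie, and D890 come from the text.

```python
# Hypothetical rows of the unnormalized employee table (ids, addresses and the
# codes D001/D002 are made up; Rick, Maggie and D890 come from the examples).
employee = [
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D001"},
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Maggie", "emp_address": "Agra", "emp_dept": "D890"},
]


def addresses_of(rows, name):
    """Return the set of distinct addresses stored for one employee name."""
    return {row["emp_address"] for row in rows if row["emp_name"] == name}


# Update anomaly: only one of Rick's two rows receives the new address.
partially_updated = [dict(row) for row in employee]
partially_updated[0]["emp_address"] = "Mumbai"
rick_addresses = addresses_of(partially_updated, "Rick")

# Delete anomaly: closing D890 removes the only row that mentions Maggie.
after_closing = [row for row in employee if row["emp_dept"] != "D890"]
maggie_still_known = any(row["emp_name"] == "Maggie" for row in after_closing)
```

After the partial update, `rick_addresses` holds two different values, and after the department is closed, `maggie_still_known` is false, which is exactly the inconsistency and data loss the examples warn about.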
Removal of Anomalies
To prevent anomalies, we need to normalize the database by efficiently organizing the data in
a database. Normalization is a systematic approach to eliminate data redundancy and Insertion,
Modification, and Deletion Anomalies by decomposing tables. The database designer
organizes the data to eliminate unnecessary duplications and provides a quick search path to
all necessary information.
According to Edgar F. Codd, the inventor of the relational model, the goals of normalization
include:
• removing all redundant (or repeated) data from the database
• removing undesirable insertion, update, and deletion dependencies
• reducing the need to restructure the entire database every time new fields are added to it
• making the relationships between tables more useful and understandable.
Relational Decomposition
o When a relation in the relational model is not in an appropriate normal form, the
decomposition of the relation is required.
o In a database, decomposition breaks a table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
o If no information is lost from the relation that is decomposed, then the
decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed relations results in the same
relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations
gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:
Employee ⋈ Department
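The join above can be sketched in a few lines of Python. The rows are the ones listed for the two tables; the column order (EMP_ID first in EMPLOYEE, EMP_ID second in DEPARTMENT) and the helper name `natural_join` are assumptions made for this illustration. The sketch joins the two relations on EMP_ID, projects the result back onto the two schemas, and joins again to confirm that nothing is lost or invented.

```python
# Rows from the EMPLOYEE and DEPARTMENT tables above.
# Assumed column order: EMPLOYEE(EMP_ID, name, age, city),
# DEPARTMENT(DEPT_ID, EMP_ID, dept name).
EMPLOYEE = [
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]
DEPARTMENT = [
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
]


def natural_join(employees, departments):
    """Join on EMP_ID (assumes EMP_ID is unique in the employee relation)."""
    by_id = {emp[0]: emp for emp in employees}
    return [by_id[d[1]] + (d[0], d[2]) for d in departments if d[1] in by_id]


joined = natural_join(EMPLOYEE, DEPARTMENT)

# Project the joined relation back onto the two schemas and join again.
emp_projection = sorted({row[:4] for row in joined})
dept_projection = sorted({(row[4], row[0], row[5]) for row in joined})
rejoined = natural_join(emp_projection, dept_projection)

is_lossless = sorted(rejoined) == sorted(joined)
```

Here `is_lossless` is true because EMP_ID, the common column, is a key of EMPLOYEE, so every DEPARTMENT row matches exactly one EMPLOYEE row.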
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed
table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R either
must be a part of R1 or R2 or must be derivable from the combination of the functional
dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with the functional dependency set
(A->BC). The relation R is decomposed into R1(ABC) and R2(AD), which is
dependency preserving because the FD A->BC is a part of relation R1(ABC).
Functional Dependency
A functional dependency is a relationship that exists between two attributes or sets of
attributes. It typically exists between the primary key and a non-key attribute within a table.
X → Y
The left side of the FD is known as the determinant, and the right side of the FD is known as the
dependent.
For example:
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Example:
1. ID → Name,
2. Name → DOB
Armstrong's Axioms
A. Primary Rules
Rule 1 Reflexivity
If A is a set of attributes and B is a subset of A, then A holds B. {A → B}
Rule 2 Augmentation
If A holds B and C is a set of attributes, then AC holds BC. {AC → BC}
It means that adding the same attributes to both sides of a dependency does not change the
basic dependency.
Rule 3 Transitivity
If A holds B and B holds C, then A holds C.
If {A → B} and {B → C}, then {A → C}
A holds B {A → B} means that A functionally determines B.
B. Secondary Rules
Rule 1 Union
If A holds B and A holds C, then A holds BC.
If {A → B} and {A → C}, then {A → BC}
Rule 2 Decomposition
If A holds BC, then A holds B and A holds C.
If {A → BC}, then {A → B} and {A → C}
A set of functional dependencies cannot be reduced any further if it has the following
properties:
1. The right-hand side of every functional dependency holds only one attribute.
2. The left-hand side of a functional dependency cannot be reduced without changing the
content of the set.
3. Removing any functional dependency changes the content of the set.
A set of functional dependencies with the above three properties is called canonical
or minimal.
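Trivial Functional Dependency
A functional dependency A → B is trivial if B is a subset of A. Such a dependency follows
directly from the reflexivity rule.
Example (applying the inference rules):
Consider relation E = (P, Q, R, S, T, U) having the following set of functional dependencies (FD):
P → Q P → R
QR → S Q → T
QR → U PR → U
Show that each of the following functional dependencies can be derived from this set:
1. P → T
2. PR → S
3. QR → SU
4. PR → SU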
Solution:
1. P → T
In the above FD set, P → Q and Q → T
So, Using Transitive Rule: If {A → B} and {B → C}, then {A → C}
∴ If P → Q and Q → T, then P → T.
P → T
2. PR → S
In the above FD set, P → Q
As, QR → S
So, Using Pseudo Transitivity Rule: If {A → B} and {BC → D}, then {AC → D}
∴ If P → Q and QR → S, then PR → S.
PR → S
3. QR → SU
In the above FD set, QR → S and QR → U
So, Using Union Rule: If {A → B} and {A → C}, then {A → BC}
∴ If QR → S and QR → U, then QR → SU.
QR → SU
4. PR → SU
From part 2, PR → S, and in the above FD set, PR → U
So, Using Union Rule: If {A → B} and {A → C}, then {A → BC}
∴ If PR → S and PR → U, then PR → SU.
PR → SU
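The four derivations can also be checked mechanically with an attribute closure: X → Y follows from an FD set exactly when Y is contained in the closure X+. The sketch below is a minimal illustration, not part of the original solution. The function names `closure` and `follows` are chosen for this example, and each attribute is assumed to be a single letter so that a string such as "QR" can stand for the set {Q, R}.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The FD set of relation E from the example (single-letter attributes).
FDS = [("P", "Q"), ("P", "R"), ("QR", "S"), ("Q", "T"), ("QR", "U"), ("PR", "U")]


def follows(lhs, rhs, fds=FDS):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


results = {
    "P -> T": follows("P", "T"),
    "PR -> S": follows("PR", "S"),
    "QR -> SU": follows("QR", "SU"),
    "PR -> SU": follows("PR", "SU"),
}
```

All four entries of `results` are true, which agrees with the step-by-step derivations using the transitivity, pseudo transitivity, and union rules.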
What is the minimal set of functional dependencies or canonical cover of FD?
A minimal cover of a set of functional dependencies (FD) E is a minimal set of dependencies
F that is equivalent to E.
The formal definition is: a set of FDs F is minimal if it satisfies the following conditions −
• Every dependency in F has a single attribute for its right-hand side.
• We cannot replace any dependency X->A in F with a dependency Y->A, where Y is a
proper subset of X, and still have a set of dependencies that is equivalent to F.
• We cannot remove any dependency from F and still have a set of dependencies that are
equivalent to F.
Canonical cover is also called minimal cover, that is, a minimum set of FDs. A set of FDs
FC is called a canonical cover of F if each FD in FC is a −
• Simple FD.
• Left reduced FD.
• Non-redundant FD.
Simple FD − X->Y is a simple FD if Y is a single attribute.
Left reduced FD − X->Y is a left reduced FD if there are no extraneous attributes in X.
{extraneous attribute: given XA->Y, A is an extraneous attribute if X->Y}
Non-redundant FD − X->Y is a non-redundant FD if it cannot be derived from F - {X->Y}.
Example
Consider an example to find canonical cover of F.
The given functional dependencies are as follows −
A -> BC
B -> C
A -> B
AB -> C
• Minimal cover: The minimal cover is the set of FDs which are equivalent to the given
FDs.
• Canonical cover: In canonical cover, the LHS (Left Hand Side) must be unique.
First of all, we will find the minimal cover and then the canonical cover.
First step − Convert RHS attribute into singleton attribute.
A -> B
A -> C
B -> C
A -> B
AB -> C
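Second step − Remove the extra LHS attribute
Only AB -> C has more than one attribute on its LHS. Find the closure of A.
A+ = {A, B, C}
Since A alone already determines C, B is extraneous, so AB -> C can be converted into A -> C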
A -> B
A -> C
B -> C
A -> B
A -> C
Third step − Remove the redundant FDs.
A -> B
B -> C
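Now, we will convert the above set of FDs into canonical cover.
Every LHS in the minimal cover is already unique, so there is nothing to merge. A -> C must not
be added back, because it is redundant: it follows from A -> B and B -> C.
The canonical cover for the above set of FDs will be as follows −
A -> B
B -> C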
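The three steps can be sketched as a small Python routine. This is an illustrative implementation under simple assumptions (single-letter attributes, FDs given as pairs of strings), and the names `closure` and `minimal_cover` are chosen for this example. In the second step the routine tries the attributes of each LHS in alphabetical order, so it may drop a different extraneous attribute than the hand calculation does (here it reduces AB -> C to B -> C rather than A -> C), but the final result is the same.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # First step: make every RHS a single attribute (exact duplicates dropped).
    single = []
    for lhs, rhs in fds:
        for attr in rhs:
            fd = (frozenset(lhs), attr)
            if fd not in single:
                single.append(fd)

    # Second step: remove extraneous attributes from each LHS.
    reduced = []
    for lhs, rhs in single:
        lhs = set(lhs)
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs in closure(lhs - {attr}, single):
                lhs.remove(attr)
        fd = (frozenset(lhs), rhs)
        if fd not in reduced:
            reduced.append(fd)

    # Third step: drop every FD that the remaining FDs already imply.
    cover = list(reduced)
    for fd in list(cover):
        rest = [other for other in cover if other != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
cover = minimal_cover(F)
readable = [f"{''.join(sorted(lhs))} -> {rhs}" for lhs, rhs in cover]
```

For the FD set of the example, `readable` contains A -> B and B -> C, the same two dependencies obtained in the third step above.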
Normalization
A large database defined as a single relation may result in data duplication. This repetition of
data may result in the insertion, update, and deletion anomalies described earlier.
So to handle these problems, we should analyze and decompose the relations with redundant
data into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It
is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion
Anomalies.
o Normalization divides a larger table into smaller ones and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
o Insertion Anomaly: The insertion anomaly occurs when one cannot insert a new tuple
into a relation due to lack of data.
o Deletion Anomaly: The deletion anomaly refers to the situation where the deletion of
data results in the unintended loss of some other important data.
o Update Anomaly: The update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.
Normalization works through a series of stages called normal forms. The normal forms apply
to individual relations. A relation is said to be in a particular normal form if it satisfies
the constraints of that form.
Normal Form    Description
2NF            A relation will be in 2NF if it is in 1NF and all non-key attributes are
               fully functionally dependent on the primary key.
4NF            A relation will be in 4NF if it is in Boyce-Codd normal form and has
               no multi-valued dependency.
5NF            A relation is in 5NF if it is in 4NF and does not contain any join
               dependency, and joining should be lossless.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
EMPLOYEE table:
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
Example: Let's assume a school stores the data of teachers and the subjects they teach. In
a school, a teacher can teach more than one subject.
TEACHER table:
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
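The decomposition can be reproduced by projecting the TEACHER rows onto the two smaller schemas. The sketch below uses the rows from the tables above; the helper `holds`, which tests a functional dependency between two column positions, is a name introduced for this illustration only.

```python
# Rows of the TEACHER table: (TEACHER_ID, SUBJECT, TEACHER_AGE).
TEACHER = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]


def holds(rows, lhs_index, rhs_index):
    """True if the column at lhs_index functionally determines the one at rhs_index."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs_index], row[rhs_index]) != row[rhs_index]:
            return False
    return True


# TEACHER_AGE depends on TEACHER_ID alone, so it moves to its own table.
age_depends_on_id = holds(TEACHER, 0, 2)
# SUBJECT is not determined by TEACHER_ID, so it stays paired with the id.
subject_depends_on_id = holds(TEACHER, 0, 1)

teacher_detail = sorted({(tid, age) for tid, _subject, age in TEACHER})
teacher_subject = [(tid, subject) for tid, subject, _age in TEACHER]
```

`teacher_detail` stores each teacher's age once, so the repeated ages in the original table disappear, while `teacher_subject` keeps one row per subject taught.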
Third Normal Form (3NF)
o A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency of a non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation is
in third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every
non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
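Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID, so the
non-prime attributes EMP_STATE and EMP_CITY are transitively dependent on EMP_ID. This
violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.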
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
EMPLOYEE table:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
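Candidate keys: EMP_ID for the EMP_COUNTRY table, EMP_DEPT for the EMP_DEPT table, and
{EMP_ID, EMP_DEPT} for the EMP_DEPT_MAPPING table.
Now, this is in BCNF because the left side of both functional dependencies is a key.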
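The BCNF condition can be tested with an attribute closure: a non-trivial FD X → Y violates BCNF when the closure of X does not cover every attribute of the relation, that is, when X is not a super key. The sketch below is a minimal check under that definition. The attribute names and the two FDs come from the example; the function names and the assignment of FDs to the three decomposed tables are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Closure of a set of attribute names under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


FD1 = (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"}))
FD2 = (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"}))

original = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
violations_before = bcnf_violations(original, [FD1, FD2])

# Each decomposed table with the FDs that apply to it (assumed assignment).
decomposed = {
    "EMP_COUNTRY": ({"EMP_ID", "EMP_COUNTRY"}, [FD1]),
    "EMP_DEPT": ({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [FD2]),
    "EMP_DEPT_MAPPING": ({"EMP_ID", "EMP_DEPT"}, []),
}
violations_after = {
    name: bcnf_violations(attrs, fds) for name, (attrs, fds) in decomposed.items()
}
```

Both FDs violate BCNF in the original table, and none of the three decomposed tables has a violation.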
Transaction
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Read(X): The read operation reads the value of X from the database and stores it in a
buffer in main memory.
Write(X): The write operation writes the value back to the database from the buffer.
Let's take an example of a debit transaction on an account, which consists of the following
operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before the transaction starts is 4000.
o The first operation reads X's value from the database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So the buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will
be 3500.
But it may be possible that, because of a failure of hardware, software, or power, the
transaction fails before finishing all the operations in the set.
For example: If the above debit transaction fails after executing operation
2, then X's value will remain 4000 in the database, which is not acceptable to the bank.
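The three operations and the failure case can be simulated with a dictionary standing in for the database and a local variable standing in for the main-memory buffer. This is only a sketch: the `debit` function, the `SimulatedFailure` exception, and the `fail_before_write` switch are names invented for this illustration.

```python
class SimulatedFailure(Exception):
    """Stands in for a hardware, software, or power failure."""


def debit(database, amount, fail_before_write=False):
    buffer = database["X"]      # 1. R(X): copy the value into the buffer
    buffer = buffer - amount    # 2. X = X - amount: only the buffer changes
    if fail_before_write:
        raise SimulatedFailure("failure after operation 2")
    database["X"] = buffer      # 3. W(X): write the buffer back


# Normal run: all three operations complete.
db_ok = {"X": 4000}
debit(db_ok, 500)

# Failure after operation 2: the database never sees the new value.
db_failed = {"X": 4000}
try:
    debit(db_failed, 500, fail_before_write=True)
except SimulatedFailure:
    pass
```

After the normal run `db_ok["X"]` is 3500, while in the failed run `db_failed["X"]` is still 4000 because the write never reached the database.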
Concurrency control is the management procedure that is required for controlling the concurrent
execution of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent execution.
In a database transaction, the two main operations are READ and WRITE. So,
there is a need to manage these two operations in the concurrent execution of the transactions,
because if these operations are interleaved without control, the data may become
inconsistent. So, the following problems occur with the concurrent execution of the
operations:
The lost update problem occurs when two different database transactions perform read/write
operations on the same database items in an interleaved manner (i.e., concurrent execution)
that makes the values of the items incorrect, hence making the database inconsistent.
For example:
Consider the below diagram where two transactions TX and TY, are performed on the
same account A where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be $300
only because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400 (only added but
not updated/write).
o At time t6, transaction TX writes the value of account A that will be updated as $250
only, as TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the value of account A, so it will write the
value computed at time t4, which is $400. It means the value written by TX is lost, i.e., $250 is
lost.
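The interleaving above can be replayed step by step in Python. The sketch uses a dictionary for the database and two local variables for the private copies held by TX and TY; the variable names are chosen for this illustration.

```python
account = {"A": 300}

tx_local = account["A"]       # t1: TX reads A (300)
tx_local = tx_local - 50      # t2: TX deducts 50 in its own copy (250)
ty_local = account["A"]       # t3: TY reads A, still 300
ty_local = ty_local + 100     # t4: TY adds 100 in its own copy (400)

account["A"] = tx_local       # t6: TX writes 250
after_tx_write = account["A"]

account["A"] = ty_local       # t7: TY writes 400 and overwrites TX's update
final_balance = account["A"]

# Result if the two transactions had run one after the other.
serial_balance = 300 - 50 + 100
```

The final balance is $400, whereas running TX and TY one after the other would give $350, so the $50 deduction made by TX has been lost.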
The dirty read problem occurs when one transaction updates an item of the database and
then fails, and before the update is rolled back, the updated database item is
accessed by another transaction. This creates a write-read conflict between the two
transactions.
For example:
Consider two transactions TX and TY in the below diagram performing read/write
operations on account A where the available balance in account A is $300:
Failure Classification
To find where the problem has occurred, we generalize a failure into the following
categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or when it reaches a point from
where it can't go any further. If a few transactions or processes are affected, then this is called
a transaction failure.
Reasons for a transaction failure could be -
2. System crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.
3. Disk failure
o It occurs where hard-disk drives or storage drives used to fail frequently. It was
a common problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, disk head crash, and
unreachability to the disk or any other failure, which destroys all or part of disk
storage.
States of Transaction
Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the records
are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the data
is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed
in this state.
Committed
Failed state
o If any of the checks made by the database recovery system fails, then the transaction is
said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached a failed state, then the database
recovery system will make sure that the database is in its previous consistent state. If
it is not, the recovery system will abort or roll back the transaction to bring the database
into a consistent state.
o If the transaction fails in the middle of its execution, all the operations it has already
executed are rolled back, so that the database returns to the consistent state it was in
before the transaction started.
o After aborting the transaction, the database recovery module will select one of the two
operations:
1. Re-start the transaction
2. Kill the transaction
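The states described in this section can be modelled as a small state machine. The transitions used below (active to partially committed or failed, partially committed to committed or failed, failed to aborted) follow the usual transaction state diagram and are an assumption of this sketch, since the text does not list them explicitly. The names `State`, `ALLOWED`, `advance`, and `run` are chosen for this illustration.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# Assumed transitions, following the usual transaction state diagram.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),
    State.ABORTED: set(),
}


def advance(current, target):
    """Move to the target state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def run(path):
    """Start in the active state and follow the given sequence of states."""
    state = State.ACTIVE
    for target in path:
        state = advance(state, target)
    return state


successful = run([State.PARTIALLY_COMMITTED, State.COMMITTED])
rolled_back = run([State.FAILED, State.ABORTED])
```

A committed or aborted transaction has no outgoing transitions in this model; restarting after an abort would be represented as a new transaction that begins again in the active state.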
A DBMS manages data that should keep its integrity when any changes are made to
it. If the integrity of the data is affected, the whole data will get disturbed and
corrupted. Therefore, to maintain the integrity of the data, there are four properties described
in the database management system, which are known as the ACID properties. The ACID
properties apply to a transaction, which goes through a group of tasks, and that is where
we come to see the role of the ACID properties.
In this section, we will learn and understand the ACID properties. We will learn what
these properties stand for and what each property is used for. We will also understand the
ACID properties with the help of some examples.
ACID Properties
1) Atomicity: The term atomicity means that a transaction is treated as a single, indivisible
unit. It means if any operation is performed on the data, either it should be executed
completely or it should not be executed at all. It further means that the operation should not
break in between or execute partially.
Example: Remo has account A with $30, from which he wishes to send $10
to Sheero's account, which is B. In account B, a sum of $100 is already present. When $10
is transferred to account B, the sum will become $110. Now, there will be two operations
that will take place. One is that the amount of $10 that Remo wants to transfer will be debited
from his account A, and the other is that the same amount will get credited to account B, i.e.,
into Sheero's account. Now, suppose the first operation of debit executes successfully, but the
credit operation fails. Thus, in Remo's account A, the value becomes $20, and Sheero's
account B remains at $100 as it was previously.
In the above diagram, it can be seen that after debiting $10 from account A, the amount is still
$100 in account B. So, it is not an atomic transaction.
The below image shows that both debit and credit operations are done successfully. Thus the
transaction is atomic.
Thus, when a transaction loses atomicity, this becomes a huge issue in bank systems,
and so atomicity is the main focus in bank systems.
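The transfer between Remo and Sheero can be sketched with Python's built-in sqlite3 module, where a connection used as a context manager commits on success and rolls back if an exception is raised. The table name `account`, the function names, and the `fail_after_debit` switch are assumptions made for this illustration; the balances are the ones from the example.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_after_debit=False):
    """Debit src and credit dst as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_after_debit:
                raise RuntimeError("simulated failure before the credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()

# The credit fails: the debit is rolled back, so neither balance changes.
failed_result = transfer(conn, "A", "B", 10, fail_after_debit=True)
after_failure = balances(conn)

# Both operations succeed: the transfer is applied as a whole.
ok_result = transfer(conn, "A", "B", 10)
after_success = balances(conn)
```

In the failed run the balances stay at $30 and $100 instead of ending at $20 and $100, and in the successful run they become $20 and $110.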
2) Consistency: The word consistency means that the value should always remain preserved.
In DBMS, the integrity of the data should be maintained, which means if a change in the
database is made, it should always remain preserved. In the case of transactions, the integrity
of the data is essential so that the database remains consistent before and after the
transaction. The data should always be correct.
Example:
In the above figure, there are three accounts, A, B, and C, where A is making a transaction T
one by one to both B and C. There are two operations that take place, i.e., debit and credit.
Account A first debits $50 to account B, and the amount in account A is read as $300 by B
before the transaction. After the successful transaction T, the available amount in B becomes
$150. Now, A debits $20 to account C, and at that time, the value read by C is $250 (that is
correct, as a debit of $50 has been successfully done to B). The debit and credit operation from
account A to C has been done successfully. We can see that the transaction is done successfully,
and the value is also read correctly. Thus, the data is consistent. If the value read by both B
and C were $300, the data would be inconsistent, because the debit of $50 to B would not be
reflected in the value read by C.
3) Isolation: The term 'isolation' means separation. In DBMS, isolation is the property of a
database where transactions may occur concurrently without affecting one another. In short, the
result should be the same as if one operation began only when the other operation was complete.
It means if two operations are being performed on two different databases, they must not affect
the value of one another. In the case of transactions, when two or more transactions occur
simultaneously, the consistency should remain maintained. Any changes that occur in any
particular transaction will not be seen by other transactions until the change is committed.
Example: If two operations are concurrently running on two different accounts, then the value
of both accounts should not get affected. The value should remain consistent. As you can see
in the below diagram, account A is making T1 and T2 transactions to accounts B and C, but
both are executing independently without affecting each other. It is known as isolation.
4) Durability: Durability ensures the permanency of something. In DBMS, the term durability
ensures that the data, after the successful execution of the operation, becomes permanent in the
database. The data should be so durable that even if the system fails or
crashes, the database still survives. However, if data gets lost, it becomes the responsibility
of the recovery manager to ensure the durability of the database. For committing the values, the
COMMIT command must be used every time we make changes.
Therefore, the ACID properties of DBMS play a vital role in maintaining the consistency and
availability of data in the database.
Thus, this was a brief introduction to the ACID properties in DBMS. We have discussed these
properties in the transaction section also.