Unit 3 NEP DBMS

The document discusses database anomalies caused by poor design and redundancy in relational models, highlighting types such as insert, update, and delete anomalies. It emphasizes the importance of normalization to eliminate these issues by organizing data efficiently, leading to lossless decomposition and dependency preservation. Additionally, it covers functional dependencies and Armstrong's Axioms as tools for reasoning about these dependencies in database design.

Uploaded by Nishath Ashraf

Unit 3: Data Normalization

Anomalies in Relational Model

Introduction
A database anomaly is a flaw that arises from poor planning and from storing everything in a single flat table. Anomalies occur when there is too much redundancy in the database. Poor table design leaves related data scattered over various tables, so any change in the database must be updated in many places. It is also possible that the information is only half present: it exists in one table but is missing in another. Let's understand this with an example.
Example
Assume a manufacturing company stores employee details in a table called EMPLOYEE
having four attributes:
• Emp_id for employee's id.
• Emp_name for employee's name.
• Emp_address for employee's address.
• Emp_dept for the department details in which the employee works.

The table given below is not normalized. We will see how problems arise when a table is
not normalized; the examples that follow refer to an employee Rick who works in two
departments and an employee Maggie assigned only to department D890.

Types of Anomalies
Insert anomaly
If a tuple is inserted into the referencing relation and the referencing attribute value is not
present in the referenced attribute, the insertion into the referencing relation will not be allowed.
Example
Assume that a new employee is joining the company under training and not assigned to any
department. Then, we would not insert the data into the table if the emp_dept field doesn't allow
nulls.
Update anomaly
If the same piece of information is stored redundantly in multiple rows, a single logical change
must be made in every one of those rows. If one copy is updated but another is missed, the
relation becomes inconsistent.
Example
In the given table, we have two rows for an employee named Rick, and he belongs to two
different departments of the company. If we need to update Rick's address, we must update the
same address in two rows. Otherwise, the data will become inconsistent.
If, in some way, we can update the correct address in one department but not the other, then
according to the database, Rick will have two different addresses, which is not correct and
would lead to inconsistent data.
Delete anomaly:
If a tuple is deleted from the relation, other information that is stored only in that tuple is
unintentionally lost along with it.
Example
Assume that if the company closes the department D890, then deleting the rows that have
emp_dept as D890 would also delete the information of employee Maggie since she is assigned
only to this department.
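The three anomalies can be seen concretely on a small flat table. A minimal sketch in Python: the employee names and department D890 follow the text, while the addresses and the other department codes are illustrative assumptions.

```python
# A minimal sketch of anomalies in an unnormalized flat table.
# Names and D890 follow the text; addresses and other department
# codes are illustrative assumptions.
employees = [
    {"emp_id": 1, "emp_name": "Rick",   "emp_address": "Delhi", "emp_dept": "D001"},
    {"emp_id": 1, "emp_name": "Rick",   "emp_address": "Delhi", "emp_dept": "D002"},
    {"emp_id": 2, "emp_name": "Maggie", "emp_address": "Agra",  "emp_dept": "D890"},
]

# Update anomaly: changing Rick's address in only one row
# leaves the table inconsistent.
employees[0]["emp_address"] = "Mumbai"
addresses = {row["emp_address"] for row in employees if row["emp_name"] == "Rick"}
print(addresses)  # two different addresses for the same employee

# Delete anomaly: closing department D890 also erases all trace of Maggie.
employees = [row for row in employees if row["emp_dept"] != "D890"]
print(any(row["emp_name"] == "Maggie" for row in employees))  # False
```

Normalization removes both problems by storing each fact exactly once.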
Removal of Anomalies
To prevent anomalies, we need to normalize the database by efficiently organizing the data in
a database. Normalization is a systematic approach to eliminate data redundancy and Insertion,
Modification, and Deletion Anomalies by decomposing tables. The database designer
organizes the data to eliminate unnecessary duplications and provides a quick search path to
all necessary information.
According to Edgar F. Codd, the inventor of the relational model, the goals of normalization
include:
• removing all redundant (or repeated) data from the database
• removing undesirable insertions, updates, and deletion dependencies
• reducing the need to restructure the entire database every time new fields are added to it
• making the relationships between tables more useful and understandable.
Relational Decomposition

o When a relation in the relational model is not in an appropriate normal form,
decomposition of the relation is required.
o In a database, decomposition breaks a table into multiple tables.
o If the relation has no proper decomposition, it may lead to problems like loss of
information.
o Decomposition is used to eliminate problems of bad design such as anomalies,
inconsistencies, and redundancy.
Types of Decomposition

Lossless Decomposition
o If no information is lost from the relation that is decomposed, the decomposition is
lossless.
o A lossless decomposition guarantees that the join of the decomposed relations results
in the same relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations
gives back the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales


33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.
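The lossless property can be checked mechanically: project the rows into the two decomposed relations, natural-join them back, and compare with the original. A minimal sketch in Python, using the first two rows of the tables above:

```python
# Natural join of two relations (lists of dicts), joining on shared attributes.
def natural_join(r1, r2):
    common = set(r1[0]) & set(r2[0])
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})
    return result

# First two rows of the EMPLOYEE_DEPARTMENT relation from the text.
original = [
    {"EMP_ID": 22, "EMP_NAME": "Denim", "EMP_AGE": 28, "EMP_CITY": "Mumbai",
     "DEPT_ID": 827, "DEPT_NAME": "Sales"},
    {"EMP_ID": 33, "EMP_NAME": "Alina", "EMP_AGE": 25, "EMP_CITY": "Delhi",
     "DEPT_ID": 438, "DEPT_NAME": "Marketing"},
]

# Project into the two decomposed relations.
employee = [{k: t[k] for k in ("EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY")}
            for t in original]
department = [{k: t[k] for k in ("DEPT_ID", "EMP_ID", "DEPT_NAME")}
              for t in original]

# Joining back on the common column EMP_ID recovers the original rows.
joined = natural_join(employee, department)
print(sorted(joined, key=lambda t: t["EMP_ID"]) ==
      sorted(original, key=lambda t: t["EMP_ID"]))  # True: lossless
```

If the decomposition had not kept EMP_ID in both tables, the join would produce spurious or missing tuples and the comparison would fail.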

Dependency Preserving
o It is an important property of a decomposition.
o In dependency preservation, every dependency of the original relation must be satisfied
by at least one decomposed table or be derivable from the dependencies of the
decomposed tables.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R either
must be a part of R1 or R2 or must be derivable from the combination of the functional
dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional dependency set
{A → BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is
dependency preserving because the FD A → BC is a part of relation R1(ABC).

Functional Dependency

The functional dependency is a relationship that exists between two attributes. It typically
exists between the primary key and non-key attribute within a table.

X → Y

The left side of the FD is known as the determinant; the right side is known as the dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional dependency

1. Trivial functional dependency


o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies like A → A and B → B are also trivial.

Consider a table with two columns, Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, since
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies.

2. Non-trivial functional dependency


o A → B is a non-trivial functional dependency if B is not a subset of A.
o When A ∩ B = Φ, A → B is called completely non-trivial.

Example:

ID → Name
Name → DOB

Armstrong's Axioms

Introduction to Axioms Rules

• Armstrong's Axioms are a set of inference rules.
• They provide a simple technique for reasoning about functional dependencies.
• They were developed by William W. Armstrong in 1974.
• They are used to infer all the functional dependencies that hold on a relational database.

Various Axioms Rules

A. Primary Rules

Rule 1 Reflexivity
If A is a set of attributes and B is a subset of A, then A holds B. { A → B }

Rule 2 Augmentation
If A holds B and C is a set of attributes, then AC holds BC. {AC → BC}
It means that augmenting both sides of a dependency with the same attributes does not change
the basic dependency.

Rule 3 Transitivity
If A holds B and B holds C, then A holds C.
If {A → B} and {B → C}, then {A → C}
A holds B {A → B} means that A functionally determines B.

B. Secondary Rules

Rule 1 Union
If A holds B and A holds C, then A holds BC.
If {A → B} and {A → C}, then {A → BC}

Rule 2 Decomposition
If A holds BC, then A holds B and A holds C.
If {A → BC}, then {A → B} and {A → C}

Rule 3 Pseudo Transitivity


If A holds B and BC holds D, then AC holds D.
If {A → B} and {BC → D}, then {AC → D}

A set of functional dependencies cannot be reduced further if it has the following
properties:

1. The right-hand side of every functional dependency holds only one attribute.
2. The left-hand side of any functional dependency cannot be reduced without changing the
content of the set.
3. No functional dependency can be removed without changing the content of the set.

A set of functional dependencies with the above three properties is called Canonical
or Minimal.
Trivial Functional Dependency

Trivial: If A holds B {A → B}, where B is a subset of A, then it is called a Trivial
Functional Dependency. A trivial functional dependency always holds.

Non-Trivial: If A holds B {A → B}, where B is not a subset of A, then it is called a
Non-Trivial Functional Dependency.

Completely Non-Trivial: If A holds B {A → B}, where A ∩ B = Φ, then it is called a
Completely Non-Trivial Functional Dependency.

Example:
Consider relation E = (P, Q, R, S, T, U) having set of Functional Dependencies (FD).

P → Q
P → R
QR → S
Q → T
QR → U
PR → U

Using the axioms, derive the following dependencies:

1. P → T
2. PR → S
3. QR → SU
4. PR → SU

Solution:

1. P → T
In the above FD set, P → Q and Q → T
So, Using Transitive Rule: If {A → B} and {B → C}, then {A → C}
∴ If P → Q and Q → T, then P → T.
P → T

2. PR → S
In the above FD set, P → Q
As, QR → S
So, Using Pseudo Transitivity Rule: If {A → B} and {BC → D}, then {AC → D}
∴ If P → Q and QR → S, then PR → S.
PR → S

3. QR → SU
In above FD set, QR → S and QR → U
So, Using Union Rule: If {A → B} and {A → C}, then {A → BC}
∴ If QR → S and QR → U, then QR → SU.
QR → SU

4. PR → SU
From (2), PR → S, and the FD set contains PR → U.
So, Using Union Rule: If {A → B} and {A → C}, then {A → BC}
∴ If PR → S and PR → U, then PR → SU.
PR → SU
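Derivations like the four above can be verified mechanically with the attribute-closure algorithm, which repeatedly applies Armstrong's axioms until no new attributes can be added. A minimal sketch in Python using the FD set of relation E:

```python
# Attribute closure X+ under a set of FDs, each FD a (lhs, rhs) pair of
# attribute sets. Repeatedly applying the FDs implements Armstrong's axioms.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FD set for relation E(P, Q, R, S, T, U) from the example.
fds = [({"P"}, {"Q"}), ({"P"}, {"R"}),
       ({"Q", "R"}, {"S"}), ({"Q"}, {"T"}),
       ({"Q", "R"}, {"U"}), ({"P", "R"}, {"U"})]

# X → Y holds iff Y is contained in X+.
print("T" in closure({"P"}, fds))              # P → T holds
print("S" in closure({"P", "R"}, fds))         # PR → S holds
print({"S", "U"} <= closure({"Q", "R"}, fds))  # QR → SU holds
print({"S", "U"} <= closure({"P", "R"}, fds))  # PR → SU holds
```

All four checks print True, matching the hand derivations.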
What is the minimal set of functional dependencies or canonical cover of FD?
A minimal cover of a set of functional dependencies (FD) E is a minimal set of dependencies
F that is equivalent to E.
The formal definition is: a set of FDs F is minimal if it satisfies the following conditions −
• Every dependency in F has a single attribute for its right-hand side.
• We cannot replace any dependency X->A in F with a dependency Y->A, where Y is a
proper subset of X, and still have a set of dependencies that is equivalent to F.
• We cannot remove any dependency from F and still have a set of dependencies that is
equivalent to F.
A canonical cover (also called a minimal cover) is the minimum set of FDs. A set of FDs
FC is called a canonical cover of F if each FD in FC is a −
• Simple FD.
• Left-reduced FD.
• Non-redundant FD.
Simple FD − X->Y is a simple FD if Y is a single attribute.
Left-reduced FD − X->Y is a left-reduced FD if there are no extraneous attributes in X.
(Extraneous attribute: given XA->Y, the attribute A is extraneous if X->Y already holds.)
Non-redundant FD − X->Y is a non-redundant FD if it cannot be derived from F − {X->Y}.
Example
Consider an example to find canonical cover of F.
The given functional dependencies are as follows −
A -> BC
B -> C
A -> B
AB -> C
• Minimal cover: The minimal cover is the smallest set of FDs that is equivalent to the
given FDs.
• Canonical cover: In a canonical cover, FDs with the same LHS (Left Hand Side) are
combined, so each LHS is unique.
First of all, we will find the minimal cover and then the canonical cover.
First step − Convert RHS attribute into singleton attribute.
A -> B
A -> C
B -> C
A -> B
AB -> C
Second step − Remove the extraneous LHS attributes.
Find the closure of A.
A+ = {A, B, C}
Since B is in A+, B is extraneous in AB -> C, so AB -> C can be converted into A -> C.
A -> B
A -> C
B -> C
A -> B
A -> C
Third step − Remove the redundant FDs (A -> C is redundant because it follows from A -> B
and B -> C).
A -> B
B -> C
Now, we will convert the above set of FDs into canonical cover.
The canonical cover for the above set of FDs will be as follows −
A -> BC
B -> C
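The three steps can be automated using the attribute-closure test. The sketch below is one possible implementation (the function names are our own), run on the FDs of this example:

```python
# Attribute closure X+ under a set of FDs with singleton right-hand sides,
# each FD a (frozenset_lhs, attribute) pair.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right-hand sides into singletons (and deduplicate).
    fds = list({(frozenset(lhs), a) for lhs, rhs in fds for a in rhs})
    # Step 2: drop extraneous left-hand-side attributes.
    reduced = []
    for lhs, a in fds:
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and a in closure(smaller, fds):
                lhs = smaller  # attr was extraneous
        reduced.append((lhs, a))
    fds = reduced
    # Step 3: drop redundant FDs (derivable from the rest).
    result = list(fds)
    for fd in fds:
        rest = [f for f in result if f != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return set(result)

# The FDs of this example: A -> BC, B -> C, A -> B, AB -> C.
given = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A"}, {"B"}), ({"A", "B"}, {"C"})]
cover = minimal_cover(given)
print(cover == {(frozenset({"A"}), "B"), (frozenset({"B"}), "C")})  # True
```

The computed cover is {A -> B, B -> C}, matching the result derived by hand above. Note that minimal covers are not unique in general; the answer can depend on the order in which extraneous attributes and redundant FDs are removed.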
Normalization

A large database defined as a single relation may result in data duplication. This repetition of
data may result in:

o Relations becoming very large.
o Difficulty maintaining and updating data, as doing so involves searching many records
in the relation.
o Wastage and poor utilization of disk space and resources.
o An increased likelihood of errors and inconsistencies.

So, to handle these problems, we should analyze and decompose the relations with redundant
data into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is the process of decomposing relations into relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It
is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion
Anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?


The main reason for normalizing relations is to remove data anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that help you create a good
database structure.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple
into a relation due to a lack of other data.
o Deletion Anomaly: A deletion anomaly refers to the situation where the deletion of
data results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.

Types of Normal Forms:

Normalization works through a series of stages called normal forms. The normal forms apply
to individual relations. A relation is said to be in a particular normal form if it satisfies that
form's constraints.

Following are the various types of Normal forms:

Normal Form Description

1NF A relation is in 1NF if every attribute contains only atomic values.

2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are
fully functionally dependent on the primary key.

3NF A relation will be in 3NF if it is in 2NF and no transitive dependency
exists.

BCNF A stronger definition of 3NF is known as Boyce-Codd normal form.

4NF A relation will be in 4NF if it is in Boyce-Codd normal form and has
no multi-valued dependency.

5NF A relation is in 5NF if it is in 4NF, does not contain any join
dependency, and all joins are lossless.

Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.

First Normal Form (1NF)

o A relation is in 1NF if it contains only atomic values.
o It states that an attribute of a table cannot hold multiple values; it must hold only
single values.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute


EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP
20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab


12 Sam 8589830302 Punjab
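The 1NF decomposition shown above is a row-per-value flattening of the multi-valued EMP_PHONE attribute, which can be sketched as:

```python
# Flatten a multi-valued phone attribute into one row per phone number (1NF).
# Rows are taken from the EMPLOYEE table in the text.
raw = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]

# One output tuple per (employee, phone) pair: every attribute is now atomic.
normalized = [
    (emp_id, name, phone, state)
    for emp_id, name, phones, state in raw
    for phone in phones
]

for row in normalized:
    print(row)  # five single-valued rows, one per phone number
```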

Second Normal Form (2NF)

o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes must be fully functionally dependent
on the primary key.

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In
a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35
83 Math 38

83 Computer 38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID,
which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That's why it
violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30

47 35
83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology
47 English

83 Math
83 Computer
Third Normal Form (3NF)

o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation is in
third normal form.

A relation is in third normal form if it holds at least one of the following conditions for every
non-trivial functional dependency X → Y:

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida


333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich


666 John 462007 MP Bhopal
Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent
on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are transitively
dependent on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table,
with EMP_ZIP as the primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010


333 Stephan 02228

444 Lan 60007

555 Katharine 06389


666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida
02228 US Boston

60007 US Chicago
06389 UK Norwich
462007 MP Bhopal

Boyce Codd normal form (BCNF)

o BCNF is an advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the
table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283


264 India Testing D394 300

364 UK Stores D283 232


364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India
364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300


Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT

264 Designing
264 Testing
364 Stores
364 Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID


For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left-hand side of every functional dependency is a key for its table.

Query processing and Transaction management:

Introduction to Transaction Processing


Single-user system:
At most one user at a time can use the system.
Multi-user system:
Many users can access the system concurrently.
Concurrency can be provided through:
1. Interleaved Processing –
The concurrent execution of processes is interleaved on a single CPU. The transactions
are interleaved, meaning the second transaction is started before the first one has
finished, and execution can switch between the transactions (or among multiple
transactions). Uncontrolled interleaving can cause inconsistency in the system.
2. Parallel Processing –
Processing in which a large task is divided into various smaller tasks that execute
concurrently on several nodes. In this, the processes are executed concurrently on
multiple CPUs.

Transaction

o A transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions. It is performed by a single user to perform
operations for accessing the contents of the database.
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account.
This small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Operations of Transaction:

Following are the main operations of transaction:

Read(X): Read operation is used to read the value of X from the database and stores it in a
buffer in main memory.

Write(X): Write operation is used to write the value back to the database from the buffer.

Let's take an example to debit transaction from an account which consists of following
operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will
be 3500.

But it is possible that, because of a hardware, software, or power failure, the transaction fails
before finishing all the operations in the set.
For example: if in the above transaction the debit transaction fails after executing operation
2, then X's value will remain 4000 in the database, which is not acceptable to the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
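The debit example together with Commit and Rollback can be sketched with Python's built-in sqlite3 module. The in-memory database, account name X, and the amounts mirror the text; the table schema is an illustrative assumption.

```python
import sqlite3

# In-memory account table with X's starting balance of 4000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 4000)")
conn.commit()

# Debit transaction: R(X), X = X - 500, W(X) ... but a failure strikes
# before Commit, so Rollback undoes the uncommitted write.
try:
    conn.execute("UPDATE account SET balance = balance - 500 WHERE name = 'X'")
    raise RuntimeError("simulated power failure before commit")
    # conn.commit() would have made the debit permanent, but is never reached
except RuntimeError:
    conn.rollback()  # undo the work done since the last commit

balance = conn.execute("SELECT balance FROM account WHERE name = 'X'").fetchone()[0]
print(balance)  # 4000: the failed transaction left no partial effect
```

Had the transaction reached commit, the new balance of 3500 would have been saved permanently instead.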

DBMS Concurrency Control

Concurrency Control is the management procedure that is required for controlling concurrent
execution of the operations that take place on a database.

But before knowing about concurrency control, we should know about concurrent execution.

Concurrent Execution in DBMS


o In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that the same
database is executed simultaneously on a multi-user system by different users.
o While working on the database transactions, there occurs the requirement of using the
database by multiple users for performing different operations, and in that case,
concurrent execution of the database is performed.
o The simultaneous execution should be performed in an interleaved manner such that
no operation affects the other executing operations, thus maintaining the consistency
of the database. When transaction operations are executed concurrently, several
challenging problems arise that need to be solved.

Problems with Concurrent Execution

In a database transaction, the two main operations are READ and WRITE. These operations
need to be managed carefully during the concurrent execution of transactions because, if they
are interleaved in an uncontrolled manner, the data may become inconsistent. The following
problems occur with the concurrent execution of operations:

Problem 1: Lost Update Problems (W - W Conflict)

The problem occurs when two different database transactions perform read/write operations
on the same database items in an interleaved manner (i.e., concurrent execution), making the
values of the items incorrect and hence the database inconsistent.

For example:

Consider the below diagram where two transactions TX and TY, are performed on the
same account A where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be $300
only because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400 (only added but
not updated/write).
o At time t6, transaction TX writes the value of account A, which is updated to $250, as
TY hasn't written its value yet.
o Similarly, at time t7, transaction TY writes its value of account A, as computed at time
t4, i.e., $400. The value written by TX is overwritten: the update to $250 is lost.

Hence the data becomes incorrect, and the database is left inconsistent.
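The interleaving can be replayed deterministically: each transaction computes in its own buffer, and whichever transaction writes last silently overwrites the other. A minimal sketch in Python following the timeline t1–t7 above:

```python
# Shared database item: account A with balance $300.
db = {"A": 300}

# t1: TX reads A into its buffer.
tx_buf = db["A"]   # 300
# t2: TX deducts $50 in its buffer only (not yet written).
tx_buf -= 50       # 250
# t3: TY reads A -- still 300, because TX has not written yet.
ty_buf = db["A"]   # 300
# t4: TY adds $100 in its buffer only (not yet written).
ty_buf += 100      # 400
# t6: TX writes its buffer to the database.
db["A"] = tx_buf   # 250
# t7: TY writes its buffer, overwriting TX's update.
db["A"] = ty_buf   # 400 -- TX's deduction of $50 is lost

print(db["A"])  # 400 instead of the correct 350
```

A concurrency-control protocol (e.g., locking A for the duration of each transaction) would force TY to read the value only after TX's write, yielding 350.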

Dirty Read Problems (W-R Conflict)

The dirty read problem occurs when one transaction updates an item of the database, then the
transaction fails, and before the data gets rolled back, the updated database item is accessed
by another transaction. This creates a Read-Write Conflict between the two transactions.

For example:
Consider two transactions TX and TY in the below diagram performing read/write
operations on account A where the available balance in account A is $300:

o At time t1, transaction TX reads the value of account A, i.e., $300.


o At time t2, transaction TX adds $50 to account A that becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A that will be read as $350.
o Then at time t5, transaction TX rollbacks due to server problem, and the value changes
back to $300 (as initially).
o But for transaction TY the value of account A remains $350: TY has read a value that
was never committed. This is the dirty read, and the problem is therefore known as the
Dirty Read Problem.

Failure Classification

To find that where the problem has occurred, we generalize a failure into the following
categories:

1. Transaction failure
2. System crash
3. Disk failure

1. Transaction failure

A transaction failure occurs when a transaction fails to execute or reaches a point from
where it can't go any further. When a transaction or process fails in this way, it is called
a transaction failure.
Reasons for a transaction failure could be −

1. Logical errors: If a transaction cannot complete due to a code error or an
internal error condition, a logical error occurs.
2. Syntax error: It occurs when the DBMS itself terminates an active transaction
because the database system is not able to execute it. For example, the system
aborts an active transaction in case of deadlock or resource unavailability.

2. System Crash
System failure can occur due to power failure or other hardware or software
failure. Example: operating system error.

Fail-stop assumption: in a system crash, non-volatile storage is assumed not
to be corrupted.

3. Disk Failure
Disk failures occur when hard-disk drives or storage drives fail; this was a
common problem in the early days of technology evolution.
Disk failure occurs due to the formation of bad sectors, a disk head crash,
unreachability of the disk, or any other failure that destroys all or part of
disk storage.

States of Transaction

In a database, the transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the records
are still not saved to the database.

Partially committed
o In the partially committed state, a transaction executes its final operation, but the data
is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed
in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully. In


this state, all the effects are now permanently saved on the database system.

Failed state
o If any of the checks made by the database recovery system fails, then the transaction is
said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.

Aborted
o If any of the checks fail and the transaction has reached a failed state then the database
recovery system will make sure that the database is in its previous consistent state. If
not then it will abort or roll back the transaction to bring the database into a consistent
state.
o If the transaction fails in the middle, then all the operations it has already executed
are rolled back to restore the database to its previous consistent state.
o After aborting the transaction, the database recovery module will select one of the two
operations:
1. Re-start the transaction
2. Kill the transaction

ACID Properties in DBMS

A DBMS must keep its data intact and consistent whenever changes are made to it, because if
the integrity of the data is compromised, the whole data set becomes disturbed and corrupted.
Therefore, to maintain the integrity of the data, four properties are defined in the database
management system, known as the ACID properties. The ACID properties apply to a
transaction, which goes through a group of different tasks, and it is there that the ACID
properties play their role.

In this section, we will learn and understand the ACID properties: what these properties stand
for and what each property is used for. We will also understand the ACID properties with the
help of some examples.
ACID Properties

ACID stands for Atomicity, Consistency, Isolation, and Durability:

1) Atomicity: Atomicity means that a transaction is treated as a single, indivisible
unit. Either all of its operations are executed completely, or none of them are executed
at all; a transaction must never break off in between or execute only partially.

Example: Remo has account A holding $30, from which he wishes to send $10 to Sheero's
account B, which already holds $100. After the transfer, account B should hold $110. The
transaction involves two operations: $10 is debited from account A, and the same $10 is
credited to account B. Now suppose the debit executes successfully but the credit fails.
Account A now holds $20, while account B still holds $100 as before.
In the above diagram, it can be seen that after the failed credit, the amount in account
B is still $100. So the transaction is not atomic.

The below image shows that both debit and credit operations are done successfully. Thus the
transaction is atomic.

When a transfer loses atomicity, banking systems face serious problems, which is why
atomicity is a central concern in such systems.
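The debit-and-credit example can be sketched with Python's built-in sqlite3 module, whose transactions roll back both operations if either one fails. The account names and amounts follow the example above; the simulated "credit step failed" error is an assumption added to demonstrate the rollback:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 30), ('B', 100)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit `src` and credit `dst` as one atomic unit: both happen, or neither."""
    try:
        with conn:  # opens a transaction; rolls back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if dst == "CRASH":  # simulate a failure after debit, before credit
                raise RuntimeError("credit step failed")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
    except RuntimeError:
        pass  # the debit was rolled back automatically

transfer(conn, "A", "B", 10)      # succeeds: A = 20, B = 110
transfer(conn, "A", "CRASH", 10)  # fails: balances stay unchanged

for name, balance in conn.execute(
        "SELECT name, balance FROM accounts ORDER BY name"):
    print(name, balance)  # A 20, then B 110
```

Because the failed second transfer is rolled back as a whole, account A is not left debited with no matching credit, which is exactly the half-finished state the example above warns about.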

2) Consistency: Consistency means that the database must remain in a valid state before
and after every transaction: any rules or constraints defined on the data must continue
to hold. In the case of transactions, this is essential so that the database stays
consistent both before and after the transaction, and the data read is always correct.
Example:

In the above figure, there are three accounts, A, B, and C, where A makes transactions
one by one to both B and C, each consisting of a debit and a credit. First, A transfers
$50 to B; before the transaction, B reads A's balance as $300. After the successful
transfer, B holds $150. Next, A transfers $20 to C, and this time C reads A's balance as
$250, which is correct because the $50 debit to B has already completed. Both transfers
finish successfully and every balance is read correctly, so the data is consistent. If C
had still read A's balance as $300, the data would be inconsistent, because the earlier
debit would not be reflected.
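The invariant behind this example is that a transfer moves money but never creates or destroys it, so the total across all accounts must be the same before and after. A minimal sketch of such a consistency check, where the initial balance of C is an assumption since the figure does not state it:

```python
# Consistency check: a transfer must preserve the invariant that the
# total money across all accounts stays the same.
accounts = {"A": 300, "B": 100, "C": 0}

def total(accts):
    return sum(accts.values())

def transfer(accts, src, dst, amount):
    before = total(accts)
    accts[src] -= amount   # debit
    accts[dst] += amount   # credit
    # The database is consistent only if the invariant still holds:
    assert total(accts) == before, "consistency violated"

transfer(accounts, "A", "B", 50)  # A -> B, as in the example
transfer(accounts, "A", "C", 20)  # A -> C
print(accounts["A"], accounts["B"], accounts["C"])  # 230 150 20
```

A real DBMS enforces such rules through declared constraints (keys, checks, triggers) rather than assertions in application code, but the idea is the same: a transaction that would break the rule must not be allowed to commit.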

3) Isolation: The term 'isolation' means separation. In DBMS, isolation is the property
that concurrently executing transactions do not affect one another: each transaction
behaves as if it were running alone. If two operations run concurrently on two different
pieces of data, they must not affect each other's values. When two or more transactions
occur simultaneously, consistency must still be maintained, and any change made by one
transaction is not visible to other transactions until that change has been committed.

Example: If two transfers run concurrently on two different accounts, the values of the
accounts must not interfere with each other. As the below diagram shows, account A makes
transactions T1 and T2 to accounts B and C, but both execute independently without
affecting each other. This is isolation.
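One common way a DBMS achieves isolation is locking: each transaction holds a lock while it updates, so no other transaction can observe or interleave with its half-finished state. A minimal sketch with Python threads standing in for the transactions T1 and T2; the accounts and amounts are illustrative:

```python
import threading

accounts = {"A": 300, "B": 100, "C": 0}
lock = threading.Lock()

def transfer(src, dst, amount):
    # The lock serializes the debit/credit pair: no concurrent transfer
    # can read or write the accounts while this one is half-finished.
    with lock:
        src_balance = accounts[src]
        dst_balance = accounts[dst]
        accounts[src] = src_balance - amount
        accounts[dst] = dst_balance + amount

# T1 and T2 run concurrently but execute their updates in isolation.
t1 = threading.Thread(target=transfer, args=("A", "B", 50))
t2 = threading.Thread(target=transfer, args=("A", "C", 20))
t1.start(); t2.start()
t1.join(); t2.join()

print(accounts)  # {'A': 230, 'B': 150, 'C': 20}
```

Without the lock, both threads could read A's balance at the same moment and each write back its own result, losing one of the debits; real DBMSs generalize this idea with row/table locks or multiversion concurrency control.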
4) Durability: Durability ensures permanence. In DBMS, durability guarantees that once
an operation has been successfully executed and committed, its effects become permanent
in the database and survive even a system failure or crash. If committed data is
nevertheless lost, it is the responsibility of the recovery manager to restore it and
ensure the durability of the database. To make changes permanent, the COMMIT command
must be issued every time we make changes.
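Durability can be sketched with sqlite3 by writing to a database file, closing the connection (standing in for a shutdown or crash after COMMIT), and reopening it to show the committed row survived. The file name is a temporary-path assumption:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 20)")
conn.commit()   # COMMIT makes the insert permanent on disk
conn.close()    # simulate a shutdown/restart

# Reopen the database: the committed row is still there.
conn = sqlite3.connect(path)
row = conn.execute("SELECT name, balance FROM accounts").fetchone()
print(row)  # ('A', 20)
conn.close()
```

Had the connection been closed before `commit()`, the insert would have been rolled back and the reopened database would be empty; that is the difference durability makes.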

Therefore, the ACID property of DBMS plays a vital role in maintaining the consistency and
availability of data in the database.

This concludes a brief introduction to the ACID properties in DBMS. We have also
discussed these properties in the transaction section.
