
UNIT II

DATABASE DESIGN
Entity-Relationship model – E-R Diagrams – Enhanced-ER Model – ER-to-Relational Mapping –

Functional Dependencies – Non-loss Decomposition – First, Second, Third Normal Forms,

Dependency Preservation – Boyce/Codd Normal Form – Multi-valued Dependencies and Fourth

Normal Form – Join Dependencies and Fifth Normal Form

ENTITY RELATIONSHIP MODEL (ER MODEL)

The E-R data model views the real world as consisting of a set of basic objects, called
entities, and relationships among these objects.
The E-R data model employs three basic notions:
1. Entity sets
2. Relationship sets
3. Attributes
1. Entity Sets

An entity is a thing or object in the real world that is distinguishable from all other
objects. For example, each person is an entity.

An entity has a set of properties, and the values for some set of properties may uniquely
identify an entity.

For example, a customer whose customer-id property has the value C101 is uniquely
identified by that value. An entity may be concrete, such as a person or a book, or it may be
abstract, such as a loan or a holiday.
An entity set is a set of entities of the same type that share the same properties, or attributes.
For example, all persons who are customers at a given bank can be defined as the entity set
customer. The properties that describe an entity are called attributes.

2. Relationships and Relationship Sets


Relationship is an association among several entities.
Relationship set is a set of relationships of the same type.
The association between entity sets is referred to as participation. That is, the entity sets E1,
E2, ..., En participate in relationship set R.

Recursive relationship set: the same entity set participating in a relationship set more than
once, in different roles, forms a recursive relationship set.
Attributes attached to a relationship set itself are called descriptive attributes.
Types of relationships
i) Unary relationship: A unary relationship exists when an association is maintained within a
single entity.

ii) Binary relationship: A binary relationship exists when two entities are associated.

Example: Publisher and Book are associated through the relationship Publishes.

iii) Ternary relationship: A ternary relationship exists when there are three entities associated.

Example: Teacher, Subject and Student are associated through the relationship Teaches.

iv) Quaternary relationship: A quaternary relationship exists when there are four entities
associated.

Example: Teacher, Student, Subject and Course material are associated through the relationship Studies.

The number of entity sets participating in a relationship set is called the degree of the
relationship set.
A binary relationship set is of degree 2; a ternary relationship set is of degree 3.

Entity role: The function that an entity plays in a relationship is called that entity‘s role. A role is
one end of an association.

Example: In the relationship set Works-for between Person and Company, the Person entity
plays the role of Employee.

3. Attributes
The properties that describe an entity are called attributes.
• The attributes of the customer entity set are customer_id, customer_name and city.
• Each attribute has a set of permitted values called its domain, or value set.
• Each entity has a value for each of its attributes.
Example:
 Customer Name: John
 Customer Id: 321

Attributes are classified as


• Simple
• Composite
• Single- valued
• Multi-valued
• Derived

1) Simple attribute:
This type of attribute cannot be divided into subparts.
Example: age, sex, GPA
2) Composite attribute:
This type of attribute can be subdivided into component attributes.
Example: address, comprising street, city, state and zip
3) Single-valued attribute:
This type of attribute can have only a single value.
Example: social security number
4) Multi-valued attribute:
A multi-valued attribute can have many values.
Example: a person may have several college degrees or phone numbers
5) Derived attribute:
A derived attribute can be calculated or derived from other related attributes or entities.
Example: age can be derived from date of birth
6) Stored attribute:
The attributes whose values are stored directly in the database are called stored attributes.

• An attribute takes a null value when an entity does not have a value for it.
• Null values indicate that the value for the particular attribute either does not exist or is
unknown.
E.g.: 1. A middle name may not be present for a person (non-existence case)
2. An apartment number may be missing or unknown.
CONSTRAINTS
The E-R schema of an enterprise may define certain constraints to which the contents of the
database system must conform.
Three types of constraints are
• Mapping cardinalities
• Key constraints
• Participation constraints

1. Mapping cardinalities

Mapping cardinalities express the number of entities to which another entity can be
associated via a relationship set.
Cardinality in an E-R diagram is represented in two ways: by a directed line (→) or an
undirected line (—).

There are four categories of cardinality.


i) One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.

Example: a customer with a single account at a given branch is shown by a one-to-one
relationship (Customer — Depositor — Account).


ii) One-to-many: An entity in A is associated with any number of entities (zero or more) in
B. An entity in B, however, can be associated with at most one entity in A.

Example: a customer having two accounts at a given branch is shown by a one-to-many
relationship (Customer — Depositor — Account).

iii) Many-to-one: An entity in A is associated with at most one entity in B. An entity in B,
however, can be associated with any number (zero or more) of entities in A.

Example: many employees work for a company. This is shown by a many-to-one
relationship (Employee — Works-for — Company).

iv) Many-to-many: An entity in A is associated with any number (zero or more) of entities in B,
and an entity in B is associated with any number (zero or more) of entities in A.
Example: an employee works on a number of projects, and a project is handled by a number of
employees. The relationship between employee and project is therefore many-to-many
(Employee — Works-on — Project).
2. Keys

A key allows us to identify a set of attributes and thus distinguishes entities from each
other.

Keys also help uniquely identify relationships, and thus distinguish relationships from
each other.

Key Type — Definition

Super key: Any attribute or combination of attributes that uniquely identifies a row in
the table.
Example: the Roll_No attribute of the entity set ‘student’ distinguishes one student entity
from another. Customer_name and Customer_id together form a superkey.

Candidate key: A minimal superkey; a superkey that does not contain a subset of attributes
that is itself a superkey.
Example: Student_name and Student_street together are sufficient to uniquely identify one
particular student.

Primary key: The candidate key selected to uniquely identify all rows. It should rarely be
changed and cannot contain null values.
Example: Roll_No is a primary key.

Foreign key: An attribute (or combination of attributes) in one table that must either match
the primary key of another table or be null.
Example: in the staff relation, the branch_no attribute exists to match staff to the branch
office they work in; branch_no is a foreign key.

Secondary key: An attribute or combination of attributes used to make data retrieval more
efficient.

3. Participation Constraint

 Participation can be divided into two types: 1. Total 2. Partial

 If every entity in the entity set E participates in at least one relationship in R, then the
participation is called total participation.

 If only some entities in the entity set E participate in relationships in R, then the
participation is called partial participation.

ENTITY-RELATIONSHIP(E-R) DIAGRAMS

E-R diagrams can express the overall logical structure of a database graphically.

An E-R diagram consists of the following major components: rectangles (entity sets),
ellipses (attributes), diamonds (relationship sets), and lines (which link attributes to entity
sets and entity sets to relationship sets). Composite, multivalued and derived attributes can
also be represented in an E-R diagram.


Double lines are used in an E-R diagram to indicate that the participation of an entity
set in a relationship set is total; that is, each entity in the entity set occurs in at least one
relationship in that relationship set.
The number of times an entity participates in a relationship can be specified using complex
cardinalities.

 An edge between an entity set and binary relationship set can have an associated minimum
and maximum cardinality assigned in the form of l..h.
 l - Minimum cardinality
 h - Maximum cardinality

 A minimum value of 1 indicates total participation of the entity set in the
relationship set. A maximum value of 1 indicates that the entity participates in at
most one relationship. A maximum value of * indicates no limit.
 A label 1..* on an edge is equivalent to a double line.

Strong and Weak entity sets

 An entity set may not have sufficient attributes to form a primary key. Such an entity set is
termed a weak entity set.
 An entity set that has a primary key is termed a strong entity set.

 A weak entity set is associated with another entity set, called the identifying or owner entity
set; i.e., the weak entity set is said to be existence dependent on the identifying entity set.
 Identifying entity set is said to own the weak entity set.
 The relationship among the weak and identifying entity set is called the identifying
relationship.

The discriminator of a weak entity set is the set of attributes that distinguishes the different
entities within the weak entity set; it is also called a partial key.

Extended E-R Features


An ER model supported with additional semantic concepts is called the extended entity-
relationship model or EER model.
EER model deals with
1. Specialization
2. Generalization
3. Aggregation

1. Specialization:

 The process of designating subgroupings within an entity set is called specialization.
 Specialization is a top-down process.
 Consider an entity set person. A person may be further classified as one of the following:
 Customer
 Employee
 Each of these person types has the attributes of person in common, along with some
additional attributes.
 Specialization is depicted by a triangle component labeled ISA.
 The label ISA stands for "is a"; for example, a customer "is a" person.
 The ISA relationship may also be referred to as a superclass-subclass relationship.
2. Generalization:
 Generalization is a simple inversion of specialization.
 Generalization is the process of defining a more general entity type from a set of more
specialized entity types.
Generalization is a bottom-up approach.
 Generalization results in the identification of a generalized super class from the original
subclasses

 Person is the higher-level entity set


 Customer and employee are lower-level entity sets.
 The person entity set is the superclass of the customer and employee subclasses.
Attribute Inheritance

 A property of the higher- and lower-level entities created by specialization and generalization
is attribute inheritance.

 The attributes of the higher-level entity sets are said to be inherited by the lower-level entity
sets.

 For example, customer and employee inherit the attributes of person. The outcome of
attribute inheritance is
o A higher-level entity set with attributes and relationships that apply to all of its lower-
level entity sets.
o Lower-level entity sets with distinctive features that apply only within a particular
lower-level entity set
 If an entity set is involved as a lower-level entity set in only one ISA relationship, then the
entity set has single inheritance
 If an entity set is involved as a lower-level entity set in more than one ISA relationship, then
the entity set has multiple inheritance and the resulting structure is said to be a lattice.
Constraints on Generalizations
1. One type of constraint determining which entities can be members of a lower-level entity set.
Such membership may be one of the following:
• Condition-defined. Membership in a lower-level entity set is evaluated on the basis of
whether or not an entity satisfies an explicit condition.
• User-defined. Membership in a lower-level entity set is assigned by the user.
2. A second type of constraint relates to whether or not entities may belong to more than one
lower-level entity set within a single generalization. The lower-level entity sets may be one of
the following:
• Disjoint. A disjointness constraint requires that an entity belong to no more than one
lower-level entity set.
• Overlapping. Same entity may belong to more than one lower-level entity set within a
single generalization.
3. A final constraint, the completeness constraint, specifies whether or not an entity in the
higher-level entity set must belong to at least one of the lower-level entity sets. This
constraint may be one of the following:
• Total generalization or specialization. Each higher-level entity must belong to a
lower-level entity set. It is represented by double line.
• Partial generalization or specialization. Some higher-level entities may not belong to
any lower-level entity set.
3. Aggregation
 One limitation of the E-R model is that it cannot express relationships among relationships.
 Consider the ternary relationship works-on between an employee, a branch, and a job. Now
suppose we want to record managers for tasks performed by an employee at a branch; for
this, another entity set, manager, is created.
 The best way to model such a situation is to use aggregation.
 Aggregation is an abstraction through which relationships are treated as higher-level entities.
 In our example, works-on acts as a higher-level entity.
Functional Dependency
• In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on
attribute X if each value of X determines exactly one value of Y, which is
represented as X → Y (X can be composite in nature).
• We say "X determines Y" or "Y is functionally dependent on X".
• X → Y does not imply Y → X.
• In the EMP relation, if the value of Empid is known then the value of Ename can be
determined, since Empid → Ename.

Types of functional dependencies:


 Full Functional dependency.
 Partial Functional dependency.
 Transitive dependency.
Uses for Functional Dependency
• To determine if a relation is in a Normal Form.
• To determine if a decomposition would cause data loss etc,.
Full Functional dependency
• X and Y are attributes.
• X functionally determines Y, and no proper subset of X functionally determines Y.
Consider the following relation:
 REPORT (STUDENT#, COURSE#, Coursename, Marks, Grade)
 STUDENT#, COURSE# → Marks (Marks depends on the whole composite key)

Partial dependencies
• X and Y are attributes.
• Attribute Y is partially dependent on the attribute X only if it is dependent on a proper
subset of the (composite) attribute X.
Consider the following relation:
• REPORT (STUDENT#, COURSE#, Coursename, Marks, Grade)
• COURSE# → Coursename (Coursename depends on only part of the composite key)

Transitive dependencies:
• X, Y and Z are three attributes.
• X → Y
• Y → Z
• ⇒ X → Z
Example:
• STUDENT#, COURSE# → Marks
• Marks → Grade
• ⇒ STUDENT#, COURSE# → Grade

Functional Dependency Theory: Closure


Definition:
– The closure F+ of a set F of functional dependencies is the set of all the functional
dependencies that can be derived from the given set of functional dependencies F.
– If F is a set of FDs on a relational schema R, then F+ is the closure of F.
– The closure and a non-redundant set of FDs must be known for a good DBMS design.

Armstrong's Axioms
 Reflexivity rule − If α is a set of attributes and β is a subset of α, then α → β holds.
 Augmentation rule − If α → β holds and γ is a set of attributes, then γα → γβ also holds.
That is, adding the same attributes to both sides of a dependency does not change the
dependency.
 Transitivity rule − As with the transitive rule in algebra, if α → β holds and β → γ holds,
then α → γ also holds. (α → β is read "α functionally determines β".)
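The closure of an attribute set under a set of FDs can be computed by repeatedly applying Armstrong's axioms. A minimal Python sketch: the FD Empid → Ename comes from the earlier EMP example, while the extra attribute Dept and its FD are assumptions added for illustration.

```python
# Sketch: compute the closure X+ of an attribute set X under a set of FDs.
# FDs are (lhs, rhs) pairs of frozensets. Reflexivity is implicit (X is a
# subset of X+); the loop applies transitivity until nothing new is added.

def closure(X, fds):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs   # lhs already derivable, so rhs is too
                changed = True
    return result

fds = [
    (frozenset({"Empid"}), frozenset({"Ename"})),
    (frozenset({"Ename"}), frozenset({"Dept"})),   # assumed FD for illustration
]
print(closure({"Empid"}, fds))  # Dept appears via transitivity
```

In this sketch `closure({"Empid"}, fds)` returns all three attributes, showing that Empid → Dept is in F+ even though it was never stated directly.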
Trivial Functional Dependency
 Trivial − If a functional dependency (FD) X → Y holds where Y is a subset of X, it is
called a trivial FD. Trivial FDs always hold.
 Non-trivial − If an FD X → Y holds where Y is not a subset of X, it is called a non-trivial
FD.
 Completely non-trivial − If an FD X → Y holds where X ∩ Y = ∅, it is said to be a
completely non-trivial FD.

Decomposition
• Decomposition: separating a relation into two or more relations by projection
• Join: (re)assembling two relations
• If exactly the original rows are reproduced by joining the relations, the decomposition
was lossless
• If they cannot be, the decomposition was lossy
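A lossless versus lossy decomposition can be demonstrated directly by projecting and rejoining. A small Python sketch; the EMP-style relation and its attribute names are illustrative assumptions:

```python
# Sketch: project a relation onto two schemas and natural-join them back.
# The decomposition is lossless iff the join returns exactly the original rows.

def project(rows, attrs):
    out = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:          # projection removes duplicate rows
            out.append(p)
    return out

def natural_join(r1, r2):
    common = set(r1[0]) & set(r2[0])
    out = []
    for a in r1:
        for b in r2:
            if all(a[c] == b[c] for c in common):
                row = {**a, **b}
                if row not in out:
                    out.append(row)
    return out

emp = [
    {"Empid": 1, "Ename": "Ann", "Dept": "Sales"},
    {"Empid": 2, "Ename": "Bob", "Dept": "Sales"},
]

# Lossless: the shared attribute Empid is a key of both projections.
r1 = project(emp, ["Empid", "Ename"])
r2 = project(emp, ["Empid", "Dept"])
joined = natural_join(r1, r2)
print(len(joined) == len(emp) and all(row in emp for row in joined))  # True

# Lossy: joining on the non-key Dept manufactures spurious tuples.
s1 = project(emp, ["Ename", "Dept"])
s2 = project(emp, ["Empid", "Dept"])
print(len(natural_join(s1, s2)))  # 4 rows, not the original 2
```

The lossy case shows why "loss" here means loss of information, not loss of rows: the join actually produces extra, wrong rows.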
ANOMALIES

If a database design is not perfect, it may contain anomalies that leave the data inconsistent
and make the database very difficult to manage.

 Update anomalies − If data items are scattered and are not linked to each other properly,
strange situations can arise. For example, when we try to update one data item whose copies
are scattered over several places, a few instances get updated properly while others are left
with old values. Such instances leave the database in an inconsistent state.
 Deletion anomalies − When we try to delete a record, parts of its information may be left
behind because, without our knowledge, the data is also saved somewhere else.
 Insertion anomalies − We may be unable to insert a piece of data unless other, unrelated
data exists as well.

Normalization is a method to remove all these anomalies and bring the database to a consistent state.

Normalization

Normalization is a process of designing a consistent database by minimizing redundancy and
ensuring data integrity through the principle of non-loss decomposition.

Why Normalization?

In order to produce good database design, we should ask questions like:

a. Does the design ensure that all database operations will be efficiently performed and that the
design does not make the DBMS perform expensive consistency checks, which could be avoided?

b. Is the information unnecessarily replicated?


Database normalization:

Ensures Data Integrity

Now, let us see what is Data Integrity.

Data integrity ensures the correctness of data stored within the database. It is achieved by
imposing integrity constraints.
An integrity constraint is a rule, which restricts values present in the database.

There are three integrity constraints:

♦ Entity constraints:

The entity integrity rule states that the value of the primary key can never be a null value (a null
value is one that has no value and is not the same as a blank). Because a primary key is used to
identify a unique row in a relational table, its value must always be specified and should never be
unknown. The integrity rule requires that insert, update and delete operations maintain the
uniqueness and existence of all primary keys.

♦ Domain Constraints:
Only permissible values of an attribute are allowed in a relation.

♦ Referential Integrity constraints:


The referential integrity rule states that if a relational table has a foreign key, then every value of
the foreign key must either be null or match the values in the relational table in which that foreign
key is a primary key.

Prevents Redundancy in data

A non-normalized database is vulnerable to data anomalies if it stores data redundantly. If
data is stored in two locations but later updated in only one of them, the data becomes
inconsistent; this is referred to as an "update anomaly". A normalized database stores
non-primary-key data in only one location.

Redundancy can be:

♦ Direct Redundancy:
Direct redundancy results from the presence of the same data in two different locations,
leading to anomalies when reading, writing, updating and deleting.

♦ Indirect redundancy:
Indirect Redundancy results due to storing information that can be computed
from the other data items stored within the database.

Normalized databases have a design that reflects the true dependencies between tracked quantities,
allowing quick updates to data with little risk of introducing inconsistencies. There are formal
methods for quantifying "how normalized" a relational database is, and these classifications are
called Normal Forms (or NF).
What is a Normal Form?
Normal forms are designed to logically address potential problems, such as inconsistencies and
redundancy, in the information stored in the database.

A database is said to be in a given Normal Form if it satisfies the rules required by that form as well as
all previous forms; it will then not suffer from any of the problems addressed by those forms.

Types of Normal Forms


Several normal forms have been identified, the most important and widely used of which are:

 First normal form (1NF)


 Second normal form (2NF)
 Third normal form (3NF)
 Boyce-Codd normal form (BCNF)
 Fourth normal form (4NF)
 Fifth Normal Form (5NF)

A relation is in a particular normal form only if it also satisfies all previous normal forms.

First Normal Form (1NF)

A Relation is in 1NF, if every row contains exactly one value for each attribute.

Let us understand this with an example.

Consider a table ‘Faculty’ which has information about the faculty, the subjects, and the number of
hours allotted to each subject they teach.

Faculty:

Faculty code Faculty Name Date of Birth Subject Hours


100 Smith 17/07/64 Java 16
PL/SQL 8
Linux 8
101 Jones 24/12/72 Java 16
Forms 8
Reports 12
102 Fred 03/02/80 SQL 10
Linux 8
Java 16
103 Robert 28/11/66 SQL 10
PL/SQL 8
Forms 8
Anomalies:

The above table does not have atomic values in the ‘Subject’ column; hence it is called an un-normalized
table. Insertion, updating and deletion would be a problem in such a table.

Hence, it has to be normalized.

For the above table to be in first normal form, each row should have atomic values. Hence, let us
reconstruct the data in the table. An ‘SNO’ column is included in the table to uniquely identify each row.

SNO  Faculty code  Faculty Name  Date of Birth  Subject  Hours
1 100 Smith 17/07/64 Java 16
2 100 Smith 17/07/64 PL/SQL 8
3 100 Smith 17/07/64 Linux 8
4 101 Jones 24/12/72 Java 16
5 101 Jones 24/12/72 Forms 8
6 101 Jones 24/12/72 Reports 12
7 102 Fred 03/02/80 SQL 10
8 102 Fred 03/02/80 Linux 8
9 102 Fred 03/02/80 Java 16
10 103 Robert 28/11/66 SQL 10
11 103 Robert 28/11/66 PL/SQL 8
12 103 Robert 28/11/66 Forms 8

This table shows the same data as the previous table but we have eliminated the repeating groups.
Hence the table is now said to be in First Normal form (1NF). But we have introduced Redundancy into the
table now. This can be eliminated using Second Normal Form (2NF).

Second Normal Form (2NF)


A relation is in 2NF, if it is in 1NF and every non-key attribute is fully functionally dependent on the primary
key of the relation.

2NF prohibits partial dependencies.

The steps for converting a database to 2NF are as follows:

 Find and remove attributes that are related to only a part of the key.
 Group the removed items in another table.
 Assign the new table a key that consists of that part of the old composite key.

If a relation is not in 2NF, it can be further normalized into a number of 2NF relations.
Let us consider the table we obtained after first normalization.

SNO  Faculty code  Faculty Name  Date of Birth  Subject  Hours
1 100 Smith 17/07/64 Java 16
2 100 Smith 17/07/64 PL/SQL 8
3 100 Smith 17/07/64 Linux 8
4 101 Jones 24/12/72 Java 16
5 101 Jones 24/12/72 Forms 8
6 101 Jones 24/12/72 Reports 12
7 102 Fred 03/02/80 SQL 10
8 102 Fred 03/02/80 Linux 8
9 102 Fred 03/02/80 Java 16
10 103 Robert 28/11/66 SQL 10
11 103 Robert 28/11/66 PL/SQL 8
12 103 Robert 28/11/66 Forms 8

While eliminating the repeating groups, we introduced redundancy into the table: Faculty code, Name
and Date of Birth are repeated because the same faculty is multi-skilled.
To eliminate this, let us split the table into two parts: one with the non-repeating groups and the other
with the repeating groups.

Faculty:
Faculty code Faculty Name Date of Birth
100 Smith 17/07/64
101 Jones 24/12/72
102 Fred 03/02/80
103 Robert 28/11/66

Faculty_code → Faculty_name, Date_of_Birth

The other table contains the repeating groups.


Subject:
SNO Faculty code Subject Hours
1 100 Java 16
2 100 PL/SQL 8
3 100 Linux 8
4 101 Java 16
5 101 Forms 8
6 101 Reports 12
7 102 SQL 10
8 102 Linux 8
9 102 Java 16
10 103 SQL 10
11 103 PL/SQL 8
12 103 Forms 8

Faculty Code is the only key to identify the faculty name and the date of birth.

Hence, Faculty code is the primary key in the first table and foreign key in the second table.

Faculty code is repeated in the Subject table. Hence, we have to take the ‘SNO’ into account to form a
composite key in the Subject table. Now, SNO + Faculty code can uniquely identify each row in this table.
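The split described above can be sketched in Python with a few sample rows (attribute names are shortened and Date of Birth is omitted for brevity; this is an illustration, not a full implementation):

```python
# Sketch: project the 1NF table into a Faculty table and a Subject table.
rows = [
    {"SNO": 1, "Fcode": 100, "Fname": "Smith", "Subject": "Java",   "Hours": 16},
    {"SNO": 2, "Fcode": 100, "Fname": "Smith", "Subject": "PL/SQL", "Hours": 8},
    {"SNO": 4, "Fcode": 101, "Fname": "Jones", "Subject": "Java",   "Hours": 16},
]

# Faculty: one row per faculty -- the partial dependency Fcode -> Fname
# no longer forces the name to be repeated once per subject.
faculty = {(r["Fcode"], r["Fname"]) for r in rows}

# Subject: keyed by SNO + Fcode, as in the text.
subject = {(r["SNO"], r["Fcode"], r["Subject"], r["Hours"]) for r in rows}

print(len(faculty))  # 2 faculty rows instead of 3
print(len(subject))  # 3 subject rows
```

Because Fcode survives in both tables as primary key and foreign key, a join on it reproduces the original rows, so the split is non-loss.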
Anomalies in 2nd NF:
The situation could lead to the following problems:

• Insertion: Inserting the records of various faculty teaching the same subject would result in
redundancy of the hours information.
• Updation: For a subject, the number of hours allotted to a subject is repeated several times. Hence, if
the number of hours has to be changed, this change will have to be recorded in every instance of that
subject. Any omissions will lead to inconsistencies.
• Deletion: If a faculty leaves the organization, information regarding hours allotted to the subject is lost.

This Subject table should therefore be further decomposed without any loss of information as:

Fac_Sub (SNO, Faculty code, Subject)

Sub_Hrs (Subject, Hours)
Transitive Dependency
Transitive dependencies arise:

• When one non-key attribute is functionally dependent on another non-key attribute.


• FD: non-key attribute -> non-key attribute
• And when there is redundancy in database.

Third Normal Form


A relation is in 3NF, if it is in 2NF and no non-key attribute of the relation is transitively dependent on the
primary key.

3NF prohibits transitive dependencies.

In order to remove the anomalies that arose in Second Normal Form and to remove transitive dependencies, if
any, we have to perform third normalization.
Now let us see how to normalize the second table obtained after 2NF.

Subject:
SNO Faculty code Subject Hours

1 100 Java 16
2 100 PL/SQL 8
3 100 Linux 8
4 101 Java 16
5 101 Forms 8
6 101 Reports 12
7 102 SQL 10
8 102 Linux 8
9 102 Java 16
10 103 SQL 10
11 103 PL/SQL 8
12 103 Forms 8

In this table, Hours depends on the Subject, and Subject depends on the Faculty code and SNO. But
Hours is not directly dependent on either the Faculty code or the SNO. Hence, there exists a transitive
dependency between SNO/Faculty code, Subject and Hours.
If a record is deleted, then due to the transitive dependency, information regarding the subject and the
hours allotted to it may be lost.

For a table to be in 3rd Normal form, transitive dependencies must be eliminated.

So, we need to decompose the table further to normalize it.

Fac_Sub:
SNO Faculty code Subject

1 100 Java
2 100 PL/SQL
3 100 Linux
4 101 Java
5 101 Forms
6 101 Reports
7 102 SQL
8 102 Linux
9 102 Java
10 103 SQL
11 103 PL/SQL
12 103 Forms

Sub_Hrs:
Subject Hours

Java 16
PL/SQL 8
Linux 8
Forms 8
Reports 12
SQL 10

After decomposing the ‘Subject’ table we now have ‘Fac_Sub’ and ‘Sub_Hrs’ table respectively. By doing
so, the following anomalies are addressed in the table.

Insertion: - No redundancy of data for subject and hours while inserting the records.

Updation: - Subject and hours are stored in the separate table. So updation becomes much easier as there is no
repetitiveness of data.

Deletion: - Even if the faculty leaves the organization, the hours allotted to a particular subject can be
still retrieved from the Sub_Hrs table.

Boyce–Codd Normal Form (BCNF)

The motivation for Boyce-Codd Normal Form (BCNF) is that 3NF does not satisfactorily handle the case
of a relation possessing two or more composite, overlapping candidate keys.

A relation R is said to be in BCNF, if and only if every determinant is a candidate key.
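Under this definition, BCNF can be tested by checking that the attribute closure of every determinant covers the whole schema (i.e., that every determinant is a superkey). A Python sketch, reusing the REPORT-style FDs from earlier; the attribute names are illustrative:

```python
# Sketch: a relation is in BCNF iff for every non-trivial FD X -> Y,
# X is a superkey (the closure of X contains every attribute of R).

def closure(X, fds):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(attrs, fds):
    for lhs, rhs in fds:
        if rhs <= lhs:                      # trivial FD, always allowed
            continue
        if closure(lhs, fds) != set(attrs):
            return False                    # determinant lhs is not a superkey
    return True

attrs = {"STUDENT#", "COURSE#", "Marks", "Grade"}
fds = [
    (frozenset({"STUDENT#", "COURSE#"}), frozenset({"Marks"})),
    (frozenset({"Marks"}), frozenset({"Grade"})),   # Marks is not a key
]
print(is_bcnf(attrs, fds))  # False: the determinant Marks is not a candidate key

# After splitting off (Marks, Grade), each piece passes the test.
print(is_bcnf({"Marks", "Grade"},
              [(frozenset({"Marks"}), frozenset({"Grade"}))]))  # True
```

The violation here is exactly the transitive dependency removed during third normalization; in schemas with overlapping composite candidate keys, BCNF catches violations that 3NF misses.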


In most cases, third normal form is a sufficient level of decomposition. But some cases require the design
to be further normalized up to the fourth and even the fifth normal form. These are based on the concept
of multivalued dependency. Let us get an idea of it now.

Multivalued Dependency:
A multivalued dependency, written X →→ Y, is said to hold for a relation R(X, Y, Z) if, for a given set
of values of X, there is an associated set of values of attribute Y, and that set of Y values depends only
on the X values, with no dependence on the set of attributes Z.
Fourth Normal Form (4NF)
A relation is said to be in fourth normal form if each table contains no more than one multi-valued dependency
per key attribute.

Seminar Faculty Topic

DBP-1 Brown Database Principles

DAT-2 Brown Database Advanced Techniques

DBP-1 Brown Data Modeling Techniques

DBP-1 Robert Database Principles

DBP-1 Robert Data Modeling Techniques

DAT-2 Maria Database Advanced Techniques

In the above example, the same topic is taught in a seminar by more than one faculty, and each faculty
takes up different topics in the same seminar. Hence, topic names are repeated several times. This is an
example of multivalued dependency. For a table to be in fourth normal form, multivalued dependencies
must be removed.

To eliminate multivalued dependency, split the table such that there is no multivalued
dependency.
Seminar Faculty
DBP-1 Brown
DAT-2 Brown
DBP-1 Robert
DAT-2 Maria

Seminar Topic
DBP-1 Database Principles
DAT-2 Database Advanced Techniques
DBP-1 Data Modeling Techniques
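The split can be verified to be non-loss by joining the two tables back on Seminar. A small Python sketch using the rows above:

```python
# Sketch: the 4NF split is non-loss -- joining the two projections on
# Seminar reproduces exactly the original six (Seminar, Faculty, Topic) rows.
original = {
    ("DBP-1", "Brown",  "Database Principles"),
    ("DAT-2", "Brown",  "Database Advanced Techniques"),
    ("DBP-1", "Brown",  "Data Modeling Techniques"),
    ("DBP-1", "Robert", "Database Principles"),
    ("DBP-1", "Robert", "Data Modeling Techniques"),
    ("DAT-2", "Maria",  "Database Advanced Techniques"),
}
sem_fac = {(s, f) for s, f, _ in original}      # Seminar-Faculty table
sem_topic = {(s, t) for s, _, t in original}    # Seminar-Topic table

rebuilt = {(s, f, t)
           for s, f in sem_fac
           for s2, t in sem_topic if s == s2}

print(rebuilt == original)  # True: no rows lost, no spurious rows created
```

The join succeeds precisely because the MVD Seminar →→ Topic holds: within each seminar, topics and faculty vary independently.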
Fifth Normal Form
A relation is said to be in 5NF if and only if it is in 4NF and every join dependency in it is implied by the
candidate keys.
Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information
that can be maintained with less redundancy. It emphasizes on lossless decomposition.
Consider the following example:
Faculty Seminar Location

Brown DBP-1 New York


Brown DAT-2 Chicago
Robert DBP-1 Chicago

If we were to add the seminar DAT-2 to New York, we would have to add a line to the table for each instructor
located in New York.

The table would look like as shown below adding the above information:

Faculty Seminar Location

Brown DBP-1 New York


Brown DAT-2 Chicago
Robert DBP-1 Chicago
Brown DAT-2 New York
Robert DAT-2 New York

From the above table, we observe that there is a redundancy of data stored for Brown’s information. So to
eliminate this redundancy, we have to do a ‘Non-Loss decomposition’ of the table.
Consider the following decomposition of the above table into fifth normal form:

Faculty Seminar

Brown DBP-1
Brown DAT-2
Robert DBP-1
Robert DAT-2

Seminar Location

DBP-1 New York


DAT-2 Chicago
DBP-1 Chicago
DAT-2 New York

Faculty Location

Brown New York


Brown Chicago
Robert Chicago
Robert New York
Generally, a table is in fifth normal form when its information content cannot be reconstructed from
several smaller tables, i.e., from tables having fewer fields than the original table, each table having
different keys.
In the normalized form, the fact that ‘Brown’ travels to ‘New York’ is recorded only once, whereas in the
unnormalized form it may be repeated many times.

An attempt has been made to explain Normal forms in a simple yet understandable manner.

Some redundancies are unavoidable. One should take care while normalizing a table so that data integrity is not
compromised for removing redundancies.

Domain/key normal form (DKNF)

Domain/key normal form (DKNF) is a normal form used in database normalization which requires
that the database contains no constraints other than domain constraints and key constraints.

A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the
attributes that uniquely identify a row in a given table.
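The two kinds of constraint can be illustrated with a small sketch. The table, attribute names and permitted values below are hypothetical, chosen only to show what a domain check and a key check each test:

```python
# A hypothetical Employee table used only to illustrate the two constraint kinds.
rows = [
    {"emp_id": 1, "dept": "Sales", "age": 34},
    {"emp_id": 2, "dept": "HR",    "age": 29},
]

# Domain constraints: the permissible values for each attribute.
domains = {
    "emp_id": lambda v: isinstance(v, int) and v > 0,
    "dept":   lambda v: v in {"Sales", "HR", "IT"},
    "age":    lambda v: isinstance(v, int) and 18 <= v <= 70,
}

# Key constraint: these attributes must uniquely identify a row.
key = ("emp_id",)

def satisfies_domains(rows, domains):
    """Every attribute value of every row lies in its declared domain."""
    return all(check(row[attr])
               for row in rows
               for attr, check in domains.items())

def satisfies_key(rows, key):
    """No two rows share the same key value."""
    seen = [tuple(row[a] for a in key) for row in rows]
    return len(seen) == len(set(seen))

assert satisfies_domains(rows, domains)
assert satisfies_key(rows, key)
```

A relation is in DKNF when checks of exactly this shape, domain checks and key checks, are sufficient to enforce every constraint the design requires.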

The domain/key normal form is achieved when every constraint on the relation is a logical consequence of the
definition of its keys and domains, so that enforcing the key and domain constraints causes all other constraints
to be met as well. Thus, it avoids all non-temporal anomalies.

The reason to use domain/key normal form is to avoid having general constraints in the database that are not
clear domain or key constraints. Most databases can easily test domain and key constraints on attributes.
General constraints however would normally require special database programming in the form of stored
procedures that are expensive to maintain and expensive for the database to execute. Therefore general
constraints are split into domain and key constraints.

It is much easier to build a database in domain/key normal form from the outset than to convert a
lesser-normalized database, which may contain numerous anomalies. However, successfully building a domain/key
normal form database remains a difficult task, even for experienced database designers. Thus, while the
domain/key normal form eliminates the problems found in most databases, it tends to be the most costly normal
form to achieve. Failing to achieve it, however, may carry long-term hidden costs due to anomalies that appear
over time in databases adhering only to lower normal forms.

The third normal form, Boyce–Codd normal form, fourth normal form and fifth normal form are special cases
of the domain/key normal form: each has functional, multi-valued or join dependencies that can be
converted into (super)keys, and the domains in those normal forms are unconstrained, so all domain constraints
are satisfied. However, transforming a higher normal form into domain/key normal form is not always a
dependency-preserving transformation and is therefore not always possible.

Denormalization
Denormalization is the process of storing the join of higher normal form relations as a base relation,
which is in a lower normal form.

In computing, denormalization is the process of attempting to optimize the read performance of
a database by adding redundant data or by grouping data. In some cases, denormalization is a means of
addressing performance or scalability in relational database software.
A normalized design will often store different but related pieces of information in separate logical tables (called
relations). If these relations are stored physically as separate disk files, completing a database query that draws
information from several relations (a join operation) can be slow. If many relations are joined, it may be
prohibitively slow. There are two strategies for dealing with this. The preferred method is to keep the logical
design normalized, but allow the database management system (DBMS) to store additional redundant
information on disk to optimise query response. In this case it is the DBMS software's responsibility to ensure
that any redundant copies are kept consistent. This method is often implemented in SQL as indexed views
(Microsoft SQL Server) or materialised views (Oracle, PostgreSQL). A view represents information in a format
convenient for querying, and the index ensures that queries against the view are optimised.
The more common approach is to denormalize the logical data design. With care this can achieve a similar
improvement in query response, but at a cost—it is now the database designer's responsibility to ensure that the
denormalized database does not become inconsistent. This is done by creating rules in the database
called constraints, that specify how the redundant copies of information must be kept synchronised. It is the
increase in logical complexity of the database design and the added complexity of the additional constraints that
make this approach hazardous. Moreover, constraints introduce a trade-off, speeding up reads (SELECT in
SQL) while slowing down writes (INSERT, UPDATE, and DELETE). This means a denormalized database
under heavy write load may actually offer worse performance than its functionally equivalent normalized
counterpart.
A denormalized data model is not the same as a data model that has never been normalized; denormalization
should take place only after a satisfactory level of normalization has been achieved and after any required
constraints and/or rules have been created to deal with the inherent anomalies in the design. For example, all
the relations should be in third normal form, with any join and multi-valued dependencies handled
appropriately.
Examples of denormalization techniques include:
 Materialised views, which may implement the following:
    Storing the count of the "many" objects in a one-to-many relationship as an attribute of the "one"
      relation
    Adding attributes to a relation from another relation with which it will be joined
 Star schemas, which are also known as fact-dimension models and have been extended to snowflake
  schemas
 Prebuilt summarisation or OLAP cubes
Denormalization techniques are often used to improve the scalability of Web applications.
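The first technique above, caching the count of the "many" side as an attribute of the "one" relation, can be sketched as follows. The blog schema, table names and helper function are hypothetical; in a real DBMS the synchronisation shown here by hand would typically be enforced by a trigger or constraint:

```python
# Hypothetical blog schema: each author row caches post_count, a value
# derivable from the posts table but stored redundantly for fast reads.
authors = {1: {"name": "Alice", "post_count": 0}}
posts = []

def add_post(author_id, title):
    """Insert a post and keep the redundant counter synchronised.

    This manual update is the obligation denormalization creates: every
    write path must maintain the redundant copy, or the data becomes
    inconsistent."""
    posts.append({"author_id": author_id, "title": title})
    authors[author_id]["post_count"] += 1  # redundant copy updated on write

add_post(1, "Why normalize?")
add_post(1, "When to denormalize")

# The read path is now O(1): no join or COUNT(*) over posts is needed.
assert authors[1]["post_count"] == 2
```

Note how the trade-off described above shows up directly: reads of the count are cheap, but every insert carries extra work, and forgetting the counter update in any one write path silently corrupts the cached value.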
