Unit II Database Design
DATABASE DESIGN
Entity-Relationship model – E-R Diagrams – Enhanced-ER Model – ER-to-Relational Mapping –
The E-R data model considers the real world consisting of a set of basic objects, called
entities, and relationships among these objects.
The E-R data model employs three basic notions:
1. Entity sets
2. Relationship sets
3. Attributes
2. Entity Sets
An entity is a thing or object in the real world that is distinguishable from all other
objects. For example, each person is an entity.
An entity has a set of properties, and the values for some set of properties may uniquely
identify an entity.
For example, a customer with customer-id property with value C101 uniquely identifies that
person. An entity may be concrete, such as person or a book, or it may be abstract, such as a
loan, or a holiday.
An entity set is a set of entities of the same type that share the same properties, or attributes.
For example all persons who are customers at a given bank can be defined as entity set
customer. The properties that describe an entity are called attributes.
Recursive relationship set: when the same entity set participates in a relationship set more
than once, in different roles, the relationship set is called a recursive relationship set.
A relationship set may also have attributes of its own, called descriptive attributes.
Types of relationships
i) Unary relationship: A unary relationship exists when an association is maintained within a
single entity.
ii) Binary relationship: A binary relationship exists when two entities are associated.
iii) Ternary relationship: A ternary relationship exists when there are three entities associated.
iv) Quaternary relationship: A quaternary relationship exists when there are four entities
associated.
The number of entity sets participating in a relationship is called the degree of the
relationship set.
A binary relationship set is of degree 2; a ternary relationship set is of degree 3.
Entity role: The function that an entity plays in a relationship is called that entity's role. A role is
one end of an association.
3. Attributes
The properties that describe an entity are called attributes.
• The attributes of the customer entity set are customer_id, customer_name and city.
• Each attribute has a set of permitted values called its domain, or value set.
• Each entity has a value for each of its attributes.
Example:
Customer Name: John
Customer Id: 321
1) Simple attribute:
This type of attribute cannot be divided into subparts.
Example: Age, sex, GPA
2) Composite attribute:
This type of attribute can be subdivided.
Example: Address: street, city, state, zip
3) Single-valued attribute:
This type of attribute can have only a single value.
Example: Social security number
4) Multi-valued attribute:
A multi-valued attribute can have many values.
Example: Person may have several college degrees, phone numbers
5) Derived attribute:
A derived attribute can be calculated or derived from other related attributes or entities.
Example: Age can be derived from D.O.B.
6) Stored attributes:
• The attributes stored in a database are called stored attributes.
• An attribute takes a null value when an entity does not have a value for it.
• A null value indicates that the value of the attribute either does not exist or is
unknown. E.g.: 1. A middle name may not be present for a person (non-existence
case).
2. An apartment number may be missing or unknown.
CONSTRAINTS
An E-R enterprise schema may define certain constraints to which the contents of a database
system must conform.
Three types of constraints are
• Mapping cardinalities
• Key constraints
• Participation constraints
1. Mapping cardinalities
Mapping cardinalities express the number of entities to which another entity can be
associated via a relationship set.
Cardinality in an E-R diagram is represented in two ways: a directed line (→) or an undirected line (—).
For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following:
i) One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
ii) One-to-many: An entity in A is associated with any number (zero or more) of entities in B,
while an entity in B is associated with at most one entity in A.
iii) Many-to-one: An entity in A is associated with at most one entity in B, while an entity in B is
associated with any number (zero or more) of entities in A.
Example: Many employees work for a company, so the works-for relationship is many-to-one.
iv) Many-to-many: An entity in A is associated with any number (zero or more) of entities in B,
and an entity in B is associated with any number (zero or more) of entities in A.
Example: An employee works on a number of projects and a project is handled by a number of
employees. Therefore, the relationship between employee and project is many-to-many.
(Diagram: Employee —Works-on— Project)
2. Keys
A key is a set of attributes that suffices to distinguish entities from each
other.
Keys also help uniquely identify relationships, and thus distinguish relationships from
each other.
Primary key: the candidate key selected to uniquely identify all rows. It should rarely
change and cannot contain null values.
3. Participation Constraint
If every entity in the entity set E participates in at least one relationship in R, the
participation of E in R is called total participation.
If only some entities in the entity set E participate in relationships in R, the
participation is called partial participation.
ENTITY-RELATIONSHIP(E-R) DIAGRAMS
An E-R diagram can express the overall logical structure of a database graphically.
An E-R diagram consists of the following major components: rectangles (entity sets),
ellipses (attributes), diamonds (relationship sets), and lines, which link attributes to
entity sets and entity sets to relationship sets.
An edge between an entity set and binary relationship set can have an associated minimum
and maximum cardinality assigned in the form of l..h.
l - Minimum cardinality
h - Maximum cardinality
An entity set may not have sufficient attributes to form a primary key. Such an entity set is
termed a weak entity set.
An entity set that has a primary key is termed a strong entity set.
A weak entity set is associated with another entity set called the identifying or owner entity
set; i.e., the weak entity set is said to be existence dependent on the identifying entity set.
Identifying entity set is said to own the weak entity set.
The relationship among the weak and identifying entity set is called the identifying
relationship.
The discriminator of a weak entity set is a set of attributes that distinguishes the different
entities within the weak entity set; it is also called a partial key.
1. Specialization:
Specialization is a top-down design process of designating sub-groupings within an entity set.
For example, the person entity set may be specialized into customer and employee.
2. Generalization:
Generalization is a bottom-up design process in which multiple entity sets are combined into a
higher-level entity set on the basis of common features.
A property of the higher- and lower-level entities created by specialization and generalization
is attribute inheritance.
The attributes of the higher-level entity sets are said to be inherited by the lower-level entity
sets.
For example, customer and employee inherit the attributes of person. The outcome of
attribute inheritance is
o A higher-level entity set with attributes and relationships that apply to all of its lower-
level entity sets.
o Lower-level entity sets with distinctive features that apply only within a particular
lower-level entity set
If an entity set is involved as a lower-level entity set in only one ISA relationship, then the
entity set has single inheritance.
If an entity set is involved as a lower-level entity set in more than one ISA relationship, then
the entity set has multiple inheritance, and the resulting structure is said to be a lattice.
Constraints on Generalizations
1. One type of constraint determines which entities can be members of a lower-level entity set.
Such membership may be one of the following:
• Condition-defined. Membership of the lower-level entity set is evaluated on the basis
of whether or not an entity satisfies an explicit condition.
• User-defined. User-defined constraints are assigned by the user.
2. A second type of constraint relates to whether or not entities may belong to more than one
lower-level entity set within a single generalization. The lower-level entity sets may be one of
the following:
• Disjoint. A disjointness constraint requires that an entity belong to no more than one
lower-level entity set.
• Overlapping. Same entity may belong to more than one lower-level entity set within a
single generalization.
3. A final constraint, the completeness constraint specifies whether or not an entity in the
higher-level entity set must belong to at least one of the lower-level entity sets .This
constraint may be one of the following:
• Total generalization or specialization. Each higher-level entity must belong to a
lower-level entity set. It is represented by double line.
• Partial generalization or specialization. Some higher-level entities may not belong to
any lower-level entity set.
3. Aggregation
One limitation of the E-R model is that it cannot express relationships among relationships.
Consider the ternary relationship works-on between an employee, a branch, and a job. Now
suppose we want to record managers for tasks performed by an employee at a branch; for this,
another entity set, manager, is created.
The best way to model such a situation is to use aggregation.
Aggregation is an abstraction through which relationships are treated as higher-level entities.
In our example, works-on acts as a higher-level entity.
Functional Dependency
• In a given relation R, let X and Y be attributes. Attribute Y is functionally dependent on
attribute X if each value of X determines EXACTLY ONE value of Y; this is written
X → Y (X may be composite in nature).
• We say "X determines Y" or "Y is functionally dependent on X".
• X → Y does not imply Y → X.
• In the EMP relation, if the value of "Empid" is known then the value of "Ename" can be
determined, since Empid → Ename.
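The definition above can be checked mechanically. The following Python sketch is our own illustration (not from the text): a relation is modelled as a list of row dictionaries, and the helper name holds_fd is an assumption.

```python
# Sketch: checking whether X -> Y holds in a relation modelled as a
# list of row dictionaries. Column names follow the EMP example.

def holds_fd(rows, x, y):
    """Return True if the attributes in x functionally determine those in y."""
    seen = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        if seen.setdefault(xv, yv) != yv:
            return False  # the same X value maps to two different Y values
    return True

emp = [
    {"Empid": 1, "Ename": "Smith", "Dept": "CS"},
    {"Empid": 2, "Ename": "Jones", "Dept": "CS"},
    {"Empid": 1, "Ename": "Smith", "Dept": "IT"},  # same Empid, same Ename
]
print(holds_fd(emp, ["Empid"], ["Ename"]))  # True: Empid -> Ename holds
print(holds_fd(emp, ["Dept"], ["Ename"]))   # False: Dept does not determine Ename
```

Note that the check is per-instance: it verifies the dependency over the rows given, which is how a violation of a declared FD would surface in practice.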
Partial dependencies
• Let X and Y be attributes, where X is composite.
• Attribute Y is partially dependent on X if it is dependent on a proper subset of the
attributes of X.
Consider the following relation:
• REPORT (STUDENT#, COURSE#, CourseName, Marks, Grade)
• COURSE# → CourseName.
Here CourseName depends only on COURSE#, a subset of the key (STUDENT#, COURSE#), so the
dependency is partial.
Transitive dependencies:
• Let X, Y and Z be three attributes.
• X → Y
• Y → Z
• ⇒ X → Z
Example:
• STUDENT#, COURSE# → Marks
• Marks → Grade
• ⇒ STUDENT#, COURSE# → Grade
Armstrong's Axioms
Reflexivity rule − If α is a set of attributes and β ⊆ α, then α → β holds.
Augmentation rule − If α → β holds and γ is an attribute set, then αγ → βγ also holds. That is,
adding attributes to a dependency does not change the basic dependency.
Transitivity rule − As the transitive rule in algebra: if α → β holds and β → γ holds, then
α → γ also holds. α → β means that α functionally determines β.
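In practice, Armstrong's axioms are applied by computing the closure X⁺ of an attribute set X, i.e. the set of all attributes functionally determined by X. A minimal Python sketch, using an illustrative FD set of our own (not from the text):

```python
# Sketch: attribute closure X+ under a set of FDs. Each FD is a
# (lhs, rhs) pair of frozensets of attribute names.

def closure(attrs, fds):
    """Return the closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # reflexivity/augmentation/transitivity in effect: if we
            # already have lhs, we may add rhs
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C"))]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C'] by transitivity
```

The closure is also how candidate keys are found: X is a superkey exactly when X⁺ contains every attribute of the relation.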
Trivial Functional Dependency
Trivial − If a functional dependency (FD) X → Y holds where Y ⊆ X, it is called a trivial FD.
Trivial FDs always hold.
Non-trivial − If an FD X → Y holds where Y is not a subset of X, it is called a non-trivial FD.
Completely non-trivial − If an FD X → Y holds where X ∩ Y = ∅, it is said to be a completely
non-trivial FD.
Decomposition
• Decomposition: separating a relation into two or more relations by projection.
• Join: (re)assembling two relations.
• If joining the relations reproduces exactly the original rows, the decomposition was
lossless.
• If it cannot, the decomposition was lossy.
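The lossless/lossy distinction can be demonstrated directly with projection and a natural join. The following Python sketch uses tiny hand-made relations (the attribute layout (A, B, C) is our own illustration):

```python
# Sketch: project a relation R(A, B, C) onto (A, B) and (B, C), then
# natural-join the pieces back on B and compare with the original.

def project(rel, idxs):
    return {tuple(t[i] for i in idxs) for t in rel}

def join_on_b(r1, r2):
    """Natural join of (A, B) with (B, C) on the shared attribute B."""
    return {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

r = {(1, "x", 10), (2, "y", 20)}
r1, r2 = project(r, (0, 1)), project(r, (1, 2))
print(join_on_b(r1, r2) == r)  # True: the decomposition is lossless

lossy = {(1, "x", 10), (2, "x", 20)}        # B determines neither A nor C
l1, l2 = project(lossy, (0, 1)), project(lossy, (1, 2))
print(join_on_b(l1, l2) == lossy)  # False: spurious tuples appear
```

In the lossy case the join produces extra rows such as (1, "x", 20) that were never in the original relation, which is exactly what "lossy" means: information is lost because the original rows can no longer be told apart from the spurious ones.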
ANOMALIES
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any
database administrator. Managing a database with anomalies is next to impossible.
Update anomalies − If data items are scattered and are not linked to each other properly,
then it could lead to strange situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances get updated properly while a
few others are left with old values. Such instances leave the database in an inconsistent state.
Deletion anomalies − We try to delete a record, but parts of it are left undeleted because,
unknown to us, the data is also saved somewhere else.
Insert anomalies − We try to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent state.
Normalization
Why Normalization?
a. Does the design ensure that all database operations will be efficiently performed and that the
design does not make the DBMS perform expensive consistency checks, which could be avoided?
Data integrity ensures the correctness of data stored within the database. It is achieved by
imposing integrity constraints.
An integrity constraint is a rule, which restricts values present in the database.
♦ Entity constraints:
The entity integrity rule states that the value of the primary key can never be a null value (a null
value is one that has no value and is not the same as a blank). Because a primary key is used to
identify a unique row in a relational table, its value must always be specified and should never be
unknown. The integrity rule requires that insert, update and delete operations maintain the
uniqueness and existence of all primary keys.
♦ Domain Constraints:
Only permissible values of an attribute are allowed in a relation.
♦ Direct Redundancy:
Direct redundancy results from the presence of the same data in two different locations,
leading to anomalies during reading, writing, updating and deleting.
♦ Indirect redundancy:
Indirect Redundancy results due to storing information that can be computed
from the other data items stored within the database.
Normalized databases have a design that reflects the true dependencies between tracked quantities,
allowing quick updates to data with little risk of introducing inconsistencies. There are formal
methods for quantifying "how normalized" a relational database is, and these classifications are
called Normal Forms (or NF).
What is a Normal Form?
Normal forms are designed to logically address potential problems, such as inconsistencies and
redundancy, in information stored in the database.
A database is said to be in one of the normal forms if it satisfies the rules required by that form
as well as all previous forms; it then will not suffer from any of the problems addressed by those forms.
A relation is in a particular normal form only if it also satisfies the previous normal form.
First Normal Form (1NF)
A relation is in 1NF if every row contains exactly one (atomic) value for each attribute.
Consider a table ‘Faculty’ which has information about the faculty, subjects and, the number of hours allotted
to each subject they teach.
Faculty:
Faculty code Faculty Name Date of Birth Subject Hours
100 Smith 17/07/64 Java, PL/SQL, Linux 16, 8, 8
101 Jones 24/12/72 Java, Forms, Reports 16, 8, 12
102 Fred 03/02/80 SQL, Linux, Java 10, 8, 16
103 Robert 28/11/66 SQL, PL/SQL, Forms 10, 8, 8
The above table does not have atomic values in the ‘Subject’ column. Hence, it is called an
un-normalized table. Inserting, updating and deleting would be a problem in such a table.
For the above table to be in first normal form, each row should have atomic values. Hence let us
re-construct the data in the table. A ‘S.No’ column is included in the table to uniquely identify each row.
S.No Faculty code Faculty Name Date of Birth Subject Hours
1 100 Smith 17/07/64 Java 16
2 100 Smith 17/07/64 PL/SQL 8
3 100 Smith 17/07/64 Linux 8
4 101 Jones 24/12/72 Java 16
5 101 Jones 24/12/72 Forms 8
6 101 Jones 24/12/72 Reports 12
7 102 Fred 03/02/80 SQL 10
8 102 Fred 03/02/80 Linux 8
9 102 Fred 03/02/80 Java 16
10 103 Robert 28/11/66 SQL 10
11 103 Robert 28/11/66 PL/SQL 8
12 103 Robert 28/11/66 Forms 8
This table shows the same data as the previous table, but we have eliminated the repeating groups.
Hence the table is now said to be in First Normal form (1NF). But we have introduced Redundancy into the
table now. This can be eliminated using Second Normal Form (2NF).
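The elimination of repeating groups can be sketched in Python. The column names and dictionary layout below are our own assumptions; the data follows the chapter's Faculty example:

```python
# Sketch: flattening an un-normalized table (one row per faculty, with a
# multi-valued subjects column) into 1NF rows with atomic values.

faculty = [
    {"code": 100, "name": "Smith", "subjects": ["Java", "PL/SQL", "Linux"]},
    {"code": 101, "name": "Jones", "subjects": ["Java", "Forms", "Reports"]},
]

# Emit one atomic row per (faculty, subject) pair, numbered by S.No.
first_nf = [
    {"sno": i, "code": f["code"], "name": f["name"], "subject": s}
    for i, (f, s) in enumerate(
        ((f, s) for f in faculty for s in f["subjects"]), start=1)
]
for row in first_nf:
    print(row)
```

Each output row now holds exactly one value per attribute, which is the 1NF requirement; the redundancy it introduces (repeated code and name) is what 2NF then removes.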
Second Normal Form (2NF)
To bring a relation into 2NF:
Find and remove attributes that are related to only a part of the key.
Group the removed items in another table.
Assign the new table a key that consists of that part of the old composite key.
If a relation is not in 2NF, it can be further normalized into a number of 2NF relations.
Let us consider the table we obtained after first normalization.
While eliminating the repeating groups, we introduced redundancy into the table: Faculty Code,
Name and Date of Birth are repeated, since the same faculty is multi-skilled.
To eliminate this, let us split the table into 2 parts; one with the non-repeating groups and the other for
repeating groups.
Faculty:
Faculty code Faculty Name Date of Birth
100 Smith 17/07/64
101 Jones 24/12/72
102 Fred 03/02/80
103 Robert 28/11/66
Faculty Code is the only key to identify the faculty name and the date of birth.
Hence, Faculty code is the primary key in the first table and foreign key in the second table.
Faculty code is repeated in the Subject table. Hence, we take ‘SNO’ into account to form a
composite key in the Subject table. Now SNO + Faculty code can uniquely identify each row in this table.
Anomalies in 2nd NF:
The situation could lead to the following problems:
• Insertion: Inserting the records of various faculty teaching the same subject results in
redundancy of the hours information.
• Updation: For a subject, the number of hours allotted to a subject is repeated several times. Hence, if
the number of hours has to be changed, this change will have to be recorded in every instance of that
subject. Any omissions will lead to inconsistencies.
• Deletion: If a faculty leaves the organization, information regarding hours allotted to the subject is lost.
This Subject table should therefore be further decomposed, without any loss of information,
into the Fac_Sub and Sub_Hrs tables shown below.
Transitive Dependency
Transitive dependencies arise when a non-key attribute depends on another non-key attribute
rather than directly on the key.
Third Normal Form (3NF)
In order to remove the anomalies that arose in Second Normal Form, and to remove transitive
dependencies if any, we perform third normalization.
Now let us see how to normalize the second table obtained after 2NF.
Subject:
SNO Faculty code Subject Hours
1 100 Java 16
2 100 PL/SQL 8
3 100 Linux 8
4 101 Java 16
5 101 Forms 8
6 101 Reports 12
7 102 SQL 10
8 102 Linux 8
9 102 Java 16
10 103 SQL 10
11 103 PL/SQL 8
12 103 Forms 8
In this table, Hours depends on Subject, and Subject depends on Faculty code and SNO; but Hours
depends directly on neither Faculty code nor SNO. Hence there exists a transitive dependency
among SNO, Subject and Hours.
If a faculty code is deleted then, due to this transitive dependency, information regarding the
subject and the hours allotted to it would be lost.
Fac_Sub:
SNO Faculty code Subject
1 100 Java
2 100 PL/SQL
3 100 Linux
4 101 Java
5 101 Forms
6 101 Reports
7 102 SQL
8 102 Linux
9 102 Java
10 103 SQL
11 103 PL/SQL
12 103 Forms
Sub_Hrs:
Subject Hours
Java 16
PL/SQL 8
Linux 8
Forms 8
Reports 12
SQL 10
After decomposing the ‘Subject’ table we now have ‘Fac_Sub’ and ‘Sub_Hrs’ table respectively. By doing
so, the following anomalies are addressed in the table.
Insertion: - No redundancy of data for subject and hours while inserting the records.
Updation: - Subject and hours are stored in the separate table. So updation becomes much easier as there is no
repetitiveness of data.
Deletion: - Even if the faculty leaves the organization, the hours allotted to a particular subject can be
still retrieved from the Sub_Hrs table.
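We can check that this 3NF decomposition is lossless by re-joining the two tables. The following Python sketch uses a few rows of the chapter's data; the tuple layout is our own:

```python
# Sketch: verify that joining Fac_Sub and Sub_Hrs reconstructs the
# original Subject table exactly (a lossless decomposition).

subject = [  # (sno, faculty_code, subject, hours) -- the 2NF Subject table
    (1, 100, "Java", 16), (2, 100, "PL/SQL", 8), (4, 101, "Java", 16),
]
fac_sub = {(sno, fc, sub) for (sno, fc, sub, _) in subject}   # Fac_Sub
sub_hrs = {(sub, hrs) for (_, _, sub, hrs) in subject}        # Sub_Hrs

# Natural join on the shared Subject attribute.
rejoined = {(sno, fc, sub, hrs)
            for (sno, fc, sub) in fac_sub
            for (s2, hrs) in sub_hrs if sub == s2}
print(rejoined == set(subject))  # True: no information was lost
```

The join works because Subject → Hours holds (each subject has a single hours value), which is exactly the transitive dependency that the decomposition factored out.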
Boyce-Codd Normal Form (BCNF)
The motivation for Boyce-Codd Normal Form (BCNF) is that 3NF does not satisfactorily handle the
case of a relation possessing two or more composite or overlapping candidate keys.
Multivalued Dependency:
A multivalued dependency X →→ Y is said to hold for a relation R(X, Y, Z) if, for a given set of
values for X, there is an associated set of values for attribute Y, and the Y values depend only
on the X values and have no dependence on the set of attributes Z.
Fourth Normal Form (4NF)
A relation is in fourth normal form if it is in BCNF and contains no non-trivial multivalued
dependencies other than those implied by a candidate key.
In this example, the same topic is being taught in a seminar by more than one faculty, and each
faculty takes up different topics in the same seminar. Hence topic names are repeated several
times. This is an example of multivalued dependency. For a table to be in fourth normal form,
multivalued dependency must be avoided.
To eliminate multivalued dependency, split the table such that there is no multivalued
dependency.
Seminar Topic
DAT-2 Techniques
Seminar Faculty
DAT-2 Brown
DAT-2 Maria
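The split can be sketched in Python. The rows below are illustrative, modelled on the chapter's DAT-2 example, and the names seminar_topic and seminar_faculty are our own:

```python
# Sketch: removing a multivalued dependency by splitting
# Seminar(seminar, topic, faculty) into two independent tables.

seminar = {
    ("DAT-2", "Techniques", "Brown"),
    ("DAT-2", "Techniques", "Maria"),
}
# Topics and faculty vary independently for a seminar, so store them apart.
seminar_topic = {(sem, top) for (sem, top, _) in seminar}
seminar_faculty = {(sem, fac) for (sem, _, fac) in seminar}
print(seminar_topic)    # each topic recorded once per seminar
print(seminar_faculty)  # each faculty recorded once per seminar
```

After the split, adding a new faculty to the seminar touches only seminar_faculty and no longer duplicates any topic row.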
Fifth Normal Form
A relation is said to be in 5NF if and only if it is in 4NF and every join dependency in it is implied by the
candidate keys.
Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information
that can be maintained with less redundancy. It emphasizes on lossless decomposition.
Consider the following example:
Faculty Seminar Location
If we were to add the seminar DAT-2 in New York, we would have to add a line to the table for
each faculty located in New York.
After adding this information, the table would look as shown below:
From the above table, we observe that there is a redundancy of data stored for Brown’s information. So to
eliminate this redundancy, we have to do a ‘Non-Loss decomposition’ of the table.
Consider the following decomposition of the above table into fifth normal form:
Faculty Seminar
Brown DBP-1
Brown DAT-2
Robert DBP-1
Robert DAT-2
Seminar Location
Faculty Location
An attempt has been made to explain Normal forms in a simple yet understandable manner.
Some redundancies are unavoidable. One should take care while normalizing a table so that data integrity is not
compromised for removing redundancies.
Domain/Key Normal Form (DKNF)
Domain/key normal form (DKNF) is a normal form used in database normalization which requires
that the database contain no constraints other than domain constraints and key constraints.
A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the
attributes that uniquely identify a row in a given table.
The domain/key normal form is achieved when every constraint on the relation is a logical
consequence of the definition of keys and domains, and enforcing key and domain constraints
causes all other constraints to be met. Thus, it avoids all non-temporal anomalies.
The reason to use domain/key normal form is to avoid having general constraints in the database that are not
clear domain or key constraints. Most databases can easily test domain and key constraints on attributes.
General constraints however would normally require special database programming in the form of stored
procedures that are expensive to maintain and expensive for the database to execute. Therefore general
constraints are split into domain and key constraints.
It's much easier to build a database in domain/key normal form than it is to convert lesser databases which may
contain numerous anomalies. However, successfully building a domain/key normal form database remains a
difficult task, even for experienced database programmers. Thus, while the domain/key normal form eliminates
the problems found in most databases, it tends to be the most costly normal form to achieve. However, failing to
achieve the domain/key normal form may carry long-term, hidden costs due to anomalies which appear in
databases adhering only to lower normal forms over time.
The third normal form, Boyce–Codd normal form, fourth normal form and fifth normal form are special cases
of the domain/key normal form. All have either functional, multi-valued or join dependencies that can be
converted into (super)keys. The domains on those normal forms were unconstrained so all domain constraints
are satisfied. However, transforming a higher normal form into domain/key normal form is not
always a dependency-preserving transformation and is therefore not always possible.
Denormalization
Denormalization is the process of storing the join of higher normal form relations as a base relation,
which is in a lower normal form.
Denormalization may improve the read performance of queries (fewer joins in SQL) while slowing
down writes (INSERT, UPDATE, and DELETE). This means a denormalized database under heavy write
load may actually offer worse performance than its functionally equivalent normalized
counterpart.
A denormalized data model is not the same as a data model that has not been normalized, and denormalization
should only take place after a satisfactory level of normalization has taken place and that any required
constraints and/or rules have been created to deal with the inherent anomalies in the design. For example, all the
relations are in third normal form and any relations with join and multi-valued dependencies are handled
appropriately.
Examples of denormalization techniques include:
• Materialised views, which may implement the following:
o Storing the count of the "many" objects in a one-to-many relationship as an attribute
of the "one" relation
o Adding attributes to a relation from another relation with which it will be joined
• Star schemas, which are also known as fact-dimension models and have been extended to
snowflake schemas
• Prebuilt summarisation or OLAP cubes
Denormalization techniques are often used to improve the scalability of Web applications.