Database Management System PDF
A Database Management System (DBMS) is a software package that provides EFFICIENT,
CONVENIENT and SAFE MULTI-USER (many people or programs accessing the same database, or even
the same data, simultaneously) storage of and access to MASSIVE amounts of PERSISTENT (the data
outlives the programs that operate on it) data. A DBMS also provides a systematic method for creating,
updating, storing and retrieving data in a database, together with services for controlling data
access, enforcing data integrity, managing concurrency, and recovering from failures. With this in mind, a
full-scale DBMS should provide at least the following services to the user.
1. Data storage, retrieval and update in the database.
2. A user accessible catalogue.
3. Transaction support service: ALL or NONE transactions (a transaction either completes entirely or has no effect), which minimize data inconsistency.
4. Concurrency Control Services: access and update on the database by different users
simultaneously should be implemented correctly.
5. Recovery Services: a mechanism for recovering the database after a failure must be available.
6. Authorization Services (Security): must support the implementation of access and
authorization service to database administrator and users.
7. Support for Data Communication: should provide the facility to integrate with data transfer
software or data communication managers.
8. Integrity Services: rules about data and the change that took place on the data, correctness
and consistency of stored data, and quality of data based on business constraints.
9. Services to promote data independence between the data and the application.
10. Utility services: sets of utility service facilities like
Importing data.
Statistical analysis support.
Index reorganization.
Garbage collection.
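The ALL or NONE transaction service in item 3 can be sketched with Python's built-in sqlite3 module. This is only an illustration: the account table, its balances and the transfer function are hypothetical and do not come from the text.

```python
import sqlite3

# Hypothetical table: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds: both updates are committed
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK: credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 is executed first, yet it is rolled back when the debit is rejected, so the database never shows half of a transfer.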
Since people are one of the components of a DBMS environment, different stakeholders play a
number of roles in the design and operation of a database system.
ANSI-SPARC Architecture
All users should be able to access the same data. This matters because the database is a
shared resource: all the data is stored in one location, while each user has their own
customized way of interacting with the data.
A user's view is unaffected or immune to changes made in other views. Since the requirement of
one user is independent of the other, a change made in one user’s view should not affect other
users.
Users should not need to know physical database storage details. As there are naïve users of the
system, hardware level or physical details should be a black-box for such users.
DBA should be able to change database storage structures without affecting the users' views. A
change in file organization, access method should not affect the structure of the data which in
turn will have no effect on the users.
Internal structure of database should be unaffected by changes to physical aspects of storage.
DBA should be able to change conceptual structure of database without affecting all users. In
any database system, the DBA will have the privilege to change the structure of the database,
like adding tables, adding and deleting an attribute, changing the specification of the objects in
the database.
All of the above, and much other functionality, is made possible by the three-level ANSI-SPARC
architecture.
1. Planning
2. Analysis
3. Design
4. DBMS Selection
5. Implementation
6. Maintenance
1. Database planning
The database-planning phase begins when a customer requests the development of a database project. It is a set
of tasks or activities that decide the resources required for database development and the time limits
of the different activities. During the planning phase, four major activities are performed.
Review and approve the database project request.
Prioritize the database project request.
Allocate resources such as money, people and tools.
Arrange a development team to develop the database project.
Database planning should also include the development of standards that govern how data will be
collected, how the format should be specified, what necessary documentation will be needed.
2. Requirements Analysis
Requirements analysis is done in order to understand the problem that is to be solved. It is a very
important activity in the development of a database system. The person responsible for
requirements analysis is often called the "Analyst".
In the requirements analysis phase, the requirements and expectations of the users are collected and
analyzed. The collected requirements help in understanding a system that does not yet exist. There are
two major activities in requirements analysis.
3. Design
The database design is the major phase of information engineering. In this phase, the information
models that were developed during analysis are used to design a conceptual schema for the database
and to design transaction and application.
In conceptual schema design, the data requirements collected in Requirement Analysis phase
are examined and a conceptual database schema is produced.
In transaction and application design, the database applications analyzed in Requirement
Analysis phase are examined and specifications of these applications are produced. There are
two major steps in design phase:
Database Design
Process Design
[Figure: the ANSI-SPARC levels: conceptual level (conceptual schema), internal level (internal schema), and physical data organization (the database).]
External Level: Users' view of the database. It describes that part of database that is relevant to a
particular user. Different users have their own customized view of the database independent of
other users.
Conceptual Level: Community view of the database. It describes what data is stored in database and
relationships among the data.
Internal Level: Physical representation of the database on the computer. It describes how the data is
stored in the database.
The following example illustrates the difference between the three levels of
the ANSI-SPARC database architecture, where:
• The first level is concerned with groups of users and their respective data requirements,
each independent of the others.
• The second level describes the whole content of the database, where each piece of
information is represented once.
• The third level describes how the data is physically stored on the computer.
External View 1
Sno FName LName Age Salary
External View 2
Staff_No LName Bno
Conceptual level
Staff_No FName LName DOB Salary Bno
Internal level
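typedef struct { int day, month, year; } Date;   /* assumed date representation */
struct STAFF {
    int   Staff_No;
    int   Branch_No;
    char  FName[15];
    char  LName[15];
    Date  Date_of_Birth;
    float Salary;
    struct STAFF *next;   /* pointer to the next staff record */
};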
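As a rough sketch of how the external views relate to the conceptual level, the two views above can be expressed as SQL views over a single Staff table, here using Python's sqlite3 module. The sample row, the reference date used to derive Age, and the view names View1 and View2 are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: one table holding every attribute once.
CREATE TABLE Staff (
    Staff_No INTEGER PRIMARY KEY,
    FName    TEXT,
    LName    TEXT,
    DOB      TEXT,     -- ISO date string
    Salary   REAL,
    Bno      INTEGER
);
-- Hypothetical sample row.
INSERT INTO Staff VALUES (1, 'Abebe', 'Kebede', '1980-05-17', 4500.0, 10);

-- External View 1: renames Staff_No to Sno and derives Age from DOB.
-- Age is approximated as a difference of calendar years against an
-- assumed reference date of 2024-01-01.
CREATE VIEW View1 AS
    SELECT Staff_No AS Sno, FName, LName,
           CAST(strftime('%Y', '2024-01-01') AS INTEGER)
             - CAST(strftime('%Y', DOB) AS INTEGER) AS Age,
           Salary
    FROM Staff;

-- External View 2: only the columns this user group needs.
CREATE VIEW View2 AS
    SELECT Staff_No, LName, Bno FROM Staff;
""")

view1_cols = [d[0] for d in conn.execute("SELECT * FROM View1").description]
view2_cols = [d[0] for d in conn.execute("SELECT * FROM View2").description]
row1 = conn.execute("SELECT * FROM View1").fetchone()
print(view1_cols, view2_cols, row1)
```

Neither view stores data of its own. Both are computed from the conceptual-level table, so DOB is kept once and Age is derived from it on request.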
A DBMS defines schemas at three levels:
Internal schema: at the internal level to describe physical storage structures and access paths.
Typically uses a physical data model.
Conceptual schema: at the conceptual level to describe the structure and constraints for the
whole database for a community of users. It uses a conceptual or an
implementation data model.
External schema: at the external level to describe the various user views. It usually uses the
same data model as the conceptual level.
Data Independence
Physical data independence is the ability to modify the physical schema without changing the logical schema.
Applications depend on the logical schema.
In general, the interfaces between the various levels and components should be well
defined so that changes in some parts do not seriously influence others.
It is the capacity to change the internal schema without having to change the conceptual
schema.
It refers to the immunity of the conceptual schema to changes in the internal schema.
Internal schema changes, e.g. using different file organizations or storage structures/devices,
should not require changes to the conceptual or external schemas.
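A small sketch of this idea, assuming SQLite as the DBMS: adding an index changes the access path (an internal-schema change), but the same query against the same table definition returns the same answer. The table contents and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (Staff_No INTEGER PRIMARY KEY, LName TEXT, Bno INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [(1, "Kebede", 10), (2, "Alemu", 20), (3, "Taye", 10)],
)

QUERY = "SELECT Staff_No, LName FROM Staff WHERE Bno = ? ORDER BY Staff_No"

before = conn.execute(QUERY, (10,)).fetchall()

# Internal-schema change: add a new access path on Bno.
conn.execute("CREATE INDEX idx_staff_bno ON Staff (Bno)")

after = conn.execute(QUERY, (10,)).fetchall()
print(before == after, after)
```

The application code (the query text) did not have to change when the storage structure did, which is the point of the definition above.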
[Figure: the external/conceptual mapping provides logical data independence; the conceptual/internal mapping provides physical data independence.]
Database Languages
A data definition language allows the DBA or user to describe and name the entities, attributes and
relationships required for the application.
It is a specification notation for defining the database schema.
4. DBMS selection
In this phase an appropriate DBMS is selected to support the information system. A number of
factors are involved in DBMS selection. These may be technical or economic factors. The
technical factors are concerned with the suitability of the DBMS for the information system. The
following technical factors are considered.
5. Implementation
After the design phase and the selection of a suitable DBMS, the database system is implemented. The
purpose of this phase is to construct and install the information system according to the plan and
design described in the previous phases. Implementation involves a series of steps leading to an
operational information system, including creating database definitions (such as tables and
indexes), developing applications, testing the system, developing operational procedures and
documentation, training the users and populating the database. In the context of information
engineering, it involves two steps.
Database definitions.
Creating applications.
6. Operational Maintenance
Once the database system is implemented, the operational maintenance phase of the database
system begins. Operational maintenance is the process of monitoring and maintaining the
database system. Maintenance includes activities such as adding new fields, changing the size of
existing fields, adding new tables, and so on. As the database system requirements change, it
becomes necessary to add new tables or remove existing tables and to reorganize some files by
changing primary access methods or by dropping old indexes and constructing new ones. Some
queries or transactions may be rewritten for better performance. Database tuning or
reorganization continues throughout the life of the database, for as long as the requirements keep
changing.
Chapter Three
Data Model
A specific DBMS has its own specific Data Definition Language, but this type of
language is too low level to describe the data requirements of an organization
in a way that is readily understandable by a variety of users. We need a higher-
level description. Such a higher-level description is called a data model.
1. Hierarchical Model
The simplest data model.
Record type is referred to as node or segment.
The top node is the root node.
Nodes are arranged in a hierarchical structure as sort of upside-down tree.
A parent node can have more than one child node.
A child node can only have one parent node.
The relationship between parent and child is one-to-many.
Relation is established by creating physical link between stored records
(each is stored with a predefined access path to other records).
To add new record type or relationship, the database must be redefined
and then stored in a new form.
[Figure: example hierarchy with Department as the root node and Employee, Job, Time Card and Activity as the nodes below it.]
Advantages of Hierarchical Data Model
Hierarchical Model is simple to construct and operate on.
Corresponds to a number of natural hierarchically organized domains e.g.,
assemblies in manufacturing, personnel organization in companies
Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT,
GET NEXT WITHIN PARENT etc.
Disadvantages of Hierarchical Data Model
Navigational and procedural nature of processing
Database is visualized as a linear arrangement of records
Little scope for "query optimization"
2. Network Model
Allows record types to have more than one parent, unlike the hierarchical
model.
The network data model sees records as set members.
Each set has an owner and one or more members.
Does not allow many-to-many relationships between entities.
Like the hierarchical model, the network model is a collection of physically linked
records.
Allows member records to have more than one owner.
[Figure: example network with the record types Department, Job, Employee, Activity and Time Card.]
Advantages of Network Data Model
Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.
Can handle most situations for modeling using record types and
relationship types.
Language is navigational; uses constructs like FIND, FIND member, FIND
owner, FIND NEXT within set, GET etc. Programmers can do optimal
navigation through the database.
Disadvantages of Network Data Model
Navigational and procedural nature of processing
Database contains a complex array of pointers that thread through a set of
records.
Little scope for automated "query optimization"
3. Relational Data Model
Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A Relational
Model of Data for Large Shared Data Banks').
The terminology originates from the branch of mathematics called set theory
and relations.
Can define more flexible and complex relationships.
Viewed as a collection of tables called “Relations”, equivalent to a collection of
record types.
Relation: a two-dimensional table.
Stores information or data in the form of a table's rows and columns.
A row of the table is called a tuple, equivalent to a record.
A column of a table is called an attribute, equivalent to a field.
A data value is the value of an attribute.
Records are related by the data stored jointly in the fields of records in two
tables or files. The related tables contain information that creates the
relation.
The tables seem to be independent but are related somehow.
No physical consideration of the storage is required by the user.
Many tables are merged together to come up with a new virtual view of the
relationship.
Alternative terminologies
Formal term   Alternative 1   Alternative 2
Relation      Table           File
Tuple         Row             Record
Attribute     Column          Field
Degree of a Relationship
An important point about a relationship is how many entities participate
in it. The number of entities participating in a relationship is called the
Degree of the relationship.
Among the Degrees of relationship, the following are the basic:
Unary/Recursive Relationship: Tuples/records of a single entity are
related with each other.
Binary Relationship: Tuples/records of two entities are associated in a
relationship.
Ternary Relationship: Tuples/records of three different entities are
associated.
And a generalized one, N-ary Relationship: Tuples from an arbitrary
number of entity sets participate in a relationship.
Cardinality of a Relationship
Another important concept about relationship is the number of
instances/tuples that can be associated with a single instance from one
entity in a single relationship. The number of instances participating or
associated with a single instance from an entity in a relationship is called
the Cardinality of the relationship. The major cardinalities of a
relationship are:
ONE-TO-ONE: one tuple is associated with only one other tuple.
E.g. Building – Location as a single building will be located in a
single location and as a single location will only accommodate a single
Building.
ONE-TO-MANY: one tuple can be associated with many other tuples, but
not the reverse.
E.g. Department-Student as one department can have multiple
students.
MANY-TO-ONE: many tuples are associated with one tuple, but not the
reverse.
E.g. Employee – Department as many employees belong to a single
department.
MANY-TO-MANY: one tuple is associated with many other tuples and, from
the other side, with a different role name, one tuple will be associated with
many tuples.
E.g. Student – Course as a student can take many courses and a
single course can be attended by many students.
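The four cardinalities can be sketched as a small classifier over the instance pairs of a binary relationship. The function name and the sample pairs are hypothetical. Note that a real cardinality is a rule about every allowed state of the database, whereas this sketch only reports what one set of pairs exhibits.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify a binary relationship given as (left, right) instance pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does some left instance relate to several right instances, and vice versa?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "MANY-TO-MANY"
    if left_has_many:
        return "ONE-TO-MANY"
    if right_has_many:
        return "MANY-TO-ONE"
    return "ONE-TO-ONE"


# Hypothetical instances mirroring the examples in the text.
building_location = [("B1", "L1"), ("B2", "L2")]
department_student = [("InfoSc", "s1"), ("InfoSc", "s2"), ("Geog", "s3")]
employee_department = [("e1", "d1"), ("e2", "d1")]
student_course = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]

for name, pairs in [
    ("Building - Location", building_location),
    ("Department - Student", department_student),
    ("Employee - Department", employee_department),
    ("Student - Course", student_course),
]:
    print(name, cardinality(pairs))
```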
4. Relational Constraints/Integrity Rules
Relational Integrity
Domain Integrity: No value of the attribute should be beyond the
allowable limits.
Entity Integrity: In a base relation, no attribute of a Primary Key can
assume a value of NULL.
Referential Integrity: If a Foreign Key exists in a relation, either the
Foreign Key value must match a Candidate Key value in its home relation
or the Foreign Key value must be NULL.
Enterprise Integrity: Additional rules specified by the users or database
administrators of a database are incorporated.
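The first three rules can be sketched as declarative constraints, here with SQLite through Python's sqlite3 module. The Branch and Staff tables, the age range of 18 to 65 and the sample values are assumptions chosen for illustration. Also note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Branch (Bno INTEGER PRIMARY KEY);
CREATE TABLE Staff (
    Staff_No TEXT NOT NULL PRIMARY KEY,             -- entity integrity
    Age      INTEGER CHECK (Age BETWEEN 18 AND 65), -- domain integrity (assumed range)
    Bno      INTEGER REFERENCES Branch (Bno)        -- referential integrity
);
INSERT INTO Branch VALUES (10);
""")


def try_insert(row):
    """Insert a Staff row and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO Staff VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "valid row": try_insert(("S1", 30, 10)),
    "NULL foreign key": try_insert(("S2", 30, None)),       # allowed by the rule
    "age out of domain": try_insert(("S3", 200, 10)),
    "NULL primary key": try_insert((None, 30, 10)),
    "unknown branch": try_insert(("S4", 30, 99)),
}
print(results)
```

A NULL foreign key is accepted, as the referential integrity rule above permits, while a foreign key value that matches no branch is rejected.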
Key constraints
If tuples need to be unique in the database, we must make each
tuple distinct. To do this we need relational keys that uniquely identify
each tuple.
Super Key: an attribute or set of attributes that uniquely identifies a tuple
within a relation.
Candidate Key: a super key such that no proper subset of that collection
is a Super Key within the relation.
A candidate key has two properties:
1. Uniqueness
2. Irreducibility
If a super key has only one attribute, it is automatically a Candidate Key.
If a candidate key consists of more than one attribute, it is called a Composite
Key.
Primary Key: the candidate key that is selected to identify tuples uniquely
within the relation.
In the worst case, the entire set of attributes in a relation can be considered as the primary
key.
Foreign Key: an attribute, or set of attributes, within one relation that
matches the candidate key of some relation.
A foreign key is a link between different relations that is used to create a view or
unnamed relation.
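The uniqueness and irreducibility properties can be sketched as two checks over the rows of a relation. The rows below use the EmpID, SkillID and skill-level values from the employee skills table that appears later in the normalization discussion, and the function names are hypothetical. A single instance can only refute a key, never prove one, because a key is a rule about every valid state of the relation.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on all of attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """Uniqueness plus irreducibility: no proper subset is itself a superkey."""
    if not is_superkey(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False
    return True


# (EmpID, SkillID, SkillLevel) values taken from the employee skills table.
rows = [
    {"EmpID": e, "SkillID": s, "SkillLevel": lvl}
    for e, s, lvl in [
        (12, 2, 5), (12, 6, 8), (16, 5, 6), (16, 1, 4), (28, 2, 10),
        (65, 2, 9), (65, 4, 8), (65, 7, 6), (24, 8, 5), (94, 3, 7),
    ]
]

print(is_superkey(rows, ["EmpID"]))                                  # EmpID repeats
print(is_candidate_key(rows, ["EmpID", "SkillID"]))                  # unique and irreducible
print(is_candidate_key(rows, ["EmpID", "SkillID", "SkillLevel"]))    # unique but reducible
```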
Relational Views
Relations are perceived as tables from the users’ perspective. Actually, there
are two kinds of relation in a relational database: Named and Unnamed
Relations. The basic difference is in how
the relation is created, used and updated:
1. Base Relation
A Named Relation corresponding to an entity in the conceptual schema,
whose tuples are physically stored in the database.
2. View (Unnamed Relation)
A View is the dynamic result of one or more relational operations operating
on the base relations to produce another virtual relation that does not
actually exist as presented. So a view is virtually derived relation that does
not necessarily exist in the database but can be produced upon request by a
particular user at the time of request. The virtual table or relation can be
created from single or different relations by extracting some attributes and
records with or without conditions.
Purpose of a view
Hides unnecessary information from users: since only part of the base
relation (Some collection of attributes, not necessarily all) are to be included
in the virtual table.
Provide powerful flexibility and security: since unnecessary information will
be hidden from the user there will be some sort of data security.
Provide customized view of the database for users: each user is going to be
interfaced with their own preferred data set and format by making use of the
Views.
A view derived from a single base relation can be updated.
Updates on views derived from several relations are not allowed, since they may
violate the integrity of the database.
Updates on views with aggregation or summary are not allowed, since
aggregate and summary results are computed from a base relation and
do not actually exist in the database.
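A minimal sketch of a view as a dynamic, virtual relation, using SQLite through Python's sqlite3 module. The Staff table, its rows and the view names StaffDirectory and BranchPayroll are hypothetical. The first view hides the Salary column, and the second is a summary view of the kind described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base relation with hypothetical rows.
CREATE TABLE Staff (Staff_No INTEGER PRIMARY KEY, LName TEXT, Salary REAL, Bno INTEGER);
INSERT INTO Staff VALUES (1, 'Kebede', 4500, 10), (2, 'Alemu', 5200, 20);

-- View over one base relation that hides the Salary column.
CREATE VIEW StaffDirectory AS
    SELECT Staff_No, LName, Bno FROM Staff;

-- Summary view: its rows are computed, not stored.
CREATE VIEW BranchPayroll AS
    SELECT Bno, SUM(Salary) AS TotalSalary FROM Staff GROUP BY Bno;
""")

directory_cols = [d[0] for d in conn.execute("SELECT * FROM StaffDirectory").description]
count_before = conn.execute("SELECT COUNT(*) FROM StaffDirectory").fetchone()[0]

# Changing the base relation changes what the views return on the next request.
conn.execute("INSERT INTO Staff VALUES (3, 'Taye', 3900, 10)")

count_after = conn.execute("SELECT COUNT(*) FROM StaffDirectory").fetchone()[0]
payroll = conn.execute(
    "SELECT Bno, TotalSalary FROM BranchPayroll ORDER BY Bno"
).fetchall()
print(directory_cols, count_before, count_after, payroll)
```

Each TotalSalary value corresponds to no single stored tuple, which is why an update through such a view has no well-defined meaning.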
Instances
Instance: is the collection of data in the database at a particular point of
time (snap-shot).
Also called State or Snap Shot or Extension of the database.
Refers to the actual data in the database at a specific point in time.
State of database is changed any time we add, delete or update an
item.
Valid state: the state that satisfies the structure and constraints
specified in the schema and is enforced by DBMS.
Since an instance is the actual data in the database at some point in time, it changes
rapidly.
To define a new database, we specify its database schema to the DBMS
(database is empty).
Database is initialized when we first load it with data.
Chapter Five
Database design
Database design is the process of coming up with different kinds of specification for the data to
be stored in the database. The database design part is one of the middle phases we have in
information systems development where the system uses a database approach. Design is the part
on which we would be engaged to describe how the data should be perceived at different levels
and finally how it is going to be stored in a computer system.
Information System with Database application consists of several tasks which include:
Planning of Information systems Design
Requirements Analysis,
Design (Conceptual, Logical and Physical Design)
Testing
Implementation
Operation and Support
From these different phases, the prime interest of a database system will be the Design part
which is again sub divided into other three sub-phases.
These sub-phases are:
1. Conceptual Design
2. Logical Design, and
3. Physical Design
In general, one has to go back and forth between these tasks to refine a database design, and
decisions in one task can influence the choices in another task.
In developing a good design, one should answer such questions as:
What are the relevant Entities for the Organization?
What are the important features of each Entity?
What are the important Relationships?
What are the important queries from the user?
What are the other requirements of the Organization and the Users?
The Three levels of Database Design
Conceptual Design
Logical Design
Physical Design
Conceptual design revolves around discovering and analyzing organizational and user data
requirements.
The important activities are to identify
Entities
Attributes
Relationships
Constraints
And based on these components develop the ER model using ER diagrams
Designing a conceptual model for the database is not a linear process but an iterative
activity in which the design is refined again and again.
To identify the entities, attributes, relationships, and constraints on the data,
different sets of methods are used during the analysis phase.
These include information gathered by…
Interviewing end users individually and in a group
Questionnaire survey
Direct observation
Examining different documents
The basic E-R model is graphically depicted and presented for review.
The process is repeated until the end users and designers agree that the ER diagram is a fair
representation of the organization’s activities and functions.
Check for redundant relationships in the ER diagram. Relationships between entities
indicate access from one entity to another. It is therefore possible to access one entity
occurrence from another entity occurrence even if there are other entities and relationships
that separate them. This is often referred to as 'Navigation' of the ER diagram.
The last phase in ER modeling is validating the ER model against the requirements of the user.
[Figure: ER diagram example showing key attributes, a composite attribute, the attribute Age, and the labels Enrolled_in, Academic_year, Semester and Grade.]
One-to-one relationship
A customer is associated with at most one loan via the relationship borrower
A loan is associated with at most one customer via borrower
E.g. Relationship Manages between Staff and Branch
The multiplicity of the relationship is:
One branch can only have one manager.
One employee can manage either one branch or none.
1..1 0..1
Employee Manages Branch
One-To-Many Relationships
In the one-to-many relationship a loan is associated with at most one customer via borrower,
a customer is associated with several (including 0) loans via borrower
1..1 0..*
Employee Leads Project
Many-To-Many Relationship
0..* 0..*
Instructor Teaches Course
Logical Database Design
Logical design is the process of constructing a model of the information used in an enterprise
based on a specific data model (e.g. relational, hierarchical or network or object), but
independent of a particular DBMS and other physical considerations.
Normalization process
Collection of Rules to be maintained.
Discover new entities in the process.
Revise attributes based on the rules and the discovered Entities
The first step before applying the rules in relational data model is converting the conceptual
design to a form suitable for relational logical model, which is in a form of tables.
An "insertion anomaly" is a failure to place information about a new database entry into all
the places in the database where information about that new entry needs to be stored. In a
properly normalized database, information about a new entry needs to be inserted into only
one place in the database; in an inadequately normalized database, information about a new
entry may need to be inserted into more than one place and, human fallibility being what it is,
some of the needed additional insertions may be missed.
Deletion Anomalies:
If the employee with ID 16 is deleted, then every piece of information about the skill C++ and the type of
skill is deleted from the database. Then we will not have any information about C++ and
its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We cannot decide whether
Pascal is allowed as a value for skill, and we have no clue about the type of skill that
Pascal should be categorized as.
Modification Anomalies:
What if the address for Helico is changed from Piazza to Mexico? We need to look for
every occurrence of Helico and change the value of School_Add from Piazza to Mexico,
which is prone to error.
Database-management system can work only with the information that we put explicitly
into its tables for a given database and into its rules for working with those tables, where
such rules are appropriate and possible.
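The deletion and modification anomalies can be sketched on a few rows of the unnormalized employee skills table used in the normalization discussion. Only five of its rows and five of its columns are reproduced here, and the helper names are hypothetical.

```python
# (EmpID, Skill, SkillType, School, SchoolAdd): a subset of the employee skills table.
rows = [
    (12, "SQL", "Database", "AAU", "Sidist_killo"),
    (12, "VB.6", "Programming", "Helico", "Piazza"),
    (16, "C++", "Programming", "NAC", "Saris"),
    (16, "IP", "Programming", "Jimma", "Jimma_city"),
    (65, "SQL", "Database", "Helico", "Piazza"),
]


def known_skills(table):
    """Skills the database still knows anything about."""
    return {row[1] for row in table}


def addresses_of(table, school):
    """All addresses recorded for one school (should be exactly one)."""
    return {row[4] for row in table if row[3] == school}


# Deletion anomaly: removing employee 16 also removes what we knew about C++.
after_delete = [row for row in rows if row[0] != 16]
lost_skills = known_skills(rows) - known_skills(after_delete)

# Modification anomaly: updating only the first Helico row leaves two addresses.
partially_updated = list(rows)
first_helico = next(i for i, row in enumerate(rows) if row[3] == "Helico")
partially_updated[first_helico] = rows[first_helico][:4] + ("Mexico",)
helico_addresses = addresses_of(partially_updated, "Helico")

print(lost_skills, helico_addresses)
```

In this subset, deleting employee 16 loses the skill IP as well as C++, because that employee was the only one recorded with either skill.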
Data Dependency
The logical associations between data items that point the database designer in the direction of a
good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain
values of data item B always appear with certain values of data item A. If data item A is the
determinant and B the dependent data item, then the direction of the association is from
A to B and not vice versa.
The essence of this idea is that if the existence of something, call it A, implies that B must exist
and have a certain value, then we say that "B is functionally dependent on A." We also
often express this idea by saying that "A determines B," or that "B is a function of A," or that "A
functionally governs B." Often, the notions of functionality and functional dependency are
expressed briefly by the statement, "If A, then B." It is important to note that the value B must be
unique for a given value of A, i.e., any given value of A must imply just one and only one value
of B, in order for the relationship to qualify for the name "function." (However, this does not
necessarily prevent different values of A from implying the same value of B.)
X → Y holds if, whenever two tuples have the same value for X, they must have the same value
for Y.
The notation is: A → B, which is read as: B is functionally dependent on A.
In general, a functional dependency is a relationship among attributes. In relational databases,
we can have a determinant that governs one other attribute or several other attributes.
FDs are derived from the real-world constraints on the attributes.
Example
Dinner    Type of Wine
Meat      Red
Fish      White
Cheese    Rose
Since the type of Wine served depends on the type of Dinner, we say Wine is functionally
dependent on Dinner.
Dinner → Wine
Dinner    Type of Wine    Type of Fork
Meat      Red             Meat fork
Fish      White           Fish fork
Cheese    Rose            Cheese fork
Since both Wine type and Fork type are determined by the Dinner type, we say Wine is
functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
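The definition of a functional dependency can be sketched as a check over the rows of a table: whenever two rows agree on the determinant, they must agree on the dependent attributes. The function name is hypothetical, and the extra "Meat / White" row is invented only to show a violation.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first y seen for x; any different y breaks the dependency.
        if seen.setdefault(x, y) != y:
            return False
    return True


# The Dinner table from the example above.
dinners = [
    {"Dinner": "Meat", "Wine": "Red", "Fork": "Meat fork"},
    {"Dinner": "Fish", "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Cheese", "Wine": "Rose", "Fork": "Cheese fork"},
]

# Hypothetical extra row that serves a second wine with Meat.
with_violation = dinners + [{"Dinner": "Meat", "Wine": "White", "Fork": "Meat fork"}]

print(holds(dinners, ["Dinner"], ["Wine"]))
print(holds(dinners, ["Dinner"], ["Fork"]))
print(holds(with_violation, ["Dinner"], ["Wine"]))
```

A check like this can only show that a dependency is not contradicted by the current data. Whether it is a real rule comes from the real-world constraints, as noted above.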
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the
primary key (if we have composite primary key) then that attribute is partially functionally
dependent on the primary key.
Let {A, B} be the Primary Key and C a non-key attribute.
Then if {A, B} → C and either B → C or A → C,
then C is partially functionally dependent on {A, B}.
Full Dependency
If an attribute which is not a member of the primary key is not dependent on some part of the
primary key but the whole key (if we have composite primary key) then that attribute is fully
functionally dependent on the primary key.
Let {A, B} be the Primary Key and C a non-key attribute.
Then if {A, B} → C holds but neither B → C nor A → C holds (A cannot determine C and B cannot
determine C),
then C is fully functionally dependent on {A, B}.
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must be an Animal.
Generalized way of describing transitive dependency is that:
If A functionally governs B, AND
If B functionally governs C
THEN A functionally governs C
Provided that neither C nor B determines A, i.e. (B ↛ A and C ↛ A)
In the normal notation:
{(A → B) AND (B → C)} ==> A → C, provided that B ↛ A and C ↛ A
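One common way to work with this rule is to compute the closure of a set of attributes, that is, everything the set determines directly or transitively. The sketch below is a plain fixed-point loop. The function name and the representation of dependencies as pairs of sets are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds.

    fds is a list of (determinant, dependents) pairs, each a set of names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we gain its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# A determines B, and B determines C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

closure_a = closure({"A"}, fds)
closure_b = closure({"B"}, fds)
closure_c = closure({"C"}, fds)
print(closure_a, closure_b, closure_c)
```

C appears in the closure of A although no dependency states it directly, which is the transitive dependency. A is absent from the closures of B and C, matching the proviso that neither B nor C determines A.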
Steps of Normalization:
We have various levels or steps in normalization called Normal Forms. The level of complexity,
strength of the rule and decomposition increases as we move from one lower level Normal Form
to the higher.
A table in a relational database is said to be in a certain normal form if it satisfies certain
constraints.
Each normal form represents a stronger condition than the one before it.
Normalization towards a logical design consists of the following steps:
Un-Normalized Form:
Identify all data elements
First Normal Form:
Find the key with which you can find all data
Second Normal Form:
Remove part-key dependencies. Make all data dependent on the whole key.
Third Normal Form
Remove non-key dependencies. Make all data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to third normal
form.
FIRST NORMAL FORM (1NF)
Requires that all column values in a table are atomic (e.g., a number is an atomic value,
while a list or a set is not). We have two ways of achieving this:
1. Putting each repeating group into a separate table and connecting them with a primary
key-foreign key relationship.
2. Moving these repeating groups to new rows by repeating the common attributes, and
then finding the key with which you can find all data.
Definition: a table (relation) is in 1NF
If
There are no duplicated rows in the table (there is a unique identifier).
Each cell is single-valued (i.e., there are no repeating groups).
Entries in a column (attribute, field) are of the same kind.
Remove all repeating groups. Distribute the multi-valued attributes into different rows and
identify a unique identifier for the relation, so that it can be said to be a relation in a relational
database.
EmpID  FName   LName   SkillID  Skill   SkillType    School  SchoolAdd     Skill level
12     Abebe   Kebede  2        SQL     Database     AAU     Sidist_killo  5
12     Abebe   Kebede  6        VB.6    Programming  Helico  Piazza        8
16     Lemma   Alemu   5        C++     Programming  NAC     Saris         6
16     Lemma   Alemu   1        IP      Programming  Jimma   Jimma_city    4
28     Mesfin  Taye    2        SQL     Database     AAU     Sidist_killo  10
65     Almaz   Abera   2        SQL     Database     Helico  Piazza        9
65     Almaz   Abera   4        Prolog  Programming  Jimma   Jimma_city    8
65     Almaz   Abera   7        Java    Programming  AAU     Sidist_killo  6
24     Teddy   Tamiru  8        Oracle  Database     NAC     Saris         5
94     Taye    Gizaw   3        Cisco   Networking   AAU     Sidist_killo  7
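The second way of reaching 1NF (moving each repeating group to its own row and repeating the common attributes) can be sketched as follows. The nested "Skills" layout is an assumed picture of the un-normalized form, the values come from the table above, and the function name is hypothetical.

```python
# Assumed un-normalized form: each employee carries a repeating group of skills.
unnormalized = [
    {"EmpID": 12, "FName": "Abebe", "LName": "Kebede",
     "Skills": [{"SkillID": 2, "Skill": "SQL"}, {"SkillID": 6, "Skill": "VB.6"}]},
    {"EmpID": 28, "FName": "Mesfin", "LName": "Taye",
     "Skills": [{"SkillID": 2, "Skill": "SQL"}]},
]


def to_1nf(records, group):
    """Emit one flat row per item of the repeating group, repeating the common attributes."""
    flat = []
    for record in records:
        common = {k: v for k, v in record.items() if k != group}
        for item in record[group]:
            flat.append({**common, **item})
    return flat


flat_rows = to_1nf(unnormalized, "Skills")

# After flattening, (EmpID, SkillID) identifies each row.
keys = [(row["EmpID"], row["SkillID"]) for row in flat_rows]

for row in flat_rows:
    print(row)
```

Every cell of the result is single-valued, at the cost of repeating FName and LName on each row. The later normal forms deal with that redundancy.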
SECOND NORMAL FORM (2NF)
No partial dependency of a non-key attribute on part of the primary key. This will result in a set
of relations in Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., a non-composite) primary key is
automatically in 2NF.
Definition: a table (relation) is in 2NF
If
It is in 1NF and
all non-key attributes are dependent on the entire primary key, i.e. there is no partial
dependency.
Example for 2NF:
EMP_PROJ
EmpID EmpName ProjNo ProjName ProjLoc ProjFund ProjMangID Incentive
EMP_PROJ rearranged
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive
Business rule: Whenever an employee participates in a project, he/she will be entitled for an
incentive.
This schema is in its 1NF since we don’t have any repeating groups or attributes with multi-
valued property. To convert it to a 2NF we need to remove all partial dependencies of non key
attributes on part of the primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies:
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive
As we can see, some non-key attributes are partially dependent on some part of the primary key.
This can be seen by analyzing the first two functional dependencies (FD1 and FD2). Thus,
each functional dependency, with its dependent attributes, should be moved to a new relation
where the determinant will be the primary key.
EMPLOYEE
EmpID EmpName
PROJECT
ProjNo ProjName ProjLoc ProjFund ProjMangID
EMP_PROJ
EmpID ProjNo Incentive
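The decomposition above can be sketched directly from FD1 to FD3: a dependency whose determinant is a proper subset of the primary key is partial, and each determinant gets its own relation. This is a simplified sketch that assumes every dependency has a distinct determinant, as is the case here. The function names are hypothetical.

```python
PRIMARY_KEY = ("EmpID", "ProjNo")

# FD1 to FD3 from the text, as (determinant, dependents) pairs.
FDS = [
    (("EmpID",), ("EmpName",)),                                        # FD1
    (("ProjNo",), ("ProjName", "ProjLoc", "ProjFund", "ProjMangID")),  # FD2
    (("EmpID", "ProjNo"), ("Incentive",)),                             # FD3
]


def partial_dependencies(primary_key, fds):
    """Dependencies whose determinant is only part of the primary key."""
    key = set(primary_key)
    return [(lhs, rhs) for lhs, rhs in fds if set(lhs) < key]


def decompose(fds):
    """One relation per determinant: the determinant (its key) followed by its dependents."""
    return [tuple(lhs) + tuple(rhs) for lhs, rhs in fds]


partial = partial_dependencies(PRIMARY_KEY, FDS)
relations = decompose(FDS)

print(partial)
for relation in relations:
    print(relation)
```

The three resulting attribute lists correspond to the EMPLOYEE, PROJECT and EMP_PROJ relations shown above.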
THIRD NORMAL FORM (3NF)
This schema is in its 2NF since the primary key is a single attribute.
Let’s take StudID, Year and Dormitory and see the dependencies.
StudID → Year AND Year → Dormitory
And Year cannot determine StudID and Dormitory cannot determine StudID.
Then transitively StudID → Dormitory.
To convert it to 3NF we need to remove all transitive dependencies of non-key attributes on
other non-key attributes.
The non-primary-key attributes that depend on each other will be moved to another table and
linked with the main table using a Candidate Key-Foreign Key relationship.
STUDENT
StudID  Stud_F_Name  Stud_L_Name  Dept      Year
125/97  Abebe        Kebede       Info Sc   1
654/95  Lemma        Alemu        Geog      3
842/95  Mesfin       Taye         Comp. Sc  3
165/97  Abera        Belay        Info Sc   1
985/95  Almaz        Abera        Geog      3
DORM
Year  Dormitory
1     401
3     403
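As a sketch of how the two 3NF relations still answer the original question, the STUDENT and DORM data above can be loaded into SQLite (via Python's sqlite3 module) and joined on Year. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DORM (
    Year      INTEGER PRIMARY KEY,
    Dormitory INTEGER
);
CREATE TABLE STUDENT (
    StudID      TEXT PRIMARY KEY,
    Stud_F_Name TEXT,
    Stud_L_Name TEXT,
    Dept        TEXT,
    Year        INTEGER REFERENCES DORM (Year)
);
INSERT INTO DORM VALUES (1, 401), (3, 403);
INSERT INTO STUDENT VALUES
    ('125/97', 'Abebe',  'Kebede', 'Info Sc',  1),
    ('654/95', 'Lemma',  'Alemu',  'Geog',     3),
    ('842/95', 'Mesfin', 'Taye',   'Comp. Sc', 3),
    ('165/97', 'Abera',  'Belay',  'Info Sc',  1),
    ('985/95', 'Almaz',  'Abera',  'Geog',     3);
""")

# The transitive fact StudID -> Dormitory is recovered by joining on Year.
student_dorms = conn.execute("""
    SELECT s.StudID, d.Dormitory
    FROM STUDENT AS s
    JOIN DORM AS d ON s.Year = d.Year
    ORDER BY s.StudID
""").fetchall()

dorm_rows = conn.execute("SELECT COUNT(*) FROM DORM").fetchone()[0]
print(student_dorms, dorm_rows)
```

Each Year to Dormitory fact is now stored once in DORM rather than once per student, so changing a dormitory assignment is a single-row update.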
Generally, even though there are four additional levels of normalization, a table is said to
be normalized if it reaches 3NF. A database with all tables in 3NF is said to be a Normalized
Database.
Mnemonic for remembering the rationale for normalization up to 3NF could be the following:
1. No Repeating or Redundancy: - no repeating fields in the table.
2. The Fields Depend Upon the Key: - the table should solely depend on the key.
3. The Whole Key: - no partial key dependency.
4. And Nothing But the Key: - no inter data dependency.
5. So Help Me Codd: - since Codd came up with these rules.
Pitfalls of Normalization