chp1

The document provides an overview of database management systems (DBMS), covering key concepts such as the relational model, entity-relationship (ER) model, and the advantages of using a DBMS. It explains the structure of databases, the importance of data independence, and the ACID properties that ensure transaction reliability. Additionally, it discusses the components of the ER model, including entities, attributes, and relationships, along with their representation in diagrams.

Uploaded by

shbhamare123

Database Management

Chapter 1
Syllabus Outline CH 1
• Relational Model Integration: overview of database management system,
limitations of data processing environment, database approach, instance
and schema, three levels of abstraction, DBMS structure, ACID Properties
• Entity Relation Model: Entity, attributes, keys, relations, cardinality,
participation, weak entities, ER diagram, Generalization, Specialization and
aggregation, conceptual design with ER model, entity versus attribute,
entity versus relationship, binary versus ternary relationship, aggregate
versus ternary relationship.
• Functional dependency, Relational Database design: features of good
relational database design, atomic domain and Normalization (1NF, 2NF,
3NF, BCNF).
Database
• A database is a collection of data, typically describing the activities of
one or more related organizations.
• For example, a university database might contain information about
the following:
• Entities such as students, faculty, courses, and classrooms.
• Relationships between entities, such as students' enrollment in courses,
faculty teaching courses, and the use of rooms for courses.
DBMS
• A database management system, or DBMS, is software designed to
assist in maintaining and utilizing large collections of data. The need
for such systems, as well as their use, is growing rapidly.
File systems v/s database
• Consider storing a large collection of data, say 500 GB, in operating system files. This approach has
several drawbacks:
• We probably do not have 500 GB of main memory to hold all the data. We must therefore store data in a
storage device such as a disk or tape and bring relevant parts into main memory for processing as needed.
• Even if we have 500 GB of main memory, on computer systems with 32-bit addressing, we cannot refer
directly to more than about 4 GB of data. We have to program some method of identifying all data items.
• We have to write special programs to answer each question a user may want to ask about the data. These
programs are likely to be complex because of the large volume of data to be searched.
• We must protect the data from inconsistent changes made by different users accessing the data
concurrently. If applications must address the details of such concurrent access, this adds greatly to their
complexity.
• We must ensure that data is restored to a consistent state if the system crashes while changes are being
made.
• Operating systems provide only a password mechanism for security. This is not sufficiently flexible to enforce
security policies in which different users have permission to access different subsets of the data.
Advantages of DBMS
• Data Independence: Application programs should not, ideally, be exposed
to details of data representation and storage. The DBMS provides an abstract
view of the data that hides such details.
• Efficient Data Access: A DBMS utilizes a variety of sophisticated techniques
to store and retrieve data efficiently. This feature is especially important if
the data is stored on external storage devices.
• Data Integrity and Security: If data is always accessed through the DBMS,
the DBMS can enforce integrity constraints.
• For example, before inserting salary information for an employee, the DBMS can
check that the department budget is not exceeded.
• Also, it can enforce access controls that govern what data is visible to different
classes of users.
Advantages of DBMS
• Data Administration: When several users share the data, centralizing the
administration of data can offer significant improvements. Experienced professionals
can organize the data representation to minimize redundancy and fine-tune the
storage of the data to make retrieval efficient.
• Concurrent Access and Crash Recovery: A DBMS schedules concurrent accesses
to the data in such a manner that users can think of the data as being accessed by
only one user at a time. Further, the DBMS protects users from the effects of
system failures.
• Reduced Application Development Time: Clearly, the DBMS supports important
functions that are common to many applications accessing data in the DBMS.
This, in conjunction with the high-level interface to the data, facilitates quick
application development.
• DBMS applications are also likely to be more robust than similar stand-alone
applications because many important tasks are handled by the DBMS.
Data Model
• A data model is a collection of high-level data description constructs
that hide many low-level storage details.
• A DBMS allows a user to define the data to be stored in terms of a
data model.
• A relational DBMS (RDBMS) is a database management system in which data is
stored in the form of tables.
• The relational model is the basis for many widely used database systems such as
MySQL, Microsoft SQL Server, Oracle, and Microsoft Access.
Database Table
• A table is a collection of related data entries, and it consists of
columns and rows.
• A column holds specific information about every record in the table.
• A record (or row) is each individual entry that exists in a table.
Semantic Data Model
• Semantic data model is a more abstract, high-level data model that makes
it easier for a user to come up with a good initial description of the data in
an enterprise.
• These models contain a wide variety of constructs that help describe a real
application scenario.
• A database design in terms of a semantic model serves as a useful starting
point and is subsequently translated into a database design in terms of the
data model the DBMS actually supports.

• A widely used semantic data model called the entity-relationship (ER)
model allows us to pictorially denote entities and the relationships among
them.
RDBMS
• The central data description construct in this model is a relation,
which can be thought of as a set of records.
• A description of data in terms of a data model is called a schema. In
the relational model, the schema for a relation specifies its name, the
name of each field (or attribute or column), and the type of each
field. As an example, student information in a university database
may be stored in a relation with the following schema:
Students(sid: string, name: string, login: string, age: integer, gpa: real)

Instance of Student Relation


Levels of Abstraction in DBMS
• The data in a DBMS is described at three levels of abstraction: the
external, conceptual, and physical schemas.
Conceptual Schema
• The conceptual schema (sometimes called the logical schema)
describes the stored data in terms of the data model of the DBMS. In
a relational DBMS, the conceptual schema describes all relations that
are stored in the database.
• Example: in a university database, these relations contain information
about entities, such as students and faculty, and about relationships,
such as students' enrollment in courses. All student entities can be
described using records in a Students relation.
• Each collection of entities and each collection of relationships can be
described as a relation, leading to the conceptual schema.
Conceptual Schema - example
• Students(sid: string, name: string, login: string, age: integer, gpa: real)
• Faculty(fid: string, fname: string, sal: real)
• Courses(cid: string, cname: string, credits: integer)
• Rooms(rno: integer, address: string, capacity: integer)
• Enrolled(sid: string, cid: string, grade: string)
• Teaches(fid: string, cid: string)
• Meets_In(cid: string, rno: integer, time: string)
The choice of relations, and the choice of fields for each relation, is not
always obvious, and the process of arriving at a good conceptual
schema is called conceptual database design.
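The conceptual schema of the university example can be sketched as table definitions. This is a minimal sketch using Python's built-in sqlite3 module. The type mapping (string to TEXT, integer to INTEGER, real to REAL) and the choice of primary keys are assumptions made for illustration, and the room number field is named rno to match Meets_In.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One CREATE TABLE per relation of the conceptual schema.
# Primary keys are assumed here; the slides do not state them.
conn.executescript("""
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                       age INTEGER, gpa REAL);
CREATE TABLE Faculty  (fid TEXT PRIMARY KEY, fname TEXT, sal REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Rooms    (rno INTEGER PRIMARY KEY, address TEXT, capacity INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
CREATE TABLE Teaches  (fid TEXT, cid TEXT);
CREATE TABLE Meets_In (cid TEXT, rno INTEGER, time TEXT);
""")

# List the relations that now make up the conceptual schema.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Entities (Students, Faculty, Courses, Rooms) and relationships (Enrolled, Teaches, Meets_In) are both stored as relations, which is the point made above.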
Schema
• The physical schema specifies additional storage details. Essentially,
the physical schema summarizes how the relations described in the
conceptual schema are actually stored on secondary storage devices
such as disks and tapes.
• External schemas, which usually are also in terms of the data model
of the DBMS, allow data access to be customized (and authorized) at
the level of individual users or groups of users.
• Any given database has exactly one conceptual schema and one
physical schema because it has just one set of stored relations, but it
may have several external schemas, each tailored to a particular
group of users
Example of Schema
• The external schema design is guided by end user requirements. For
example, we might want to allow students to find out the names of
faculty members teaching courses as well as course enrollments. This
can be done by defining the following view:
Courseinfo(cid: string, fname: string, enrollment: integer)
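One way such a view could be defined is sketched below with Python's sqlite3 module. The join conditions, the sample rows, and the use of a course id column named cid in the view are assumptions for illustration; the slides only give the view's fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty  (fid TEXT PRIMARY KEY, fname TEXT, sal REAL);
CREATE TABLE Teaches  (fid TEXT, cid TEXT);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Hypothetical sample rows, for illustration only.
INSERT INTO Faculty  VALUES ('f1', 'Rao', 90000.0);
INSERT INTO Teaches  VALUES ('f1', 'CS564');
INSERT INTO Enrolled VALUES ('s1', 'CS564', 'A'), ('s2', 'CS564', 'B');

-- The view is computed from the stored relations; it is not stored itself.
CREATE VIEW Courseinfo AS
SELECT T.cid AS cid, F.fname AS fname, COUNT(E.sid) AS enrollment
FROM Teaches T
JOIN Faculty F ON F.fid = T.fid
LEFT JOIN Enrolled E ON E.cid = T.cid
GROUP BY T.cid, F.fname;
""")

# Users of the external schema query the view like any other relation.
course_info = conn.execute(
    "SELECT cid, fname, enrollment FROM Courseinfo").fetchall()
print(course_info)
```

Students who are given access only to Courseinfo can see who teaches a course and how many are enrolled, but not salaries or individual grades.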
Data Independence
• A very important advantage of using a DBMS is that it offers data
independence. That is, application programs are insulated from
changes in the way the data is structured and stored.
• Data independence is achieved through use of the three levels of data
abstraction; in particular, the conceptual schema hides changes in physical
storage (physical data independence), and the external schemas hide changes
in the conceptual schema (logical data independence).
Examples of Queries
1. What is the name of the student with student ID 123456?
2. What is the average salary of professors who teach course CS564?
3. How many students are enrolled in CS564?
4. What fraction of students in CS564 received a grade better than B?
5. Is any student with a GPA less than 3.0 enrolled in CS564?
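These five questions could be posed to a relational DBMS roughly as follows. This is a sketch using Python's sqlite3 module, taking the student ID as 123456 and the course as CS564. The sample rows are hypothetical, and query 4 assumes single-letter grades so that "better than B" can be tested by string comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Faculty  (fid TEXT, fname TEXT, sal REAL);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
CREATE TABLE Teaches  (fid TEXT, cid TEXT);
-- Hypothetical rows, for illustration only.
INSERT INTO Students VALUES ('123456', 'Asha', 'asha@cs', 19, 3.4),
                            ('123457', 'Ravi', 'ravi@cs', 20, 2.8);
INSERT INTO Faculty  VALUES ('f1', 'Rao', 80000.0), ('f2', 'Iyer', 100000.0);
INSERT INTO Enrolled VALUES ('123456', 'CS564', 'A'), ('123457', 'CS564', 'C');
INSERT INTO Teaches  VALUES ('f1', 'CS564'), ('f2', 'CS564');
""")

def one(sql, *params):
    """Run a query that yields a single value and return that value."""
    return conn.execute(sql, params).fetchone()[0]

# 1. Name of the student with a given ID.
name = one("SELECT name FROM Students WHERE sid = ?", "123456")
# 2. Average salary of professors who teach the course.
avg_sal = one("""SELECT AVG(F.sal) FROM Faculty F
                 JOIN Teaches T ON F.fid = T.fid WHERE T.cid = ?""", "CS564")
# 3. Number of students enrolled in the course.
enrolled = one("SELECT COUNT(*) FROM Enrolled WHERE cid = ?", "CS564")
# 4. Fraction with a grade better than B (comparison yields 1 or 0).
better_than_b = one("SELECT AVG(grade < 'B') FROM Enrolled WHERE cid = ?", "CS564")
# 5. Is any student with GPA below 3.0 enrolled? (1 = yes, 0 = no)
low_gpa = one("""SELECT EXISTS (SELECT 1 FROM Students S
                 JOIN Enrolled E ON S.sid = E.sid
                 WHERE E.cid = ? AND S.gpa < 3.0)""", "CS564")

print(name, avg_sal, enrolled, better_than_b, low_gpa)
```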
Queries and DBMS
• A DBMS takes great care to evaluate queries as efficiently as possible.
• The Data Definition Language, or DDL, is used to specify the
structure of the database, including the creation of
tables and changes to their definitions.
• A DBMS enables users to create, modify, and query data through a data
manipulation language (DML).
Transaction
• A transaction is defined as “any one execution” of a user program in a
DBMS and differs from an execution of a program outside the DBMS (e.g., a
C program executing on Unix) in important ways.
• A transaction is a program unit whose execution may or may not change
the contents of a database.
• A transaction is executed as a single unit.
• If the database operations do not update the database but only retrieve
data, the transaction is called a read-only transaction.
• DBMS transactions must be atomic, consistent, isolated and durable.
• A transaction that starts with the database in a consistent state must leave it
in a consistent state. If the database was already inconsistent before the
transaction, it remains inconsistent afterwards.
THE ACID PROPERTIES
A DBMS must ensure four important properties of transactions to maintain data in the face of
concurrent access and system failures:
1. Users should be able to regard the execution of each transaction as atomic: Either all actions are
carried out or none are. Users should not have to worry about the effect of incomplete transactions
(say, when a system crash occurs).
2. Each transaction, run by itself with no concurrent execution of other transactions, must preserve
the consistency of the database. The DBMS assumes that consistency holds for each transaction.
Ensuring this property of a transaction is the responsibility of the user.
3. Users should be able to understand a transaction without considering the effect of other
concurrently executing transactions, even if the DBMS interleaves the actions of several
transactions for performance reasons. This property is sometimes referred to as isolation: Transactions
are isolated, or protected, from the effects of other concurrently scheduled transactions.
4. Once the DBMS informs the user that a transaction has been successfully completed, its effects
should persist even if the system crashes before all its changes are reflected on disk. This property is
called durability.
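Atomicity can be observed in a small experiment. The sketch below uses Python's sqlite3 module; the Accounts table, its CHECK constraint, and the starting balances are hypothetical. The connection is used as a context manager, which commits if the block succeeds and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: balances may never go negative.
conn.execute("CREATE TABLE Accounts (name TEXT PRIMARY KEY, "
             "balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("X", 100.0), ("Y", 20.0)])
conn.commit()

def transfer(amount):
    """Move `amount` from X to Y as one transaction: all or nothing."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute("UPDATE Accounts SET balance = balance + ? "
                         "WHERE name = 'Y'", (amount,))
            conn.execute("UPDATE Accounts SET balance = balance - ? "
                         "WHERE name = 'X'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = transfer(50.0)    # both updates succeed and are committed
ok_large = transfer(500.0)   # debit violates the CHECK; the credit is undone
balances = dict(conn.execute("SELECT name, balance FROM Accounts"))
print(ok_small, ok_large, balances)
```

After the failed transfer, Y does not keep the 500 it was briefly credited with: either all actions of the transaction are carried out or none are.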
State Diagram of Transaction
Example of ACID property in DBMS

Transaction 1: BEGIN X = X - 50, Y = Y + 50 END
Transaction 2: BEGIN X = 1.1 * X, Y = 1.1 * Y END

• Transaction 1 is transferring $50 from account X to account Y.
• Transaction 2 is crediting each account with a 10% interest payment.
• If both transactions are submitted together, there is no guarantee that
Transaction 1 will execute before Transaction 2 or vice versa. Irrespective of
the order, the result must be as if the transactions take place serially one
after the other.
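The two acceptable outcomes, and one unacceptable interleaving, can be worked out directly. In this sketch the starting balances (X = 100, Y = 200) are hypothetical, and the 10% interest is computed with integer arithmetic so the results are exact.

```python
def t1(x, y):
    """Transaction 1: transfer 50 from X to Y."""
    return x - 50, y + 50

def t2(x, y):
    """Transaction 2: credit 10% interest to both accounts."""
    return x * 11 // 10, y * 11 // 10

start = (100, 200)  # hypothetical starting balances

# The two serial orders. Either result is acceptable.
t1_then_t2 = t2(*t1(*start))
t2_then_t1 = t1(*t2(*start))

# A bad interleaving: T1 debits X, then T2 runs completely,
# then T1 credits Y. The 50 in transit earns no interest.
x, y = start
x = x - 50          # first half of T1
x, y = t2(x, y)     # all of T2
y = y + 50          # second half of T1
interleaved = (x, y)

print(t1_then_t2, t2_then_t1, interleaved)
```

The interleaved result matches neither serial order, so a DBMS that guarantees isolation must not allow it.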
Section 2
Entity Relation Model: Entity, attributes, keys, relations, cardinality, participation,
weak entities, ER diagram, Generalization, Specialization and aggregation,
conceptual design with ER model, entity versus attribute, entity versus relationship,
binary versus ternary relationship, aggregate versus ternary relationship.
ER Model
• The Entity-Relationship (ER) Model is a model for identifying the entities
to be represented in the database and for representing how those entities
are related.
• The ER data model specifies enterprise schema that represents the
overall logical structure of a database graphically.
• ER models are used to model real-world objects like a person, a car,
or a company and the relation between these real-world objects.
• In short, ER Diagram is the structural format of the database.
Why Use ER Diagrams in DBMS
• ER diagrams are used to represent the E-R model in a database, which
makes them easy to convert into relations (tables).
• ER diagrams model real-world objects, which makes them very useful.
• ER diagrams require no technical knowledge and no hardware
support.
• These diagrams are very easy to understand and easy to create even
for a naive user.
• It gives a standard solution for visualizing the data logically.
Symbols Used in ER Model

• ER Model is used to model the logical view of the system from a data
perspective, which consists of these symbols:
• Rectangles: Rectangles represent Entities in ER Model.
• Ellipses: Ellipses represent Attributes in ER Model.
• Diamond: Diamonds represent Relationships among Entities.
• Lines: Lines link attributes to entities and entity sets to
relationship types.
• Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
• Double Rectangle: Double Rectangle represents a Weak Entity.
Components of ER Model
Entity

• An Entity may be an object with a physical existence (a
particular person, car, house, or employee) or it may be an
object with a conceptual existence (a company, a job, or a
university course).
• Entity Set: An Entity is an object of an Entity Type, and the set of all
entities of that type is called an entity set.
• For example, E1 is an entity having Entity Type Student, and
the set of all students is called an Entity Set. In an ER diagram,
an Entity Type is represented by a rectangle.
Entity
1. Strong Entity
• A Strong Entity is a type of entity that has a key Attribute. Strong Entity does not depend
on other Entity in the Schema. It has a primary key, that helps in identifying it uniquely,
and it is represented by a rectangle. These are called Strong Entity Types.
2. Weak Entity
• An Entity type usually has a key attribute that uniquely identifies each entity in the
entity set. But some entity types exist for which a key attribute cannot be defined.
These are called Weak Entity types.
• For example, a company may store the information of dependents (Parents, Children, Spouse) of
an Employee. But the dependents cannot exist without the employee. So Dependent will be
a Weak Entity Type, and Employee, which is a Strong Entity Type, will be the Identifying
Entity type for Dependent.
• A weak entity type is represented by a Double Rectangle. The participation of weak
entity types is always total. The relationship between the weak entity type and its
identifying strong entity type is called identifying relationship and it is represented by a
double diamond.
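A common way to map a weak entity to tables is sketched below with Python's sqlite3 module. The column names (eno, dname, relation) and sample rows are hypothetical. The weak entity's primary key combines the owner's key with a partial key, and ON DELETE CASCADE models the fact that a dependent cannot exist without its employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (eno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Dependent (
    eno      INTEGER NOT NULL REFERENCES Employee(eno) ON DELETE CASCADE,
    dname    TEXT NOT NULL,        -- partial key: unique only per employee
    relation TEXT,
    PRIMARY KEY (eno, dname)       -- owner key + partial key
);
-- Hypothetical rows, for illustration only.
INSERT INTO Employee  VALUES (1, 'Meera');
INSERT INTO Dependent VALUES (1, 'Arjun', 'Child'), (1, 'Kiran', 'Spouse');
""")

# A dependent without an existing employee is rejected (total participation).
try:
    conn.execute("INSERT INTO Dependent VALUES (99, 'Nobody', 'Child')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

count_before = conn.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]
conn.execute("DELETE FROM Employee WHERE eno = 1")   # removes the owner
count_after = conn.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]
print(orphan_allowed, count_before, count_after)
```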
Attributes
• Attributes are the properties that define the entity type. For example, Roll_No,
Name, DOB, Age, Address, and Mobile_No are the attributes that define entity
type Student. In ER diagram, the attribute is represented by an oval.
• Key Attribute
• The attribute which uniquely identifies each entity in the entity set is called the
key attribute. For example, Roll_No will be unique for each student. In an ER
diagram, the key attribute is represented by an oval with the attribute name underlined.
• Composite Attribute
• An attribute composed of many other attributes is called a composite attribute.
For example, the Address attribute of the Student entity type consists of Street,
City, State, and Country. In an ER diagram, the composite attribute is represented by
an oval connected to the ovals of its component attributes.
Attribute
• Multivalued Attribute
• An attribute that can hold more than one value for a given entity is called a
multivalued attribute. For example, Phone_No (can be more than one for a given
student). In an ER diagram, a multivalued attribute is represented by a double oval.
• Derived Attribute
• An attribute that can be derived from other attributes of the entity type is
known as a derived attribute, e.g., Age (can be derived from DOB). In an ER
diagram, the derived attribute is represented by a dashed oval.
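The four kinds of attribute can be illustrated with a small Python sketch. The class and field names below follow the Student example but are otherwise hypothetical: the key is a plain field, the composite attribute is a nested record, the multivalued attribute is a list, and the derived attribute is computed rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:
    """Composite attribute: made up of several simpler attributes."""
    street: str
    city: str
    state: str
    country: str

@dataclass
class Student:
    roll_no: int                  # key attribute: unique per student
    name: str
    dob: date
    address: Address              # composite attribute
    phone_nos: list[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from dob instead of being stored."""
        years = today.year - self.dob.year
        # Subtract one if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.dob.month, self.dob.day):
            years -= 1
        return years

# Hypothetical student, for illustration only.
s = Student(1, "Asha", date(2004, 6, 15),
            Address("MG Road", "Pune", "Maharashtra", "India"),
            ["111", "222"])
print(s.age(date(2024, 6, 14)), s.age(date(2024, 6, 15)))
```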
Identify the type of Attributes
Entity type Student
Relationship Type and Relationship Set
• A Relationship Type represents the association between entity types.
For example, ‘Enrolled in’ is a relationship type that exists between
entity type Student and Course.
• In an ER diagram, the relationship type is represented by a diamond
connected to the participating entity types by lines.
Relationship Set
• A set of relationships of the same type is known as a relationship set.
The following relationship set depicts S1 as enrolled in C2, S2 as
enrolled in C1, and S3 as enrolled in C3.
Degree of a Relationship Set

• The number of different entity sets participating in a relationship set
is called the degree of a relationship set.
1. Unary Relationship: When there is only ONE entity set participating in a
relationship, the relationship is called a unary relationship. For example, one person
is married to only one person.
2. Binary Relationship: When there are TWO entity sets participating in a
relationship, the relationship is called a binary relationship. For example, a
Student is enrolled in a Course.
3. n-ary Relationship: When there are n entity sets participating in a relationship,
the relationship is called an n-ary relationship.
Relationship
Cardinality

• The number of times an entity of an entity set participates in a
relationship set is known as cardinality. Cardinality can be of different
types:
1. One-to-One: When each entity in each entity set can take part only once in
the relationship, the cardinality is one-to-one. Let us assume that a male
can marry one female and a female can marry one male. So the relationship
will be one-to-one. Such a relationship can be represented using 2 tables.
2. One-to-Many: When an entity in one entity set can take part more than once
in the relationship set, while entities in the other entity set can take part
only once, the cardinality is one-to-many. Such a relationship can also be
represented using 2 tables.
3. Many-to-One: When entities in one entity set can take part only
once in the relationship set and entities in other entity sets can take
part more than once in the relationship set, cardinality is many-to-one.
Let us assume that a student can take only one course but one course
can be taken by many students.
4. Many-to-Many: When entities in all entity sets can take part more
than once in the relationship, cardinality is many-to-many. Let us
assume that a student can take more than one course and one course
can be taken by many students. So the relationship will be many-to-many.
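The difference between many-to-one and many-to-many shows up in how the relationship is stored. The sketch below uses Python's sqlite3 module with hypothetical table and column names: a many-to-one relationship fits in a foreign-key column on the "many" side, while a many-to-many relationship needs its own table with one row per pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Course (cid TEXT PRIMARY KEY);

-- Many-to-one: each student row can hold at most one course,
-- but many student rows may point at the same course.
CREATE TABLE Student (
    sid       TEXT PRIMARY KEY,
    takes_cid TEXT REFERENCES Course(cid)
);

-- Many-to-many: a separate table holds one row per (student, course) pair.
CREATE TABLE Enrolled (
    sid TEXT REFERENCES Student(sid),
    cid TEXT REFERENCES Course(cid),
    PRIMARY KEY (sid, cid)
);

-- Hypothetical rows, for illustration only.
INSERT INTO Course   VALUES ('C1'), ('C2');
INSERT INTO Student  VALUES ('S1', 'C1'), ('S2', 'C1');
INSERT INTO Enrolled VALUES ('S1', 'C1'), ('S1', 'C2'), ('S2', 'C1');
""")

# Many students can share one course through the foreign-key column.
students_in_c1 = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE takes_cid = 'C1'").fetchone()[0]
# One student can have many courses through the Enrolled table.
courses_of_s1 = conn.execute(
    "SELECT COUNT(*) FROM Enrolled WHERE sid = 'S1'").fetchone()[0]
print(students_in_c1, courses_of_s1)
```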
Activity 1- Identify the entities and their
attributes
ER model of a Hospital
• The entities are represented in rectangular boxes and are Patient,
Tests and Doctor.
• Each of these entities has its respective attributes, which are:
• Patient: ID (primary key), name, age, visit_date
• Tests: Name (primary key), date, result
• Doctor: ID (primary key), name, specialization


Activity 2- Draw the ER diagram
• The entities in this ER model are Employee, Department and Project.
These entities have the following attributes:
• Employee: ENO (primary key), Name, Salary
• Department: DNO (primary key), Name, Locations
• Project: PNO (primary key), Name


Participation Constraint

• A Participation Constraint is applied to the entity set participating in the
relationship set.
1. Total Participation: Each entity in the entity set must participate in the
relationship. If each student must enroll in a course, the participation of
students will be total. Total participation is shown by a double line in the ER
diagram.
2. Partial Participation: An entity in the entity set may or may NOT
participate in the relationship. If some courses have no students enrolled,
the participation of courses will be partial.
• The diagram depicts the ‘Enrolled in’ relationship set with Student
Entity set having total participation and Course Entity set having
partial participation.
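Whether participation is total or partial in a given set of data can be checked mechanically. In this sketch the entity sets and the Enrolled-in pairs are hypothetical, and the helper name participation is invented for illustration.

```python
# Hypothetical entity sets and relationship set, for illustration only.
students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3"}
enrolled_in = {("S1", "C2"), ("S2", "C1"), ("S3", "C2")}

def participation(entity_set, relationship_set, position):
    """Return 'total' if every entity appears in the relationship set.

    `position` says which slot of each pair the entity set occupies
    (0 for students, 1 for courses in this example).
    """
    participating = {pair[position] for pair in relationship_set}
    return "total" if entity_set <= participating else "partial"

student_participation = participation(students, enrolled_in, 0)
course_participation = participation(courses, enrolled_in, 1)
print(student_participation, course_participation)
```

Every student appears in some pair, so Student participation is total. Course C3 appears in none, so Course participation is partial.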
How to Draw ER Diagram?
• The very first step is identifying all the Entities, placing each in a
Rectangle, and labeling them accordingly.
• The next step is to identify the relationships between them and place
them accordingly using Diamonds, making sure that
Relationships are not connected to each other.
• Attach attributes to the entities properly.
• Remove redundant entities and relationships.
• Add proper colors to highlight the data present in the database.
Enhanced ER Model
• Generalization, Specialization, and Aggregation in the ER model are used
for data abstraction, in which an abstraction mechanism is used to
hide details of a set of objects. These concepts were added to the ER model
to form the Enhanced ER Model. The new concepts are:

• Generalization
• Specialization
• Aggregation
Generalization
• Generalization is the process of extracting common properties from a set of
entities and creating a generalized entity from it.
• It is a bottom-up approach in which two or more entities can be generalized to a
higher-level entity if they have some attributes in common.
• For example, STUDENT and FACULTY can be generalized to a higher-level entity called
PERSON. In this case, common attributes like P_NAME and P_ADD become part of the
higher entity (PERSON), and specialized attributes like S_FEE become part of a
specialized entity (STUDENT).
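Class inheritance gives a rough analogy for generalization. In this Python sketch the common attributes live in Person and the specialized ones in the subclasses. The attribute f_sal on Faculty is a hypothetical addition, since the slides name only S_FEE.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:
    """Generalized, higher-level entity holding the common attributes."""
    p_name: str
    p_add: str

@dataclass
class Student(Person):
    s_fee: float      # attribute specific to STUDENT

@dataclass
class Faculty(Person):
    f_sal: float      # hypothetical attribute specific to FACULTY

# Hypothetical instance, for illustration only.
s = Student("Asha", "Pune", 5000.0)

# A Student has the inherited attributes plus its own.
student_fields = [f.name for f in fields(Student)]
print(student_fields, isinstance(s, Person))
```

Read bottom-up (Student and Faculty merged into Person) this is generalization; read top-down (Person split into Student and Faculty) the same hierarchy describes specialization.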
Specialization
• In specialization, an entity is divided into sub-entities based on its
characteristics. It is a top-down approach where the higher-level entity is
specialized into two or more lower-level entities. For example, an EMPLOYEE entity
in an Employee management system can be specialized into DEVELOPER, TESTER, etc.
Aggregation
• An ER diagram is not capable of representing the relationship between an
entity and a relationship which may be required in some scenarios. In those
cases, a relationship with its corresponding entities is aggregated into a
higher-level entity.
• Aggregation is an abstraction through which we can represent relationships
as higher-level entity sets.
• For Example, an Employee working on a project may require some
machinery. So, REQUIRE relationship is needed between the relationship
WORKS_FOR and entity MACHINERY.
• Using aggregation, WORKS_FOR relationship with its entities EMPLOYEE and
PROJECT is aggregated into a single entity and relationship REQUIRE is
created between the aggregated entity and MACHINERY.
Aggregation
Entity V/S Attribute
• Entity and Attributes are two essential terms of a database
management system (DBMS). The main difference between the Entity
and an attribute is that an entity is a real-world object, and attributes
describe the properties of an Entity.
Entity V/S Relationship
• An entity can have one or more attributes, which are properties or
characteristics that describe the entity. For example, a customer can
have attributes such as name, email, and address. A relationship is a
connection or association between two or more entities that
expresses how they interact or depend on each other.
ER Model V/S Relational Model
• The E-R Model and Relational Model are two data models in DBMS that
are used to construct databases at the physical, logical, and view
levels. The key distinction is that the E-R Model is entity-specific,
while the Relational Model is table-specific.
Differences between ER Model and Relational
Model
• The ER model describes the relationships between entities and their
attributes. The Relational Model, on the other hand, refers to the
implementation of our model.
• The Relational Model is the implementation or representational model,
while the ER Model is the high-level or conceptual model.
• An ER model represents data using components such as entity sets,
relationship sets, and attributes. The Relational model, on the
other hand, defines data in components such as tuples, attributes, and
attribute domains.
• As compared to a Relational Model, an ER model makes it easier to
understand the relationships between entities.
• Mapping cardinality is stated explicitly as a constraint in the ER model. The
Relational Model has no direct construct for it; cardinality is expressed
indirectly through keys and foreign keys.
Questions
• Binary V/s Ternary relationship
• Aggregate v/s Ternary relationship
Section 3
Functional dependency, Relational Database design: features of good relational
database design, atomic domain and Normalization (1NF, 2NF, 3NF, BCNF).
Functional Dependency
• In relational database management, a functional dependency is a
constraint between two sets of attributes in which one set of attributes
determines the value of the other. It is denoted as X → Y, where the
attribute set on the left side of the arrow, X, is called the Determinant,
and Y is called the Dependent.
• Functional dependencies are used to mathematically express
relationships among the attributes of a relation and are very important
for understanding advanced concepts in Relational Database Systems.
Example of Functional Dependency
From a student table with attributes roll_no, name, dept_name and dept_building
we can conclude some valid functional dependencies:
• roll_no → {name, dept_name, dept_building}
• roll_no → dept_name
• dept_name → dept_building: dept_name identifies the dept_building, since each
department is located in exactly one building.
• More valid functional dependencies: roll_no → name,
{roll_no, name} → {dept_name, dept_building}, etc.

Some invalid functional dependencies:
• name → dept_name: Students with the same name can have different
dept_name, hence this is not a valid functional dependency.
• dept_building → dept_name: There can be multiple departments in the same
building. For example, in the table departments ME and EC are in the same
building B2, hence dept_building → dept_name is an invalid functional
dependency.
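Whether a functional dependency holds in a particular table can be tested by checking that equal left-hand values never come with different right-hand values. In the sketch below the rows are hypothetical (the original table is not reproduced here) and are chosen only to agree with the statements above. The helper name holds is invented for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later
        # different value for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical rows, for illustration only.
rows = [
    {"roll_no": 1, "name": "Asha", "dept_name": "CS", "dept_building": "B1"},
    {"roll_no": 2, "name": "Ravi", "dept_name": "ME", "dept_building": "B2"},
    {"roll_no": 3, "name": "Asha", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 4, "name": "Neha", "dept_name": "CS", "dept_building": "B1"},
]

fd_roll_all = holds(rows, ["roll_no"], ["name", "dept_name", "dept_building"])
fd_dept_building = holds(rows, ["dept_name"], ["dept_building"])
fd_name_dept = holds(rows, ["name"], ["dept_name"])
fd_building_dept = holds(rows, ["dept_building"], ["dept_name"])
print(fd_roll_all, fd_dept_building, fd_name_dept, fd_building_dept)
```

Note that a check like this can only refute a dependency. A dependency that happens to hold in one instance is valid only if it must hold in every legal instance of the relation.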
Armstrong’s axioms/properties of functional
dependencies
• Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
Example, {roll_no, name} → name is valid.
• Augmentation: If X → Y is a valid dependency, then XZ → YZ is also
valid by the augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no,
name, dept_name} → {dept_building, dept_name} is also valid.
• Transitivity: If X → Y and Y → Z are both valid dependencies, then
X→Z is also valid by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building, then
roll_no → dept_building is also valid.
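Applying these rules repeatedly yields the attribute closure: the set of all attributes determined by a given set. The sketch below is the standard closure computation, written in Python; the function name closure and the way dependencies are encoded as (left, right) tuples are choices made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the given FDs.

    `fds` is a list of (lhs, rhs) pairs, each a tuple of attribute names.
    """
    result = set(attrs)          # reflexivity: attrs determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, the right
            # side is determined too (augmentation plus transitivity).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [
    (("roll_no",), ("name", "dept_name")),
    (("dept_name",), ("dept_building",)),
]

roll_closure = closure({"roll_no"}, fds)
name_closure = closure({"name"}, fds)
print(sorted(roll_closure), sorted(name_closure))
```

Since the closure of roll_no contains every attribute, roll_no → dept_building follows, matching the transitivity example, and roll_no qualifies as a key.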
Advantages of Functional Dependencies
• Functional dependencies have numerous applications in the field of database
management systems. Some of them are listed below:
• 1. Data Normalization
• Data normalization is the process of organizing data in a database in order to
minimize redundancy and increase data integrity. Functional dependencies play
an important part in data normalization. With the help of functional
dependencies we are able to identify the candidate keys and primary key of a table,
which in turn helps in normalization.
• 2. Query Optimization
• With the help of functional dependencies we are able to decide the connectivity
between the tables and the attributes that need to be projected to retrieve
the required data from the tables. This helps in query optimization and improves
performance.
Advantages of Functional Dependency
• 3. Consistency of Data
• Functional dependencies help maintain the consistency of the data by exposing
redundancies or inconsistencies that may exist in the data. Enforcing a
functional dependency ensures that a change made to one attribute does not
cause an inconsistency in another set of attributes.
• 4. Data Quality Improvement
• Functional dependencies help keep the data in the database accurate and
consistent. This improves the overall quality of the data and reduces errors
and inaccuracies that might otherwise surface during data analysis and
decision making.
RDD
• Relational database design (RDD) models information and data into a set of
tables with rows and columns. Each row of a relation/table represents a
record, and each column represents an attribute of data. The Structured
Query Language (SQL) is used to manipulate relational databases. The
design of a relational database is composed of four stages, where the data
are modeled into a set of related tables. The stages are:
• Define relations/attributes
• Define primary keys
• Define relationships
• Normalization
RDD
• In an RDD, the data are organized into tables and all types of data
access are carried out via controlled transactions.
• A relational database system supports the ACID (atomicity, consistency,
isolation and durability) properties of transactions.
• Relational database design mandates the use of a database server in
applications for dealing with data management problems.
The four stages of an RDD
• Relations and attributes: The various tables and attributes related to each table are identified. The tables
represent entities, and the attributes represent the properties of the respective entities.
• Primary keys: The attribute or set of attributes that help in uniquely identifying a record is identified and
assigned as the primary key
• Relationships: The relationships between the various tables are established with the help of foreign keys.
Foreign keys are attributes occurring in a table that are primary keys of another table. The types of
relationships that can exist between the relations (tables) are:
• One to one
• One to many
• Many to many
• An entity-relationship diagram can be used to depict the entities, their attributes and the relationship
between the entities in a diagrammatic way.
• Normalization: This is the process of optimizing the database structure. Normalization simplifies the
database design to avoid redundancy and confusion. The different normal forms are as follows:
• First normal form
• Second normal form
• Third normal form
• Boyce-Codd normal form
• Fourth normal form
• Fifth normal form
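The first three stages (relations, primary keys, relationships) can be sketched with Python's built-in sqlite3 module. The tables `department`, `student`, `course` and `enrollment`, and the sample rows, are assumptions made for this illustration. `department` to `student` is one to many, and `enrollment` resolves the many-to-many relationship between `student` and `course`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE student (
    stud_no   INTEGER PRIMARY KEY,
    stud_name TEXT NOT NULL,
    dept_id   INTEGER NOT NULL REFERENCES department(dept_id)  -- one to many
);
CREATE TABLE course (
    course_no INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE enrollment (                                       -- many to many
    stud_no   INTEGER NOT NULL REFERENCES student(stud_no),
    course_no INTEGER NOT NULL REFERENCES course(course_no),
    PRIMARY KEY (stud_no, course_no)
);
""")

conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (101, 'Student A', 1)")
conn.executemany("INSERT INTO course VALUES (?, ?)", [(10, 'DBMS'), (20, 'Networks')])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(101, 10), (101, 20)])

# A foreign key that points at a missing department is rejected.
try:
    conn.execute("INSERT INTO student VALUES (102, 'Student B', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

courses_of_101 = [row[0] for row in conn.execute(
    "SELECT c.title FROM enrollment e "
    "JOIN course c ON c.course_no = e.course_no "
    "WHERE e.stud_no = 101 ORDER BY c.course_no")]
```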
Normalization
• Database normalization is the process of organizing the attributes of
the database to reduce or eliminate data redundancy (having the
same data but at different places).
• Problems because of data redundancy: Data redundancy
unnecessarily increases the size of the database as the same data is
repeated in many places. Inconsistency problems also arise during
insert, delete and update operations.
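The inconsistency problem can be shown with a small sketch in plain Python. The rows and the helper names are hypothetical; the point is only that a fee repeated in several rows can be updated in one row and missed in another.

```python
# Unnormalized data: the course fee is repeated in every enrollment row.
enrollments = [
    {"stud_no": 1, "course_no": "C1", "course_fee": 1000},
    {"stud_no": 2, "course_no": "C1", "course_fee": 1000},
    {"stud_no": 1, "course_no": "C2", "course_fee": 1500},
]

def fees_by_course(rows):
    """Collect the set of distinct fees recorded for each course."""
    fees = {}
    for row in rows:
        fees.setdefault(row["course_no"], set()).add(row["course_fee"])
    return fees

def inconsistent_courses(rows):
    """Courses for which more than one fee is recorded."""
    return sorted(c for c, f in fees_by_course(rows).items() if len(f) > 1)

before = inconsistent_courses(enrollments)

# Update anomaly: only one of the two copies of the C1 fee is changed.
enrollments[0]["course_fee"] = 1200

after = inconsistent_courses(enrollments)
```

Before the update no course has conflicting fees; afterwards course C1 carries two different fees, which is exactly the kind of contradiction that normalization is meant to prevent.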
Features of Normalization
• Elimination of Data Redundancy: Data redundancy refers to the repetition of data in
different parts of the database. Normalization helps in reducing or eliminating this
redundancy, which can improve the efficiency and consistency of the database.
• Ensuring Data Consistency: Normalization helps in ensuring that the data in the database
is consistent and accurate. By eliminating redundancy, normalization helps in preventing
inconsistencies and contradictions that can arise due to different versions of the same
data.
• Simplification of Data Management: Normalization simplifies the process of managing
data in a database. By breaking down a complex data structure into simpler tables,
normalization makes it easier to manage the data, update it, and retrieve it.
• Improved Database Design: Normalization helps in improving the overall design of the
database. By organizing the data in a structured and systematic way, normalization
makes it easier to design and maintain the database. It also makes the database more
flexible and adaptable to changing business needs.
Features of Normalization
• Avoiding Update Anomalies: Normalization helps in avoiding update
anomalies, which occur when the same fact is stored in several rows
and a change to one copy leaves the other copies out of date.
Normalization ensures that each table describes only one type of
entity and that the relationships between the tables are clearly
defined, so that each fact is stored, and updated, in one place.
• Standardization: Normalization helps in standardizing the data in the
database. By organizing the data into tables and defining relationships
between them, normalization helps in ensuring that the data is stored
in a consistent and uniform manner.
Normal Forms
• First Normal Form (1NF): This is the most basic level of normalization. In 1NF,
each table cell should contain only a single value, and each column should have a
unique name. The first normal form helps to eliminate repeating groups and
simplify queries.
• Second Normal Form (2NF): 2NF builds on 1NF by requiring that each non-key
attribute be fully dependent on the primary key. This means that each non-key
column should depend on the whole key, and not on just a part of it.
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that no non-key
attribute depend transitively on the primary key. This means that each non-key
column should depend directly on the key, and not on another non-key column in
the same table.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that
each determinant in a table is a candidate key. In other words, for every
non-trivial functional dependency, the left-hand side must be a super key.
1. First Normal Form –
• A relation is in first normal form if it does not contain any
composite or multi-valued attribute, that is, if every attribute in
the relation is a single-valued attribute. A relation that contains a
composite or multi-valued attribute violates first normal form.
Example 1NF

Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE.


Its decomposition into 1NF
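A minimal sketch of this decomposition in plain Python follows. The sample rows (names and phone numbers) are placeholders, since the slide's own table is not reproduced here, and `to_1nf` is a hypothetical helper.

```python
# STUDENT with a multi-valued STUD_PHONE attribute (not in 1NF).
student = [
    {"STUD_NO": 1, "STUD_NAME": "Student A", "STUD_PHONE": ["555-0101", "555-0102"]},
    {"STUD_NO": 2, "STUD_NAME": "Student B", "STUD_PHONE": ["555-0103"]},
]

def to_1nf(rows, multi_valued):
    """Emit one row per value of the multi-valued attribute."""
    result = []
    for row in rows:
        for value in row[multi_valued]:
            flat = dict(row)            # copy the single-valued attributes
            flat[multi_valued] = value  # keep exactly one value per row
            result.append(flat)
    return result

student_1nf = to_1nf(student, "STUD_PHONE")
for row in student_1nf:
    print(row)
```

After the decomposition every cell holds a single value. STUD_NO alone no longer identifies a row, so in this sketch the key becomes the pair (STUD_NO, STUD_PHONE).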
Example 2: 1NF

The Course attribute is multi-valued, so the table is not in 1NF.


The table below is in 1NF, as it has no multi-valued attribute.
Convert it to 1NF
Solution
2NF
• To be in second normal form, a relation must be in first normal form
and must not contain any partial dependency. That is, no non-prime
attribute (an attribute that is not part of any candidate key) is
dependent on a proper subset of any candidate key of the table.
• Partial Dependency – If a proper subset of a candidate key
determines a non-prime attribute, it is called a partial dependency.
Example 1: 2NF
(Note that many courses have the same course fee.)
Here, COURSE_FEE alone cannot decide the value of
COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the
value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide
the value of STUD_NO.
Hence, COURSE_FEE is a non-prime attribute, as it does
not belong to the only candidate key,
{STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is
dependent on COURSE_NO, which is a proper subset of
the candidate key.
The non-prime attribute COURSE_FEE is dependent on a
proper subset of the candidate key. This is a partial
dependency, so the relation is not in 2NF.
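The reasoning above can be checked mechanically with an attribute-closure computation. The sketch below is one way to do it in Python; the function names are hypothetical, and `partial_dependencies` assumes, as in this example, that the relation has a single candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(candidate_key, fds, attributes):
    """List (key subset, attribute) pairs that break 2NF.

    Assumes `candidate_key` is the only candidate key, so every attribute
    outside it is non-prime.
    """
    key = frozenset(candidate_key)
    non_prime = set(attributes) - key
    found = []
    for size in range(1, len(key)):              # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((subset, attr))
    return found

attributes = {"STUD_NO", "COURSE_NO", "COURSE_FEE"}
fds = [(frozenset({"COURSE_NO"}), frozenset({"COURSE_FEE"}))]
key = {"STUD_NO", "COURSE_NO"}

violations = partial_dependencies(key, fds, attributes)
print(violations)
```

The only pair reported is COURSE_NO determining COURSE_FEE, which matches the partial dependency identified in the example.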
2NF
• The second step in Normalization is 2NF.

• A table is in 2NF only if it is in 1NF and every non-key attribute is
fully dependent on the primary key.

• The Second Normal Form eliminates partial dependencies on primary
keys.
2NF Example

The prime (key) attributes are StudentID and ProjectID.

A partial dependency exists when a non-prime attribute is functionally dependent on only part of a
candidate key. Here the non-prime attributes are StudentName and ProjectName.
StudentName can be determined by StudentID alone, which is a partial dependency.
ProjectName can be determined by ProjectID alone, which is also a partial dependency.
Therefore, the <StudentProject> relation violates 2NF and is considered a bad database
design.
Solution
• To remove the partial dependencies and the 2NF violation, decompose
the above table so that each non-prime attribute is stored with the
part of the key that determines it −
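One possible decomposition can be sketched in plain Python. The slide's own tables are not reproduced here, so the table names `student_info`, `project_info` and `assignment`, and the sample rows, are assumptions made for this illustration.

```python
# <StudentProject> before decomposition (sample rows are placeholders).
student_project = [
    {"StudentID": "S1", "ProjectID": "P1", "StudentName": "Student A", "ProjectName": "Project X"},
    {"StudentID": "S1", "ProjectID": "P2", "StudentName": "Student A", "ProjectName": "Project Y"},
    {"StudentID": "S2", "ProjectID": "P1", "StudentName": "Student B", "ProjectName": "Project X"},
]

def project(rows, columns):
    """Relational projection: keep `columns` and drop duplicate rows."""
    seen = []
    for row in rows:
        reduced = {c: row[c] for c in columns}
        if reduced not in seen:
            seen.append(reduced)
    return seen

# Each non-prime attribute moves to a table keyed by its determinant.
student_info = project(student_project, ["StudentID", "StudentName"])
project_info = project(student_project, ["ProjectID", "ProjectName"])
assignment = project(student_project, ["StudentID", "ProjectID"])

def rejoin():
    """Rebuild the original rows by joining the three tables on their keys."""
    names = {r["StudentID"]: r["StudentName"] for r in student_info}
    titles = {r["ProjectID"]: r["ProjectName"] for r in project_info}
    return [
        {**a, "StudentName": names[a["StudentID"]], "ProjectName": titles[a["ProjectID"]]}
        for a in assignment
    ]

rebuilt = rejoin()
```

Each student name and project name is now stored once, and joining the three tables gives back the original rows, so no information is lost by the decomposition.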
3NF
• Although Second Normal Form (2NF) relations have less redundancy
than those in 1NF, they may still suffer from update anomalies. If the
same fact is stored in several tuples and we update only one of them,
the database is left in an inconsistent state.
• This update anomaly is caused by a transitive dependency. We need
to remove such dependencies by progressing to Third Normal Form
(3NF).
Third Normal Form (3NF):

• A relation is in third normal form if it is in second normal form and
has no transitive dependency for non-prime attributes.
• A relation is in 3NF if at least one of the following conditions holds
for every non-trivial functional dependency X –> Y:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).
• A relation that is in First and Second Normal Form, and in which no
non-primary-key attribute is transitively dependent on the primary
key, is in Third Normal Form (3NF).
3NF
• The normalization of 2NF relations to 3NF involves the removal of
transitive dependencies. If a transitive dependency exists, we remove
the transitively dependent attribute(s) from the relation by placing
the attribute(s) in a new relation along with a copy of the
determinant.
3NF Example

FD set:
{STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE,
STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key:
{STUD_NO}
Solution
For this relation in table 4, STUD_NO -> STUD_STATE and
STUD_STATE -> STUD_COUNTRY are true. So STUD_COUNTRY is
transitively dependent on STUD_NO. This violates the third
normal form. To convert the relation to third normal form, we
decompose the relation STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
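The 3NF conditions can be tested against the FD set of this example. The sketch below is a brute-force check in Python with hypothetical function names. STUD_PHONE is left out of the attribute set, because the FD set says nothing about it; that omission is an assumption of this sketch, made so that {STUD_NO} is the candidate key as stated in the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute-force search for the minimal sets whose closure is all attributes."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = frozenset(combo)
            if any(k <= combo for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

def violations_3nf(attributes, fds):
    """Return (X, A) for each X -> A where X is no super key and A is not prime."""
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attributes):
            continue  # X is a super key
        bad.extend((sorted(lhs), a) for a in sorted(rhs - lhs) if a not in prime)
    return bad

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

student = {"STUD_NO", "STUD_NAME", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
student_fds = [fd({"STUD_NO"}, {"STUD_NAME"}), fd({"STUD_NO"}, {"STUD_STATE"}),
               fd({"STUD_STATE"}, {"STUD_COUNTRY"}), fd({"STUD_NO"}, {"STUD_AGE"})]

before = violations_3nf(student, student_fds)

# After the decomposition, each relation is checked with the FDs that apply to it.
after_student = violations_3nf(
    {"STUD_NO", "STUD_NAME", "STUD_STATE", "STUD_AGE"},
    [fd({"STUD_NO"}, {"STUD_NAME"}), fd({"STUD_NO"}, {"STUD_STATE"}),
     fd({"STUD_NO"}, {"STUD_AGE"})])
after_state = violations_3nf({"STATE", "COUNTRY"}, [fd({"STATE"}, {"COUNTRY"})])
```

Before the decomposition the only violating dependency is STUD_STATE -> STUD_COUNTRY; after it, neither relation reports a violation.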
Convert into 3NF
Solution
• Super keys in the table above:
• {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on

• Candidate key: {EMP_ID}

• Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

• Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively
dependent on the super key (EMP_ID). This violates the rule of third normal form.

• That is why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.
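The resulting pair of tables can be sketched with Python's built-in sqlite3 module. The table and column names follow the example; the column types and the sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE_ZIP (
    EMP_ZIP   TEXT PRIMARY KEY,
    EMP_STATE TEXT NOT NULL,
    EMP_CITY  TEXT NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_ID   INTEGER PRIMARY KEY,
    EMP_NAME TEXT NOT NULL,
    EMP_ZIP  TEXT NOT NULL REFERENCES EMPLOYEE_ZIP(EMP_ZIP)
);
""")

conn.executemany("INSERT INTO EMPLOYEE_ZIP VALUES (?, ?, ?)",
                 [("10001", "State A", "City X"), ("20002", "State B", "City Y")])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [(1, "Employee One", "10001"),
                  (2, "Employee Two", "10001"),
                  (3, "Employee Three", "20002")])

# The city of a zip code is stored once, so one row changes for all employees.
updated = conn.execute(
    "UPDATE EMPLOYEE_ZIP SET EMP_CITY = 'City Z' WHERE EMP_ZIP = '10001'").rowcount

rows = conn.execute(
    "SELECT e.EMP_ID, z.EMP_CITY FROM EMPLOYEE e "
    "JOIN EMPLOYEE_ZIP z ON z.EMP_ZIP = e.EMP_ZIP "
    "ORDER BY e.EMP_ID").fetchall()
```

A single-row update is enough to change the city seen by both employees who share the zip code, which is the update anomaly that the transitive dependency would otherwise cause.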
Solution
BCNF
• BCNF is an advanced version of 3NF. It is stricter than 3NF.
• A table is in BCNF if, for every non-trivial functional dependency
X → Y, X is a super key of the table.
• For BCNF, the table should be in 3NF, and for every FD, the LHS must
be a super key.
Example: A company where employees work
in more than one department.

Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor
EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into
three tables:

Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left-hand side of both
functional dependencies is a key.
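The BCNF test (the left-hand side of every non-trivial FD must be a super key) can be sketched in Python. The function names are hypothetical. The attribute sets of the three decomposed tables are inferred from the functional dependencies and candidate keys listed above, since the slide's own tables are not reproduced here.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    attrs = set(attributes)
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attrs]

fd_country = (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"}))
fd_dept = (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"}))

# Original table: both FDs have a left-hand side that is only part of the key.
original = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
before = bcnf_violations(original, [fd_country, fd_dept])

# Decomposed tables, each checked with the FDs that apply to it.
first = bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [fd_country])
second = bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept])
third = bcnf_violations({"EMP_ID", "EMP_DEPT"}, [])  # no non-trivial FDs
```

Both dependencies are reported for the original table, and none for the three decomposed tables, in line with the conclusion of the example.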
End of Module 1
