Hexaware Dbms

RDBMS stands for Relational Database Management System and is based on relational models introduced by E.F. Codd. RDBMS stores data in tables with rows and columns and uses keys to relate tables. It supports ACID properties for integrity. RDBMS is designed for large data and multiple users while DBMS can be for small data and single users. RDBMS reduces data redundancy through centralization and provides recovery, security and data sharing features.
What is RDBMS

RDBMS stands for Relational Database Management System.

Many widely used database management systems, such as MS SQL Server, IBM DB2, Oracle, MySQL and Microsoft Access, are relational. (SQL itself is the query language these systems share, not a database product.)

It is called a Relational Database Management System (RDBMS) because it is based on the relational model introduced by E.F. Codd.

How it works
Data is represented in terms of tuples (rows) in an RDBMS.

The relational database is the most commonly used type of database. It contains a number of tables, and each table has its own primary key.

Because the data is organized as a collection of tables, it can be accessed easily in an RDBMS.

Brief History of RDBMS


Between 1970 and 1972, E.F. Codd published papers proposing the use of the relational database model.

RDBMS is based on E.F. Codd's relational model.

What is table
An RDBMS uses tables to store data. A table is a collection of related data entries and contains rows and columns to store data.

A table is the simplest example of data storage in an RDBMS.

Let's see the example of a student table.

ID   Name     AGE   COURSE
1    Ajeet    24    B.Tech
2    Aryan    20    C.A
3    Mahesh   21    BCA
4    Ratan    22    MCA
5    Vimal    26    BSC

What is field
A field is the smallest unit of a table; it holds one specific piece of information about every record in the table. In the above example, the fields of the student table are id, name, age and course.

What is row or record


A row of a table is also called a record. It contains the specific information of one individual entry in the table. It is a horizontal entity in the table. For example, the above table contains 5 records.

Let's see one record/row in the table.

1    Ajeet    24    B.Tech

What is column
A column is a vertical entity in the table which contains all the information associated with a specific field. For example, "name" is a column in the above table which contains all the students' names.

Ajeet
Aryan
Mahesh
Ratan
Vimal
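The terms above (table, field, record, column) can be tried out with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database as an assumption for illustration; the notes themselves do not prescribe any particular RDBMS.

```python
import sqlite3

# In-memory database; table and column names mirror the student example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ajeet", 24, "B.Tech"),
        (2, "Aryan", 20, "C.A"),
        (3, "Mahesh", 21, "BCA"),
        (4, "Ratan", 22, "MCA"),
        (5, "Vimal", 26, "BSC"),
    ],
)

# Fields: the column names reported for a query over the table.
cursor = conn.execute("SELECT * FROM student ORDER BY id")
fields = [col[0] for col in cursor.description]

# Records: one tuple per row.
records = cursor.fetchall()

# A single record (row) and a single column.
first_record = conn.execute("SELECT * FROM student WHERE id = 1").fetchone()
name_column = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY id")]

print(fields)
print(len(records))
print(first_record)
print(name_column)
```

Selecting by id returns one horizontal record, while selecting only name returns one vertical column.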

NULL Values
A NULL value specifies that the field was left blank during record creation. It is different from a field that contains zero and from a field that contains a space.
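A minimal sketch of the difference between NULL, zero and a space, again assuming Python's sqlite3 module purely for illustration. The three rows are made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO student (id, name, age) VALUES (1, 'Ajeet', 0)")  # age is zero
conn.execute("INSERT INTO student (id, name, age) VALUES (2, ' ', 20)")     # name is a space
conn.execute("INSERT INTO student (id, name) VALUES (3, 'Mahesh')")         # age left blank

# Only the row whose age was left blank is NULL.
null_age_ids = [r[0] for r in conn.execute("SELECT id FROM student WHERE age IS NULL")]

# Zero is an ordinary value, not NULL.
zero_age_ids = [r[0] for r in conn.execute("SELECT id FROM student WHERE age = 0")]

# A space is an ordinary value too.
null_name_ids = [r[0] for r in conn.execute("SELECT id FROM student WHERE name IS NULL")]

# Comparing with '=' never matches NULL; IS NULL must be used instead.
eq_null_ids = [r[0] for r in conn.execute("SELECT id FROM student WHERE age = NULL")]

print(null_age_ids, zero_age_ids, null_name_ids, eq_null_ids)
```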
Data Integrity
The following categories of data integrity exist in every RDBMS:

Entity integrity: It specifies that there should be no duplicate rows in a table.

Domain integrity: It enforces valid entries for a given column by restricting the type, the format, or the range of values.

Referential integrity: It specifies that rows which are referenced by other records cannot be deleted.
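The three categories can be sketched as constraints that reject bad data. This is an illustration under assumptions: it uses Python's sqlite3 module, a hypothetical course table, and an arbitrary age range of 0 to 150. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE student (
        id     INTEGER PRIMARY KEY,                    -- entity integrity
        name   TEXT NOT NULL,
        age    INTEGER CHECK (age BETWEEN 0 AND 150),  -- domain integrity
        course TEXT REFERENCES course(code)            -- referential integrity
    )
    """
)
conn.execute("INSERT INTO course VALUES ('BCA')")
conn.execute("INSERT INTO student VALUES (1, 'Mahesh', 21, 'BCA')")


def is_rejected(sql):
    """Run a statement and report whether the database refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second row with the same primary key would be a duplicate entity.
entity_violation = is_rejected("INSERT INTO student VALUES (1, 'Copy', 30, 'BCA')")
# A negative age lies outside the allowed range of the column.
domain_violation = is_rejected("INSERT INTO student VALUES (2, 'Ratan', -5, 'BCA')")
# The course row is still used by a student, so it cannot be deleted.
referential_violation = is_rejected("DELETE FROM course WHERE code = 'BCA'")

print(entity_violation, domain_violation, referential_violation)
```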

Difference between DBMS and RDBMS


Although both DBMS and RDBMS are used to store information in a physical database, there are some notable differences between them.

The main differences between DBMS and RDBMS are given below:

1) DBMS: DBMS applications store data as files.
   RDBMS: RDBMS applications store data in tabular form.

2) DBMS: Data is generally stored in either a hierarchical form or a navigational form.
   RDBMS: The tables have an identifier called the primary key, and the data values are stored in the form of tables.

3) DBMS: Normalization is not present.
   RDBMS: Normalization is present.

4) DBMS: It does not apply any security with regard to data manipulation.
   RDBMS: It defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation and Durability) properties.

5) DBMS: It uses the file system to store data, so there is no relation between the tables.
   RDBMS: Data values are stored in the form of tables, so a relationship between these data values is stored in the form of a table as well.

6) DBMS: It has to provide some uniform methods to access the stored information.
   RDBMS: It supports a tabular structure of the data and relationships between the tables to access the stored information.

7) DBMS: It does not support distributed databases.
   RDBMS: It supports distributed databases.

8) DBMS: It is meant for small organizations and deals with small amounts of data. It supports a single user.
   RDBMS: It is designed to handle large amounts of data. It supports multiple users.

9) DBMS: Examples are file systems, XML etc.
   RDBMS: Examples are MySQL, PostgreSQL, SQL Server, Oracle etc.

After observing the differences between DBMS and RDBMS, you can say that RDBMS is an extension of DBMS. Many software products on the market today are compatible with both DBMS and RDBMS. This means that today an RDBMS application is a DBMS application and vice versa.

DBMS vs. File System


File System Approach
File-based systems were an early attempt to computerize the manual system. This is also called the traditional approach. It was decentralized: each department stored and controlled its own data with the help of a data processing specialist. The main role of a data processing specialist was to create the necessary computer file structures, manage the data within those structures, and design application programs that create reports based on the file data.

In the above figure:


Consider an example of a student's file system. The student file will contain
information regarding the student (i.e. roll no, student name, course etc.). Similarly,
we have a subject file that contains information about the subject and the result file
which contains the information regarding the result.

Some fields are duplicated in more than one file, which leads to data redundancy. So
to overcome this problem, we need to create a centralized system, i.e. DBMS
approach.

DBMS:
In the database approach, data is kept as a well-organized collection that is related in a meaningful way. It can be accessed by different users but is stored only once in the system. The various operations performed by a DBMS are insertion, deletion, selection, sorting etc.

In the above figure, duplication of data is reduced due to centralization of data.

The differences between the DBMS approach and the file system approach are as follows:

Meaning
o DBMS approach: A DBMS is a collection of data. In a DBMS, the user is not required to write procedures.
o File system approach: The file system is a collection of data. In this system, the user has to write procedures for managing the database.

Sharing of data
o DBMS approach: Due to the centralized approach, data sharing is easy.
o File system approach: Data is distributed in many files, which may be of different formats, so it isn't easy to share data.

Data Abstraction
o DBMS approach: A DBMS gives an abstract view of data that hides the details.
o File system approach: The file system exposes the details of data representation and storage.

Security and Protection
o DBMS approach: A DBMS provides a good protection mechanism.
o File system approach: It isn't easy to protect a file under the file system.

Recovery Mechanism
o DBMS approach: A DBMS provides a crash recovery mechanism, i.e., it protects the user from system failure.
o File system approach: The file system doesn't have a crash recovery mechanism, i.e., if the system crashes while some data is being entered, the content of the file will be lost.

Manipulation Techniques
o DBMS approach: A DBMS contains a wide variety of sophisticated techniques to store and retrieve data.
o File system approach: The file system can't efficiently store and retrieve data.

Concurrency Problems
o DBMS approach: A DBMS takes care of concurrent access to data using some form of locking.
o File system approach: In the file system, concurrent access causes many problems, such as one user redirecting the file while another is deleting or updating some information.

Where to use
o DBMS approach: The database approach is used in large systems which interrelate many files.
o File system approach: The file system approach is suited to small systems with few interrelated files.

Cost
o DBMS approach: The database system is expensive to design.
o File system approach: The file system approach is cheaper to design.

Data Redundancy and Inconsistency
o DBMS approach: Due to the centralization of the database, the problems of data redundancy and inconsistency are controlled.
o File system approach: The files and application programs are created by different programmers, so there is a lot of duplicated data, which may lead to inconsistency.

Structure
o DBMS approach: The database structure is complex to design.
o File system approach: The file system approach has a simple structure.

Data Independence
o DBMS approach: Data independence exists, and it can be of two types: logical data independence and physical data independence.
o File system approach: There is no data independence.

Integrity Constraints
o DBMS approach: Integrity constraints are easy to apply.
o File system approach: Integrity constraints are difficult to implement.

Data Models
o DBMS approach: Three types of data models exist: hierarchical, network and relational data models.
o File system approach: There is no concept of data models.

Flexibility
o DBMS approach: The content of the data stored in any system often needs to change, and these changes are made more easily with a database approach.
o File system approach: The system is less flexible than the DBMS approach.

Examples
o DBMS approach: Oracle, SQL Server, Sybase etc.
o File system approach: Cobol, C++ etc.

Three schema Architecture


o The three schema architecture is also called ANSI/SPARC architecture or three-level
architecture.
o This framework is used to describe the structure of a specific database system.
o The three schema architecture is also used to separate the user applications and
physical database.
o The three schema architecture contains three levels. It breaks the database down into
three different categories.

The three-schema architecture is as follows:


In the above diagram:

o It shows the DBMS architecture.


o Mapping is used to transform the request and response between various database
levels of architecture.
o Mapping is not good for small DBMS because it takes more time.
o In External / Conceptual mapping, the DBMS transforms the request from the
external level to the conceptual schema.
o In Conceptual / Internal mapping, the DBMS transforms the request from the conceptual
level to the internal level.

Objectives of Three schema Architecture


The main objective of three level architecture is to enable multiple users to access
the same data with a personalized view while storing the underlying data only once.
Thus it separates the user's view from the physical structure of the database. This
separation is desirable for the following reasons:
o Different users need different views of the same data.
o The way in which a particular user needs to see the data may change over time.
o The users of the database should not have to worry about the physical implementation and
internal workings of the database, such as data compression and encryption
techniques, hashing, optimization of the internal structures etc.
o All users should be able to access the same data according to their requirements.
o The DBA should be able to change the conceptual structure of the database without
affecting the users' views.
o The internal structure of the database should be unaffected by changes to the physical
aspects of the storage.

1. Internal Level

o The internal level has an internal schema which describes the physical storage
structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It defines how the data will be stored in
blocks.
o The physical level is used to describe complex low-level data structures in detail.

The internal level is generally concerned with the following activities:

o Storage space allocations.
For example: B-Trees, hashing etc.
o Access paths.
For example: Specification of primary and secondary keys, indexes, pointers and
sequencing.
o Data compression and encryption techniques.
o Optimization of internal structures.
o Representation of stored fields.

2. Conceptual Level

o The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data are to be stored in the database and also
describes what relationship exists among those data.
o In the conceptual level, internal details such as an implementation of the data
structure are hidden.
o Programmers and database administrators work at this level.

3. External Level

o At the external level, a database contains several schemas, sometimes called
subschemas. A subschema is used to describe a particular view of the database.
o An external schema is also known as a view schema.
o Each view schema describes the part of the database that a particular user group is
interested in and hides the remaining database from that user group.
o The view schema describes the end user's interaction with the database system.

Mapping between Views


The three levels of DBMS architecture don't exist independently of each other. There
must be correspondence between the three levels i.e. how they actually correspond
with each other. DBMS is responsible for correspondence between the three types of
schema. This correspondence is called Mapping.

There are basically two types of mapping in the database architecture:

o Conceptual/ Internal Mapping


o External / Conceptual Mapping

Conceptual/ Internal Mapping

The Conceptual/ Internal Mapping lies between the conceptual level and the internal
level. Its role is to define the correspondence between the records and fields of the
conceptual level and files and data structures of the internal level.

External/ Conceptual Mapping

The External/ Conceptual Mapping lies between the external level and the conceptual
level. Its role is to define the correspondence between a particular external view and the
conceptual view.
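One way to picture the external and conceptual levels is with SQL views. The sketch below is an illustration under assumptions, not part of the ANSI/SPARC definition: it uses Python's sqlite3 module, and the view names student_directory and course_roster are hypothetical. Each view definition plays the role of an external/conceptual mapping, while the conceptual/internal mapping (how the table is laid out in pages and indexes) is handled by the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical structure of the data.
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ajeet", 24, "B.Tech"), (2, "Aryan", 20, "C.A")],
)

# External level: each user group sees only the part it is interested in.
conn.execute("CREATE VIEW student_directory AS SELECT id, name FROM student")
conn.execute("CREATE VIEW course_roster AS SELECT course, name FROM student")

directory = conn.execute("SELECT * FROM student_directory ORDER BY id").fetchall()
roster = conn.execute("SELECT * FROM course_roster ORDER BY name").fetchall()

print(directory)
print(roster)
```

Neither view exposes the age column, so it stays hidden from both user groups even though it is stored only once in the underlying table.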

Data Models
A data model describes the data, its semantics, and the consistency
constraints that apply to it. It provides the conceptual tools for describing the design of a
database at each level of data abstraction. The following four data
models are used for understanding the structure of a database:
1) Relational Data Model: This type of model organizes the data in the form of rows and columns within a table. Thus, a relational model uses tables for representing both data and the relationships among data. Tables are also called relations. This model was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used model and is used primarily by commercial data processing applications.

2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and relationships among them. These objects are known as entities, and a relationship is an association among these entities. This model was designed by Peter Chen and published in a 1976 paper. It has been widely used in database design. A set of attributes describes each entity. For example, student_name and student_id describe the 'student' entity. A set of entities of the same type is known as an 'entity set', and a set of relationships of the same type is known as a 'relationship set'.

3) Object-based Data Model: An extension of the ER model with notions of functions, encapsulation, and object identity. This model supports a rich type system that includes structured and collection types. In the 1980s, various database systems following the object-oriented approach were developed. Here, the objects are the data together with its properties.

4) Semistructured Data Model: This type of data model differs from the other three. The semistructured data model allows individual data items of the same type to have different sets of attributes. The Extensible Markup Language, also known as XML, is widely used for representing semistructured data. Although XML was initially designed for adding markup information to text documents, it has gained importance because of its use in the exchange of data.

Data model Schema and Instance


o The data which is stored in the database at a particular moment of time is
called an instance of the database.
o The overall design of a database is called schema.
o A database schema is the skeleton structure of the database. It represents the
logical view of the entire database.
o A schema contains schema objects like tables, foreign keys, primary keys, views,
columns, data types, stored procedures, etc.
o A database schema can be represented using a visual diagram. That
diagram shows the database objects and their relationships with each other.
o A database schema is designed by the database designers to help
programmers whose software will interact with the database. The process of
designing the schema is called data modeling.

A schema diagram can display only some aspects of a schema, like the names of
record types, data types, and constraints. Other aspects can't be specified through the
schema diagram. For example, the given figure shows neither the data type of each
data item nor the relationships among the various files.

In the database, the actual data changes quite frequently. For example, in the given
figure, the database changes whenever we add a new grade or add a student. The
data at a particular moment of time is called the instance of the database.
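The distinction between schema and instance can be sketched as follows. The example assumes Python's sqlite3 module, where the stored schema text is available through the sqlite_master catalog table; the grade table and its row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (student_id INTEGER, subject TEXT, mark INTEGER)")


def schema():
    """The design of the database: the stored CREATE statements."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    return [r[0] for r in rows]


def instance():
    """The data held in the database at this moment."""
    return conn.execute("SELECT * FROM grade ORDER BY student_id").fetchall()


schema_before = schema()
instance_before = instance()

# Adding a grade changes the instance, not the schema.
conn.execute("INSERT INTO grade VALUES (1, 'DBMS', 82)")

schema_after = schema()
instance_after = instance()

print(schema_before == schema_after)
print(instance_before, instance_after)
```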
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the ability to modify the schema at
one level of the database system without altering the schema at the next
higher level.

There are two types of data independence:

1. Logical Data Independence

o Logical data independence refers to the ability to change the
conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the
conceptual view.
o If we make any changes to the conceptual view of the data, the user view of
the data is not affected.
o Logical data independence occurs at the user interface level.
2. Physical Data Independence
o Physical data independence can be defined as the capacity to change the
internal schema without having to change the conceptual schema.
o If we make any changes to the storage of the database system server, for example to its size,
the conceptual structure of the database is not affected.
o Physical data independence is used to separate the conceptual level from the
internal level.
o Physical data independence occurs at the logical interface level.

Fig: Data Independence
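Both kinds of data independence can be illustrated with a small sketch. It assumes Python's sqlite3 module; the view name student_names and the index name idx_student_course are hypothetical. Adding a column stands in for a change to the conceptual schema, and adding an index stands in for a change to the internal schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ajeet", "B.Tech"), (2, "Aryan", "C.A")],
)
# The external schema: what one user group sees.
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")


def user_view():
    return conn.execute("SELECT * FROM student_names ORDER BY id").fetchall()


def table_columns():
    return [r[1] for r in conn.execute("PRAGMA table_info(student)")]


before = user_view()

# Logical data independence: the conceptual schema gains a column,
# but the user view is unchanged.
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")
after_logical_change = user_view()

# Physical data independence: a new index changes how data is stored and
# accessed, but the table definition and the query results are unchanged.
columns_before_index = table_columns()
conn.execute("CREATE INDEX idx_student_course ON student(course)")
columns_after_index = table_columns()
after_physical_change = user_view()

print(before == after_logical_change == after_physical_change)
print(columns_before_index == columns_after_index)
```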

Database Language
o A DBMS has appropriate languages and interfaces to express database queries and
updates.
o Database languages can be used to read, store and update the data in the database.
Types of Database Language

1. Data Definition Language


o DDL stands for Data Definition Language. It is used to define database structure or
pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the
number of tables and schemas, their names, indexes, columns in each table,
constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.


o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to comment on the data dictionary.

These commands are used to update the database schema, which is why they come
under Data Definition Language.
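A minimal sketch of the DDL tasks listed above, assuming Python's sqlite3 module. SQLite is used only for illustration and differs from other systems in places: it has no TRUNCATE statement, so an unfiltered DELETE is shown as the nearest equivalent, and COMMENT is omitted because SQLite does not provide it. The table name learner is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


# Create: make a new object in the database.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Alter: change the structure of an existing table.
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")
columns_after_alter = column_names("student")

# Rename: give the object a new name.
conn.execute("ALTER TABLE student RENAME TO learner")
tables_after_rename = table_names()

# Truncate: remove all records but keep the table (unfiltered DELETE in SQLite).
conn.execute("INSERT INTO learner VALUES (1, 'Ajeet', 24)")
conn.execute("DELETE FROM learner")
rows_after_truncate = conn.execute("SELECT COUNT(*) FROM learner").fetchone()[0]

# Drop: delete the object itself.
conn.execute("DROP TABLE learner")
tables_after_drop = table_names()

print(columns_after_alter, tables_after_rename, rows_after_truncate, tables_after_drop)
```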
2. Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table (all records, or only those matching a condition).
o Merge: It performs an UPSERT operation, i.e., an insert or an update.
o Call: It is used to call a PL/SQL or a Java subprogram.
o Explain Plan: It describes the access path the database will use to execute a statement.
o Lock Table: It controls concurrency.
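The four basic DML tasks can be sketched as below, assuming Python's sqlite3 module. Merge, Call, Explain Plan and Lock Table are left out because their syntax is specific to particular database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Insert: add new records.
conn.execute("INSERT INTO student VALUES (1, 'Ajeet', 24)")
conn.execute("INSERT INTO student VALUES (2, 'Aryan', 20)")

# Update: change existing data; the WHERE clause limits which rows change.
conn.execute("UPDATE student SET age = 21 WHERE id = 2")

# Delete: remove only the rows that match the condition.
conn.execute("DELETE FROM student WHERE id = 1")

# Select: retrieve data.
remaining = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()

print(remaining)
```

Without the WHERE clause, the UPDATE and DELETE statements would affect every row in the table.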

3. Data Control Language


o DCL stands for Data Control Language. It is used to control access to the data stored in the database.
o The DCL execution is transactional. It also has rollback parameters.

(But in an Oracle database, the execution of data control language statements cannot be
rolled back.)

Here are some tasks that come under DCL:

o Grant: It is used to give user access privileges to a database.


o Revoke: It is used to take back permissions from the user.

The following privileges can be granted and revoked:

CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.

4. Transaction Control Language


TCL is used to manage the changes made by DML statements. With TCL, DML statements can be grouped
into a logical transaction.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction on the database.


o Rollback: It is used to restore the database to its state at the last Commit.
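Commit and Rollback can be sketched with Python's sqlite3 module, used here as an assumption for illustration. In its default mode the module opens a transaction automatically before an INSERT, so only the commit() and rollback() calls appear explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")


def names():
    return [r[0] for r in conn.execute("SELECT name FROM student ORDER BY id")]


# This insert is saved permanently by the commit.
conn.execute("INSERT INTO student VALUES (1, 'Ajeet')")
conn.commit()

# This insert is visible inside the open transaction ...
conn.execute("INSERT INTO student VALUES (2, 'Aryan')")
names_before_rollback = names()

# ... but the rollback returns the database to its state at the last commit.
conn.rollback()
names_after_rollback = names()

print(names_before_rollback)
print(names_after_rollback)
```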

ACID Properties in DBMS


A DBMS manages data that must keep its integrity whenever changes are made to it. If the integrity of the data is affected, the whole data set can become corrupted. Therefore, to maintain the integrity of the data, four properties are defined for database management systems, known as the ACID properties. The ACID properties apply to transactions, where a transaction is a group of tasks treated as a single unit of work.

In this section, we will learn what each of the ACID properties stands for and what it is used for. We will also look at the ACID properties with the help of some examples.

ACID Properties
The term ACID stands for:

1) Atomicity: Atomicity means that a transaction is treated as a single, indivisible unit. If any operation is performed on the data, it should either be executed completely or not be executed at all. The operation should not break off in between or execute partially.

Example: Remo has account A with $30, from which he wishes to send $10 to Sheero's account B. Account B already holds $100, so when $10 is transferred to account B, its balance will become $110. Two operations take place: the $10 that Remo wants to transfer is debited from his account A, and the same amount is credited to account B, i.e., to Sheero's account. Now suppose the debit executes successfully but the credit fails. Remo's account A then holds $20, while Sheero's account B still holds $100.

In the above diagram, it can be seen that after $10 has been debited from account A, the amount in account B is still $100. So, it is not an atomic transaction.

The below image shows that both debit and credit operations are done successfully. Thus the transaction is atomic.

When a transaction loses atomicity, it becomes a huge issue in banking systems, which is why atomicity is a main focus in those systems.
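The transfer between Remo and Sheero can be sketched as follows. This is an illustration under assumptions: it uses Python's sqlite3 module, and the fail_before_credit flag is a hypothetical switch that simulates the failure between the debit and the credit. Rolling back on failure is what keeps the transaction atomic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


def transfer(amount, fail_before_credit=False):
    """Debit A and credit B as one transaction; undo everything on failure."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
        )
        if fail_before_credit:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
        )
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # the debit is undone, so no partial result remains
        return False


failed_ok = transfer(10, fail_before_credit=True)
after_failure = balances()

succeeded_ok = transfer(10)
after_success = balances()

print(failed_ok, after_failure)
print(succeeded_ok, after_success)
```

After the failed attempt both balances are unchanged ($30 and $100) rather than being left at $20 and $100, which is the non-atomic outcome described in the example.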
2) Consistency: Consistency means that the data should always remain valid. In a DBMS, the integrity of the data should be maintained, which means that every change made to the database must preserve it. In the case of transactions, the integrity of the data is essential so that the database is consistent both before and after the transaction. The data should always be correct.

Example:

In the above figure, there are three accounts, A, B, and C, where A makes transaction T first to B and then to C. Two operations take place each time, i.e., debit and credit. Account A first transfers $50 to account B, and the balance of account A read before the transaction is $300. After the successful transaction T, the available amount in B becomes $150. Next, A transfers $20 to account C, and this time the balance of A that is read is $250 (which is correct, as $50 has already been debited for B). The debit and credit operations from account A to C complete successfully. We can see that the transactions are done successfully and the value is read correctly each time. Thus, the data is consistent. If instead the value $300 were read for both transfers, the data would be inconsistent, because the second read would not reflect the debit that had already been executed.
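One simple way to express the consistency of this example is as an invariant: transfers move money between accounts, so the total across all accounts must be the same before and after. The sketch below uses plain Python; the opening balance of $50 for account C is an assumption, since the example does not state it.

```python
# Opening balances; the value for C is assumed for illustration.
balances = {"A": 300, "B": 100, "C": 50}


def transfer(current, source, target, amount):
    """Return the balances after moving amount from source to target."""
    if current[source] < amount:
        raise ValueError("insufficient funds")
    updated = dict(current)
    updated[source] -= amount
    updated[target] += amount
    # Consistency check: a transfer must not create or destroy money.
    if sum(updated.values()) != sum(current.values()):
        raise RuntimeError("transfer changed the total amount")
    return updated


total_before = sum(balances.values())
after_first = transfer(balances, "A", "B", 50)      # A reads 300, B becomes 150
after_second = transfer(after_first, "A", "C", 20)  # A reads 250, not 300
total_after = sum(after_second.values())

print(after_first)
print(after_second)
print(total_before == total_after)
```

The second transfer starts from the result of the first, which corresponds to reading $250 rather than the stale $300.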

3) Isolation: The term 'isolation' means separation. In a DBMS, isolation is the property that operations which run concurrently do not affect one another. In short, the result should be the same as if one operation began only after the other had completed. It means that if two operations are being performed at the same time, they must not affect each other's values. In the case of transactions, when two or more transactions occur simultaneously, consistency should be maintained. Any change made by one transaction is not seen by other transactions until that change is committed.

Example: If two operations are running concurrently on two different accounts, then the value of one account should not be affected by the operation on the other. As you can see in the below diagram, account A is making transactions T1 and T2 to accounts B and C, but both execute independently without affecting each other. This is known as isolation.
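The rule that uncommitted changes are not seen by other transactions can be sketched with two connections to the same database file. The example assumes Python's sqlite3 module with its default settings and a temporary file named bank.db; other database systems offer several isolation levels with different behavior.

```python
import os
import sqlite3
import tempfile

# Two connections need a shared database file, so a temporary one is used.
path = os.path.join(tempfile.mkdtemp(), "bank.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO account VALUES ('A', 300)")
writer.commit()

reader = sqlite3.connect(path)


def balance_seen_by(conn):
    rows = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchall()
    return rows[0][0]


# The writer debits $50 inside a transaction that is still open.
writer.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
seen_before_commit = balance_seen_by(reader)  # the reader still sees the old value

# Once the writer commits, the change becomes visible to the reader.
writer.commit()
seen_after_commit = balance_seen_by(reader)

reader.close()
writer.close()

print(seen_before_commit, seen_after_commit)
```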

4) Durability: Durability means permanence. In a DBMS, durability ensures that once an operation has executed successfully, its data becomes permanent in the database. The data should be durable enough that even if the system fails or crashes, the database still survives. However, if data does get lost, it is the responsibility of the recovery manager to restore it and so ensure the durability of the database. To make changes permanent, the COMMIT command must be used every time we make changes.
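The role of COMMIT in durability can be sketched as below, assuming Python's sqlite3 module and a temporary file named durable.db. Closing and reopening the connection is only a stand-in for a restart; it does not simulate a real crash or the work of a recovery manager.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")

# This change is committed, so it is written permanently to the file.
conn.execute("INSERT INTO account VALUES ('A', 20)")
conn.commit()

# This change is never committed; it is discarded when the connection closes.
conn.execute("INSERT INTO account VALUES ('B', 110)")
conn.close()

# Reopen the database, as an application would after a restart.
reopened = sqlite3.connect(path)
surviving = reopened.execute(
    "SELECT name, balance FROM account ORDER BY name"
).fetchall()
reopened.close()

print(surviving)
```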

Therefore, the ACID properties of a DBMS play a vital role in maintaining the consistency and availability of data in the database.

This was a brief introduction to the ACID properties in DBMS. These properties are also discussed in the transaction section.
ER model
o ER model stands for Entity-Relationship model. It is a high-level data model. This
model is used to define the data elements and relationships for a specified system.
o It develops a conceptual design for the database. It also provides a very simple and
easy to design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-
relationship diagram.

For example, suppose we design a school database. In this database, the student will be an entity with attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there will be a relationship between them.
Component of ER Diagram

1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is
represented as a rectangle.

Consider an organization as an example: manager, product, employee, department etc. can each be taken as an entity.
a. Weak Entity

An entity that depends on another entity is called a weak entity. The weak entity
doesn't contain any key attribute of its own. The weak entity is represented by a
double rectangle.

2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to
represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary key. The key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. The composite attribute is represented by an ellipse, and the ellipses of its component attributes are connected to that ellipse.
c. Multivalued Attribute

An attribute can have more than one value. Such an attribute is known as a
multivalued attribute. A double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.

d. Derived Attribute

An attribute that can be derived from another attribute is known as a derived attribute.
It is represented by a dashed ellipse.

For example, a person's age changes over time and can be derived from another
attribute like date of birth.
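The four kinds of attribute can be sketched as fields of a small Python class. The class and field names, the surname and the phone numbers are all hypothetical and exist only for illustration. The derived attribute is computed on request from the date of birth instead of being stored.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Name:
    # Composite attribute: made up of smaller attributes.
    first: str
    last: str


@dataclass
class Student:
    student_id: int      # key attribute
    name: Name           # composite attribute
    date_of_birth: date  # stored attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)


student = Student(
    student_id=1,
    name=Name("Ajeet", "Kumar"),
    date_of_birth=date(2000, 6, 15),
    phone_numbers=["555-0100", "555-0101"],
)

age_day_before_birthday = student.age_on(date(2024, 6, 14))
age_on_birthday = student.age_on(date(2024, 6, 15))

print(age_day_before_birthday, age_on_birthday)
```

Because the age is derived, it stays correct as time passes without any update to the stored data.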

3. Relationship
A relationship is used to describe the relation between entities. A diamond or rhombus
is used to represent a relationship.
Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of each entity is associated with the relationship, it is
known as a one-to-one relationship.

For example, a female can marry one male, and a male can marry one female.

b. One-to-many relationship

When only one instance of the entity on the left and more than one instance of the
entity on the right are associated with the relationship, it is known as a one-to-many
relationship.

For example, a scientist can make many inventions, but each invention is made by
one specific scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left and only one instance of the
entity on the right are associated with the relationship, it is known as a many-to-one
relationship.

For example, a student enrolls in only one course, but a course can have many
students.

d. Many-to-many relationship

When more than one instance of the entity on the left and more than one instance
of the entity on the right are associated with the relationship, it is known as a
many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many
employees.

Notation of ER diagram
A database can be represented using notations. In an ER diagram, many notations are
used to express cardinality. These notations are as follows:

Fig: Notations of ER diagram

Mapping Constraints
o A mapping constraint is a data constraint that expresses the number of
entities to which another entity can be related via a relationship set.
o It is most useful in describing binary relationship sets, although it can also
help describe relationship sets that involve more than two entity sets.
o For a binary relationship set R between entity sets E1 and E2, there are four possible
mapping cardinalities. These are as follows:
1. One to one (1:1)
2. One to many (1:M)
3. Many to one (M:1)
4. Many to many (M:M)
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an
entity in E2 is associated with at most one entity in E1.

One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and
an entity in E2 is associated with at most one entity in E1.

Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an
entity in E2 is associated with any number of entities in E1.

Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities
in E2, and an entity in E2 is associated with any number of entities in E1.

Keys
o Keys play an important role in the relational database.
o A key is used to uniquely identify any record or row of data in a table. Keys are also used
to establish and identify relationships between tables.

For example: In the STUDENT table, ID is used as a key because it is unique for each
student. In the PERSON table, passport_number, license_number, and SSN are keys since they
are unique for each person.

Types of key:

1. Primary key
o The primary key is the key chosen to identify one and only one instance of an entity
uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The key
that is most suitable from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In
the EMPLOYEE table, we could also select License_Number or Passport_Number as the
primary key since they are also unique.
o For each entity, the selection of the primary key depends on the requirements and the developers.

2. Candidate key
o A candidate key is an attribute or a minimal set of attributes that can uniquely identify a
tuple.
o The primary key is chosen from the candidate keys. The remaining candidate keys are as
strong as the primary key and are also called alternate keys.

For example: In the EMPLOYEE table, ID is best suited for the primary key. The other
unique attributes, such as SSN, Passport_Number, and License_Number, are also
candidate keys.

3. Super Key
A super key is a set of attributes that can uniquely identify a tuple. A super key is a
superset of a candidate key.

For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the
names of two employees can be the same, but their EMPLOYEE_ID can't be the same.
Hence, this combination can also be a key.

The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.

4. Foreign key
o A foreign key is a column of a table that points to the primary key of
another table.
o In a company, every employee works in a specific department, and employee and
department are two different entities. So we can't store the information of the
department in the employee table. That's why we link these two tables through the
primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute
in the EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both tables
are related.
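
The EMPLOYEE and DEPARTMENT link described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column names other than Department_Id, the sample rows, and the helper insert_is_rejected are assumptions made for the example, and SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE DEPARTMENT ("
    " Department_Id INTEGER PRIMARY KEY,"
    " Department_Name TEXT)"
)
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    " Employee_Id INTEGER PRIMARY KEY,"
    " Employee_Name TEXT,"
    " Department_Id INTEGER REFERENCES DEPARTMENT(Department_Id))"
)

# Hypothetical sample rows.
conn.execute("INSERT INTO DEPARTMENT VALUES (1, 'Sales')")
conn.execute("INSERT INTO EMPLOYEE VALUES (101, 'Asha', 1)")


def insert_is_rejected(connection, row):
    """Return True if inserting the EMPLOYEE row violates a constraint."""
    try:
        connection.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


# Department 99 does not exist, so the foreign key rejects this row.
orphan_rejected = insert_is_rejected(conn, (102, "Ravi", 99))
employee_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
```

Only the employee that points at an existing department is stored; the row that refers to a missing department is refused.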

Generalization
o Generalization is like a bottom-up approach in which two or more entities of
lower level combine to form a higher level entity if they have some attributes
in common.
o In generalization, an entity of a higher level can also combine with the entities
of the lower level to form a further higher level entity.
o Generalization is more like subclass and superclass system, but the only
difference is the approach. Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e.,
subclasses are combined to make a superclass.

For example, Faculty and Student entities can be generalized and create a higher
level entity Person.

Specialization
o Specialization is a top-down approach, and it is opposite to Generalization. In
specialization, one higher level entity can be broken down into two lower level
entities.
o Specialization is used to identify the subset of an entity set that shares some
distinguishing characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes
are defined next, and the relationship sets are then added.

For example: In an employee management system, the EMPLOYEE entity can be
specialized as TESTER or DEVELOPER based on the role they play in the company.

Aggregation
In aggregation, the relationship between two entities is treated as a single entity. The
relationship, together with its corresponding entities, is aggregated into a higher
level entity.

For example: The Center entity offering the Course entity acts as a single entity in a
relationship with another entity, Visitor. In the real world, if a
visitor visits a coaching center, he will never enquire about only the Course or
only the Center; instead, he will enquire about both.

Reduction of ER diagram to Table
The database can be represented using the notations, and these notations can be
reduced to a collection of tables.

In the database, every entity set or relationship set can be represented in tabular
form.
The ER diagram is given below:

There are some points for converting the ER diagram to the table:

o Each entity type becomes a table.

In the given ER diagram, LECTURE, STUDENT, SUBJECT, and COURSE form individual
tables.

o Every single-valued attribute becomes a column of the table.

In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the columns of the
STUDENT table. Similarly, COURSE_NAME and COURSE_ID form the columns of the
COURSE table, and so on.

o The key attribute of the entity type is represented by the primary key.

In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are
the key attributes of the entities.

o A multivalued attribute is represented by a separate table.

In the STUDENT table, hobby is a multivalued attribute, so it is not possible to
represent multiple values in a single column of the STUDENT table. Hence we create a
table STUD_HOBBY with the columns STUDENT_ID and HOBBY. Using both
columns, we create a composite key.

o A composite attribute is represented by its components.

In the given ER diagram, student address is a composite attribute. It contains CITY,
PIN, DOOR#, STREET, and STATE. In the STUDENT table, these components become
individual columns.

o Derived attributes are not stored in the table.

In the STUDENT table, Age is a derived attribute. It can be calculated at any point
in time as the difference between the current date and the Date of Birth.

Using these rules, you can convert the ER diagram to tables and columns and assign
the mapping between the tables. Table structure for the given ER diagram is as
below:

Figure: Table structure
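
The conversion rules can be tried out with a small sqlite3 sketch. It covers only the STUDENT entity; the DOB column name, the DOOR spelling of DOOR#, and the sample rows are assumptions made for this illustration, not the table structure from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STUDENT (
        STUDENT_ID   INTEGER PRIMARY KEY,  -- key attribute becomes the primary key
        STUDENT_NAME TEXT,                 -- single-valued attribute becomes a column
        DOB          TEXT,                 -- stored; AGE is derived, so it has no column
        DOOR TEXT, STREET TEXT, CITY TEXT, STATE TEXT, PIN TEXT  -- composite address parts
    );
    CREATE TABLE STUD_HOBBY (
        STUDENT_ID INTEGER REFERENCES STUDENT(STUDENT_ID),
        HOBBY      TEXT,
        PRIMARY KEY (STUDENT_ID, HOBBY)    -- composite key for the multivalued attribute
    );
    """
)

# Hypothetical sample data.
conn.execute(
    "INSERT INTO STUDENT VALUES (1, 'Ram', '2000-01-15', '12', 'Main', 'Noida', 'UP', '201301')"
)
conn.executemany("INSERT INTO STUD_HOBBY VALUES (?, ?)", [(1, "Cricket"), (1, "Chess")])

# Column names of STUDENT, read back from the schema.
student_columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]
# All hobbies of student 1, one row per hobby.
hobbies = [
    hobby
    for (hobby,) in conn.execute(
        "SELECT HOBBY FROM STUD_HOBBY WHERE STUDENT_ID = 1 ORDER BY HOBBY"
    )
]
```

Each hobby gets its own row in STUD_HOBBY, and AGE never appears as a stored column.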


Relationship of higher degree
The degree of relationship can be defined as the number of occurrences in one entity
that are associated with the number of occurrences in another entity.

There are three degrees of relationship:

1. One-to-one (1:1)
2. One-to-many (1:M)
3. Many-to-many (M:N)

1. One-to-one
o In a one-to-one relationship, one occurrence of an entity relates to only one
occurrence in another entity.
o A one-to-one relationship rarely exists in practice.
o For example: if an employee is allocated a company car then that car can only be
driven by that employee.
o Therefore, employee and company car have a one-to-one relationship.

2. One-to-many
o In a one-to-many relationship, one occurrence in an entity relates to many
occurrences in another entity.
o For example: An employee works in one department, but a department has many
employees.
o Therefore, department and employee have a one-to-many relationship.
3. Many-to-many
o In a many-to-many relationship, many occurrences in an entity relate to many
occurrences in another entity.
o Like a one-to-one relationship, a many-to-many relationship is rarely implemented
directly in practice; it is usually resolved through a linking table.
o For example: At the same time, an employee can work on several projects, and a
project has a team of many employees.
o Therefore, employee and project have a many-to-many relationship.

Relational Model concept


The relational model represents data as tables with columns and rows. Each row is known as a
tuple. Each column of the table has a name, or attribute.

Domain: It contains a set of atomic values that an attribute can take.

Attribute: It contains the name of a column in a particular table. Each attribute Ai must have
a domain, dom(Ai).

Relational instance: In the relational database system, the relational instance is
represented by a finite set of tuples. Relation instances do not have duplicate tuples.

Relational schema: A relational schema contains the name of the relation and the names
of all its columns or attributes.

Relational key: A relational key is a set of one or more attributes that can
identify each row in the relation uniquely.

Example: STUDENT Relation

NAME ROLL_NO PHONE_NO ADDRESS AGE

Ram 14795 7305758992 Noida 24

Shyam 12839 9026288936 Delhi 35


Laxman 33289 8583287182 Gurugram 20

Mahesh 27857 7086819134 Ghaziabad 27

Ganesh 17282 9028 9i3988 Delhi 40

o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the
attributes.
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>

Properties of Relations
o The name of the relation is distinct from all other relations.
o Each relation cell contains exactly one atomic (single) value.
o Each attribute has a distinct name.
o The order of attributes has no significance.
o No two tuples are identical.
o The order of tuples has no significance.

Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to
obtain the result of the query. It uses operators to perform queries.

Types of Relational operation


1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

Notation: σ p(r)

Where:

σ is used for the selection predicate
r is used for the relation
p is used as a propositional logic formula which may use connectors like AND, OR,
and NOT. The formula may also use relational operators like =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300

Input:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT


Perryride L-15 1500

Perryride L-16 1300

2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result.
Rest of the attributes are eliminated from the table.
o It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:

∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY
Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn
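
As a rough illustration of the select and project operations, the sketch below models the LOAN relation from the select example as a list of Python dictionaries. The function names select and project are chosen for this example only; they are not part of any library.

```python
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(relation, predicate):
    """Sigma: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Pi: keep only the listed attributes and drop duplicate rows."""
    rows = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in rows:
            rows.append(row)
    return rows


perryride_loans = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")
branch_names = project(LOAN, ["BRANCH_NAME"])
```

Projection removes duplicates because a relation is a set of tuples, which is why Downtown and Perryride appear only once in branch_names.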

3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples
that are either in R or S or both in R & S.
o It eliminates the duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must hold the following conditions:

o R and S must have the same number of attributes.
o Duplicate tuples are eliminated automatically.

Example:
DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321
Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:

∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson
Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry

Williams

Mayes

4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all
tuples that are in both R & S.
o It is denoted by intersection ∩.

Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Smith

Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all
tuples that are in R but not in S.
o It is denoted by minus (-).

Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
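
Union, intersection, and set difference behave like the set operators built into Python. The sketch below uses the customer names projected from the BORROW and DEPOSITOR relations above; the variable names are chosen for this example.

```python
# CUSTOMER_NAME values projected from each relation (duplicates collapse in a set).
borrow = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}
depositor = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}

union = borrow | depositor          # in BORROW or DEPOSITOR or both
intersection = borrow & depositor   # in both BORROW and DEPOSITOR
difference = borrow - depositor     # in BORROW but not in DEPOSITOR
```

The three results match the output tables of the union, set intersection, and set difference examples.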

6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the
other table. It is also known as a cross product.
o It is denoted by X.

Notation: E X D

Example:
EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A
2 Harry C

3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input:

EMPLOYEE X DEPARTMENT

Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal
3 John B A Marketing

3 John B B Sales

3 John B C Legal
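
A minimal sketch of the same Cartesian product, using itertools.product from the Python standard library. Rows are plain tuples in the column order shown above; the variable names are chosen for this example.

```python
from itertools import product

# (EMP_ID, EMP_NAME, EMP_DEPT)
EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
# (DEPT_NO, DEPT_NAME)
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Pair every EMPLOYEE row with every DEPARTMENT row.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]

# Keeping only the pairs where EMP_DEPT equals DEPT_NO turns the product into a join.
matched = [row for row in cross if row[2] == row[3]]
```

Three employees times three departments give nine rows, of which only three pair an employee with that employee's own department.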

7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename the STUDENT relation to
STUDENT1.

ρ(STUDENT1, STUDENT)

Join Operations:
A Join operation combines related tuples from different relations, if and only if a
given join condition is satisfied. It is denoted by ⋈.

Example:
EMPLOYEE

EMP_CODE EMP_NAME

101 Stephan

102 Jack

103 Harry

SALARY

EMP_CODE SALARY

101 50000

102 30000
103 25000

Operation: (EMPLOYEE ⋈ SALARY)

Result:

EMP_CODE EMP_NAME SALARY

101 Stephan 50000

102 Jack 30000

103 Harry 25000

Types of Join operations:


1. Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o It is denoted by ⋈.

Example: Let's use the above EMPLOYEE table and SALARY table:

Input:

∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)

Output:

EMP_NAME SALARY

Stephan 50000
Jack 30000

Harry 25000
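
The natural join can be sketched in a few lines of Python. The function natural_join below is an illustrative helper, not a library call; it assumes each relation is a non-empty list of dictionaries that all share the same attribute names.

```python
EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]


def natural_join(r, s):
    """Combine tuples of r and s that agree on all common attribute names."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    joined = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in common):
                joined.append({**t, **u})  # common attributes appear once
    return joined


joined = natural_join(EMPLOYEE, SALARY)
# Projection on EMP_NAME and SALARY, as in the example above.
name_salary = [(row["EMP_NAME"], row["SALARY"]) for row in joined]
```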

2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with
missing information.

Example:

EMPLOYEE

EMP_NAME STREET CITY

Ram Civil line Mumbai

Shyam Park street Kolkata

Ravi M.G. Street Delhi

Hari Nehru nagar Hyderabad

FACT_WORKERS

EMP_NAME BRANCH SALARY

Ram Infosys 10000

Shyam Wipro 20000

Kuber HCL 30000

Hari TCS 50000

Input:
(EMPLOYEE ⋈ FACT_WORKERS)

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

An outer join is basically of three types:

a. Left outer join
b. Right outer join
c. Full outer join

a. Left outer join:

o A left outer join contains the set of tuples of all combinations in R and S that are equal
on their common attribute names.
o In addition, it keeps the tuples in R that have no matching tuples in S, padding the
missing attributes with NULL.
o It is denoted by ⟕.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟕ FACT_WORKERS

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000


Ravi M.G. Street Delhi NULL NULL

b. Right outer join:


o A right outer join contains the set of tuples of all combinations in R and S that are
equal on their common attribute names.
o In addition, it keeps the tuples in S that have no matching tuples in R, padding the
missing attributes with NULL.
o It is denoted by ⟖.

Example: Using the above EMPLOYEE table and FACT_WORKERS relation

Input:

EMPLOYEE ⟖ FACT_WORKERS

Output:

EMP_NAME BRANCH SALARY STREET CITY

Ram Infosys 10000 Civil line Mumbai

Shyam Wipro 20000 Park street Kolkata

Hari TCS 50000 Nehru nagar Hyderabad

Kuber HCL 30000 NULL NULL

c. Full outer join:


o A full outer join is like a left or right outer join except that it contains all rows from both
tables.
o It keeps the tuples in R that have no matching tuples in S and the tuples in S that
have no matching tuples in R on their common attribute names.
o It is denoted by ⟗.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟗ FACT_WORKERS

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

Kuber NULL NULL HCL 30000
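
The three outer joins can be sketched in Python with one helper. The function left_outer_join and its parameters are assumptions made for this illustration; None stands in for NULL, and the join is on the single common attribute EMP_NAME.

```python
EMPLOYEE = [
    {"EMP_NAME": "Ram", "STREET": "Civil line", "CITY": "Mumbai"},
    {"EMP_NAME": "Shyam", "STREET": "Park street", "CITY": "Kolkata"},
    {"EMP_NAME": "Ravi", "STREET": "M.G. Street", "CITY": "Delhi"},
    {"EMP_NAME": "Hari", "STREET": "Nehru nagar", "CITY": "Hyderabad"},
]
FACT_WORKERS = [
    {"EMP_NAME": "Ram", "BRANCH": "Infosys", "SALARY": 10000},
    {"EMP_NAME": "Shyam", "BRANCH": "Wipro", "SALARY": 20000},
    {"EMP_NAME": "Kuber", "BRANCH": "HCL", "SALARY": 30000},
    {"EMP_NAME": "Hari", "BRANCH": "TCS", "SALARY": 50000},
]


def left_outer_join(r, s, key, s_attrs):
    """Keep every tuple of r; pad with None when s has no match on key."""
    result = []
    for t in r:
        matches = [u for u in s if u[key] == t[key]]
        if matches:
            result.extend({**t, **u} for u in matches)
        else:
            result.append({**t, **{a: None for a in s_attrs}})
    return result


left = left_outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME", ["BRANCH", "SALARY"])
# A right outer join is a left outer join with the operands swapped.
right = left_outer_join(FACT_WORKERS, EMPLOYEE, "EMP_NAME", ["STREET", "CITY"])
# Full outer join: the left result plus the right-side rows that had no match.
employee_names = {t["EMP_NAME"] for t in EMPLOYEE}
full = left + [row for row in right if row["EMP_NAME"] not in employee_names]
```

Ravi appears with empty BRANCH and SALARY, and Kuber appears with empty STREET and CITY, as in the output tables above.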

3. Equi join:
It is a form of inner join and is the most common join. It is based on matched
data as per the equality condition. The equi join uses the comparison operator (=).

Example:

CUSTOMER RELATION

CLASS_ID NAME

1 John

2 Harry

3 Jackson

PRODUCT

PRODUCT_ID CITY

1 Delhi
2 Mumbai

3 Noida

Input:

CUSTOMER ⋈ CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID PRODUCT

Output:

CLASS_ID NAME PRODUCT_ID CITY

1 John 1 Delhi

2 Harry 2 Mumbai

3 Jackson 3 Noida

Integrity Constraints
o Integrity constraints are a set of rules. They are used to maintain the quality of
information.
o Integrity constraints ensure that data insertion, updating, and other processes
are performed in such a way that data integrity is not affected.
o Thus, integrity constraints are used to guard against accidental damage to the database.

Types of Integrity Constraint


1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc.
The value of the attribute must be available in the corresponding domain.

Example:

2. Entity integrity constraints


o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation
and if the primary key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.
Example:

3. Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary
Key of Table 2, then every value of the Foreign Key in Table 1 must be null or be
available in Table 2.

Example:
4. Key constraints
o A key is a set of attributes used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, but out of these one key will be the primary key.
A primary key must contain unique values and cannot contain a null value in the relational table.

Example:
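
The domain, entity integrity, and key constraints can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the STUDENT columns, the sample values, and the helper violates are invented for the example, and the primary key is declared NOT NULL explicitly because SQLite does not otherwise reject NULL in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT ("
    " ID   TEXT NOT NULL PRIMARY KEY,"  # entity integrity and key constraint
    " NAME TEXT NOT NULL,"
    " AGE  INTEGER CHECK (typeof(AGE) = 'integer' AND AGE > 0))"  # domain constraint
)
conn.execute("INSERT INTO STUDENT VALUES ('1000', 'Tom', 17)")


def violates(row):
    """Return True if inserting the row breaks an integrity constraint."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "domain": violates(("1001", "Ann", "A")),  # AGE is not an integer
    "entity": violates((None, "Bob", 20)),     # primary key is NULL
    "key": violates(("1000", "Cara", 21)),     # primary key value already used
}
row_count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
```

All three inserts are refused, so the table still holds only the first valid row.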

Relational Calculus
o Relational calculus is a non-procedural query language. In a non-procedural query
language, the user is not concerned with the details of how to obtain the end results.
o The relational calculus tells what to do but never explains how to do it.

Types of Relational calculus:


1. Tuple Relational Calculus (TRC)
o The tuple relational calculus is specified to select the tuples in a relation. In TRC,
filtering variable uses the tuples of a relation.
o The result of the relation can have one or more tuples.

Notation:

{T | P (T)} or {T | Condition (T)}

Where

T is the resulting tuples

P(T) is the condition used to fetch T.

For example:

{ T.name | Author(T) AND T.article = 'database' }

OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple
with 'name' from Author who has written an article on 'database'.

TRC (tuple relational calculus) can be quantified. In TRC, we can use Existential (∃) and
Universal Quantifiers (∀).

For example:

{ R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }

Output: This query will yield the same result as the previous one.
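
A tuple relational calculus query reads much like a Python set comprehension, which also states what is wanted rather than how to compute it. In the sketch below, the AUTHOR rows are invented sample data for illustration.

```python
# Hypothetical AUTHOR relation.
AUTHOR = [
    {"name": "Asha", "article": "database"},
    {"name": "Ravi", "article": "networks"},
    {"name": "Meena", "article": "database"},
]

# { T.name | Author(T) AND T.article = 'database' }
names = {t["name"] for t in AUTHOR if t["article"] == "database"}

# { R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
candidates = {t["name"] for t in AUTHOR}
names_exists = {
    r
    for r in candidates
    if any(t["article"] == "database" and t["name"] == r for t in AUTHOR)
}
```

The built-in any plays the role of the existential quantifier, and both forms return the same set of names.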

2. Domain Relational Calculus (DRC)


o The second form of relational calculus is known as domain relational calculus. In domain
relational calculus, the filtering variables range over the domains of attributes.
o Domain relational calculus uses the same operators as tuple calculus. It uses logical
connectives ∧ (and), ∨ (or) and ¬ (not).
o It uses Existential (∃) and Universal Quantifiers (∀) to bind the variable.

Notation:

{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }

Where

a1, a2, ..., an are attributes
P stands for a formula built from those attributes

For example:

{ <article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database' }

Output: This query will yield the article, page, and subject from the relation
javatpoint, where the subject is 'database'.

SQL
o SQL stands for Structured Query Language. It is used for storing and managing data
in a relational database management system (RDBMS).
o It is a standard language for Relational Database System. It enables a user to create,
read, update and delete relational databases and tables.
o All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server use SQL as
their standard database language.
o SQL allows users to query the database in a number of ways, using English-like
statements.

Rules:
SQL follows the following rules:

o Structured Query Language is not case sensitive. Generally, keywords of SQL are written
in uppercase.
o SQL statements are not tied to text lines. A single SQL statement can be written
on one or multiple text lines.
o Using SQL statements, you can perform most of the actions in a database.
o SQL is based on tuple relational calculus and relational algebra.

SQL process:
o When an SQL command is executed for any RDBMS, the system figures out the
best way to carry out the request, and the SQL engine determines how to
interpret the task.
o Various components are included in the process. These components can be the
optimization engine, query engine, query dispatcher, classic query engine, etc.
o All the non-SQL queries are handled by the classic query engine, but the SQL query
engine won't handle logical files.

Characteristics of SQL
o SQL is easy to learn.
o SQL is used to access data from relational database management systems.
o SQL can execute queries against the database.
o SQL is used to describe the data.
o SQL is used to define the data in the database and manipulate it when
needed.
o SQL is used to create and drop the database and table.
o SQL is used to create a view, stored procedure, function in a database.
o SQL allows users to set permissions on tables, procedures, and views.
Normalization:

Functional Dependency
The functional dependency is a relationship that exists between two attributes. It
typically exists between the primary key and a non-key attribute within a table.

X → Y

The left side of the FD is known as the determinant, and the right side is
known as the dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name,
Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee
table because if we know the Emp_Id, we can tell the employee name associated
with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional dependency


1. Trivial functional dependency
o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Example:

Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency
as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are
trivial dependencies too.

2. Non-trivial functional dependency
o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

ID → Name,
Name → DOB
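
Whether a functional dependency is satisfied by a particular set of rows, and whether it is trivial, can be checked with two short Python functions. The helpers fd_holds and is_trivial and the sample rows are assumptions made for this sketch. Note that a dependency is a rule about every valid state of a table, so a check on sample rows can refute it but cannot prove it.

```python
# Hypothetical employee rows.
rows = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Noida"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Pune"},
]


def fd_holds(relation, lhs, rhs):
    """True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_trivial(lhs, rhs):
    """A dependency lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)


id_determines_name = fd_holds(rows, ["Emp_Id"], ["Emp_Name"])
name_determines_address = fd_holds(rows, ["Emp_Name"], ["Emp_Address"])
```

Here Emp_Id → Emp_Name is satisfied, while Emp_Name → Emp_Address fails because two employees named Asha live at different addresses.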

Inference Rule (IR):


o Armstrong's axioms are the basic inference rules.
o Armstrong's axioms are used to infer functional dependencies on a
relational database.
o An inference rule is a type of assertion. It can be applied to a set of FDs (functional
dependencies) to derive other FDs.
o Using the inference rules, we can derive additional functional dependencies from
the initial set.

There are six inference rules for functional dependencies:


1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.

If X ⊇ Y then X → Y

Example:

X = {a, b, c, d, e}
Y = {a, b, c}

Here Y is a subset of X, so X → Y holds.

2. Augmentation Rule (IR2)


The augmentation rule is also called partial dependency. In augmentation, if X
determines Y, then XZ determines YZ for any Z.

If X → Y then XZ → YZ

Example:

For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3)


In the transitive rule, if X determines Y and Y determines Z, then X must also
determine Z.

If X → Y and Y → Z then X → Z

4. Union Rule (IR4)


Union rule says, if X determines Y and X determines Z, then X must also determine Y
and Z.

If X → Y and X → Z then X → YZ

Proof:

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)

5. Decomposition Rule (IR5)


The decomposition rule is also known as the project rule. It is the reverse of the union rule.

This rule says, if X determines Y and Z, then X determines Y and X determines Z
separately.

If X → YZ then X → Y and X → Z

Proof:

1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1 Rule)
5. X → Z (using IR3 on 1 and 4)

6. Pseudo transitive Rule (IR6)


In the pseudo transitive rule, if X determines Y and YZ determines W, then XZ
determines W.

If X → Y and YZ → W then XZ → W

Proof:

1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
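
In practice, the inference rules are usually applied mechanically by computing an attribute closure: the set of all attributes determined by a given set. The sketch below is a common textbook algorithm, not something defined in this tutorial; the function name closure and the encoding of attributes as single letters are assumptions for the example.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies.

    attrs is a string of single-letter attribute names, and fds is a list of
    (lhs, rhs) pairs written the same way, for example ("YZ", "W").
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Pseudo transitive rule: X -> Y and YZ -> W give XZ -> W.
pseudo_transitive = closure("XZ", [("X", "Y"), ("YZ", "W")])
# Union rule: X -> Y and X -> Z give X -> YZ.
union_rule = closure("X", [("X", "Y"), ("X", "Z")])
```

Because W is in the closure of XZ, the dependency XZ → W follows from the two given dependencies, which agrees with the proof above.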

Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
o Normalization divides a larger table into smaller tables and links them
using relationships.
o The normal form is used to reduce redundancy from the database table.

Types of Normal Forms
The normal forms are as follows:

Normal Form    Description

1NF    A relation is in 1NF if every attribute contains only atomic values.

2NF    A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

3NF    A relation will be in 3NF if it is in 2NF and no transitive dependency exists.

4NF    A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.

5NF    A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.

First Normal Form (1NF)


o A relation will be in 1NF if every attribute contains only atomic values.
o It states that an attribute of a table cannot hold multiple values. It must hold
only a single value.
o First normal form disallows multi-valued attributes, composite attributes,
and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute
EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab
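
The same decomposition can be written as a small Python function that emits one row per phone number. The function name to_1nf and the tuple layout are assumptions made for this sketch.

```python
# (EMP_ID, EMP_NAME, EMP_PHONE, EMP_STATE) with a multi-valued EMP_PHONE.
unnormalized = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]


def to_1nf(rows):
    """Split the comma-separated phone list so every cell holds one value."""
    normalized = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            normalized.append((emp_id, name, phone.strip(), state))
    return normalized


employee_1nf = to_1nf(unnormalized)
```

The three original rows become five rows, one for each phone number, matching the 1NF table above.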

Second Normal Form (2NF)


o In 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally
dependent on the primary key.

Example: Let's assume, a school can store the data of teachers and the subjects they
teach. In a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38
In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID,
which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That's why it
violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer

Third Normal Form (3NF)


o A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency.
o 3NF is used to reduce data duplication. It is also used to achieve data
integrity.
o If there is no transitive dependency for non-prime attributes, then the relation
is in third normal form.

A relation is in third normal form if it holds at least one of the following conditions for
every non-trivial functional dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super keys in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are
non-prime.

Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is
dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are
transitively dependent on the super key (EMP_ID). This violates the rule of third
normal form.

That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

Boyce Codd normal form (BCNF)


o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of
the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.

Example: Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table, the functional dependencies are as follows:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283


Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left side of both functional dependencies is a
key.
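
The BCNF test can be sketched in Python by checking, for each dependency, whether its left side determines every attribute of the table. The helper names closure and bcnf_violations are assumptions for this illustration; the attribute names and dependencies are the ones from the example above.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """List the non-trivial dependencies whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not attributes <= closure(lhs, fds)
    ]


fd_country = ({"EMP_ID"}, {"EMP_COUNTRY"})
fd_dept = ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})

employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
violations_before = bcnf_violations(employee, [fd_country, fd_dept])

# The three tables of the decomposition, each with the dependencies it contains.
decomposed = [
    ({"EMP_ID", "EMP_COUNTRY"}, [fd_country]),
    ({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept]),
    ({"EMP_ID", "EMP_DEPT"}, []),
]
violations_after = [bcnf_violations(attrs, fds) for attrs, fds in decomposed]
```

Both dependencies violate BCNF in the original table, and none of the three decomposed tables has a violation.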

Fourth normal form (4NF)


o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if multiple values of B exist for a single value of A,
independently of the other attributes, then the dependency is a multi-valued dependency.

Example
STUDENT
STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and
Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on
STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey
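
The decomposition above amounts to two projections of the STUDENT table. The following sketch uses plain Python sets of tuples; the variable names are chosen for this illustration only.

```python
# STUDENT rows as printed above: (STU_ID, COURSE, HOBBY)
student = {
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = {(stu_id, course) for stu_id, course, _ in student}
student_hobby = {(stu_id, hobby) for stu_id, _, hobby in student}

# Natural join of the two projections on STU_ID.
rejoined = {
    (stu_id, course, hobby)
    for stu_id, course in student_course
    for other_id, hobby in student_hobby
    if stu_id == other_id
}

print(len(student_course), len(student_hobby), len(rejoined))
print(sorted(row for row in rejoined if row[0] == 21))
```

With the rows exactly as printed, the join pairs every course of student 21 with every hobby, giving four rows for that student instead of two. That is what independence of COURSE and HOBBY implies: a single-table design would have to store all four combinations, which is the repetition that 4NF removes.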

Fifth normal form (5NF)


o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and
joining is lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he
doesn't take the Math class for Semester 2. In this case, a combination of all these
fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will
be taking it, so we would leave LECTURER and SUBJECT as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2
& P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
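
To see that P1, P2 and P3 together still carry the original information, the three projections can be joined back. The sketch below uses plain Python sets; the names two_way and three_way are illustrative only.

```python
# Original rows: (SUBJECT, LECTURER, SEMESTER)
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(semester, subject) for subject, _, semester in original}
p2 = {(subject, lecturer) for subject, lecturer, _ in original}
p3 = {(semester, lecturer) for _, lecturer, semester in original}

# Joining only P1 and P2 on SUBJECT produces rows that were never stored.
two_way = {
    (subject, lecturer, semester)
    for semester, subject in p1
    for other_subject, lecturer in p2
    if subject == other_subject
}

# Joining the result with P3 on (SEMESTER, LECTURER) filters them out again.
three_way = {
    (subject, lecturer, semester)
    for subject, lecturer, semester in two_way
    if (semester, lecturer) in p3
}

print(len(two_way), "rows after joining P1 and P2")
print(sorted(two_way - original))
print(three_way == original)
```

Joining two of the projections yields extra rows such as Math taught by John in Semester 2; only the join of all three reproduces the original table.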

Relational Decomposition
o When a relation in the relational model is not in an appropriate normal form,
decomposition of the relation is required.
o In a database, decomposition breaks a table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design, like
anomalies, inconsistencies, and redundancy.

Types of Decomposition

Lossless Decomposition
o If no information is lost from the relation that is decomposed, then the
decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed relations results
in the same relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed
relations gives the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations, EMPLOYEE and DEPARTMENT:

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table:

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", the
resultant relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.
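
The lossless-join check in this example can be reproduced in a few lines. The sketch below represents relations as lists of dictionaries. The helper names natural_join, project and as_set are invented for this illustration, and the lossy variant at the end is a hypothetical alternative that does not appear in the text.

```python
def natural_join(left, right):
    """Natural join: combine rows that agree on all shared column names."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = l_row.keys() & r_row.keys()
            if all(l_row[col] == r_row[col] for col in common):
                joined.append({**l_row, **r_row})
    return joined


def project(rows, columns):
    """Project onto `columns`, dropping duplicate rows."""
    distinct = {tuple(row[col] for col in columns) for row in rows}
    return [dict(zip(columns, values)) for values in distinct]


def as_set(rows):
    """Order-independent view of a relation, for comparing two relations."""
    return {tuple(sorted(row.items())) for row in rows}


columns = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"]
employee_department = [dict(zip(columns, values)) for values in [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]]

employee = project(employee_department, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(employee_department, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])

# The decomposition from the text: EMP_ID is shared, so the join is lossless.
lossless = as_set(natural_join(employee, department)) == as_set(employee_department)
print("lossless:", lossless)

# Hypothetical variant: without EMP_ID in DEPARTMENT there is no common
# column, so the natural join degenerates into a cross product.
department_no_key = project(employee_department, ["DEPT_ID", "DEPT_NAME"])
lossy_rows = natural_join(employee, department_no_key)
print("rows without a shared column:", len(lossy_rows))
```

The second join shows why the common column matters: without it every employee is paired with every department, and the original pairing cannot be recovered.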

Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one
decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R
must either be a part of R1 or R2 or be derivable from the combination of the
functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional dependency set
(A → BC). The relation R is decomposed into R1(ABC) and R2(AD), which is
dependency preserving because the FD A → BC is a part of relation R1(ABC).
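
The example above can be verified with the usual closure-based test for dependency preservation. This is a minimal sketch: the function names are invented here, and the second, non-preserving decomposition (R(A, B, C) with A → B and B → C split into AB and AC) is a hypothetical case added only for contrast.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_dependency_preserving(fds, parts):
    """Check that every FD can be derived using only FDs that hold inside the parts."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # What `reached` determines when we may only look inside `part`.
                gained = closure(reached & part, fds) & part
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not rhs <= reached:
            return False
    return True


# Example from the text: R(A, B, C, D), A -> BC, split into R1(ABC) and R2(AD).
fds_text = [({"A"}, {"B", "C"})]
print(is_dependency_preserving(fds_text, [set("ABC"), set("AD")]))

# Hypothetical contrast: R(A, B, C), A -> B, B -> C, split into AB and AC.
fds_other = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(is_dependency_preserving(fds_other, [set("AB"), set("AC")]))
```

In the second case B → C cannot be checked inside either table, because B and C never appear together, so that decomposition is not dependency preserving.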

Multivalued Dependency
o Multivalued dependency occurs when two attributes in a table are
independent of each other but both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are
dependent on a third attribute; that's why it always requires at least three
attributes.

Example: Suppose there is a bike manufacturer company which produces two
colors (white and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

Here, columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and
independent of each other.

In this case, these two columns can be called multivalued dependent on
BIKE_MODEL. The representation of these dependencies is shown below:

1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL
multidetermines COLOR".
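
Whether a multivalued dependency holds in a given set of rows can be tested directly. For each value of the determining column, the stored pairs of the other two columns must be exactly every combination of their values. The sketch below assumes a three-column table given as tuples; the function name mvd_holds and the two M5000 rows are hypothetical and are not part of the example above.

```python
from collections import defaultdict


def mvd_holds(rows, x, y, z):
    """Check X ->> Y in a three-column relation; x, y, z are column indexes."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[x]].add((row[y], row[z]))
    for pairs in groups.values():
        y_values = {pair[0] for pair in pairs}
        z_values = {pair[1] for pair in pairs}
        # Independence means every Y value appears with every Z value.
        if pairs != {(a, b) for a in y_values for b in z_values}:
            return False
    return True


# Rows as printed in the table above: (BIKE_MODEL, MANUF_YEAR, COLOR)
bikes = [
    ("M2011", 2008, "White"),
    ("M2001", 2008, "Black"),
    ("M3001", 2013, "White"),
    ("M3001", 2013, "Black"),
    ("M4006", 2017, "White"),
    ("M4006", 2017, "Black"),
]

print(mvd_holds(bikes, 0, 2, 1))  # BIKE_MODEL ->> COLOR
print(mvd_holds(bikes, 0, 1, 2))  # BIKE_MODEL ->> MANUF_YEAR

# Hypothetical rows where color is tied to the year, so independence fails.
tied = bikes + [("M5000", 2018, "White"), ("M5000", 2019, "Black")]
print(mvd_holds(tied, 0, 2, 1))
```

For the hypothetical M5000 rows the check fails because White only occurs with 2018 and Black only with 2019, so COLOR and MANUF_YEAR are not independent for that model.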
