
DBMS and SQL

Uploaded by

tushikasahu5

OVERVIEW OF DATABASE MANAGEMENT SYSTEM

As the name suggests, the database management system consists of two parts. They are:

1. Database and 2. Management System

Database: A database is a collection of related data. By data, we mean known facts that can be
recorded and that have implicit meaning.

A database has the following implicit properties

1. A database is a logically coherent collection of data with some inherent meaning.

2. A database is designed, built, and populated with data for a specific purpose. It has an intended group
of users and some preconceived application in which these users are interested.

3. A database represents some aspect of the real world.

A database can be of any size and of varying complexity.

A database may be generated and maintained manually or by machine.

A database management system (DBMS) is a collection of programs that enables users to create and
maintain a database.

A DBMS is hence a general-purpose software system that facilitates the processes of defining,
constructing and manipulating databases for various applications. The database and software are
together called a database system.

For example, a college database organizes data about administrators, staff, students and faculty.

Using the database, you can easily retrieve, insert, and delete this information.
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the information.

o It can provide a clear and logical view of the process that manipulates data.

o DBMS contains automatic backup and recovery procedures.

o It supports the ACID properties (atomicity, consistency, isolation, durability), which keep data in a
consistent state in case of failure.

o It can reduce the complexity of the relationships between data items.


o It is used to support manipulation and processing of data.

o It is used to provide security of data.

o It can present the database from different viewpoints according to the requirements of the user.

Advantages of DBMS
o Controls data redundancy: It can control data redundancy because all the data is stored in one central
database instead of in separate files for each application.

o Data sharing: In a DBMS, the authorized users of an organization can share the data among themselves.
o Easy maintenance: It is easy to maintain due to the centralized nature of the database system.

o Reduces time: It reduces development time and maintenance needs.


o Backup: It provides backup and recovery subsystems, which automatically back up data to protect against
hardware and software failures and restore the data if required.
o Multiple user interfaces: It provides different types of user interfaces, such as graphical user interfaces
and application program interfaces.

Disadvantages of DBMS
o Cost of hardware and software: It requires a high-speed processor and a large amount of memory to run
DBMS software.
o Size: It occupies a large amount of disk space and memory to run efficiently.

o Complexity: A database system creates additional complexity and requirements.


o Higher impact of failure: A failure has a large impact because, in most organizations, all the data is
stored in a single database. If the database is damaged by a power failure or database corruption,
the data may be lost forever.
DBMS users:

Many persons are involved in the design, use and maintenance of a large database. Different types of
DBMS users are:

1. Database Administrator (DBA)

2. Database Designer

3. End users

DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server architecture is used to deal with a
large number of PCs, web servers, database servers and other components that are connected by networks.

o The client/server architecture consists of many PCs and workstations that are connected via the network.
o DBMS architecture depends upon how users are connected to the database to get their requests done.
Types of DBMS Architecture

Database architecture can be seen as single-tier or multi-tier. Logically, multi-tier database architecture is of
two types: 2-tier architecture and 3-tier architecture.

1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can work directly on
the DBMS.
o Any changes made here are applied directly to the database itself. It doesn't provide a handy tool for end
users.

o The 1-Tier architecture is used for development of the local application, where programmers can directly
communicate with the database for the quick response.
2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server model. In the two-tier architecture,
applications on the client end can communicate directly with the database on the server side. For this
interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.

o The server side is responsible for providing functionality such as query processing and transaction
management.

o To communicate with the DBMS, the client-side application establishes a connection with the server side.

Fig: 2-tier Architecture

3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and server. In this architecture, the client
can't communicate directly with the server.

o The application on the client end interacts with an application server, which in turn communicates with the
database system.
o The end user has no idea about the existence of the database beyond the application server. The database
also has no idea about any user beyond the application.

o The 3-Tier architecture is used for large web applications.

Fig: 3-tier Architecture
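
The separation of the three tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, and the names StudentService and client_request, the student table and its rows are hypothetical and do not come from these notes.

```python
import sqlite3

# Database tier: an in-memory SQLite database stands in for the database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO student (id, name) VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi")])
conn.commit()


class StudentService:
    """Application tier (hypothetical): the only code that talks to the database."""

    def __init__(self, connection):
        self._connection = connection

    def student_name(self, student_id):
        row = self._connection.execute(
            "SELECT name FROM student WHERE id = ?", (student_id,)
        ).fetchone()
        return row[0] if row else None


def client_request(service, student_id):
    """Client tier: sees only the service interface, never SQL or the connection."""
    name = service.student_name(student_id)
    return f"Student {student_id}: {name}" if name else "No such student"


service = StudentService(conn)
print(client_request(service, 1))
print(client_request(service, 99))
```

The client function never receives the connection object, which mirrors the point that the end user is unaware of the database behind the application server.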

History of DBMS

Data is a collection of facts and figures. As the volume of collected data grew day by day, it needed
to be stored on a device or in reliable software. Charles Bachman was the first person to develop the
Integrated Data Store (IDS), which was based on the network data model and for which he received
the Turing Award (the most prestigious award in computer science, often described as the field's
equivalent of the Nobel Prize). IDS was developed in the early 1960s. In the late 1960s, IBM
(International Business Machines Corporation) developed the Information Management System (IMS),
a database system that is still used in many places. It was based on the hierarchical database
model. In 1970, the relational database model was proposed by Edgar Codd. Many of the database
systems we use today are relational, and the relational model has been considered the standard
database model since then. Later in the same decade (the 1970s), IBM developed the Structured Query
Language (SQL) as part of the System R project. It was later adopted as a standard query language by
ANSI and ISO. Transaction management techniques for processing transactions were developed by
James (Jim) Gray, for which he received the Turing Award. Further, many other models followed, with
rich features such as complex queries and datatypes for storing images. The Internet age has perhaps
influenced data models even more.

Relational Model
The relational data model is the primary data model and is used widely around the world for data storage
and processing. The model is simple, and it has all the properties and capabilities required to process
data with storage efficiency.

Concepts
Tables − In the relational data model, relations are saved in the format of tables. This format stores the
relation among entities. A table has rows and columns, where rows represent records and columns
represent attributes.
Tuple − A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents a relation instance.
Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name) and its attributes and their
names.
Relation key − Each row has one or more attributes, known as the relation key, which can identify the row
in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as the attribute domain.

Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
o Thus, integrity constraints guard against accidental damage to the database.

Types of Integrity Constraint

1. Domain constraints
o A domain constraint defines the valid set of values for an attribute.
o The data type of a domain can be string, character, integer, time, date, currency, etc. The value
of the attribute must belong to the corresponding domain.

Example:

2. Entity integrity constraints


o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation, and if the
primary key has a null value, we can't identify those rows.
o Fields other than the primary key may contain null values.

Example:
3. Referential Integrity Constraints
o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of
Table 2, then every value of the Foreign Key in Table 1 must be null or be available in Table 2.

Example:
4. Key constraints
o A key is an attribute, or set of attributes, that is used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, out of which one key will be the primary key. A
primary key value must be unique and can't be null in the relational table.

Example:
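
The four constraint types can be tried out with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for a full RDBMS; the department and student tables, their columns and the sample rows are hypothetical. Two SQLite details are also assumptions of this sketch: foreign keys are checked only after PRAGMA foreign_keys = ON, and the primary key is declared NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE student (
        roll_no TEXT NOT NULL PRIMARY KEY,              -- entity integrity and key constraint
        name    TEXT NOT NULL,
        age     INTEGER CHECK (age BETWEEN 5 AND 100),  -- domain constraint
        dept_id INTEGER REFERENCES department (dept_id) -- referential integrity
    )""")
conn.execute("INSERT INTO department VALUES (10, 'Physics')")
conn.execute("INSERT INTO student VALUES ('S1', 'Asha', 19, 10)")
conn.execute("INSERT INTO student VALUES ('S2', 'Ravi', 20, NULL)")  # a null foreign key is allowed


def is_rejected(sql):
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


rejected = {
    "domain": is_rejected("INSERT INTO student VALUES ('S3', 'Meera', 250, 10)"),
    "entity integrity": is_rejected("INSERT INTO student VALUES (NULL, 'Kiran', 21, 10)"),
    "referential integrity": is_rejected("INSERT INTO student VALUES ('S4', 'Kiran', 21, 99)"),
    "key": is_rejected("INSERT INTO student VALUES ('S1', 'Duplicate', 22, 10)"),
}
student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, student_count)
```

Each of the four inserts breaks exactly one rule: an age outside the domain, a null primary key, a department that does not exist, and a repeated primary key value.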
ER model
o ER model stands for Entity-Relationship model. It is a high-level data model used to define the data
elements and relationships for a specified system.
o It develops a conceptual design for the database, giving a simple and easy-to-understand view
of the data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship
diagram.

For example, suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc. The address can be another entity with attributes like city,
street name, pin code, etc., and there will be a relationship between them.
Component of ER Diagram

1. Entity:

An entity may be any object, class, person or place. In the ER diagram, an entity is represented as a
rectangle.

Consider an organization as an example: manager, product, employee, department, etc. can each be
taken as an entity.
a. Weak Entity

An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any key
attribute of its own. The weak entity is represented by a double rectangle.

2. Attribute

An attribute describes a property of an entity. An ellipse is used to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute

The key attribute represents the main characteristic of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. The composite
attribute is represented by an ellipse, and the ellipses of its component attributes are connected to that ellipse.
c. Multivalued Attribute

An attribute can have more than one value. Such attributes are known as multivalued attributes. A
double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.

d. Derived Attribute

An attribute that can be derived from another attribute is known as a derived attribute. It is
represented by a dashed ellipse.

For example, a person's age changes over time and can be derived from another attribute such as date of
birth.
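
As a small illustration of a derived attribute, the sketch below computes age from a stored date of birth instead of storing the age itself. The function name age_on and the sample dates are hypothetical.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Derive age in whole years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_on(date(2000, 6, 15), date(2024, 6, 14)))  # 23 (the day before the birthday)
print(age_on(date(2000, 6, 15), date(2024, 6, 15)))  # 24 (on the birthday)
```

Only the date of birth has to be stored, so the age can never become out of date.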
3. Relationship

A relationship describes the relation between entities. A diamond or rhombus is used to represent
the relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of an entity is associated with the relationship, it is known as a one-to-one
relationship.
For example, a female can marry one male, and a male can marry one female.

b. One-to-many relationship

When only one instance of the entity on the left and more than one instance of the entity on the right
are associated with the relationship, it is known as a one-to-many relationship.

For example, a scientist can make many inventions, but each invention is made by one specific
scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left and only one instance of the entity on the right
are associated with the relationship, it is known as a many-to-one relationship.

For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship

When more than one instance of the entity on the left and more than one instance of the entity on the
right are associated with the relationship, it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.
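
A many-to-many relationship is commonly stored in a relational database through a third table that pairs the keys of the two entities. The sketch below shows this for employees and projects, assuming Python's built-in sqlite3 module; the table names, the assignment junction table and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Junction table: one row per (employee, project) assignment.
    CREATE TABLE assignment (
        emp_id  INTEGER NOT NULL REFERENCES employee (emp_id),
        proj_id INTEGER NOT NULL REFERENCES project (proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 20);
""")

# One employee can have many projects ...
projects_of_asha = [row[0] for row in conn.execute("""
    SELECT p.title FROM project p
    JOIN assignment a ON a.proj_id = p.proj_id
    WHERE a.emp_id = 1 ORDER BY p.title""")]

# ... and one project can have many employees.
staff_on_website = [row[0] for row in conn.execute("""
    SELECT e.name FROM employee e
    JOIN assignment a ON a.emp_id = e.emp_id
    WHERE a.proj_id = 20 ORDER BY e.name""")]

print(projects_of_asha, staff_on_website)
```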

Relationship

Relationships are represented by a diamond-shaped box. The name of the relationship is written inside the
diamond. All the entities (rectangles) participating in a relationship are connected to it by a line.

Binary Relationship and Cardinality

A relationship in which two entities participate is called a binary relationship. Cardinality is the
number of instances of an entity that can be associated with the relationship.

• One-to-one − When only one instance of each entity is associated with the relationship, it is
marked as '1:1'.
• One-to-many − When only one instance of the entity on the left and more than one instance of the
entity on the right can be associated with the relationship, it is marked as '1:N'.

• Many-to-one − When more than one instance of the entity on the left and only one instance of the
entity on the right can be associated with the relationship, it is marked as 'N:1'.

• Many-to-many − When more than one instance of the entity on the left and more than one instance of
the entity on the right can be associated with the relationship, it is marked as 'M:N'.
Participation Constraints
• Total Participation − Each entity is involved in the relationship. Total participation is
represented by double lines.
• Partial participation − Not all entities are involved in the relationship. Partial participation is
represented by single lines.

Generalization

Generalization is the process of combining entities, where the generalized entity contains the
properties common to all the entities it generalizes. In generalization, a number of entities
are brought together into one generalized entity based on their similar characteristics. For example,
pigeon, house sparrow, crow and dove can all be generalized as Birds.
Specialization

Specialization is the opposite of generalization. In specialization, a group of entities is divided into
sub-groups based on their characteristics. Take a group 'Person' for example. A person has a name, date of
birth, gender, etc. These properties are common to all persons. But in a company,
persons can be identified as employee, employer, customer, or vendor, based on what role they play in
the company.

Similarly, in a school database, persons can be specialized as teacher, student, or staff, based on what
role they play in the school.

Keys
o Keys play an important role in the relational database.
o A key is used to uniquely identify any record or row of data in a table. It is also used to establish
and identify relationships between tables.

For example: In the STUDENT table, ID is used as a key because it is unique for each student. In the PERSON
table, passport_number, license_number and SSN are keys since they are unique for each person.
Types of key:

1. Primary key
o It is the key chosen to identify one and only one instance of an entity uniquely. An
entity can contain multiple keys, as we saw in the PERSON table. The key that is most suitable
from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. We could
even select License_Number or Passport_Number as the primary key, since they are also unique.
o For each entity, the selection of the primary key is based on the requirements and the developers.
2. Candidate key
o A candidate key is an attribute, or minimal set of attributes, that can uniquely identify a tuple.
o The primary key is chosen from among the candidate keys. The candidate keys that are not chosen
are as strong as the primary key.

For example: In the EMPLOYEE table, ID is best suited for the primary key. The other unique attributes,
such as SSN, Passport_Number, and License_Number, are also candidate keys.
3. Super Key

A super key is a set of attributes that can uniquely identify a tuple. A super key is a superset of a
candidate key.

For example: In the above EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names
of two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination
can also be a key.

The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
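
The idea of a super key can be checked on sample data with a short sketch. The function is_superkey and the three sample rows are hypothetical. Note the assumption involved: a key is a rule about every possible state of the table, so a check on one set of rows can only show that an attribute set is not a super key, or that it is consistent with being one.

```python
def is_superkey(rows, attributes):
    """Return True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attributes)
        if value in seen:
            return False  # two rows share the same values, so they are not told apart
        seen.add(value)
    return True


employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Amit", "CITY": "Mumbai"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Amit", "CITY": "Pune"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Sidhu", "CITY": "Mumbai"},
]

print(is_superkey(employees, ["EMPLOYEE_ID"]))                   # True
print(is_superkey(employees, ["EMPLOYEE_ID", "EMPLOYEE_NAME"]))  # True
print(is_superkey(employees, ["EMPLOYEE_NAME"]))                 # False, two employees are named Amit
```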

4. Foreign key
o A foreign key is a column of one table that points to the primary key of another table.
o In a company, every employee works in a specific department, and employee and department
are two different entities. So we don't store the information of the department in the employee
table. Instead, we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
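
The link between EMPLOYEE and DEPARTMENT can be sketched as follows, assuming Python's built-in sqlite3 module. Department_Id comes from the text above; the other column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (
        Department_Id   INTEGER PRIMARY KEY,
        Department_Name TEXT NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        Employee_Id   INTEGER PRIMARY KEY,
        Employee_Name TEXT NOT NULL,
        Department_Id INTEGER,
        FOREIGN KEY (Department_Id) REFERENCES DEPARTMENT (Department_Id)
    );
    INSERT INTO DEPARTMENT VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO EMPLOYEE VALUES (100, 'Asha', 1), (101, 'Ravi', 2);
""")

# The foreign key lets each employee row be matched with its department row.
rows = conn.execute("""
    SELECT e.Employee_Name, d.Department_Name
    FROM EMPLOYEE e
    JOIN DEPARTMENT d ON d.Department_Id = e.Department_Id
    ORDER BY e.Employee_Id""").fetchall()
print(rows)
```

Department details are stored once in DEPARTMENT and reached from EMPLOYEE through the foreign key, rather than being repeated in every employee row.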

What is SQL?
SQL (Structured Query Language) is a computer language for storing, manipulating and
retrieving data stored in a relational database.
SQL is the standard language for relational database systems. Relational Database Management
Systems (RDBMS) such as MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use
SQL as their standard database language.

However, they use different dialects, such as −

• MS SQL Server uses T-SQL,
• Oracle uses PL/SQL,
• the MS Access version of SQL is called JET SQL (native format), etc.

SQL Process
When you execute an SQL command on any RDBMS, the system determines the best way to
carry out your request, and the SQL engine figures out how to interpret the task.
Various components are involved in this process:

• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.
A classic query engine handles all the non-SQL queries, but a SQL query engine won't handle logical
files.
Fig: SQL Architecture
SQL Commands
The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT,
UPDATE, DELETE and DROP. These commands can be classified into the following groups based
on their nature −

DDL - Data Definition Language

Sr.No.  Command  Description
1       CREATE   Creates a new table, a view of a table, or other object in the database.
2       ALTER    Modifies an existing database object, such as a table.
3       DROP     Deletes an entire table, a view of a table or other objects in the database.

DML - Data Manipulation Language

Sr.No.  Command  Description
1       SELECT   Retrieves certain records from one or more tables.
2       INSERT   Creates a record.
3       UPDATE   Modifies records.
4       DELETE   Deletes records.

DCL - Data Control Language

Sr.No.  Command  Description
1       GRANT    Gives a privilege to a user.
2       REVOKE   Takes back privileges granted to a user.

SQL Data type


o An SQL data type defines the values that a column can contain.
o Every column in a database table is required to have a name and a data type.

Data types of SQL:


1. Binary Datatypes

There are three binary datatypes, given below. The size limits shown match Microsoft SQL Server; other
database systems may use different limits.

Data Type  Description
binary     It has a maximum length of 8000 bytes. It contains fixed-length binary data.
varbinary  It has a maximum length of 8000 bytes. It contains variable-length binary data.
image      It has a maximum length of 2,147,483,647 bytes. It contains variable-length binary data.

2. Approximate Numeric Datatype

The subtypes are given below:

Data type  From        To         Description
float      -1.79E+308  1.79E+308  It is used to specify a floating-point value, e.g. 6.2, 2.9, etc.
real       -3.40E+38   3.40E+38   It specifies a single-precision floating-point number.

3. Exact Numeric Datatype

The subtypes are given below:

Data type  Description
int        It is used to specify an integer value.
smallint   It is used to specify a small integer value.
bit        It specifies the number of bits to store.
decimal    It specifies a numeric value that can have a fractional part.
numeric    It is used to specify a numeric value.

4. Character String Data type

The subtypes are given below:

Data type  Description
char       It has a maximum length of 8000 characters. It contains fixed-length non-Unicode characters.
varchar    It has a maximum length of 8000 characters. It contains variable-length non-Unicode characters.
text       It has a maximum length of 2,147,483,647 characters. It contains variable-length non-Unicode characters.

5. Date and time Datatypes

The subtypes are given below:

Datatype   Description
date       It is used to store the year, month, and day values.
time       It is used to store the hour, minute, and second values.
timestamp  It stores the year, month, day, hour, minute, and second values.

1. SELECT Statement

This SQL statement reads the data from the SQL database and shows it as the output to the database
user.

Syntax of SELECT Statement:

SELECT column_name1, column_name2, ..., column_nameN
[ FROM table_name ]
[ WHERE condition ]
[ ORDER BY order_column_name1 [ ASC | DESC ], ... ];

Example of SELECT Statement:

SELECT Emp_ID, First_Name, Last_Name, Salary, City
FROM Employee_details
WHERE Salary = 100000
ORDER BY Last_Name;

This example shows the Emp_ID, First_Name, Last_Name, Salary, and City of those employees
from the Employee_details table whose Salary is 100000. The output shows all the specified details
according to the ascending alphabetical order of Last_Name.
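
To see the SELECT example run end to end, the sketch below builds a small Employee_details table in an in-memory SQLite database through Python's built-in sqlite3 module. The three sample rows and the column types are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee_details (
        Emp_ID INTEGER PRIMARY KEY,
        First_Name TEXT, Last_Name TEXT, Salary INTEGER, City TEXT
    )""")
conn.executemany("INSERT INTO Employee_details VALUES (?, ?, ?, ?, ?)", [
    (1, "Amit", "Verma", 100000, "Delhi"),
    (2, "Neha", "Arora", 100000, "Pune"),
    (3, "Sumit", "Gupta", 90000, "Mumbai"),
])

# Same query as in the example: filter on Salary, then sort by Last_Name (ascending by default).
rows = conn.execute("""
    SELECT Emp_ID, First_Name, Last_Name, Salary, City
    FROM Employee_details
    WHERE Salary = 100000
    ORDER BY Last_Name""").fetchall()

for row in rows:
    print(row)
```

Only the two employees earning 100000 are returned, with Arora listed before Verma.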

2. UPDATE Statement

This SQL statement changes or modifies the stored data in the SQL database.

Syntax of UPDATE Statement:

UPDATE table_name
SET column_name1 = new_value_1, column_name2 = new_value_2, ..., column_nameN = new_value_N
[ WHERE condition ];

Example of UPDATE Statement:

UPDATE Employee_details
SET Salary = 100000
WHERE Emp_ID = 10;

This example sets the Salary to 100000 for the employee in the Employee_details table
whose Emp_ID is 10.
3. DELETE Statement

This SQL statement deletes the stored data from the SQL database.

Syntax of DELETE Statement:

DELETE FROM table_name
[ WHERE condition ];

Example of DELETE Statement:

DELETE FROM Employee_details
WHERE First_Name = 'Sumit';

This example deletes the record of those employees from the Employee_details table
whose First_Name is Sumit in the table.
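
The effect of the WHERE clause in UPDATE and DELETE can be observed by counting the affected rows. The sketch below assumes Python's built-in sqlite3 module and three hypothetical sample rows. The rowcount attribute of the cursor reports how many rows each statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee_details (
        Emp_ID INTEGER PRIMARY KEY, First_Name TEXT, Salary INTEGER
    )""")
conn.executemany("INSERT INTO Employee_details VALUES (?, ?, ?)", [
    (10, "Rohan", 80000),
    (11, "Sumit", 70000),
    (12, "Sumit", 75000),
])

# UPDATE touches only the row that matches the WHERE condition.
updated = conn.execute(
    "UPDATE Employee_details SET Salary = 100000 WHERE Emp_ID = 10").rowcount

# DELETE removes every row that matches, here both employees named Sumit.
deleted = conn.execute(
    "DELETE FROM Employee_details WHERE First_Name = 'Sumit'").rowcount

remaining = conn.execute(
    "SELECT Emp_ID, First_Name, Salary FROM Employee_details ORDER BY Emp_ID").fetchall()
print(updated, deleted, remaining)
```

Without a WHERE clause, both statements would apply to every row of the table.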

4. CREATE TABLE Statement

This SQL statement creates the new table in the SQL database.

Syntax of CREATE TABLE Statement:

CREATE TABLE table_name
(
    column_name1 data_type [column1 constraint(s)],
    column_name2 data_type [column2 constraint(s)],
    ...,
    column_nameN data_type [columnN constraint(s)],
    PRIMARY KEY (one or more columns)
);

Example of CREATE TABLE Statement:

CREATE TABLE Employee_details (
    Emp_Id INT NOT NULL,
    First_name VARCHAR(30),
    Last_name VARCHAR(30),
    Salary DECIMAL(10, 2),
    City VARCHAR(30),
    PRIMARY KEY (Emp_Id)
);

This example creates the table Employee_details with five columns or fields in the SQL database. The
fields in the table are Emp_Id, First_Name, Last_Name, Salary, and City. The Emp_Id column in
the table acts as a primary key, which means that the Emp_Id column cannot contain duplicate values
and null values.

5. ALTER TABLE Statement

This SQL statement adds, deletes, and modifies the columns of a table in the SQL database.

Syntax of ALTER TABLE Statement:

ALTER TABLE table_name ADD column_name datatype[(size)];

The above ALTER statement adds a column with its datatype to the existing database table.

ALTER TABLE table_name MODIFY column_name column_datatype[(size)];

The above ALTER statement changes the datatype (and size) of an existing column of the database
table. The MODIFY keyword is used by Oracle and MySQL; other systems spell this operation differently
(for example, ALTER COLUMN in SQL Server).

ALTER TABLE table_name DROP COLUMN column_name;

The above ALTER statement deletes a column of the existing database table.

Example of ALTER TABLE Statement:

ALTER TABLE Employee_details
ADD Designation VARCHAR(18);

This example adds the new field whose name is Designation with size 18 in
the Employee_details table of the SQL database.
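
The ADD form of ALTER TABLE can be tried out as below, assuming Python's built-in sqlite3 module. The starting table and its single row are hypothetical. SQLite is used only as a convenient stand-in: it accepts ADD COLUMN but has no MODIFY form, so only the ADD case is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_details (Emp_Id INTEGER PRIMARY KEY, First_name VARCHAR(30))")
conn.execute("INSERT INTO Employee_details VALUES (101, 'Akhil')")

# Add the new column to the existing table.
conn.execute("ALTER TABLE Employee_details ADD COLUMN Designation VARCHAR(18)")

# PRAGMA table_info lists one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee_details)")]
existing_row = conn.execute("SELECT * FROM Employee_details").fetchone()
print(columns, existing_row)
```

Rows that existed before the change get a null value in the new column.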

6. DROP TABLE Statement

This SQL statement deletes or removes the table and the structure, views, permissions, and triggers
associated with that table.

Syntax of DROP TABLE Statement:

DROP TABLE [ IF EXISTS ]
table_name1, table_name2, ..., table_nameN;

The above syntax of the drop statement deletes the specified tables completely if they exist in the database.

Example of DROP TABLE Statement:

DROP TABLE Employee_details;

This example drops the Employee_details table from the SQL database. This removes the table together
with all the data stored in it.

7. CREATE DATABASE Statement

This SQL statement creates the new database in the database management system.

Syntax of CREATE DATABASE Statement:

CREATE DATABASE database_name;

Example of CREATE DATABASE Statement:

CREATE DATABASE Company;

The above example creates the company database in the system.

8. DROP DATABASE Statement

This SQL statement deletes the existing database with all the data tables and views from the database
management system.

Syntax of DROP DATABASE Statement:

DROP DATABASE database_name;

Example of DROP DATABASE Statement:

DROP DATABASE Company;

The above example deletes the company database from the system.
9. INSERT INTO Statement

This SQL statement inserts the data or records in the existing table of the SQL database. This statement
can easily insert single and multiple records in a single query statement.

Syntax for inserting a single record:

INSERT INTO table_name
(
    column_name1,
    column_name2, ...,
    column_nameN
)
VALUES
(
    value_1,
    value_2, ...,
    value_N
);

Example of inserting a single record:

INSERT INTO Employee_details
(
    Emp_ID,
    First_name,
    Last_name,
    Salary,
    City
)
VALUES
(
    101,
    'Akhil',
    'Sharma',
    40000,
    'Bangalore'
);

This example inserts 101 in the first column, Akhil in the second column, Sharma in the third
column, 40000 in the fourth column, and Bangalore in the last column of the table Employee_details.
String values must be enclosed in single quotes.

Syntax for inserting multiple records in a single query:

INSERT INTO table_name
( column_name1, column_name2, ..., column_nameN )
VALUES (value_1, value_2, ..., value_N), (value_1, value_2, ..., value_N), ...;

Example of inserting multiple records in a single query:

INSERT INTO Employee_details
( Emp_ID, First_name, Last_name, Salary, City )
VALUES (102, 'Amit', 'Gupta', 50000, 'Mumbai'),
       (103, 'John', 'Aggarwal', 45000, 'Calcutta'),
       (104, 'Sidhu', 'Arora', 55000, 'Mumbai');

This example inserts the records of three employees in the Employee_details table in a single query
statement. Each record has its own Emp_ID because Emp_ID is the primary key of the table.
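
When INSERT statements are issued from a program, the values are usually passed as parameters instead of being typed into the SQL text. The sketch below assumes Python's built-in sqlite3 module, whose "?" placeholders and executemany call take care of quoting. The column types and the distinct Emp_ID values are assumptions of this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee_details (
        Emp_ID INTEGER PRIMARY KEY,
        First_name TEXT, Last_name TEXT, Salary INTEGER, City TEXT
    )""")

insert_sql = ("INSERT INTO Employee_details "
              "(Emp_ID, First_name, Last_name, Salary, City) VALUES (?, ?, ?, ?, ?)")

# Single record.
conn.execute(insert_sql, (101, "Akhil", "Sharma", 40000, "Bangalore"))

# Multiple records in one call; every row needs its own primary key value.
conn.executemany(insert_sql, [
    (102, "Amit", "Gupta", 50000, "Mumbai"),
    (103, "John", "Aggarwal", 45000, "Calcutta"),
    (104, "Sidhu", "Arora", 55000, "Mumbai"),
])
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM Employee_details").fetchone()[0]
mumbai_ids = [row[0] for row in conn.execute(
    "SELECT Emp_ID FROM Employee_details WHERE City = ? ORDER BY Emp_ID", ("Mumbai",))]
print(total, mumbai_ids)
```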

10. TRUNCATE TABLE Statement

This SQL statement deletes all the stored records from the table of the SQL database.

Syntax of TRUNCATE TABLE Statement:

TRUNCATE TABLE table_name;

Example of TRUNCATE TABLE Statement:

TRUNCATE TABLE Employee_details;

This example deletes the record of all employees from the Employee_details table of the database.

11. DESCRIBE Statement

This SQL statement shows the structure of the specified table or view, such as its column names and
data types. It is available in systems such as MySQL and Oracle.

Syntax of DESCRIBE Statement:

DESCRIBE table_name | view_name;

Example of DESCRIBE Statement:

DESCRIBE Employee_details;

This example shows the structure and other details of the Employee_details table.

12. DISTINCT Clause

This SQL clause shows the distinct values from the specified columns of the database table. It
is used with the SELECT keyword.

Syntax of DISTINCT Clause:

SELECT DISTINCT column_name1, column_name2, ...
FROM table_name;

Example of DISTINCT Clause:

SELECT DISTINCT City, Salary
FROM Employee_details;

This example shows the distinct values of the City and Salary column from
the Employee_details table.
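
When DISTINCT is given more than one column, it removes rows in which the whole combination of values repeats. The sketch below illustrates this with four hypothetical rows, assuming Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_details (Emp_ID INTEGER PRIMARY KEY, City TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee_details VALUES (?, ?, ?)", [
    (1, "Mumbai", 50000),
    (2, "Mumbai", 50000),   # same City and Salary as row 1, so it is a duplicate combination
    (3, "Mumbai", 55000),
    (4, "Pune", 50000),
])

pairs = conn.execute("""
    SELECT DISTINCT City, Salary
    FROM Employee_details
    ORDER BY City, Salary""").fetchall()
print(pairs)
```

Mumbai still appears twice in the result because the two rows differ in Salary.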

13. COMMIT Statement

This SQL statement permanently saves the changes made in the current transaction of the SQL database.

Syntax of COMMIT Statement:

COMMIT;

Example of COMMIT Statement:

DELETE FROM Employee_details
WHERE Salary = 30000;
COMMIT;

This example deletes the records of those employees whose Salary is 30000 and then saves the changes permanently
in the database.

14. ROLLBACK Statement

This SQL statement undoes the transactions and operations which are not yet saved to the SQL database.

Syntax of ROLLBACK Statement:

ROLLBACK;

Example of ROLLBACK Statement:

DELETE FROM Employee_details
WHERE City = 'Mumbai';
ROLLBACK;

This example deletes the records of those employees whose City is Mumbai and then undoes the changes in the
database.
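
The difference between COMMIT and ROLLBACK can be observed by counting rows before and after each call. The sketch below assumes Python's built-in sqlite3 module, where conn.commit() and conn.rollback() play the role of the COMMIT and ROLLBACK statements. The sample rows and the helper row_count are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_details (Emp_ID INTEGER PRIMARY KEY, City TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee_details VALUES (?, ?, ?)", [
    (1, "Mumbai", 30000),
    (2, "Pune", 45000),
    (3, "Mumbai", 50000),
])
conn.commit()  # the three rows are now saved permanently


def row_count():
    return conn.execute("SELECT COUNT(*) FROM Employee_details").fetchone()[0]


# Delete inside a transaction, then undo it.
conn.execute("DELETE FROM Employee_details WHERE City = 'Mumbai'")
during_transaction = row_count()   # the uncommitted delete is visible on this connection
conn.rollback()
after_rollback = row_count()       # the delete has been undone

# Delete again, but commit this time. A later rollback has nothing left to undo.
conn.execute("DELETE FROM Employee_details WHERE Salary = 30000")
conn.commit()
conn.rollback()
after_commit = row_count()

print(during_transaction, after_rollback, after_commit)
```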

15. CREATE INDEX Statement

This SQL statement creates the new index in the SQL database table.

Syntax of CREATE INDEX Statement:

CREATE INDEX index_name
ON table_name ( column_name1, column_name2, ..., column_nameN );

Example of CREATE INDEX Statement:

CREATE INDEX idx_First_Name
ON Employee_details (First_Name);

This example creates an index idx_First_Name on the First_Name column of the Employee_details table.

16. DROP INDEX Statement

This SQL statement deletes the existing index of the SQL database table.

Syntax of DROP INDEX Statement:

DROP INDEX index_name;

Example of DROP INDEX Statement:

DROP INDEX idx_First_Name;

This example deletes the index idx_First_Name from the SQL database.
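
Creating and dropping an index can be verified by listing the indexes the database knows about. The sketch below assumes Python's built-in sqlite3 module. The catalog table sqlite_master is specific to SQLite, and other systems expose their indexes differently. The helper index_names is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_details (Emp_ID INTEGER PRIMARY KEY, First_Name TEXT)")


def index_names():
    """List the names of all indexes recorded in SQLite's catalog."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")
    return [row[0] for row in rows]


before = index_names()

conn.execute("CREATE INDEX idx_First_Name ON Employee_details (First_Name)")
after_create = index_names()

conn.execute("DROP INDEX idx_First_Name")
after_drop = index_names()

print(before, after_create, after_drop)
```

Dropping the index removes only the index itself. The table and its data are left unchanged.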

17. USE Statement

This SQL statement selects the existing SQL database. Before performing the operations on the database table, you
have to select the database from the multiple existing databases.

Syntax of USE Statement:

USE database_name;

Example of USE Statement:

USE Company;

This example uses the company database.


SQL WHERE
A WHERE clause in SQL is used within data manipulation language (DML) statements to specify a condition.

The WHERE clause is optional in SQL DML statements, but it can be used to limit the number of rows
affected by a SQL DML statement or returned by a query.

It filters the records: only those rows that fulfill the specified conditions are returned or affected.

The WHERE clause is used in SELECT, UPDATE, and DELETE statements.

Let's see the syntax for SQL WHERE:

SELECT column1, column2, ..., columnN
FROM table_name
WHERE [conditions];

The WHERE clause can use the following comparison operators:

= Equal

> greater than

< less than

>= greater than or equal

<= less than or equal

<> not equal to
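Each comparison operator listed above can be checked against a tiny table. The sketch below assumes Python's sqlite3 module; the emp table with name and salary columns and its three rows are made up for the illustration, and the helper names_where is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("A", 20000), ("B", 30000), ("C", 40000)],
)


def names_where(condition):
    # The condition is a fixed literal here; real code should bind user input as parameters.
    sql = f"SELECT name FROM emp WHERE {condition} ORDER BY name"
    return [name for (name,) in conn.execute(sql)]


# Run the same comparison against 30000 with every operator from the list.
results = {op: names_where(f"salary {op} 30000") for op in ["=", ">", "<", ">=", "<=", "<>"]}

for op, names in results.items():
    print(op, names)
```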


SQL AND
o The SQL AND condition is used in a SQL query to require that two or more conditions be met.

o It is used in SQL SELECT, UPDATE, and DELETE statements (and in INSERT statements that select their rows from another table).

o Let's see the syntax for SQL AND:

o SELECT columns FROM tables WHERE condition1 AND condition2;

o The SQL AND condition requires that both conditions be met.

o The SQL AND condition can also be used in the conditions that join multiple tables in a SQL statement.

o To understand this concept practically, let us see some examples.

Suppose we have an emp table in the database with the following data:

ID First_Name Last_Name Department Location

1 Harshad Kuwar Marketing Pune

2 Anurag Rajput IT Mumbai

3 Chaitali Tarle IT Chennai

4 Pranjal Patil IT Chennai

5 Suraj Tripathi Marketing Pune

6 Roshni Jadhav Finance Bangalore

7 Sandhya Jain Finance Bangalore


SQL "AND" example with "SELECT" statement

This is how an SQL "AND" condition can be used in the SQL SELECT statement.

Example 1:

Write a query to get the records from the emp table in which the department of the employee is IT and the location is Chennai.

Query:

mysql> SELECT * FROM emp WHERE Department = 'IT' AND Location = 'Chennai';

ID First_Name Last_Name Department Location

3 Chaitali Tarle IT Chennai

4 Pranjal Patil IT Chennai

In the emp table, there are three employees whose department is IT. But the AND condition also requires
the employee's location to be Chennai. So, there are only two employees whose
department is IT and whose location is Chennai.

Example 2:

Write a query to get the records from the emp table in which the department of the employee is IT and the location is Mumbai.

Query:

mysql> SELECT * FROM emp WHERE Department = 'IT' AND Location = 'Mumbai';

ID First_Name Last_Name Department Location

2 Anurag Rajput IT Mumbai

In the emp table, there are three employees whose department is IT. Among these three employees, there is only one
employee whose location is Mumbai. Due to the presence of the AND operator used in the query, a record must satisfy
both conditions.

SQL "AND" example with "UPDATE" statement

This is how the "AND" condition can be used in the SQL UPDATE statement.

Example 1:

Write a query to update the records in the emp table in which the department of the employee is Marketing and the first
name is Suraj. For that particular employee, set the updated value of the location as Delhi.

Query:

mysql> UPDATE emp SET Location = 'Delhi' WHERE Department = 'Marketing' AND First_Name = 'Suraj';

We will use the SELECT query to verify the updated record.

mysql> SELECT * FROM emp;


ID First_Name Last_Name Department Location

1 Harshad Kuwar Marketing Pune

2 Anurag Rajput IT Mumbai

3 Chaitali Tarle IT Chennai

4 Pranjal Patil IT Chennai

5 Suraj Tripathi Marketing Delhi

6 Roshni Jadhav Finance Bangalore

7 Sandhya Jain Finance Bangalore

In the emp table, there are two employees whose department is Marketing. Among these two employees, there is only one
employee whose first name is Suraj. Because of the AND operator used in the query, only the record that satisfies
both conditions is updated.

Example 2:

Write a query to update the records in the emp table in which the department of the employee is Finance and the ID is 7. For
that particular employee, set the updated value of the department as HR.

Query:

mysql> UPDATE emp SET Department = 'HR' WHERE Department = 'Finance' AND ID = 7;

We will use the SELECT query to verify the updated record.

mysql> SELECT * FROM emp;

ID First_Name Last_Name Department Location

1 Harshad Kuwar Marketing Pune

2 Anurag Rajput IT Mumbai

3 Chaitali Tarle IT Chennai

4 Pranjal Patil IT Chennai

5 Suraj Tripathi Marketing Delhi

6 Roshni Jadhav Finance Bangalore

7 Sandhya Jain HR Bangalore

In the emp table, there are two employees whose department is Finance. Among these two employees, there is only
one employee whose ID is 7. Because of the AND operator used in the query, a record must have the
department as Finance and the ID as 7 to be updated.

SQL "AND" example with "DELETE" statement

This is how an SQL "AND" condition can be used in the SQL DELETE statement.

Example 1:

Write a query to delete the records from the emp table in which the last name of the employee is Jain, and the Location
is Bangalore.

Query:

mysql> DELETE FROM emp WHERE Last_Name = 'Jain' AND Location = 'Bangalore';

We will use the SELECT query to verify the deleted record.

mysql> SELECT * FROM emp;

ID First_Name Last_Name Department Location

1 Harshad Kuwar Marketing Pune

2 Anurag Rajput IT Mumbai

3 Chaitali Tarle IT Chennai

4 Pranjal Patil IT Chennai


5 Suraj Tripathi Marketing Delhi

6 Roshni Jadhav Finance Bangalore

There is only one record in the emp table whose last name is Jain. But still, due to the presence of AND operator, the
second condition will also be checked according to which employee's location should be Bangalore. So, only that
particular record is deleted.

Example 2:

Write a query to delete the records from the emp table in which department of the employee is IT and Location is
Mumbai.

Query:

mysql> DELETE FROM emp WHERE Department = 'IT' AND Location = 'Mumbai';

We will use the SELECT query to verify the deleted record.

mysql> SELECT * FROM emp;

ID First_Name Last_Name Department Location

1 Harshad Kuwar Marketing Pune

3 Chaitali Tarle IT Chennai


4 Pranjal Patil IT Chennai

5 Suraj Tripathi Marketing Delhi

6 Roshni Jadhav Finance Bangalore

There are three records in the emp table whose department is IT, but only one of the 6 records in the table is
deleted. This happens because the AND operator also requires the employee's
location to be Mumbai. Only one record satisfies both conditions, so only that record
is deleted.
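The SELECT, UPDATE, and DELETE examples above can be replayed end to end. This is a sketch assuming Python's sqlite3 module with an in-memory database, loaded with the seven emp rows shown earlier; the column types are an assumption, since the original table definition is not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (ID INTEGER, First_Name TEXT, Last_Name TEXT, Department TEXT, Location TEXT)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Harshad", "Kuwar", "Marketing", "Pune"),
        (2, "Anurag", "Rajput", "IT", "Mumbai"),
        (3, "Chaitali", "Tarle", "IT", "Chennai"),
        (4, "Pranjal", "Patil", "IT", "Chennai"),
        (5, "Suraj", "Tripathi", "Marketing", "Pune"),
        (6, "Roshni", "Jadhav", "Finance", "Bangalore"),
        (7, "Sandhya", "Jain", "Finance", "Bangalore"),
    ],
)

# SELECT with AND: both conditions must hold for a row to be returned.
it_chennai = [
    row[0]
    for row in conn.execute(
        "SELECT ID FROM emp WHERE Department = 'IT' AND Location = 'Chennai' ORDER BY ID"
    )
]

# UPDATE with AND: rowcount reports how many rows matched both conditions.
updated = conn.execute(
    "UPDATE emp SET Location = 'Delhi' WHERE Department = 'Marketing' AND First_Name = 'Suraj'"
).rowcount

# DELETE with AND: only the row matching both conditions is removed.
deleted = conn.execute(
    "DELETE FROM emp WHERE Last_Name = 'Jain' AND Location = 'Bangalore'"
).rowcount

remaining = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(it_chennai, updated, deleted, remaining)
```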

Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join condition
is satisfied. It is denoted by ⋈.

Example:

EMPLOYEE

EMP_CODE EMP_NAME

101 Stephan

102 Jack

103 Harry

SALARY

EMP_CODE SALARY

101 50000

102 30000

103 25000

1. Operation: (EMPLOYEE ⋈ SALARY)

Result:

EMP_CODE EMP_NAME SALARY

101 Stephan 50000

102 Jack 30000

103 Harry 25000


Types of Join operations:

1. Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on their common
attribute names.

o It is denoted by ⋈.

Example: Let's use the above EMPLOYEE table and SALARY table:

Input:

∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)


Output:

EMP_NAME SALARY

Stephan 50000

Jack 30000

Harry 25000
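A natural join can be sketched in a few lines of code. The version below is an illustrative assumption, not a DBMS implementation: relations are represented as Python lists of dictionaries, and the helper name natural_join is hypothetical. It matches tuples on every attribute name the two relations share and then projects EMP_NAME and SALARY.

```python
def natural_join(r, s):
    """Join two relations (lists of dicts) on all attribute names they share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[c] == b[c] for c in common)
    ]


EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]

joined = natural_join(EMPLOYEE, SALARY)
# Projection onto EMP_NAME and SALARY.
projected = [(t["EMP_NAME"], t["SALARY"]) for t in joined]
print(projected)
```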

2. Outer Join:

The outer join operation is an extension of the join operation. It is used to deal with missing information.

Example:

EMPLOYEE

EMP_NAME STREET CITY

Ram Civil line Mumbai

Shyam Park street Kolkata

Ravi M.G. Street Delhi

Hari Nehru nagar Hyderabad

FACT_WORKERS
EMP_NAME BRANCH SALARY

Ram Infosys 10000

Shyam Wipro 20000

Kuber HCL 30000

Hari TCS 50000

Input:

1. (EMPLOYEE ⋈ FACT_WORKERS)

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

An outer join is basically of three types:

a. Left outer join


b. Right outer join
c. Full outer join

a. Left outer join:


o Left outer join contains the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o In addition, the left outer join keeps the tuples in R that have no matching tuples in S, padding the attributes of S with NULL.

o It is denoted by ⟕.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟕ FACT_WORKERS

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

b. Right outer join:


o Right outer join contains the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o In addition, the right outer join keeps the tuples in S that have no matching tuples in R, padding the attributes of R with NULL.

o It is denoted by ⟖.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟖ FACT_WORKERS

Output:

EMP_NAME BRANCH SALARY STREET CITY

Ram Infosys 10000 Civil line Mumbai

Shyam Wipro 20000 Park street Kolkata

Hari TCS 50000 Nehru nagar Hyderabad

Kuber HCL 30000 NULL NULL

c. Full outer join:


o Full outer join is like a left or right join except that it contains all rows from both tables.
o The full outer join keeps both the tuples in R that have no matching tuples in S and the tuples in S that have no
matching tuples in R on their common attribute names, padding the missing attributes with NULL.

o It is denoted by ⟗.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟗ FACT_WORKERS

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

Kuber NULL NULL HCL 30000
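The three outer joins differ only in which unmatched tuples they keep. The sketch below illustrates this under stated assumptions: relations are lists of dictionaries, NULL is represented by Python's None, and the function outer_join with its keep_left and keep_right flags is a hypothetical helper, not part of any library.

```python
def outer_join(r, s, key, keep_left=False, keep_right=False):
    """Join r and s on `key`; optionally keep unmatched tuples padded with None."""
    r_attrs, s_attrs = list(r[0]), list(s[0])
    result = [{**a, **b} for a in r for b in s if a[key] == b[key]]
    if keep_left:
        s_keys = {b[key] for b in s}
        for a in r:
            if a[key] not in s_keys:
                # Pad the attributes of s with None (NULL), then overlay the r tuple.
                result.append({**{x: None for x in s_attrs}, **a})
    if keep_right:
        r_keys = {a[key] for a in r}
        for b in s:
            if b[key] not in r_keys:
                result.append({**{x: None for x in r_attrs}, **b})
    return result


EMPLOYEE = [
    {"EMP_NAME": "Ram", "STREET": "Civil line", "CITY": "Mumbai"},
    {"EMP_NAME": "Shyam", "STREET": "Park street", "CITY": "Kolkata"},
    {"EMP_NAME": "Ravi", "STREET": "M.G. Street", "CITY": "Delhi"},
    {"EMP_NAME": "Hari", "STREET": "Nehru nagar", "CITY": "Hyderabad"},
]
FACT_WORKERS = [
    {"EMP_NAME": "Ram", "BRANCH": "Infosys", "SALARY": 10000},
    {"EMP_NAME": "Shyam", "BRANCH": "Wipro", "SALARY": 20000},
    {"EMP_NAME": "Kuber", "BRANCH": "HCL", "SALARY": 30000},
    {"EMP_NAME": "Hari", "BRANCH": "TCS", "SALARY": 50000},
]

left = outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME", keep_left=True)
right = outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME", keep_right=True)
full = outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME", keep_left=True, keep_right=True)

print([t["EMP_NAME"] for t in left])
print([t["EMP_NAME"] for t in right])
print([t["EMP_NAME"] for t in full])
```

Ravi appears only when the left side is preserved, and Kuber only when the right side is preserved.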

3. Equi join:

An equi join is the most common form of inner join. It is based on matched data as per the
equality condition. The equi join uses the comparison operator (=).

Example:

CUSTOMER RELATION

CLASS_ID NAME

1 John

2 Harry

3 Jackson
PRODUCT

PRODUCT_ID CITY

1 Delhi

2 Mumbai

3 Noida

Input:

CUSTOMER ⋈ CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID PRODUCT

Output:

CLASS_ID NAME PRODUCT_ID CITY

1 John 1 Delhi

2 Harry 2 Mumbai

3 Jackson 3 Noida
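In SQL, an equi join is usually written as an INNER JOIN with an equality condition in the ON clause. The following sketch assumes Python's sqlite3 module and pairs CLASS_ID with PRODUCT_ID, which is the equality the example output implies; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CLASS_ID INTEGER, NAME TEXT)")
conn.execute("CREATE TABLE PRODUCT (PRODUCT_ID INTEGER, CITY TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?)",
    [(1, "John"), (2, "Harry"), (3, "Jackson")],
)
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?)",
    [(1, "Delhi"), (2, "Mumbai"), (3, "Noida")],
)

# Equi join: rows are paired only when the equality condition holds.
equi_join_rows = conn.execute(
    """
    SELECT c.CLASS_ID, c.NAME, p.PRODUCT_ID, p.CITY
    FROM CUSTOMER AS c
    INNER JOIN PRODUCT AS p ON c.CLASS_ID = p.PRODUCT_ID
    ORDER BY c.CLASS_ID
    """
).fetchall()

for row in equi_join_rows:
    print(row)
```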
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of
the query. It uses operators to perform queries.

Types of Relational operation

1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

Notation: σ p(r)

Where:

σ denotes the selection operator
r is the relation
p is the selection predicate, a propositional logic formula that may use connectives such as AND, OR, and NOT, and
relational operators such as =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300

Input:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500


Perryride L-16 1300
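The select operation maps naturally onto filtering a list with a predicate. This is a sketch under the assumption that a relation is a list of dictionaries; the function name select is a hypothetical helper. The second call shows a predicate that combines two conditions with AND.

```python
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(relation, predicate):
    """sigma_predicate(relation): keep only the tuples for which the predicate is true."""
    return [t for t in relation if predicate(t)]


perryride = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")
# A predicate built with AND: branch is Perryride and the amount exceeds 1400.
large_perryride = select(
    LOAN, lambda t: t["BRANCH_NAME"] == "Perryride" and t["AMOUNT"] > 1400
)

print([t["LOAN_NO"] for t in perryride])
print([t["LOAN_NO"] for t in large_perryride])
```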

2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result. The rest of the
attributes are eliminated from the table.
o It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn


Input:

1. ∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn
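Projection can be sketched as picking the named attributes from each tuple. Because a relation is a set, projection in relational algebra also removes duplicate rows, which the second call below makes visible. The list-of-dictionaries representation and the function name project are illustrative assumptions.

```python
CUSTOMER = [
    {"NAME": "Jones", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Hays", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Curry", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Johnson", "STREET": "Alma", "CITY": "Brooklyn"},
    {"NAME": "Brooks", "STREET": "Senator", "CITY": "Brooklyn"},
]


def project(relation, attributes):
    """Keep only the listed attributes and drop duplicate rows, preserving order."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


name_city = project(CUSTOMER, ["NAME", "CITY"])
cities = project(CUSTOMER, ["CITY"])  # duplicates collapse to three rows

print(name_city)
print(cities)
```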

3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are either
in R or S or both in R & S.

o It eliminates the duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must hold the following conditions:

o R and S must have the same number of attributes, with compatible domains.

o Duplicate tuples are eliminated automatically.

Example:

DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17
Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:

1. ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner
Jones

Lindsay

Jackson

Curry

Williams

Mayes

4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples that are
in both R & S.
o It is denoted by ∩.

1. Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

1. ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Smith

Jones

5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that are
in R but not in S.
o It is denoted by minus (-).

1. Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

1. ∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
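Union, intersection, and set difference correspond directly to operations on sets. The sketch below uses Python's built-in set type on the CUSTOMER_NAME projections of the BORROW and DEPOSITOR relations shown above; building the sets removes the duplicate names (Smith and Johnson), just as projection would.

```python
# Projections onto CUSTOMER_NAME; converting to a set removes duplicates.
borrow_names = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Smith", "Williams"}
depositor_names = {"Johnson", "Smith", "Mayes", "Turner", "Johnson", "Jones", "Lindsay"}

union = borrow_names | depositor_names         # in either relation
intersection = borrow_names & depositor_names  # in both relations
difference = borrow_names - depositor_names    # borrowers who are not depositors

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```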
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the other table.
It is also known as a cross product.
o It is denoted by ×.

Notation: E × D

Example:

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal
Input:

1. EMPLOYEE X DEPARTMENT

Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
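The Cartesian product pairs every tuple of one relation with every tuple of the other, so 3 × 3 rows give 9 results. The sketch below uses itertools.product from Python's standard library on tuples; it also shows, as an illustration, that selecting the rows where EMP_DEPT equals DEPT_NO turns the product into a join.

```python
from itertools import product

# Tuples follow the column order (EMP_ID, EMP_NAME, EMP_DEPT) and (DEPT_NO, DEPT_NAME).
EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Every employee tuple is concatenated with every department tuple.
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]

# Selection on the product: keep rows where EMP_DEPT (index 2) equals DEPT_NO (index 3).
matched = [row for row in cross if row[2] == row[3]]

print(len(cross))
for row in matched:
    print(row)
```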

7. Rename Operation:

The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to STUDENT1.

ρ(STUDENT1, STUDENT)

Relational Calculus

In contrast to Relational Algebra, Relational Calculus is a non-procedural query language, that is, it
tells what to do but never explains how to do it.

Relational calculus exists in two forms −

Tuple Relational Calculus (TRC)

In TRC, the filtering variable ranges over tuples.

Notation − {T | Condition}

Returns all tuples T that satisfy the condition.

For example −

{ T.name | Author(T) AND T.article = 'database' }

Output − Returns tuples with 'name' from Author who has written an article on 'database'.

TRC can be quantified. We can use the existential (∃) and universal (∀) quantifiers.

For example −

{ R | ∃T ∈ Author (T.article = 'database' AND R.name = T.name) }

Output − The above query will yield the same result as the previous one.

Domain Relational Calculus (DRC)

In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as done in
TRC, mentioned above).

Notation −

{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }

Where a1, a2, ..., an are attributes and P stands for a formula built over those attributes.

For example −

{ < article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject = 'database' }

Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is database.

Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves
relational operators.

The expressive power of Tuple Relational Calculus and Domain Relational Calculus is equivalent to that of
Relational Algebra.

Functional Dependency
A functional dependency is a relationship that exists between two attributes or sets of attributes. It typically exists
between the primary key and a non-key attribute within a table.

X → Y

The left side of the FD is known as the determinant, and the right side is known as the dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table because if we
know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

1. Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
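Whether a functional dependency is violated by a given set of rows can be checked mechanically: X → Y fails as soon as two rows agree on X but differ on Y. The sketch below is illustrative only; the function fd_holds and the three sample rows are assumptions. Note that a check on sample data can refute a dependency but cannot prove that it holds for every possible state of the table.

```python
def fd_holds(relation, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs, else True."""
    seen = {}
    for row in relation:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical rows: two different employees share the name "Asha".
employee = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Mumbai"},
]

id_determines_name = fd_holds(employee, ["Emp_Id"], ["Emp_Name"])
name_determines_id = fd_holds(employee, ["Emp_Name"], ["Emp_Id"])

print(id_determines_name, name_determines_id)
```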


Types of Functional dependency

1. Trivial functional dependency


o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Example:

1. Consider a table with two columns Employee_Id and Employee_Name.
2. {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
3. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies.

2. Non-trivial functional dependency
o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty (A ∩ B = ∅), A → B is called a completely non-trivial dependency.

Example:

1. ID → Name
2. Name → DOB

Normalization

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using relationships.
o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?

The main reason for normalizing the relations is removing these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the database
grows. Normalization consists of a series of guidelines that help you create a good
database structure.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly occurs when we cannot insert a new tuple into a
relation due to lack of data.
o Deletion Anomaly: A deletion anomaly refers to the situation where the deletion of data results
in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value requires
multiple rows of data to be updated.

Types of Normal Forms:

Normalization works through a series of stages called normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies the constraints of that form.

Following are the various types of normal forms:

Normal Form Description

1NF A relation is in 1NF if every attribute contains only atomic values.

2NF A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

3NF A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF A stronger definition of 3NF is known as Boyce-Codd normal form.

4NF A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.

5NF A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.

Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF,
5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.

First Normal Form (1NF)


o A relation is in 1NF if every attribute contains only atomic values.
o It states that an attribute of a table cannot hold multiple values. It must hold only a single
value.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP
20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab
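The conversion to 1NF shown above amounts to emitting one row per phone number. The following sketch assumes the multi-valued EMP_PHONE attribute is held as a Python list inside each tuple, which is only a convenient stand-in for the comma-separated cell in the original table.

```python
# Each tuple is (EMP_ID, EMP_NAME, EMP_PHONE list, EMP_STATE); the list models the multi-valued attribute.
unnormalized = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]

# 1NF: repeat the other attributes once for every phone number.
first_nf = [
    (emp_id, name, phone, state)
    for emp_id, name, phones, state in unnormalized
    for phone in phones
]

for row in first_nf:
    print(row)
```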

Second Normal Form (2NF)


o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent on the primary
key.

Example: Let's assume a school stores the data of teachers and the subjects they teach. In a school,
a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38
83 Computer 38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a
proper subset of the candidate key {TEACHER_ID, SUBJECT}. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English
83 Math

83 Computer
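The 2NF decomposition above can be reproduced by projecting the TEACHER rows onto the two attribute groups. This sketch uses plain Python tuples in the column order (TEACHER_ID, SUBJECT, TEACHER_AGE); rejoining the two projections on TEACHER_ID is included to illustrate that no information is lost.

```python
TEACHER = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# TEACHER_DETAIL: the age is stored once per teacher (duplicates removed by the set).
teacher_detail = sorted({(tid, age) for tid, _, age in TEACHER})

# TEACHER_SUBJECT: one row per (teacher, subject) pair.
teacher_subject = [(tid, subject) for tid, subject, _ in TEACHER]

# Joining the two tables on TEACHER_ID reconstructs the original rows.
rejoined = [
    (tid, subject, age)
    for tid, subject in teacher_subject
    for detail_id, age in teacher_detail
    if tid == detail_id
]

print(teacher_detail)
print(rejoined == TEACHER)
```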

Third Normal Form (3NF)


o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency of a non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If a relation is in 2NF and there is no transitive dependency for non-prime attributes, then the relation is in third
normal form.

A relation is in third normal form if it holds at least one of the following conditions for every
non-trivial functional dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago


555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on
the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007


555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

Boyce Codd normal form (BCNF)


o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table, the functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID


For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left side of both functional dependencies is a key.

Relational Decomposition
o When a relation in the relational model is not in an appropriate normal form, decomposition
of the relation is required.
o In a database, decomposition breaks a table into multiple tables.
o If the relation is not decomposed properly, it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.

Types of Decomposition

Lossless Decomposition
o If no information is lost from the relation after decomposition, then the decomposition is
lossless.
o A lossless decomposition guarantees that the join of the decomposed relations results in the same relation
that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives
the original relation.

Example:

EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore
52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation
will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales


33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.
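Whether a decomposition is lossless can be tested by projecting, joining back, and comparing with the original. The sketch below is a simplified illustration: it keeps only three attributes of the EMPLOYEE_DEPARTMENT rows (EMP_ID, EMP_CITY, DEPT_ID), and the helper names project and join_back are hypothetical. Splitting on EMP_ID is lossless, while splitting on EMP_CITY is not, because two employees share the city Mumbai and the join produces spurious tuples.

```python
ATTRS = ("EMP_ID", "EMP_CITY", "DEPT_ID")
R = {
    (22, "Mumbai", 827),
    (33, "Delhi", 438),
    (46, "Bangalore", 869),
    (52, "Mumbai", 575),
    (60, "Noida", 678),
}


def project(rel, attrs):
    """Project a set of tuples (in ATTRS order) onto the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}


def join_back(attrs1, attrs2):
    """Decompose R onto attrs1 and attrs2, then natural-join the two pieces."""
    r1, r2 = project(R, attrs1), project(R, attrs2)
    common = [a for a in attrs1 if a in attrs2]
    out = set()
    for t1 in r1:
        d1 = dict(zip(attrs1, t1))
        for t2 in r2:
            d2 = dict(zip(attrs2, t2))
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                out.add(tuple(merged[a] for a in ATTRS))
    return out


# Common attribute EMP_ID is a key: the join returns exactly R.
lossless = join_back(("EMP_ID", "EMP_CITY"), ("EMP_ID", "DEPT_ID")) == R

# Common attribute EMP_CITY is not a key: extra (spurious) tuples appear.
lossy_result = join_back(("EMP_ID", "EMP_CITY"), ("EMP_CITY", "DEPT_ID"))

print(lossless, len(R), len(lossy_result))
```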

Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then the dependencies of R must either
be a part of R1 or R2 or be derivable from the combination of the functional dependencies of
R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC).
The relation R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because FD A->BC is a part of relation R1(ABC).

What is a Transaction?
A set of logically related operations is known as a transaction. The main operations of a transaction
are:

Read(A): The read operation Read(A) or R(A) reads the value of A from the database and stores it in a
buffer in the main memory.
Write(A): The write operation Write(A) or W(A) writes the value from the buffer back to the database.
(Note: The write does not always reach the database immediately; it may only update the buffer, which
is how dirty reads come into the picture.)

Let us take a debit transaction from an account that consists of the following operations:

1. R(A);
2. A=A-1000;
3. W(A);

Assume A’s value before starting the transaction is 5000.

• The first operation reads the value of A from the database and stores it in a buffer.
• The second operation decreases its value by 1000, so the buffer will contain 4000.
• The third operation writes the value from the buffer to the database, so A’s final value will be
4000.

But it may also be possible that the transaction may fail after executing some of its operations. The
failure can be because of hardware, software or power, etc. For example, if the debit transaction
discussed above fails after executing operation 2, the value of A will remain 5000 in the database which
is not acceptable by the bank. To avoid this, Database has two important operations:

Commit: After all instructions of a transaction are successfully executed, the changes made by a
transaction are made permanent in the database.

Rollback: If a transaction is not able to execute all operations successfully, all the changes made by a
transaction are undone.

Properties of a transaction:

Atomicity: As a transaction is a set of logically related operations, either all of them should be
executed or none. The debit transaction discussed above should either execute all three operations or
none. If the debit transaction fails after executing operations 1 and 2, its new value of 4000 will
not be written to the database, which leads to inconsistency.
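Atomicity can be illustrated by making the debit either complete and commit, or fail and roll back. This is a minimal sketch, assuming Python's sqlite3 module; the account table, the debit function, and the fail_before_commit flag that simulates a crash are all hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 5000)")
conn.commit()


def balance():
    return conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()[0]


def debit(amount, fail_before_commit=False):
    """Debit account A; on a simulated failure, undo the partial work."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_before_commit:
            raise RuntimeError("simulated failure before commit")
        conn.commit()      # all operations succeeded: make the change permanent
    except RuntimeError:
        conn.rollback()    # some operation failed: undo every change of this transaction


debit(1000, fail_before_commit=True)
after_failure = balance()

debit(1000)
after_success = balance()

print(after_failure, after_success)
```

The failed debit leaves the balance at 5000, and only the successful debit changes it to 4000.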

Consistency: If operations of debit and credit transactions on the same account are executed
concurrently, it may leave the database in an inconsistent state.
• For Example, with T1 (debit of Rs. 1000 from A) and T2 (credit of 500 to A) executing
concurrently, the database reaches an inconsistent state.
• Let us assume the Account balance of A is Rs. 5000. T1 reads A(5000) and stores the value in its
local buffer space. Then T2 reads A(5000) and also stores the value in its local buffer space.
• T1 performs A=A-1000 (5000-1000=4000) and 4000 is stored in T1 buffer space. Then T2
performs A=A+500 (5000+500=5500) and 5500 is stored in the T2 buffer space. T1 writes the
value from its buffer back to the database.
• A’s value is updated to 4000 in the database and then T2 writes the value from its buffer back to
the database. A’s value is updated to 5500 which shows that the effect of the debit transaction is
lost and the database has become inconsistent.
• To maintain consistency of the database, we need concurrency control protocols, which are
discussed later. The operations of T1 and T2 with their buffers and database have
been shown in Table 1.

T1          T1’s buffer space   T2          T2’s buffer space   Database
-           -                   -           -                   A=5000
R(A);       A=5000              -           -                   A=5000
-           A=5000              R(A);       A=5000              A=5000
A=A-1000;   A=4000              -           A=5000              A=5000
-           A=4000              A=A+500;    A=5500              A=5000
W(A);       A=4000              -           A=5500              A=4000
-           -                   W(A);       A=5500              A=5500

Table 1
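The interleaving traced in Table 1 can be replayed step by step. The sketch below models the database and the two private buffers as plain Python dictionaries, which is an assumption made purely for illustration; it then compares the interleaved outcome with the result of running the two transactions one after the other.

```python
database = {"A": 5000}
t1_buffer, t2_buffer = {}, {}

t1_buffer["A"] = database["A"]   # T1: R(A) reads 5000
t2_buffer["A"] = database["A"]   # T2: R(A) also reads 5000
t1_buffer["A"] -= 1000           # T1: A = A - 1000 (buffer holds 4000)
t2_buffer["A"] += 500            # T2: A = A + 500  (buffer holds 5500)
database["A"] = t1_buffer["A"]   # T1: W(A) stores 4000
database["A"] = t2_buffer["A"]   # T2: W(A) overwrites it with 5500

interleaved_result = database["A"]

# Serial execution: T2 starts only after T1 has written its result.
serial_result = (5000 - 1000) + 500

print(interleaved_result, serial_result)
```

The interleaved run ends with 5500, whereas a serial run would end with 4500, so the effect of the debit has been lost.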

Isolation: The result of a transaction should not be visible to others before the transaction is committed.
For example, let us assume that A’s balance is Rs. 5000 and T1 debits Rs. 1000 from A. A’s new
balance will be 4000. If T2 credits Rs. 500 to A’s new balance, A will become 4500, and after this T1
fails. Then we have to roll back T2 as well because it is using the value produced by T1. So transaction
results are not made visible to other transactions before it commits.

Durability: Once the database has committed a transaction, the changes made by the transaction should
be permanent. For example, if a person has credited $500000 to his account, the bank can’t say that the update
has been lost. To avoid this problem, multiple copies of the database are stored at different locations.

DBMS Concurrency Control
Concurrency Control is the management procedure that is required for controlling concurrent execution
of the operations that take place on a database.

But before knowing about concurrency control, we should know about concurrent execution.

Concurrent Execution in DBMS


o In a multi-user system, multiple users can access and use the same database at the same time,
which is known as concurrent execution. It means that transactions from different users run
against the same database simultaneously.
o While working with database transactions, multiple users often need to use the database to
perform different operations, and in that case the operations are executed concurrently.
o The simultaneous execution should be performed in an interleaved manner, and no operation
should affect the other executing operations, so that the consistency of the database is
maintained. Executing transaction operations concurrently therefore raises several challenging
problems that need to be solved.

Problems with Concurrent Execution

In a database transaction, the two main operations are READ and WRITE. These two operations must be
managed during the concurrent execution of transactions, because if they are interleaved without
control, the data may become inconsistent. The following problems can occur with the concurrent
execution of operations:

Lost Update Problem (W-W Conflict)

The problem occurs when two different database transactions perform read/write operations on the
same database item in an interleaved manner (i.e., concurrent execution), so that one update
overwrites the other. The value of the item becomes incorrect, which makes the database inconsistent.

For example:

Consider two transactions, TX and TY, that are performed on the same account A, where the balance
of account A is $300.

o At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
o At time t2, transaction TX deducts $50 from account A, which becomes $250 (only deducted, not
yet written).
o At time t3, transaction TY reads the value of account A, which is still $300 because TX has
not written its update yet.
o At time t4, transaction TY adds $100 to account A, which becomes $400 (only added, not yet
written).
o At time t6, transaction TX writes the value of account A, which is updated to $250, as TY
has not written its value yet.
o Similarly, at time t7, transaction TY writes the value of account A that it computed at
time t4, i.e., $400. It means the value written by TX is lost, i.e., $250 is lost.

Hence the data becomes incorrect, and the database is left in an inconsistent state.
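
The schedule above can be replayed in a few lines of Python. This is only a minimal sketch: the dictionary named database and the two buffer variables are illustrative stand-ins for the stored value and each transaction's private working copy, not part of any real DBMS.

```python
# Hypothetical replay of the TX/TY schedule described above.
# "database" stands in for the stored value of account A; each
# transaction works on its own private copy (its buffer).
database = {"A": 300}

tx_buffer = database["A"]   # t1: TX reads A = 300
tx_buffer -= 50             # t2: TX deducts 50 in its buffer -> 250
ty_buffer = database["A"]   # t3: TY reads A, which is still 300
ty_buffer += 100            # t4: TY adds 100 in its buffer -> 400
database["A"] = tx_buffer   # t6: TX writes 250
database["A"] = ty_buffer   # t7: TY writes 400 and overwrites TX's update

final_balance = database["A"]
serial_balance = 300 - 50 + 100  # result of running TX and TY one after the other

print(final_balance, serial_balance)  # 400 350
```

The interleaved schedule ends with $400, while running the two transactions one after the other would give $350, so the deduction made by TX has been lost.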

Dirty Read Problems (W-R Conflict)

The dirty read problem occurs when one transaction updates a database item and then fails, and
before the update is rolled back, the updated item is read by another transaction. This creates a
write-read (W-R) conflict between the two transactions.

For example:

Consider two transactions, TX and TY, performing read/write operations on account A, where the
available balance in account A is $300:

o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, which becomes $350.
o At time t3, transaction TX writes the updated value to account A, i.e., $350.
o Then at time t4, transaction TY reads account A, which is read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the value changes back
to $300 (as initially).
o But transaction TY still holds $350 as the value of account A, although that value was never
committed. This is a dirty read, and the situation is therefore known as the Dirty Read Problem.
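
As a rough illustration, the same steps can be replayed in Python. The names database, committed_value, tx_value and ty_value are assumptions made for this sketch, and the rollback is simulated by simply restoring the last committed value.

```python
# Hypothetical replay of the dirty read schedule described above.
database = {"A": 300}
committed_value = database["A"]  # last committed value, restored on rollback

tx_value = database["A"]         # t1: TX reads 300
tx_value += 50                   # t2: TX adds 50 -> 350
database["A"] = tx_value         # t3: TX writes 350 (not yet committed)
ty_value = database["A"]         # t4: TY reads the uncommitted 350
database["A"] = committed_value  # t5: TX rolls back, A is 300 again

# TY now holds a value that never became part of the committed database.
is_dirty = ty_value != database["A"]
print(ty_value, database["A"], is_dirty)  # 350 300 True
```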

Unrepeatable Read Problem (R-W Conflict)

Also known as the Inconsistent Retrievals Problem, it occurs when a transaction reads two different
values for the same database item.

For example:
Consider two transactions, TX and TY, performing read/write operations on account A, which has
an available balance of $300:

o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the available
balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that will be
read as $400.
o It means that within the same transaction TX, two different values of account A are read, i.e.,
$300 initially, and $400 after the update made by transaction TY. It is an unrepeatable read
and is therefore known as the Unrepeatable Read Problem.
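
A short Python sketch of this schedule is given here for illustration. The variable names are hypothetical, and the dictionary only imitates the stored balance.

```python
# Hypothetical replay of the unrepeatable read schedule described above.
database = {"A": 300}

tx_first_read = database["A"]   # t1: TX reads 300
ty_value = database["A"]        # t2: TY reads 300
ty_value += 100                 # t3: TY adds 100 -> 400
database["A"] = ty_value        # t4: TY writes 400
tx_second_read = database["A"]  # t5: TX reads again and now sees 400

# The same transaction TX has seen two different values of A.
print(tx_first_read, tx_second_read)  # 300 400
```
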
Thus, in order to maintain consistency in the database and avoid such problems that take place in
concurrent execution, management is needed, and that is where the concept of Concurrency Control
comes into play.

DBMS – Deadlock
In a multi-process system, deadlock is an unwanted situation that arises in a shared resource
environment, where a process indefinitely waits for a resource that is held by another process.

For example, assume a set of transactions {T0, T1, T2, ..., Tn}. T0 needs a resource X to complete its
task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is waiting
for resource Z, which is held by T0. Thus, all the transactions wait for each other to release
resources. In this situation, none of them can finish its task. This situation is known as a deadlock.
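
One common way to picture this situation is as a cycle of waiting transactions. The sketch below assumes, for simplicity, that each transaction waits for at most one other transaction. The function find_cycle and the dictionary waits_for are hypothetical names used only for illustration.

```python
def find_cycle(waits_for):
    """Return the transactions that form a waiting cycle, or None.

    waits_for maps a transaction to the single transaction it is waiting
    on (a simplifying assumption made for this sketch).
    """
    for start in waits_for:
        path = []
        seen = set()
        current = start
        # Follow the chain of waits until it ends or repeats.
        while current in waits_for and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_for[current]
        if current in seen:
            # The chain came back to a transaction already on the path.
            return path[path.index(current):]
    return None


# T0 waits for T1 (holds X), T1 waits for T2 (holds Y), T2 waits for T0 (holds Z).
waits_for = {"T0": "T1", "T1": "T2", "T2": "T0"}
cycle = find_cycle(waits_for)
print(cycle)  # ['T0', 'T1', 'T2']
```

If the last wait is removed, so that T2 is not waiting for anything, the function returns None because the chain of waits ends instead of closing into a cycle.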

Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions involved
in the deadlock are either rolled back or restarted.

Deadlock Prevention

To prevent any deadlock situation in the system, the DBMS inspects all the operations that
transactions are about to execute and analyzes whether they can create a deadlock situation. If it
finds that a deadlock situation might occur, then that transaction is not allowed to be executed.

There are deadlock prevention schemes that use a timestamp ordering mechanism for transactions in
order to predetermine a deadlock situation.

Wait-Die Scheme

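This scheme allows an older transaction to wait for an item held by a younger one; but when a
younger transaction requests an item held by an older one, the younger transaction is killed
(rolled back) and restarted later.

Wound-Wait Scheme

This scheme allows a younger transaction to wait for an item held by an older one; but when an older
transaction requests an item held by a younger one, the older transaction forces the younger one to
abort and release the item.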
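
The two rules can be written as small decision functions. This is a minimal sketch under the usual assumption that a smaller timestamp means an older transaction. The function names and the returned strings are illustrative only.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester waits, a younger requester dies."""
    if requester_ts < holder_ts:      # requester is older than the holder
        return "requester waits"
    return "requester aborts"         # younger requester is rolled back


def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds the holder, a younger one waits."""
    if requester_ts < holder_ts:      # requester is older than the holder
        return "holder aborts"        # younger holder is forced to release the item
    return "requester waits"


# Transaction with timestamp 5 is older than the one with timestamp 9.
print(wait_die(5, 9), "|", wait_die(9, 5))
print(wound_wait(5, 9), "|", wound_wait(9, 5))
```

In both schemes it is always the younger transaction that is rolled back, so a cycle of waiting transactions cannot form.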

Redundant Array of Independent Disks (RAID)

In the Redundant Array of Independent Disks technology, two or more secondary storage devices are
connected so that the devices operate as one storage medium. A RAID array consists of several disks
linked together for a variety of purposes.

Disk arrays are categorized by their RAID levels.

• RAID 0: At this level, disks are organized in a striped array. Data is divided into blocks,
and the blocks are distributed over the disks. Reading and writing of data occur on the disks
in parallel, which improves performance and speed. Level 0 provides no parity and no backup.

Raid-0

• RAID 1: Mirroring is used in RAID 1. A RAID controller copies data across all disks in an
array when data is sent to it. In case of failure, RAID level 1 provides 100% redundancy.
Raid-1

• RAID 2: The data in RAID 2 is striped on different disks, and the Error Correction Code (ECC)
is computed using a Hamming code. As in level 0, the data is striped, but here each bit within
a word is stored on a separate disk, and ECC codes for the data words are saved on a separate
set of disks. As a result of its complex structure and high cost, RAID 2 is not commercially
deployed.

Raid-2

• RAID 3: Data is striped across multiple disks in RAID 3. A parity bit is generated for each
data word and stored on a different disk. Thus, a single-disk failure can be tolerated.
Raid-3

• RAID 4: This level involves writing an entire block of data onto data disks, and then generating
the parity and storing it on a dedicated parity disk. At level 3, bytes are striped, while at
level 4, blocks are striped. Both levels 3 and 4 require a minimum of three disks.

Raid-4

• RAID 5: The data blocks in RAID 5 are written to different disks, but the parity bits are spread
out across all the data disks rather than being stored on a separate disk.
Raid-5

• RAID 6: The RAID 6 level extends the level 5 concept. A pair of independent parities is
generated and stored on multiple disks at this level. At least four disk drives are needed for
this level.

Raid-6
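
The parity used by RAID levels 3 to 5 is commonly computed as the XOR of the data blocks, which is what allows a single failed disk to be rebuilt. The sketch below illustrates only that idea. The block contents and the helper name xor_blocks are made up for the example, and real controllers work on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, each holding one small block (illustrative data).
data_blocks = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data_blocks)

# Simulate the loss of the second disk and rebuild it from the
# surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'EFGH'
```
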
Record: The term record in DBMS refers to a collection of related data items, organized as a set of
fields within a table, that describe a particular topic or theme. As an example, college departments
keep student records that hold each student's grades and scores.

File Organization in DBMS


A database consists of a huge amount of data. In an RDBMS, the data is grouped into tables, and each
table has related records. A user sees the data in the form of tables, but in actuality, this huge
amount of data is stored on physical storage in the form of files.

File
A file is a named collection of related information that is recorded on secondary storage such
as magnetic disks, magnetic tapes, and optical disks.

The Objective of File Organization

• It helps in the faster selection of records, i.e., it makes retrieval faster.
• Operations such as inserting, deleting, and updating records become faster and easier.
• It prevents duplicate records from being inserted by these operations.
• It helps in storing the records or the data efficiently at a minimal cost.

Types of File Organizations


Various methods have been introduced to organize files. Each method has advantages and
disadvantages with respect to access or selection. Thus it is up to the programmer to decide
the best-suited file organization method according to the requirements.

Some types of File Organizations are:


• Sequential File Organization
• Heap File Organization
• Hash File Organization
• B+ Tree File Organization
• Clustered File Organization
• ISAM (Indexed Sequential Access Method)

1. Heap (Unordered) File Organization

• Description: Records are stored in no particular order, wherever space is available.


• Advantages: Simple to implement and suitable for small data volumes.
• Disadvantages: Inefficient for search as it requires scanning the entire file.

2. Sequential File Organization

• Description: Data is stored in a sequential order based on a sorting key.


• Advantages: Fast and sorted data retrieval.
• Disadvantages: Insertion and deletion require rearranging the file.
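
The trade-off of a sequential file can be sketched with a sorted Python list: lookups can use binary search, while an insertion has to shift the records that follow. The list sorted_keys and the function find are hypothetical names, and an in-memory list only imitates a file on disk.

```python
import bisect

# Keys kept in sorted order, as in a sequential file (illustrative data).
sorted_keys = [10, 20, 30, 40]


def find(keys, key):
    """Binary search: return the position of key, or -1 if it is absent."""
    i = bisect.bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else -1


position = find(sorted_keys, 30)

# Inserting 25 keeps the order but shifts 30 and 40 one place to the right.
bisect.insort(sorted_keys, 25)

print(position, sorted_keys, find(sorted_keys, 30))
```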

3. Clustered File Organization

• Description: Similar records are stored together based on a clustering field.


• Advantages: Efficient for queries accessing related records.
• Disadvantages: Complex to maintain and less flexible.

4. Hash File Organization

• Description: Uses a hash function to compute the address of a record.


• Advantages: Fast for equality searches.
• Disadvantages: Poor performance for range queries.
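
A toy version of hash file organization is sketched here to show why equality searches are fast and range queries are not. The bucket count, the modulo hash function, and the sample records are all assumptions chosen for illustration.

```python
NUM_BUCKETS = 4  # assumed number of buckets for this toy example
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_of(key):
    """Toy hash function for integer keys."""
    return key % NUM_BUCKETS


def insert(key, record):
    buckets[bucket_of(key)].append((key, record))


def lookup(key):
    """Equality search: only one bucket has to be examined."""
    for k, record in buckets[bucket_of(key)]:
        if k == key:
            return record
    return None


def range_query(low, high):
    """Range search: hashing scatters neighbouring keys, so every bucket is scanned."""
    return sorted((k, r) for bucket in buckets for k, r in bucket if low <= k <= high)


for key, name in [(101, "Asha"), (102, "Ravi"), (205, "Meena"), (308, "Kiran")]:
    insert(key, name)

print(lookup(205))
print(range_query(100, 210))
```
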
5. B+ Tree File Organization

• Description: Data is stored hierarchically in a balanced tree structure.


• Advantages: Efficient for both equality and range queries.
• Disadvantages: Requires additional storage for tree maintenance.

6. Indexed File Organization

• Description: An index is created to map keys to record locations, like a book's index.
• Advantages: Fast retrieval through index traversal.
• Disadvantages: Extra storage required for the index.
