unit-3 DBMS
SQL: QUERIES, CONSTRAINTS, TRIGGERS: form of basic SQL query, UNION, INTERSECT,
and EXCEPT, Nested Queries, aggregation operators, NULL values, complex integrity constraints
in SQL, triggers and active databases. Schema Refinement: Problems caused by redundancy,
decompositions, problems related to decomposition, reasoning about functional dependencies, First,
Second, Third normal forms, BCNF, lossless join decomposition, multivalued dependencies, Fourth
normal form, Fifth normal form.
SQL commands: SQL commands are essential for managing databases effectively. These
commands are divided into categories such as Data Definition Language (DDL), Data
Manipulation Language (DML), Data Control Language (DCL), Data Query Language
(DQL), and Transaction Control Language (TCL).
1. Data Definition Language (DDL) in SQL
DDL, or Data Definition Language, consists of the SQL commands that are used to
define, alter, and delete database structures such as tables, indexes,
and schemas. It deals with descriptions of the database schema and is used
to create and modify the structure of database objects in the database.
Common DDL Commands
Command Description Syntax
Example of DDL
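CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
department VARCHAR(50),
hire_date DATE
);
In this example, a new table called employees is created with columns for employee ID,
first name, last name, department, and hire date. The department column is included
because the DQL and DML examples that follow filter on it and insert into it.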
2. Data Query Language (DQL) in SQL
DQL statements are used to query the data stored in the database. The SELECT command
gets data out of the database so that operations can be performed on
it. When a SELECT is run against one or more tables, the result is compiled into a
temporary result table, which is displayed or returned to the calling program.
DQL Command
Command Description Syntax
Example of DQL
SELECT first_name, last_name, hire_date
FROM employees
WHERE department = 'Sales'
ORDER BY hire_date DESC;
This query retrieves employees’ first and last names, along with their hire dates, from the
employees table, specifically for those in the ‘Sales’ department, sorted by hire date from
most recent to oldest.
3. Data Manipulation Language (DML) in SQL
The SQL commands that deal with the manipulation of data present in the database
belong to DML, or Data Manipulation Language, and this includes most of the SQL
statements. DML commands insert, update, and delete the rows stored in tables. Some
classifications also group concurrency and procedure-call commands such as LOCK and
CALL with the DML statements.
Common DML Commands
Command | Description | Syntax
LOCK | Controls table concurrency | LOCK TABLE table_name IN lock_mode;
CALL | Calls a PL/SQL or Java subprogram | CALL procedure_name(arguments);
Example of DML
INSERT INTO employees (first_name, last_name, department)
VALUES ('Jane', 'Smith', 'HR');
This query inserts a new record into the employees table with the first name ‘Jane’, last
name ‘Smith’, and department ‘HR’.
4. Data Control Language (DCL) in SQL
DCL (Data Control Language) includes commands such
as GRANT and REVOKE which mainly deal with the rights, permissions, and other
controls of the database system. These commands are used to control access to data in the
database by granting or revoking permissions.
Common DCL Commands
Command Description Syntax
Example of DCL
GRANT SELECT, UPDATE ON employees TO user_name;
This command grants the user user_name the permissions to select and update records in
the employees table.
5. Transaction Control Language (TCL) in SQL
Transactions group a set of tasks into a single execution unit. Each transaction begins
with a specific task and ends when all the tasks in the group are successfully completed. If
any of the tasks fail, the transaction fails. Therefore, a transaction has only two
results: success or failure.
Common TCL Commands
Command Description Syntax
Example of TCL
BEGIN TRANSACTION;
UPDATE employees SET department = 'Marketing' WHERE department = 'Sales';
SAVEPOINT before_update;
UPDATE employees SET department = 'IT' WHERE department = 'HR';
ROLLBACK TO SAVEPOINT before_update;
COMMIT;
In this example, a transaction is started, the first update is made, and a savepoint is set.
The second update is then undone by rolling back to the savepoint, so only the first update
is made permanent when the transaction is committed.
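The savepoint behaviour described above can be tried out with a minimal sketch that runs the same statements through Python's built-in sqlite3 module. The two-column employees table and its sample rows are assumptions made for illustration, and SQLite uses BEGIN where other systems use BEGIN TRANSACTION or START TRANSACTION.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / ROLLBACK / COMMIT statements below are sent as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employees (first_name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", "Sales"), ("Ravi", "HR")],  # hypothetical sample rows
)

conn.execute("BEGIN")
conn.execute("UPDATE employees SET department = 'Marketing' WHERE department = 'Sales'")
conn.execute("SAVEPOINT before_update")
conn.execute("UPDATE employees SET department = 'IT' WHERE department = 'HR'")
conn.execute("ROLLBACK TO SAVEPOINT before_update")  # undoes only the HR -> IT change
conn.execute("COMMIT")

rows = conn.execute(
    "SELECT first_name, department FROM employees ORDER BY first_name"
).fetchall()
print(rows)
```

The Sales-to-Marketing change survives the commit, while the HR-to-IT change is discarded because it happened after the savepoint.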
Important SQL Commands
1. SELECT: Used to retrieve data from a database.
2. INSERT: Used to add new data to a database.
3. UPDATE: Used to modify existing data in a database.
4. DELETE: Used to remove data from a database.
5. CREATE TABLE: Used to create a new table in a database.
6. ALTER TABLE: Used to modify the structure of an existing table.
7. DROP TABLE: Used to delete an entire table from a database.
8. WHERE: Used to filter rows based on a specified condition.
9. ORDER BY: Used to sort the result set in ascending or descending order.
10. JOIN: Used to combine rows from two or more tables based on a related column
between them.
QUERIES:
A query in a DBMS is a request made by a user or application to retrieve or manipulate
data stored in a database. This request is typically formulated using a structured query
language (SQL) or a query interface provided by the DBMS. The primary purpose of a
query is to specify precisely what data is needed and how it should be retrieved or
modified.
SQL (Structured Query Language)
A standardized programming language used to interact with relational databases. SQL
provides a set of commands for querying, updating, and managing databases.
Table
A fundamental component of a relational database, representing a collection of related data
organized into rows and columns. Each table in a database typically corresponds to a
specific entity or concept.
Field/Column
A single piece of data stored within a table, representing a specific attribute or
characteristic of the entities described by the table.
Record/Row
A complete set of data representing an individual instance or entity stored within a table.
Each row contains values for each field/column defined in the table schema.
Primary Key
A unique identifier for each record in a table, ensuring that each row can be uniquely
identified and accessed. Primary keys are used to establish relationships between tables and
enforce data integrity.
Query Language
The language used to communicate with a database management system. This language
allows users to perform operations such as data retrieval, manipulation, and schema
definition.
Major Commands in SQL with Examples
To illustrate the major SQL commands, let's use a SQLite database file named
`company.db`, which contains a table named `employees`. We'll demonstrate various SQL
commands with real changes to this database.
Example Database Structure
Table: employees (columns: an integer ID, name, age, department)
1|John Doe|30|HR
2|Jane Smith|35|Finance
3|Michael Lee|40|IT
SELECT Statement
The SELECT statement is used to retrieve data from one or more tables in a database.
Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example
SELECT * FROM employees WHERE department = 'IT';
This query selects all columns from the "employees" table where the department is 'IT'.
Output
3|Michael Lee|40|IT
INSERT Statement
The INSERT statement is used to add new records into a table.
Syntax
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);
Example
INSERT INTO employees (name, age, department)
VALUES ('Sarah Johnson', 28, 'Marketing');
This query inserts a new employee record into the "employees" table with specified values.
Output
To check whether the new data has been inserted successfully, you can execute the SELECT
command, like this:
SELECT * FROM employees;
This returns the entire table, and you can see that the new data has been added to the
database:
1|John Doe|30|HR
2|Jane Smith|35|Finance
3|Michael Lee|40|IT
4|Sarah Johnson|28|Marketing
UPDATE Statement
The UPDATE statement is used to modify existing records in a table.
Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Example
UPDATE employees
SET department = 'Operations'
WHERE name = 'Michael Lee';
This query updates the department of the employee named 'Michael Lee' to 'Operations'.
Output
Let's run the SELECT command to see the updated database:
SELECT * FROM employees;
You can see that the database has been updated and Michael's department is now set to
Operations:
1|John Doe|30|HR
2|Jane Smith|35|Finance
3|Michael Lee|40|Operations
4|Sarah Johnson|28|Marketing
DELETE Statement
The DELETE statement is used to remove existing records from a table.
Syntax
DELETE FROM table_name
WHERE condition;
Example
DELETE FROM employees
WHERE age > 35;
This query deletes records from the "employees" table where the age is greater than 35.
Output
Execute the SELECT command to check the updated database:
SELECT * FROM employees;
You can see that Michael has been removed from the database as he is the only one with an
age over 35.
1|John Doe|30|HR
2|Jane Smith|35|Finance
4|Sarah Johnson|28|Marketing
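The whole walkthrough above can be replayed in one go. The sketch below uses an in-memory SQLite database as a stand-in for `company.db`; the name `id` for the first column is an assumption, since the notes only show its values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for company.db
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, age INTEGER, department TEXT)"  # 'id' is assumed
)
conn.executemany(
    "INSERT INTO employees (name, age, department) VALUES (?, ?, ?)",
    [("John Doe", 30, "HR"), ("Jane Smith", 35, "Finance"), ("Michael Lee", 40, "IT")],
)

# The INSERT, UPDATE, and DELETE statements from the examples, in order.
conn.execute(
    "INSERT INTO employees (name, age, department) "
    "VALUES ('Sarah Johnson', 28, 'Marketing')"
)
conn.execute("UPDATE employees SET department = 'Operations' WHERE name = 'Michael Lee'")
conn.execute("DELETE FROM employees WHERE age > 35")

remaining = conn.execute("SELECT * FROM employees ORDER BY id").fetchall()
for row in remaining:
    print("|".join(str(value) for value in row))
```

The printed rows match the final listing above: Michael Lee is gone, and the other three employees keep their original IDs.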
CONSTRAINTS:
Types of Constraints:
Domain Constraints:
These define the permissible values or types of data that a column can hold.
Examples: NOT NULL (prevents null values), CHECK (ensures values meet specific
criteria), DEFAULT (sets a default value).
Entity Integrity Constraints:
These ensure that each record or row in a table is unique and identifiable.
Examples: PRIMARY KEY (uniquely identifies each row), UNIQUE (ensures uniqueness for
specific columns).
Referential Integrity Constraints:
These maintain consistency between related tables by ensuring that foreign key values exist
in the primary key of the related table.
Example: FOREIGN KEY (establishes relationships between tables).
Informational Constraints:
These are constraints that are declared but not enforced by the database manager; they
can be used for optimization or documentation purposes.
Example: in IBM Db2, a constraint declared with the NOT ENFORCED attribute.
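The enforced constraint types above can be seen rejecting bad rows in a short sketch using Python's sqlite3 module. The departments and staff tables, their columns, and the sample values are all hypothetical; note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,                      -- entity integrity
        name     TEXT NOT NULL,                            -- domain: no NULLs
        age      INTEGER CHECK (age >= 18),                -- domain: value rule
        status   TEXT DEFAULT 'active',                    -- domain: default value
        dept_id  INTEGER REFERENCES departments (dept_id)  -- referential integrity
    )
    """
)
conn.execute("INSERT INTO departments VALUES (1, 'HR')")
conn.execute("INSERT INTO staff (staff_id, name, age, dept_id) VALUES (1, 'Asha', 30, 1)")


def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


insert = "INSERT INTO staff (staff_id, name, age, dept_id) VALUES "
results = {
    "check": rejected(insert + "(2, 'Ravi', 15, 1)"),        # age below 18
    "not_null": rejected(insert + "(3, NULL, 40, 1)"),       # missing name
    "primary_key": rejected(insert + "(1, 'Dup', 40, 1)"),   # duplicate staff_id
    "foreign_key": rejected(insert + "(4, 'Mira', 25, 99)"), # no such department
}
default_status = conn.execute("SELECT status FROM staff WHERE staff_id = 1").fetchone()[0]
print(results, default_status)
```

Each of the four bad inserts is refused, and the row inserted without a status picks up the DEFAULT value.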
TRIGGERS:
SQL triggers are a critical feature in database management systems (DBMS) that provide
automatic execution of a set of SQL statements when specific database events, such
as INSERT, UPDATE, or DELETE operations, occur. Triggers are commonly used to
maintain data integrity, track changes, and enforce business rules automatically, without
needing manual input.
Syntax
CREATE TRIGGER [trigger_name]
[BEFORE | AFTER]
{INSERT | UPDATE | DELETE}
ON [table_name]
FOR EACH ROW
BEGIN
-- SQL statements to run when the trigger fires
END;
Types of SQL Triggers
Triggers can be categorized into different types based on the action they are associated
with:
1. DDL Triggers
DDL triggers are activated by Data Definition Language (DDL) command events such
as CREATE_TABLE, CREATE_VIEW, DROP_TABLE, DROP_VIEW, and ALTER_TABLE. They allow us
to track changes in the structure of the database. A DDL trigger can also be written to
prevent table creation, alteration, or deletion in the database.
2. DML Triggers
DML triggers are set off by Data Manipulation Language (DML) command events, namely
INSERT, UPDATE, and DELETE. DML triggers are used for data validation, ensuring
that modifications to a table are done under controlled conditions.
3. Logon Triggers
These triggers are fired in response to logon events. Logon triggers are useful
for monitoring user sessions or restricting user access to the database. In SQL Server, a
logon trigger fires after authentication succeeds but before the user session is established,
so PRINT statement messages and any errors generated by the trigger are written
to the SQL Server error log. Logon triggers do not fire if authentication fails.
These triggers can be used to track login activity or to limit the number of
sessions that a given login can have, in order to audit and manage server sessions.
Form of basic SQL query:
Here are the primary components of SQL queries:
1. SELECT Clause: This is where you specify the columns you want to retrieve. Use an
asterisk (*) to retrieve all columns.
2. FROM Clause: This specifies from which table or tables you want to retrieve the data.
3. WHERE Clause (optional): This allows you to filter the results based on a condition.
4. DISTINCT Keyword (optional): Placed right after SELECT, it indicates that the answer
should not contain duplicates. If we write the query without DISTINCT, SQL does not
eliminate the duplicates.
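The components of a basic query can be seen working together in a small sketch using Python's sqlite3 module. The employees rows reuse names from the earlier examples, plus one hypothetical row added so that a duplicate department appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("John Doe", 30, "HR"),
        ("Jane Smith", 35, "Finance"),
        ("Michael Lee", 40, "IT"),
        ("Priya Nair", 45, "IT"),  # hypothetical extra row to create a duplicate
    ],
)

# SELECT picks the column, FROM names the table, WHERE filters the rows.
with_duplicates = conn.execute(
    "SELECT department FROM employees WHERE age > 30 ORDER BY department"
).fetchall()

# Adding DISTINCT removes the repeated 'IT' row from the answer.
without_duplicates = conn.execute(
    "SELECT DISTINCT department FROM employees WHERE age > 30 ORDER BY department"
).fetchall()

print(with_duplicates)
print(without_duplicates)
```

Without DISTINCT the department IT is listed once per matching employee; with DISTINCT it is listed once.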
UNION:
The SQL UNION operator combines the result sets of two or more SELECT
queries into a single result set. It is useful for bringing together data from
multiple tables, especially when the tables have similar structures.
By default, UNION removes duplicate rows, ensuring that the result set
contains only distinct records; UNION ALL keeps the duplicates.
There are some rules for using the SQL UNION operator.
Rules for SQL UNION
Each SELECT statement used within UNION must return the same number of columns.
The corresponding columns must have compatible data types.
The columns in each SELECT statement must be in the same order.
Syntax:
The Syntax of the SQL UNION operator is:
SELECT columnnames FROM table1
UNION
SELECT columnnames FROM table2;
The UNION operator returns unique rows by default. To keep duplicate rows, use UNION
ALL.
Note: The difference between UNION and UNION ALL is that UNION removes duplicate rows
from the result set, whereas UNION ALL retains all rows, including duplicates.
Examples of SQL UNION
Let’s look at an example of UNION operator in SQL to understand it better.
Let’s create two tables “Emp1” and “Emp2”;
Emp1 Table
Write the following SQL query to create Emp1 table.
CREATE TABLE Emp1(
EmpID INT PRIMARY KEY,
Name VARCHAR(50),
Country VARCHAR(50),
Age INT,
mob INT
);
-- Insert some sample data into the Emp1 table
INSERT INTO Emp1 (EmpID, Name, Country, Age, mob)
VALUES (1, 'Shubham', 'India', 23, 738479734),
(2, 'Aman', 'Australia', 21, 436789555),
(3, 'Naveen', 'Sri lanka', 24, 34873847),
(4, 'Aditya', 'Austria', 21, 328440934),
(5, 'Nishant', 'Spain', 22, 73248679);
Emp2 Table
Write the following SQL query to create Emp2 table
CREATE TABLE Emp2(
EmpID INT PRIMARY KEY,
Name VARCHAR(50),
Country VARCHAR(50),
Age INT,
mob INT
);
-- Insert some sample data into the Emp2 table
INSERT INTO Emp2 (EmpID, Name, Country, Age, mob)
VALUES (1, 'Tommy', 'England', 23, 738985734),
(2, 'Allen', 'France', 21, 43678055),
(3, 'Nancy', 'India', 24, 34873847),
(4, 'Adi', 'Ireland', 21, 320254934),
(5, 'Sandy', 'Spain', 22, 70248679);
Output (Country values from both tables, with duplicates retained):
Country
Australia
Austria
England
France
India
India
Ireland
Spain
Spain
Sri lanka
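The ten-row listing above, with India and Spain repeated, is consistent with a UNION ALL over the Country columns of Emp1 and Emp2. The sketch below, using Python's sqlite3 module, runs both UNION and UNION ALL on those columns so the difference can be checked; the exact queries and the ORDER BY are assumptions, and only the EmpID and Country columns are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
sample = {
    "Emp1": ["India", "Australia", "Sri lanka", "Austria", "Spain"],
    "Emp2": ["England", "France", "India", "Ireland", "Spain"],
}
for table, countries in sample.items():
    conn.execute(f"CREATE TABLE {table} (EmpID INTEGER PRIMARY KEY, Country TEXT)")
    conn.executemany(
        f"INSERT INTO {table} (EmpID, Country) VALUES (?, ?)",
        list(enumerate(countries, start=1)),
    )

# UNION removes duplicate rows from the combined result.
union_rows = [row[0] for row in conn.execute(
    "SELECT Country FROM Emp1 UNION SELECT Country FROM Emp2 ORDER BY Country"
)]

# UNION ALL keeps every row, so India and Spain appear twice.
union_all_rows = [row[0] for row in conn.execute(
    "SELECT Country FROM Emp1 UNION ALL SELECT Country FROM Emp2 ORDER BY Country"
)]

print(len(union_rows), union_rows)
print(len(union_all_rows), union_all_rows)
```

UNION yields eight distinct countries, while UNION ALL yields all ten rows.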
INTERSECT:
In SQL, the INTERSECT clause is used to retrieve the common
records between two SELECT queries. It returns only the rows that are present in both
result sets. This makes INTERSECT an essential clause when we need to find overlapping
data between two or more queries.
This section explains the SQL INTERSECT clause and its key characteristics through
examples, including its usage with conditions
like BETWEEN and LIKE.
What is SQL INTERSECT?
The INTERSECT clause in SQL is used to combine two SELECT statements but the
dataset returned by the INTERSECT statement will be the intersection of the data sets of
the two SELECT statements. In simple words, the INTERSECT statement will return
only those rows that will be common to both of the SELECT statements.
The INTERSECT operator is a set operation in SQL, similar to UNION and EXCEPT.
While UNION combines results from two queries and removes
duplicates, INTERSECT returns only the records that exist in both queries, ensuring
uniqueness.
Customers Table
Orders Table
Example 1: Basic INTERSECT Query
In this example, we retrieve customers who exist in both
the Customers and Orders tables. The INTERSECT operator ensures that only those
customers who have placed an order appear in the result.
Query:
SELECT CustomerID
FROM Customers
INTERSECT
SELECT CustomerID
FROM Orders;
Output:
CustomerID
2
3
5
6
7
8
Explanation:
The query returns only those customers who appear in both
the Customers and Orders tables.
If a customer exists in Customers but has never placed an order, they won’t appear in the
result.
Customer IDs 2, 3, 5, 6, 7, and 8 appear in both the Customers and Orders tables.
Example 2: Using INTERSECT with BETWEEN Operator
In this example, we apply the INTERSECT operator along with the BETWEEN condition to
filter records based on a specified range. The query retrieves customers
whose CustomerID falls between 3 and 8 and who have placed an order. The result
contains only the common CustomerID values that meet both conditions.
Query:
SELECT CustomerID
FROM Customers
WHERE CustomerID BETWEEN 3 AND 8
INTERSECT
SELECT CustomerID
FROM Orders;
Output:
CustomerID
3
5
6
7
8
Explanation:
The first SELECT statement filters customers with CustomerID between 3 and 8.
The INTERSECT operator ensures that only customers from this filtered set who have
placed an order are included in the result.
Customers 3, 5, 6, 7, and 8 fall within the specified range (3 to 8) and have placed orders.
Example 3: Using INTERSECT with LIKE Operator
In this example, we use the INTERSECT operator along with the LIKE operator to find
customers whose FirstName starts with the letter ‘J’ in the Customers table and who also
appear in the Orders table.
Query:
SELECT CustomerID
FROM Customers
WHERE FirstName LIKE 'J%'
INTERSECT
SELECT CustomerID
FROM Orders;
Output:
CustomerID
2
Explanation:
The query finds customers whose first name starts with ‘J’
in the Customers table and who also appear in the Orders table.
The INTERSECT operator ensures that only those customers who have placed an order
are included in the result.
The final output includes Customer 2 (Jane) only, as per the given example.
EXCEPT:
The SQL EXCEPT operator returns the rows from the first SELECT
statement that are not present in the second SELECT statement. This operator is
conceptually similar to the set difference (minus) operator in relational algebra. It is
particularly useful for excluding specific data from your result set, for example to find
records in one table that do not have corresponding records in another table.
Syntax:
SELECT column_name(s)
FROM table1
EXCEPT
SELECT column_name(s)
FROM table2;
Students Table
-- Create Students Table
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(100),
Course VARCHAR(100)
);
StudentID | Name | Course
1 | Rohan | DBMS
2 | Kevin | OS
3 | Mansi | DBMS
4 | Mansi | ADA
5 | Rekha | ADA
6 | Megha | OS
Teaching Assistant Table
-- Create Teaching Assistant Table
CREATE TABLE TA (
StudentID INT PRIMARY KEY,
Name VARCHAR(100),
Course VARCHAR(100)
);
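StudentID | Name | Course
1 | Kevin | TOC
2 | Sita | IP
3 | Manik | AP
4 | Rekha | SNS
Example 1: Using EXCEPT
Query:
SELECT Name
FROM Students
EXCEPT
SELECT Name
FROM TA;
Output:
Name
Mansi
Megha
Rohan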
Explanation: The EXCEPT operator returns the names that are present in the Students
table but not in the TA table. Notice that “Rohan”, “Mansi”, and “Megha” are returned, as
these students are not listed in the TA table.
Example 2: Retaining Duplicates with EXCEPT ALL
By default, EXCEPT removes duplicates from the result set. To retain duplicates, you can
use EXCEPT ALL instead (in database systems that support it, such as PostgreSQL).
Query:
SELECT Name
FROM Students
EXCEPT ALL
SELECT Name
FROM TA;
Output:
Name
Rohan
Mansi
Mansi
Megha
Explanation: In this case, “Mansi” appears twice in the output because it appears twice in
the Students table and is not in the TA table. EXCEPT ALL retains duplicates from the
first result set.
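Both results can be checked against the Students and TA data given above. The sketch below uses Python's sqlite3 module; because SQLite does not implement EXCEPT ALL, its multiset behaviour is emulated here with `collections.Counter` subtraction, which is a stand-in for illustration rather than how a database engine evaluates it.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
for table in ("Students", "TA"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(StudentID INT PRIMARY KEY, Name VARCHAR(100), Course VARCHAR(100))"
    )
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    (1, "Rohan", "DBMS"), (2, "Kevin", "OS"), (3, "Mansi", "DBMS"),
    (4, "Mansi", "ADA"), (5, "Rekha", "ADA"), (6, "Megha", "OS"),
])
conn.executemany("INSERT INTO TA VALUES (?, ?, ?)", [
    (1, "Kevin", "TOC"), (2, "Sita", "IP"), (3, "Manik", "AP"), (4, "Rekha", "SNS"),
])

# EXCEPT: set difference, duplicates removed.
except_rows = [row[0] for row in conn.execute(
    "SELECT Name FROM Students EXCEPT SELECT Name FROM TA ORDER BY Name"
)]

# EXCEPT ALL (emulated): subtract occurrence counts, keep what is left over.
student_names = Counter(row[0] for row in conn.execute("SELECT Name FROM Students"))
ta_names = Counter(row[0] for row in conn.execute("SELECT Name FROM TA"))
except_all_rows = sorted((student_names - ta_names).elements())

print(except_rows)
print(except_all_rows)
```

Mansi appears once under EXCEPT and twice under the emulated EXCEPT ALL, matching the two outputs shown above (apart from row order, which SQL does not guarantee without ORDER BY).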
SQL EXCEPT vs. SQL NOT IN
While both EXCEPT and NOT IN are used to exclude certain records, there are important
differences between them.
Feature | EXCEPT | NOT IN
Performance | Often more efficient for large datasets, as it processes only the required rows | May be slower for large datasets, especially when checking multiple conditions
Use Case | When you need to find rows that exist in one result set but not the other | When you need to check a specific column's values against a list
Nested Queries:
Nested queries in SQL are a powerful tool for retrieving data from databases in a
structured and efficient manner. They allow us to execute a query within another query,
making it easier to handle complex data operations.
The rest of this section introduces the sample tables used to illustrate nested queries
for tasks like filtering, aggregation, and data extraction.
To better understand nested queries, we will use the following sample tables: STUDENT,
COURSE, and STUDENT_COURSE. These tables simulate a real-world scenario of
students, courses, and their enrollment details, which will be used in the examples below.
1. STUDENT Table
The STUDENT table stores information about students, including their unique ID, name,
address, phone number, and age.
STUDENT Table
2. COURSE Table
The COURSE table stores information about the courses offered, each identified by a
unique course ID (C_ID).
COURSE Table
3. STUDENT_COURSE Table
This table maps students to the courses they have enrolled in, with columns for student ID
(S_ID) and course ID (C_ID):
Student_Course Table
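A nested query over these three tables might look like the sketch below, run through Python's sqlite3 module. Only S_ID and C_ID come from the notes; the S_NAME and C_NAME columns and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (S_ID TEXT PRIMARY KEY, S_NAME TEXT);
CREATE TABLE COURSE (C_ID TEXT PRIMARY KEY, C_NAME TEXT);
CREATE TABLE STUDENT_COURSE (S_ID TEXT REFERENCES STUDENT, C_ID TEXT REFERENCES COURSE);
INSERT INTO STUDENT VALUES ('S1', 'Ram'), ('S2', 'Ramesh'), ('S3', 'Sujit'), ('S4', 'Suresh');
INSERT INTO COURSE VALUES ('C1', 'DSA'), ('C2', 'Programming'), ('C3', 'DBMS');
INSERT INTO STUDENT_COURSE VALUES ('S1', 'C1'), ('S1', 'C3'), ('S2', 'C1'), ('S4', 'C3');
""")

# Independent nested query: the innermost SELECT runs first and feeds the outer ones.
enrolled_in_dbms = [row[0] for row in conn.execute("""
    SELECT S_NAME FROM STUDENT
    WHERE S_ID IN (
        SELECT S_ID FROM STUDENT_COURSE
        WHERE C_ID IN (SELECT C_ID FROM COURSE WHERE C_NAME = 'DBMS')
    )
    ORDER BY S_NAME
""")]

# Correlated nested query: the inner SELECT refers to the current outer row.
not_enrolled = [row[0] for row in conn.execute("""
    SELECT S_NAME FROM STUDENT s
    WHERE NOT EXISTS (SELECT 1 FROM STUDENT_COURSE sc WHERE sc.S_ID = s.S_ID)
    ORDER BY S_NAME
""")]

print(enrolled_in_dbms)
print(not_enrolled)
```

The first query uses IN with subqueries that can be evaluated on their own; the second uses NOT EXISTS with a subquery that is re-evaluated for each student.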
Aggregation operators:
In a DBMS, aggregation operators are used to perform operations on a group of values to return
a single summarizing value. The most common aggregation operators include COUNT, SUM,
AVG, MIN, and MAX.
Here are some examples of how you might use these operators:
COUNT
Returns the number of rows that match a specified criterion.
Syntax
COUNT(expression)
Example:
SUM
Returns the total sum of a numeric column.
Syntax
SUM(expression)
Example:
AVG
Returns the average value of a numeric column.
Syntax
AVG(expression)
Example:
MIN
Returns the smallest value of the selected column.
Syntax
MIN(expression)
Example:
MAX
Returns the largest value of the selected column.
Syntax
MAX(expression)
Example:
This query would return the highest salary for each department in the Employees table.
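The five aggregation operators can be tried together in a short sketch using Python's sqlite3 module. The Employees table layout (Name, Department, Salary) and its rows are hypothetical; the last query is one way to write the per-department highest salary mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Department TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    ("Asha", "HR", 30000), ("Ravi", "HR", 40000),
    ("Mira", "IT", 50000), ("Dev", "IT", 70000), ("Lina", "IT", 60000),
])

# Each aggregate collapses the whole table into a single value.
summary = conn.execute(
    "SELECT COUNT(*), SUM(Salary), AVG(Salary), MIN(Salary), MAX(Salary) FROM Employees"
).fetchone()

# With GROUP BY, the aggregate is computed once per department.
max_by_department = conn.execute(
    "SELECT Department, MAX(Salary) FROM Employees "
    "GROUP BY Department ORDER BY Department"
).fetchall()

print(summary)
print(max_by_department)
```

Without GROUP BY each operator returns one value for the whole table; with GROUP BY it returns one value per group.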
Null Values:
In SQL, some records in a table may not have values for every field, and such fields
are termed as NULL values. These occur when data is unavailable during entry or
when the attribute does not apply to a specific record. To handle such scenarios, SQL
provides a special placeholder value called NULL to represent unknown, unavailable,
or inapplicable data.
Importance of NULL Value
It is essential to understand that a NULL value differs from a zero or an empty string.
A NULL value represents missing or undefined data. Typically, it can have one of three
interpretations:
1. Value Unknown: The value exists but is not known.
2. Value Not Available: The value exists but is intentionally withheld.
3. Attribute Not Applicable: The value is undefined for a specific record.
Since it is often not possible to determine which interpretation applies, SQL does not
distinguish between the different meanings of NULL.
Principles of NULL values
Setting a NULL value is appropriate when the actual value is unknown, or when a
value is not meaningful.
A NULL value is not equivalent to a value of ZERO if the data type is a number
and is not equivalent to spaces if the data type is a character.
A NULL value can be inserted into a column of any data type, unless the column is
declared NOT NULL.
An arithmetic expression that involves a NULL value evaluates to NULL.
If a column has a NULL value, then in most database systems the UNIQUE, FOREIGN KEY,
and CHECK constraints are not applied to that value.
Logical Behavior
SQL uses three-valued logic (3VL): TRUE, FALSE, and UNKNOWN. Comparisons
involving NULL return UNKNOWN.
AND: Returns FALSE if either operand is FALSE; otherwise, returns UNKNOWN if either operand is UNKNOWN.
OR: Returns TRUE if either operand is TRUE; otherwise, returns UNKNOWN if either operand is UNKNOWN.
NOT: Negates TRUE and FALSE; NOT UNKNOWN remains UNKNOWN.
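These rules can be observed directly with Python's sqlite3 module. In this sketch, SQLite's own conventions apply: TRUE is returned as 1, FALSE as 0, and UNKNOWN as NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each expression is evaluated by SQLite under three-valued logic.
expressions = [
    "NULL = NULL",   # comparison with NULL -> UNKNOWN
    "NULL AND 0",    # UNKNOWN AND FALSE   -> FALSE
    "NULL AND 1",    # UNKNOWN AND TRUE    -> UNKNOWN
    "NULL OR 1",     # UNKNOWN OR TRUE     -> TRUE
    "NULL OR 0",     # UNKNOWN OR FALSE    -> UNKNOWN
    "NOT (NULL)",    # NOT UNKNOWN         -> UNKNOWN
]
row = conn.execute("SELECT " + ", ".join(expressions)).fetchone()
results = dict(zip(expressions, row))

for expression, value in results.items():
    print(f"{expression:12} -> {value}")
```

A FALSE operand decides an AND and a TRUE operand decides an OR, regardless of the UNKNOWN on the other side.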
SQL allows queries that check whether an attribute value is NULL. Rather than using
= or <> to compare an attribute value to NULL, SQL uses IS and IS NOT. This is
because SQL considers each NULL value as being distinct from every other NULL
value, so equality comparison is not appropriate.
Example: Employee Table
CREATE TABLE Employee (
Fname VARCHAR(50),
Lname VARCHAR(50),
SSN VARCHAR(11),
Phoneno VARCHAR(15),
Salary FLOAT
);
Employee Table
IS NULL Operator
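A minimal sketch of IS NULL and IS NOT NULL on the Employee table defined above, using Python's sqlite3 module, is shown here. The three rows are hypothetical; Python's None is stored as SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Fname VARCHAR(50), Lname VARCHAR(50), "
    "SSN VARCHAR(11), Phoneno VARCHAR(15), Salary FLOAT)"
)
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    ("Asha", "Rao", "123-45-6789", None, 50000.0),   # phone number missing
    ("Ravi", "Menon", "234-56-7890", "9876543210", 45000.0),
    ("Mira", "Das", None, None, 30000.0),            # SSN and phone number missing
])


def first_names(condition):
    """Return the first names of employees matching the WHERE condition."""
    query = f"SELECT Fname FROM Employee WHERE {condition} ORDER BY Fname"
    return [row[0] for row in conn.execute(query)]


no_phone = first_names("Phoneno IS NULL")
has_phone = first_names("Phoneno IS NOT NULL")
equals_null = first_names("Phoneno = NULL")  # always UNKNOWN, so no row qualifies

print(no_phone, has_phone, equals_null)
```

The `= NULL` comparison returns no rows at all, which is why IS NULL must be used instead.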
Integrity constraints in SQL:
Integrity constraints in SQL are rules that help ensure the accuracy and reliability of data in the
database. They ensure that certain conditions are met when data is inserted, updated, or deleted.
While primary key, unique, and foreign key constraints are commonly discussed and used, SQL
allows for more complex constraints through the use of CHECK and custom triggers. Here are
some examples of complex integrity constraints:
3. Using Stored Procedures
Sometimes, instead of direct data manipulation on tables, using stored procedures can help
maintain more complex integrity constraints by wrapping logic inside the procedure. For
instance, you could have a procedure that checks several conditions before inserting a record.
4. Using TRIGGERS
A trigger is procedural code that the database automatically executes in response to certain
events on a particular table or view. Essentially, triggers are special types of stored procedures
that run automatically when an INSERT, UPDATE, or DELETE operation occurs.
Triggers are typically used to maintain the integrity of the
data, automate data-related tasks, and extend the database functionalities.
When implementing complex constraints, it's crucial to strike a balance. While they can ensure data
integrity, they can also add overhead to the database system and increase the complexity of the schema
and the operations performed on it. Proper documentation and understanding of each constraint's purpose
are essential.
Triggers and active databases are closely related concepts in the domain of DBMS.
Let's delve into what each of them means and how they are interconnected.
Triggers
There are various types of triggers based on when they are executed:
BEFORE: Trigger is executed before the triggering event.
AFTER: Trigger is executed after the triggering event.
INSTEAD OF: Trigger is used to override the triggering event, primarily for views.
They can also be categorized by the triggering event:
INSERT: Trigger is executed when a new row is inserted.
UPDATE: Trigger is executed when a row is updated.
DELETE: Trigger is executed when a row is deleted.
In MySQL, the basic syntax for creating a trigger follows the general CREATE TRIGGER
form given in the TRIGGERS section above.
Example of a Trigger
Suppose we have an `Employees` table and we want to maintain an `AuditLog` table that keeps
a record of salary changes for employees.
Employees Table
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(255),
Salary DECIMAL(10, 2)
);
AuditLog Table
CREATE TABLE AuditLog (
LogID INT AUTO_INCREMENT PRIMARY KEY,
EmployeeID INT,
OldSalary DECIMAL(10, 2),
NewSalary DECIMAL(10, 2),
ChangeDate DATETIME
);
Now, let's create a trigger that automatically inserts a record into the `AuditLog` table whenever
there's an update to the `Salary` column in the `Employees` table.
Trigger
DELIMITER //
CREATE TRIGGER AfterSalaryUpdate
AFTER UPDATE ON Employees
FOR EACH ROW
BEGIN
    -- Log the change only when the salary value actually differs
    IF OLD.Salary <> NEW.Salary THEN
        INSERT INTO AuditLog (EmployeeID, OldSalary, NewSalary, ChangeDate)
        VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary, NOW());
    END IF;
END //
DELIMITER ;
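To watch the audit trigger fire without a MySQL server, the sketch below ports it to SQLite and runs it through Python's sqlite3 module. The port is an adaptation, not the MySQL code itself: SQLite has no IF statement inside triggers, so the condition moves to a WHEN clause, AUTO_INCREMENT becomes AUTOINCREMENT, and NOW() becomes datetime('now'). The employee row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    Name VARCHAR(255),
    Salary DECIMAL(10, 2)
);
CREATE TABLE AuditLog (
    LogID INTEGER PRIMARY KEY AUTOINCREMENT,
    EmployeeID INT,
    OldSalary DECIMAL(10, 2),
    NewSalary DECIMAL(10, 2),
    ChangeDate DATETIME
);
CREATE TRIGGER AfterSalaryUpdate
AFTER UPDATE OF Salary ON Employees
FOR EACH ROW
WHEN OLD.Salary <> NEW.Salary
BEGIN
    INSERT INTO AuditLog (EmployeeID, OldSalary, NewSalary, ChangeDate)
    VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary, datetime('now'));
END;
""")

conn.execute("INSERT INTO Employees VALUES (1, 'Asha', 50000)")
conn.execute("UPDATE Employees SET Salary = 55000 WHERE EmployeeID = 1")   # logged
conn.execute("UPDATE Employees SET Name = 'Asha R' WHERE EmployeeID = 1")  # not a salary change
conn.execute("UPDATE Employees SET Salary = 55000 WHERE EmployeeID = 1")   # same value, not logged

audit = conn.execute("SELECT EmployeeID, OldSalary, NewSalary FROM AuditLog").fetchall()
print(audit)
```

Only the first UPDATE changes the salary value, so the audit table ends up with a single row.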
Schema Refinement:
Insertion Anomaly: Difficulty in adding new data without also adding related data that
is not yet available.
Update Anomaly: Changes to data in one place might not be reflected in other places
where the same data is stored.
Deletion Anomaly: Deleting data can unintentionally remove other related data.
Data Integrity:
Schema refinement helps maintain the accuracy and consistency of the data.
Database Efficiency:
A well-designed schema can improve query performance and database maintenance.
Techniques for Schema Refinement
Normalization: A systematic approach to organizing data in tables to minimize
redundancy and anomalies.
Decomposition: Breaking down large tables into smaller, more manageable tables.
Functional Dependencies: Identifying relationships between attributes to guide the
schema design.
Integrity Constraints: Rules that enforce data consistency and validity.
It can be observed that values of attribute college name, college rank, and course are
being repeated which can lead to problems. Problems caused due to redundancy are:
Insertion anomaly
Deletion anomaly
Updation anomaly
Insertion Anomaly
If the details of a student whose course has not yet been decided have to be inserted, the
insertion will not be possible until a course is decided for the student.
Student_ID Name Contact College Course Rank
This problem happens when a data record cannot be inserted without
adding some additional unrelated data to the record.
Deletion Anomaly
If the details of the students in this table are deleted, then the details of the college
are also deleted, which should not happen. This anomaly occurs
when the deletion of a data record results in losing some unrelated information that
was stored as part of the record that was deleted from the table.
In other words, it is not possible to delete some information without losing some other
information in the table as well.
Updation Anomaly
Suppose the rank of the college changes. The change then has to be made in every row
that stores it, which is time-consuming and computationally costly.
Student_ID Name Contact College Course Rank
All occurrences must be updated; if the update does not occur in all places, the
database will be in an inconsistent state.
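The update anomaly can be reproduced in a few lines with Python's sqlite3 module. In this sketch the student rows and college name are hypothetical, the Contact column is omitted for brevity, and the Rank column is named CollegeRank to avoid clashing with the SQL RANK function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (Student_ID INTEGER PRIMARY KEY, Name TEXT, "
    "College TEXT, Course TEXT, CollegeRank INTEGER)"
)
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", [
    (1, "Asha", "ABC College", "B.Tech", 1),
    (2, "Ravi", "ABC College", "B.Tech", 1),
    (3, "Mira", "ABC College", "MBA", 1),
])

# The college's rank changes, but only one of the three copies is updated.
conn.execute("UPDATE Student SET CollegeRank = 2 WHERE Student_ID = 1")
ranks_in_student = conn.execute(
    "SELECT DISTINCT CollegeRank FROM Student "
    "WHERE College = 'ABC College' ORDER BY CollegeRank"
).fetchall()
print(ranks_in_student)  # the same college now has two different ranks

# After decomposition the rank is stored once, so a single UPDATE is enough.
conn.execute("CREATE TABLE College (College TEXT PRIMARY KEY, CollegeRank INTEGER)")
conn.execute("INSERT INTO College VALUES ('ABC College', 1)")
conn.execute("UPDATE College SET CollegeRank = 2 WHERE College = 'ABC College'")
ranks_after_split = conn.execute(
    "SELECT DISTINCT c.CollegeRank FROM Student s JOIN College c ON s.College = c.College"
).fetchall()
print(ranks_after_split)
```

With the rank held in its own College table, every student row sees the same value through the join, so the inconsistent state cannot arise.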
Redundancy in a database occurs when the same data is stored in multiple places.
Redundancy can cause various problems such as data inconsistencies, higher storage
requirements, and slower data retrieval.
Problems Caused Due to Redundancy
Data Inconsistency: Redundancy can lead to data inconsistencies, where the same
data is stored in multiple locations, and changes to one copy of the data are not
reflected in the other copies. This can result in incorrect data being used in
decision-making processes and can lead to errors and inconsistencies in the data.
Storage Requirements: Redundancy increases the storage requirements of a
database. If the same data is stored in multiple places, more storage space is
required to store the data. This can lead to higher costs and slower data retrieval.
Update Anomalies: Redundancy can lead to update anomalies, where changes
made to one copy of the data are not reflected in the other copies. This can result
in incorrect data being used in decision-making processes and can lead to errors
and inconsistencies in the data.
Performance Issues: Redundancy can also lead to performance issues, as the
database must spend more time updating multiple copies of the same data. This
can lead to slower data retrieval and slower overall performance of the database.
Security Issues: Redundancy can also create security issues, as multiple copies of
the same data can be accessed and manipulated by unauthorized users. This can
lead to data breaches and compromise the confidentiality, integrity, and
availability of the data.
Maintenance Complexity: Redundancy can increase the complexity of database
maintenance, as multiple copies of the same data must be updated and
synchronized. This can make it more difficult to troubleshoot and resolve issues
and can require more time and resources to maintain the database.
Data Duplication: Redundancy can lead to data duplication, where the same data
is stored in multiple locations, resulting in wasted storage space and increased
maintenance complexity. This can also lead to confusion and errors, as different
copies of the data may have different values or be out of sync.
Data Integrity: Redundancy can also compromise data integrity, as changes made
to one copy of the data may not be reflected in the other copies. This can result in
inconsistencies and errors and can make it difficult to ensure that the data is
accurate and up-to-date.
Usability Issues: Redundancy can also create usability issues, as users may have
difficulty accessing the correct version of the data or may be confused by
inconsistencies and errors. This can lead to frustration and decreased
productivity, as users spend more time searching for the correct data or
correcting errors.
To prevent redundancy in a database, normalization techniques can be used.
Normalization is the process of organizing data in a database to eliminate
redundancy and improve data integrity. Normalization involves breaking down a
larger table into smaller tables and establishing relationships between them. This
reduces redundancy and makes the database more efficient and reliable.
Decompositions:
Decomposition in DBMS
Types of Decomposition
There are two types of Decomposition:
Lossless Decomposition
Lossy Decomposition
Lossless Decomposition
A decomposition is lossless if the original relation R can be regained by joining the
relations formed by the decomposition. It removes redundant data from the database
while retaining the useful information. A lossless decomposition ensures the
following things:
While regaining the original relation, no information should be lost.
If we perform join operation on the sub-divided relations, we must get the
original relation.
Example:
There is a relation called R(A, B, C)
A B C
55 16 27
48 52 89
It is decomposed into R1(A, B) and R2(B, C).
R1(A, B)
A B
55 16
48 52
R2(B, C)
B C
16 27
52 89
After performing the Join operation we get the same original relation
A B C
55 16 27
48 52 89
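The join in this example can be reproduced with a few lines of Python. This is only a minimal sketch and not part of the original notes: the helper name natural_join is hypothetical, and relations are modelled as lists of dictionaries that map attribute names to values.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    if not r1 or not r2:
        return []
    # Attributes that appear in both relations are the join attributes.
    common = set(r1[0]) & set(r2[0])
    joined = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                row = {**t1, **t2}
                if row not in joined:  # a relation is a set: no duplicate tuples
                    joined.append(row)
    return joined


R = [{"A": 55, "B": 16, "C": 27}, {"A": 48, "B": 52, "C": 89}]
R1 = [{"A": 55, "B": 16}, {"A": 48, "B": 52}]
R2 = [{"B": 16, "C": 27}, {"B": 52, "C": 89}]

result = natural_join(R1, R2)
# The decomposition is lossless when the join gives back exactly the tuples of R.
is_lossless = all(t in R for t in result) and all(t in result for t in R)
print(result)
print(is_lossless)
```

Here the join attribute is B, and each value of B occurs only once in R1 and in R2, so no extra tuples can be produced.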
Lossy Decomposition
As the name suggests, a decomposition is lossy when the join of the sub-relations
does not give back the same relation that was decomposed. The join result contains
extraneous (spurious) tuples, and these extra tuples make it difficult to identify
the original tuples.
Example:
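We have a relation R(A, B, C)
A B C
1 2 1
2 5 3
3 3 3
It is decomposed into R1(A, C) and R2(B, C).
R1(A, C)
A C
1 1
2 3
3 3
R2(B, C)
B C
2 1
5 3
3 3
After performing the Join operation on the common attribute C we get
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
The tuples (2, 3, 3) and (3, 5, 3) were not in R. They appear because C = 3 occurs
twice in both R1 and R2, so C is not a key of either relation.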
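The spurious tuples can be found mechanically by projecting R, joining the projections again, and subtracting R. The sketch below assumes that R is split on the attribute sets (A, C) and (B, C), which is the split that produces the five-row join result shown above; the helper name project is hypothetical.

```python
def project(relation, attrs, schema):
    """Project a set of tuples onto the given attributes (duplicates collapse)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in relation}


SCHEMA = ("A", "B", "C")
R = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}

R1 = project(R, ("A", "C"), SCHEMA)  # assumed split: R1(A, C)
R2 = project(R, ("B", "C"), SCHEMA)  # assumed split: R2(B, C)

# Natural join of R1 and R2 on the common attribute C.
joined = {(a, b, c1) for (a, c1) in R1 for (b, c2) in R2 if c1 == c2}

# Tuples produced by the join that were never in R.
spurious = joined - R
print(sorted(joined))
print(sorted(spurious))
```

A non-empty set of spurious tuples is exactly what makes the decomposition lossy.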
1. Loss of Information
Non-loss decomposition: When a relation is decomposed into two or more smaller relations and
the original relation can be perfectly reconstructed by taking the natural join of the decomposed
relations, the decomposition is termed lossless. If not, it is termed a lossy decomposition.
Example: Consider a table `R(A, B, C)` with the single dependency `A → B`. If you decompose it
into `R1(A, B)` and `R2(B, C)`, the common attribute `B` determines neither `A` nor `C`, so the
decomposition is not guaranteed to be lossless: the natural join may return tuples that were
not in the original table.
Example: Consider a relation R(A,B,C) with the following data:
|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
R1(A, B):
|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
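R2(A, C):
|A |C |
|----|----|
|1 |P |
|2 |Q |
(The projection yields (1, P) twice; a relation is a set, so the duplicate is stored once.)
Now, if we take the natural join of R1 and R2 on attribute A, we get back the original relation R,
because each value of A has exactly one value of C in this data (A → C holds).
Therefore, this is a lossless decomposition.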
However, if R also had a functional dependency such as B → C, whose attributes are split between
R1(A, B) and R2(A, C), it could not be checked in either R1 or R2 without joining them. The
decomposition would then not be dependency-preserving for that specific FD. A → C, by contrast,
can be checked in R2 alone.
3. Increased Complexity
Decomposition leads to an increase in the number of tables, which can complicate queries and
maintenance tasks. While tools and ORM (Object-Relational Mapping) libraries can mitigate this
to some extent, it still adds complexity.
4. Redundancy
Incorrect decomposition might not eliminate redundancy, and in some cases, can even introduce
new redundancies.
5. Performance Overhead
An increased number of tables, while aiding normalization, can also lead to more complex SQL
queries involving multiple joins, which can introduce performance overheads.
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can conclude some valid functional dependencies:
roll_no → { name, dept_name, dept_building }: Here, roll_no can determine the
values of the fields name, dept_name and dept_building, hence it is a valid functional
dependency.
roll_no → dept_name: Since roll_no can determine the whole set {name,
dept_name, dept_building}, it can also determine its subset dept_name.
dept_name → dept_building: dept_name identifies the dept_building
accurately, since all rows with the same dept_name have the same
dept_building (CO is always in A4, IT is always in A3).
More valid functional dependencies: roll_no → name, {roll_no, name} →
{dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
name → dept_name: Students with the same name can have different
dept_name (xyz appears in both CO and IT), hence this is not a valid functional dependency.
dept_building → dept_name: There can be multiple departments in the same
building. For example, in the above table departments ME and EC are in the same
building B2, hence dept_building → dept_name is an invalid functional
dependency.
More invalid functional dependencies: name → roll_no, dept_building → roll_no, etc.
{name, dept_name} → roll_no is also invalid in general, because two students with the same
name can join the same department, although no pair of rows in this small table shows it.
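The checks above can be automated against the sample rows. The following is a minimal sketch, assuming the six rows of the student table shown earlier; the function name fd_holds is hypothetical. Note that such a check can only show that the data does not violate a dependency. Whether a dependency is truly valid is a property of the schema, not of one sample.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, else returns the stored one.
        if seen.setdefault(key, val) != val:
            return False
    return True


COLUMNS = ("roll_no", "name", "dept_name", "dept_building")
DATA = [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]
students = [dict(zip(COLUMNS, r)) for r in DATA]

checks = {
    "roll_no -> name": fd_holds(students, ["roll_no"], ["name"]),
    "dept_name -> dept_building": fd_holds(students, ["dept_name"], ["dept_building"]),
    "name -> dept_name": fd_holds(students, ["name"], ["dept_name"]),
    "dept_building -> dept_name": fd_holds(students, ["dept_building"], ["dept_name"]),
}
for fd, ok in checks.items():
    print(fd, "holds" if ok else "is violated")
```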
Types of Functional Dependencies in DBMS
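1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Semi Non-Trivial functional dependency
4. Multivalued functional dependency
5. Transitive functional dependency
6. Fully functional dependency
7. Partial functional dependency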
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the
determinant. i.e. If X → Y and Y is the subset of X, then it is called trivial functional
dependency.
Symbolically: A→B is trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A→A & B→B
Example 1 :
ABC -> AB
ABC -> A
ABC -> ABC
Example 2:
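roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent
name is a subset of the determinant {roll_no, name}.
2. Non-Trivial Functional Dependency
In Non-Trivial Functional Dependency, the dependent is not a subset of the
determinant. i.e. If X → Y and Y is not a subset of X, then it is called a non-trivial
functional dependency. In the table above, roll_no → name is non-trivial, since name
is not a subset of {roll_no}.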
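3. Semi Non-Trivial Functional Dependency
A functional dependency X → Y is semi non-trivial if Y overlaps X but is not a subset of X.
Functional Dependency:
{Student_ID, Course_ID} → Course_ID
This dependency is completely trivial, not semi non-trivial, because the dependent
attribute (Course_ID) is wholly included in the determinant ({Student_ID, Course_ID}).
A dependency becomes semi non-trivial only when the dependent also contains an attribute
outside the determinant. For example, with the attributes roll_no, name and age,
{roll_no, name} → {name, age} is semi non-trivial: name is part of the
determinant, but age is not.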
4. Multivalued Functional Dependency
In Multivalued functional dependency, the attributes of the dependent set are independent
of each other. i.e. If a →→ b and a →→ c hold and there exists no functional dependency
between b and c, then it is called a multivalued functional dependency.
Example:
bike_model manuf_year color
In this table:
X: bike_model
Y: color
Z: manuf_year
For each bike model (bike_model):
1. There is a group of colors (color) and a group of manufacturing years (manuf_year).
2. The colors do not depend on the manufacturing year, and the manufacturing year
does not depend on the colors. They are independent.
3. The sets of color and manuf_year are linked only to bike_model.
That’s what makes it a multivalued dependency.
In this case these two columns are said to be multivalued dependent on bike_model.
These dependencies can be represented like this:
bike_model →→ color
bike_model →→ manuf_year
5. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on
determinant. i.e. If a → b & b → c, then according to axiom of transitivity, a → c.
This is a transitive functional dependency.
Example:
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an
indirect functional dependency, hence called Transitive functional dependency.
6. Fully Functional Dependency
In full functional dependency, an attribute or set of attributes depends on the whole of
its determinant and not on any proper subset of it. If a relation R has attributes X, Y, Z
with the dependency {X, Y} → Z, then Z is fully functionally dependent on {X, Y} only if
neither X → Z nor Y → Z holds.
7. Partial Functional Dependency
In partial functional dependency a non-key attribute depends on a part of the
composite key, rather than the whole key. Suppose a relation R has attributes X, Y, Z,
where {X, Y} is the composite key and Z is a non-key attribute. Then X → Z is a
partial functional dependency in an RDBMS.
In the above table, Courses has a multi-valued attribute, so it is not in 1NF. The
Below Table is in 1NF as there is no multi-valued attribute.
There are many courses having the same course fee. Here, COURSE_FEE cannot
alone decide the value of COURSE_NO or STUD_NO.
COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO.
COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO.
The candidate key for this table is {STUD_NO, COURSE_NO} because the
combination of these two columns uniquely identifies each row in the table.
COURSE_FEE is a non-prime attribute because it is not part of the candidate
key {STUD_NO, COURSE_NO}.
But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key.
Therefore, Non-prime attribute COURSE_FEE is dependent on a proper subset
of the candidate key, which is a partial dependency and so this relation is not in
2NF.
To convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
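The 2NF test described above can be sketched in Python. This is an illustrative sketch only: the function name partial_dependencies is hypothetical, the candidate keys are supplied by hand, and only the dependencies that are listed explicitly are examined (dependencies implied by them are not derived).

```python
def partial_dependencies(candidate_keys, fds):
    """List the given FDs in which a non-prime attribute depends on a
    proper subset of some candidate key (a 2NF violation)."""
    prime = set().union(*candidate_keys)  # attributes that belong to any key
    found = []
    for lhs, rhs in fds:
        non_prime_rhs = set(rhs) - prime
        # '<' on sets means "is a proper subset of".
        if non_prime_rhs and any(set(lhs) < set(key) for key in candidate_keys):
            found.append((tuple(lhs), tuple(sorted(non_prime_rhs))))
    return found


fds = [(("COURSE_NO",), ("COURSE_FEE",))]

# Original table: candidate key {STUD_NO, COURSE_NO}.
before = partial_dependencies([{"STUD_NO", "COURSE_NO"}], fds)

# Table 2 after the split: COURSE_NO is now the whole key.
after = partial_dependencies([{"COURSE_NO"}], fds)

print(before)
print(after)
```

After the split, COURSE_NO is the entire key of Table 2, so COURSE_NO → COURSE_FEE is no longer a partial dependency.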
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a super-key for every functional dependency (FD) X−>Y in a
given relation.
Note: To test whether a relation is in BCNF, we identify all the determinants and
make sure that each one is a candidate key or, more generally, a super-key.
To determine the highest normal form of a given relation R with functional
dependencies, the first step is to check whether the BCNF condition holds. If R is
found to be in BCNF, it can be safely deduced that the relation is also
in 3NF, 2NF, and 1NF. The 1NF has the least restrictive constraint – it only requires a
relation R to have atomic values in each tuple. The 2NF has a slightly more
restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is less
restrictive than the BCNF. In this manner, the restriction increases as we traverse
down the hierarchy.
The following example illustrates the properties of BCNF.
Example 1
Consider a relation R with attributes (student, teacher, subject).
FD: { (student, teacher) -> subject, (student, subject) -> teacher, (teacher) -> subject }
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF, since every attribute belongs to some candidate key and so
no non-prime attribute is transitively dependent on a key. A relation
R is in BCNF if for every non-trivial FD X->Y, X is a super-key.
The above relation is not in BCNF, because in the FD (teacher -> subject), teacher
is not a key. This relation suffers from anomalies.
For example, if we delete the student Tahira, we will also lose the information
that N.Gupta teaches C. This issue occurs because the teacher is a determinant
but not a candidate key.
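The BCNF test for this example can be written as a short program. The sketch below is an illustration under stated assumptions, not a definitive implementation: the function names closure and bcnf_violations are hypothetical, and only the three dependencies given above are checked.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a super-key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                     # skip trivial FDs
        and not set(schema) <= closure(lhs, fds)        # lhs is not a super-key
    ]


schema = {"student", "teacher", "subject"}
fds = [
    (("student", "teacher"), ("subject",)),
    (("student", "subject"), ("teacher",)),
    (("teacher",), ("subject",)),
]

violations = bcnf_violations(schema, fds)
print(violations)
```

Only teacher -> subject is reported, because the closure of {teacher} is {teacher, subject}, which does not cover the whole schema.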
| Stu_ID | Stu_Branch | Stu_Course | Branch_Number | Stu_Course_No |
| 102 | Electronics & Communication Engineering | VLSI Technology | B_003 | 401 |
| 102 | Electronics & Communication Engineering | Mobile Communication | B_003 | 402 |
Stu_Course Branch_Number Stu_Course_No
Stu_ID Stu_Course_No
101 201
101 202
102 401
102 402
In Lossless Decomposition, we select a common attribute, and the criterion for
selecting it is that the common attribute must be a candidate key or
super key in either relation R1, R2, or both.
Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at
least one of the following functional dependencies is in F+ (Closure of functional
dependencies):
R1 ∩ R2 → R1
OR
R1 ∩ R2 → R2
Example of Lossless Decomposition
- Employee (Employee_Id, Ename, Salary, Department_Id, Dname)
Can be decomposed using lossless decomposition as,
- Employee_desc (Employee_Id, Ename, Salary, Department_Id)
- Department_desc (Department_Id, Dname)
Here the common attribute Department_Id is the key of Department_desc, assuming that
Department_Id → Dname holds.
Alternatively, the following decomposition would be lossy: the two tables share no
common attribute, so they cannot be joined to get back the original data.
- Employee_desc (Employee_Id, Ename, Salary)
- Department_desc (Department_Id, Dname)
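The test "R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+" can be checked with an attribute closure. The sketch below is illustrative only. The notes do not list the functional dependencies of Employee, so the two used here (Employee_Id determines the other employee attributes, and Department_Id determines Dname) are assumptions, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the common attributes must determine R1 or R2."""
    reach = closure(r1 & r2, fds)
    return r1 <= reach or r2 <= reach


# Assumed dependencies (not stated in the notes).
fds = [
    ({"Employee_Id"}, {"Ename", "Salary", "Department_Id"}),
    ({"Department_Id"}, {"Dname"}),
]

good = is_lossless(
    {"Employee_Id", "Ename", "Salary", "Department_Id"},
    {"Department_Id", "Dname"},
    fds,
)
bad = is_lossless(
    {"Employee_Id", "Ename", "Salary"},
    {"Department_Id", "Dname"},
    fds,
)
print(good, bad)
```

In the second call the two schemas have no attribute in common, so the closure of the empty set covers neither relation and the test fails.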
In a database management system (DBMS), a lossless decomposition is a process of
decomposing a relation schema into multiple relations in such a way that it preserves
the information contained in the original relation. Specifically, a lossless
decomposition is one in which the original relation can be reconstructed by joining
the decomposed relations.
To test whether a decomposition is lossless, we need to know which functional
dependencies are in F+. Armstrong's axioms (reflexivity, augmentation and transitivity)
are the inference rules used to derive F+ from the given dependencies. The reflexivity
axiom states that if a set of attributes Y is a subset of another set of attributes X,
then X → Y, i.e. the smaller set can be inferred from the larger set. The decomposition
rule, which follows from the axioms, states that if X → YZ, then X → Y and X → Z. These
rules do not decompose relations themselves; they let us check whether R1 ∩ R2 → R1 or
R1 ∩ R2 → R2 holds, in which case the original relation R can be reconstructed by taking
the natural join of R1 and R2.
There are several algorithms available for performing lossless decomposition in
DBMS, such as the BCNF (Boyce-Codd Normal Form) decomposition and the 3NF
(Third Normal Form) decomposition. These algorithms use a set of rules to
decompose a relation into multiple relations while ensuring that the original relation
can be reconstructed without any loss of information.
Multivalued Dependency:
In Database Management Systems (DBMS), multivalued dependency (MVD) deals
with complex attribute relationships in which an attribute may have many
independent values while yet depending on another attribute or group of attributes. It
improves database structure and consistency and is essential for data integrity and
database normalization.
MVD or multivalued dependency means that for a single value of attribute ‘a’
multiple values of attribute ‘b’ exist. We write it as,
a --> --> b
It is read as a multi-determines b (b is multivalued dependent on a). Suppose a person
named Geeks is working on 2 projects, Microsoft and Oracle, and has 2 hobbies, namely
Reading and Music. This can be expressed in a tabular format in the following way.
Example
Project and Hobby are multivalued attributes as they have more than one value for a
single person i.e., Geeks.
What is Multivalued Dependency?
When one attribute in a database depends on another attribute and has many
independent values, it is said to have multivalued dependency (MVD). It supports
maintaining data accuracy and managing intricate data interactions.
Multi Valued Dependency (MVD)
We can say that multivalued dependency exists if the following conditions are met.
Conditions for MVD
Attribute a multi-determines attribute b if, in any legal relation r(R), for
all pairs of tuples t1 and t2 in r such that
t1[a] = t2[a]
there exist tuples t3 and t4 in r such that
t1[a] = t2[a] = t3[a] = t4[a]
t1[b] = t3[b]; t2[b] = t4[b]
t1[c] = t4[c]; t2[c] = t3[c]
where c stands for the remaining attributes of R (those in neither a nor b).
Then multivalued (MVD) dependency exists. To check the MVD in a given table, we
apply the conditions stated above and check them against the values in the table.
Example
Finding from table (here a = name, b = project, c = hobby),
t1[c] = t4[c] = Reading
And
t2[c] = t3[c] = Music
So, condition 3 is satisfied. All conditions are satisfied, therefore,
a --> --> b
According to table we have got,
name --> --> project
And for,
a --> --> c
We get,
name --> --> hobby
Hence, we know that MVD exists in the above table and it can be stated by,
name --> --> project
name --> --> hobby
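The conditions for an MVD can be tested mechanically. The sketch below is an illustration, not a definitive implementation: the function name mvd_holds is hypothetical, and the four rows are an assumption reconstructed from the description above (Geeks, two projects, two hobbies), since the table itself is not reproduced in these notes.

```python
def mvd_holds(rows, x, y, schema):
    """Check x ->-> y: for every pair t1, t2 that agree on x, the tuple that
    takes x and y from t1 and the remaining attributes from t2 must exist."""
    present = set(rows)
    from_t1 = set(x) | set(y)
    pos = {a: i for i, a in enumerate(schema)}
    for t1 in rows:
        for t2 in rows:
            if all(t1[pos[a]] == t2[pos[a]] for a in x):
                t3 = tuple(
                    t1[i] if a in from_t1 else t2[i]
                    for i, a in enumerate(schema)
                )
                if t3 not in present:
                    return False
    return True


SCHEMA = ("name", "project", "hobby")
# Assumed rows: every project of Geeks combined with every hobby.
table = [
    ("Geeks", "Microsoft", "Reading"),
    ("Geeks", "Microsoft", "Music"),
    ("Geeks", "Oracle", "Reading"),
    ("Geeks", "Oracle", "Music"),
]

name_project = mvd_holds(table, ["name"], ["project"], SCHEMA)
name_hobby = mvd_holds(table, ["name"], ["hobby"], SCHEMA)
# Dropping one combination breaks the independence of project and hobby.
incomplete = mvd_holds(table[:3], ["name"], ["project"], SCHEMA)
print(name_project, name_hobby, incomplete)
```

The last call shows why all combinations must be stored: once (Geeks, Oracle, Music) is missing, project and hobby are no longer independent for Geeks.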
Table R1
SID SNAME
S1 A
S2 B
Table R2
CID CNAME
C1 C
C2 D
Table R1 × R2
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
Join Dependency
Example:
Table R1
Company Product
C1 Pendrive
C1 mic
C2 speaker
C1 speaker
Company->->Product
Table R2
Agent Company
Aman C1
Aman C2
Mohan C1
Agent->->Company
Table R3
Agent Product
Aman Pendrive
Aman mic
Aman speaker
Mohan speaker
Agent->->Product
Table R1⋈R2⋈R3
Company Product Agent
C1 Pendrive Aman
C1 mic Aman
C2 speaker Aman
C1 speaker Aman
C1 speaker Mohan
Fifth Normal Form/Project-Join Normal Form (5NF)
A relation R is in Fifth Normal Form if and only if every join dependency in R
is implied by the candidate keys of R. A relation decomposed into smaller relations
must have the lossless join property, which ensures that no spurious or extra tuples are
generated when the relations are reunited through a natural join.
Properties
A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should be already in 4NF.
2. It cannot be decomposed any further without loss (it has no join dependency that is
not implied by its candidate keys).
Example – Consider the above schema, with a case as “if a company makes a product
and an agent is an agent for that company, then he always sells that product for the
company”. Under these circumstances, the ACP table is shown as:
Table ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is decomposed into the 3 relations shown below:
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The result of the Natural Join of R1 and R3 over ‘Company’, followed by the Natural
Join of that intermediate result (R13) and R2 over ‘Agent’ and ‘Product’, will be Table ACP.
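The two join steps can be traced with Python sets. This is a minimal sketch using the ACP rows listed above; the variable names are chosen for illustration and tuples are written in (Agent, Company, Product) order.

```python
ACP = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# The three projections of ACP.
R1 = {(agent, company) for agent, company, product in ACP}
R2 = {(agent, product) for agent, company, product in ACP}
R3 = {(company, product) for agent, company, product in ACP}

# Step 1: natural join of R1 and R3 over Company.
R13 = {
    (agent, company, product)
    for (agent, company) in R1
    for (company3, product) in R3
    if company == company3
}

# Step 2: natural join of R13 and R2 over Agent and Product.
rejoined = {t for t in R13 if (t[0], t[2]) in R2}

print(sorted(R13 - ACP))  # tuples that exist only after step 1
print(rejoined == ACP)
```

The intermediate result R13 contains one extra tuple, (A2, PQR, Bolt). The second join removes it because A2 does not sell Bolt, so the three relations together reproduce ACP exactly.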