RELATIONAL DATABASE MANAGEMENT SYSTEM (RDBMS)
UNIT – II: RELATIONAL DATA MODEL
The relational model was proposed by E.F. Codd to model data in the form of relations or
tables. After designing the conceptual model of a database using an ER diagram, we need to convert
the conceptual model into the relational model, which can be implemented using any RDBMS
such as Oracle or MySQL.
What is Relational Model?
Relational Model represents how data is stored in Relational Databases. A relational
database stores data in the form of relations (tables). Consider a relation STUDENT with
attributes ROLL_NO, NAME, ADDRESS, PHONE and AGE shown in Table 1.
Table 1: STUDENT
ROLL_NO  NAME    ADDRESS  PHONE       AGE
1        RAM     DELHI    9455123451  18
2        RAMESH  GURGAON  9652431543  18
3        SUJIT   ROHTAK   9156253131  20
4        SURESH  DELHI                18
1. Relational Model Concepts
1. Attribute – Each column in a table. Attributes are the properties that define a
relation, e.g., ROLL_NO, NAME, etc.
2. Table – In the relational model, relations are stored in table format, together with
their entities. A table has two properties, rows and columns. Rows represent
records and columns represent attributes.
3. Tuple – A single row of a table, which contains a single record.
4. Relation schema – The name of the relation together with its attributes, e.g.,
STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE).
5. Degree – The total number of attributes in the relation. The STUDENT relation has
degree 5.
6. Cardinality – The total number of rows present in the table. The STUDENT relation has
cardinality 4.
7. Column – The set of values for a specific attribute.
8. Relation instance – A finite set of tuples held by a relation at a given time.
A relation instance never has duplicate tuples.
9. Relation key – An attribute, or a set of attributes, that uniquely identifies each row
of the relation.
10. Attribute domain – The pre-defined set of values that an attribute is allowed to
take.
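The terms degree, cardinality and relation schema can be checked on the STUDENT relation of Table 1. The following is a minimal sketch in Python using the standard sqlite3 module; the use of SQLite, the column types, and the reading of the blank PHONE cell as NULL are assumptions made for illustration.

```python
import sqlite3

# In-memory database; the column types are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT ("
    "ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", None, 18),  # blank PHONE in Table 1, stored as NULL
    ],
)

cursor = conn.execute("SELECT * FROM STUDENT")
tuples = cursor.fetchall()

# Relation schema: the relation name together with its attribute names.
attributes = [column[0] for column in cursor.description]
degree = len(attributes)    # number of attributes (columns)
cardinality = len(tuples)   # number of tuples (rows)

print("STUDENT(" + ", ".join(attributes) + ")")
print("degree =", degree, "cardinality =", cardinality)
```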
Mr. R. Boaz Gladson, Assistant Professor, PG Department of Computer Science, Voorhees College-Vellore.
2. Relational Constraints
Relational integrity constraints are conditions that must hold for a relation to be
valid. These constraints are derived from the rules of the mini-world that the database
represents.
There are many types of integrity constraints in a DBMS. Constraints in a relational
database management system are mostly divided into three main categories:
1. Domain Constraints
2. Key Constraints
3. Referential Integrity Constraints
Domain Constraints
A domain constraint is violated if an attribute value does not appear in the
corresponding domain or is not of the appropriate data type.
Domain constraints specify that, within each tuple, the value of each attribute must be
an atomic value from the domain of that attribute. The domain is specified as a data type,
which includes the standard data types: integers, real numbers, characters, Booleans,
variable-length strings, etc.
Example (the base type VARCHAR(50) is assumed here for illustration):
CREATE DOMAIN CustomerName AS VARCHAR(50)
CHECK (VALUE IS NOT NULL);
Key Constraints
An attribute that can uniquely identify a tuple in a relation is called a key of the table.
The value of the key attribute must be different for every tuple in the relation.
Example:
In the table below, CustomerID is a key attribute of the Customer table. Each customer
has a single key value; for example, CustomerID = 1 belongs only to CustomerName = "Google".
CustomerID CustomerName Status
1 Google Active
2 Amazon Active
3 Apple Inactive
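A database system can enforce the key constraint automatically. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module and a hypothetical fourth customer ("Meta") that tries to reuse CustomerID 1; neither appears in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    "CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Status TEXT)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Google", "Active"), (2, "Amazon", "Active"), (3, "Apple", "Inactive")],
)

# Hypothetical row that reuses an existing key value.
try:
    conn.execute("INSERT INTO Customer VALUES (1, 'Meta', 'Active')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The key constraint is violated, so the row is not stored.
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(duplicate_rejected, row_count)
```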
Referential Integrity Constraints
Referential integrity constraints in DBMS are based on the concept of foreign keys. A
foreign key is an attribute of a relation that refers to a key attribute of another relation
or of the same relation. The referential integrity constraint requires that every value of
the foreign key must exist as a key value in the referenced table.
Example:
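As a minimal sketch of a referential integrity check, assume a hypothetical Billing table whose CustomerID column is a foreign key referencing the Customer table. The Billing table, its columns and the use of SQLite via Python's sqlite3 module are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)"
)
# Billing is a hypothetical referencing table introduced for this sketch.
conn.execute(
    "CREATE TABLE Billing ("
    "InvoiceNo INTEGER PRIMARY KEY, "
    "CustomerID INTEGER REFERENCES Customer(CustomerID), "
    "Amount INTEGER)"
)
conn.execute("INSERT INTO Customer VALUES (1, 'Google')")
conn.execute("INSERT INTO Billing VALUES (1, 1, 100)")  # customer 1 exists

try:
    conn.execute("INSERT INTO Billing VALUES (2, 99, 200)")  # no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    # The foreign key value has no matching key in Customer.
    orphan_rejected = True

billing_rows = conn.execute("SELECT COUNT(*) FROM Billing").fetchone()[0]
print(orphan_rejected, billing_rows)
```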
3. RELATIONAL LANGUAGE
A relational language is a type of programming language in which the
programming logic is composed of relations and the output is computed from
the query applied. A relational language works on the relations among data and
entities to compute a result.
3.1 Relational Algebra
Relational algebra is a procedural query language. It gives a step-by-step process to obtain
the result of a query, and it uses operators to perform queries.
Types of Relational Operations
1. Select Operation:
o The select operation selects the tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
o Notation: σ p (r)
Where
σ denotes the selection operator
r is the relation
p is a propositional logic formula (the selection predicate), which may use the connectives
AND, OR and NOT and the relational operators =, ≠, ≥, <, >, ≤.
2. Project Operation:
o This operation lists the attributes that we wish to appear in the result. The rest of
the attributes are eliminated from the table.
o It is denoted by ∏.
o Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are
in R, in S, or in both.
o It is denoted by ∪.
o Notation: R ∪ S
o R and S must have the same number of attributes, with compatible domains.
o Duplicate tuples are eliminated automatically.
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all the tuples
that are in both R and S.
o It is denoted by ∩.
o Notation: R ∩ S
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all the tuples
that are in R but not in S.
o It is denoted by minus (−).
o Notation: R − S
6. Cartesian Product:
o The Cartesian product is used to combine each row of one table with each row of the other
table. It is also known as the cross product.
o It is denoted by ×.
o Notation: E × D
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename the STUDENT relation to STUDENT1.
Notation: ρ(STUDENT1, STUDENT)
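The operators above can be sketched with Python sets, since a relation is a set of tuples. This is only an illustration: the helper names select, project and product are hypothetical, PHONE is left out of STUDENT for brevity, and R and S are made-up one-attribute relations.

```python
# A relation is modelled as a set of tuples plus a tuple of attribute names.
STUDENT_ATTRS = ("ROLL_NO", "NAME", "ADDRESS", "AGE")
STUDENT = {
    (1, "RAM", "DELHI", 18),
    (2, "RAMESH", "GURGAON", 18),
    (3, "SUJIT", "ROHTAK", 20),
    (4, "SURESH", "DELHI", 18),
}


def select(relation, predicate):
    """Sigma: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, attrs, wanted):
    """Pi: keep only the wanted attributes; the set drops duplicates."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(t[i] for i in idx) for t in relation}


def product(r, s):
    """Cartesian product: pair every tuple of r with every tuple of s."""
    return {a + b for a in r for b in s}


delhi = select(STUDENT, lambda t: t[2] == "DELHI")  # sigma ADDRESS='DELHI'
ages = project(STUDENT, STUDENT_ATTRS, ["AGE"])     # pi AGE

# Hypothetical union-compatible relations with one attribute each.
R = {(1,), (2,), (3,)}
S = {(3,), (4,)}
union, intersection, difference = R | S, R & S, R - S
pairs = product(R, S)

print(sorted(ages), sorted(union), sorted(intersection), sorted(difference))
```

Projection on AGE returns two tuples rather than four because duplicates are removed, which is the set behaviour the notes describe for union as well.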
3.2 Relational Calculus
Relational calculus is an alternative way of formulating queries. It is a non-procedural
query language: the user specifies what result is required and is not
concerned with the details of how to obtain it.
Types of Relational calculus
3.2.1. Tuple Relational Calculus (TRC)
Tuple relational calculus is a non-procedural query language based on finding the tuple
variables (also known as range variables) for which a predicate holds true. It describes the desired
information without giving a specific procedure for obtaining that information. The tuple
relational calculus is used to select tuples from a relation.
Notation: A query in the tuple relational calculus is expressed in the following notation:
{T | P (T)} or {T | Condition (T)}
Where
T is a tuple variable that ranges over the resulting tuples
P(T) is the condition used to fetch T.
3.2.2. Domain Relational Calculus (DRC)
The second form of relational calculus is known as domain relational calculus. In domain
relational calculus, the variables range over the domains of attributes rather than over whole
tuples. Domain relational calculus uses the same operators as tuple calculus: the logical
connectives ∧ (and), ∨ (or) and ¬ (not), and the existential (∃) and universal (∀) quantifiers
to bind variables. QBE (Query by Example) is a query language related to domain relational calculus.
Notation: {a1, a2, a3, ..., an | P (a1, a2, a3, ..., an)}
Where
a1, a2, ..., an are attributes (domain variables)
P is a formula built from these attributes
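Both calculus notations read much like set comprehensions. The following Python sketch is an analogy only, not a calculus implementation: the first comprehension uses one variable that ranges over whole tuples (as in TRC), the second uses one variable per attribute (as in DRC). The sample rows are taken from Table 1 without PHONE.

```python
from collections import namedtuple

Student = namedtuple("Student", ["roll_no", "name", "address", "age"])
STUDENT = {
    Student(1, "RAM", "DELHI", 18),
    Student(2, "RAMESH", "GURGAON", 18),
    Student(3, "SUJIT", "ROHTAK", 20),
    Student(4, "SURESH", "DELHI", 18),
}

# TRC style: { T | T in STUDENT and T.age > 18 }
# The variable t ranges over whole tuples.
trc_result = {t for t in STUDENT if t.age > 18}

# DRC style: { n | there exist r, a, g with <r, n, a, g> in STUDENT and a = 'DELHI' }
# The variables r, n, a, g range over attribute values.
drc_result = {n for (r, n, a, g) in STUDENT if a == "DELHI"}

print(trc_result)
print(sorted(drc_result))
```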
4. SQL
o SQL stands for Structured Query Language. It is used for storing and managing data in
a relational database management system (RDBMS).
o It is the standard language for relational database systems. It enables a user to create, read,
update and delete relational databases and tables.
o RDBMSs such as MySQL, Informix, Oracle, MS Access and SQL Server use SQL as
their standard database language.
o SQL allows users to query the database in a number of ways, using English-like
statements.
4.1 Basic Structure
o SQL keywords are not case sensitive. By convention, keywords are
written in uppercase.
o SQL statements are not tied to text lines: a single SQL statement can be written
on one line or across several lines.
o Using SQL statements, you can perform most of the actions in a database.
o SQL is based on tuple relational calculus and relational algebra.
4.2 Set Operations
SET operators are special type of operators which are used to combine the result of
two queries.
Operators covered under SET operators are:
1. UNION
2. UNION ALL
3. INTERSECT
4. MINUS
There are certain rules which must be followed to perform operations using SET
operators in SQL. Rules are as follows:
1. The number and order of columns must be the same.
2. Data types must be compatible.
1. UNION:
o UNION is used to combine the results of two SELECT statements.
o Duplicate rows are eliminated from the result of the UNION
operation.
o Query: mysql> SELECT * FROM t_Boaz UNION SELECT * FROM t2_Boaz;
2. UNION ALL
o This operator combines all the records from both queries.
o Duplicate rows are not eliminated from the result of the
UNION ALL operation.
o Query: mysql> SELECT * FROM t_Boaz UNION ALL SELECT * FROM t2_Boaz;
3. INTERSECT:
o It is used to combine two SELECT statements, but it only returns the records which are
common from both SELECT statements.
o Query:
mysql> SELECT * FROM t_Boaz INTERSECT SELECT * FROM t2_Boaz;
4. MINUS
o It displays the rows that are present in the result of the first query but absent from the
second, with no duplicates.
o MINUS is the Oracle keyword; standard SQL and MySQL (8.0.31 and later) use EXCEPT for the
same operation.
o Query: mysql> SELECT * FROM t_Boaz EXCEPT SELECT * FROM t2_Boaz;
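The four set operators can be compared side by side on a small data set. The following is a minimal sketch using SQLite through Python's sqlite3 module. The single id column and its values are hypothetical, and SQLite spells the difference operator EXCEPT, not MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The single column and its values are hypothetical sample data.
conn.execute("CREATE TABLE t_Boaz (id INTEGER)")
conn.execute("CREATE TABLE t2_Boaz (id INTEGER)")
conn.executemany("INSERT INTO t_Boaz VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO t2_Boaz VALUES (?)", [(3,), (4,)])


def run(operator):
    """Combine the two SELECT statements with the given set operator."""
    sql = f"SELECT id FROM t_Boaz {operator} SELECT id FROM t2_Boaz ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


union_rows = run("UNION")          # duplicates removed
union_all_rows = run("UNION ALL")  # duplicates kept
intersect_rows = run("INTERSECT")  # rows common to both queries
except_rows = run("EXCEPT")        # EXCEPT plays the role of MINUS here

print(union_rows, union_all_rows, intersect_rows, except_rows)
```

Both SELECT statements return one column of the same type, which satisfies the two rules listed above for SET operators.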
4.3 SQL Aggregate Functions
o An SQL aggregate function performs a calculation on multiple rows of a single
column of a table and returns a single value.
o It is also used to summarize data.
Types of SQL Aggregate Functions
1. COUNT Function
o The COUNT function counts the number of rows in a database table. It works on
both numeric and non-numeric data types.
o COUNT(*) returns the count of all the rows in the specified table, including duplicate rows
and rows containing NULL. COUNT(expression) counts only the rows where the expression is not NULL.
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
2. SUM Function
The SUM function calculates the sum of the values in the selected column. It works on numeric
fields only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
3. AVG function
The AVG function calculates the average value of a numeric column. It
returns the average of all non-NULL values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
4. MAX Function
MAX function is used to find the maximum value of a certain column. This function
determines the largest value of all selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression)
5. MIN Function
MIN function is used to find the minimum value of a certain column. This function
determines the smallest value of all selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression)
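The five aggregate functions can be tried on the STUDENT relation of Table 1. This is a minimal sketch using SQLite through Python's sqlite3 module; the column types and the NULL phone number for SURESH are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO INTEGER, NAME TEXT, PHONE TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [
        (1, "RAM", "9455123451", 18),
        (2, "RAMESH", "9652431543", 18),
        (3, "SUJIT", "9156253131", 20),
        (4, "SURESH", None, 18),  # PHONE assumed to be NULL
    ],
)

row = conn.execute(
    "SELECT COUNT(*), COUNT(PHONE), COUNT(DISTINCT AGE), "
    "SUM(AGE), AVG(AGE), MAX(AGE), MIN(AGE) FROM STUDENT"
).fetchone()
(all_rows, phones, distinct_ages, age_sum, age_avg, age_max, age_min) = row

# COUNT(*) counts every row; COUNT(PHONE) skips the NULL phone number.
print(all_rows, phones, distinct_ages, age_sum, age_avg, age_max, age_min)
```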
4.4 Null Value
SQL supports a special value known as NULL which is used to represent the values of
attributes that may be unknown or not apply to a tuple. SQL places a NULL value in the field
in the absence of a user-defined value.
When a NULL is involved in a comparison operation, the result is considered to be
UNKNOWN.
SQL uses a three-valued logic with values True, False, and Unknown.
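The effect of UNKNOWN on a query can be seen in a short experiment. The sketch below assumes SQLite through Python's sqlite3 module and a two-row hypothetical STUDENT table; sqlite3 returns SQL NULL (and therefore UNKNOWN) as Python None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (NAME TEXT, PHONE TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)",
    [("RAM", "9455123451"), ("SURESH", None)],
)

# A comparison involving NULL yields UNKNOWN, returned here as None.
comparison = conn.execute("SELECT NULL = NULL").fetchone()[0]

# WHERE keeps only rows whose condition is TRUE, so '= NULL' matches nothing.
equals_null = conn.execute("SELECT NAME FROM STUDENT WHERE PHONE = NULL").fetchall()

# IS NULL is the test that actually finds missing values.
is_null = conn.execute("SELECT NAME FROM STUDENT WHERE PHONE IS NULL").fetchall()

print(comparison, equals_null, is_null)
```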
4.5 Complex view in SQL
1. A complex view is a view that uses data from multiple tables together to present a
combined picture of the data.
2. Relation between tables: a relationship between the tables is required to create a
complex view.
3. A complex view is created with multiple joins, GROUP BY clauses or set operators
in order to fetch data from multiple tables.
4. Complex views are used to perform complex operations that fetch data
from multiple tables.
Example:
If there are two tables, Customer and Item:
Customer table: Customer_name, Customer_num, Customer_code columns
Item table: Customer_code, Item_code, Item_name, Item_category columns
To create a view that shows the items associated with each customer:
CREATE VIEW V_Customer
AS SELECT e.Customer_name, d.Item_name
FROM Customer e, Item d
WHERE e.Customer_code = d.Customer_code;
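A complex view that combines a join with GROUP BY could look like the sketch below. It uses SQLite through Python's sqlite3 module; the view name V_Customer_Items, the Item_count column and all sample rows are hypothetical and are not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (Customer_name TEXT, Customer_num INTEGER, Customer_code TEXT)"
)
conn.execute(
    "CREATE TABLE Item (Customer_code TEXT, Item_code TEXT, Item_name TEXT, Item_category TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [("Asha", 1, "C1"), ("Ravi", 2, "C2")],
)
conn.executemany(
    "INSERT INTO Item VALUES (?, ?, ?, ?)",
    [
        ("C1", "I1", "Pen", "Stationery"),
        ("C1", "I2", "Pencil", "Stationery"),
        ("C2", "I3", "Lamp", "Electrical"),
    ],
)

# The view joins both tables and groups the joined rows,
# counting the items per customer and category.
conn.execute(
    """
    CREATE VIEW V_Customer_Items AS
    SELECT e.Customer_name, d.Item_category, COUNT(*) AS Item_count
    FROM Customer e
    JOIN Item d ON e.Customer_code = d.Customer_code
    GROUP BY e.Customer_name, d.Item_category
    """
)

view_rows = conn.execute(
    "SELECT * FROM V_Customer_Items ORDER BY Customer_name"
).fetchall()
print(view_rows)
```

Every column in the SELECT list is either named in GROUP BY or wrapped in an aggregate, which is what standard SQL requires of a grouped query.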
4.6 Modification of Database
There are 3 modification statements:
INSERT Statement -- add rows to tables.
UPDATE Statement -- modify columns in table rows.
DELETE Statement -- remove rows from tables.
INSERT Statement
The INSERT statement adds one or more rows to a table. It has two formats:
INSERT INTO table-1 [(column-list)] VALUES (value-list)
and
INSERT INTO table-1 [(column-list)] (query-specification)
INSERT Examples:
INSERT INTO p (pno, color) VALUES ('P4', 'Brown')
Before                     After
pno  descr   color         pno  descr   color
P1   Widget  Blue          P1   Widget  Blue
P2   Widget  Red     =>    P2   Widget  Red
P3   Dongle  Green         P3   Dongle  Green
                           P4   NULL    Brown
INSERT INTO sp
SELECT s.sno, p.pno, 500
FROM s, p
WHERE p.color='Green' AND s.city='London'
Before                 After
sno  pno  qty          sno  pno  qty
S1   P1   NULL         S1   P1   NULL
S2   P1   200          S2   P1   200
S3   P1   1000   =>    S3   P1   1000
S3   P2   200          S3   P2   200
                       S2   P3   500
UPDATE Statement
The UPDATE statement modifies columns in selected table rows. It has the following general
format:
UPDATE table-1 SET set-list [WHERE predicate]
UPDATE Examples
UPDATE sp SET qty = qty + 20
Before                 After
sno  pno  qty          sno  pno  qty
S1   P1   NULL         S1   P1   NULL
S2   P1   200    =>    S2   P1   220
S3   P1   1000         S3   P1   1020
S3   P2   200          S3   P2   220
UPDATE s
SET name = 'Tony', city = 'Milan'
WHERE sno = 'S3'
Before                     After
sno  name    city          sno  name    city
S1   Pierre  Paris         S1   Pierre  Paris
S2   John    London  =>    S2   John    London
S3   Mario   Rome          S3   Tony    Milan
DELETE Statement
The DELETE Statement removes selected rows from a table. It has the following general format:
DELETE FROM table-1 [WHERE predicate]
DELETE Examples:
DELETE FROM sp WHERE pno = 'P1'
Before                 After
sno  pno  qty          sno  pno  qty
S1   P1   NULL         S3   P2   200
S2   P1   200    =>
S3   P1   1000
S3   P2   200
DELETE FROM p WHERE pno NOT IN (SELECT pno FROM sp)
Before                     After
pno  descr   color         pno  descr   color
P1   Widget  Blue          P1   Widget  Blue
P2   Widget  Red     =>    P2   Widget  Red
P3   Dongle  Green
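The UPDATE and DELETE examples on the sp table can be replayed in a short script. This is a minimal sketch using SQLite through Python's sqlite3 module, with assumed column types. Unlike the tables above, the DELETE here runs after the UPDATE, so the remaining row shows the updated quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sp VALUES (?, ?, ?)",
    [("S1", "P1", None), ("S2", "P1", 200), ("S3", "P1", 1000), ("S3", "P2", 200)],
)

# No WHERE clause, so every row is updated; NULL + 20 stays NULL.
conn.execute("UPDATE sp SET qty = qty + 20")
after_update = conn.execute(
    "SELECT sno, pno, qty FROM sp ORDER BY sno, pno"
).fetchall()

# Only the rows that satisfy the predicate are removed.
conn.execute("DELETE FROM sp WHERE pno = 'P1'")
after_delete = conn.execute("SELECT sno, pno, qty FROM sp").fetchall()

print(after_update)
print(after_delete)
```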
4.7 Joined Relations:
In SQL, the JOIN clause is used to combine records from two or more tables in a database.
Types of SQL JOIN
1. INNER JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. FULL JOIN
Sample Table
EMPLOYEE
EMP_ID  EMP_NAME   CITY         SALARY  AGE
1       Angelina   Chicago      200000  30
2       Robert     Austin       300000  26
3       Christian  Denver       100000  42
4       Kristen    Washington   500000  29
5       Russell    Los Angeles  200000  36
6       Marry      Canada       600000  48
PROJECT
PROJECT_NO  EMP_ID  DEPARTMENT
101         1       Testing
102         2       Development
103         3       Designing
104         4       Development
1. INNER JOIN
In SQL, INNER JOIN selects the records that have matching values in both tables. It
returns the combination of rows from both tables for which the join condition is
satisfied.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
INNER JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
2. LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the
right table. If there is no matching join value, it will return NULL.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
LEFT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
3. RIGHT JOIN
In SQL, RIGHT JOIN returns all the rows from the right table
and the matched values from the left table. If a row of the right table has no match in the
left table, NULL is returned for the left table's columns.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
RIGHT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
4. FULL JOIN
In SQL, FULL JOIN combines the results of the left and right outer joins. The joined
table has all the records from both tables, with NULL in the places where no match is found.
Syntax
SELECT table1.column1, table1.column2, table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
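The INNER JOIN and LEFT JOIN outputs above can be reproduced with the sample tables. The sketch below uses SQLite through Python's sqlite3 module and loads only the columns the queries need. RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not support them; an ORDER BY is added so that the row order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER, EMP_NAME TEXT)")
conn.execute("CREATE TABLE PROJECT (PROJECT_NO INTEGER, EMP_ID INTEGER, DEPARTMENT TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [(1, "Angelina"), (2, "Robert"), (3, "Christian"),
     (4, "Kristen"), (5, "Russell"), (6, "Marry")],
)
conn.executemany(
    "INSERT INTO PROJECT VALUES (?, ?, ?)",
    [(101, 1, "Testing"), (102, 2, "Development"),
     (103, 3, "Designing"), (104, 4, "Development")],
)


def join(kind):
    """Run the sample query with the given join type (INNER or LEFT)."""
    sql = (
        "SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT "
        f"FROM EMPLOYEE {kind} JOIN PROJECT "
        "ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID "
        "ORDER BY EMPLOYEE.EMP_ID"
    )
    return conn.execute(sql).fetchall()


inner_rows = join("INNER")  # only employees with a matching project
left_rows = join("LEFT")    # all employees; NULL (None) where no project matches

print(inner_rows)
print(left_rows)
```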
4.8 DDL Commands in SQL
DDL is an abbreviation of Data Definition Language.
The DDL commands in Structured Query Language are used to create and modify the
schema of the database and its objects. The syntax of DDL commands is predefined for describing
the data.
The five DDL commands in SQL are:
1. CREATE Command
2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command
1. CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other database objects.
Syntax to Create a Database: CREATE Database Database_Name;
Syntax to create a new table:
CREATE TABLE table_name
(
column_Name1 data_type ( size of the column ) ,
column_Name2 data_type ( size of the column) ,
column_Name3 data_type ( size of the column) ,
...
column_NameN data_type ( size of the column )
);
2. DROP Command
DROP is a DDL command used to delete/remove the database objects from the SQL database.
We can easily remove the entire table, view, or index from the database using this DDL command.
Syntax to remove a database: DROP DATABASE Database_Name;
Syntax to remove a table: DROP TABLE Table_Name;
3. ALTER Command
ALTER is a DDL command which changes or modifies the existing structure of the database,
and it also changes the schema of database objects.
Syntax to add a new field to the table:
ALTER TABLE name_of_table ADD column_name column_definition;
Syntax to remove a column from the table:
ALTER TABLE name_of_table DROP Column_Name_1 , column_Name_2 , ….., column_Name_N;
4. TRUNCATE Command
TRUNCATE is another DDL command which deletes or removes all the records from the table.
Syntax of TRUNCATE command: TRUNCATE TABLE Table_Name;
5. RENAME Command
RENAME is a DDL command which is used to change the name of the database table.
Syntax of RENAME command: RENAME TABLE Old_Table_Name TO New_Table_Name;
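The DDL commands can be run in sequence to watch the schema change. This is a minimal sketch using SQLite through Python's sqlite3 module, with hypothetical table and column names. Two SQLite differences are assumptions worth noting: renaming is written ALTER TABLE ... RENAME TO, and there is no TRUNCATE command, so an unqualified DELETE stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def tables():
    """List the tables that currently exist in the schema."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


def columns(table):
    """List the column names of a table."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


conn.execute("CREATE TABLE Student (Roll_No INTEGER, Name VARCHAR(30))")  # CREATE
conn.execute("ALTER TABLE Student ADD COLUMN Age INTEGER")               # ALTER
columns_after_alter = columns("Student")

conn.execute("ALTER TABLE Student RENAME TO Learner")  # SQLite's form of RENAME
tables_after_rename = tables()

conn.execute("INSERT INTO Learner VALUES (1, 'RAM', 18)")
conn.execute("DELETE FROM Learner")  # stands in for TRUNCATE TABLE
rows_after_truncate = conn.execute("SELECT COUNT(*) FROM Learner").fetchone()[0]

conn.execute("DROP TABLE Learner")                                        # DROP
tables_after_drop = tables()

print(columns_after_alter, tables_after_rename, rows_after_truncate, tables_after_drop)
```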
4.8 Embedded SQL:
Embedded SQL is a method of placing SQL statements inside a program written in a high-level
host language, so that the program can perform operations and transactions on the database.
Advantages of Embedded SQL
Some of the advantages of using SQL embedded in high-level languages are as follows:
Helps to access databases from anywhere.
Allows integrating authentication service for large scale applications.
Provides extra security to database transactions.
Avoids logical errors while performing transactions on our database.
Makes it easy to integrate the frontend and the backend of our application.
4.9 Dynamic SQL:
Dynamic SQL is the technique of programming SQL queries so that
they are built dynamically, at run time, by the operations of the application.
It helps us to manage big industrial applications and their transactions without any
added overhead.
If a query compiles successfully, its syntax is correct.
If a query compiles successfully, the required permissions and validations have also been checked.
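One way to picture dynamic SQL is a function that assembles its WHERE clause from whatever filters the application supplies at run time. The sketch below uses SQLite through Python's sqlite3 module; the function find_employees, the ALLOWED_COLUMNS list and the three sample rows are hypothetical. Values are passed as bound parameters instead of being pasted into the query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_NAME TEXT, CITY TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("Angelina", "Chicago", 200000), ("Robert", "Austin", 300000),
     ("Kristen", "Washington", 500000)],
)

# Column names cannot be bound as parameters, so they are checked against a list.
ALLOWED_COLUMNS = {"CITY", "SALARY"}


def find_employees(filters):
    """Build the WHERE clause at run time from a dict of column -> value."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT EMP_NAME FROM EMPLOYEE"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY EMP_NAME"
    return [row[0] for row in conn.execute(sql, params)]


all_names = find_employees({})                 # no filter: query has no WHERE clause
chicago = find_employees({"CITY": "Chicago"})  # one filter: WHERE CITY = ?

print(all_names, chicago)
```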
4.10 Other SQL Functions
SQL has many built-in functions for performing calculations on data. The functions available,
and their names, vary from one database system to another.
SQL Aggregate Functions
SQL aggregate functions return a single value, calculated from values in a column.
Useful aggregate functions:
AVG() - Returns the average value
COUNT() - Returns the number of rows
FIRST() - Returns the first value
LAST() - Returns the last value
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum
SQL Scalar functions
SQL scalar functions return a single value, based on the input value.
Useful scalar functions:
UCASE() - Converts a field to upper case
LCASE() - Converts a field to lower case
MID() - Extract characters from a text field
LEN() - Returns the length of a text field
ROUND() - Rounds a numeric field to the number of decimals specified
NOW() - Returns the current system date and time
FORMAT() - Formats how a field is to be displayed.
4.11 Integrity and Security
1. Database Integrity
Data integrity in the database is the correctness, consistency and completeness of data. Data
integrity is enforced using the following three integrity constraints:
Entity Integrity - This is related to the concept of primary keys. All tables should have
their own primary keys which should uniquely identify a row and not be NULL.
Referential Integrity - This is related to the concept of foreign keys. A foreign key is an
attribute of one relation that refers to a key of another relation.
Domain Integrity -This means that there should be a defined domain for all the columns in
a database.
2. Database Security
Database security has many different layers, but the key aspects are:
Authentication
User authentication makes sure that the person accessing the database is who they claim
to be. Authentication can be done at the operating system level or at the database level itself.
Authentication systems such as retina scanners or other biometrics are used to make sure that
unauthorized people cannot access the database.
Authorization
Authorization is a privilege granted by the Database Administrator. Users of the database
can only view the contents they are authorized to view. The rest of the database is out of bounds
to them.
The different permissions for authorizations available are:
Primary Permission - This is granted to users publicly and directly.
Secondary Permission - This is granted to groups and automatically awarded to a user if
he is a member of the group.
Public Permission - This is publicly granted to all the users.
Context sensitive permission - This is related to sensitive content and is granted only to a
select group of users.
UNIT – III: DATA NORMALIZATION
Normalization
A large database defined as a single relation may result in data duplication.
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It is
also used to eliminate undesirable characteristics such as insertion, update and deletion
anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o Normal forms are used to reduce redundancy in database tables.
1. Pitfalls in Relational Database Design
Relational database design requires that we find a
“good” collection of relational schemas. A bad design may lead to:
Repetition of information
Inability to represent certain information
Design Goals for Relational Database
Avoid redundant data
Ensure that relationships among attributes are represented
Facilitate the checking of updates for violation of database integrity constraints
Example
Consider the relational schema Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
Redundancy
Data for branch-name, branch-city and assets are repeated for each loan that a branch makes.
This wastes space and complicates updating.
Null Values
We cannot store information about a branch if no loan exists.
We can use null values, but they are difficult to handle.
In this example, the faulty database design causes the pitfalls above. If the design is
not good, there will be faults in the database.
2. Decomposition
o When a relation in the relational model is not in an appropriate normal form,
decomposition of the relation is required.
o Decomposition breaks a table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems such as loss
of information.
o Decomposition is used to eliminate some of the problems of bad design, such as
anomalies, inconsistencies, and redundancy.
2.1 Types of Decomposition
2.2 Lossless Decomposition
o If no information is lost from the relation that is decomposed, the decomposition
is lossless.
o A lossless decomposition guarantees that the join of the decomposed relations results in the
same relation that was decomposed.
o A decomposition is lossless if the natural join of all the decomposed relations
gives the original relation.
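The natural-join test for losslessness can be tried on a single relation instance. The following Python sketch is an illustration under stated assumptions: the helper names and the sample relation R(A, B, C) are hypothetical, and checking one instance only demonstrates the idea; it does not prove the property for every instance of a schema.

```python
def project(relation, attrs, wanted):
    """Keep only the wanted attributes; the set drops duplicate tuples."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(t[i] for i in idx) for t in relation}


def natural_join(r1, attrs1, r2, attrs2):
    """Combine tuples that agree on all common attributes."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    joined = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                joined.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return joined, attrs1 + extra


def is_lossless(relation, attrs, attrs1, attrs2):
    """True if joining the two projections gives back the original instance."""
    r1 = project(relation, attrs, attrs1)
    r2 = project(relation, attrs, attrs2)
    joined, joined_attrs = natural_join(r1, attrs1, r2, attrs2)
    # Put the columns back in the original order before comparing.
    return project(joined, joined_attrs, attrs) == relation


# Hypothetical instance of R(A, B, C) in which A is a key.
ATTRS = ["A", "B", "C"]
R = {(1, "x", "p"), (2, "x", "q")}

lossless = is_lossless(R, ATTRS, ["A", "B"], ["A", "C"])  # joined on the key A
lossy = is_lossless(R, ATTRS, ["A", "B"], ["B", "C"])     # joined on non-key B

print(lossless, lossy)
```

Splitting on B fails because both tuples share the value "x", so the join produces two spurious tuples that were never in R.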
2.3 Dependency Preserving
o Dependency preservation is an important constraint on a decomposition.
o In a dependency-preserving decomposition, every dependency must be satisfied by at least
one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R
must either be a part of R1 or R2 or be derivable from the combination of the functional
dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set
(A → BC). The relation R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A → BC is a part of relation R1(ABC).
3. Functional Dependencies
A functional dependency is a relationship between two attributes or sets of attributes. It
typically exists between the primary key and a non-key attribute within a table.
X → Y
The left side of the FD is known as the determinant; the right side is
known as the dependent.
Functional dependency can be written as: Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional dependency
3.1. Trivial functional dependency
o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies.
3.2. Non-trivial functional dependency
o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.
Example:
ID → Name,
Name → DOB
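Whether a functional dependency holds in a given table can be checked mechanically: two rows that agree on the determinant must also agree on the dependent. The Python sketch below is an illustration with hypothetical helper names (holds, is_trivial) and made-up rows. A dependency that holds in one instance is not thereby guaranteed for the schema.

```python
def holds(rows, lhs, rhs):
    """True if lhs -> rhs holds: equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first dependent value seen for this determinant value.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_trivial(lhs, rhs):
    """A -> B is trivial when B is a subset of A."""
    return set(rhs) <= set(lhs)


# Hypothetical employee rows; two employees share the same name.
EMPLOYEE = [
    {"Emp_Id": 1, "Emp_Name": "Harry"},
    {"Emp_Id": 2, "Emp_Name": "Harry"},
    {"Emp_Id": 3, "Emp_Name": "Sam"},
]

id_determines_name = holds(EMPLOYEE, ["Emp_Id"], ["Emp_Name"])
name_determines_id = holds(EMPLOYEE, ["Emp_Name"], ["Emp_Id"])
trivial = is_trivial(["Emp_Id", "Emp_Name"], ["Emp_Id"])
non_trivial = not is_trivial(["Emp_Id"], ["Emp_Name"])

print(id_determines_name, name_determines_id, trivial, non_trivial)
```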
4. Normalization
A large database defined as a single relation may result in data duplication. This repetition of
data may result in:
o Relations become very large.
o Data is difficult to maintain and update, because an update involves searching many records in
the relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.
Data modification anomalies can be categorized into three types:
o Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple into
a relation due to lack of data.
o Deletion Anomaly: A deletion anomaly occurs when the deletion of data
results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated.
Types of Normal Forms:
Normalization works through a series of stages called normal forms. The normal forms
apply to individual relations. A relation is said to be in a particular normal form if it satisfies
the constraints of that form.
Normal Form  Description
1NF          A relation is in 1NF if every attribute contains only atomic values.
2NF          A relation is in 2NF if it is in 1NF and all non-key attributes are fully
             functionally dependent on the primary key.
3NF          A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF         A stronger definition of 3NF is known as Boyce-Codd normal form.
4NF          A relation is in 4NF if it is in Boyce-Codd normal form and has no
             multi-valued dependency.
5NF          A relation is in 5NF if it is in 4NF and contains no join dependency, so that
             joining is lossless.
4.1 First Normal Form (1NF)
o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values; it must hold only a single
value.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
o Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute
EMP_PHONE.
EMPLOYEE table:
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab
o The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
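The conversion above can be sketched in a few lines of Python. This is only an illustration: the in-memory list and the helper name to_1nf are assumptions made for the example, not part of any RDBMS.

```python
# Hypothetical in-memory copy of the EMPLOYEE table. EMP_PHONE holds a list,
# which is exactly the multi-valued attribute that 1NF disallows.
employee = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Emit one row per phone number so that every cell holds a single value."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]


employee_1nf = to_1nf(employee)
for row in employee_1nf:
    print(row)
```

Each employee with two phone numbers now appears in two rows, which matches the 1NF table shown above.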
4.2 Second Normal Form (2NF)
o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent on the
primary key.
Example: Suppose a school stores the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of the candidate key.
This partial dependency violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
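As a rough check of this decomposition, the following Python sketch projects the TEACHER rows onto the two new tables and joins them back on TEACHER_ID. The variable names are chosen for the example only.

```python
# Rows of the original TEACHER table: (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Project onto (TEACHER_ID, TEACHER_AGE); the set removes the repeated pairs.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})

# Project onto (TEACHER_ID, SUBJECT).
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Natural join on TEACHER_ID should reproduce the original rows.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_detail)
print(rejoined == teacher)
```

The age of each teacher is now stored once, and joining the two tables gives back the original data, so nothing is lost by the split.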
4.3 Third Normal Form (3NF)
o A relation is in 3NF if it is in 2NF and does not contain any transitive dependency of a
non-prime attribute on a candidate key.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If a relation in 2NF has no transitive dependency for non-prime attributes, then the
relation is in third normal form.
A relation is in third normal form if it satisfies at least one of the following conditions for
every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is
dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore
transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
That is why we need to move EMP_CITY and EMP_STATE to a new
<EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
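The dependencies used in this example can be tested against the sample rows with a small Python sketch. The helper fd_holds is a name invented for this illustration. Note that sample data can only disprove a functional dependency; it cannot prove that the dependency holds for all future data.

```python
COLUMNS = ("EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY")

rows = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        record = dict(zip(COLUMNS, row))
        key = tuple(record[c] for c in lhs)
        value = tuple(record[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(rows, ["EMP_ID"], ["EMP_ZIP"]))
print(fd_holds(rows, ["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"]))
print(fd_holds(rows, ["EMP_STATE"], ["EMP_CITY"]))
```

The first two checks succeed, which gives the chain EMP_ID → EMP_ZIP → {EMP_STATE, EMP_CITY}. The third check fails because the state US appears with two different cities.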
4.4 Boyce Codd normal form (BCNF)
o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key
of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
In the above table, the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
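Whether a left-hand side is a super key can be checked by computing its attribute closure. The sketch below is a minimal illustration using the two functional dependencies of this example; the function names closure and bcnf_violations are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the FDs whose left-hand side is not a super key of schema."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != schema]


schema = frozenset(
    {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
)
fds = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"})),
    (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"})),
]

print(closure({"EMP_ID", "EMP_DEPT"}, fds) == schema)
print(len(bcnf_violations(schema, fds)))
```

The pair {EMP_ID, EMP_DEPT} determines every attribute, so it is a key of the original table, while both functional dependencies are reported as violations because neither left-hand side alone is a super key.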
4.5 Fourth normal form (4NF)
o A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
o A multi-valued dependency A →→ B exists if, for a single value of A, there is a set of
values of B, and this set is independent of the other attributes of the relation.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there are multi-valued
dependencies on STU_ID, which lead to unnecessary repetition of data.
So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
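The repetition that 4NF removes can be seen by joining the two tables back together. The sketch below is only an illustration; natural_join is a helper written for this example.

```python
student_course = [
    (21, "Computer"), (21, "Math"), (34, "Chemistry"),
    (74, "Biology"), (59, "Physics"),
]
student_hobby = [
    (21, "Dancing"), (21, "Singing"), (34, "Dancing"),
    (74, "Cricket"), (59, "Hockey"),
]


def natural_join(courses, hobbies):
    """Pair every course of a student with every hobby of the same student."""
    return [
        (sid, course, hobby)
        for sid, course in courses
        for hid, hobby in hobbies
        if sid == hid
    ]


joined = natural_join(student_course, student_hobby)
rows_for_21 = [row for row in joined if row[0] == 21]
print(len(joined))
print(rows_for_21)
```

Because courses and hobbies are independent, a single combined table needs four rows for student 21 (2 courses x 2 hobbies), whereas the two 4NF tables store each fact only once.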
4.6 Fifth normal form (5NF)
o A relation is in 5NF if it is in 4NF and does not contain any join dependency; any
joining must be lossless.
o 5NF is satisfied when the tables have been broken into as many tables as possible in
order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
Computer Boaz Semester 1
Computer Mercy Semester 1
Math Mercy Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, Mercy takes both the Computer and Math classes for Semester 1 but does
not take the Math class for Semester 2. In this case, a combination of all three fields is required
to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be taking it, so we would have to leave LECTURER and SUBJECT as NULL. But all three
columns together act as the primary key, so we cannot leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2 and
P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Boaz
Computer Mercy
Math Mercy
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Boaz
Semester 1 Mercy
Semester 2 Akash
Semester 1 Praveen
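The following Python sketch checks that joining P1, P2 and P3 gives back exactly the original rows, and that joining only P1 and P2 does not. It is a small illustration on the sample data, with sets of tuples standing in for relations.

```python
# Original relation as (SUBJECT, LECTURER, SEMESTER) tuples.
original = {
    ("Computer", "Boaz", "Semester 1"),
    ("Computer", "Mercy", "Semester 1"),
    ("Math", "Mercy", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3.
p1 = {(sem, subj) for subj, _, sem in original}
p2 = {(subj, lect) for subj, lect, _ in original}
p3 = {(sem, lect) for _, lect, sem in original}

# Join of P1 and P2 on SUBJECT only.
two_way = {
    (subj, lect, sem)
    for sem, subj in p1
    for subj2, lect in p2
    if subj == subj2
}

# Join of all three: keep a row only if P3 also contains (SEMESTER, LECTURER).
rejoined = {row for row in two_way if (row[2], row[1]) in p3}

print(sorted(two_way - original))
print(rejoined == original)
```

Joining only two of the projections produces two spurious rows, for example Akash teaching Math in Semester 1. All three projections are needed to rebuild the original relation without loss.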
UNIT – IV: STORAGE AND FILE ORGANIZATION
Storage System in DBMS
A database system provides a high-level view of the stored data. However, the data itself is
stored as bits and bytes on different storage devices.
1. Disks
Magnetic Disk Storage: This type of storage media is also known as online storage media. A
magnetic disk is used for storing data for a long time. It is capable of storing an entire
database. It is the responsibility of the computer system to bring the data from the disk into
main memory for access. Also, if the system performs any operation on the data, the modified
data must be written back to the disk. A major strength of a magnetic disk is that its data is
not affected by a system crash or power failure; a failure of the disk itself, however, can
destroy the stored data.
2. RAID
RAID or Redundant Array of Independent Disks, is a technology to connect multiple
secondary storage devices and use them as a single storage media.
RAID consists of an array of disks in which multiple disks are connected together to
achieve different goals. RAID levels define the use of disk arrays.
RAID 0
In this level, a striped array of disks is implemented. The data is broken down into blocks
and the blocks are distributed among disks. Each disk receives a block of data to write/read in
parallel. It enhances the speed and performance of the storage device. There is no parity and
backup in Level 0.
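Block striping can be pictured as dealing blocks out to the disks in turn. The sketch below is a simplified model, assuming a round-robin placement and using Python lists in place of real disks; the function name stripe is hypothetical.

```python
def stripe(blocks, num_disks):
    """Distribute blocks over the disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


layout = stripe(["B0", "B1", "B2", "B3", "B4"], 2)
print(layout)
```

Consecutive blocks land on different disks, so they can be read or written in parallel. Since no block is stored twice, losing one disk loses part of the data.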
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy
of data to all the disks in the array. RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.
RAID 2
RAID 2 records an Error Correction Code (ECC), based on Hamming codes, for its data, which is
striped across different disks. Each data bit in a word is recorded on a separate disk, and the
ECC codes of the data words are stored on a different set of disks. Due to its complex structure
and high cost, RAID 2 is not commercially available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is
stored on a different disk. This technique allows the array to survive a single disk failure.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4
uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for data
block stripe are distributed among all the data disks rather than storing them on a different
dedicated disk.
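The parity used in levels 3, 4 and 5 is commonly computed with the XOR operation. The sketch below assumes XOR parity over small byte blocks and shows how the block of a failed disk can be rebuilt; the sample bytes and the helper xor_blocks are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )


# Three data blocks, one per data disk (sample values).
data = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]
parity = xor_blocks(data)

# Pretend the second disk failed: rebuild its block from the
# surviving blocks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])

print(parity.hex())
print(rebuilt == data[1])
```

XOR-ing the surviving blocks with the parity cancels them out and leaves the missing block, which is why a single disk failure can be tolerated.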
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and
stored in distributed fashion among multiple disks. Two parities provide additional fault tolerance.
This level requires at least four disk drives to implement RAID.
3. Tertiary Storage
It is the storage type that is external to the computer system. It has the slowest speed, but it
is capable of storing a large amount of data. It is also known as offline storage. Tertiary storage
is generally used for data backup. The following tertiary storage devices are available:
o Optical Storage: An optical storage can store megabytes or gigabytes of data. A Compact
Disk (CD) can store 700 megabytes of data with a playtime of around 80 minutes. On the
other hand, a Digital Video Disk or a DVD can store 4.7 or 8.5 gigabytes of data on each
side of the disk.
o Tape Storage: It is a cheaper storage medium than disks. Generally, tapes are used for
archiving or backing up data. It provides slow access to data because it accesses data
sequentially from the start. Thus, tape storage is also known as sequential-access storage.
Disk storage is known as direct-access storage because we can directly access the data at
any location on the disk.
4. Storage Access
These storage media are organized on the basis of data access speed, the cost per unit of
data to buy the medium, and the medium's reliability. Thus, we can create a hierarchy of storage
media on the basis of cost and speed.
The higher levels are expensive but fast. On moving down, the cost per bit decreases
and the access time increases. Also, main memory and the levels above it are volatile, while
all the levels below main memory are non-volatile.
1. Cache Memory and Main memory
Cache memory and main memory are at the top level in the memory hierarchy which are
responsible for fast execution.
Example: RAM, ROM etc.
2. Secondary memory
Secondary memory or storage is used to store data in computer system. The secondary storage
is relatively slower than cache or main memory.
Example: Magnetic tape, hard disk, CD, DVD etc.
3. Memory Hierarchy
A computer system has a well-defined hierarchy of memory. A CPU has direct access to its main
memory as well as its inbuilt registers. Main memory is much slower than the CPU. To reduce
this speed mismatch, cache memory is introduced. Cache memory provides much faster access
than main memory, and it contains the data that is most frequently accessed by the CPU.
The memory with the fastest access is the costliest one. Larger storage devices are slower
and less expensive, but they can store huge volumes of data compared to CPU
registers or cache memory.
5. File Organisation
1. A file is a collection of records. Using the primary key, we can access the records. The
type and frequency of access are determined by the type of file organization that is
used for a given set of records.
2. File organization is a logical relationship among various records. This method defines how
file records are mapped onto disk blocks.
3. File organization is used to describe the way in which the records are stored in terms of
blocks, and how the blocks are placed on the storage medium.
4. The first approach to mapping the database to files is to use several files and to store
records of only one fixed length in any given file.
Types of file organization:
There are various methods of file organization. Each method has pros and cons
depending on how the records are accessed or selected. The programmer chooses the
best-suited file organization method according to the requirements.
o Sequential file organization
o Heap file organization
o Hash file organization
o B+ file organization
o Indexed sequential access method (ISAM)
o Cluster file organization
5.1 Sequential File Organization
This is the easiest method of file organization. In this method, records are stored
sequentially. This method can be implemented in two ways:
1. Pile File Method:
o It is quite a simple method. In this method, we store the records in a sequence, i.e., one
after another. The records are stored in the order in which they are inserted into the table.
o When a record is updated or deleted, it is first searched for in the memory
blocks. When it is found, it is marked for deletion, and the new record is
inserted.
Insertion of the new record:
Suppose we have a sequence of records R1, R3 and so on up to R9 and R8. Here, a record is
nothing but a row in a table. Suppose we want to insert a new record R2 in the sequence; it
will be placed at the end of the file.
2. Sorted File Method:
o In this method, the new record is always inserted at the file's end, and then the sequence
is sorted in ascending or descending order. Sorting of records is based on the primary key
or any other key.
o When a record is modified, the record is updated and the file is sorted again,
so that the updated record ends up in the right place.
Insertion of the new record:
Suppose there is a preexisting sorted sequence of records R1, R3 and so on up to R6 and R7.
If a new record R2 has to be inserted in the sequence, it is first inserted at the end of
the file, and then the sequence is sorted.
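The two insertion methods can be compared with a short Python sketch. Lists of record names stand in for files here, and only the records named in the text are included; the records hidden behind "and so on" are left out.

```python
# Pile file: records stay in arrival order, new records go to the end.
pile_file = ["R1", "R3", "R9", "R8"]
pile_file.append("R2")

# Sorted file: append at the end, then re-sort on the key.
sorted_file = ["R1", "R3", "R6", "R7"]
sorted_file.append("R2")
sorted_file.sort()

print(pile_file)
print(sorted_file)
```

In the pile file R2 simply stays at the end, while in the sorted file it moves to its place between R1 and R3. The extra sorting step is the price paid for keeping the file ordered.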
5.2 Heap file organization
o It is the simplest and most basic type of organization. It works with data blocks. In heap
file organization, the records are inserted at the file's end. When the records are inserted, it
doesn't require the sorting and ordering of records.
o When the data block is full, the new record is stored in some other block. This new data
block need not be the very next data block; the DBMS can select any data block in
memory to store new records. The heap file is also known as an unordered file.
Insertion of a new record
Suppose we have five records R1, R3, R6, R4 and R5 in a heap and we want to insert a new
record R2. If data block 3 is full, then R2 will be inserted in any other data block selected
by the DBMS.
If the database is very large, then searching, updating or deleting a record will be
time-consuming because there is no sorting or ordering of records. In heap file organization, we
need to check all the data until we find the requested record.
5.3 Hash File Organization
Hash file organization computes a hash function on some fields of the records.
The output of the hash function determines the location of the disk block where the record is
to be placed.
When a record has to be retrieved using the hash key columns, the address is
generated, and the whole record is retrieved using that address. In the same way, when a new
record has to be inserted, the address is generated using the hash key and the record is
inserted directly. The same process is applied in the case of delete and update.
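A minimal sketch of this idea is given below. It assumes a simple modulo hash function, four buckets standing in for disk blocks, and made-up record values; a real DBMS uses its own hash function and block layout.

```python
NUM_BUCKETS = 4  # hypothetical number of disk blocks

# Each bucket plays the role of one disk block.
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_for(key):
    """Hash function: map a key to a bucket (block) address."""
    return key % NUM_BUCKETS


def insert(record):
    buckets[bucket_for(record["id"])].append(record)


def lookup(key):
    """Compute the address, then scan only that one bucket."""
    for record in buckets[bucket_for(key)]:
        if record["id"] == key:
            return record
    return None


def delete(key):
    bucket = buckets[bucket_for(key)]
    bucket[:] = [r for r in bucket if r["id"] != key]


for emp_id, name in [(103, "A"), (104, "B"), (106, "C"), (110, "D")]:
    insert({"id": emp_id, "name": name})

print(bucket_for(110))
print(lookup(110))
delete(106)
print(lookup(106))
```

Records 106 and 110 hash to the same bucket, so both are stored there. Lookup, insert and delete all begin by computing the address and never need to scan the other buckets.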
5.4 B+ File Organization
o B+ tree file organization is the advanced method of an indexed sequential access method.
It uses a tree-like structure to store records in File.
o It uses the same concept of key-index where the primary key is used to sort the records.
For each primary key, the value of the index is generated and mapped with the record.
o The B+ tree is similar to a binary search tree (BST), but it can have more than two
children. In this method, all the records are stored only at the leaf node. Intermediate nodes
act as a pointer to the leaf nodes. They do not contain any records.
The above B+ tree shows that:
o There is one root node of the tree, i.e., 25.
o There is an intermediary layer with nodes. They do not store the actual record. They have
only pointers to the leaf node.
o The node to the left of the root holds a value smaller than the root and the node to the
right holds a value larger than the root, i.e., 15 and 30 respectively.
o Only the leaf nodes hold the actual values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easier because the tree is balanced: all the leaf nodes are at
the same depth.
o In this method, any record can be reached by traversing a single path from the root to a
leaf.
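The single-path search can be sketched in Python as follows. The root, intermediate and leaf values are taken from the description above, but the way the values are grouped into individual leaf nodes is an assumption made for this example, and the structure is a toy model rather than a full B+ tree implementation.

```python
import bisect

# Internal nodes are dicts with separator keys and children;
# leaves are sorted lists of values.
# ASSUMPTION: the grouping of values into leaves is illustrative only.
tree = {
    "keys": [25],
    "children": [
        {"keys": [15], "children": [[10, 12], [17, 20, 24]]},
        {"keys": [30], "children": [[27, 29], []]},  # nothing >= 30 stored yet
    ],
}


def search(node, key):
    """Follow one root-to-leaf path and report whether key is in the leaf."""
    path = []
    while isinstance(node, dict):
        path.append(node["keys"])
        # Keys smaller than the separator go left, all others go right.
        node = node["children"][bisect.bisect_right(node["keys"], key)]
    return key in node, path


print(search(tree, 20))
print(search(tree, 27))
print(search(tree, 26))
```

Every search visits exactly one node per level, here the root and one intermediate node, before reaching a leaf.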
5.5 Indexed sequential access method (ISAM)
The ISAM method is an advanced sequential file organization. In this method, records are stored
in the file using the primary key. An index value is generated for each primary key and mapped
to the record. This index contains the address of the record in the file.
If any record has to be retrieved based on its index value, the address of the data block
is fetched from the index and the record is retrieved from that block.
5.6 Cluster file organization
o When the records of two or more tables are stored in the same file, it is known as a cluster.
These files have two or more tables in the same data block, and the key attributes that are
used to map these tables together are stored only once.
o This method reduces the cost of searching for various records in different files.
o The cluster file organization is used when there is a frequent need for joining the tables
with the same condition. These joins will give only a few records from both tables. In the
given example, we are retrieving the record for only particular departments.
This method can directly insert, update or delete any record. Data is sorted based on the key
with which searching is done. The cluster key is the key on which the tables are joined.
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The
above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster. Here,
all the records are grouped based on the cluster key DEP_ID.
2. Hash Clusters:
It is similar to the indexed cluster. In hash cluster, instead of storing the records based on
the cluster key, we generate the value of the hash key for the cluster key and store the records with
the same hash key value.
6. Data Dictionary Storage
A relational database system maintains all the information about a relation or table, from its
schema to the constraints applied to it. All of this metadata is stored. In general, metadata
refers to data about data. The structure that stores the relational schemas and other metadata
about the relations is known as the Data Dictionary or System Catalog.
A data dictionary is like the A-Z dictionary of the relational database system holding all
information of each relation in the database.
The types of information a system must store are:
o Name of the relations
o Name of the attributes of each relation
o Lengths and domains of attributes
o Name and definitions of the views defined on the database
o Various integrity constraints
With this, the system also keeps the following data based on users of the system:
o Name of authorized users
o Accounting and authorization information about users.
o The authentication information for users, such as passwords or other related information.
In addition to this, the system may also store some statistical and descriptive data about the
relations, such as:
o Number of tuples in each relation
o Method of storage for each relation, such as clustered or non-clustered.
A system may also store the storage organization, whether sequential, hash, or heap. It also
notes the location where each relation is stored:
o If relations are stored in files of the operating system, the data dictionary notes and
stores the names of the files.
o If the database stores all the relations in a single file, the data dictionary notes and stores
the blocks containing the records of each relation in a data structure similar to a linked list.
Finally, it also stores information about each index on the relations:
o Name of the index.
o Name of the relation being indexed.
o Attributes on which the index is defined.
o The type of index formed.
All the above information or metadata is stored in a data dictionary. The data dictionary is
also updated whenever changes occur in the relations. Such metadata constitutes a
miniature database. Some systems store the metadata in the form of relations in the database
itself. The system designers decide how the data dictionary is represented. Also, a data
dictionary often stores the data in a non-normalized form. It does not use any normal form, so
that the data stored in the dictionary can be accessed quickly.
For example, a data dictionary may underline an attribute name to indicate that the
field is part of the primary key.
When the database system needs to fetch records from a relation, it first looks up the
location and storage organization of the relation in the data dictionary. After confirming these
details, it retrieves the required records from the database.
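Many systems expose their data dictionary through ordinary queries. As one concrete illustration, SQLite keeps its catalog in a table named sqlite_master and reports column details through PRAGMA table_info. The sketch below uses Python's built-in sqlite3 module; the student table and the index name are made up for the example, and other database systems use different catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info returns, per column: position, name, declared type,
# NOT NULL flag, default value, and position within the primary key.
columns = conn.execute("PRAGMA table_info(student)").fetchall()
column_names = [col[1] for col in columns]
primary_key = [col[1] for col in columns if col[5]]

print(catalog)
print(column_names)
print(primary_key)
```

The catalog lists the relation and its index, and the column information gives the attribute names, types and primary key, which are the kinds of metadata listed above.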
UNIT – V: QUERY PROCESSING AND TRANSACTION MANAGEMENT
1. Query Processing
Query processing is the set of activities involved in extracting data from the database. It
takes several steps to fetch the data from the database. The steps involved are:
1. Parsing and translation
2. Optimization
3. Evaluation
The query processing works in the following way:
Parsing and Translation
Query processing includes several activities for data retrieval. Users write their
queries in a high-level database language such as SQL. These queries are translated into
expressions that can be used at the physical level of the file system. After this, a variety of
query-optimizing transformations and the actual evaluation of the query take place. Thus,
before processing a query, the system needs to translate it from a human-readable
language into an internal form.
SQL (Structured Query Language) is well suited to humans, but it is not
well suited for the internal representation of a query within the system. Relational algebra is
well suited for the internal representation of a query. The translation process in query processing
is similar to the work of a parser in a compiler. When a user executes a query, in order to
generate the internal form of the query, the parser checks the syntax of the query and verifies
the names of the relations in the database, the tuples, and finally the required attribute values.
The parser creates a tree of the query, known as the 'parse tree', and then translates it into
relational algebra. In doing so, it also replaces every use of a view in the query.
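For example, consider the query:
select salary from Employee where salary>10000;
It can be translated into either of the following equivalent relational algebra expressions:
o σ salary>10000 (π salary (Employee))
o π salary (σ salary>10000 (Employee))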
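The equivalence of the two expressions can be tried out with a small Python sketch. The functions select and project are simplified stand-ins for the σ and π operators, and the sample Employee rows are invented for the example. Unlike true relational algebra, this project does not remove duplicate rows.

```python
# Hypothetical sample rows of the Employee relation.
employee = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Ravi", "salary": 9000},
    {"emp_name": "Meena", "salary": 15000},
]


def select(rows, predicate):
    """σ: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """π: keep only the listed attributes of each row."""
    return [{a: row[a] for a in attrs} for row in rows]


def high_salary(row):
    return row["salary"] > 10000


# σ salary>10000 (π salary (Employee))
plan_a = select(project(employee, ["salary"]), high_salary)
# π salary (σ salary>10000 (Employee))
plan_b = project(select(employee, high_salary), ["salary"])

print(plan_a)
print(plan_a == plan_b)
```

Both orders give the same answer, but they do different amounts of work along the way, which is what the optimizer later takes into account.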
Evaluation
In addition to the relational algebra translation, the translated relational algebra
expression must be annotated with instructions that specify how to evaluate
each operation. Thus, after translating the user query, the system executes a query evaluation plan.
Query Evaluation Plan
o In order to fully evaluate a query, the system needs to construct a query evaluation plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query execution
plan.
o A query execution engine is responsible for generating the output of the given query. It
takes the query execution plan, executes it, and finally makes the output for the user query.
Optimization
o The cost of query evaluation can vary for different types of queries. Because the
system is responsible for constructing the evaluation plan, the user need not write
the query efficiently.
o Usually, a database system generates an efficient query evaluation plan, which minimizes
its cost. This task is performed by the database system and is known as Query
Optimization.
o For optimizing a query, the query optimizer should have an estimated cost analysis of each
operation, because the overall cost depends on the memory allocated to the
operations, their execution costs, and so on.
2. Transaction Concept
o A transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions. It is performed by a single user to
access the contents of the database.
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account.
This small transaction contains several low-level tasks:
X's Account
Open_Account(X)
Old_Balance = X.balance
New_Balance = Old_Balance - 800
X.balance = New_Balance
Close_Account(X)
Y's Account
Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)
Operations of Transaction:
Following are the main operations of transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a buffer
in main memory.
Write(X): Write operation is used to write the value back to the database from the buffer.
Let's take the example of a debit transaction on an account, which consists of the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting of the transaction is 4000.
o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will be
3500.
However, because of a hardware, software or power failure, a transaction may fail
before finishing all the operations in the set.
For example: If the above debit transaction fails after executing operation 2,
then X's value will remain 4000 in the database, which is not acceptable to the bank.
To solve this problem, we have two important operations:
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.
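Commit and rollback can be tried out with Python's built-in sqlite3 module. The sketch below is an illustration under stated assumptions: the account table, the starting balances and the CHECK constraint that forbids negative balances are all invented for the example, and the credit is deliberately done before the debit so that a failure leaves partial work to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 4000), ("Y", 1000)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit target and debit source as one unit of work."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        conn.commit()    # Commit: make both updates permanent
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # Rollback: undo the credit that already happened
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


print(transfer(conn, "X", "Y", 800), balances(conn))
print(transfer(conn, "X", "Y", 5000), balances(conn))
```

The first transfer commits and both balances change. The second transfer fails at the debit step, and the rollback also undoes the credit to Y, so the database is left exactly as it was before that transaction started.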
3. Concurrency Control
Concurrency control is the management procedure required for controlling the
concurrent execution of the operations that take place on a database, and thus for avoiding
inconsistencies in the database. To keep the database consistent under concurrent execution,
we use concurrency control protocols.
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database transactions. Therefore,
these protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
3.1 Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. In a shared lock, the data item can only read by the
transaction.
o It can be shared between the transactions because when the transaction holds a lock, then it
can't update the data on the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: only one transaction can hold it on a data item at a time, so multiple
transactions cannot modify the same data simultaneously.
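The rules for shared and exclusive locks can be summarised as a compatibility table. The sketch
below is an illustration only; the names COMPATIBLE and can_grant are hypothetical and do not
belong to any particular DBMS.

```python
# Lock modes: "S" = shared (read-only), "X" = exclusive (read and write).
# COMPATIBLE[(held, requested)] says whether a lock in mode `requested` can be
# granted while another transaction already holds a lock in mode `held`.
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share a data item
    ("S", "X"): False,  # a writer must wait for the readers
    ("X", "S"): False,  # a reader must wait for the writer
    ("X", "X"): False,  # only one writer at a time
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every lock held by others."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For example, a shared lock can be granted while other shared locks are held, but an exclusive
lock can be granted only when no other transaction holds any lock on the data item.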
There are four types of lock protocols available:
1. Simplistic lock protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based protocols
require every transaction to obtain a lock on the data before inserting, deleting or updating it.
The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol
o Pre-claiming Lock Protocols evaluate the transaction to list all the data items on which
it needs locks.
o Before initiating execution of the transaction, the protocol requests the DBMS for locks on
all those data items.
o If all the locks are granted, then this protocol allows the transaction to begin. When the
transaction is completed, it releases all the locks.
o If all the locks are not granted, then the transaction rolls back and waits until all the
locks are granted.
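The all-or-nothing behaviour of pre-claiming can be sketched as follows. This is a simplified
illustration that assumes exclusive locks only and models the lock table as a plain dictionary;
the function names preclaim and release_all are hypothetical.

```python
def preclaim(lock_table, txn, items):
    """Grant every item in `items` to `txn`, or grant none of them.

    `lock_table` maps a data item to the transaction that currently holds it.
    """
    if any(lock_table.get(item) not in (None, txn) for item in items):
        return False  # at least one item is held by someone else: take nothing
    for item in items:
        lock_table[item] = txn
    return True


def release_all(lock_table, txn):
    """Release every lock held by `txn` once the transaction has completed."""
    for item in [i for i, owner in lock_table.items() if owner == txn]:
        del lock_table[item]
```

Because a transaction never holds some locks while waiting for others, it cannot be part of a
cycle of waiting transactions under this scheme.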
3. Two-phase locking (2PL)
o The two-phase locking protocol divides the execution of the transaction into three
parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the
locks it requires.
o In the second part, the transaction acquires all the locks. The third part starts as soon
as the transaction releases its first lock.
o In the third part, the transaction cannot demand any new locks. It only releases the
acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released,
but no new locks can be acquired.
If lock conversion is allowed, then the following can happen:
1. Upgrading of a lock (from S(a) to X(a)) is allowed in the growing phase.
2. Downgrading of a lock (from X(a) to S(a)) must be done in the shrinking phase.
Example:
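As a minimal sketch of the growing and shrinking phases, the following Python class enforces the
rule that no lock may be acquired after the first release. The class TwoPhaseTransaction is
hypothetical and is not part of any real DBMS; it ignores lock modes and lock conversion and
only tracks which items a single transaction holds.

```python
class TwoPhaseTransaction:
    """Tracks the locks of one transaction and enforces the two phases of 2PL."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # becomes True at the first unlock

    def lock(self, item):
        # Growing phase only: new locks are refused once shrinking has begun.
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock {item} in the shrinking phase"
            )
        self.held.add(item)

    def unlock(self, item):
        # The first release ends the growing phase and starts the shrinking phase.
        self.held.remove(item)
        self.shrinking = True
```

A transaction that locks A and B, then unlocks A, is in its shrinking phase; a later request to
lock C is rejected with an error.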
4. Strict Two-phase locking (Strict-2PL)
o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the
locks, the transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock
immediately after using it.
o Strict-2PL waits until the whole transaction commits, and then it releases all the locks at
one time.
o The Strict-2PL protocol therefore has no gradual shrinking phase of lock release.
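The behaviour described above can be sketched by a transaction object that offers no separate
unlock operation, so locks can only be released together at commit. The class
StrictTwoPhaseTransaction is a hypothetical illustration, not an actual DBMS interface.

```python
class StrictTwoPhaseTransaction:
    """Holds every acquired lock until commit, then releases them all at once."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.committed = False

    def lock(self, item):
        if self.committed:
            raise RuntimeError(f"{self.name}: already committed")
        self.held.add(item)

    def commit(self):
        # There is deliberately no unlock(): commit is the only release point.
        released = sorted(self.held)
        self.held.clear()
        self.committed = True
        return released
```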
4. Deadlock Handling:
A deadlock is a condition where two or more transactions are waiting indefinitely for one
another to give up locks. Deadlock is one of the most feared complications in a DBMS because
none of the transactions involved can ever finish; each remains in a waiting state forever.
Deadlock Avoidance
o It is better to avoid a deadlock than to abort or restart the database once it is stuck in a
deadlock state, since aborting and restarting waste time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A
method like the "wait-for graph" is used for detecting the deadlock situation, but this method
is suitable only for smaller databases. For larger databases, the deadlock prevention method
can be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS
should detect whether the transaction is involved in a deadlock or not. The lock manager
maintains a wait-for graph to detect deadlock cycles in the database.
Wait-for Graph
o This is a suitable method for deadlock detection. In this method, a graph is created based
on the transactions and their locks. If the created graph has a cycle or closed loop, then there
is a deadlock.
o The wait-for graph is maintained by the system for every transaction which is waiting
for some data held by the others. The system keeps checking the graph for cycles.
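Checking a wait-for graph for a cycle can be sketched with a depth-first search. In the
following illustration the graph is a dictionary that maps each waiting transaction to the
transactions it is waiting for; the function name has_deadlock and this representation are
assumptions made for the example.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    `wait_for` maps a transaction to the transactions whose locks it waits for.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(txn):
        color[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:
                return True  # reached a transaction already on the path: cycle
            if state == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))
```

For instance, if T1 waits for T2 and T2 waits for T1, the graph has a closed loop and the
function reports a deadlock; if T1 waits for T2 and T2 waits for T3 only, it does not.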
Deadlock Prevention
o Deadlock prevention method is suitable for a large database. If the resources are allocated
in such a way that deadlock never occurs, then the deadlock can be prevented.
o The database management system analyzes the operations of the transaction to determine
whether they can create a deadlock situation. If they can, then the DBMS does not allow that
transaction to be executed.
Wait-Die scheme
In this scheme, if a transaction requests a resource which is already held with a
conflicting lock by another transaction, then the DBMS simply compares the timestamps of both
transactions. If the requesting transaction is older, it is allowed to wait until the resource
becomes available. If it is younger, it is rolled back (it "dies") and is restarted later with
the same timestamp.
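The wait-die decision can be written as a single timestamp comparison. The sketch below assumes
that a smaller timestamp means an older transaction and uses the usual statement of the scheme,
in which a younger requester is rolled back; the function name wait_die is hypothetical.

```python
def wait_die(requester_ts, holder_ts):
    """Decide what the requesting transaction does under wait-die.

    A smaller timestamp means an older transaction (an assumption of this sketch).
    """
    if requester_ts < holder_ts:
        return "wait"  # older requester waits for the younger holder
    return "die"       # younger requester is rolled back and restarted later
```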
Wound-Wait scheme
o In the wound-wait scheme, if an older transaction requests a resource which is held by a
younger transaction, then the older transaction forces the younger one to abort and
release the resource. After a short delay, the younger transaction is restarted, but with
the same timestamp.
o If the older transaction holds a resource which is requested by the younger transaction,
then the younger transaction is asked to wait until the older one releases it.
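The two cases above can be sketched as a timestamp comparison, again assuming that a smaller
timestamp means an older transaction. The function name wound_wait and the returned labels are
hypothetical and serve only as an illustration.

```python
def wound_wait(requester_ts, holder_ts):
    """Decide the outcome of a lock conflict under wound-wait.

    A smaller timestamp means an older transaction (an assumption of this sketch).
    """
    if requester_ts < holder_ts:
        # Older requester: the younger holder is aborted and releases the resource.
        return "wound holder"
    # Younger requester: it waits until the older holder releases the resource.
    return "wait"
```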