DBMS and SQL
As the name suggests, a database management system consists of two parts: the database and the management system.
Database: A database is a collection of related data. By data, we mean known facts that can be
recorded and that have implicit meaning.
A database is designed, built, and populated with data for a specific purpose. It has an intended group
of users and some preconceived applications in which these users are interested.
A database management system (DBMS) is a collection of programs that enables users to create and
maintain a database.
A DBMS is hence a general-purpose software system that facilitates the processes of defining,
constructing and manipulating databases for various applications. The database and software are
together called a database system.
For example, a college database organizes data about the administration, staff, students, and faculty.
Using the database, you can easily retrieve, insert, and delete this information.
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage information.
o It provides a clear and logical view of the processes that manipulate data.
o It supports the ACID properties (atomicity, consistency, isolation, durability), which keep data in a consistent state in case of failure.
o It can present the database from different viewpoints according to the requirements of the user.
Advantages of DBMS
o Controls data redundancy: It can control data redundancy because all the data is stored in a single
database rather than scattered across separate files.
o Data sharing: In a DBMS, the authorized users of an organization can share the data among themselves.
o Easy maintenance: It is easy to maintain because of the centralized nature of the database system.
Disadvantages of DBMS
o Cost of hardware and software: Running DBMS software requires a fast processor and a large amount
of memory.
o Size: It occupies a large amount of disk space and memory to run efficiently.
Many persons are involved in the design, use and maintenance of a large database. Different types of
DBMS users are:
2. Database Designer
3. End users
DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server architecture is used to deal with a
large number of PCs, web servers, database servers and other components that are connected with networks.
o The client/server architecture consists of many PCs and a workstation which are connected via the network.
o DBMS architecture depends upon how users are connected to the database to get their request done.
Types of DBMS Architecture
Database architecture can be single-tier or multi-tier. Logically, however, database architecture is of two types:
2-tier architecture and 3-tier architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can work directly on the
DBMS.
o Any changes done here will directly be done on the database itself. It doesn't provide a handy tool for end
users.
o The 1-Tier architecture is used for development of the local application, where programmers can directly
communicate with the database for the quick response.
2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server architecture. In the two-tier architecture, applications on the client
end can directly communicate with the database at the server side. For this interaction, APIs
such as ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query processing and transaction
management.
o To communicate with the DBMS, the client-side application establishes a connection with the server side.
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and server. In this architecture, client can't
directly communicate with the server.
o The application on the client-end interacts with an application server which further communicates with the
database system.
o End user has no idea about the existence of the database beyond the application server. The database also has
no idea about any other user beyond the application.
History of DBMS
Data is a collection of facts and figures. As the volume of data grew day by day, it needed
to be stored on a reliable device or in reliable software. Charles Bachman was the first person to develop the
Integrated Data Store (IDS), which was based on the network data model and for which he received
the Turing Award (the most prestigious award in computer science, often described as the field's equivalent
of the Nobel Prize). IDS was developed in the early 1960s. In the late 1960s, IBM (International
Business Machines Corporation) developed the Information Management System (IMS), which is still
in use in many places. It was based on the hierarchical database
model. It was during the year 1970 that the
Relational Model
Relational data model is the primary data model, which is used widely around the world for data storage
and processing. This model is simple and it has all the properties and capabilities required to process
data with storage efficiency.
Concepts
Tables − In the relational data model, relations are saved in the format of tables. This format stores the
relation among entities. A table has rows and columns, where rows represent records and columns
represent attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents relation instance.
Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and their
names.
Relation key − Each row has one or more attributes, known as relation key, which can identify the row
in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.
Integrity Constraints
o Integrity constraints are a set of rules. They are used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
o Thus, integrity constraints are used to guard against accidental damage to the database.
1. Domain constraints
o A domain constraint defines the valid set of values for an attribute.
o The data type of a domain can be string, character, integer, time, date, currency, etc. The value
of the attribute must belong to the corresponding domain.
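A domain constraint can be tried out with a small sketch. The block below uses Python's built-in sqlite3 module; the student table, its columns, and the 5 to 25 age range are hypothetical choices made only for illustration. Because SQLite is loosely typed, the CHECK clause also tests typeof(age) so that non-integer values are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the domain of age is "integers between 5 and 25".
conn.execute(
    "CREATE TABLE student ("
    " name TEXT NOT NULL,"
    " age INTEGER NOT NULL"
    " CHECK (typeof(age) = 'integer' AND age BETWEEN 5 AND 25)"
    ")"
)

def try_insert(name, age):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("Asha", 17)        # inside the domain
rejected_range = try_insert("Ravi", 40)  # integer, but outside the range
rejected_type = try_insert("Mira", "A")  # not an integer at all
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Only the first insert succeeds, so the table ends up holding a single row.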
3. Referential Integrity Constraints
o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of
Table 2, then every value of the Foreign Key in Table 1 must be null or be available in Table 2.
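The rule above can be sketched with Python's sqlite3 module. The department and employee tables and their rows are made up for illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which is a SQLite-specific detail and not something the notes state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " emp_name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id)"
    ")"
)
conn.execute("INSERT INTO department VALUES (1, 'Sales')")

def insert_employee(emp_id, name, dept_id):
    """Return True if the row is accepted, False if integrity is violated."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False

ok_existing = insert_employee(10, "Jack", 1)   # 1 exists in department
ok_null = insert_employee(11, "Harry", None)   # a null foreign key is allowed
ok_missing = insert_employee(12, "John", 99)   # 99 is not in department
```

The third insert is rejected because its foreign key value is neither null nor present in the referenced table.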
Example:
4. Key constraints
o A key is an attribute, or set of attributes, used to identify an entity uniquely within its entity set.
o An entity set can have multiple keys, one of which is chosen as the primary key. A
primary key value must be unique and cannot be null in the relational table.
Example:
ER model
o ER model stands for an Entity-Relationship model. It is a high-level data model. This model is
used to define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and easy to
design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship
diagram.
For example, Suppose we design a school database. In this database, the student will be an entity with
attributes like address, name, id, age, etc. The address can be another entity with attributes like city,
street name, pin code, etc and there will be a relationship between them.
Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity can be represented as
rectangles.
Consider an organization as an example- manager, product, employee, department etc. can be taken as
an entity.
a. Weak Entity
An entity that depends on another entity is called a weak entity. A weak entity doesn't have a key
attribute of its own. It is represented by a double rectangle.
2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of several other attributes is known as a composite attribute. The composite
attribute is represented by an ellipse, and its component attributes are drawn as ellipses connected to it.
c. Multivalued Attribute
An attribute can have more than one value. These attributes are known as a multivalued attribute. The
double oval is used to represent multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is
represented by a dashed ellipse.
For example, A person's age changes over time and can be derived from another attribute like Date of
birth.
3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent
the relationship.
a. One-to-One Relationship
When only one instance of an entity is associated with the relationship, it is known as a one-to-one
relationship.
For example, a woman can be married to one man, and a man can be married to one woman.
b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of the entity on the right
are associated with the relationship, it is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by one specific
scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity on the right
are associated with the relationship, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left and more than one instance of the entity on the
right are associated with the relationship, it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
Relationship
Relationships are represented by diamond-shaped box. Name of the relationship is written inside the
diamond-box. All the entities (rectangles) participating in a relationship, are connected to it by a line.
A relationship in which two entities participate is called a binary relationship. Cardinality is the
number of instances of one entity that can be associated, through the relationship, with instances of the other entity.
• One-to-one − When only one instance of an entity is associated with the relationship, it is
marked as '1:1'. The following image reflects that only one instance of each entity should be
associated with the relationship. It depicts one-to-one relationship.
• One-to-many − When more than one instance of an entity is associated with a relationship, it is
marked as '1:N'. The following image reflects that only one instance of entity on the left and
more than one instance of an entity on the right can be associated with the relationship. It depicts
one-to-many relationship.
• Many-to-one − When more than one instance of entity is associated with the relationship, it is
marked as 'N:1'. The following image reflects that more than one instance of an entity on the
left and only one instance of an entity on the right can be associated with the relationship. It
depicts many-to-one relationship.
• Many-to-many − The following image reflects that more than one instance of an entity on the
left and more than one instance of an entity on the right can be associated with the relationship.
It depicts many-to-many relationship.
Participation Constraints
• Total Participation − Each entity is involved in the relationship. Total participation is
represented by double lines.
• Partial participation − Not all entities are involved in the relationship. Partial participation is
represented by single lines.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entities contain the
properties of all the generalized entities, is called generalization. In generalization, a number of entities
are brought together into one generalized entity based on their similar characteristics. For example,
pigeon, house sparrow, crow and dove can all be generalized as Birds.
Specialization
Specialization is the opposite of generalization. In specialization, a group of entities is divided into sub-
groups based on their characteristics. Take a group ‘Person’ for example. A person has name, date of
birth, gender, etc. These properties are common in all persons, human beings. But in a company,
persons can be identified as employee, employer, customer, or vendor, based on what role they play in
the company.
Similarly, in a school database, persons can be specialized as teacher, student, or a staff, based on what
role they play in school as entities.
Keys
o Keys play an important role in the relational database.
o It is used to uniquely identify any record or row of data from the table. It is also used to establish
and identify relationships between tables.
For example: In Student table, ID is used as a key because it is unique for each student. In PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.
Types of key:
1. Primary key
o A primary key is the key chosen to identify one and only one instance of an entity uniquely. An
entity can have multiple keys, as we saw in the PERSON table. The most suitable of these keys
becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. We
could also select License_Number or Passport_Number as the primary key,
since they are also unique.
o For each entity, the selection of the primary key depends on the requirements and on the developers.
2. Candidate key
o A candidate key is an attribute or a minimal set of attributes that can uniquely identify a tuple.
o The primary key is chosen from among the candidate keys. The remaining
candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, ID is best suited for the primary key. The other uniquely identifying
attributes, such as SSN, Passport_Number, and License_Number, are also candidate keys.
3. Super Key
A super key is a set of attributes that can uniquely identify a tuple. Every super key is a superset of some
candidate key.
For example: In the above EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names
of two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination
can also be a key.
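The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is plain Python; the employee rows are invented, and the function names is_superkey and is_candidate_key are hypothetical helpers. A check like this only shows whether a key is violated by the given rows, since keys are really a property of the schema, not of one instance.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """A candidate key is a super key with no proper subset that is also a super key."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Made-up sample rows: two employees share a name but not an ID.
employee = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Amit", "SSN": "111"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Amit", "SSN": "222"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Neha", "SSN": "333"},
]

pair_is_super = is_superkey(employee, ("EMPLOYEE_ID", "EMPLOYEE_NAME"))
pair_is_candidate = is_candidate_key(employee, ("EMPLOYEE_ID", "EMPLOYEE_NAME"))
id_is_candidate = is_candidate_key(employee, ("EMPLOYEE_ID",))
```

Here the pair (EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because EMPLOYEE_ID alone already identifies each row.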
4. Foreign key
o A foreign key is a column of one table that points to the primary key of another table.
o In a company, every employee works in a specific department, and employee and department
are two different entities. So we don't store the department's information in the employee
table. Instead, we link the two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id as a new attribute in the
EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
What is SQL?
SQL is Structured Query Language, which is a computer language for storing, manipulating and
retrieving data stored in a relational database.
SQL is the standard language for relational database systems. All the Relational Database Management
Systems (RDBMS) like MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use
SQL as their standard database language.
SQL Process
When you are executing an SQL command for any RDBMS, the system determines the best way to
carry out your request and SQL engine figures out how to interpret the task.
There are various components included in this process.
These components are −
• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.
A classic query engine handles all the non-SQL queries, but a SQL query engine won't handle logical
files.
Following is a simple diagram showing the SQL Architecture −
SQL Commands
The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT,
UPDATE, DELETE and DROP. These commands can be classified into the following groups based
on their nature −
DDL (Data Definition Language)
1. CREATE: Creates a new table, a view of a table, or other object in the database.
2. ALTER: Modifies an existing database object, such as a table.
3. DROP: Deletes an entire table, a view of a table or other objects in the database.
DML (Data Manipulation Language)
1. SELECT: Retrieves certain records from one or more tables.
2. INSERT: Creates a record.
3. UPDATE: Modifies records.
4. DELETE: Deletes records.
DCL (Data Control Language)
1. GRANT: Gives a privilege to a user.
2. REVOKE: Takes back privileges granted to a user.
There are three binary data types, given below:
binary: Maximum length of 8000 bytes. Contains fixed-length binary data.
varbinary: Maximum length of 8000 bytes. Contains variable-length binary data.
image: Maximum length of 2,147,483,647 bytes. Contains variable-length binary data.
Approximate numeric data type:
float: Ranges from -1.79E+308 to 1.79E+308. Used to specify a floating-point value, e.g. 6.2, 2.9.
Character string data types:
char: Maximum length of 8000 characters. Contains fixed-length non-Unicode characters.
varchar: Maximum length of 8000 characters. Contains variable-length non-Unicode characters.
Date and time data type:
timestamp: Stores the year, month, day, hour, minute, and second values.
1. SELECT Statement
This SQL statement reads the data from the SQL database and shows it as the output to the database
user.
Syntax:
SELECT column_name1, column_name2, ....., column_nameN
[ FROM table_name ]
[ WHERE condition ]
[ ORDER BY column_name ];
Example:
SELECT Emp_ID, First_Name, Last_Name, Salary, City
FROM Employee_details
WHERE Salary = 100000
ORDER BY Last_Name
This example shows the Emp_ID, First_Name, Last_Name, Salary, and City of those employees
from the Employee_details table whose Salary is 100000. The output lists the specified details
in ascending alphabetical order of Last_Name.
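The SELECT example can be run end to end with Python's sqlite3 module. In this sketch the column types and the three sample rows are assumptions made for illustration; only the table name, column names, and query come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_details ("
    " Emp_ID INTEGER PRIMARY KEY, First_Name TEXT, Last_Name TEXT,"
    " Salary INTEGER, City TEXT)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO Employee_details VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ravi", "Verma", 100000, "Delhi"),
        (2, "Anu", "Bose", 100000, "Pune"),
        (3, "Kiran", "Mehta", 80000, "Delhi"),
    ],
)

# WHERE keeps only the matching rows; ORDER BY sorts them by last name.
rows = conn.execute(
    "SELECT Emp_ID, First_Name, Last_Name, Salary, City"
    " FROM Employee_details"
    " WHERE Salary = 100000"
    " ORDER BY Last_Name"
).fetchall()
last_names = [row[2] for row in rows]
```

With this data, two rows are returned and Bose is listed before Verma.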
2. UPDATE Statement
This SQL statement changes or modifies the stored data in the SQL database.
Syntax:
UPDATE table_name
SET column_name1 = new_value_1, column_name2 = new_value_2, ...., column_nameN = new_value_N
[ WHERE CONDITION ];
1. UPDATE Employee_details
This example changes the Salary of those employees of the Employee_details table
whose Emp_ID is 10 in the table.
3. DELETE Statement
This SQL statement deletes the stored data from the SQL database.
Syntax:
DELETE FROM table_name
[ WHERE CONDITION ];
Example:
DELETE FROM Employee_details
WHERE First_Name = 'Sumit';
This example deletes the records of those employees from the Employee_details table
whose First_Name is Sumit.
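4. CREATE TABLE Statement
This SQL statement creates a new table in the SQL database.
Syntax:
CREATE TABLE table_name
(
column_name1 data_type [column1 constraint(s)],
column_name2 data_type [column2 constraint(s)],
.....
.....,
column_nameN data_type [columnN constraint(s)],
PRIMARY KEY (one or more columns)
);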
3. First_name VARCHAR(30),
4. Last_name VARCHAR(30),
5. Salary Money,
6. City VARCHAR(30),
8. );
This example creates the table Employee_details with five columns or fields in the SQL database. The
fields in the table are Emp_Id, First_Name, Last_Name, Salary, and City. The Emp_Id column in
the table acts as a primary key, which means that the Emp_Id column cannot contain duplicate values
and null values.
5. ALTER TABLE Statement
This SQL statement adds, deletes, and modifies the columns of a table in the SQL database.
The above 'SQL alter statement' renames the old column name to the new column name of the existing
database table.
The above SQL alter statement deletes the column of the existing database table.
This example adds the new field whose name is Designation with size 18 in
the Employee_details table of the SQL database.
6. DROP TABLE Statement
This SQL statement deletes or removes the table and the structure, views, permissions, and triggers
associated with that table.
The above syntax of the drop statement deletes specified tables completely if they exist in the database.
This example drops the Employee_details table if it exists in the SQL database. This removes the
complete information if available in the table.
7. CREATE DATABASE Statement
This SQL statement creates a new database in the database management system.
8. DROP DATABASE Statement
This SQL statement deletes an existing database, with all its tables and views, from the database
management system.
Example:
DROP DATABASE Company;
The above example deletes the Company database from the system.
9. INSERT INTO Statement
This SQL statement inserts the data or records in the existing table of the SQL database. This statement
can easily insert single and multiple records in a single query statement.
Syntax:
INSERT INTO table_name
(
column_name1,
column_name2, .…,
column_nameN
)
VALUES
(value_1,
value_2, ..…,
value_N
);
Example:
INSERT INTO Employee_details
(
Emp_ID,
First_name,
Last_name,
Salary,
City
)
VALUES
(101,
'Akhil',
'Sharma',
40000,
'Bangalore'
);
This example inserts 101 in the first column, Akhil in the second column, Sharma in the third
column, 40000 in the fourth column, and Bangalore in the last column of the table Employee_details.
INSERT INTO Employee_details
( Emp_ID, First_name, Last_name, Salary, City )
VALUES (102, 'Amit', 'Gupta', 50000, 'Mumbai'), (103, 'John', 'Aggarwal', 45000, 'Calcutta'), (104, 'Sidhu',
'Arora', 55000, 'Mumbai');
This example inserts the records of three employees in the Employee_details table in the single query
statement.
10. TRUNCATE TABLE Statement
This SQL statement deletes all the stored records from a table of the SQL database.
Example:
TRUNCATE TABLE Employee_details;
This example deletes the records of all employees from the Employee_details table of the database.
11. DESCRIBE Statement
This SQL statement describes the specified table or view.
Example:
DESCRIBE Employee_details;
This example shows the structure and other details of the Employee_details table.
12. DISTINCT Clause
This SQL statement shows the distinct values from the specified columns of the database table. It
is used with the SELECT keyword.
Syntax:
SELECT DISTINCT column_name1, column_name2, ...
FROM table_name;
Example:
SELECT DISTINCT City, Salary
FROM Employee_details;
This example shows the distinct values of the City and Salary columns from
the Employee_details table.
13. COMMIT Statement
This SQL statement permanently saves the changes made in a transaction of the SQL database.
Syntax:
COMMIT
Example:
DELETE FROM Employee_details
WHERE Salary = 30000;
COMMIT;
This example deletes the records of those employees whose Salary is 30000 and then saves the changes permanently
in the database.
14. ROLLBACK Statement
This SQL statement undoes the transactions and operations that have not yet been saved to the SQL database.
Syntax:
ROLLBACK
Example:
DELETE FROM Employee_details
WHERE City = 'Mumbai';
ROLLBACK;
This example deletes the records of those employees whose City is Mumbai and then undoes the changes in the
database.
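The effect of COMMIT and ROLLBACK can be observed in a short sketch using Python's sqlite3 module. The three-column table and its rows are simplified, made-up data. The connection is opened with isolation_level=None so that BEGIN, COMMIT, and ROLLBACK are sent explicitly as SQL instead of being managed by the Python module.

```python
import sqlite3

# isolation_level=None: autocommit mode, so transactions are controlled
# only by the explicit BEGIN / COMMIT / ROLLBACK statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Employee_details (Emp_ID INTEGER PRIMARY KEY, Salary INTEGER, City TEXT)"
)
conn.executemany(
    "INSERT INTO Employee_details VALUES (?, ?, ?)",
    [(1, 30000, "Delhi"), (2, 45000, "Mumbai"), (3, 55000, "Mumbai")],
)

def count_rows():
    return conn.execute("SELECT COUNT(*) FROM Employee_details").fetchone()[0]

# Delete, then COMMIT: the change becomes permanent.
conn.execute("BEGIN")
conn.execute("DELETE FROM Employee_details WHERE Salary = 30000")
conn.execute("COMMIT")
after_commit = count_rows()

# Delete, then ROLLBACK: the change is undone.
conn.execute("BEGIN")
conn.execute("DELETE FROM Employee_details WHERE City = 'Mumbai'")
inside_transaction = count_rows()
conn.execute("ROLLBACK")
after_rollback = count_rows()
```

The committed delete leaves two rows. The second delete empties the table inside the transaction, and the rollback restores both rows.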
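15. CREATE INDEX Statement
This SQL statement creates a new index on a table of the SQL database.
Example:
CREATE INDEX idx_First_Name
ON Employee_details (First_Name);
This example creates an index idx_First_Name on the First_Name column of the Employee_details table.
16. DROP INDEX Statement
This SQL statement deletes an existing index from the SQL database.
Example:
DROP INDEX idx_First_Name;
This example deletes the index idx_First_Name from the SQL database.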
17. USE Statement
This SQL statement selects an existing SQL database. Before performing operations on a database table, you
have to select the database from among the existing databases.
Syntax:
USE database_name;
Example:
USE Company;
The WHERE clause is optional in SQL DML statements, but it can be used to limit the number of rows
affected by a SQL DML statement or returned by a query.
It filters the records: only those rows that fulfill the specified conditions are returned or affected.
Syntax:
SELECT column1, column2, ....., columnN
FROM table_name
WHERE [conditions]
= Equal
o The SQL AND condition also can be used to join multiple tables in a SQL statement.
Consider we have an employee table created into the database with the following data:
SQL "AND" example with "SELECT" statement
This is how an SQL "AND" condition can be used in the SQL SELECT statement.
Example 1:
Write a query to get the records from the emp table in which the department of the employee is IT and the location is Chennai.
Query:
mysql> SELECT * FROM emp WHERE Department = "IT" AND Location = "Chennai";
In the emp table, there are three employees whose department is IT. But we have specified the AND condition
according to which the employee's location should not be other than Chennai. So, there are only two employees whose
department is IT and Location is Chennai.
Example 2:
Write a query to get the records from emp tables in which department of the employee is IT and location is Mumbai.
Query:
mysql> SELECT * FROM emp WHERE Department = "IT" AND Location = "Mumbai";
ID First_Name Last_Name Department Location
In the emp table, there are three employees whose department is IT. Among these three employees, there is only one
employee whose location is Mumbai. Due to the presence of the AND operator used in the query, a record must satisfy
both conditions.
SQL "AND" example with "UPDATE" statement
This is how the "AND" condition can be used in the SQL UPDATE statement.
Example 1:
Write a query to update the records in the emp table in which the department of the employee is Marketing and the first
name is Suraj. For that particular employee, set the updated value of the location to Delhi.
Query:
mysql> UPDATE emp SET Location = "Delhi" WHERE Department = "Marketing" AND First_Name = "Suraj";
Due to the AND operator used in the query, only a record whose department is Marketing and whose first name is
Suraj is updated. A record must satisfy both conditions.
Example 2:
Write a query to update the records in the emp table in which department of the employee is Finance and ID is 7. For
that particular employee, set the updated value of the department as HR.
Query:
mysql> UPDATE emp SET Department = "HR" WHERE Department = "Finance" AND ID = 7;
We will use the SELECT query to verify the updated record.
In the emp table, there are two employees whose department is Finance. Among these two employees, there is only
one employee whose ID is 7. Due to the presence of AND operator used in the query, a record must have the
department as Finance and ID as 7.
SQL "AND" example with "DELETE" statement
This is how an SQL "AND" condition can be used in the SQL DELETE statement.
Example 1:
Write a query to delete the records from the emp table in which the last name of the employee is Jain, and the Location
is Bangalore.
Query:
mysql> DELETE FROM emp WHERE Last_Name = 'Jain' AND Location = 'Bangalore';
There is only one record in the emp table whose last name is Jain. But still, due to the presence of AND operator, the
second condition will also be checked according to which employee's location should be Bangalore. So, only that
particular record is deleted.
Example 2:
Write a query to delete the records from the emp table in which department of the employee is IT and Location is
Mumbai.
Query:
mysql> DELETE FROM emp WHERE Department = 'IT' AND Location = 'Mumbai';
There are three records in the emp table whose department is IT. But only one record is deleted from the emp table,
which contains a total of 6 records. This happened because of the AND operator according to which the employee's
location should mandatorily be Mumbai. Therefore there is only one record that satisfies both the conditions. Hence,
it is deleted.
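The AND examples can be reproduced with Python's sqlite3 module. The rows inserted below are hypothetical stand-ins for the emp table, chosen only so that each query has something to match; they are not the data from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (ID INTEGER PRIMARY KEY, First_Name TEXT, Last_Name TEXT,"
    " Department TEXT, Location TEXT)"
)
# Hypothetical sample rows for illustration only.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Asha", "Rao", "IT", "Chennai"),
        (2, "Vikram", "Nair", "IT", "Chennai"),
        (3, "Pooja", "Shah", "IT", "Mumbai"),
        (4, "Suraj", "Patil", "Marketing", "Pune"),
        (5, "Meena", "Jain", "HR", "Bangalore"),
        (7, "Arjun", "Das", "Finance", "Kolkata"),
    ],
)

# SELECT: a row is returned only if both conditions hold.
it_chennai = conn.execute(
    "SELECT ID FROM emp WHERE Department = 'IT' AND Location = 'Chennai' ORDER BY ID"
).fetchall()

# UPDATE: only the row matching both conditions is changed.
updated = conn.execute(
    "UPDATE emp SET Location = 'Delhi'"
    " WHERE Department = 'Marketing' AND First_Name = 'Suraj'"
).rowcount

# DELETE: only the row matching both conditions is removed.
deleted = conn.execute(
    "DELETE FROM emp WHERE Department = 'IT' AND Location = 'Mumbai'"
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
```

In each statement, a row that satisfies only one of the two conditions is left untouched.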
Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join condition
is satisfied. It is denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Result:
1. Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on their common
attribute names.
o It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing information.
Example:
EMPLOYEE
FACT_WORKERS
EMP_NAME BRANCH SALARY
Input:
(EMPLOYEE ⋈ FACT_WORKERS)
Output:
a. Left outer join:
o It is denoted by ⟕.
Input:
EMPLOYEE ⟕ FACT_WORKERS
b. Right outer join:
o It is denoted by ⟖.
Example: Using the above EMPLOYEE table and FACT_WORKERS Relation
Input:
EMPLOYEE ⟖ FACT_WORKERS
Output:
c. Full outer join:
o It is denoted by ⟗.
Input:
EMPLOYEE ⟗ FACT_WORKERS
Output:
EMP_NAME STREET CITY BRANCH SALARY
3. Equi join:
It is also known as an inner join. It is the most common join. It is based on matched data as per the
equality condition. The equi join uses the comparison operator(=).
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
CUSTOMER ⋈ PRODUCT
Output:
CLASS_ID NAME PRODUCT_ID CITY
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
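The equi join can be sketched in SQL through Python's sqlite3 module. The notes do not spell out the join condition, so this sketch assumes it is CLASS_ID = PRODUCT_ID; the lowercase table and column names are also just choices made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (class_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE product (product_id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "John"), (2, "Harry"), (3, "Jackson")],
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, "Delhi"), (2, "Mumbai"), (3, "Noida")],
)

# Assumed equality condition: customer.class_id = product.product_id.
joined = conn.execute(
    "SELECT c.class_id, c.name, p.product_id, p.city"
    " FROM customer AS c"
    " INNER JOIN product AS p ON c.class_id = p.product_id"
    " ORDER BY c.class_id"
).fetchall()
```

Each customer row is paired only with the product row whose PRODUCT_ID equals its CLASS_ID, so the result has three rows.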
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of
the query. It uses operators to perform queries.
1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
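Notation: σ p(r)
Where:
σ is the selection operator.
r is the relation.
p is a propositional logic formula (the predicate), which may use connectives such as AND, OR, and NOT and
comparison operators such as =, ≠, ≥, <, >, ≤.
Input:
σ BRANCH_NAME="perryride" (LOAN)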
Output:
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result. Rest of the
attributes are eliminated from the table.
o It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are either
in R or S or both in R & S.
Notation: R ∪ S
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection:
o Suppose there are two tables R and S. The set intersection operation contains all tuples that are
in both R & S.
o It is denoted by the intersection symbol ∩.
Notation: R ∩ S
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that are
in R but not in S.
o It is denoted by minus (-).
Notation: R - S
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
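The union, intersection, and difference results can be checked with Python sets, which drop duplicates just as relations do. The data is copied from the DEPOSITOR and BORROW relations in these notes; project_names is a hypothetical helper that plays the role of the projection on CUSTOMER_NAME.

```python
depositor = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
borrow = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

def project_names(relation):
    """Projection on CUSTOMER_NAME; using a set removes duplicate tuples."""
    return {name for name, _ in relation}

union = project_names(borrow) | project_names(depositor)
intersection = project_names(borrow) & project_names(depositor)
difference = project_names(borrow) - project_names(depositor)
```

The union holds ten distinct names, the intersection holds Smith and Jones, and the difference holds the four customers who have a loan but no account.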
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the other table.
It is also known as a cross product.
o It is denoted by X.
Notation: E X D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
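The Cartesian product can be sketched with itertools.product from the Python standard library. The rows are taken from the example above. Reading the third EMPLOYEE column as a department code is an assumption, since the column headers of that table are not shown.

```python
from itertools import product

employee = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Every employee row is paired with every department row: 3 x 3 = 9 rows.
cross = [emp + dept for emp, dept in product(employee, department)]

# Keeping only pairs whose department codes match turns the product into a join
# (assuming the third employee column is the department code).
matched = [row for row in cross if row[2] == row[3]]
```

The product has nine rows in the same order as the output above, and filtering it on matching department codes leaves one row per employee.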
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
Relational Calculus
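In contrast to Relational Algebra, Relational Calculus is a non-procedural query language, that is, it
tells what to do but never explains how to do it. It exists in two forms: Tuple Relational Calculus (TRC)
and Domain Relational Calculus (DRC).
In TRC, the filtering variable ranges over tuples.
Notation − {T | Condition}
For example −
{ T.name | Author(T) AND T.article = 'database' }
Output − Returns tuples with 'name' from Author who has written article on 'database'.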
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
Output − The above query will yield the same result as the previous one.
In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as done in
TRC, mentioned above).
Notation −
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example −
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is database.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves
relational operators.
The expressive power of Tuple Relational Calculus and Domain Relational Calculus is equivalent to
that of Relational Algebra.
Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
1. X → Y
The left side of an FD is known as the determinant; the right side is known as the dependent.
For example:
Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table because, if we
know the Emp_Id, we can tell the employee name associated with it.
1. Emp_Id → Emp_Name
Example:
1. ID → Name,
2. Name → DOB
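A functional dependency X → Y holds in a table when no two rows agree on X but differ on Y. The sketch below checks that condition over a small in-memory table. The helper name holds and the sample rows are hypothetical, chosen only to illustrate the idea.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample data: two employees share the same name.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Smith"},
    {"Emp_Id": 2, "Emp_Name": "Harry"},
    {"Emp_Id": 3, "Emp_Name": "Smith"},
]

fd_id_name = holds(employees, ["Emp_Id"], ["Emp_Name"])   # Emp_Id -> Emp_Name
fd_name_id = holds(employees, ["Emp_Name"], ["Emp_Id"])   # Emp_Name -> Emp_Id
print(fd_id_name, fd_name_id)
```

Note that a check like this can only refute a dependency on the data at hand; whether a dependency holds in general is a design decision about the schema.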
Normalization
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
The main reason for normalizing relations is to remove these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the database
grows. Normalization consists of a series of guidelines that help you create a good
database structure.
o Insertion Anomaly: An insertion anomaly occurs when we cannot insert a new tuple into a
relation due to lack of data.
o Deletion Anomaly: A deletion anomaly occurs when the deletion of data results
in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value requires
multiple rows of data to be updated.
Types of Normal Forms:
Normalization works through a series of stages called Normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies that form's constraints.
Normal Form  Description
2NF  A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key.
4NF  A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF  A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF,
5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
EMPLOYEE table:
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
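The conversion to 1NF can be sketched as splitting each multi-valued cell into one row per value. The column names (EMP_ID, EMP_NAME, EMP_PHONE, EMP_STATE) and the helper to_1nf are assumptions made for illustration, since the table header is not shown above.

```python
# Column names are assumed; the source table's header is not shown.
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry",
     "EMP_PHONE": "8574783832", "EMP_STATE": "Bihar"},
]

def to_1nf(rows, column):
    """Emit one row per comma-separated value found in `column`."""
    out = []
    for row in rows:
        for value in row[column].split(","):
            out.append({**row, column: value.strip()})
    return out

employee_1nf = to_1nf(employee, "EMP_PHONE")
for row in employee_1nf:
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```

Each resulting row now holds a single atomic phone number, at the cost of repeating the other attributes.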
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school,
a teacher can teach more than one subject.
TEACHER table:
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
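The decomposition above amounts to two projections of the TEACHER table with duplicate rows removed. A minimal sketch follows; the helper name project is hypothetical, and the column order (TEACHER_ID, SUBJECT, TEACHER_AGE) is assumed from the decomposed tables.

```python
# Rows as (TEACHER_ID, SUBJECT, TEACHER_AGE); column order is assumed.
TEACHER = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

def project(rows, indexes):
    """Project rows onto the given column positions, dropping duplicates."""
    result = []
    for row in rows:
        projected = tuple(row[i] for i in indexes)
        if projected not in result:
            result.append(projected)
    return result

teacher_detail = project(TEACHER, [0, 2])    # TEACHER_ID, TEACHER_AGE
teacher_subject = project(TEACHER, [0, 1])   # TEACHER_ID, SUBJECT
print(teacher_detail)
print(teacher_subject)
```

TEACHER_AGE is now stored once per teacher instead of once per subject taught.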
A relation is in third normal form if it holds at least one of the following conditions for every
non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
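Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID, so these non-prime
attributes are transitively dependent on EMP_ID, which violates 3NF.
That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.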
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
Now, this is in BCNF because the left side of both functional dependencies is a key.
Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the decomposition
of a relation is required.
o In a database, it breaks the table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
o If no information is lost from the relation after decomposition, then the decomposition is
lossless.
o A lossless decomposition guarantees that joining the decomposed relations yields the same relation
that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives
the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation
will look like:
Employee ⋈ Department
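The join on EMP_ID can be sketched as follows. The rows are taken from the two tables above. The column order and the dictionary-based lookup are assumptions for illustration, and the lookup relies on each EMP_ID appearing at most once in DEPARTMENT, as it does here.

```python
# Rows as (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY).
EMPLOYEE = [
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]
# Rows as (DEPT_ID, EMP_ID, DEPT_NAME).
DEPARTMENT = [
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
]

# Index DEPARTMENT by the common column EMP_ID.
dept_by_emp = {emp_id: (dept_id, name) for dept_id, emp_id, name in DEPARTMENT}

# Natural join: keep EMPLOYEE rows with a matching EMP_ID, append dept columns.
joined = [emp + dept_by_emp[emp[0]] for emp in EMPLOYEE if emp[0] in dept_by_emp]

for row in joined:
    print(*row)
```

Each joined row carries EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY, DEPT_ID and DEPT_NAME, the same columns as EMPLOYEE_DEPARTMENT, and no row is lost or invented by the join.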
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must
be a part of R1 or R2 or must be derivable from the combination of functional dependencies of
R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set
(A->BC). The relation R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because FD A->BC is a part of relation R1(ABC).
What is Transaction?
A set of logically related operations is known as a transaction. The main operations of a transaction
are:
Read(A): The read operation Read(A), or R(A), reads the value of A from the database and stores it in a
buffer in the main memory.
Write(A): The write operation Write(A), or W(A), writes the value back to the database from the buffer.
(Note: A write does not always go back to the database immediately; it may only update the buffer,
which is how a dirty read comes into the picture.)
Let us take a debit transaction from an account that consists of the following operations:
1. R(A);
2. A=A-1000;
3. W(A);
Assume A’s value before starting the transaction is 5000.
• The first operation reads the value of A from the database and stores it in a buffer.
• The second operation decreases its value by 1000, so the buffer will contain 4000.
• The third operation writes the value from the buffer to the database, so A's final value will be
4000.
A transaction may, however, fail after executing only some of its operations. The
failure can be caused by hardware, software, power loss, etc. For example, if the debit transaction
discussed above fails after executing operation 2, the value of A will remain 5000 in the database, which
is not acceptable to the bank. To handle this, a database provides two important operations:
Commit: After all instructions of a transaction are successfully executed, the changes made by a
transaction are made permanent in the database.
Rollback: If a transaction is not able to execute all operations successfully, all the changes made by a
transaction are undone.
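Commit and rollback can be tried out with the sqlite3 module from the Python standard library. The sketch below is only an illustration: the table name account, the helper functions, and the simulated failure are all assumptions, and the behaviour shown relies on sqlite3's default transaction handling, where an UPDATE opens a transaction that stays open until commit() or rollback().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 5000)")
conn.commit()

def balance(conn, name):
    row = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()
    return row[0]

def debit(conn, name, amount, fail=False):
    """Debit an account; undo the change if a failure occurs before commit."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, name),
        )
        if fail:
            raise RuntimeError("simulated failure before commit")
        conn.commit()        # make the change permanent
    except RuntimeError:
        conn.rollback()      # undo everything since the last commit

debit(conn, "A", 1000, fail=True)
after_failure = balance(conn, "A")   # the failed debit left no trace

debit(conn, "A", 1000)
after_success = balance(conn, "A")   # the committed debit is visible

print(after_failure, after_success)
```

The failed debit leaves the balance untouched, while the successful one is made permanent by the commit.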
Properties of a transaction:
Atomicity: As a transaction is a set of logically related operations, either all of them should be
executed or none. The debit transaction discussed above should either execute all three operations or
none. If the debit transaction fails after executing operations 1 and 2, then its new value of 4000 will
not be updated in the database, which leads to inconsistency.
Consistency: If operations of debit and credit transactions on the same account are executed
concurrently, it may leave the database in an inconsistent state.
• For Example, with T1 (debit of Rs. 1000 from A) and T2 (credit of 500 to A) executing
concurrently, the database reaches an inconsistent state.
• Let us assume the Account balance of A is Rs. 5000. T1 reads A(5000) and stores the value in its
local buffer space. Then T2 reads A(5000) and also stores the value in its local buffer space.
• T1 performs A=A-1000 (5000-1000=4000) and 4000 is stored in T1 buffer space. Then T2
performs A=A+500 (5000+500=5500) and 5500 is stored in the T2 buffer space. T1 writes the
value from its buffer back to the database.
• A’s value is updated to 4000 in the database and then T2 writes the value from its buffer back to
the database. A’s value is updated to 5500 which shows that the effect of the debit transaction is
lost and the database has become inconsistent.
• To maintain the consistency of the database, we need concurrency control protocols. The
operations of T1 and T2, with their buffers and the database, are
shown in Table 1.
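T1          T1's buffer   T2          T2's buffer   Database
                                                    A=5000
R(A);       A=5000                                  A=5000
            A=5000        R(A);       A=5000        A=5000
A=A-1000;   A=4000                    A=5000        A=5000
            A=4000        A=A+500;    A=5500        A=5000
W(A);                                 A=5500        A=4000
                          W(A);                     A=5500
Table 1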
Isolation: The result of a transaction should not be visible to others before the transaction is committed.
For example, let us assume that A’s balance is Rs. 5000 and T1 debits Rs. 1000 from A. A’s new
balance will be 4000. If T2 credits Rs. 500 to A’s new balance, A will become 4500, and after this T1
fails. Then we have to roll back T2 as well because it is using the value produced by T1. So transaction
results are not made visible to other transactions before it commits.
Durability: Once the database has committed a transaction, the changes made by the transaction should
be permanent. For example, if a person has credited $500000 to his account, the bank can't say that the update
has been lost. To avoid this problem, multiple copies of the database are stored at different locations.
DBMS Concurrency Control
Concurrency Control is the management procedure that is required for controlling concurrent execution
of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent execution.
In a database transaction, the two main operations are READ and WRITE. These two operations
must be managed during the concurrent execution of transactions, because if they are
interleaved without control, the data may become inconsistent. The
following problems occur with the concurrent execution of operations:
Problem 1: Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions perform the read/write operations on the
same database items in an interleaved manner (i.e., concurrent execution) that makes the values of the
items incorrect hence making the database inconsistent.
For example:
Consider the below diagram where two transactions TX and TY, are performed on the same
account A where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only deducted and
not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be $300 only
because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400 (only added but not
updated/write).
o At time t6, transaction TX writes the value of account A that will be updated as $250 only, as
TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the values of account A, so it will write as done at
time t4 that will be $400. It means the value written by TX is lost, i.e., $250 is lost.
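The interleaving above can be replayed step by step in a few lines of Python. This is a sketch only: the dictionary standing in for the database and the two local buffer variables are assumptions made for illustration.

```python
database = {"A": 300}

tx_buffer = database["A"]    # t1: TX reads A (300)
tx_buffer -= 50              # t2: TX deducts 50 in its buffer (250)
ty_buffer = database["A"]    # t3: TY reads A, still 300
ty_buffer += 100             # t4: TY adds 100 in its buffer (400)
database["A"] = tx_buffer    # t6: TX writes 250
database["A"] = ty_buffer    # t7: TY writes 400, overwriting TX's write

final_value = database["A"]
serial_value = 300 - 50 + 100          # result if TX and TY ran one after the other
lost_update = final_value != serial_value

print(final_value, serial_value, lost_update)
```

Running the two transactions one after the other would leave $350 in the account; the interleaved schedule leaves $400 because TX's deduction is overwritten.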
Problem 2: Dirty Read Problem
The dirty read problem occurs when one transaction updates an item of the database and then
fails, and before the change is rolled back, the updated database item is accessed by another
transaction. This creates a Read-Write Conflict between the two transactions.
For example:
Consider two transactions TX and TY in the below diagram performing read/write operations on
account A where the available balance in account A is $300:
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A that becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A that will be read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the value changes back to
$300 (as initially).
o But transaction TY still holds $350 for account A, a value that was never committed. This is the dirty
read, and the situation is therefore known as the Dirty Read Problem.
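The dirty read schedule can be replayed in the same step-by-step style. The sketch below keeps a committed copy and a working copy of the data; that two-copy model is an assumption chosen for illustration, not a description of how any particular DBMS stores uncommitted changes.

```python
committed = {"A": 300}        # last committed state
working = dict(committed)     # state visible when there is no isolation

tx_value = working["A"]       # t1: TX reads 300
tx_value += 50                # t2: TX adds 50
working["A"] = tx_value       # t3: TX writes 350 (not yet committed)
ty_value = working["A"]       # t4: TY reads the uncommitted 350
working = dict(committed)     # t5: TX rolls back, A is 300 again

dirty = ty_value != working["A"]
print(ty_value, working["A"], dirty)
```

TY ends up holding a value that, after the rollback, never existed in the committed database.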
Problem 3: Unrepeatable Read Problem
Also known as the Inconsistent Retrievals Problem, this occurs when, within a single transaction, two different values
are read for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account A,
having an available balance = $300. The diagram is shown below:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the available
balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that will be read
as $400.
o It means that within the same transaction TX, two different values of account A are read: $300
initially, and $400 after the update made by transaction TY. This is an unrepeatable read
and is therefore known as the Unrepeatable Read Problem.
Thus, to maintain consistency in the database and avoid the problems that arise in
concurrent execution, management is needed, and that is where the concept of Concurrency Control
comes into play.
DBMS – Deadlock
In a multi-process system, deadlock is an unwanted situation that arises in a shared resource
environment, where a process indefinitely waits for a resource that is held by another process.
For example, assume a set of transactions {T0, T1, T2, ...,Tn}. T0 needs a resource X to complete its
task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is waiting
for resource Z, which is held by T0. Thus, all the processes wait for each other to release resources. In
this situation, none of the processes can finish their task. This situation is known as a deadlock.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions involved
in the deadlock are either rolled back or restarted.
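One common way to detect such a situation is to record which transaction waits for which and look for a cycle in that "wait-for" graph. The sketch below uses a depth-first search; the graph representation and the function name has_deadlock are assumptions for illustration, not something prescribed by the text above.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:          # reached a node already on the current path
            return True
        visiting.add(node)
        if any(visit(nxt) for nxt in wait_for.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(visit(node) for node in wait_for)

# T0 waits for T1, T1 waits for T2, T2 waits for T0: a cycle.
deadlocked = has_deadlock({"T0": ["T1"], "T1": ["T2"], "T2": ["T0"]})
# Same chain, but T2 is not waiting on anyone.
not_deadlocked = has_deadlock({"T0": ["T1"], "T1": ["T2"], "T2": []})

print(deadlocked, not_deadlocked)
```

The first graph mirrors the T0, T1, T2 example above; breaking any one edge of the cycle, as in the second graph, removes the deadlock.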
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations
that transactions are about to execute. The DBMS inspects the operations and analyzes whether they can
create a deadlock situation. If it finds that a deadlock situation might occur, then that transaction is
never allowed to be executed.
There are deadlock prevention schemes that use timestamp ordering mechanism of transactions in order
to predetermine a deadlock situation.
Wait-Die Scheme
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
This scheme allows the younger transaction to wait, but when an older transaction requests an item
held by a younger one, the older transaction forces the younger one to abort and release the item.
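The two schemes can be summarized as a small decision function. This is a sketch under the usual assumption that a smaller timestamp means an older transaction; the function name resolve and the returned strings are hypothetical.

```python
def resolve(scheme, requester_ts, holder_ts):
    """Decide the outcome when the requester asks for an item the holder has.

    Timestamps are assumed to be start times: smaller means older.
    """
    requester_is_older = requester_ts < holder_ts
    if scheme == "wait-die":
        # An older requester waits; a younger requester is killed (dies).
        return "requester waits" if requester_is_older else "requester aborts"
    if scheme == "wound-wait":
        # An older requester wounds the holder; a younger requester waits.
        return "holder aborts" if requester_is_older else "requester waits"
    raise ValueError(f"unknown scheme: {scheme}")

print(resolve("wait-die", 1, 2))     # older transaction requests the item
print(resolve("wait-die", 2, 1))     # younger transaction requests the item
print(resolve("wound-wait", 1, 2))   # older transaction requests the item
print(resolve("wound-wait", 2, 1))   # younger transaction requests the item
```

In both schemes it is always the younger transaction that is aborted, which is what rules out a cycle of waiting transactions.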
In Redundant Array of Independent Disks (RAID) technology, two or more secondary storage devices are
connected so that the devices operate as one storage medium. A RAID array consists of several disks
linked together for a variety of purposes.
• RAID 0: At this level, disks are organized in a striped array. Data is divided into
blocks that are distributed across the disks. Parallel writing and reading of data occur on each disk. This
improves performance and speed. Level 0 does not support parity and backup.
Raid-0
• RAID 1: Mirroring is used in RAID 1. A RAID controller copies data across all disks in an
array when data is sent to it. In case of failure, RAID level 1 provides 100% redundancy.
Raid-1
• RAID 2: The data in RAID 2 is striped on different disks, and the Error Correction Code is
recorded using a Hamming code. Each bit within a word is stored on a
separate disk, and ECC codes for the data words are saved on a separate set of disks. As a result
of its complex structure and high cost, RAID 2 is not commercially deployed.
Raid-2
• RAID 3: Data is striped across multiple disks in RAID 3. Data words are parsed to generate a
parity bit, which is stored on a different disk. Thus, single-disk failures can be tolerated.
Raid-3
• RAID 4: This level involves writing an entire block of data onto data disks, and then generating
the parity and storing it somewhere else. At level 3, bytes are striped, while at level 4, blocks are
striped. Both levels 3 and 4 require a minimum of three disks.
Raid-4
• RAID 5: The data blocks in RAID 5 are written to different disks, but the parity bits are spread
out across all the data disks rather than being stored on a separate disk.
Raid-5
• RAID 6: The RAID 6 level extends the level 5 concept. A pair of independent parities is
generated and stored on multiple disks at this level. You need at least four disk drives for
this level.
Raid-6
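The parity used by the parity-based levels above is typically a bitwise XOR of the data blocks, which lets a single lost block be rebuilt from the survivors. The sketch below shows the idea on three small byte strings; the block contents and the helper name parity are made up for illustration, and real arrays work on much larger blocks.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"DBMS", b"RAID", b"DISK"]   # hypothetical blocks on three data disks
p = parity(data)                     # parity block stored on another disk

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
recovered = parity([data[0], data[2], p])
print(recovered)
```

XOR-ing the surviving blocks with the parity cancels out everything except the missing block, which is why a single parity block protects against one failed disk.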
Record: The term record in DBMS refers to a collection of items or data organized within a table
within a set of fields related to a particular topic or theme. As an example, student records are kept
based on their grades and scores in college departments.
File
A file is a named collection of related information that is recorded on secondary storage such
as magnetic disks, magnetic tapes, and optical disks.
• It helps in the faster selection of records i.e. it makes the process faster.
• Different Operations like inserting, deleting, and updating different records are faster and easier.
• It prevents us from inserting duplicate records via various operations.
• It helps in storing the records or the data very efficiently at a minimal cost.
• Description: An index is created to map keys to record locations, like a book's index.
• Advantages: Fast retrieval through index traversal.
• Disadvantages: Extra storage required for the index.