Hexaware Dbms
Popular database management systems such as MS SQL Server, IBM DB2, Oracle, MySQL and
Microsoft Access are based on RDBMS.
How it works
Data is represented in terms of tuples (rows) in RDBMS.
Because data is organized into a collection of tables, it can be accessed easily in an RDBMS.
What is a table
The RDBMS database uses tables to store data. A table is a collection of related data
entries and contains rows and columns to store data.
ID NAME AGE COURSE
1 Ajeet 24 B.Tech
2 Aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
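The student table above can be built with Python's built-in sqlite3 module and an in-memory database, as a minimal sketch. The table name student and the column types are assumptions; the text itself only names the fields id, name, age and course.

```python
import sqlite3

# In-memory database; the table name and column types are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)

# Each tuple is one record (row); each position in it is one field (column).
rows = [
    (1, "Ajeet", 24, "B.Tech"),
    (2, "Aryan", 20, "C.A"),
    (3, "Mahesh", 21, "BCA"),
    (4, "Ratan", 22, "MCA"),
    (5, "Vimal", 26, "BSC"),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

for record in conn.execute("SELECT id, name, age, course FROM student ORDER BY id"):
    print(record)
```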
What is a field
A field is a smaller entity of the table which contains specific information about every
record in the table. In the above example, the fields in the student table are id,
name, age, and course.
A single record (row) of the student table looks like this:
1 Ajeet 24 B.Tech
What is a column
A column is a vertical entity in the table which contains all information associated
with a specific field in a table. For example, "name" is a column in the above table
which contains all information about the students' names.
Ajeet
Aryan
Mahesh
Ratan
Vimal
NULL Values
A NULL value in a table specifies that the field has been left blank during record
creation. It is different from a field filled with zero or a field that contains
spaces.
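The sketch below uses Python's sqlite3 module to show how NULL differs from zero and from a space. The table and the phone and marks columns are hypothetical. Note that "= NULL" never matches any row; IS NULL is needed to find missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, phone TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ajeet", None, 0),    # phone left blank -> NULL; marks is a real zero
        (2, "Aryan", " ", None),  # phone is a single space; marks unknown -> NULL
    ],
)

null_phone = conn.execute("SELECT id FROM student WHERE phone IS NULL").fetchall()
eq_null = conn.execute("SELECT id FROM student WHERE phone = NULL").fetchall()  # never true
zero_marks = conn.execute("SELECT id FROM student WHERE marks = 0").fetchall()
null_marks = conn.execute("SELECT id FROM student WHERE marks IS NULL").fetchall()

print(null_phone, eq_null, zero_marks, null_marks)  # [(1,)] [] [(1,)] [(2,)]
```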
Data Integrity
The following categories of data integrity exist in every RDBMS:
Domain integrity: It enforces valid entries for a given column by restricting the type,
the format, or the range of values.
Referential integrity: It specifies that rows which are used by other records cannot
be deleted.
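Both categories can be declared as constraints and enforced by the database. This is a sketch using SQLite via Python's sqlite3 module. The tables, the age range 15 to 60, and the helper try_sql are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY)")
conn.execute(
    """CREATE TABLE student (
           id INTEGER PRIMARY KEY,
           age INTEGER CHECK (age BETWEEN 15 AND 60),  -- domain integrity
           course_id TEXT REFERENCES course(course_id)  -- referential integrity
       )"""
)
conn.execute("INSERT INTO course VALUES ('BCA')")
conn.execute("INSERT INTO student VALUES (1, 21, 'BCA')")

def try_sql(sql):
    """Run one statement and report whether the constraints allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_sql("INSERT INTO student VALUES (2, 150, 'BCA')"))  # age outside the domain
print(try_sql("DELETE FROM course WHERE course_id = 'BCA'"))  # row still referenced
```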
The main differences between DBMS and RDBMS are given below:
2) DBMS: In DBMS, data is generally stored in either a hierarchical form or a navigational form.
   RDBMS: In RDBMS, the tables have an identifier called a primary key, and the data values are stored in the form of tables.
5) DBMS: DBMS uses a file system to store data, so there will be no relation between the tables.
   RDBMS: In RDBMS, data values are stored in the form of tables, so a relationship between these data values will be stored in the form of a table as well.
6) DBMS: DBMS has to provide some uniform methods to access the stored information.
   RDBMS: An RDBMS supports a tabular structure of the data and a relationship between them to access the stored information.
After observing the differences between DBMS and RDBMS, you can say that RDBMS
is an extension of DBMS. Many software products on the market today are
compatible with both DBMS and RDBMS. This means that an RDBMS application today
is also a DBMS application.
In a file system, some fields are duplicated in more than one file, which leads to data
redundancy. To overcome this problem, we need to create a centralized system, i.e., the
DBMS approach.
DBMS:
A database approach is a well-organized collection of data that are related in a
meaningful way, which can be accessed by different users but is stored only once in a
system. The various operations performed by a DBMS include insertion, deletion,
selection, sorting, etc.
There are the following differences between DBMS and file systems:
Sharing of data
  DBMS: Due to the centralized approach, data sharing is easy.
  File system: Data is distributed in many files, and it may be in different formats, so it isn't easy to share data.
Data Abstraction
  DBMS: DBMS gives an abstract view of data that hides the details.
  File system: The file system provides the details of data representation and storage of data.
Security and Protection
  DBMS: DBMS provides a good protection mechanism.
  File system: It isn't easy to protect a file under the file system.
Recovery Mechanism
  DBMS: DBMS provides a crash recovery mechanism, i.e., DBMS protects the user from system failure.
  File system: The file system doesn't have a crash recovery mechanism, i.e., if the system crashes while entering some data, then the content of the file will be lost.
Manipulation Techniques
  DBMS: DBMS contains a wide variety of sophisticated techniques to store and retrieve the data.
  File system: The file system can't efficiently store and retrieve the data.
Concurrency Problems
  DBMS: DBMS takes care of concurrent access of data using some form of locking.
  File system: In the file system, concurrent access has many problems, like redirecting the file while deleting some information or updating some information.
Where to use
  DBMS: The database approach is used in large systems which interrelate many files.
  File system: The file system approach is used in smaller systems that do not need to interrelate many files.
Cost
  DBMS: The database system is expensive to design.
  File system: The file system approach is cheaper to design.
Data Redundancy and Inconsistency
  DBMS: Due to the centralization of the database, the problems of data redundancy and inconsistency are controlled.
  File system: The files and application programs are created by different programmers, so there exists a lot of duplication of data, which may lead to inconsistency.
Structure
  DBMS: The database structure is complex to design.
  File system: The file system approach has a simple structure.
Data Independence
  DBMS: In this system, data independence exists, and it can be of two types.
  File system: In the file system approach, there exists no data independence.
Integrity Constraints
  DBMS: Integrity constraints are easy to apply.
  File system: Integrity constraints are difficult to implement in a file system.
Data Models
  DBMS: In the database approach, 3 types of data models exist.
  File system: In the file system approach, there is no concept of data models.
Flexibility
  DBMS: Changes are often a necessity to the content of the data stored in any system, and these changes are made more easily with a database approach.
  File system: The flexibility of the system is less compared to the DBMS approach.
1. Internal Level
o The internal level has an internal schema which describes the physical storage
structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It is used to define how the data will be stored in
a block.
o The physical level is used to describe complex low-level data structures in detail.
2. Conceptual Level
o The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data are to be stored in the database and also
describes what relationship exists among those data.
o In the conceptual level, internal details such as an implementation of the data
structure are hidden.
o Programmers and database administrators work at this level.
3. External Level
o At the external level, a database contains several schemas that are sometimes called
subschemas. A subschema is used to describe a different view of the database.
o An external schema is also known as a view schema.
o Each view schema describes the part of the database that a particular user group is
interested in and hides the remaining database from that user group.
o The view schema describes the end user interaction with database systems.
The Conceptual/ Internal Mapping lies between the conceptual level and the internal
level. Its role is to define the correspondence between the records and fields of the
conceptual level and files and data structures of the internal level.
The External/Conceptual Mapping lies between the external level and the conceptual
level. Its role is to define the correspondence between a particular external view and the
conceptual view.
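In SQL, an external schema is commonly realized as a view over the conceptual-level tables. The view's defining query then plays the role of the external/conceptual mapping. Below is a minimal sketch with SQLite; the employee table and the employee_directory view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full (hypothetical) employee relation.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Smith", "Sales", 50000), (2, "Harry", "Legal", 30000)],
)

# External level: a view for a user group that must not see salaries.
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

cols = [d[0] for d in conn.execute("SELECT * FROM employee_directory").description]
print(cols)  # ['name', 'dept']
print(conn.execute("SELECT * FROM employee_directory ORDER BY name").fetchall())
# [('Harry', 'Legal'), ('Smith', 'Sales')]
```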
Data Models
Data Model is the modeling of the data description, data semantics, and consistency
constraints of the data. It provides the conceptual tools for describing the design of a
database at each level of data abstraction. The following four data models are used for
understanding the structure of the database:
1) Relational Data Model: This type of model designs the data in the form of rows
and columns within a table. Thus, a relational model uses tables for representing data
and the relationships between them. Tables are also called relations. This model was
initially described by Edgar F. Codd in 1969. The relational data model is the most widely
used model, and it is primarily used by commercial data processing applications.
4) Semistructured Data Model: This type of data model is different from the other three
data models (explained above). The semistructured data model allows data specifications
in which individual data items of the same type may have different sets of attributes.
The Extensible Markup Language, also known as XML, is widely used for representing
semistructured data. Although XML was initially designed for adding markup
information to text documents, it has gained importance because of its application in the
exchange of data.
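The following sketch shows two items of the same type carrying different attribute sets, which is what the semistructured model permits. The XML document is hypothetical, and it is parsed with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two <student> items of the same type with different
# attribute sets (one has a course, the other has two phone numbers).
doc = """
<students>
  <student id="1"><name>Ajeet</name><course>B.Tech</course></student>
  <student id="2"><name>Aryan</name><phone>12345</phone><phone>67890</phone></student>
</students>
"""

root = ET.fromstring(doc)
for student in root.findall("student"):
    fields = sorted({child.tag for child in student})
    print(student.get("id"), fields)
# 1 ['course', 'name']
# 2 ['name', 'phone']
```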
A schema diagram can display only some aspects of a schema, such as the names of
record types, data types, and constraints. Other aspects can't be specified through the
schema diagram. For example, the given figure shows neither the data type of each
data item nor the relationships among the various files.
In the database, actual data changes quite frequently. For example, in the given
figure, the database changes whenever we add a new grade or add a student. The
data at a particular moment of time is called the instance of the database.
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the characteristic of being able to modify the schema at
one level of the database system without altering the schema at the next
higher level.
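Logical data independence can be illustrated with a short SQLite sketch. A column is added to the base table, which is a conceptual-level change, while an application that reads through a view (the external level) sees the same result as before. The table, view, and email column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ajeet')")
# External schema used by an application.
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")

query = "SELECT * FROM student_names"
before = conn.execute(query).fetchall()

# Change the conceptual schema: add a column to the base table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

after = conn.execute(query).fetchall()
print(before == after)  # True: the view, and the application using it, are unaffected
```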
Database Language
o A DBMS has appropriate languages and interfaces to express database queries and
updates.
o Database languages can be used to read, store and update the data in the database.
Types of Database Language
1. Data Definition Language
DDL stands for Data Definition Language. Its commands are used to define and update
the database schema.
2. Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.
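As a small illustration of the split between the two languages, the sketch below runs one DDL statement (CREATE) followed by several DML statements (INSERT, UPDATE, DELETE, SELECT) against an in-memory SQLite database. The course table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines (or changes) the schema.
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")

# DML: inserts, updates, deletes and reads the data stored in that schema.
conn.execute("INSERT INTO course VALUES ('BCA', 'Computer Applications')")
conn.execute("INSERT INTO course VALUES ('MCA', 'Master of Computer Applications')")
conn.execute("UPDATE course SET title = 'Bachelor of Computer Applications' WHERE code = 'BCA'")
conn.execute("DELETE FROM course WHERE code = 'MCA'")
print(conn.execute("SELECT code, title FROM course").fetchall())
# [('BCA', 'Bachelor of Computer Applications')]
```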
3. Data Control Language
(In Oracle databases, the execution of data control language commands cannot be
rolled back.)
The following operations have the authorization of Revoke:
In this section, we will learn about the ACID properties: what each property stands for
and what it is used for. We will also understand the ACID properties with the help of
some examples.
ACID Properties
The term ACID stands for Atomicity, Consistency, Isolation, and Durability.
1) Atomicity: The term atomicity means that the data remains atomic: if any operation
is performed on the data, it should either be executed completely or not be executed
at all. The operation should not break in between or execute partially. In the case of a
transaction, its operations should be executed completely and not partially.
Example: Remo has account A with $30, from which he wishes to send $10 to Sheero's
account, B. Account B already holds $100. When $10 is transferred to account B, its
balance should become $110. Two operations take place: the $10 that Remo wants to
transfer is debited from his account A, and the same amount is credited to account B,
i.e., Sheero's account. Now suppose the debit operation executes successfully, but the
credit operation fails. Then the value in Remo's account A becomes $20, while Sheero's
account B remains at $100, as before.
In the above diagram, it can be seen that after the failed credit of $10, the amount in
account B is still $100. So, it is not an atomic transaction.
The below image shows that both debit and credit operations are done successfully.
Thus the transaction is atomic.
When a transaction loses atomicity, it becomes a huge issue in banking systems, which
is why atomicity is a main focus in bank systems.
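The Remo/Sheero transfer can be sketched as a single database transaction using SQLite through Python's sqlite3 module. The account table, the transfer helper, and its fail_after_debit flag (which simulates the failed credit) are hypothetical. Because of the ROLLBACK, a failure after the debit leaves both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])

def transfer(src, dst, amount, fail_after_debit=False):
    """Debit src and credit dst as one transaction; fail_after_debit simulates a crash."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_after_debit:
            raise RuntimeError("credit step failed")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the partial debit: all or nothing
        raise

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

try:
    transfer("A", "B", 10, fail_after_debit=True)
except RuntimeError:
    pass
print(balances())  # {'A': 30, 'B': 100}: the debit was rolled back
transfer("A", "B", 10)
print(balances())  # {'A': 20, 'B': 110}
```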
2) Consistency: The word consistency means that the value should always remain
preserved. In DBMS, the integrity of the data should be maintained, which means that if
a change is made in the database, the data should remain valid. In the case of
transactions, data integrity is essential so that the database remains consistent before
and after the transaction. The data should always be correct.
Example:
In the above figure, there are three accounts, A, B, and C, where A makes a transaction
T to B and then to C, one by one. Two operations take place, i.e., debit and credit.
Account A first transfers $50 to account B; before the transaction, B reads the balance
of account A as $300. After the successful transaction T, the available amount in B
becomes $150. Next, A transfers $20 to account C, and at that time the value of A read
by C is $250, which is correct because the $50 debit to B has been done successfully.
The debit and credit operations from account A to C have also been done successfully.
We can see that the transactions complete successfully and the values are read
correctly, so the data is consistent. If B and C both read $300, the data is inconsistent,
because the value read by C would not reflect the completed debit.
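The consistency rule in this example can be checked in code: every transfer must leave the total amount of money unchanged. This plain-Python sketch uses A's $300 and B's $100 (implied by "B becomes $150") from the example. C's starting balance of $0 and the transfer helper are assumptions.

```python
# Starting balances: A and B follow the example; C's balance is an assumption.
accounts = {"A": 300, "B": 100, "C": 0}
TOTAL = sum(accounts.values())

def transfer(accts, src, dst, amount):
    """Apply a transfer, then check that money was neither created nor lost."""
    if accts[src] < amount:
        raise ValueError("insufficient funds")
    accts[src] -= amount
    accts[dst] += amount
    assert sum(accts.values()) == TOTAL, "inconsistent state"

transfer(accounts, "A", "B", 50)
read_by_c = accounts["A"]  # C reads A's balance after the first transfer
transfer(accounts, "A", "C", 20)
print(read_by_c, accounts)  # 250 {'A': 230, 'B': 150, 'C': 20}
```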
3) Isolation
Example: If two operations are running concurrently on two different accounts, then the
value of neither account should be affected by the other; the values should remain
persistent. As you can see in the below diagram, account A is making transactions T1
and T2 to accounts B and C, but both execute independently without affecting each
other. This is known as isolation.
Therefore, the ACID properties of DBMS play a vital role in maintaining the
consistency and reliability of data in the database.
1. Entity:
An entity may be any object, class, person or place. In an ER diagram, an entity is
represented by a rectangle.
An entity that depends on another entity is called a weak entity. A weak entity
doesn't contain any key attribute of its own and is represented by a double
rectangle.
2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to
represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
b. Composite Attribute
c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute.
A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from other attributes is known as a derived attribute.
It can be represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another
attribute like Date of birth.
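Because age is derived, it can be computed on demand from the stored Date of birth instead of being stored itself. Here is a minimal Python sketch; the function name age_on is hypothetical.

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Derive age from Date of birth instead of storing it."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(2000, 8, 15), date(2024, 8, 14)))  # 23
print(age_on(date(2000, 8, 15), date(2024, 8, 15)))  # 24
```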
3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus
is used to represent the relationship.
Types of relationship are as follows:
a. One-to-One Relationship
When only one instance of an entity is associated with the relationship, it is known as
a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of an
entity on the right associate with the relationship, it is known as a one-to-many
relationship.
For example, a scientist can invent many inventions, but each invention is made by
only one specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of an
entity on the right associate with the relationship, it is known as a many-to-one
relationship.
For example, a student enrolls in only one course, but a course can have many
students.
d. Many-to-many relationship
When more than one instance of the entity on the left and more than one instance
of an entity on the right associate with the relationship, it is known as a
many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have
many employees.
Notation of ER diagram
Database can be represented using the notations. In ER diagram, many notations are
used to express the cardinality. These notations are as follows:
Fig: Notations of ER diagram
Mapping Constraints
o A mapping constraint is a data constraint that expresses the number of
entities to which another entity can be related via a relationship set.
o It is most useful in describing binary relationship sets, although it can also
describe relationship sets that involve more than two entity sets.
o For a binary relationship set R between entity sets E1 and E2, there are four
possible mapping cardinalities. These are as follows:
1. One to one (1:1)
2. One to many (1:M)
3. Many to one (M:1)
4. Many to many (M:M)
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an
entity in E2 is associated with at most one entity in E1.
One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and
an entity in E2 is associated with at most one entity in E1.
Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an
entity in E2 is associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities
in E2, and an entity in E2 is associated with any number of entities in E1.
Keys
o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data in a table. They are
also used to establish and identify relationships between tables.
For example: In the Student table, ID is used as a key because it is unique for each
student. In the PERSON table, passport_number, license_number, and SSN are keys
since they are unique for each person.
Types of key:
1. Primary key
o It is the first key, which is used to identify one and only one instance of an entity
uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The key
that is most suitable from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. In the EMPLOYEE table, we could even select License_Number or
Passport_Number as the primary key since they are also unique.
o For each entity, the selection of the primary key is based on requirements and the
developers' choice.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a
tuple.
o Besides the primary key (which is itself a candidate key), the remaining attributes
that can uniquely identify a tuple are also candidate keys. The candidate keys are as
strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The other
unique attributes, like SSN, Passport_Number, and License_Number, are considered
candidate keys.
3. Super Key
A super key is a set of attributes that can uniquely identify a tuple. A super key is a
superset of a candidate key.
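The super key and candidate key definitions can be tested mechanically on sample data. In this sketch, a set of attributes is a super key if no two rows agree on all of them, and candidate keys are the minimal super keys. The EMPLOYEE rows are hypothetical. Keys found this way hold only for this particular data; real keys are a property of the schema.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows (attribute names follow the text).
employees = [
    {"ID": 1, "Name": "Smith", "SSN": "111", "Passport_Number": "P1"},
    {"ID": 2, "Name": "Harry", "SSN": "222", "Passport_Number": "P2"},
    {"ID": 3, "Name": "Smith", "SSN": "333", "Passport_Number": "P3"},
]

def is_super_key(rows, attrs):
    """attrs is a super key (for this data) if no two rows agree on all of them."""
    projected = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows):
    """Minimal super keys: no proper subset is itself a super key."""
    attrs = list(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if is_super_key(rows, combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

print(is_super_key(employees, ("ID", "Name")))  # True, but not minimal
print(candidate_keys(employees))  # [('ID',), ('SSN',), ('Passport_Number',)]
```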
4. Foreign key
o A foreign key is a column of a table that is used to point to the primary key of
another table.
o In a company, every employee works in a specific department, and employee and
department are two different entities. So we can't store the information of the
department in the employee table. That's why we link these two tables through the
primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute
in the EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the tables
are related.
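The EMPLOYEE/DEPARTMENT link can be declared as a foreign key, as in this SQLite sketch. Foreign key enforcement must be switched on with a PRAGMA. The sample rows and the department number 99 are hypothetical. An employee row that points to a missing department is rejected, and the foreign key is what the join matches on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (Department_Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    """CREATE TABLE employee (
           ID INTEGER PRIMARY KEY,
           Name TEXT,
           Department_Id INTEGER REFERENCES department(Department_Id)  -- foreign key
       )"""
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Smith', 10)")

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Harry', 99)")  # no department 99
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # rejected: FOREIGN KEY constraint failed

# The foreign key relates the two tables in a join.
print(conn.execute(
    "SELECT e.Name, d.Name FROM employee e "
    "JOIN department d ON e.Department_Id = d.Department_Id"
).fetchall())  # [('Smith', 'Sales')]
```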
Generalization
o Generalization is like a bottom-up approach in which two or more entities of
lower level combine to form a higher level entity if they have some attributes
in common.
o In generalization, an entity of a higher level can also combine with the entities
of the lower level to form a further higher level entity.
o Generalization is more like subclass and superclass system, but the only
difference is the approach. Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e.,
subclasses are combined to make a superclass.
For example, Faculty and Student entities can be generalized and create a higher
level entity Person.
Specialization
o Specialization is a top-down approach, and it is opposite to Generalization. In
specialization, one higher level entity can be broken down into two lower level
entities.
o Specialization is used to identify the subset of an entity set that shares some
distinguishing characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes
are defined next, and relationship sets are then added.
Aggregation
In aggregation, the relationship between two entities is treated as a single entity.
For example: the Center entity and the Course entity, related by "offers", act as a single
entity in a relationship with another entity, Visitor. In the real world, a visitor who visits
a coaching center will never enquire only about the Course or only about the Center;
instead, the visitor will enquire about both.
Reduction of ER diagram to Table
The database can be represented using the notations, and these notations can be
reduced to a collection of tables.
In the database, every entity set or relationship set can be represented in tabular
form.
The ER diagram is given below:
There are some points for converting the ER diagram to the table:
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE form individual
tables.
In the STUDENT table, Age is the derived attribute. It can be calculated at any point
of time by calculating the difference between current date and Date of Birth.
Using these rules, you can convert the ER diagram to tables and columns and assign
the mapping between the tables. Table structure for the given ER diagram is as
below:
There are three types of relationship between entities:
1. One-to-one (1:1)
2. One-to-many (1:M)
3. Many-to-many (M:N)
1. One-to-one
o In a one-to-one relationship, one occurrence of an entity relates to only one
occurrence in another entity.
o A one-to-one relationship rarely exists in practice.
o For example: if an employee is allocated a company car then that car can only be
driven by that employee.
o Therefore, employee and company car have a one-to-one relationship.
2. One-to-many
o In a one-to-many relationship, one occurrence in an entity relates to many
occurrences in another entity.
o For example: An employee works in one department, but a department has many
employees.
o Therefore, department and employee have a one-to-many relationship.
3. Many-to-many
o In a many-to-many relationship, many occurrences in an entity relate to many
occurrences in another entity.
o Unlike a one-to-one relationship, a many-to-many relationship is common in
practice.
o For example: At the same time, an employee can work on several projects, and a
project has a team of many employees.
o Therefore, employee and project have a many-to-many relationship.
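When a many-to-many relationship is reduced to tables, it is usually stored in a separate linking table with one row per (employee, project) pair. The sketch below shows this in SQLite. The works_on table and the sample rows are hypothetical, and so are the helper functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- One row per (employee, project) pair; the composite key blocks duplicates.
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Smith'), (2, 'Harry');
    INSERT INTO project  VALUES (100, 'Payroll'), (200, 'Website');
    INSERT INTO works_on VALUES (1, 100), (1, 200), (2, 100);
    """
)

def projects_of(emp_id):
    """All projects one employee works on."""
    return [t for (t,) in conn.execute(
        "SELECT p.title FROM works_on w JOIN project p ON p.proj_id = w.proj_id "
        "WHERE w.emp_id = ? ORDER BY p.title", (emp_id,))]

def team_of(proj_id):
    """All employees on one project."""
    return [n for (n,) in conn.execute(
        "SELECT e.name FROM works_on w JOIN employee e ON e.emp_id = w.emp_id "
        "WHERE w.proj_id = ? ORDER BY e.name", (proj_id,))]

print(projects_of(1))  # ['Payroll', 'Website']
print(team_of(100))    # ['Harry', 'Smith']
```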
Attribute: It contains the name of a column in a particular table. Each attribute Ai must have
a domain, dom(Ai).
Relational schema: A relational schema contains the name of the relation and the names
of all columns or attributes.
Relational key: A relational key is a set of one or more attributes that can identify each
row in the relation uniquely.
o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the
attributes.
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>
Properties of Relations
o The name of a relation is distinct from the names of all other relations.
o Each relation cell contains exactly one atomic (single) value.
o Each attribute has a distinct name.
o The order of attributes has no significance.
o No two tuples are duplicates of each other.
o The order of tuples has no significance.
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to
obtain the result of the query. It uses operators to perform queries.
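1. Select Operation:
o The select operation selects the tuples that satisfy a given predicate. It is denoted
by sigma (σ).
Notation: σ p(r)
Where σ denotes selection, r is the relation, and p is a propositional formula that may
use connectives such as AND, OR and NOT and comparison operators such as =, ≠, <, ≤, >, ≥.
Input:
σ BRANCH_NAME="perryride" (LOAN)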
Output:
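The selection σ p(r) can be sketched in Python by representing a relation as a list of dicts and the predicate p as a function. The LOAN tuples here are hypothetical, since the original output table is not reproduced.

```python
# Hypothetical LOAN relation: each tuple is a dict of attribute -> value.
LOAN = [
    {"BRANCH_NAME": "perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "downtown",  "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

def select(predicate, relation):
    """σ p(r): keep only the tuples of r for which predicate p holds."""
    return [t for t in relation if predicate(t)]

result = select(lambda t: t["BRANCH_NAME"] == "perryride", LOAN)
print([t["LOAN_NO"] for t in result])  # ['L-15', 'L-16']
```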
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result.
Rest of the attributes are eliminated from the table.
o It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Input:
∏ NAME, CITY (CUSTOMER)
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
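A projection keeps the requested columns and then drops duplicate tuples. The sketch below does this in plain Python. The CUSTOMER tuples, including the STREET column, are hypothetical, because the input relation is not shown. Two Smiths living on different streets in Rye collapse into one (Smith, Rye) tuple.

```python
# Hypothetical CUSTOMER relation; STREET values are invented for illustration.
ATTRS = ("NAME", "STREET", "CITY")
CUSTOMER = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Smith", "Park", "Rye"),
    ("Curry", "North", "Rye"),
]

def project(attrs, relation, schema=ATTRS):
    """∏ A1..An (r): keep the listed columns, drop duplicates, keep first-seen order."""
    idx = [schema.index(a) for a in attrs]
    seen, out = set(), []
    for t in relation:
        row = tuple(t[i] for i in idx)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

print(project(("NAME", "CITY"), CUSTOMER))
# [('Jones', 'Harrison'), ('Smith', 'Rye'), ('Curry', 'Rye')]
```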
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples
that are in R, in S, or in both.
o It eliminates duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
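This union can be reproduced in Python with the customer names from the BORROW and DEPOSITOR relations above. Converting each projected list to a set mimics the duplicate elimination, and the | operator takes the union.

```python
# CUSTOMER_NAME values from the BORROW and DEPOSITOR relations above.
borrow = ["Jones", "Smith", "Hayes", "Jackson", "Curry", "Smith", "Williams"]
depositor = ["Johnson", "Smith", "Mayes", "Turner", "Johnson", "Jones", "Lindsay"]

# Projection yields sets, so repeated names collapse; union keeps names in either one.
union = set(borrow) | set(depositor)
print(len(union), sorted(union))
# 10 ['Curry', 'Hayes', 'Jackson', 'Johnson', 'Jones', 'Lindsay', 'Mayes', 'Smith', 'Turner', 'Williams']
```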
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all
tuples that are in both R and S.
o It is denoted by ∩.
Notation: R ∩ S
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
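The same result follows from Python's set intersection operator applied to the projected name sets of the two relations above.

```python
# Projected CUSTOMER_NAME sets of BORROW and DEPOSITOR (duplicates already removed).
borrow = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}
depositor = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}

# R ∩ S: names present in both relations.
print(sorted(borrow & depositor))  # ['Jones', 'Smith']
```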
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all
tuples that are in R but not in S.
o It is denoted by minus (-).
Notation: R - S
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
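With Python sets, the - operator gives the set difference. Computing it in both directions shows that, unlike union and intersection, R - S and S - R are generally different.

```python
borrow = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}
depositor = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}

# R - S: borrowers who are not depositors, and the reverse.
print(sorted(borrow - depositor))  # ['Curry', 'Hayes', 'Jackson', 'Williams']
print(sorted(depositor - borrow))  # ['Johnson', 'Lindsay', 'Mayes', 'Turner']
```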
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the
other table. It is also known as a cross product.
o It is denoted by X.
Notation: E X D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
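The cross product above can be reproduced with itertools.product from Python's standard library. It concatenates every EMPLOYEE tuple with every DEPARTMENT tuple, so 3 × 3 = 9 rows result.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# E X D: every EMPLOYEE tuple paired with every DEPARTMENT tuple.
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]
print(len(cross))  # 9
print(cross[0])    # (1, 'Smith', 'A', 'A', 'Marketing')
```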
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
For example, the following expression renames the relation STUDENT to STUDENT1:
ρ(STUDENT1, STUDENT)
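Join Operations:
A join operation combines related tuples from different relations if and only if a
given join condition is satisfied. It is denoted by ⋈.
1. Natural Join:
A natural join combines tuples from two relations that have equal values on all of
their common attributes.
Example: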
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Operation: (EMPLOYEE ⋈ SALARY)
Result:
Input:
∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
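A natural join matches tuples on every attribute the two relations share, here EMP_CODE. This plain-Python sketch works on lists of dicts and assumes both relations are non-empty, because it reads the attribute names from the first tuple of each.

```python
EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]

def natural_join(r, s):
    """r ⋈ s: merge every pair of tuples that agree on all shared attributes."""
    common = set(r[0]) & set(s[0])
    return [{**tr, **ts} for tr in r for ts in s if all(tr[a] == ts[a] for a in common)]

joined = natural_join(EMPLOYEE, SALARY)
print([(t["EMP_NAME"], t["SALARY"]) for t in joined])
# [('Stephan', 50000), ('Jack', 30000), ('Harry', 25000)]
```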
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with
missing information.
Example:
EMPLOYEE
FACT_WORKERS
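Input:
(EMPLOYEE ⋈ FACT_WORKERS)
Output:
a. Left outer join: keeps every EMPLOYEE tuple, filling the FACT_WORKERS attributes with NULL where no match exists.
Input:
EMPLOYEE ⟕ FACT_WORKERS
b. Right outer join: keeps every FACT_WORKERS tuple, filling the EMPLOYEE attributes with NULL where no match exists.
Input:
EMPLOYEE ⟖ FACT_WORKERS
Output:
c. Full outer join: keeps all tuples from both relations, filling missing attributes with NULL.
Input:
EMPLOYEE ⟗ FACT_WORKERS
Output: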
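The outer joins can be sketched in plain Python, with None standing in for NULL. The EMPLOYEE and FACT_WORKERS tuples below are hypothetical, since the original tables are not reproduced. The helpers assume both relations are non-empty and share a single join attribute. A right outer join is the left outer join with the arguments swapped.

```python
# Hypothetical tuples; the source's EMPLOYEE and FACT_WORKERS tables are not shown.
EMPLOYEE = [{"EMP_NAME": "Ram", "CITY": "Mumbai"}, {"EMP_NAME": "Shyam", "CITY": "Delhi"}]
FACT_WORKERS = [{"EMP_NAME": "Ram", "SALARY": 10000}, {"EMP_NAME": "Kuber", "SALARY": 20000}]

def left_outer_join(r, s, key):
    """r ⟕ s: every r tuple, padded with None (NULL) when s has no match."""
    s_attrs = [a for a in s[0] if a != key]
    out = []
    for tr in r:
        matches = [ts for ts in s if ts[key] == tr[key]]
        if matches:
            out.extend({**tr, **ts} for ts in matches)
        else:
            out.append({**tr, **{a: None for a in s_attrs}})
    return out

def full_outer_join(r, s, key):
    """r ⟗ s: the left outer join plus unmatched s tuples padded with None."""
    r_attrs = [a for a in r[0] if a != key]
    r_keys = {tr[key] for tr in r}
    extra = [{**{a: None for a in r_attrs}, **ts} for ts in s if ts[key] not in r_keys]
    return left_outer_join(r, s, key) + extra

for row in full_outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME"):
    print(row)
```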
3. Equi join:
It is also known as an inner join. It is the most common join. It is based on matched
data as per the equality condition. The equi join uses the comparison operator(=).
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
CUSTOMER ⋈ (CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID) PRODUCT
Output:
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
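The equi join pairs CUSTOMER and PRODUCT tuples whose CLASS_ID equals PRODUCT_ID, and equality is its only comparison. This nested-loop sketch in Python reproduces the output above.

```python
CUSTOMER = [(1, "John"), (2, "Harry"), (3, "Jackson")]  # (CLASS_ID, NAME)
PRODUCT = [(1, "Delhi"), (2, "Mumbai"), (3, "Noida")]    # (PRODUCT_ID, CITY)

# Equi join on CLASS_ID = PRODUCT_ID.
result = [c + p for c in CUSTOMER for p in PRODUCT if c[0] == p[0]]
for row in result:
    print(*row)
# 1 John 1 Delhi
# 2 Harry 2 Mumbai
# 3 Jackson 3 Noida
```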
Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of
information.
o Integrity constraints ensure that data insertion, updating, and other processes
are performed in such a way that data integrity is not affected.
o Thus, integrity constraints are used to guard against accidental damage to the database.
4. Key constraints
o Keys are sets of attributes that are used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, but one of them is chosen as the primary key.
A primary key must contain unique values and cannot contain a NULL value in the relational table.
Relational Calculus
o Relational calculus is a non-procedural query language. In a non-procedural query
language, the user is not concerned with the details of how to obtain the end results.
o Relational calculus tells what to do but never explains how to do it.
Tuple Relational Calculus (TRC)
Notation:
{T | P (T)} or {T | Condition (T)}
Where T is the resulting tuple and P(T) is the condition used to fetch T.
For example:
{ T.name | Author(T) AND T.article = 'database' }
Output: This query selects the tuples from the AUTHOR relation. It returns the 'name' of
each author who has written an article on 'database'.
TRC (tuple relational calculus) can be quantified. In TRC, we can use the Existential (∃)
and Universal (∀) quantifiers.
For example:
{ R | ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output: This query will yield the same result as the previous one.
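Python's set comprehensions read almost like the TRC notation, so both queries can be sketched directly. The AUTHORS tuples are hypothetical. In the second query, the candidate values for R are drawn from the author names, and any() plays the role of ∃.

```python
from collections import namedtuple

Author = namedtuple("Author", "name article")
AUTHORS = [  # hypothetical AUTHOR tuples
    Author("Ajeet", "database"),
    Author("Mahesh", "networks"),
    Author("Ratan", "database"),
]

# { T.name | Author(T) AND T.article = 'database' }
q1 = {t.name for t in AUTHORS if t.article == "database"}

# { R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
q2 = {r for r in {a.name for a in AUTHORS}
      if any(t.article == "database" and r == t.name for t in AUTHORS)}

print(q1 == q2, sorted(q1))  # True ['Ajeet', 'Ratan']
```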
Domain Relational Calculus (DRC)
Notation:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }
Where
a1, a2, ... are attributes
P stands for a formula built from the inner attributes
For example:
{< article, page, subject > | < article, page, subject > ∈ javatpoint ∧ subject = 'database'}
Output: This query will yield the article, page, and subject from the relation
javatpoint, where the subject is 'database'.
SQL
o SQL stands for Structured Query Language. It is used for storing and managing data
in a relational database management system (RDBMS).
o It is a standard language for relational database systems. It enables a user to create,
read, update and delete relational databases and tables.
o All the major RDBMSs, like MySQL, Informix, Oracle, MS Access and SQL Server, use
SQL as their standard database language.
o SQL allows users to query the database in a number of ways, using English-like
statements.
Rules:
SQL follows the following rules:
o SQL keywords are not case sensitive. Generally, keywords of SQL are written
in uppercase.
o SQL statements are not tied to text lines. We can write a single SQL statement
on one or multiple text lines.
o Using SQL statements, you can perform most of the actions in a database.
o SQL is based on tuple relational calculus and relational algebra.
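The first two rules can be tried out in SQLite through Python's sqlite3 module, as a small sketch. Lowercase keywords are accepted, and one statement can span several lines. The keyword rule does not extend to stored text values: with SQLite's default BINARY collation, 'ajeet' does not match 'Ajeet'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
conn.execute("insert into student values (1, 'Ajeet')")  # lowercase keywords work too

# One statement written over several lines.
rows = conn.execute(
    """
    SELECT name
      FROM student
     WHERE id = 1
    """
).fetchall()
print(rows)  # [('Ajeet',)]
```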
SQL process:
o When an SQL command is executed by an RDBMS, the system figures out the best
way to carry out the request, and the SQL engine determines how to interpret the
task.
o Various components are involved in this process, such as the optimization engine,
query engine, query dispatcher, and classic query engine.
o The classic query engine handles all non-SQL queries, but the SQL query engine
won't handle logical files.
Characteristics of SQL
o SQL is easy to learn.
o SQL is used to access data from relational database management systems.
o SQL can execute queries against the database.
o SQL is used to describe the data.
o SQL is used to define the data in the database and manipulate it when
needed.
o SQL is used to create and drop the database and table.
o SQL is used to create a view, stored procedure, function in a database.
o SQL allows users to set permissions on tables, procedures, and views.
Normalization:
Functional Dependency
The functional dependency is a relationship that exists between two attributes. It
typically exists between the primary key and a non-key attribute within a table.
X → Y
The left side of an FD is known as the determinant, and the right side is known as
the dependent.
For example:
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee
table, because if we know the Emp_Id, we can tell the employee name associated
with it.
Emp_Id → Emp_Name
1. Trivial functional dependency
o A → B has a trivial functional dependency if B is a subset of A.
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.
2. Non-trivial functional dependency
o A → B has a non-trivial functional dependency if B is not a subset of A.
o When A intersection B is empty, A → B is called completely non-trivial.
Example:
ID → Name
Name → DOB
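These definitions translate directly into set tests, as in the Python sketch below. The function name classify_fd is hypothetical. An FD is trivial when the dependent is a subset of the determinant, and completely non-trivial when the two sides share no attribute.

```python
def classify_fd(lhs, rhs):
    """Classify X → Y as trivial, non-trivial, or completely non-trivial."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"
    if not (lhs & rhs):
        return "completely non-trivial"
    return "non-trivial"

print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                                  # completely non-trivial
print(classify_fd({"ID", "Name"}, {"Name", "DOB"}))                   # non-trivial
```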
1. If X ⊇ Y then X → Y
Example:
1. X = {a, b, c, d, e}
2. Y = {a, b, c}
IR2. Augmentation rule: If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
IR3. Transitive rule: If X → Y and Y → Z then X → Z
IR4. Union rule: If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
IR5. Decomposition rule: If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
X → Z follows in the same way from YZ → Z.
IR6. Pseudo-transitive rule: If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
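The inference rules can be checked mechanically with attribute closure: X → Y follows from a set of FDs exactly when Y is contained in the closure X+. The sketch below assumes single-letter attributes (so "AC" means {A, C}); the helper names closure and implies are our own, not part of any library.

```python
def closure(attrs, fds):
    """Return the closure attrs+ under fds.

    Attributes are single letters, so a string such as "AC" stands for {A, C};
    fds is a list of (lhs, rhs) string pairs, e.g. ("A", "B") for A -> B.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already in the closure, add the dependent.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds, i.e. rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


print(implies([], "abcde", "abc"))                    # IR1 reflexive: True
print(implies([("A", "B")], "AC", "BC"))              # IR2 augmentation: True
print(implies([("X", "Y"), ("Y", "Z")], "X", "Z"))    # IR3 transitive: True
print(implies([("X", "Y"), ("X", "Z")], "X", "YZ"))   # IR4 union: True
print(implies([("X", "YZ")], "X", "Y"))               # IR5 decomposition: True
print(implies([("X", "Y"), ("YZ", "W")], "XZ", "W"))  # IR6 pseudo-transitive: True
print(implies([("A", "B")], "B", "A"))                # not derivable: False
```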
Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It
is also used to eliminate undesirable characteristics like insertion, update, and
deletion anomalies.
o Normalization divides larger tables into smaller tables and links them using
relationships.
o Normal forms are used to reduce redundancy in database tables.
Types of Normal Forms
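The normal forms are summarized below:
Normal Form   Description
1NF           A relation is in 1NF if every attribute contains only atomic (single) values.
2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.
3NF           A relation is in 3NF if it is in 2NF and no non-prime attribute is
              transitively dependent on a key.
BCNF          A relation is in Boyce Codd normal form, a stronger version of 3NF, if for
              every non-trivial functional dependency X → Y, X is a super key.
4NF           A relation is in 4NF if it is in Boyce Codd normal form and has no
              multi-valued dependency.
5NF           A relation is in 5NF if it is in 4NF, does not contain any join dependency,
              and joining is lossless.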
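First Normal Form (1NF)
Example: The EMPLOYEE table below is not in 1NF because the multi-valued attribute
EMP_PHONE holds two phone numbers for employee 14.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP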
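A minimal sketch of the 1NF step, assuming the multi-valued EMP_PHONE is held as a Python list in the unnormalized row; the helper to_1nf is hypothetical.

```python
def to_1nf(rows, multi_attr):
    """Split a multi-valued attribute so that each row holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)          # copy the other (atomic) attributes
            new_row[multi_attr] = value  # keep a single value per row
            flat.append(new_row)
    return flat


# The EMPLOYEE row from the example, with EMP_PHONE stored as a list
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": ["7272826385", "9064738238"], "EMP_STATE": "UP"},
]

for row in to_1nf(employee, "EMP_PHONE"):
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```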
Second Normal Form (2NF)
Example: Let's assume a school stores the data of teachers and the subjects they
teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of that candidate
key. That's why the table violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
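The 2NF decomposition is a pair of projections. This sketch (the helper project is hypothetical) projects the TEACHER rows onto the two new tables and joins them back on TEACHER_ID to confirm nothing is lost; this works because TEACHER_ID is the key of TEACHER_DETAIL.

```python
def project(rows, attrs):
    """Relational projection onto attrs, dropping duplicate tuples (order kept)."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


cols = ("TEACHER_ID", "SUBJECT", "TEACHER_AGE")
teacher = [dict(zip(cols, t)) for t in [
    (25, "Chemistry", 30), (25, "Biology", 30), (47, "English", 35),
    (83, "Math", 38), (83, "Computer", 38),
]]

teacher_detail = project(teacher, ("TEACHER_ID", "TEACHER_AGE"))
teacher_subject = project(teacher, ("TEACHER_ID", "SUBJECT"))
print(teacher_detail)  # [(25, 30), (47, 35), (83, 38)]

# Joining back on TEACHER_ID reproduces the original TEACHER rows.
rejoined = {(tid, subj, age)
            for tid, subj in teacher_subject
            for tid2, age in teacher_detail if tid == tid2}
print(rejoined == set(project(teacher, cols)))  # True
```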
Third Normal Form (3NF)
A relation is in third normal form if at least one of the following conditions holds
for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
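Example:
EMPLOYEE_DETAIL table:
Super keys: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are
non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID.
The non-prime attributes EMP_STATE and EMP_CITY are therefore transitively
dependent on the super key EMP_ID, which violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP
table, with EMP_ZIP as its primary key.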
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
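The 3NF condition can be tested with attribute closure. This sketch assumes the EMPLOYEE_DETAIL dependencies are EMP_ID → {EMP_NAME, EMP_ZIP} and EMP_ZIP → {EMP_STATE, EMP_CITY}, as the example implies; candidate keys are found by brute force, which is only practical for small relations. Checking the given FDs is enough for the relation they are defined on. All helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attribute names under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            # Skip supersets of keys already found: they are not minimal.
            if closure(cand, fds) >= attrs and not any(k <= cand for k in keys):
                keys.append(cand)
    return keys


def violates_3nf(attrs, fds):
    """FDs X -> Y where X is not a super key and Y - X has a non-prime attribute."""
    prime = set().union(*candidate_keys(attrs, fds))
    return [(lhs, rhs) for lhs, rhs in fds
            if (rhs - lhs)                        # ignore trivial FDs
            and not closure(lhs, fds) >= attrs    # X is not a super key
            and not (rhs - lhs) <= prime]         # Y is not made of prime attributes


attrs = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),     # assumed FD
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),  # assumed FD
]

print(candidate_keys(attrs, fds))  # [{'EMP_ID'}]
print(violates_3nf(attrs, fds))    # the EMP_ZIP -> {EMP_STATE, EMP_CITY} dependency
```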
Boyce Codd Normal Form (BCNF)
Example: EMPLOYEE table:
Functional dependencies in the EMPLOYEE table:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
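Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the EMP_COUNTRY table: EMP_ID
For the EMP_DEPT table: EMP_DEPT
For the EMP_DEPT_MAPPING table: {EMP_ID, EMP_DEPT}
Now the decomposition is in BCNF, because the left side of both functional
dependencies is a key of the table in which it holds.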
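A similar sketch for BCNF: a non-trivial FD violates BCNF when the closure of its left side is not the whole table. For each decomposed table the sketch simply passes the FD whose attributes all lie in that table, which is enough for this example but is not a general way to project dependencies. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of a set of attribute names under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Non-trivial FDs X -> Y whose left side X is not a super key of attrs."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attrs]


fd_country = ({"EMP_ID"}, {"EMP_COUNTRY"})
fd_dept = ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})

employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
print(len(bcnf_violations(employee, [fd_country, fd_dept])))  # 2

emp_country = {"EMP_ID", "EMP_COUNTRY"}
emp_dept = {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
emp_dept_mapping = {"EMP_ID", "EMP_DEPT"}
print(bcnf_violations(emp_country, [fd_country]))  # []
print(bcnf_violations(emp_dept, [fd_dept]))        # []
print(bcnf_violations(emp_dept_mapping, []))       # []
```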
Fourth Normal Form (4NF)
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
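The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities, so there is no relationship between COURSE and HOBBY. For example, student
21 has two courses (Computer and Math) and two hobbies (Dancing and Singing). Both
COURSE and HOBBY are multi-valued dependent on STU_ID (STU_ID →→ COURSE and
STU_ID →→ HOBBY), which leads to unnecessary repetition of data.
So to bring the above table into 4NF, we can decompose it into two tables: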
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Fifth Normal Form (5NF)
Example
SUBJECT LECTURER SEMESTER
In the above table, John takes both the Computer and Math classes for Semester 1, but
he doesn't take the Math class for Semester 2. In this case, the combination of all
these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not know the subject or who will
be taking it, so we would leave LECTURER and SUBJECT as NULL. But all three columns
together act as the primary key, so these two columns can't be left blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2
and P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
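Why all three relations are needed can be checked directly. This sketch stores P1, P2 and P3 as Python sets of tuples (a set cannot contain duplicate rows) and compares the two-way join P1 ⋈ P2 with the three-way join (P1 ⋈ P2) ⋈ P3.

```python
# Relations as sets of tuples
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}   # (SEMESTER, SUBJECT)
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}            # (SUBJECT, LECTURER)
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}     # (SEMESTER, LECTURER)

# P1 ⋈ P2 on SUBJECT, giving (SEMESTER, SUBJECT, LECTURER)
p1_p2 = {(sem, subj, lec) for sem, subj in p1 for subj2, lec in p2 if subj == subj2}

# (P1 ⋈ P2) ⋈ P3 on (SEMESTER, LECTURER)
p1_p2_p3 = {(sem, subj, lec) for sem, subj, lec in p1_p2 if (sem, lec) in p3}

print(len(p1_p2))               # 7
print(sorted(p1_p2 - p1_p2_p3)) # [('Semester 1', 'Math', 'Akash'), ('Semester 2', 'Math', 'John')]
print(len(p1_p2_p3))            # 5
```

The two rows removed by P3 are spurious combinations that the two-way join invents, which is why P1 and P2 alone are not enough.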
Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
o Decomposition breaks a table into multiple tables.
o An improper decomposition may lead to problems such as loss of information.
o Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
o A lossless decomposition guarantees that joining the decomposed relations gives back
the same relation that was decomposed.
o A decomposition is lossless if the natural join of all the decomposed relations gives
the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column EMP_ID, the resulting
relation is:
Employee ⋈ Department
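A minimal sketch of the join on EMP_ID, using a hypothetical natural_join helper over rows stored as dicts; column names other than EMP_ID are assumed for illustration. It also applies the standard test for a two-way split: the decomposition is lossless when the common attributes determine all attributes of one of the two relations, here assuming EMP_ID determines the other EMPLOYEE attributes.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out


def closure(attrs, fds):
    """Closure of a set of attribute names under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Two-way decomposition is lossless if R1 ∩ R2 determines R1 or R2."""
    c = closure(r1 & r2, fds)
    return r1 <= c or r2 <= c


# Assumed column names; the values come from the tables above.
employee = [dict(zip(("EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"), t)) for t in [
    (22, "Denim", 28, "Mumbai"), (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"), (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]]
department = [dict(zip(("DEPT_ID", "EMP_ID", "DEPT_NAME"), t)) for t in [
    (827, 22, "Sales"), (438, 33, "Marketing"), (869, 46, "Finance"),
    (575, 52, "Production"), (678, 60, "Testing"),
]]

joined = natural_join(employee, department)  # Employee ⋈ Department
for row in joined:
    print(row)

# Assumed FD: EMP_ID determines the other EMPLOYEE attributes.
fds = [({"EMP_ID"}, {"EMP_NAME", "EMP_AGE", "EMP_CITY"})]
r1 = {"EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"}
r2 = {"DEPT_ID", "EMP_ID", "DEPT_NAME"}
print(lossless_binary(r1, r2, fds))  # True
```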
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency of the original relation can be
enforced by checking the decomposed tables individually, without joining them.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with the functional dependency
set {A → BC}. R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A → BC is a part of relation R1(ABC).
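Dependency preservation can be tested without computing every implied FD. The sketch below follows the usual textbook approach: for each FD X → Y, repeatedly add to X whatever each decomposed relation can derive from its own attributes, then check that Y was reached. Attributes are single letters; the second call is a hypothetical counter-example, R(A, B, C) with A → B and B → C split into AB and AC, where B → C is lost.

```python
def closure(attrs, fds):
    """Closure of attrs (a set of single-letter attributes) under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, parts):
    """Check that every X -> Y in fds is enforceable on the decomposed parts.

    For each FD, grow X using only what each part Ri can derive on its own,
    i.e. closure(result ∩ Ri) ∩ Ri, so F+ is never computed explicitly.
    """
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                gained = closure(result & set(part), fds) & set(part)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


print(preserves_dependencies([("A", "BC")], ["ABC", "AD"]))            # True
print(preserves_dependencies([("A", "B"), ("B", "C")], ["AB", "AC"]))  # False
```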
Multivalued Dependency
o A multivalued dependency occurs when two attributes in a table are independent of
each other but both depend on a third attribute.
o A multivalued dependency involves at least two attributes that depend on a third
attribute, which is why it always requires at least three attributes.
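One way to test X →→ Y on a given table instance: for each value of X, the Y-values and the values of the remaining attributes must appear in every combination. The sketch uses the STUDENT columns from the 4NF example but a hypothetical instance in which student 21's two courses each appear with both hobbies; mvd_holds is a made-up helper.

```python
from collections import defaultdict


def mvd_holds(rows, attrs, x, y):
    """Check the multivalued dependency X ->> Y on a relation instance.

    rows are tuples over attrs. X ->> Y holds when, for every value of X,
    the Y-values and the values of the remaining attributes Z occur in
    every combination (their cross product).
    """
    pos = {a: i for i, a in enumerate(attrs)}
    z = [a for a in attrs if a not in x and a not in y]
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[pos[a]] for a in x)
        y_val = tuple(row[pos[a]] for a in y)
        z_val = tuple(row[pos[a]] for a in z)
        groups[key].add((y_val, z_val))
    for pairs in groups.values():
        y_vals = {p[0] for p in pairs}
        z_vals = {p[1] for p in pairs}
        # pairs is a subset of y_vals x z_vals, so equal sizes mean equal sets
        if len(pairs) != len(y_vals) * len(z_vals):
            return False
    return True


attrs = ("STU_ID", "COURSE", "HOBBY")
# Hypothetical instance: each of student 21's courses is paired with both hobbies.
student = [
    (21, "Computer", "Dancing"), (21, "Computer", "Singing"),
    (21, "Math", "Dancing"), (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
]
print(mvd_holds(student, attrs, ["STU_ID"], ["COURSE"]))      # True
print(mvd_holds(student[1:], attrs, ["STU_ID"], ["COURSE"]))  # False
```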