Unit 1
RDBMS
RDBMS vs DBMS
No | DBMS | RDBMS
1) | DBMS applications store data as files. | RDBMS applications store data in a tabular form.
2) | In DBMS, data is generally stored in either a hierarchical form or a navigational form. | In RDBMS, the tables have an identifier called the primary key, and the data values are stored in the form of tables.
4) | DBMS does not apply any security with regard to data manipulation. | RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation and Durability) properties.
5) | DBMS uses a file system to store data, so there is no relation between the tables. | In RDBMS, data values are stored in the form of tables, so the relationships between these data values are stored in the form of tables as well.
6) | DBMS has to provide some uniform methods to access the stored information. | RDBMS supports a tabular structure of the data and relationships between them to access the stored information.
7) | DBMS does not support distributed databases. | RDBMS supports distributed databases.
8) | DBMS is meant for small organizations and deals with small amounts of data. It supports a single user. | RDBMS is designed to handle large amounts of data. It supports multiple users.
9) | Examples of DBMS are file systems, XML, etc. | Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc.
Advantages of DBMS
Disadvantages of DBMS
● High Cost and Extensive Hardware and Software Support: Huge costs and
setups are required to make these systems functional.
● Scalability: In case of addition of more data, servers along with additional power,
and memory are required.
● Complexity: Voluminous data makes the relations harder to understand and may
lower performance.
● Structured Limits: The fields or columns of a relational database system are
bounded by fixed limits, which may lead to loss of data.
DBMS Architecture
● The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers and other components that are connected with networks.
● The client/server architecture consists of many PCs and a workstation which are
connected via the network.
● DBMS architecture depends upon how users are connected to the database to
get their request done.
1 Tier Architecture
● In this architecture, the database is directly available to the user. It means the
user can work directly on the DBMS and use it.
● Any changes done here will directly be done on the database itself. It doesn't
provide a handy tool for end users.
● The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick
response.
● Simple Architecture: 1-Tier Architecture is the most simple architecture to set up,
as only a single machine is required to maintain it.
● Cost-Effective: No additional hardware is required for implementing 1-Tier
Architecture, which makes it cost-effective.
● Easy to Implement: 1-Tier Architecture can be easily deployed, and hence it is
mostly used in small projects.
2 Tier Architecture
● Easy to Access: 2-Tier Architecture makes easy access to the database, which
makes fast retrieval.
● Scalable: We can scale the database easily, by adding clients or by upgrading
hardware.
● Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture and Multi-Tier
Architecture.
● Easy Deployment: 2-Tier Architecture is easier to deploy than 3-Tier Architecture.
● Simple: 2-Tier Architecture is easily understandable as well as simple because of
only two components.
3 Tier Architecture
● The 3-Tier architecture contains another layer between the client and server. In
this architecture, client can't directly communicate with the server.
● The application on the client-end interacts with an application server which
further communicates with the database system.
● End user has no idea about the existence of the database beyond the application
server. The database also has no idea about any other user beyond the
application.
● The 3-Tier architecture is used in the case of large web applications.
1. Internal Level
● The internal level has an internal schema which describes the physical storage
structure of the database.
● The internal schema is also known as a physical schema.
● It uses the physical data model. It is used to define how the data will be stored in
a block.
● The physical level is used to describe complex low-level data structures in detail.
2. Conceptual Level
3. External Level
● At the external level, a database contains several schemas that are sometimes
called subschemas. The subschema is used to describe the different views of
the database.
● An external schema is also known as view schema.
● Each view schema describes the part of the database that a particular user group is
interested in and hides the remaining database from that user group.
● The view schema describes the end user interaction with database systems.
The main objective of three level architecture is to enable multiple users to access the
same data with a personalized view while storing the underlying data only once. Thus it
separates the user's view from the physical structure of the database. This separation is
desirable for the following reasons:
The three levels of DBMS architecture don't exist independently of each other. There
must be correspondence between the three levels i.e. how they actually correspond with
each other. DBMS is responsible for correspondence between the three types of
schema. This correspondence is called Mapping.
The Conceptual/ Internal Mapping lies between the conceptual level and the internal
level. Its role is to define the correspondence between the records and fields of the
conceptual level and files and data structures of the internal level.
The External/Conceptual Mapping lies between the external level and the conceptual
level. Its role is to define the correspondence between a particular external view and the
conceptual view.
Integrity Constraints
1. Domain Constraints
● Domain constraints can be defined as the definition of a valid set of values for an
attribute.
● The data type of domain includes string, character, integer, time, date, currency,
etc. The value of the attribute must be available in the corresponding domain.
2. Entity integrity constraints
● The entity integrity constraint states that primary key value can't be null.
● This is because the primary key value is used to identify individual rows in
relation and if the primary key has a null value, then we can't identify those rows.
● A table can contain a null value other than the primary key field.
● Keys are attributes that are used to identify an entity uniquely within its entity
set.
● An entity set can have multiple keys, out of which one key will be the primary
key. A primary key must contain unique values and cannot contain a null value in the
relational table.
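The constraints above can be tried out with a small sketch using Python's built-in sqlite3 module. The student table, its columns, and the sample rows are assumptions made for illustration; they do not come from these notes. The key column is declared NOT NULL explicitly because SQLite does not always reject nulls in a primary key on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        roll_no TEXT NOT NULL PRIMARY KEY,  -- entity integrity: unique, never null
        name    TEXT,                       -- non-key column, null is allowed
        age     INTEGER CHECK (age >= 0)    -- domain constraint on valid values
    )
    """
)

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

rows = [
    ("S1", "Asha", 20),   # valid row
    (None, "Ravi", 21),   # null primary key
    ("S1", "Meena", 22),  # duplicate primary key
    ("S2", None, 19),     # null in a non-key column
    ("S3", "Kiran", -5),  # value outside the allowed domain
]
results = [try_insert(row) for row in rows]
print(results)  # [True, False, False, True, False]
```

Only the first and fourth rows are stored: a null is acceptable in the name column but not in the primary key.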
Extended ER Diagram
The Entity Relationship Diagram explains the relationship among the entities present in
the database. ER models are used to model real-world objects like a person, a car, or a
company and the relation between these real-world objects. In short, ER Diagram is the
structural format of the database.
But today the complexity of the data is increasing so it becomes more and more difficult
to use the traditional ER model for database modeling. To reduce this complexity of
modeling we have to make improvements or enhancements to the existing ER model to
make it able to handle the complex application in a better way.
The link between subclasses and superclasses introduces the idea of inheritance. The
'd' symbol is used to indicate the relationship between subclasses and superclasses.
SuperClass
A superclass is a type of entity that is connected to one or more subtypes. And, also
note that a database entity cannot be created just by belonging to a superclass.
For example: The Shape superclass includes subgroups like Triangle, Circle, and
Square.
SubClass
For example: Triangles, Circles, and squares are the subclass of the Shape superclass.
Specialization
Specialization is a procedure that defines a set of entities that are divided into
subgroups based on their characteristics. The Enhanced ER model was designed in a
top to bottom approach using the Specialization. In this model, the superclass or parent
object is defined first by utilizing a box i.e. a rectangle box. After this, it is separated into
subclasses, which are comparable entity types.
Let's take an example of a scenario that handles, stores, and processes a large amount
of data for a company that manufactures automobiles. The primary entity of this
company is the Vehicle, which is treated as the superclass. The attributes of the
superclass are the type of vehicle, the color of the vehicle, the average of the vehicle,
etc.
Now, the vehicle which is a superclass can further be subdivided into various
subclasses, for example, Cars and Trucks. Here, each of the above-mentioned
subclasses inherits all of the attributes of the superclass i.e. Vehicle superclass. Also,
note that a subclass can have its properties in addition to inherited ones. The below
diagram is a representation of the above-given scenario.
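Specialization maps naturally onto class inheritance, so the scenario can be sketched in Python as follows. The attribute names follow the text (type, color, average); the subclass-specific attributes seats and payload_tonnes are hypothetical and only show that a subclass can add properties of its own.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:
    """Superclass: attributes shared by every vehicle."""
    vehicle_type: str
    color: str
    average: float

@dataclass
class Car(Vehicle):
    """Subclass: inherits all Vehicle attributes and adds its own."""
    seats: int = 4  # hypothetical attribute

@dataclass
class Truck(Vehicle):
    """Subclass: inherits all Vehicle attributes and adds its own."""
    payload_tonnes: float = 0.0  # hypothetical attribute

car = Car("Car", "red", 18.5, seats=5)
truck = Truck("Truck", "blue", 6.0, payload_tonnes=12.0)

# Inherited attributes are available on both subclasses.
print(car.color, truck.average)
```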
Generalization
Let's take the same example of the data handling scenario for a company that
manufactures automobiles. The Car and Truck are the two primary entities in the
Enhanced ER diagram in the given example. These entities can include attributes like
registration number, license period, insurance number, and so on, and they can be used
as subclasses for both Commercial and Private vehicle superclasses. The attributes
shared by the subclasses Car and Truck are included in their respective
superclasses due to their commonality. This process of taking the shared attributes and
reaching the fundamental primary root is known as Generalization.
Category or Union
Let's consider an example of a car and its owner. The owner can be considered as a
subclass and the superclass can be an individual, a company, or a bank. As shown in
the below EER model in DBMS, the subclass i.e. a car owner in the car booking model
can be any of the superclasses i.e. an individual, a company, or a bank.
Aggregation
Relational Algebra
Input Table:
2. Projection(π)
This operation shows the list of those attributes that we wish to appear in the
result. The rest of the attributes are eliminated from the table.
It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where:
Input Table:
Output Table:
NAME CITY
Jones Harrison
Smith Rye
3. Union(U)
Suppose there are two relations R and S. The union operation contains all the tuples
that are either in R or S or both in R & S.
It eliminates the duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
Where:
A union operation must hold the following condition:
R and S must have the same number of attributes.
Duplicate tuples are eliminated automatically.
Input Table:
Depositor Table
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Borrow Table
CUSTOMER_NAME LOAN_NO
Jones L-17
Output Table:
CUSTOMER_NAME
Johnson
Jones
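As a rough sketch, the union above can be reproduced with Python sets, which drop duplicates just as the operation requires. It assumes the output is the union of the CUSTOMER_NAME projections of the two tables; the helper name project is hypothetical.

```python
def project(rows, *indexes):
    """Projection: keep only the chosen column positions; a set removes duplicates."""
    return {tuple(row[i] for i in indexes) for row in rows}

# Rows are (CUSTOMER_NAME, ACCOUNT_NO) and (CUSTOMER_NAME, LOAN_NO).
depositor = [("Johnson", "A-101")]
borrow = [("Jones", "L-17")]

# Both projections have one attribute, so they are union compatible.
names = project(borrow, 0) | project(depositor, 0)
print(sorted(names))  # [('Johnson',), ('Jones',)]
```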
4. Set Intersection(∩)
Suppose there are two relations R and S. The set intersection operation contains all
tuples that are in both R & S.
It is denoted by ∩.
Notation: R ∩ S
Example: ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Input Table:
Depositor Table
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-101
Borrow Table
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith A-101
Output Table:
CUSTOMER_NAME
Smith
5. Set Difference(-)
Suppose there are two relations R and S. The set difference operation contains all
tuples that are in R but not in S.
It is denoted by minus (-).
Notation: R - S
Example: ∏ CUSTOMER_NAME (DEPOSITOR) - ∏ CUSTOMER_NAME (BORROW)
Input Table:
Depositor Table
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Jones L-17
Borrow Table
CUSTOMER_NAME LOAN_NO
Jones L-17
Output Table:
CUSTOMER_NAME
Johnson
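Set intersection and set difference can be checked the same way with Python sets. The sketch below uses the rows from the two input tables above; the helper name names is hypothetical. It also shows that the order of the operands matters for set difference.

```python
def names(rows):
    """Projection on CUSTOMER_NAME (the first column); a set drops duplicates."""
    return {row[0] for row in rows}

# Tables from the set intersection example.
depositor = [("Johnson", "A-101"), ("Smith", "A-101")]
borrow = [("Jones", "L-17"), ("Smith", "A-101")]
common = names(borrow) & names(depositor)

# Tables from the set difference example.
depositor2 = [("Johnson", "A-101"), ("Jones", "L-17")]
borrow2 = [("Jones", "L-17")]
only_depositor = names(depositor2) - names(borrow2)
only_borrow = names(borrow2) - names(depositor2)

print(common)          # {'Smith'}
print(only_depositor)  # {'Johnson'}
print(only_borrow)     # set()
```

Swapping the operands changes the answer: every borrower in the second pair of tables is also a depositor, so the difference taken from the borrower side is empty.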
6. Cartesian Product(X)
The Cartesian product is used to combine each row in one table with each row in
the other table. It is also known as a cross product.
It is denoted by X.
Notation: E X D
Example: EMPLOYEE X DEPARTMENT
Input Table:
Employee Table
1 Smith A
2 Harry C
3 John B
Department Table
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Output Table:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
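The nine output rows can be generated with itertools.product from the standard library. This is only a sketch: the tuples mirror the rows of the two input tables, and no column names are assumed for the Employee table.

```python
from itertools import product

employee = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Pair every employee row with every department row and concatenate them.
cross = [e + d for e, d in product(employee, department)]

print(len(cross))  # 9
for row in cross:
    print(*row)
```

The result always has len(employee) * len(department) rows, whether or not the department codes match.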
7. Rename(ρ)
The rename operation is used to rename the output relation. It is denoted by rho
(ρ).
Example: We can use the rename operator to rename STUDENT relation to
STUDENT1.
ρ(STUDENT1, STUDENT)
Relational Calculus
Many of the calculus expressions involve the use of quantifiers. There are two types of
quantifiers:
1. Tuple Relational Calculus (TRC)
Tuple Relational Calculus in DBMS uses a tuple variable (t) that goes to each row
of the table and checks whether the predicate is true or false for the given row.
Depending on the given predicate condition, it returns the row or part of the row.
The Tuple Relational Calculus expression syntax: {t | P(t)}
Where: t is the tuple variable that runs over every row, and P(t) is the predicate
logic expression or condition.
Let's take an example of a Customer Database and try to see how TRC
expressions work.
Customer Table:
1 Rohit 12345
2 Rahul 13245
3 Rohit 56789
4 Amit 12345
Example: Write a TRC query to get all the data of customers whose zip code is
12345.
TRC Query: {t | t ∈ Customer ∧ t.Cust_Zipcode = 12345}
Workflow of query: The tuple variable "t" will go through every tuple of the
Customer table. Each row is checked to see whether its Cust_Zipcode is 12345,
and only those rows that satisfy the predicate condition are returned.
The TRC expression above can be read as "Return all the tuples which belong to
the Customer table and whose Zipcode is equal to 12345."
1 Rohit 12345
4 Amit 12345
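A TRC query of the form {t | P(t)} reads much like a list comprehension, so the workflow above can be sketched in Python. Cust_Zipcode is the column name used in the text; Cust_Id and Cust_Name are assumed names for the other two columns.

```python
customer = [
    {"Cust_Id": 1, "Cust_Name": "Rohit", "Cust_Zipcode": 12345},
    {"Cust_Id": 2, "Cust_Name": "Rahul", "Cust_Zipcode": 13245},
    {"Cust_Id": 3, "Cust_Name": "Rohit", "Cust_Zipcode": 56789},
    {"Cust_Id": 4, "Cust_Name": "Amit", "Cust_Zipcode": 12345},
]

# t ranges over every tuple; the condition after "if" plays the role of P(t).
result = [t for t in customer if t["Cust_Zipcode"] == 12345]

for t in result:
    print(t["Cust_Id"], t["Cust_Name"], t["Cust_Zipcode"])
```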
2. Domain Relational Calculus (DRC)
Domain Relational Calculus uses domain Variables to get the column values
required from the database based on the predicate expression or condition.
The Domain Relational Calculus expression syntax: {<x1,x2,x3,x4...> |
P(x1,x2,x3,x4...)}
Where: <x1,x2,x3,x4...> are domain variables used to get the column values
required, and P(x1,x2,x3...) is predicate expression or condition.
Let's take the example of Customer Database and try to understand DRC queries
with example.
Customer Table:
1 Rohit 12345
2 Rahul 13245
3 Rohit 56789
4 Amit 12345
Example: Write a DRC query to get the data of all customers with Zip code
12345.
DRC Query: {<x1,x2,x3> | <x1,x2,x3> ∈ Customer ∧ x3 = 12345}
Workflow of Query: In the above query, x1, x2, x3 (ordered) refer to the attributes or
columns which we need in the result, and the predicate condition is that the first
two domain variables x1 and x2 should be present while matching the condition
for each row and the third domain variable x3 should be equal to 12345.
1 Rohit 12345
4 Amit 12345
S.NO | Basis of Comparison | Relational Algebra | Relational Calculus
7. | Similarity | It reflects traditional pre-relational file structures. | It is more similar to logic as a modeling language.
Functional Dependency
X → Y
The left side of an FD is known as the determinant, and the right side of the FD is
known as the dependent.
For example:
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee
table, because if we know the Emp_Id, we can tell the employee name associated with it.
Example:
42 abc 17
43 pqr 18
44 xyz 18
Example:
42 abc 17
43 pqr 18
44 xyz 18
Example:
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an
indirect functional dependency, hence called Transitive functional dependency.
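Whether a functional dependency holds in a given table can be tested mechanically: no two rows may agree on the determinant and disagree on the dependent. The sketch below checks the last table above. The column names enrol_no, dept and building_no come from the text; name is an assumed label for the second column, and fd_holds is a hypothetical helper.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True

columns = ("enrol_no", "name", "dept", "building_no")
data = [(42, "abc", "CO", 4), (43, "pqr", "EC", 2), (44, "xyz", "IT", 1), (45, "abc", "EC", 2)]
rows = [dict(zip(columns, values)) for values in data]

print(fd_holds(rows, ["enrol_no"], ["dept"]))         # True
print(fd_holds(rows, ["dept"], ["building_no"]))      # True
print(fd_holds(rows, ["enrol_no"], ["building_no"]))  # True (transitive)
print(fd_holds(rows, ["name"], ["dept"]))             # False
```

A check like this can only refute a dependency on the data at hand. Whether an FD is valid in general is a property of the schema, not of one table instance.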
1. Reflexive Rule
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation Rule
In the augmentation rule, if X determines Y, then XZ determines YZ for any set of
attributes Z.
If X → Y then XZ → YZ
Example: For R(ABCD), if A → B then AC → BC
3. Transitive Rule
In the transitive rule, if X determines Y and Y determines Z, then X must also
determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule
Union rule says if X determines Y and X determines Z, then X must also
determine Y and Z.
If X → Y and X → Z then X → YZ
5. Decomposition Rule
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z
separately.
If X → YZ then X → Y and X → Z
Join Dependency
Let R be a relation schema and R1, R2, ..., Rn be a decomposition of R. R is said to
satisfy the join dependency *(R1, R2, ..., Rn) if and only if every legal instance r(R) is
equal to the join of its projections on R1, R2, ..., Rn.
● We can break, or decompose the above table into three tables, this would mean
that the table is not in 5NF!
● The three decomposed tables would be:
E_Name Company
Rohan Comp1
Harpreet Comp2
Anant Comp3
E_Name Product
Rohan Jeans
Harpreet Jacket
Anant TShirt
Company Product
Comp1 Jeans
Comp2 Jacket
Comp3 TShirt
Note: If the natural join of all three tables yields the relation table R, the
relation will be said to have join dependency.
Step 2: Next, let's perform the natural join of the above table with R3:
Therefore, our join dependency comes out to be: {(E_Name, Company ), (E_Name,
Product), (Company, Product)}
Because the above-mentioned relations are join dependent, they are not in 5NF.
That is, the natural join of the three relations above is equal to our initial relation
table R.
● If a relation is in 4NF and does not contain any join dependencies, it is in 5NF.
● 5NF is satisfied when all tables are broken into as many tables as possible in
order to avoid redundancy.
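The join dependency can be verified with a small natural join sketch in Python. The three decomposed tables are taken from above; the original relation R is assumed here to consist of the three combined rows, since its table is not shown. The helper names natural_join and as_set are hypothetical.

```python
def natural_join(left, right):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            # Rows combine only when they agree on every shared attribute.
            if all(l[c] == r[c] for c in common):
                joined.append({**l, **r})
    return joined

def as_set(relation):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in relation}

r1 = [{"E_Name": "Rohan", "Company": "Comp1"},
      {"E_Name": "Harpreet", "Company": "Comp2"},
      {"E_Name": "Anant", "Company": "Comp3"}]
r2 = [{"E_Name": "Rohan", "Product": "Jeans"},
      {"E_Name": "Harpreet", "Product": "Jacket"},
      {"E_Name": "Anant", "Product": "TShirt"}]
r3 = [{"Company": "Comp1", "Product": "Jeans"},
      {"Company": "Comp2", "Product": "Jacket"},
      {"Company": "Comp3", "Product": "TShirt"}]

# Assumed original relation R (not shown in the notes).
r = [{"E_Name": "Rohan", "Company": "Comp1", "Product": "Jeans"},
     {"E_Name": "Harpreet", "Company": "Comp2", "Product": "Jacket"},
     {"E_Name": "Anant", "Company": "Comp3", "Product": "TShirt"}]

rejoined = natural_join(natural_join(r1, r2), r3)
print(as_set(rejoined) == as_set(r))  # True
```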
Dependency Relation | MVDs exist within a single relation and are defined between attributes of that relation. | JDs involve multiple relations and are defined based on the join of those relations.
Violation and Resolution | MVDs are violated if for a given combination of values in one set of attributes, there exist multiple combinations of values in the other set of attributes. MVD violations can be resolved by decomposing the relation. | JDs are violated if the tuples from the joined relations do not satisfy the join dependency. JD violations can be resolved by decomposing the relation or splitting the join into multiple relations.
Normalization
EmployeeProjectDetail Table:
In the above table, the prime attributes of the table are Employee Code and
Project ID. We have partial dependencies in this table because Employee Name
can be determined by Employee Code and Project Name can be determined by
Project ID. Thus, the above relational table violates the rule of 2NF.
To remove partial dependencies from this table and normalize it into second
normal form, we can decompose the <EmployeeProjectDetail> table into the
following three tables:
EmployeeDetail Table:
Employee Code | Employee Name
101 | John
102 | Ryan
103 | Stephanie
EmployeeProject Table:
Employee Code | Project ID
101 | P03
101 | P01
102 | P04
103 | P02
ProjectDetail Table:
Project ID | Project Name
P03 | Project103
P01 | Project101
P04 | Project104
P02 | Project102
The relations in 2NF are clearly less redundant than relations in 1NF. However,
the decomposed relations may still suffer from one or more anomalies due to the
transitive dependency. We will remove the transitive dependencies in the Third
Normal Form.
● X -> Y
● Y does not -> X
● Y -> Z
For a relational table to be in third normal form, it must satisfy the following rules:
EmployeeDetail Table:
The above table is not in 3NF because it has Employee Code -> Employee City
transitive dependency because:
EmployeeDetail Table:
EmployeeLocation Table:
110044 Badarpur
110028 Naraina
A superkey is a set of one or more attributes that can uniquely identify a row in a
database table.
Example: Let us take an example of the following <EmployeeProjectLead> table
to understand how to normalize the table to the BCNF:
EmployeeProjectLead Table:
The above table satisfies all the normal forms till 3NF, but it violates the rules of
BCNF because the candidate key of the above table is {Employee Code, Project
ID}. For the non-trivial functional dependency, Project Leader -> Project ID, Project
ID is a prime attribute but Project Leader is a non-prime attribute. This is not
allowed in BCNF.
To convert the given table into BCNF, we decompose it into two tables:
EmployeeProject Table:
Employee Code | Project ID
101 | P03
101 | P01
102 | P04
103 | P02
ProjectLead Table:
Project Leader | Project ID
Grey | P03
Christian | P01
Hudson | P04
Petro | P02
Student Table:
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between COURSE and HOBBY.
So to bring the above table into 4NF, we can decompose it into two tables:
StudentCourse Table:
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
StudentHobby Table:
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Example:
In the above table, John takes both the Computer and Math classes for Semester 1 but
he doesn't take the Math class for Semester 2. In this case, the combination of all these
fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not know the
subject or who will be taking that subject, so we leave Lecturer and Subject as
NULL. But all three columns together act as the primary key, so we can't leave
the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1,
P2 & P3:
P1 Table:
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2 Table:
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3 Table:
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Relational Decomposition
● When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
● In a database, it breaks the table into multiple tables.
● If the relation has no proper decomposition, then it may lead to problems like
loss of information.
● Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
Whenever we decompose a relation, there are certain properties that must be satisfied
to ensure no information is lost while decomposing the relations. These properties are:
We can follow certain rules to ensure that the decomposition is a lossless join
decomposition. Let's say we have a relation R and we decompose it into R1 and R2;
then the rules are:
1. The union of attributes of both the sub relations R1 and R2 must contain
all the attributes of original relation R.
R1 ∪ R2 = R
2. The intersection of attributes of both the sub relations R1 and R2 must not
be null, i.e., there should be some attributes that are present in both R1
and R2.
R1 ∩ R2 ≠ ∅
3. The intersection of attributes of both the sub relations R1 and R2 must be
the superkey of R1 or R2, or both R1 and R2.
R1 ∩ R2 = Super key of R1 or R2
Example: Let’s see an example of a lossless join decomposition. Suppose we have the
following relation EmployeeProjectDetail as:
EmployeeProjectDetail Table:
Now, we decompose this relation into EmployeeProject and ProjectDetail relations as:
EmployeeProject Table:
ProjectDetail Table:
Project_ID Project_Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102
Project_ID
P03
P01
P04
P02
As we can see, this is not null, so the second condition holds as well. Also,
EmployeeProject ∩ ProjectDetail = Project_ID. This is the super key of the ProjectDetail
relation, so the third condition holds as well.
Now, since all three conditions hold for our decomposition, this is a lossless join
decomposition.
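The three rules can be checked mechanically on attribute sets. The sketch below assumes that EmployeeProject has the attributes Employee_Code, Project_ID, Employee_Name and Employee_Email, and that Project_ID → Project_Name is the only functional dependency needed. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r, r1, r2, fds):
    """Apply the three lossless join rules to a two-way decomposition."""
    r, r1, r2 = set(r), set(r1), set(r2)
    common = r1 & r2
    if r1 | r2 != r:   # rule 1: no attribute is lost
        return False
    if not common:     # rule 2: the sub relations share an attribute
        return False
    reach = closure(common, fds)
    return r1 <= reach or r2 <= reach  # rule 3: shared part is a super key

employee_project = ("Employee_Code", "Project_ID", "Employee_Name", "Employee_Email")
project_detail = ("Project_ID", "Project_Name")
relation = set(employee_project) | set(project_detail)
fds = [(("Project_ID",), ("Project_Name",))]

print(is_lossless(relation, employee_project, project_detail, fds))  # True

# Dropping Project_ID from EmployeeProject leaves nothing to join on.
without_id = ("Employee_Code", "Employee_Name", "Employee_Email")
print(is_lossless(relation, without_id, project_detail, fds))  # False
```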
Lossy Decomposition
In a lossy decomposition, one or more of these conditions fail and we are not
able to recover the complete information present in the original relation.
EmployeeProject Table:
ProjectDetail Table:
Project_ID Project_Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102
The primary key of the above relation is {Project_ID}.
Now, the intersection EmployeeProject ∩ ProjectDetail is null. Therefore there is no way
for us to map a project to its employees. Thus this is a lossy decomposition.
Dependency Preserving
EmployeeProjectDetail Table:
EmployeeProject Table:
Employee_Code Project_ID Employee_Name Employee_Email
ProjectDetail Table:
Project_ID Project_Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102
Reconstruction | Allows for the complete reconstruction of the original data without any loss or distortion. | Reconstruction of the original data is not possible since some information is intentionally discarded or compressed.
Questions Answers
Q1. How does BCNF differ from 3NF? Prove that BCNF is stronger than 3NF with
example.
Sol.
Q.2 What do you mean by functional dependency set and attribute closure?
Sol. Functional Dependency Set: Functional Dependency set or FD set of a relation is
the set of all FDs present in the relation. For Example, FD set for relation STUDENT
shown in table 1 is:
Result := a;
while (changes to Result) do
    for each B → Y in F do
    begin
        if B ⊆ Result then Result := Result ∪ Y
    end
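The pseudocode above translates almost line for line into Python. In this sketch each FD is a pair of strings, with every character standing for one attribute. The FD set is the one given in Q.5 and is used purely as an illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "B"), ("B", "C"), ("D", "ABC"), ("AC", "D")]

print(sorted(closure("A", fds)))  # ['A', 'B', 'C', 'D']
print(sorted(closure("B", fds)))  # ['B', 'C']
print(sorted(closure("C", fds)))  # ['C']
```

An attribute set whose closure contains every attribute of the relation is a super key, so A is a super key of R{A,B,C,D} under these FDs while B and C are not.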
Q.5 A set of FDs for the relation R{A,B,C,D} is A->B, B->C, D->ABC, AC->D. Find the
minimal cover.
Sol. We can find the minimal cover by following 3 simple steps.
Step: 1 First split the right-hand attributes of all FDs (functional dependencies).
A->B, B->C, D->A, D->B, D->C, AC->D
[Note: We can't split AC->D as A->D, C->D]
Step: 2 Now remove all redundant FDs.
[An FD is redundant if it can be derived from the remaining FDs]
Let's test the redundancy of A->B. Without A->B,
A+ = A (the closure of A contains only A)
So, A->B is not redundant.
Similarly, B->C is not redundant (without it, B+ = B), and D->A is not redundant
(without it, D+ = BCD).
D->B is redundant: without D->B we still have D->A and A->B, so D+ contains B.
So, we remove D->B from the FD set.
D->C is also redundant: without D->C we still have D->A, A->B and B->C, so D+
contains C. So, we remove D->C as well.
At last, we check AC->D. Without it,
AC+ = ABC
which does not contain D, so AC->D is not redundant.
So, the FDs after this step are: A->B, B->C, D->A, AC->D
Step: 3 Find the extraneous attributes and remove them.
Only AC->D has more than one attribute on the left-hand side, so it is the only FD
to check. Either A or C, or neither, can be extraneous.
If C+ contains A then A is extraneous and it can be removed.
If A+ contains C then C is extraneous and it can be removed.
Here C+ = C, so A is not extraneous. But A+ contains C (via A->B and B->C), so C
is extraneous and AC->D reduces to A->D.
Rechecking Step 2 on the new set shows that no FD has become redundant.
Hence, the minimum cover is: A->B, B->C, D->A, A->D
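The three steps can be automated so that a hand computation is easy to double-check. The sketch below is one possible arrangement rather than a definitive algorithm: it splits the right-hand sides, removes redundant FDs, reduces the left-hand sides, and then runs the redundancy check once more because a reduced FD can make another one redundant. All function names are hypothetical. Note that a minimal cover is not always unique, so a different processing order may give a different but equally valid answer.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def drop_redundant(fds):
    """Remove every FD that the remaining FDs already imply."""
    kept = list(fds)
    for fd in fds:
        rest = [f for f in kept if f != fd]
        if fd[1] <= closure(fd[0], rest):
            kept = rest
    return kept

def reduce_lhs(fds):
    """Drop left-hand attributes whose removal still lets the FD be derived."""
    reduced = []
    for lhs, rhs in fds:
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, fds):
                lhs = smaller
        if (lhs, rhs) not in reduced:
            reduced.append((lhs, rhs))
    return reduced

def minimal_cover(fds):
    """fds: (lhs, rhs) string pairs, one character per attribute."""
    split = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]
    return drop_redundant(reduce_lhs(drop_redundant(split)))

def show(fds):
    """Readable, sorted form such as ['A->B', ...]."""
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in fds)

cover = minimal_cover([("A", "B"), ("B", "C"), ("D", "ABC"), ("AC", "D")])
print(show(cover))
```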