Unit 1 ffgggg

Uploaded by

Atul Kushwaha
Copyright
© © All Rights Reserved

Unit 1

RDBMS

● RDBMS stands for Relational Database Management System.


● Widely used database management systems such as MS SQL Server, IBM DB2,
Oracle, MySQL, and Microsoft Access are based on RDBMS.
● It is called Relational Database Management System (RDBMS) because it is
based on the relational model introduced by E.F. Codd.

RDBMS vs DBMS

1) DBMS: DBMS applications store data as files.
   RDBMS: RDBMS applications store data in a tabular form.

2) DBMS: Data is generally stored in either a hierarchical form or a navigational form.
   RDBMS: The tables have an identifier called the primary key, and the data values are stored in the form of tables.

3) DBMS: Normalization is not present.
   RDBMS: Normalization is present.

4) DBMS: DBMS does not apply any security with regard to data manipulation.
   RDBMS: RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation, and Durability) properties.

5) DBMS: DBMS uses a file system to store data, so there is no relation between the tables.
   RDBMS: Data values are stored in the form of tables, so a relationship between these data values is stored in the form of a table as well.

6) DBMS: DBMS has to provide some uniform methods to access the stored information.
   RDBMS: RDBMS supports a tabular structure of the data and relationships between the tables to access the stored information.

7) DBMS: DBMS does not support distributed databases.
   RDBMS: RDBMS supports distributed databases.

8) DBMS: DBMS is meant for small organizations and deals with small amounts of data. It supports a single user.
   RDBMS: RDBMS is designed to handle large amounts of data. It supports multiple users.

9) DBMS: Examples of DBMS are file systems, XML, etc.
   RDBMS: Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc.

Advantages of RDBMS

● Easy to manage: Each table can be independently manipulated without affecting
others.
● Security: It is more secure, with multiple levels of security. Access to
shared data can be limited.
● Flexible: Data can be updated at a single point without making
amendments in multiple files. Databases can easily be extended to incorporate
more records, thus providing greater scalability. It also makes it easy to apply
SQL queries.
● Users: RDBMS supports a client-server architecture in which multiple users can
work with the database at the same time.

Disadvantages of RDBMS

● High Cost and Extensive Hardware and Software Support: Huge costs and
setups are required to make these systems functional.
● Scalability: When more data is added, additional servers, power,
and memory are required.
● Complexity: Voluminous data makes the relations harder to understand
and may lower performance.
● Structured Limits: The fields or columns of a relational database system are
restricted by various limits, which may lead to loss of data.

DBMS Architecture

● The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers and other components that are connected with networks.
● The client/server architecture consists of many PCs and a workstation which are
connected via the network.
● DBMS architecture depends upon how users are connected to the database to
get their request done.

Types of DBMS Architecture

1 Tier Architecture
● In this architecture, the database is directly available to the user. It means the
user works directly on the DBMS itself.
● Any changes made here are applied directly to the database. It doesn't
provide a handy tool for end users.
● The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick
response.

Advantages of 1 Tier Architecture

● Simple Architecture: 1-Tier Architecture is the simplest architecture to set up,
as only a single machine is required to maintain it.
● Cost-Effective: No additional hardware is required for implementing 1-Tier
Architecture, which makes it cost-effective.
● Easy to Implement: 1-Tier Architecture can be easily deployed, and hence it is
mostly used in small projects.

2 Tier Architecture

● The 2-Tier architecture is the same as the basic client-server architecture. In the
two-tier architecture, applications on the client end can directly communicate with
the database at the server side. For this interaction, APIs such as ODBC and JDBC
are used.
● The user interfaces and application programs are run on the client-side.
● The server side is responsible to provide the functionalities like: query processing
and transaction management.
● To communicate with the DBMS, client-side application establishes a connection
with the server side.

Advantages of 2 Tier Architecture

● Easy to Access: 2-Tier Architecture gives direct access to the database, which
makes retrieval fast.
● Scalable: We can scale the database easily, by adding clients or by upgrading
hardware.
● Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture and Multi-Tier
Architecture.
● Easy Deployment: 2-Tier Architecture is easier to deploy than 3-Tier Architecture.
● Simple: 2-Tier Architecture is easily understandable as well as simple because of
only two components.

3 Tier Architecture

● The 3-Tier architecture contains another layer between the client and server. In
this architecture, client can't directly communicate with the server.
● The application on the client-end interacts with an application server which
further communicates with the database system.
● End user has no idea about the existence of the database beyond the application
server. The database also has no idea about any other user beyond the
application.
● The 3-Tier architecture is used for large web applications.

Advantages of 3 Tier Architecture

● Enhanced scalability: Scalability is enhanced due to distributed deployment of
application servers. Individual connections need not be made between the
client and server.
● Data Integrity: 3-Tier Architecture maintains Data Integrity. Since there is a
middle layer between the client and the server, data corruption can be
avoided/removed.
● Security: 3-Tier Architecture Improves Security. This type of model prevents
direct interaction of the client with the server thereby reducing access to
unauthorized data.
Disadvantages of 3 Tier Architecture

● More Complex: 3-Tier Architecture is more complex in comparison to 2-Tier
Architecture. Communication points are also doubled in 3-Tier Architecture.
● Difficult to Interact: Direct interaction between the client and the database
becomes difficult due to the presence of middle layers.

Three Schema Architecture of DBMS

● The three schema architecture is also called ANSI/SPARC architecture or
three-level architecture.
● This framework is used to describe the structure of a specific database system.
● The three schema architecture is also used to separate the user applications and
physical database.
● The three schema architecture contains three-levels. It breaks the database down
into three different categories.
1. Internal Level

● The internal level has an internal schema which describes the physical storage
structure of the database.
● The internal schema is also known as a physical schema.
● It uses the physical data model. It is used to define how the data will be stored in
a block.
● The physical level is used to describe complex low-level data structures in detail.

The internal level is generally concerned with the following activities:

● Storage space allocations.
For Example: B-Trees, Hashing etc.
● Access paths.
For Example: Specification of primary and secondary keys, indexes, pointers and
sequencing.
● Data compression and encryption techniques.
● Optimization of internal structures.
● Representation of stored fields.

2. Conceptual Level

● The conceptual schema describes the design of a database at the conceptual
level. Conceptual level is also known as logical level.
● The conceptual schema describes the structure of the whole database.
● The conceptual level describes what data are to be stored in the database and
also describes what relationship exists among those data.
● In the conceptual level, internal details such as an implementation of the data
structure are hidden.
● Programmers and database administrators work at this level.
3. External Level

● At the external level, a database contains several schemas that are sometimes
called subschemas. The subschema is used to describe the different views of
the database.
● An external schema is also known as view schema.
● Each view schema describes the part of the database that a particular user group is
interested in and hides the remaining database from that user group.
● The view schema describes the end user interaction with database systems.
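
As a minimal sketch of an external schema, the snippet below uses Python's built-in sqlite3 module to define a view over a base table. The `employee` table, its columns, and the `employee_directory` view are hypothetical names chosen for illustration; they are not part of the notes.

```python
import sqlite3

# In-memory database; the table and view names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Conceptual level: the full structure of the stored data.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Smith", 50000), (2, "Harry", 42000)],
)

# External level: a view that exposes only part of the data to one user group
# and hides the salary column from it.
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee")

directory_rows = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_id"
).fetchall()
print(directory_rows)
```

The user group that queries `employee_directory` sees names and ids only, while the underlying data is still stored once in `employee`.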

Objective of Three Schema Architecture

The main objective of three level architecture is to enable multiple users to access the
same data with a personalized view while storing the underlying data only once. Thus it
separates the user's view from the physical structure of the database. This separation is
desirable for the following reasons:

● Different users need different views of the same data.


● The approach in which a particular user needs to see the data may change over
time.
● The users of the database should not worry about the physical implementation
and internal workings of the database such as data compression and encryption
techniques, hashing, optimization of the internal structures etc.
● All users should be able to access the same data according to their
requirements.
● DBA should be able to change the conceptual structure of the database without
affecting the users' views.
● Internal structure of the database should be unaffected by changes to physical
aspects of the storage.

Mapping Between Views

The three levels of DBMS architecture don't exist independently of each other. There
must be correspondence between the three levels i.e. how they actually correspond with
each other. DBMS is responsible for correspondence between the three types of
schema. This correspondence is called Mapping.

There are basically two types of mapping in the database architecture:

● Conceptual/ Internal Mapping


● External / Conceptual Mapping

Conceptual/ Internal Mapping

The Conceptual/ Internal Mapping lies between the conceptual level and the internal
level. Its role is to define the correspondence between the records and fields of the
conceptual level and files and data structures of the internal level.

External/ Conceptual Mapping

The external/Conceptual Mapping lies between the external level and the Conceptual
level. Its role is to define the correspondence between a particular external view and the
conceptual view.
Integrity Constraints

● Integrity constraints are a set of rules. They are used to maintain the quality of
information.
● Integrity constraints ensure that data insertion, updating, and other processes
are performed in such a way that data integrity is not affected.
● Thus, integrity constraints are used to guard against accidental damage to the
database.

1. Domain Constraints

● Domain constraints can be defined as the definition of a valid set of values for an
attribute.
● The data type of domain includes string, character, integer, time, date, currency,
etc. The value of the attribute must be available in the corresponding domain.
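
One way to see a domain constraint in action is a CHECK clause, sketched here with Python's built-in sqlite3 module. The `student` table and its `age` column are hypothetical; the allowed range 0 to 150 is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK clause restricts age to integers in an assumed valid range,
# which plays the role of the attribute's domain.
conn.execute(
    """
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150)
    )
    """
)

# A value inside the domain is accepted.
conn.execute("INSERT INTO student VALUES (1, 'Asha', 20)")

# A value outside the domain (a letter instead of a number) is rejected.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', 'A')")
    domain_violation_rejected = False
except sqlite3.IntegrityError:
    domain_violation_rejected = True

stored_ids = [row[0] for row in conn.execute("SELECT id FROM student")]
```

Only the first row is stored; the second insert fails because 'A' is not in the domain of `age`.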
2. Entity integrity constraints

● The entity integrity constraint states that primary key value can't be null.
● This is because the primary key value is used to identify individual rows in
relation and if the primary key has a null value, then we can't identify those rows.
● Fields other than the primary key may contain null values.

3. Referential Integrity Constraints

● A referential integrity constraint is specified between two tables.


● In the Referential integrity constraints, if a foreign key in Table 1 refers to the
Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be
null or be available in Table 2.
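
The rule above can be sketched with Python's built-in sqlite3 module. The `employee` and `department` tables are hypothetical stand-ins for Table 1 and Table 2; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# department plays the role of Table 2, employee the role of Table 1.
conn.execute(
    "CREATE TABLE department (dept_no TEXT PRIMARY KEY NOT NULL, dept_name TEXT)"
)
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_no TEXT REFERENCES department(dept_no))"
)
conn.execute("INSERT INTO department VALUES ('A', 'Marketing')")

# Allowed: the foreign key matches an existing primary key value.
conn.execute("INSERT INTO employee VALUES (1, 'Smith', 'A')")
# Allowed: the foreign key is null.
conn.execute("INSERT INTO employee VALUES (2, 'Harry', NULL)")

# Rejected: 'Z' does not exist in department.
try:
    conn.execute("INSERT INTO employee VALUES (3, 'John', 'Z')")
    dangling_reference_rejected = False
except sqlite3.IntegrityError:
    dangling_reference_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```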
4. Key constraints

● A key is an attribute or set of attributes that is used to identify an entity
within its entity set uniquely.
● An entity set can have multiple keys, out of which one key will be the primary
key. A primary key must contain unique values and cannot contain a null value in
the relational table.
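
The sketch below, using Python's built-in sqlite3 module, illustrates that a primary key value must be unique and non-null while other fields may be null. The `student` table and the `is_rejected` helper are hypothetical. SQLite needs an explicit NOT NULL on a non-integer primary key to forbid nulls, so the column is declared that way here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO student VALUES ('R1', 'Asha')")


def is_rejected(values):
    """Return True if inserting the given row violates a constraint."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", values)
        return False
    except sqlite3.IntegrityError:
        return True


# Key constraint: a second row with the same primary key is refused.
duplicate_key_rejected = is_rejected(("R1", "Ravi"))
# Entity integrity: a null primary key is refused.
null_key_rejected = is_rejected((None, "Meena"))
# A null in a non-key field is accepted.
null_name_accepted = not is_rejected(("R2", None))
```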
Extended ER Diagram

The Entity Relationship Diagram explains the relationship among the entities present in
the database. ER models are used to model real-world objects like a person, a car, or a
company and the relation between these real-world objects. In short, ER Diagram is the
structural format of the database.

But today the complexity of the data is increasing so it becomes more and more difficult
to use the traditional ER model for database modeling. To reduce this complexity of
modeling we have to make improvements or enhancements to the existing ER model to
make it able to handle the complex application in a better way.

Enhanced entity-relationship diagrams are advanced database diagrams very similar to
regular ER diagrams which represent the requirements and complexities of complex
databases.

In addition to ER model concepts EE-R includes −


● Subclasses and Super classes.
● Specialization and Generalization.
● Category or union type.
● Aggregation.

Subclass and SuperClass

The link between subclasses and superclasses introduces the idea of inheritance. In
the diagram, a circle containing the 'd' symbol connects a superclass to its subclasses
and indicates that the subclasses are disjoint.

SuperClass

A superclass is a type of entity that is connected to one or more subtypes. And, also
note that a database entity cannot be created just by belonging to a superclass.

For example: The superclass of shapes includes the subgroups like Triangle, Circles,
and Squares.
SubClass

A subclass is a collection of objects with special characteristics. The traits and
properties of a subclass are inherited from its superclass.

For example: Triangles, Circles, and squares are the subclass of the Shape superclass.

Constraint on SubClass Relationship

1. Total or Partial: A sub-classing relationship is total if every super-class entity
must be associated with some sub-class entity; otherwise it is partial. The
sub-classing "job type based employee category" is partial: not every
employee is necessarily one of (secretary, engineer, technician), i.e. the union of
these three types is a proper subset of all employees. In contrast, the sub-classing
"Salaried Employee and Hourly Employee" is total: the union of entities from the
sub-classes is equal to the total employee set, i.e. every employee necessarily has
to be one of them.
2. Overlapped or Disjoint: If an entity from the super-class can occur in
multiple sub-class sets, then the sub-classing is overlapped; otherwise it is disjoint.
Both examples, job-type based and salaried/hourly employee sub-classing,
are disjoint.
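
The two constraints can be checked mechanically if entity sets are modeled as Python sets. This is only a sketch: the employee names and the helper functions `is_total` and `is_disjoint` are hypothetical.

```python
from itertools import combinations


def is_total(superclass, subclasses):
    """Total: the union of the subclasses equals the whole superclass."""
    return set().union(*subclasses) == set(superclass)


def is_disjoint(subclasses):
    """Disjoint: no entity occurs in more than one subclass."""
    return all(a.isdisjoint(b) for a, b in combinations(subclasses, 2))


# Hypothetical employee data for illustration.
employees = {"Ann", "Bob", "Cara", "Dev"}
salaried, hourly = {"Ann", "Bob"}, {"Cara", "Dev"}
secretaries, engineers, technicians = {"Ann"}, {"Bob"}, {"Cara"}

pay_type_total = is_total(employees, [salaried, hourly])
job_type_total = is_total(employees, [secretaries, engineers, technicians])
pay_type_disjoint = is_disjoint([salaried, hourly])
job_type_disjoint = is_disjoint([secretaries, engineers, technicians])
```

Here the salaried/hourly sub-classing is total and disjoint, while the job-type sub-classing is partial (Dev has none of the three job types) but still disjoint.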

Specialization

Specialization is a procedure that defines a set of entities that are divided into
subgroups based on their characteristics. The Enhanced ER model was designed in a
top to bottom approach using the Specialization. In this model, the superclass or parent
object is defined first by utilizing a box i.e. a rectangle box. After this, it is separated into
subclasses, which are comparable entity types.

Let's take the example of a system that handles, stores, and processes a large amount
of data for a company that manufactures automobiles. The primary entity of this
company is the Vehicle, which is treated as a superclass. The attributes of the
superclass include the type of vehicle, the color of the vehicle, the average of the
vehicle, etc.

Now, the vehicle which is a superclass can further be subdivided into various
subclasses, for example, Cars and Trucks. Here, each of the above-mentioned
subclasses inherits all of the attributes of the superclass i.e. Vehicle superclass. Also,
note that a subclass can have its properties in addition to inherited ones. The below
diagram is a representation of the above-given scenario.
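
Attribute inheritance in a specialization resembles class inheritance in a programming language, so the scenario can be sketched loosely with Python dataclasses. The subclass-specific attributes `seats` and `payload_tonnes` are assumptions added for illustration and do not come from the notes.

```python
from dataclasses import dataclass, fields


@dataclass
class Vehicle:
    """Superclass: attributes shared by every vehicle."""
    vehicle_type: str
    color: str


@dataclass
class Car(Vehicle):
    """Subclass: inherits Vehicle attributes and adds its own (hypothetical)."""
    seats: int


@dataclass
class Truck(Vehicle):
    """Subclass: inherits Vehicle attributes and adds its own (hypothetical)."""
    payload_tonnes: float


car = Car(vehicle_type="car", color="red", seats=5)
truck = Truck(vehicle_type="truck", color="blue", payload_tonnes=12.5)

# Each subclass carries the inherited attributes plus its own.
car_attributes = [f.name for f in fields(car)]
```

Both `car` and `truck` are also instances of `Vehicle`, which mirrors the rule that every subclass entity is a member of the superclass.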
Generalization

Extraction of the shared characteristics or traits of entities to compile them into a
superclass is the process of Generalization. It is a reverse process of Specialization. In
short, it converts subclasses to superclasses. However, generalization is a process that
combines only those entity sets that share the same features into higher-level entities.

As generalization is a reverse process of specialization, it follows a bottom-up approach
i.e. the lower-level entities combine to form a higher-level entity.

Let's take the same example as above, the data handling scenario for a company that
manufactures automobiles. The car and truck are the two primary entities in the
Enhanced ER diagram in the given example. These entities can include attributes like
registration number, license period, insurance number, and so on, and they can be used
as subclasses for both Commercial and Private vehicle superclasses. The attributes
belong to the subclasses Car and truck and are included in their respective
superclasses due to their commonality. This process of taking the shared attributes and
reaching the fundamental primary root is known as Generalization.
Category or Union

A Category or Union is a single subclass that has a superclass/subclass relationship
with two or more superclasses. The participation of the superclasses can be
partial or total. In short, a category or union represents a relationship of "either" type.

Let's consider an example of a car and its owner. The owner can be considered as a
subclass and the superclass can be an individual, a company, or a bank. As shown in
the below EER model in DBMS, the subclass i.e. a car owner in the car booking model
can be any of the superclasses i.e. an individual, a company, or a bank.
Aggregation

Aggregation is a high-level data modeling technique that makes use of both
Generalization and Specialization. It is widely used to connect distinct entity types
based on a shared relationship. This idea is commonly used to represent execution
elements, operational lines, or functional behavior of a similar sort in terms of shared
properties.

Aggregation is used to simplify the details of a given database by converting ternary
relationships to binary ones. A ternary relationship is a relationship
that exists among three entities. For example, the relationship in which the Center
entity offers the Course entity can act as a single entity that is itself in a
relationship with another entity, Visitor. In the real world, a visitor to a
coaching center does not enquire about the Course alone or about the
Center alone; the enquiry is about both.

Mapping EER to Relations

1. Mapping of regular entity types


● for each regular entity type E in the ER schema, create a relation R that
includes all the simple attributes of E
● include only the simple component attributes of a composite attribute
● choose one of the key attributes of E as primary key for R
● if the chosen key of E is composite, the set of simple attributes that form it
will together form the primary key of R
2. Mapping of weak entity types
● for each weak entity type W in the ER schema with owner entity type E,
create a relation R and include all simple attributes of W as attributes of R
● include as foreign key attributes of R the primary key attribute(s) of the
relation(s) that correspond to the owner entity type(s)
● the primary key of R is the combination of the primary key(s) of the
owner(s) and the partial key of the weak entity type W, if any
● if there is a weak entity type E2 whose owner is also a weak entity type E1,
then E1 should be mapped before E2 to determine its primary key first
3. Mapping of binary 1:1 relationship types
● for each binary 1:1 relationship type R in the ER schema, identify the
relations S and T that correspond to the entity types participating in R
● foreign key approach
○ choose one of the relations, S, and include as a foreign key in S the
primary key of T
○ include all the simple attributes of R as attributes of S
● merged relation option
○ merge the two entity types and the relationship into a single relation
● relationship relation option
○ set up a third relation R for the purpose of cross-referencing the
primary keys of S and T
4. Mapping of binary 1:N relationship types
● for each binary 1:N relationship type R, identify the relation S that
represents the participating entity type at the N-side of the relationship
type
● include as foreign key in S the primary key of the relation T that represents
the other entity type participating in R
● include any simple attributes of the 1:N relationship type as attributes of S
5. Mapping of binary M:N relationship types
● for each binary M:N relationship type R, create a new relation S to
represent R
● include as foreign key attributes in S the primary keys of the relations that
represent the participating entity types
● their combination will form the primary key of S
● include any simple attributes of R as attributes of S
6. Mapping of multivalued attributes
● for each multivalued attribute A, create a new relation R
● R will include an attribute corresponding to A, plus the primary key
attribute K - as a foreign key in R – of the relation that represents the entity
type or relationship type that has A as an attribute
● the primary key of R is the combination of A and K
● if the multivalued attribute is composite, include its simple components
7. Mapping of N-ary relationship types
● for each n-ary relationship type R, where n > 2, create a new relation S to
represent R
● include as foreign key attributes in S the primary keys of the relations that
represent the participating entity types
● include any simple attributes of R as attributes of S
● the primary key of S is usually a combination of all the foreign keys that
reference the relations representing the participating entity types
8. Options for mapping specialization or generalization
● convert each specialization with m subclasses {S1, S2, …, Sm} and
superclass C, where the attributes of C are {k, a1, …, an} and k is the key,
into relation schemas using one of the following options:
○ option 8A: multiple relations-superclass and subclasses
■ create a relation L for C with attributes Attrs(L) = {k, a1, …,
an} and PK(L) = k
■ create a relation Li for each subclass Si, with the attributes
Attrs(Li) = {k} ∪ {attributes of Si} and PK(Li) = k
■ works for any specialization (total or partial, disjoint or
overlapping)
○ option 8B: multiple relations-subclass relations only
■ create a relation Li for each subclass Si, with the attributes
Attrs(Li) = {attributes of Si} ∪ {k, a1, …, an} and PK(Li) = k
■ only works for a specialization whose subclasses are total
9. Mapping of union types (categories)
● shared subclass: a subclass of several superclasses, indicating multiple
inheritance
● apply any of the options in step 8 to a shared subclass
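
As a rough illustration of steps 4 and 5, the sketch below maps one 1:N and one M:N relationship to tables with Python's built-in sqlite3 module. All table and column names (`department`, `employee`, `project`, `works_on`, `hours`) are hypothetical examples, not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Regular entity types become relations with their own primary keys.
    CREATE TABLE department (dept_no TEXT PRIMARY KEY NOT NULL, dept_name TEXT);
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, proj_name TEXT);

    -- Step 4 (1:N): the N-side relation (employee) receives the primary key
    -- of the 1-side relation (department) as a foreign key.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_no TEXT REFERENCES department(dept_no)
    );

    -- Step 5 (M:N): a new relation holds both foreign keys; their combination
    -- is the primary key, and 'hours' is a simple attribute of the relationship.
    CREATE TABLE works_on (
        emp_id INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        hours INTEGER,
        PRIMARY KEY (emp_id, proj_id)
    );
    """
)

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk);
# pk is greater than 0 for columns that are part of the primary key.
works_on_key = [
    row[1] for row in conn.execute("PRAGMA table_info(works_on)") if row[5] > 0
]
```

The composite key of `works_on` means one employee can appear with many projects and one project with many employees, but each pairing is recorded only once.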

Relational Algebra and Relational Calculus

Relational Algebra

Relational Algebra is a procedural query language. Relational algebra mainly provides a
theoretical foundation for relational databases and SQL. The main purpose of using
Relational Algebra is to define operators that transform one or more input relations into
an output relation. Given that these operators accept relations as input and produce
relations as output, they can be combined and used to express potentially complex
queries that transform potentially many input relations (whose data are stored in the
database) into a single output relation (the query results). As it is pure mathematics,
there is no use of English keywords in Relational Algebra, and operators are represented
using symbols.
1. Selection(σ)
The select operation selects tuples that satisfy a given predicate.
It is denoted by sigma (σ).
Notation: σ p(r)
Where:
σ denotes the selection operator
r is the relation (table name)
p is a propositional logic formula, which may use connectives such as
AND, OR, and NOT, and relational operators such as =, ≠,
≥, <, >, ≤.

Example: σ BRANCH_NAME="Perryride" (LOAN)

Input Table:

BRANCH_NAME LOAN_NO AMOUNT

Redwood L-23 2000

Perryride L-15 1500

Mianus L-13 500

Perryride L-16 1300


Output Table:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300
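
The selection above can be mimicked in Python by filtering a list of rows with a predicate. This is a sketch only; the `select` helper is a hypothetical name, and the rows are the LOAN table from the example.

```python
# The LOAN relation from the example, one dict per tuple.
loan = [
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(relation, predicate):
    """Selection: keep only the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]


perryride_loans = select(loan, lambda row: row["BRANCH_NAME"] == "Perryride")
selected_loan_numbers = [row["LOAN_NO"] for row in perryride_loans]
```

The predicate can combine conditions with `and`, `or`, and `not`, in the same way p may use AND, OR, and NOT.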

2. Projection(π)
This operation lists the attributes that we wish to appear in the
result. The rest of the attributes are eliminated from the table.
It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)

Where:

A1, A2, ..., An are attribute names of relation r.

Example: ∏ NAME, CITY (CUSTOMER)

Input Table:

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Output Table:

NAME CITY
Jones Harrison

Smith Rye
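
Projection can be sketched in Python as keeping only the named attributes of each row and dropping duplicate results. The `project` helper is a hypothetical name; the rows are the CUSTOMER table from the example.

```python
# The CUSTOMER relation from the example.
customer = [
    {"NAME": "Jones", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
]


def project(relation, attributes):
    """Projection: keep the listed attributes and eliminate duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


name_and_city = project(customer, ["NAME", "CITY"])
```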

3. Union(U)
Suppose there are two relations R and S. The union operation contains all the tuples
that are either in R or S or both in R & S.
It eliminates the duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
Where:
A union operation must hold the following conditions:
R and S must have the same number of attributes.
Duplicate tuples are eliminated automatically.

Example: ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Input Table:

Depositor Table

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Borrow Table

CUSTOMER_NAME LOAN_NO

Jones L-17

Output Table:
CUSTOMER_NAME

Johnson

Jones
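
If relations are modeled as Python sets of tuples, union is the built-in set union, and duplicates disappear automatically. This is a sketch under that modeling assumption; `customer_names` is a hypothetical helper that stands in for the projection on CUSTOMER_NAME.

```python
# Relations from the example, modeled as sets of tuples.
depositor = {("Johnson", "A-101")}
borrow = {("Jones", "L-17")}


def customer_names(relation):
    """Project each tuple onto its first attribute, CUSTOMER_NAME."""
    return {(row[0],) for row in relation}


# Both operands have one attribute, so they are union-compatible.
all_customers = customer_names(borrow) | customer_names(depositor)
```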

4. Set Intersection(∩)
Suppose there are two relations R and S. The set intersection operation contains all
tuples that are in both R & S.
It is denoted by ∩.
Notation: R ∩ S
Example: ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Input Table:

Depositor Table

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-101

Borrow Table

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith A-101
Output Table:

CUSTOMER_NAME

Smith
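
With relations modeled as Python sets of tuples (an assumption of this sketch), intersection is the built-in `&` operator. `customer_names` is a hypothetical helper for the projection on CUSTOMER_NAME.

```python
# Relations from the example, modeled as sets of tuples.
depositor = {("Johnson", "A-101"), ("Smith", "A-101")}
borrow = {("Jones", "L-17"), ("Smith", "A-101")}


def customer_names(relation):
    """Project each tuple onto its first attribute, CUSTOMER_NAME."""
    return {(row[0],) for row in relation}


# Customers who appear in both relations.
borrowers_with_deposits = customer_names(borrow) & customer_names(depositor)
```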

5. Set Difference(-)
Suppose there are two relations R and S. The set difference operation contains all
tuples that are in R but not in S.
It is denoted by minus (-).
Notation: R - S
Example: ∏ CUSTOMER_NAME (DEPOSITOR) - ∏ CUSTOMER_NAME (BORROW)

Input Table:

Depositor Table

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Jones L-17

Borrow Table

CUSTOMER_NAME LOAN_NO

Jones L-17
Output Table:

CUSTOMER_NAME

Johnson
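
Set difference is not symmetric, which a short sketch with Python sets makes visible. With the two tables above, Johnson is the only depositor who is not also a borrower, so depositor names minus borrower names gives Johnson, while the opposite direction is empty. `customer_names` is a hypothetical helper for the projection on CUSTOMER_NAME.

```python
# Relations from the example, modeled as sets of tuples.
depositor = {("Johnson", "A-101"), ("Jones", "L-17")}
borrow = {("Jones", "L-17")}


def customer_names(relation):
    """Project each tuple onto its first attribute, CUSTOMER_NAME."""
    return {(row[0],) for row in relation}


# Names that are in Depositor but not in Borrow.
depositors_only = customer_names(depositor) - customer_names(borrow)
# Names that are in Borrow but not in Depositor (none for this data).
borrowers_only = customer_names(borrow) - customer_names(depositor)
```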

6. Cartesian Product(X)
The Cartesian product is used to combine each row in one table with each row in
the other table. It is also known as a cross product.
It is denoted by X.
Notation: E X D
Example: EMPLOYEE X DEPARTMENT

Input Table:

Employee Table

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

Department Table

DEPT_NO DEPT_NAME

A Marketing
B Sales

C Legal

Output Table:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
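
The nine output rows can be reproduced with `itertools.product` from the Python standard library, which pairs every employee tuple with every department tuple. The list-of-tuples representation is an assumption of this sketch.

```python
from itertools import product

# Relations from the example, one tuple per row.
employee = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Cartesian product: concatenate each employee tuple with each department tuple.
cross = [emp + dept for emp, dept in product(employee, department)]

for row in cross:
    print(row)
```

The result has 3 x 3 = 9 rows and 3 + 2 = 5 attributes, in the same order as the output table above.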

7. Rename(ρ)
The rename operation is used to rename the output relation. It is denoted by rho
(ρ).
Example: We can use the rename operator to rename STUDENT relation to
STUDENT1.
ρ(STUDENT1, STUDENT)

Relational Calculus

Relational calculus is based on predicate calculus, a branch of symbolic logic. A
predicate is a truth-valued function with arguments. On substituting values for the
arguments, the function results in an expression called a proposition, which can be
either true or false. Relational calculus is a tailored version of a subset of
predicate calculus that is used to communicate with the relational database.

Many calculus expressions involve the use of quantifiers. There are two types of
quantifiers:

● Universal Quantifiers: The universal quantifier, denoted by ∀, is read as "for all",
which means that in a given set of tuples, all tuples satisfy a given
condition.
● Existential Quantifiers: The existential quantifier, denoted by ∃, is read as "there
exists", which means that in a given set of tuples there is at least one occurrence
whose value satisfies a given condition.
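
Python's built-in `all` and `any` behave like the two quantifiers over a finite set of tuples, which gives a quick way to experiment with them. The zip codes below are sample values assumed for illustration.

```python
# A small set of tuples (customer_id, zip_code); the values are illustrative.
customers = [(1, 12345), (2, 13245), (3, 56789), (4, 12345)]

# Universal quantifier (for all): every tuple must satisfy the condition.
all_zip_codes_five_digits = all(10000 <= zip_code <= 99999 for _, zip_code in customers)
all_in_12345 = all(zip_code == 12345 for _, zip_code in customers)

# Existential quantifier (there exists): at least one tuple satisfies it.
some_in_56789 = any(zip_code == 56789 for _, zip_code in customers)
some_in_00000 = any(zip_code == 0 for _, zip_code in customers)
```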

Types of Relational Calculus

1. Tuple Relational Calculus (TRC)

Tuple Relational Calculus in DBMS uses a tuple variable (t) that goes to each row
of the table and checks if the predicate is true or false for the given row.
Depending on the given predicate condition, it returns the row or part of the row.

The Tuple Relational Calculus expression Syntax: {t | P(t)}

Where: t is the tuple variable that runs over every Row, and P(t) is the predicate
logic expression or condition.
Let's take an example of a Customer Database and try to see how TRC
expressions work.

Customer Table:

Customer_id Name Zip code

1 Rohit 12345

2 Rahul 13245

3 Rohit 56789

4 Amit 12345

Example : Write a TRC query to get all the data of customers whose zip code is
12345.

TRC Query: {t | t ∈ Customer ∧ t.Zipcode = 12345} or TRC Query: {t |
Customer(t) ∧ t[Zipcode] = 12345 }

Workflow of query: The tuple variable "t" goes through every tuple of the
Customer table. Each row is checked to see whether its Zipcode is 12345,
and only those rows that satisfy the predicate are returned.

The TRC expression above can be read as "Return all the tuple which belongs to
the Customer Table and whose Zipcode is equal to 12345."

Result of the TRC expression above:

Customer_id Name Zip code

1 Rohit 12345

4 Amit 12345

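
A Python set comprehension has nearly the same shape as the TRC expression: a tuple variable ranges over the relation and a predicate decides which tuples are kept. The namedtuple field names below are assumptions chosen to match the column headings.

```python
from collections import namedtuple

# Each Customer value is one tuple (row) of the Customer table.
Customer = namedtuple("Customer", ["customer_id", "name", "zipcode"])

customer = {
    Customer(1, "Rohit", 12345),
    Customer(2, "Rahul", 13245),
    Customer(3, "Rohit", 56789),
    Customer(4, "Amit", 12345),
}

# {t | t ∈ Customer ∧ t.Zipcode = 12345}
result = {t for t in customer if t.zipcode == 12345}

matching_ids = sorted(t.customer_id for t in result)
```

Whole tuples are returned, so every column of the matching rows is available in `result`.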
2. Domain Relational Calculus (DRC)

Domain Relational Calculus uses domain Variables to get the column values
required from the database based on the predicate expression or condition.
The Domain relational calculus expression Syntax: {<x1,x2,x3,x4...> |
P(x1,x2,x3,x4...)}

Where: <x1,x2,x3,x4...> are domain variables used to get the column values
required, and P(x1,x2,x3...) is predicate expression or condition.

Let's take the example of Customer Database and try to understand DRC queries
with example.

Customer Table:

Customer_id Name Zip code

1 Rohit 12345

2 Rahul 13245

3 Rohit 56789

4 Amit 12345

Example : Write a DRC query to get the data of all customers with Zip code
12345.

DRC query: {<x1,x2,x3> | <x1,x2,x3> ∈ Customer ∧ x3 = 12345 }

Workflow of Query: In the above query, x1, x2, x3 (ordered) refer to the attributes or
columns that we need in the result. The predicate condition is that the values of
x1, x2, and x3 together form a row of the Customer table and that the third
domain variable x3 is equal to 12345.

Result of the DRC query will be:

Customer_id Name Zip code


1 Rohit 12345

4 Amit 12345
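
In Python, the domain-variable style can be imitated by unpacking each row into separate variables, one per column, instead of binding the whole row to a single tuple variable. The set-of-tuples representation is an assumption of this sketch.

```python
# The Customer table as a set of (Customer_id, Name, Zip code) tuples.
customer = {
    (1, "Rohit", 12345),
    (2, "Rahul", 13245),
    (3, "Rohit", 56789),
    (4, "Amit", 12345),
}

# {<x1, x2, x3> | <x1, x2, x3> ∈ Customer ∧ x3 = 12345}
# x1, x2, and x3 are domain variables: each ranges over one column's values.
result = {(x1, x2, x3) for (x1, x2, x3) in customer if x3 == 12345}

# Listing fewer variables in the head returns only those columns.
names_only = {x2 for (x1, x2, x3) in customer if x3 == 12345}
```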

Difference in TRC and DRC

1. Definition
   TRC: The Tuple Relational Calculus (TRC) is used to select tuples from a relation. Tuples with specific range values, tuples with certain attribute values, and so on can be selected.
   DRC: The Domain Relational Calculus (DRC) employs a list of attributes from which to choose based on the condition. It is similar to TRC, but instead of selecting entire tuples, it selects attributes.

2. Representation of variables
   TRC: The variables represent the tuples from specified relations.
   DRC: The variables represent values drawn from a specified domain.

3. Tuple/Domain
   TRC: A tuple is a single element of a relation. In database terms, it is a row.
   DRC: A domain is equivalent to a column's data type and any constraints on the value of the data.

4. Filtering
   TRC: Filtering is done using tuple variables that range over relations.
   DRC: Filtering is done based on the domain of attributes.

5. Return Value
   TRC: The predicate expression associated with the TRC is used to test every row using a tuple variable and return those tuples that meet the condition.
   DRC: DRC takes advantage of domain variables and, based on the condition set, returns the required attribute or column that satisfies the criteria of the condition.

6. Membership condition
   TRC: The query cannot be expressed using a membership condition.
   DRC: The query can be expressed using a membership condition.

7. Query Language
   TRC: QUEL (Query Language) is a query language related to it.
   DRC: QBE (Query-By-Example) is a query language related to it.

8. Similarity
   TRC: It reflects traditional pre-relational file structures.
   DRC: It is more similar to logic as a modeling language.

9. Syntax
   TRC: Notation: {T | P (T)} or {T | Condition (T)}
   DRC: Notation: { a1, a2, a3, …, an | P (a1, a2, a3, …, an)}

10. Example
   TRC: {T | EMPLOYEE (T) AND T.DEPT_ID = 10}
   DRC: { | < EMPLOYEE > DEPT_ID = 10 }

11. Focus
   TRC: Focuses on selecting tuples from a relation.
   DRC: Focuses on selecting values from a relation.

12. Variables
   TRC: Uses tuple variables (e.g., t).
   DRC: Uses scalar variables (e.g., a1, a2, …, an).

13. Expressiveness
   TRC: Safe TRC expressions have the same expressive power as safe DRC expressions.
   DRC: Safe DRC expressions have the same expressive power as safe TRC expressions.

14. Ease of use
   TRC: Easier to use for simple queries.
   DRC: More difficult to use for simple queries.

15. Use case
   TRC: Useful for selecting tuples that satisfy a certain condition or for retrieving a subset of a relation.
   DRC: Useful for selecting specific values or for constructing more complex queries that involve multiple relations.

Difference in Relational Algebra and Relational Calculus

For each basis of comparison, the first entry describes Relational Algebra and
the second describes Relational Calculus.

1. Language type
Relational Algebra: It is a procedural language.
Relational Calculus: It is a declarative (non-procedural) language.

2. Procedure
Relational Algebra: States how to obtain the result.
Relational Calculus: States what result we have to obtain.

3. Order
Relational Algebra: The order in which the operations have to be performed is
specified.
Relational Calculus: The order is not specified.

4. Domain
Relational Algebra: Independent of the domain.
Relational Calculus: Can be domain-dependent because of domain relational calculus.

5. Programming language
Relational Algebra: Nearer to a programming language.
Relational Calculus: Nearer to natural language than to a programming language.

6. Inclusion in SQL
Relational Algebra: SQL includes only some features from relational algebra.
Relational Calculus: SQL is based to a greater extent on tuple relational calculus.

7. Relational completeness
Relational Algebra: One of the languages in which queries can be expressed; it is
relationally complete because it can express every relational calculus query.
Relational Calculus: For a database language to be relationally complete, every
query expressible in relational calculus must be expressible in that language.

8. Query evaluation
Relational Algebra: Evaluation relies on the specified order in which the
operations must be performed.
Relational Calculus: The order of operations does not matter for the evaluation
of queries.

9. Database access
Relational Algebra: Provides a solution in terms of what is required and how to
get that information, following a step-by-step description.
Relational Calculus: Provides a solution simply in terms of what is required and
lets the system find out how to obtain it.

10. Expressiveness
Relational Algebra: The expressiveness of any given language is judged using
relational algebra operations as a standard.
Relational Calculus: The completeness of a language is measured by whether it is
at least as powerful as the calculus, i.e., any relation definable by some
expression of the calculus is also definable by some expression of the language
in question.

Functional Dependency

A functional dependency (FD) is a constraint between two sets of attributes of a
relation. It typically exists between the primary key and a non-key attribute
within a table.

X → Y

X → Y holds if any two rows that agree on X also agree on Y. The left side of an
FD is known as the determinant, and the right side is known as the dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee
table, because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as: Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
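
Whether an FD holds in a given table instance can be checked by grouping rows on the determinant. The sketch below is illustrative only: the helper name holds and the sample rows are made up for this example and are not taken from the notes. A single instance can refute an FD, but it cannot prove that the FD holds for every legal state of the table.

```python
def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this table instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)   # values of the determinant
        val = tuple(row[a] for a in rhs)   # values of the dependent
        # The first value seen for a key is stored; a later mismatch breaks the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical employee rows (illustration only)
employee = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
]

print(holds(employee, ["Emp_Id"], ["Emp_Name"]))   # True
print(holds(employee, ["Emp_Name"], ["Emp_Id"]))   # False: two employees are named Asha
```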


Types of Functional Dependency

1. Trivial Functional Dependency


In Trivial Functional Dependency, a dependent is always a subset of the
determinant. i.e. If X → Y and Y is the subset of X, then it is called trivial
functional dependency

Example:

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the


dependent name is a subset of determinant set {roll_no, name}. Similarly, roll_no
→ roll_no is also an example of trivial functional dependency.

2. Non Trivial Functional Dependency


In Non-trivial functional dependency, the dependent is strictly not a subset of the
determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial
functional dependency.

Example:

roll_no name age

42 abc 17

43 pqr 18
44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the dependent


name is not a subset of determinant roll_no. Similarly, {roll_no, name} → age is
also a non-trivial functional dependency, since age is not a subset of {roll_no,
name}

3. Multivalued functional dependency


In Multivalued functional dependency, entities of the dependent set are not
dependent on each other. i.e. If a → {b, c} and there exists no functional
dependency between b and c, then it is called a multivalued functional
dependency.

Example:

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

45 abc 19

Here, roll_no → {name, age} is a multivalued functional dependency, since the
dependents name and age are not dependent on each other (i.e., neither
name → age nor age → name holds).

4. Transitive functional dependency

In transitive functional dependency, dependent is indirectly dependent on


determinant. i.e. If a → b & b → c, then according to axiom of transitivity, a → c.
This is a transitive functional dependency.
Example:

enrol_no name dept building_no

42 abc CO 4

43 pqr EC 2

44 xyz IT 1

45 abc EC 2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an
indirect functional dependency, hence called Transitive functional dependency.

Inference Rules of Functional Dependency

1. Reflexive Rule
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Since Y ⊆ X, X → Y holds.

2. Augmentation Rule
In augmentation, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example: For R(ABCD), if A → B then AC → BC

3. Transitive Rule
In the transitive rule, if X determines Y and Y determines Z, then X must also
determine Z.
If X → Y and Y → Z then X → Z

4. Union Rule
Union rule says if X determines Y and X determines Z, then X must also
determine Y and Z.
If X → Y and X → Z then X → YZ

5. Decomposition Rule
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z
separately.
If X → YZ then X → Y and X → Z

6. Pseudo Transitive Rule


In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ
determines W.
If X → Y and YZ → W then XZ → W
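
One way to check that an FD follows from a given set by these rules is to compute the attribute closure of its left side and test whether the right side is contained in it. The sketch below is a minimal illustration, assuming single-letter attribute names so that a string like "YZ" stands for the set {Y, Z}; the function names closure and implies are chosen here for illustration.

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)

# Pseudo transitive rule: X -> Y and YZ -> W give XZ -> W
print(implies([("X", "Y"), ("YZ", "W")], "XZ", "W"))   # True
# Union rule: X -> Y and X -> Z give X -> YZ
print(implies([("X", "Y"), ("X", "Z")], "X", "YZ"))    # True
# An FD does not hold in reverse: X -> Y does not give Y -> X
print(implies([("X", "Y")], "Y", "X"))                 # False
```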

Join Dependency

A join dependency (JD) on a relation schema R specifies a constraint on the states
r of R: every legal state r of R should have a lossless join decomposition into
R1, R2, ..., Rn. Join dependency is a generalization of the idea of multivalued
dependency.

Let R be a relation schema and R1, R2, ..., Rn be a decomposition of R. R is said to
satisfy the join dependency ⋈(R1, R2, ..., Rn) if and only if every legal instance
r(R) is equal to the natural join of its projections on R1, R2, ..., Rn.

Example of Join Dependency

Suppose we have the following table R:


E_Name Company Product

Rohan Comp1 Jeans

Harpreet Comp2 Jacket

Anant Comp3 TShirt

● Suppose the above table can be decomposed into three tables without losing
information. This would mean that the table is not in 5NF.
● The three decomposed tables would be:

1. R1: The table with columns E_Name and Company.

E_Name Company

Rohan Comp1

Harpreet Comp2

Anant Comp3

2. R2: The table with columns E_Name and Product.

E_Name Product

Rohan Jeans

Harpreet Jacket

Anant TShirt

3. R3: The table with columns Company and Product.


Company Product

Comp1 Jeans

Comp2 Jacket

Comp3 TShirt

Note: If the natural join of all three tables yields the relation table R, the
relation will be said to have join dependency.

Let's try to figure out whether or not R has join dependency.

Step 1: First, the natural join of R1 and R2:

E_Name Company Product

Rohan Comp1 Jeans

Harpreet Comp2 Jacket

Anant Comp3 TShirt

Step 2: Next, let's perform the natural join of the above table with R3:

E_Name Company Product

Rohan Comp1 Jeans

Harpreet Comp2 Jacket

Anant Comp3 TShirt


In the above example, we get the same table R back after performing the natural
joins at both steps.

Therefore, our join dependency comes out to be: {(E_Name, Company), (E_Name,
Product), (Company, Product)}

Because the natural join of the three projections is equal to our initial relation
R, R satisfies this join dependency. A relation with a join dependency that is not
implied by its candidate keys is not in 5NF.
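
The two join steps can be reproduced in a few lines. This is a sketch under simple assumptions: rows are stored as dictionaries, and project and natural_join are helper names introduced here, not functions from any library.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row becomes a frozenset of pairs."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Natural join of two sets of rows encoded as frozensets of (attribute, value)."""
    out = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # Two rows join when they agree on every attribute they share.
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(r)
                out.add(frozenset(merged.items()))
    return out

R = [
    {"E_Name": "Rohan", "Company": "Comp1", "Product": "Jeans"},
    {"E_Name": "Harpreet", "Company": "Comp2", "Product": "Jacket"},
    {"E_Name": "Anant", "Company": "Comp3", "Product": "TShirt"},
]

r1 = project(R, ["E_Name", "Company"])
r2 = project(R, ["E_Name", "Product"])
r3 = project(R, ["Company", "Product"])

joined = natural_join(natural_join(r1, r2), r3)
original = project(R, ["E_Name", "Company", "Product"])
print(joined == original)  # True: this instance equals the join of its projections
```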

Join Dependencies and Fifth Normal Form (5NF)

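● If a relation is in 4NF and does not contain any non-trivial join dependency
(other than those implied by its candidate keys), it is in 5NF.
● 5NF is reached when the tables have been decomposed as far as possible without
losing information, in order to avoid redundancy.

Conclusion: If a relation has a non-trivial join dependency that is not implied by
its candidate keys, it won't be in 5NF.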

Difference in Multivalued and Join Dependency

For each aspect, the first entry describes Multivalued Dependencies (MVDs) and the
second describes Join Dependencies (JDs).

Definition
MVDs: Specify dependencies between attributes within a relation, indicating that
certain attributes may have multiple values for each combination of values in
other attributes, independently of the remaining attributes.
JDs: Specify that a relation can be reconstructed by joining its projections onto
two or more subsets of its attributes.

Dependency Scope
MVDs: Apply to a single relation.
JDs: Apply to a single relation and its decomposition into two or more projections.

Dependency Relation
MVDs: Are defined between attributes of the relation.
JDs: Are defined based on the join of the projections of the relation.

Dependency Inference
MVDs: Every functional dependency (FD) is also an MVD, but MVDs are more general
and can express dependencies that FDs cannot.
JDs: Every MVD is also a join dependency with two components, but JDs are more
general and can express dependencies that MVDs cannot.

Violation and Resolution
MVDs: A non-trivial MVD causes redundancy. It can be resolved by decomposing the
relation.
JDs: A JD is violated if the join of the projections does not give back the
original relation. Redundancy caused by a JD can be resolved by decomposing the
relation into those projections.

Usage
MVDs: Used in normalization theory, particularly in the fourth normal form (4NF).
They help remove redundancy when dealing with multi-valued attributes.
JDs: Less commonly used in practice. They are the basis of the fifth normal form
(5NF).

Normalization

Normalization is the process of minimizing redundancy in a relation or set of
relations. Redundancy in a relation may cause insertion, deletion, and update
anomalies. Normal forms are used to eliminate or reduce redundancy in database
tables.
1. First Normal Form
This is the most basic level of normalization. In 1NF, each table cell should
contain only a single value, and each column should have a unique name. The
first normal form helps to eliminate duplicate data and simplify queries.

Example: Relation STUDENT in table 1 is not in 1NF because of the multi-valued
attribute STUD_PHONE. Its decomposition into 1NF has been shown in table 2.

2. Second Normal Form


2NF eliminates redundant data by requiring that the table be in 1NF and that each
non-prime attribute be fully dependent on the whole primary key, not on just a
part of it (a partial dependency). If a partial dependency exists, we can divide
the table to remove the partially dependent attributes and move them to another
table where they fit well.

Example: Let us take an example of the following <EmployeeProjectDetail> table


to understand what is partial dependency and how to normalize the table to the
second normal form:

EmployeeProjectDetail Table:

Employee Code Project ID Employee Name Project Name

101 P03 John Project103


101 P01 John Project101

102 P04 Ryan Project104

103 P02 Stephanie Project102

In the above table, the prime attributes of the table are Employee Code and
Project ID. We have partial dependencies in this table because Employee Name
can be determined by Employee Code and Project Name can be determined by
Project ID. Thus, the above relational table violates the rule of 2NF.
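
Partial dependencies can be listed mechanically once the key and the FDs are written down. The following is a minimal sketch, assuming the table has a single candidate key (so the prime attributes are exactly the key attributes); the function name partial_dependencies is hypothetical.

```python
def partial_dependencies(candidate_key, fds):
    """Return the FDs whose determinant is a proper subset of the key
    and whose dependent contains at least one non-prime attribute."""
    key = frozenset(candidate_key)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if frozenset(lhs) < key and not set(rhs) <= key
    ]

key = ("Employee Code", "Project ID")
fds = [
    (("Employee Code",), ("Employee Name",)),
    (("Project ID",), ("Project Name",)),
]

found = partial_dependencies(key, fds)
for lhs, rhs in found:
    print(lhs, "->", rhs)
# Both FDs are reported, so EmployeeProjectDetail is not in 2NF.
```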

To remove partial dependencies from this table and normalize it into second
normal form, we can decompose the <EmployeeProjectDetail> table into the
following three tables:

EmployeeDetail Table:

Employee Code Employee Name

101 John

102 Ryan

103 Stephanie

EmployeeProject Table:

Employee Code Project ID

101 P03

101 P01

102 P04
103 P02

ProjectDetail Table:

Project ID Project Name

P03 Project103

P01 Project101

P04 Project104

P02 Project102

Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by
decomposing it into <EmployeeDetail>, <ProjectDetail> and <EmployeeProject>
tables. As you can see, the above tables satisfy both rules of 2NF: they are in
1NF, and every non-prime attribute is fully dependent on the primary key.

The relations in 2NF are clearly less redundant than relations in 1NF. However,
the decomposed relations may still suffer from one or more anomalies due to the
transitive dependency. We will remove the transitive dependencies in the Third
Normal Form.

3. Third Normal Form


The normalization of 2NF relations to 3NF involves the elimination of transitive
dependencies in DBMS.

A functional dependency X -> Z is said to be transitive if the following three


functional dependencies hold:

● X -> Y
● Y does not -> X
● Y -> Z

For a relational table to be in third normal form, it must satisfy the following rules:

● The table must be in the second normal form.


● No non-prime attribute is transitively dependent on the primary key.
● For each functional dependency X -> Z at least one of the following
conditions hold:
○ X is a super key of the table.
○ Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table to remove the


transitively dependent attributes and place them to a new table along with a copy
of the determinant.

Example: Let us take an example of the following <EmployeeDetail> table to


understand what is transitive dependency and how to normalize the table to the
third normal form:

EmployeeDetail Table:

Employee Code Employee Name Employee Zipcode Employee City

101 John 110033 Model Town

101 John 110044 Badarpur

102 Ryan 110028 Naraina

103 Stephanie 110064 Hari Nagar

The above table is not in 3NF because it has the transitive dependency
Employee Code -> Employee City, which follows from:

● Employee Code -> Employee Zipcode


● Employee Zipcode -> Employee City
Also, Employee Zipcode is not a super key and Employee City is not a prime
attribute.
To remove transitive dependency from this table and normalize it into the third
normal form, we can decompose the <EmployeeDetail> table into the following
two tables:

EmployeeDetail Table:

Employee Code Employee Name Employee Zipcode

101 John 110033

101 John 110044

102 Ryan 110028

103 Stephanie 110064

EmployeeLocation Table:

Employee Zipcode Employee City

110033 Model Town

110044 Badarpur

110028 Naraina

110064 Hari Nagar

Thus, we’ve converted the <EmployeeDetail> table into 3NF by decomposing it


into <EmployeeDetail> and <EmployeeLocation> tables as they are in 2NF and
they don’t have any transitive dependency.
The 2NF and 3NF impose some extra conditions on dependencies on candidate
keys and remove redundancy caused by that. However, there may still exist some
dependencies that cause redundancy in the database. These redundancies are
removed by a more strict normal form known as BCNF.

4. Boyce-Codd Normal Form


Boyce-Codd Normal Form(BCNF) is an advanced version of 3NF as it contains
additional constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the


following rules:
1. The table must be in the third normal form.
2. For every non-trivial functional dependency X -> Y, X is a superkey of the
table. In particular, X cannot be a non-prime attribute if Y is a prime
attribute.

A superkey is a set of one or more attributes that can uniquely identify a row in a
database table.
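
The superkey condition can be tested with attribute closure: X is a superkey exactly when the closure of X contains every attribute of the table. The sketch below applies this to the EmployeeProjectLead table used in this section's example; the helper names closure and bcnf_violations are chosen here for illustration.

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)            # skip trivial FDs
        and closure(lhs, fds) != all_attrs     # determinant is not a superkey
    ]

attributes = ("Employee Code", "Project ID", "Project Leader")
fds = [
    (("Employee Code", "Project ID"), ("Project Leader",)),
    (("Project Leader",), ("Project ID",)),
]

violations = bcnf_violations(attributes, fds)
print(violations)  # [(('Project Leader',), ('Project ID',))]
```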
Example: Let us take an example of the following <EmployeeProjectLead> table
to understand how to normalize the table to the BCNF:

EmployeeProjectLead Table:

Employee Code Project ID Project Leader

101 P03 Grey

101 P01 Christian

102 P04 Hudson

103 P02 Petro

The above table satisfies all the normal forms till 3NF, but it violates the rules of
BCNF because the candidate key of the above table is {Employee Code, Project
ID}. For the non-trivial functional dependency, Project Leader -> Project ID, Project
ID is a prime attribute but Project Leader is a non-prime attribute. This is not
allowed in BCNF.

To convert the given table into BCNF, we decompose it into two tables:

EmployeeProject Table:

Employee Code Project ID

101 P03

101 P01

102 P04

103 P02

ProjectLead Table:

Project Leader Project ID

Grey P03

Christian P01

Hudson P04

Petro P02

Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by


decomposing it into <EmployeeProject> and <ProjectLead> tables.

5. Fourth Normal Form


● A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial
multi-valued dependency.
● For a dependency A ↠ B, if a single value of A is associated with multiple values
of B, independently of the other attributes, then the dependency is a
multi-valued dependency.
Example:

Student Table:

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
attributes. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer
and Math, and two hobbies, Dancing and Singing. So there is a multi-valued
dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

StudentCourse Table:

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology
59 Physics

StudentHobby Table:

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey

6. Fifth Normal Form


● A relation is in 5NF if it is in 4NF, does not contain any join dependency, and
all joins are lossless.
● 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
● 5NF is also known as Project-join normal form (PJ/NF).

Example:

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2


Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1,
but he doesn't take the Math class for Semester 2. In this case, the combination
of all these fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be teaching it, so we would leave Lecturer and Subject as NULL. But all three
columns together act as the primary key, so we can't leave the other two columns
blank.

So to make the above table into 5NF, we can decompose it into three relations P1,
P2 & P3:

P1 Table:

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2 Table:

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen
P3 Table:

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen

Relational Decomposition

● When a relation in the relational model is not in an appropriate normal form,
decomposition of the relation is required.
● Decomposition breaks a table into multiple tables.
● If the relation is not decomposed properly, it may lead to problems like
loss of information.
● Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.

Rules for Decomposition

Whenever we decompose a relation, there are certain properties that must be satisfied
to ensure no information is lost while decomposing the relations. These properties are:

1. Lossless Join Decomposition.


2. Dependency Preserving.
Lossless Join Decomposition
A lossless Join decomposition ensures two things:

● No information is lost while decomposing from the original relation.


● If we join back the sub decomposed relations, the same relation that was
decomposed is obtained.

We can follow certain rules to ensure that the decomposition is a lossless join
decomposition. Let’s say we have a relation R and we decomposed it into R1 and R2;
then the rules are:

1. The union of attributes of both the sub relations R1 and R2 must contain
all the attributes of original relation R.
R1 ∪ R2 = R
2. The intersection of attributes of both the sub relations R1 and R2 must not
be empty, i.e., there should be some attributes that are present in both R1
and R2.
R1 ∩ R2 ≠ ∅
3. The intersection of attributes of both the sub relations R1 and R2 must be
a superkey of R1 or R2, or both R1 and R2.
(R1 ∩ R2) → R1 or (R1 ∩ R2) → R2
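
The three rules can be checked with attribute closure. The sketch below is a minimal illustration for a decomposition into exactly two relations; it uses the EmployeeProject and ProjectDetail schemas and the FDs from this section's examples, and the function names closure and is_lossless are chosen here for illustration.

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r, r1, r2, fds):
    """Binary lossless-join test based on the three rules above."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:            # rule 1: no attribute may be dropped
        return False
    common = r1 & r2
    if not common:              # rule 2: the intersection must not be empty
        return False
    reach = closure(common, fds)
    return r1 <= reach or r2 <= reach   # rule 3: common attributes are a superkey

R = ("Employee_Code", "Employee_Name", "Employee_Email", "Project_Name", "Project_ID")
fds = [
    (("Employee_Code",), ("Employee_Name", "Employee_Email")),
    (("Project_ID",), ("Project_Name",)),
]

good = is_lossless(
    R,
    ("Employee_Code", "Project_ID", "Employee_Name", "Employee_Email"),
    ("Project_ID", "Project_Name"),
    fds,
)
bad = is_lossless(
    R,
    ("Employee_Code", "Employee_Name", "Employee_Email"),
    ("Project_ID", "Project_Name"),
    fds,
)
print(good, bad)  # True False
```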

Example: Let’s see an example of a lossless join decomposition. Suppose we have the
following relation EmployeeProjectDetail as:
EmployeeProjectDetail Table:

Employee_Code Employee_Name Employee_Email Project_Name Project_ID

101 John [email protected] Project103 P03

101 John [email protected] Project101 P01

102 Ryan [email protected] Project104 P04

103 Stephanie stephanie@abc.com Project102 P02

Now, we decompose this relation into EmployeeProject and ProjectDetail relations as:

EmployeeProject Table:

Employee_Code Project_ID Employee_Name Employee_Email

101 P03 John [email protected]

101 P01 John [email protected]

102 P04 Ryan [email protected]

103 P02 Stephanie stephanie@abc.com

The primary key of the above relation is {Employee_Code, Project_ID}.

ProjectDetail Table:
Project_ID Project_Name

P03 Project103
P01 Project101

P04 Project104

P02 Project102

The primary key of the above relation is {Project_ID}.


Now, let’s see if this is a lossless join decomposition by evaluating the rules discussed
above:
Let’s first check the EmployeeProject ∪ ProjectDetail:

(EmployeeProject ∪ ProjectDetail) Table:

Employee_Code Project_ID Employee_Name Employee_Email Project_Name

101 P03 John [email protected] Project103

101 P01 John [email protected] Project101

102 P04 Ryan [email protected] Project104

103 P02 Stephanie [email protected] Project102

As we can see all the attributes of EmployeeProject and ProjectDetail are in


EmployeeProject ∪ ProjectDetail relation and it is the same as the original relation. So
the first condition holds.
Now let’s check the EmployeeProject ∩ ProjectDetail:

(EmployeeProject ∩ ProjectDetail) Table:

Project_ID

P03

P01

P04

P02
As we can see, this is not empty, so the second condition holds as well. Also,
EmployeeProject ∩ ProjectDetail = Project_ID. This is the super key of the
ProjectDetail relation, so the third condition holds as well.

Now, since all three conditions hold for our decomposition, this is a lossless join
decomposition.

Lossy Decomposition
In a lossy decomposition, one or more of these conditions fail, and we are not
able to recover the complete information present in the original relation.

Example: let's say we decompose our original relation EmployeeProjectDetail into


EmployeeProject and ProjectDetail relations as:

EmployeeProject Table:

Employee_Code Employee_Name Employee_Email

101 John [email protected]

102 Ryan [email protected]

103 Stephanie [email protected]

The primary key of the above relation is {Employee_Code}.

ProjectDetail Table:

Project_ID Project_Name

P03 Project103

P01 Project101

P04 Project104

P02 Project102
The primary key of the above relation is {Project_ID}.
Now, the intersection EmployeeProject ∩ ProjectDetail is null. Therefore there is no way
for us to map a project to its employees. Thus this is a lossy decomposition.

Dependency Preserving

The second property of a good decomposition is dependency preservation, which says
that after decomposing a relation R into R1 and R2, all dependencies of the original
relation R must be present either in R1 or R2, or they must be derivable using the
combination of functional dependencies present in R1 and R2.

Let’s understand this from the same example above:

EmployeeProjectDetail Table:

Employee_Code Employee_Name Employee_Email Project_Name Project_ID

101 John [email protected] Project103 P03

101 John [email protected] Project101 P01

102 Ryan [email protected] Project104 P04

103 Stephanie stephanie@abc.com Project102 P02

In this relation we have the following FDs:


● Employee_Code -> {Employee_Name, Employee_Email}
● Project_ID -> Project_Name
Now, after decomposing the relation into EmployeeProject and ProjectDetail as:

EmployeeProject Table:
Employee_Code Project_ID Employee_Name Employee_Email

101 P03 John [email protected]

101 P01 John [email protected]

102 P04 Ryan [email protected]

103 P02 Stephanie [email protected]

In this relation we have the following FDs:


Employee_Code -> {Employee_Name, Employee_Email}

ProjectDetail Table:

Project_ID Project_Name

P03 Project103

P01 Project101

P04 Project104

P02 Project102

In this relation we have the following FDs:


Project_ID -> Project_Name
As we can see, all FDs in EmployeeProjectDetail are part of either
EmployeeProject or ProjectDetail, so this decomposition is dependency preserving.
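
Dependency preservation can also be tested without listing the projected FDs by hand. The sketch below uses the standard closure-based test: for each FD X -> Y, grow X using only what each decomposed relation can "see", then check whether Y was reached. The function names closure and preserves are chosen here for illustration. The second call reuses the EmployeeProjectLead decomposition from the BCNF section as a contrasting case.

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, decomposition):
    """True if every FD can be enforced using the decomposed relations alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # Attributes derivable from the part of result that lies inside ri
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

fds = [
    (("Employee_Code",), ("Employee_Name", "Employee_Email")),
    (("Project_ID",), ("Project_Name",)),
]
decomposition = [
    ("Employee_Code", "Project_ID", "Employee_Name", "Employee_Email"),
    ("Project_ID", "Project_Name"),
]
ok = preserves(fds, decomposition)

lead_fds = [
    (("Employee Code", "Project ID"), ("Project Leader",)),
    (("Project Leader",), ("Project ID",)),
]
lead_decomposition = [("Employee Code", "Project ID"), ("Project Leader", "Project ID")]
lost = preserves(lead_fds, lead_decomposition)

print(ok, lost)  # True False
```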

Difference in Lossless and Lossy Decomposition

For each aspect, the first entry describes Lossless Decomposition and the second
describes Lossy Decomposition.

Integrity Preservation
Lossless: Ensures that the original data can be reconstructed by joining the
decomposed relations, without any loss.
Lossy: Does not guarantee this. Joining the decomposed relations may lose
information or produce tuples that were not in the original relation.

Data Redundancy
Lossless: Eliminates redundancy to the maximum extent possible while maintaining
data integrity.
Lossy: May also reduce redundancy, but at the cost of losing information.

Reconstruction
Lossless: Allows for the complete reconstruction of the original data without any
loss or distortion.
Lossy: Reconstruction of the original data is not possible, since some
information is lost.

Storage Efficiency
Lossless: Generally less storage-efficient, as it aims to preserve all the
original data.
Lossy: May need less storage, because some information is discarded.

Query Processing
Lossless: Queries over the decomposed relations give the same answers as queries
over the original relation.
Lossy: Query results may be affected by the missing information.

Application
Lossless: Suitable for scenarios where data integrity is critical, such as
financial systems or applications that require an accurate representation of
the original data.
Lossy: Not suitable as a database design, because the original relation cannot be
recovered.

Questions Answers

Q1. How does BCNF differ from 3NF? Prove that BCNF is stronger than 3NF with
example.
Sol.

For each parameter, the first entry describes 3NF and the second describes BCNF.

Strength
3NF: Comparatively weaker than BCNF.
BCNF: Comparatively stronger than 3NF.

Normal Forms Satisfied
3NF: A relation in 3NF already satisfies 2NF and 1NF.
BCNF: A relation in BCNF already satisfies 3NF, 2NF, and 1NF.

Redundancy
3NF: Has comparatively higher redundancy.
BCNF: Has comparatively lower redundancy.

Dependency Preservation
3NF: All the functional dependencies can be preserved.
BCNF: Preservation of all the functional dependencies is not guaranteed.

Lossless Decomposition
3NF: A lossless decomposition that also preserves all dependencies can always be
achieved.
BCNF: A lossless decomposition can always be achieved, but it may not preserve all
dependencies.

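Every BCNF relation is also in 3NF, but a 3NF relation need not be in BCNF.
BCNF acts as an extension of 3NF with stricter rules. For example, the
EmployeeProjectLead table in the Boyce-Codd Normal Form section is in 3NF but
violates BCNF, because Project Leader -> Project ID holds and Project Leader is not
a superkey. Therefore, BCNF is stronger than 3NF.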

Q.2 What do you mean by functional dependency set and attribute closure
Sol. Functional Dependency Set: Functional Dependency set or FD set of a relation is
the set of all FDs present in the relation. For Example, FD set for relation STUDENT
shown in table 1 is:

{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,


STUD_NO->STUD_COUNTRY, STUD_NO -> STUD_AGE, STUD_STATE->STUD_COUNTRY }

Attribute Closure: Attribute closure of an attribute set can be defined as set of


attributes which can be functionally determined from it.
How to find attribute closure of an attribute set?
To find attribute closure of an attribute set:

● Add elements of attribute set to the result set.


● Recursively add elements to the result set which can be functionally determined
from the elements of the result set.

Using FD set of table 1, attribute closure can be determined as:

(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,


STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
How to find Candidate Keys and Super Keys using Attribute Closure?

● If attribute closure of an attribute set contains all attributes of relation, the


attribute set will be super key of the relation.
● If no subset of this attribute set can functionally determine all attributes of the
relation, the set will be candidate key as well. For Example, using FD set of table
1,

(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,


STUD_COUNTRY, STUD_AGE}
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,
STUD_AGE}
(STUD_NO, STUD_NAME) is a super key but not a candidate key, because the closure of
its subset (STUD_NO) already contains all attributes of the relation. So, STUD_NO
is a candidate key.

Q3. Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of
functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, {K} -> {M},
{L} -> {N}} on R. What is the key for R?
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Sol. Finding the attribute closure of each given option, we get:
{E,F}+ = {E,F,G,I,J}
{E,F,H}+ = {E,F,G,H,I,J,K,L,M,N}
{E,F,H,K,L}+ = {E,F,G,H,I,J,K,L,M,N}
{E}+ = {E}
Both {E,F,H}+ and {E,F,H,K,L}+ give the set of all attributes, but {E,F,H} is
minimal. So it is the candidate key, and the correct option is (B).

Q4. Algorithm to find attribute closure?


Sol.
Algorithm to compute a+, the closure of a under F

Result:= a;
while (changes to Result) do
for each B → Y in F do
Begin
if B ⊆ Result then Result := Result ∪ Y
End
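
The pseudocode above translates almost line for line into Python. This is a sketch, assuming single-letter attribute names so that a string such as "EF" stands for the set {E, F}; the FD list is the one from Q3.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # while (changes to Result) do
        changed = False
        for lhs, rhs in fds:            # for each B -> Y in F do
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # Result := Result ∪ Y
                changed = True
    return result

# FD set from Q3
fds = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]

print(sorted(closure("EF", fds)))    # ['E', 'F', 'G', 'I', 'J']
print(sorted(closure("EFH", fds)))   # all ten attributes E..N
print(sorted(closure("E", fds)))     # ['E']
```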

Q.5 A set of FDs for the relation R{A,B,C,D} is A->B, B->C, D->ABC, AC->D. Find a
minimal cover. Does every set of FDs have a minimal cover, and is it unique?
Sol. We can find the minimal cover by following 3 simple steps.

Step 1: Split the right-hand side of every FD so that each FD has a single
attribute on the right.
A->B, B->C, D->A, D->B, D->C, AC->D
[Note: We can't split the left-hand side, i.e., AC->D cannot be split into A->D
and C->D]
Step 2: Now remove all redundant FDs.
[An FD is redundant if it can be derived from the remaining FDs. To test X->Y,
compute X+ using the other FDs only and check whether it contains Y.]
Let's test the redundancy of A->B.
Without A->B, A+ = {A}, which does not contain B.
So, A->B is not redundant.
Similarly, B->C is not redundant (B+ = {B}) and D->A is not redundant
(D+ = {B, C, D}).
D->B is redundant, because D->A and A->B give D->B.
So, we remove D->B from the FD set.
D->C is also redundant, because D->A, A->B and B->C give D->C.
So, we remove D->C from the FD set.
At last, we check AC->D. Without it, (AC)+ = {A, B, C}, which does not contain D.
So, AC->D is not redundant.
So, the remaining FDs are: A->B, B->C, D->A, AC->D
Step 3: Find the extraneous attributes and remove them.
Only AC->D has more than one attribute on the left-hand side, so either A or C,
or neither, can be extraneous.
If A+ contains C, then C is extraneous and it can be removed.
If C+ contains A, then A is extraneous and it can be removed.
Here A+ = {A, B, C} contains C, so C is extraneous and AC->D becomes A->D.
C+ = {C} does not contain A, so A is not extraneous.
So, the final FDs are: A->B, B->C, D->A, A->D

Hence, a minimal cover is A->B, B->C, D->A, A->D, which can also be written as
A->BD, B->C, D->A.
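
The three steps can be automated. The sketch below is one possible implementation, not the only one: it removes extraneous left-hand attributes before removing redundant FDs (a common ordering), assumes single-letter attribute names, and the function names closure and minimal_cover are chosen here for illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds given as (lhs_set, single_rhs_attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    cover = [(frozenset(lhs), r) for lhs, rhs in fds for r in rhs]

    # Step 2: drop extraneous attributes from left-hand sides.
    reduced = []
    for lhs, r in cover:
        for a in sorted(lhs):
            smaller = lhs - {a}
            # a is extraneous if the rest of the left side still determines r
            if smaller and r in closure(smaller, cover):
                lhs = smaller
        reduced.append((lhs, r))

    # Step 3: drop FDs that follow from the remaining ones.
    result = list(dict.fromkeys(reduced))   # remove duplicates, keep order
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result

fds = [("A", "B"), ("B", "C"), ("D", "ABC"), ("AC", "D")]
for lhs, r in minimal_cover(fds):
    print("".join(sorted(lhs)), "->", r)
```

With this scan order the sketch returns B->C, D->A, D->B, A->D. Scanning the FDs in the opposite order in step 3 returns A->B, B->C, D->A, A->D instead. Both are valid minimal covers of the same FD set.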

Yes, every set of functional dependencies has a minimal cover.

It is not unique; there can be multiple minimal covers.
