
DATABASE DESIGN AND MANAGEMENT

ANSWER KEY

Part - A

1. List any two advantages of database systems.

Two advantages of database systems are: (i) seamless, controlled access to shared data, and (ii) built-in data security and integrity enforcement. These capabilities enable real-time decision-making, making a DBMS an indispensable asset for modern businesses and organizations.

2. State the different types of integrity constraints used in designing a relational database.

Domain integrity, entity integrity, referential integrity, and key constraints.
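As a minimal DDL sketch (table and column names are illustrative, not from the question; syntax as in standard SQL/PostgreSQL), all four constraint types can be declared directly:

CREATE TABLE department (
    dept_id INT PRIMARY KEY                -- entity integrity: the key is unique and non-null
);

CREATE TABLE employee (
    emp_id  INT PRIMARY KEY,                       -- key + entity integrity
    email   VARCHAR(100) UNIQUE,                   -- key constraint (alternate key)
    age     INT CHECK (age BETWEEN 18 AND 65),     -- domain integrity
    dept_id INT REFERENCES department(dept_id)     -- referential integrity
);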

3.Define trivial functional dependency.

The dependency of an attribute on a set of attributes is known as trivial functional dependency if the
set of attributes includes that attribute.

Symbolically: A → B is a trivial functional dependency if B is a subset of A.

The following dependencies are also trivial: A → A and B → B.

For example: Consider a table with two columns Student_id and Student_Name.

{Student_Id, Student_Name} → Student_Id is a trivial functional dependency, as Student_Id is a subset of {Student_Id, Student_Name}. That makes sense because if we know the values of Student_Id and Student_Name, then the value of Student_Id can be uniquely determined.

4. An object is created without any reference to it. How can that object be deleted?

An object with no references to it becomes unreachable, so it can be reclaimed automatically by garbage collection; alternatively, it can be removed explicitly (for example, with a DELETE statement in an object-relational system).

5.Write an efficient relational algebraic expression for the following query.

SELECT B1.BANKNAME FROM BANK AS B1, BANK AS B2
WHERE B1.ASSETS > B2.ASSETS AND B2.BANKLOCATION = 'TAMILNADU';

π_{B1.BANKNAME}(σ_{B1.ASSETS > B2.ASSETS}(ρ_{B1}(BANK) × σ_{B2.BANKLOCATION = 'TAMILNADU'}(ρ_{B2}(BANK))))

Pushing the selection on BANKLOCATION below the product keeps the intermediate result small, which is what makes the expression efficient.

6. List the steps involved in query processing.

Query processing involves three main steps: (1) parsing and translation, (2) query optimization, and (3) query evaluation (execution).

7. Define the term transaction. Give an example.

A transaction is an action, or series of actions, performed by a single user or application program, which reads or updates the contents of the database.

A transaction can be defined as a logical unit of work on the database. This may be an entire program,
a piece of a program, or a single command (like the SQL commands such as INSERT or UPDATE),
and it may engage in any number of operations on the database. In the database context, the execution
of an application program can be thought of as one or more transactions with non-database processing
taking place in between.
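For example, transferring money between two accounts is a single logical unit of work. A minimal SQL sketch (the accounts table and its columns are illustrative; transaction syntax varies slightly by DBMS):

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE acct_no = 'A';
UPDATE accounts SET balance = balance + 100 WHERE acct_no = 'B';
COMMIT;  -- both updates become permanent together, or neither does if rolled back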
8. State the benefits of strict two-phase locking.

Strict 2PL guarantees conflict-serializable schedules that are also cascadeless, and hence recoverable: because exclusive locks are held until commit, no transaction ever reads uncommitted data.

9. Distinguish total rollback from partial rollback.

Total Rollback: A total rollback undoes all the changes made by a transaction, returning the
database to its state before the transaction began. It is used when a transaction fails entirely or
encounters an error that requires undoing all its operations.

Partial Rollback: A partial rollback undoes changes only up to a certain point within a
transaction. It is used when only a part of the transaction fails, allowing the successful parts to
be committed while undoing the unsuccessful operations.
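In SQL, a partial rollback is typically expressed with savepoints. A minimal sketch (table names are illustrative; SAVEPOINT is standard SQL):

BEGIN TRANSACTION;
INSERT INTO orders VALUES (1, 'pending');       -- succeeds
SAVEPOINT sp1;
INSERT INTO order_items VALUES (1, 'bad row');  -- suppose this step fails logically
ROLLBACK TO SAVEPOINT sp1;                      -- partial rollback: undoes only work after sp1
COMMIT;                                         -- the first INSERT is still made permanent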

10. State denormalization.

Denormalization is a database optimization technique in which we add redundant data to one or more tables.

Part - B

11. (a) Classify database system architectures and explain.

A database stores a lot of critical information that must be accessed quickly and securely, so it is important to select the correct architecture for efficient data management. DBMS architecture determines how users' requests reach the database. The choice of architecture depends on several factors, such as the size of the database, the number of users, and the relationships between the users. Two kinds of database models are generally used: the logical model and the physical model. The types of architecture are discussed in the next section.

Types of DBMS Architecture

There are several types of DBMS Architecture that we use according to the usage
requirements. Types of DBMS Architecture are discussed here.

1-Tier Architecture

2-Tier Architecture

3-Tier Architecture

1-Tier Architecture

In 1-Tier Architecture the database is directly available to the user: the client, the server, and the database are all present on the same machine. For example, to learn SQL we set up an SQL server and the database on the local system, which lets us interact directly with the relational database and execute operations. Industry rarely uses this architecture; production systems typically use 2-tier or 3-tier architecture.

Advantages of 1-Tier Architecture


Simple Architecture: 1-Tier Architecture is the simplest architecture to set up, as only a single machine is required to maintain it.

Cost-Effective: No additional hardware is required for implementing 1-Tier Architecture, which makes it cost-effective.

Easy to Implement: 1-Tier Architecture can be easily deployed, and hence it is mostly used in
small projects.

2-Tier Architecture

The 2-tier architecture is similar to a basic client-server model. The application at the client
end directly communicates with the database on the server side. APIs like ODBC and JDBC
are used for this interaction. The server side is responsible for providing query processing and
transaction management functionalities. On the client side, the user interfaces and application
programs are run. The application on the client side establishes a connection with the server
side to communicate with the DBMS.

An advantage of this type is that it is easier to maintain and understand, and it is compatible with existing systems. However, this model performs poorly when there is a large number of users.

Advantages of 2-Tier Architecture

Easy to Access: 2-Tier Architecture provides easy access to the database, which makes data retrieval fast.

Scalable: We can scale the database easily, by adding clients or upgrading hardware.

Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture and Multi-Tier Architecture.

Easy Deployment: 2-Tier Architecture is easier to deploy than 3-Tier Architecture.

Simple: 2-Tier Architecture is easily understandable as well as simple because of only two
components.

3-Tier Architecture

In 3-Tier Architecture, there is another layer between the client and the server. The client does
not directly communicate with the server. Instead, it interacts with an application server which
further communicates with the database system and then the query processing and transaction
management takes place. This intermediate layer acts as a medium for the exchange of
partially processed data between the server and the client. This type of architecture is used in
the case of large web applications.

Advantages of 3-Tier Architecture

Enhanced Scalability: Scalability is enhanced due to the distributed deployment of application servers; individual connections between each client and the database server are no longer required.

Data Integrity: 3-Tier Architecture maintains Data Integrity. Since there is a middle layer
between the client and the server, data corruption can be avoided/removed.

Security: 3-Tier Architecture improves security. This type of model prevents direct interaction of the client with the server, thereby reducing access to unauthorized data.

Disadvantages of 3-Tier Architecture

More Complex: 3-Tier Architecture is more complex than 2-Tier Architecture, and the number of communication points is also doubled.

Difficult to Interact: Client-server interaction becomes more difficult because of the presence of the middle layer.

11(B). Construct an ER-diagram for a hospital management system with a set of patients and a set of doctors. Associate with each patient a log of the various tests and examinations conducted.

(i) Draw the ER-diagram for the Hospital Database.

(ii) For each entity set and relationship used, indicate the primary key and whether the relationship is one-to-one, many-to-one, or one-to-many.

12(a) Distinguish between procedural and non-procedural languages. Is relational algebra procedural or non-procedural? Explain the operations with examples.

1.Procedural Language:

Definition:

A procedural language specifies how to perform a task, detailing the exact steps required to
achieve the desired result. The user must outline a sequence of operations to be executed.

Characteristics:

Control: The user has full control over the process and specifies the exact sequence of actions.

Step-by-Step Execution: It requires the user to describe the algorithm or procedure to achieve
the outcome.

Flexibility: Allows users to optimize performance by fine-tuning the sequence of operations.


Example Languages: C, Java, Python, PL/SQL.

Example in DBMS:

A typical SQL stored procedure or function that involves loops, conditionals, and multiple
SQL statements is procedural. For instance, calculating the total sales for each department by
iterating over records and summing the values in PL/SQL.
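A hedged PL/SQL sketch of that procedural style (the sales table and the dept_id value are hypothetical):

DECLARE
  total NUMBER := 0;
BEGIN
  -- Explicit, step-by-step iteration: the programmer specifies HOW to compute the sum
  FOR rec IN (SELECT amount FROM sales WHERE dept_id = 10) LOOP
    total := total + rec.amount;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('Total sales: ' || total);
END;

By contrast, the equivalent non-procedural SQL simply declares the result: SELECT SUM(amount) FROM sales WHERE dept_id = 10;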

2. Non-Procedural Language:

Definition:

A non-procedural language specifies what the task is without requiring the user to detail how
it should be done. The system determines the most efficient way to execute the query.

Characteristics:

Control: The user specifies what they want to achieve, and the DBMS decides the best
method to retrieve or process the data.

Declarative Nature: Users only declare the conditions or relationships they are interested in
without specifying how to fulfill them.

Simplicity: Easier to use and requires less detailed knowledge of the underlying processes.

Example Languages: SQL, Relational Calculus, QBE (Query by Example).

Example in DBMS:

SQL is a non-procedural language where users can write queries like SELECT * FROM
Employee WHERE age > 30; without worrying about how the database will search for the
records.

Relational algebra is a procedural query language: the user specifies a sequence of operations to be applied to relations. Its operations are explained below with examples.

Selection (σ):Retrieves rows from a relation that satisfy a given predicate.

Example: σ_{age > 30}(Employee) selects all employees older than 30.

Projection (π): Retrieves specific columns from a relation, removing duplicates.

Example: π_{name, salary}(Employee) retrieves the names and salaries of all employees.

Union (∪): Combines the results of two relations, removing duplicates.

Example: Employee ∪ Manager combines employees and managers.

Set Difference (−):Retrieves tuples that are in one relation but not in another.

Example: Employee − Manager retrieves employees who are not managers.

Cartesian Product (×):Combines tuples from two relations in all possible ways.

Example: Employee × Department gives all possible pairs of employee and department.

Rename (ρ): Renames a relation or its attributes.

Example: ρ_{E}(Employee) renames the relation Employee to E.


Join (⨝):Combines related tuples from two relations based on a common attribute.

Example: Employee ⨝ Department combines employees with their respective departments based on a common attribute like DepartmentID.

Each of these operations forms the basis of constructing complex queries in a procedural
manner within relational algebra.
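As a rough illustration of how these operations surface in SQL (assuming an Employee table with columns name, salary, age, and DepartmentID, and a Department table; these schemas are illustrative):

-- Selection σ_{age > 30}(Employee)
SELECT * FROM Employee WHERE age > 30;

-- Projection π_{name, salary}(Employee); DISTINCT mirrors the algebra's duplicate removal
SELECT DISTINCT name, salary FROM Employee;

-- Join Employee ⨝ Department on the common attribute DepartmentID
SELECT *
FROM Employee JOIN Department ON Employee.DepartmentID = Department.DepartmentID;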

12(B) Consider the following relational database:

Employee (person_name, street, city)

Works (person_name, company_name, salary)

Company (company_name, city)

Manager (person_name, manager_name)

Give an SQL DDL definition of this database. Identify the referential integrity constraints that should hold, and include them in the DDL definition.

-- Creating the Employee table
CREATE TABLE Employee (
    person_name VARCHAR(100) PRIMARY KEY,
    street VARCHAR(100),
    city VARCHAR(100)
);

-- Creating the Company table
CREATE TABLE Company (
    company_name VARCHAR(100) PRIMARY KEY,
    city VARCHAR(100)
);

-- Creating the Works table
CREATE TABLE Works (
    person_name VARCHAR(100),
    company_name VARCHAR(100),
    salary DECIMAL(10, 2),
    PRIMARY KEY (person_name, company_name),
    FOREIGN KEY (person_name) REFERENCES Employee(person_name) ON DELETE CASCADE,
    -- company_name is part of the primary key, so it cannot be set to NULL;
    -- ON DELETE CASCADE is therefore used instead of ON DELETE SET NULL
    FOREIGN KEY (company_name) REFERENCES Company(company_name) ON DELETE CASCADE
);

-- Creating the Manager table
CREATE TABLE Manager (
    person_name VARCHAR(100),
    manager_name VARCHAR(100),
    PRIMARY KEY (person_name),
    FOREIGN KEY (person_name) REFERENCES Employee(person_name) ON DELETE CASCADE,
    FOREIGN KEY (manager_name) REFERENCES Employee(person_name) ON DELETE SET NULL
);
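As a usage sketch (the name 'Alice' is illustrative), the referential-integrity actions behave as follows:

DELETE FROM Employee WHERE person_name = 'Alice';
-- Alice's rows in Works and Manager are removed automatically via ON DELETE CASCADE;
-- any Manager rows naming Alice as manager get manager_name set to NULL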

13. a) Consider the relation R(A, B, C, D, E) with functional dependencies {A→BC, CD→E, B→D, E→A}. Identify the superkeys. Find F+.

To find the superkeys and F+, we follow these steps:

1. Compute the closure of the relevant attribute sets.

2. Identify the candidate keys and superkeys.

3. Characterize F+ (the closure of the set of functional dependencies).

Step 1: Compute attribute closures

- A+ = ABCDE: A→BC gives B and C, B→D gives D, and then CD→E gives E. So A alone determines every attribute.

- B+ = BD: only B→D applies (C is missing, so CD→E cannot fire).

- C+ = C: no dependency applies.

- D+ = D: no dependency applies (CD→E needs C as well).

- E+ = ABCDE: E→A gives A, and A's closure then adds B, C, and D.

- (BC)+ = ABCDE: B→D gives D, CD→E gives E, and E→A gives A.

- (CD)+ = ABCDE: CD→E gives E, E→A gives A, and A→BC gives B.

Any other attribute set either contains A, E, BC, or CD (and so has closure ABCDE) or is one of {B}, {C}, {D}, {B, D}, whose closures fail to reach all attributes.

Step 2: Identify the superkeys

A superkey is an attribute set whose closure contains all the attributes of R. From Step 1, the minimal such sets (the candidate keys) are:

A, E, BC, CD

Every superset of a candidate key is a superkey, so the superkeys are exactly the subsets of {A, B, C, D, E} that contain A, contain E, contain both B and C, or contain both C and D. There are 27 such subsets; the only attribute sets that are not superkeys are {}, {B}, {C}, {D}, and {B, D}.

Step 3: Characterize F+

F+ is the set of all functional dependencies derivable from F using Armstrong's axioms (reflexivity, augmentation, transitivity). Equivalently, a dependency α→β belongs to F+ exactly when β ⊆ α+. Since A, E, BC, and CD are candidate keys, each of them functionally determines every subset of the attributes. Representative non-trivial members of F+ beyond the given dependencies include:

- A→D and A→E (by transitivity through A→BC, B→D, and CD→E)

- E→B, E→C, and E→D (by transitivity through E→A)

- BC→A, BC→E, CD→A, and CD→B (since BC and CD are candidate keys)

- Augmented dependencies such as AB→CD and ACD→BE

F+ therefore consists of all trivial dependencies, the given dependencies, and every dependency α→β with β ⊆ α+; listing it exhaustively is impractical, so it is standard to describe it through the closures above.

13(b) Discuss the procedure used for loss-less decomposition with an example.

If the relation reconstructed by joining the decomposed relations contains exactly the same tuples as the original relation, the decomposition is lossless; if the number of tuples increases or decreases, it is a lossy join decomposition.

Lossless join decomposition ensures that we never get a situation where spurious tuples are generated in the relation: for every value on the join attributes there will be a unique tuple in one of the relations.

What is Lossless Decomposition?

Lossless join decomposition is a decomposition of a relation R into relations R1 and R2 such that performing a natural join of R1 and R2 returns the original relation R. This is effective in removing redundancy from databases while preserving the original data.

In other words, with a lossless decomposition it is feasible to reconstruct the relation R from the decomposed tables R1 and R2 by using joins. Decomposition into 1NF, 2NF, 3NF, and BCNF can be carried out with lossless join decompositions.

In lossless decomposition, we select a common attribute, and the criterion for selecting it is that the common attribute must be a candidate key or a super key in relation R1, in R2, or in both.

Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the following functional dependencies is in F+ (the closure of the functional dependencies):

R1 ∩ R2 → R1, or

R1 ∩ R2 → R2

Example of Lossless Decomposition

— Employee (Employee_Id, Ename, Salary, Department_Id, Dname)

Can be decomposed using lossless decomposition as,

— Employee_desc (Employee_Id, Ename, Salary, Department_Id)


— Department_desc (Department_Id, Dname)
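A minimal SQL sketch of this lossless case (CREATE TABLE ... AS SELECT syntax, supported by most DBMSs; column names as above):

CREATE TABLE Employee_desc AS
SELECT Employee_Id, Ename, Salary, Department_Id FROM Employee;

CREATE TABLE Department_desc AS
SELECT DISTINCT Department_Id, Dname FROM Employee;

-- Joining on the shared key Department_Id reconstructs the original relation
SELECT e.Employee_Id, e.Ename, e.Salary, e.Department_Id, d.Dname
FROM Employee_desc e
JOIN Department_desc d ON e.Department_Id = d.Department_Id;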
Alternatively, the following decomposition would be lossy: the two tables share no common attribute, so joining them is not possible and the original data cannot be recovered.

– Employee_desc (Employee_Id, Ename, Salary)


– Department_desc (Department_Id, Dname)

Advantages of Lossless Decomposition

Reduced Inconsistency: Each decomposed relation contains only the data that is relevant to that relation. This can help to reduce data inconsistencies and errors.

Improved Flexibility: Lossless decomposition can improve the flexibility of the database
system by allowing for easier modification of the schema.

Disadvantages of Lossless Decomposition

Increased Complexity: Lossless decomposition can increase the complexity of the database
system, making it harder to understand and manage.

Increased Processing Overhead: The process of decomposing a relation into smaller relations
can result in increased processing overhead. This can lead to slower query performance and
reduced efficiency.

Join Operations: Lossless decomposition may require additional join operations to retrieve
data from the decomposed relations. This can also result in slower query performance.

Costly: Decomposing relations can be costly, especially if the database is large and complex.
This can require additional resources, such as hardware and personnel.

Conclusion

In conclusion, lossless decomposition is an important concept in DBMS that ensures the original relation can be reconstructed from the decomposed relations without any loss of information. Armstrong's axioms and the decomposition algorithms for BCNF and 3NF help achieve lossless decomposition in practice.

14. a) Distinguish recoverable and non-recoverable schedules. Why is recoverability of schedules desirable? Are there any circumstances under which it would be desirable to allow non-recoverable schedules? Justify your answer.

Definition: A recoverable schedule is one in which a transaction commits only after all the transactions it depends on have committed. In a non-recoverable schedule, a transaction may commit even though a transaction it depends on has not yet committed.

Consistency: A recoverable schedule ensures database consistency by preventing a transaction from committing based on uncommitted data. A non-recoverable schedule can lead to inconsistency if a transaction commits based on data from another transaction that is later rolled back.

Rollback scenario: In a recoverable schedule, if a transaction rolls back, all subsequent transactions that read its data can also be rolled back to maintain consistency. In a non-recoverable schedule, if a transaction is rolled back after another transaction has already committed based on its data, inconsistencies arise.

Commit dependency: In a recoverable schedule, a transaction waits to commit until the transactions whose data it has read have committed. In a non-recoverable schedule, a transaction may commit before the transactions whose data it has read.

Safety: Recoverable schedules are safer and preferred for maintaining database integrity and consistency; non-recoverable schedules are risky and can leave the database in an inconsistent state.

Usage: Recoverable schedules are used in most DBMS implementations to ensure reliable recovery and consistency; non-recoverable schedules are generally avoided due to the risk of inconsistency in the event of transaction failures.

•A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads data items previously written by Ti, the commit operation of Ti appears before the commit operation of Tj. Recoverable schedules are desirable because the failure of a transaction might otherwise bring the system into an irreversibly inconsistent state.

•Non-recoverable schedules may sometimes be needed when updates must be made visible early due to time constraints, even if they have not yet been committed, which may be required for very-long-duration transactions.

b) What is the need for concurrency control mechanisms? Explain the working of lock-based
protocol.

→ Concurrency control is a very important concept in DBMS: it ensures that simultaneous execution or manipulation of data by several processes or users does not result in data inconsistency. Concurrency control provides a procedure that controls the concurrent execution of operations in the database.

1. Data Consistency: It prevents data from getting messed up when multiple users are changing
the same information simultaneously.
2. Isolation: Ensures that each transaction is treated as if it’s the only one happening, so one
person’s work doesn’t interfere with another’s.
3. Deadlock Prevention: Helps avoid situations where transactions get stuck waiting for each
other, ensuring smooth processing.
4. Efficiency: Lets multiple transactions happen at the same time without slowing things down
unnecessarily.
5. Durability and Atomicity: Makes sure all changes in a transaction are either fully completed
or not done at all, protecting the integrity of the data.

→ In a database management system (DBMS), lock-based concurrency control is used to control the access of multiple transactions to the same data item. This protocol helps to maintain data consistency and integrity across multiple users. In this protocol, transactions acquire locks on data items to control their access and prevent conflicts between concurrent transactions.

Lock Based Protocols

A lock is a variable associated with a data item that describes the status of the data item with respect to the operations that can be applied to it. Locks synchronize access to database items by concurrent transactions, and this protocol requires that all data items be accessed in a mutually exclusive manner. Two common lock modes are used: a shared (S) lock, which lets a transaction read an item concurrently with other readers, and an exclusive (X) lock, which lets a single transaction both read and write the item.

Types of Lock-Based Protocols

1. Simplistic Lock Protocol

It is the simplest method for locking data during a transaction. Simple lock-based protocols enable all
transactions to obtain a lock on the data before inserting, deleting, or updating it. It will unlock the
data item once the transaction is completed.

2. Pre-Claiming Lock Protocol

Pre-claiming lock protocols analyze a transaction to determine which data items require locks. Before executing the transaction, the protocol asks the DBMS for a lock on all of those data items. If all the locks are granted, the transaction is allowed to start, and it releases all its locks when it finishes. If not all the locks are granted, the transaction is rolled back and waits until all the locks can be acquired.

3. Two-phase locking (2PL)

The two-phase locking protocol divides the execution phase of the transaction into three parts.

•In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.

•In the second part, the transaction acquires all the locks. The third phase is started as soon as the
transaction releases its first lock.

•In the third phase, the transaction cannot demand any new locks. It only releases the acquired locks.
A transaction is said to follow the Two-Phase Locking protocol if Locking and Unlocking can be done
in two phases:

Growing Phase: New locks on data items may be acquired but none can be released.

Shrinking Phase: Existing locks may be released but no new locks can be acquired.

4. Strict Two-Phase Locking Protocol

Strict Two-Phase Locking requires that, in addition to the 2PL rules, all exclusive (X) locks held by a transaction not be released until after the transaction commits. The first phase of Strict-2PL is the same as in 2PL; the difference is that Strict-2PL does not release a lock immediately after using it.

•Strict-2PL waits until the whole transaction commits, and then releases all the locks at once.

•Strict-2PL therefore has no gradual shrinking phase of lock release.
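As a rough illustration (syntax as in PostgreSQL/MySQL; the accounts table is hypothetical), explicit row locks acquired inside a transaction are held until COMMIT, which matches the strict-2PL behavior:

BEGIN;
SELECT balance FROM accounts WHERE acct_no = 'A' FOR UPDATE;  -- acquire exclusive row lock
UPDATE accounts SET balance = balance - 100 WHERE acct_no = 'A';
COMMIT;  -- all locks are released only now

A concurrent transaction issuing FOR UPDATE on the same row blocks until the first transaction commits, so it can never read that row's uncommitted data.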

15. a) Describe the features of the object-oriented data model.

Need for the Object Oriented Data Model:

To represent complex real-world problems, a data model closely related to the real world was needed. The Object Oriented Data Model represents real-world problems easily.

Object Oriented Data Model:

In the Object Oriented Data Model, data and their relationships are contained in a single structure, referred to as an object. Real-world problems are represented as objects with different attributes, and objects have multiple relationships between them. Basically, the model is a combination of object-oriented programming and the relational database model:

Object Oriented Data Model = Object Oriented Programming + Relational Database Model

Components of the Object Oriented Data Model:

Objects –

An object is an abstraction of a real-world entity; in other words, it is an instance of a class. Objects encapsulate data and code into a single unit, which provides data abstraction by hiding the implementation details from the user. For example: instances of Student, Doctor, or Engineer.

Attribute –

An attribute describes the properties of an object. For example: the STUDENT object has attributes such as Roll_no and Branch (Setmarks() is one of its methods, not an attribute).

Methods –

A method represents the behavior of an object; basically, it represents a real-world action. For example: setting a STUDENT's marks through Setmarks().

Class –

A class is a collection of similar objects with shared structure (attributes) and behavior (methods). An object is an instance of a class. For example: Person, Student, Doctor, Engineer.

class Student {
    char name[20];
    int roll_no;
    // ...
public:
    void search();
    void update();
};

In this example, Student is the class, and objects such as S1 and S2 can be created from it in the main function.

Inheritance –

By using inheritance, a new class can inherit the attributes and methods of an existing (base) class. For example: the classes Student, Doctor, and Engineer are inherited from the base class Person.

Advantages of the Object Oriented Data Model:

•Code can be reused due to inheritance.

•Easily understandable.

•The cost of maintenance can be reduced due to the reusability of attributes and methods enabled by inheritance.

Disadvantages of the Object Oriented Data Model:

•It is not fully mature, so it has not been easily accepted by users.

b) Explain the HBase data model with an example.

•HBase is an open-source, distributed, scalable NoSQL database modeled after Google's Bigtable; it runs on top of Hadoop's HDFS (Hadoop Distributed File System).

•The HBase database is column-oriented, which makes it unique among databases. One notable quality of HBase is that it does not care about data types: different rows may store values of different types for the same column.

•It contains different sets of tables that maintain the data in key-value format. HBase is best suited to sparse data sets, which are very common in big-data use cases.

•It can be used to manage structured and semi-structured data.

•The HBase data model is quite different from that of traditional relational databases.

Components of the HBase data model:

Table: An HBase table is a collection of rows, similar to a table in a relational database. However, HBase tables do not have a fixed schema: the number of columns and their types are not predefined.

Row Key: Each row in an HBase table is identified by a unique row key, which is a byte array. Row keys are stored in lexicographical order, so data access is fast and efficient when rows are accessed by key.

Column Families: Columns in HBase are grouped into column families, which are predefined and stored together on disk. A table must have at least one column family, and each column family can have multiple columns. Column families are defined at table creation and are stored as separate files, which makes retrieval of columns from the same family efficient.

Columns: Columns are identified by their column family and a qualifier. The full name of a column is written as column_family:qualifier. For example, in a column family details, a column would be represented as details:name. Columns can be added to column families dynamically.

Cell: The intersection of a row key and a column (column family + qualifier) forms a cell. Each cell stores a versioned value, where the version is identified by a timestamp (by default) or a custom version number. The latest version is returned by default when querying.

Timestamp: HBase allows storing multiple versions of data within a cell. Each version is identified by a timestamp, which can either be generated automatically by HBase or provided by the user. The latest version is always returned by default unless specified otherwise.

Example HBase Data Model:

Let's consider an example where we store user information in an HBase table:

Table Name: user

Row Key: User ID (e.g., user1, user2, ...)

Column Families: personal, contact

Columns in column family personal: name, age

Columns in column family contact: email, phone

Row Key   personal:name   personal:age   contact:email       contact:phone
user1     John Doe        30             [email protected]   1234567890
user2     Jane Smith      25             [email protected]   0987654321

In this example:

•The row key uniquely identifies each user.

•The column families personal and contact group related columns.

•The personal column family contains the columns name and age.

•The contact column family contains the columns email and phone.
16. a) Describe normalization up to 3NF and BCNF with examples. State the desirable properties of decomposition.

Normalization:

→Normalization is the process of organizing the data in the database.

→Normalization is used to minimize the redundancy from a relation or set of relations. It is also used
to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.

→Normalization divides the larger table into smaller ones and links them using relationships.

→The normal form is used to reduce redundancy from the database table.

1. First Normal Form (1NF):


•A relation is in 1NF if it contains only atomic values.
•An attribute of a table cannot hold multiple values; it must hold only single-valued attributes.
•First normal form disallows multi-valued attributes, composite attributes, and their combinations.

Example:

EMPLOYEE TABLE:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

Decomposition of the Employee table into 1NF:

EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab

2. Second Normal Form (2NF):

•For 2NF, the relation must first be in 1NF.
•In the second normal form, all non-key attributes are fully functionally dependent on the primary key.

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.

TEACHER TABLE:

TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Maths      38
83          Computer   38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID alone, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That is why the table violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Maths
83 Computer

3. Third Normal Form (3NF):

•A relation is in 3NF if it is in 2NF and contains no transitive dependency of a non-prime attribute on a candidate key.
•3NF is used to reduce data duplication and to achieve data integrity.
•If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.

Example:

EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME  EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry     201010   UP         Noida
333     Stephan   02228    US         Boston
444     Lan       60007    US         Chicago
555     Kathrine  06389    UK         Norwich
666     John      462007   MP         Bhopal

•Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.

•That is why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.

EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_ZIP
222     Harry     201010
333     Stephan   02228
444     Lan       60007
555     Kathrine  06389
666     John      462007

EMPLOYEE_ZIP table:

EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
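A minimal DDL sketch of this 3NF decomposition (column types are illustrative; syntax as in standard SQL):

CREATE TABLE EMPLOYEE_ZIP (
    EMP_ZIP   VARCHAR(10) PRIMARY KEY,
    EMP_STATE VARCHAR(10),
    EMP_CITY  VARCHAR(50)
);

CREATE TABLE EMPLOYEE (
    EMP_ID   INT PRIMARY KEY,
    EMP_NAME VARCHAR(50),
    EMP_ZIP  VARCHAR(10) REFERENCES EMPLOYEE_ZIP(EMP_ZIP)  -- transitive dependency removed
);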

4. Boyce-Codd Normal Form (BCNF):

•BCNF is an advanced version of 3NF; it is stricter than 3NF.
•A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
•For BCNF, the table should be in 3NF, and for every FD, the left-hand side must be a super key.

Example: Let’s assume there is a company where employees work in more than one department.

EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549

•The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a key: the candidate key is {EMP_ID, EMP_DEPT}, yet EMP_ID → EMP_COUNTRY and EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO} have determinants that are not super keys.

•To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID  EMP_COUNTRY
264     India
364     UK

EMP_DEPT table:

EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549

EMP_DEPT_MAPPING table:

EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

In each of these tables, every functional dependency now has a super key on its left-hand side, so all three are in BCNF.

PROPERTIES OF DECOMPOSITION:

Decomposition refers to the division of a table into multiple tables to produce consistency in the data. The desirable properties of a decomposition are given below.

PROPERTIES:

Lossless: Every decomposition we perform in a database management system should be lossless: no information should be lost when joining the sub-relations to get back the original relation. Decomposition also helps to remove redundant data from the database.

Dependency Preservation: Dependency preservation is an important property in database management systems. It ensures that the functional dependencies between the entities are maintained while performing decomposition. This helps to improve database efficiency and maintain consistency and integrity.

Lack of Data Redundancy: Data redundancy generally means duplicate or repeated data. This property states that the decomposed relations should not suffer from redundant data, which helps us get rid of unwanted duplication and keep only useful information.

b) Discuss query optimization with a diagram.

• The query optimizer is a critical component of a Database Management System (DBMS) that
determines the most efficient way to execute a given query.

• It generates various query plans and chooses the one with the least cost, ensuring optimal
performance in terms of time and resource utilization.
Steps of Query Optimization:

1. SQL Query Input:

The process starts when a user or application submits an SQL query to the DBMS. This query
specifies the data to be retrieved or manipulated.

2. Parsing:

Parsing is the first step where the SQL query is checked for syntax errors and then translated into an
internal representation, typically an Abstract Syntax Tree (AST).

Steps in Parsing:

•Lexical Analysis: The query is broken down into tokens such as keywords, identifiers, and operators.

•Syntactic Analysis: The sequence of tokens is checked against the SQL grammar rules.

•Semantic Analysis: The query is checked for semantic correctness, such as verifying that the tables and columns exist.

3. Logical Plan Generation:

Once the query is parsed, a Logical Query Plan is generated. This plan describes the sequence of high-
level operations (e.g., select, project, join) needed to execute the query.

Steps in Logical Plan Generation:

•Relational Algebra: The query is represented using relational algebra operations like σ (selection), π (projection), ⋈ (join), etc.

•Transformation Rules: The logical plan is transformed using a set of rules to optimize the sequence of operations (e.g., pushing selections down to reduce the size of intermediate results).

4. Cost Estimation:

The Cost Estimator evaluates different possible execution strategies for the query and assigns a cost to
each. The cost usually considers factors like I/O operations, CPU usage, and memory consumption.

Components of Cost Estimation:


•Statistical Information: The optimizer uses statistics such as table size, index availability, data
distribution, and selectivity of predicates to estimate costs.

•Cost Metrics: Costs are usually estimated in terms of disk I/O (number of page reads and writes),
CPU usage, and response time.

•Plan Alternatives: For each logical operation (e.g., join), multiple physical methods (e.g., nested
loop join, hash join) are considered and their costs are estimated.

5. Physical Plan Generation:

The logical plan is converted into a Physical Query Plan. This plan specifies the actual algorithms and
data access methods that will be used to execute the query.

Physical Operators:

•Table Access Methods: Methods like table scan, index scan, or index seek are chosen based on the
data and indexes available

•Join Algorithms: The optimizer selects between different join methods (e.g., nested loop join, sort-
merge join, hash join) based on cost and data characteristics.

•Sorting and Aggregation: Operations like sorting, grouping, and aggregation are optimized using
efficient algorithms like quicksort, hash aggregation, etc.

6. Plan Selection:

The optimizer compares the costs of all possible physical plans and selects the one with the lowest
estimated cost.

•Search Space: The optimizer explores a large search space of possible execution plans, considering
various transformations and join orders.

•Heuristics vs. Exhaustive Search: Some optimizers use heuristics to limit the search space (e.g.,
limiting the number of joins considered), while others may use exhaustive search methods like
dynamic programming.
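As a rough way to observe plan selection in practice (PostgreSQL syntax; table and column names are illustrative), most DBMSs expose the chosen plan through an EXPLAIN command:

EXPLAIN ANALYZE
SELECT e.name, d.dept_name
FROM Employee e JOIN Department d ON e.dept_id = d.dept_id
WHERE e.age > 30;
-- The output shows the chosen join method (e.g., hash join), the access paths
-- (sequential scan vs. index scan), and estimated vs. actual costs.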

7. Result of query:

The selected query execution plan is passed to the Query Executor, which actually retrieves the data
from the database.

Execution Strategies:

•Pipelined Execution: Operators produce outputs that are immediately consumed by the next operator, reducing the need for intermediate storage.

•Materialized Execution: Intermediate results are stored temporarily, and subsequent operations are
performed on these stored results.
