DBMS Practice Questions
What is DBMS?
A Database Management System (DBMS) is software that allows users to create, access, and
manage databases, providing an interface between the user and the data. It includes functions
such as creating and maintaining the database structure, managing user access, and
optimizing database performance. It is widely used in many industries and applications.
Overall, a DBMS offers greater functionality, reliability, and security than a file
system, making it the preferred choice for managing large amounts of data in
many industries and applications.
Physical level: This is the lowest level of abstraction. It describes how the data is
actually stored, including the file structures and access paths used on disk.
Logical level: This level of abstraction describes the logical structure of data
and how it relates to other data. It defines the relationships between data
items without specifying how they are physically stored.
View level: This is the highest level of abstraction and provides a user-friendly
interface for users to access and interact with the data. It describes how data
is presented to users, such as through reports, forms, or queries, and hides the
underlying complexity of the data storage and retrieval processes.
Physical data independence: This refers to the ability to modify the physical
storage of data without affecting the application programs that use the data.
For example, if a database administrator decides to move the data from one
storage device to another, or to change the storage format, the application
programs should not be affected. Physical data independence is achieved
through the use of a DBMS that separates the physical storage details from the
logical structure of the data.
Logical data independence: This refers to the ability to modify the logical
structure of the data without affecting the application programs that use the
data. For example, if a database administrator decides to add a new field to a
table or to reorganize the table relationships, the application programs should
not be affected. Logical data independence is achieved through the use of a
DBMS that separates the logical structure of the data from the physical storage
details.
Improved flexibility: Changes to the schema can be made without affecting the
applications that use the data, making it easier to modify the database as
business needs change.
Improved scalability: Data independence makes it easier to add new data or
modify existing data, allowing the database to grow and evolve as needed.
Overall, data independence is a key feature of DBMS that improves the flexibility,
maintainability, and scalability of databases, making them more useful for businesses and
organizations.
Data Storage Layer: The data storage layer is responsible for storing and
retrieving data from the database. It includes components such as the file
manager, the buffer manager, and the index manager. The data storage layer
communicates with the application logic layer to retrieve or modify data,
and it also ensures that data is stored safely and securely.
Increased Latency: The use of multiple layers can lead to increased latency
in the system, as each layer adds some overhead to the processing of user
requests.
Single Point of Failure: The application logic layer can be a single point of
failure, as it handles all the processing of user requests. If this layer fails,
the entire system may become unavailable.
Performance: The performance of the system may be impacted if the layers
are not properly optimized or if there is a high volume of data being
transferred between layers.
Database Design: DBAs are responsible for designing the database schema,
including defining tables, columns, and relationships between tables.
Backup and Recovery: DBAs are responsible for creating and managing
database backups, as well as developing and testing disaster recovery
plans.
Data Migration: DBAs may be responsible for migrating data from one
system to another, ensuring that data is transferred accurately and without
loss.
Capacity Planning: DBAs must plan for future growth of the database
system, including estimating the amount of storage and computing
resources needed to support projected growth.
Overall, the role of a DBA is critical to ensuring the smooth operation and
reliability of a database system.
ENTITY RELATIONSHIP DATA MODEL
Unary Relationship: A unary relationship is a relationship between one entity and itself. This
type of relationship is also known as a reflexive relationship. For example, an employee may
have a relationship with their supervisor, who is also an employee.
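As a hedged sketch, this supervisor example can be modeled in SQL with a self-referencing foreign key (table and column names here are illustrative):

CREATE TABLE employees (
    employee_id   INT PRIMARY KEY,
    name          VARCHAR(100),
    supervisor_id INT REFERENCES employees(employee_id)  -- unary: refers back to the same table
);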
Binary Relationship: A binary relationship is a relationship between two entities. This is the
most common type of relationship in a relational database. For example, a customer may
place an order for a product.
Total participation means that every instance of the parent entity must participate in the
relationship with at least one instance of the child entity. In other words, the participation is
mandatory. This is denoted by a double line connecting the parent entity and the relationship in
an entity-relationship diagram. For example, in a database for a university, every department
must have at least one instructor. So, the relationship between the "Department" entity and
the "Instructor" entity would have total participation.
Partial participation means that an instance of the parent entity may or may not
participate in the relationship with an instance of the child entity. In other words, the
participation is optional, and it is denoted by a single line in an entity-relationship
diagram. For example, in a database for a library, a book may or may not have an
associated author. So, the relationship between the "Book" entity and the "Author"
entity would have partial participation.
ER diagrams:
What is an attribute and enlist the types of attributes?
In a relational database, an attribute refers to a characteristic or property of an entity. It
is a column in a table that stores values related to the entity.
Simple attribute: A simple attribute is an atomic value that cannot be further divided.
For example, "age" can be a simple attribute for a person entity.
Composite attribute: A composite attribute can be divided into smaller subparts, each
with a meaning of its own. For example, "address" can be a composite attribute made
up of street, city, and postal code.
Single-valued attribute: A single-valued attribute is one that can only have one value for
each instance of an entity. For example, "date of birth" can be a single-valued attribute
for a person entity.
Multi-valued attribute: A multi-valued attribute is one that can have multiple values for
each instance of an entity. For example, "hobbies" can be a multi-valued attribute for a
person entity.
Derived attribute: A derived attribute is one that is derived or calculated from other
attributes in the same entity. For example, "age" can be a derived attribute for a person
entity, which is calculated based on the "date of birth" attribute.
Key attribute: A key attribute is an attribute that uniquely identifies each instance of an
entity. It can be a simple attribute or a composite attribute. For example, "student ID"
can be a key attribute for a student entity.
Attributes are important for organizing and categorizing data in a relational database.
They help to define the structure of entities and relationships and ensure that data is
consistent and well-organized.
Superkey: A superkey is a set of one or more attributes that can uniquely identify a
record in a table.
Candidate key: A candidate key is a minimal superkey, i.e., a superkey from which no
attribute can be removed without losing uniqueness.
Primary key: A primary key is a candidate key that is selected to be the main key for a
table. It must be unique, non-null, and should not change over time.
Alternate key: An alternate key is a candidate key that is not chosen to be the primary
key.
Foreign key: A foreign key is a field in one table that refers to the primary key of
another table. It is used to establish relationships between tables.
Composite key: A composite key is a key that is made up of two or more fields. It can be
used to uniquely identify a record in a table when no single field can do so.
The EER model allows for more flexible and expressive database designs than the ER
model, and is particularly useful for modeling complex relationships between entities.
However, it can also be more difficult to design and implement, and may require more
advanced skills and tools.
Write note on generalization ?
Generalization is a process of abstracting common attributes and relationships
from a set of entities and grouping them into higher-level entities, known as
supertypes.
Generalization allows for the creation of hierarchies of entities, with increasingly
general concepts at higher levels and increasingly specific concepts at lower
levels.
Generalization can be used to represent complex relationships among entities,
such as the "is-a" relationship.
Generalization can help to simplify the design of a database by reducing the
number of entities and relationships that need to be modeled explicitly.
Generalization can also make the design of a database more flexible and
adaptable to changing requirements, by allowing for the creation of new subtypes
or changes to existing subtypes without affecting the overall structure of the
database.
For example, in a database for a hospital, one might have entities such as doctors,
nurses, and patients. By identifying common attributes such as name, address, and
contact information, and grouping them into a higher-level entity called "person," one
can create a more general concept that includes all individuals associated with the
hospital.
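As a rough SQL sketch of this hospital example (names and types are assumptions), the "person" supertype holds the shared attributes and each subtype references it:

CREATE TABLE person (
    person_id    INT PRIMARY KEY,
    name         VARCHAR(100),
    address      VARCHAR(200),
    contact_info VARCHAR(100)
);
CREATE TABLE doctor (
    person_id INT PRIMARY KEY REFERENCES person(person_id),  -- a doctor "is-a" person
    specialty VARCHAR(100)
);
CREATE TABLE patient (
    person_id      INT PRIMARY KEY REFERENCES person(person_id),  -- a patient "is-a" person
    admission_date DATE
);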
Tables: Tables are the basic building blocks of a relational database schema. They
contain rows of data and columns that define the attributes or fields of the data.
Columns: Columns are also known as attributes, fields, or properties. They define
the data type and structure of the data in the table.
Keys: Keys are used to uniquely identify rows in a table. They can be primary
keys, which uniquely identify each row in a table, or foreign keys, which link two
tables together.
Relationships: Relationships describe the connections between tables in a
database. They define how data in one table is related to data in another table.
Constraints: Constraints are used to enforce rules and restrictions on the data in
the database. They can be used to ensure data integrity, such as ensuring that
only valid data is entered into the database.
In order to create a relational schema, a database designer must first identify the
entities and their relationships. This is usually done using an entity-relationship
diagram (ERD). Once the relationships have been identified, the database designer can
begin creating the tables and columns, defining keys and relationships, and applying
constraints to ensure data integrity.
What is schema ?
Schema can be defined as a blueprint or a plan that describes the structure of a
database, including its tables, columns, relationships, and constraints.
It provides a framework for organizing and understanding the data stored in a
database. A schema helps in maintaining the integrity of the database and ensures that
the data is stored in a structured and organized manner.
Primary key: A primary key is a field or set of fields that uniquely identifies each record
in a table. It cannot contain null values and must be unique across all records in the
table. For example, in a table of students, the student ID could be the primary key.
Foreign key: A foreign key is a field or set of fields that refers to the primary key of
another table. It is used to establish a relationship between two tables. For example, in
a table of orders, the customer ID could be a foreign key that refers to the customer
table.
Candidate key: A candidate key is a field or set of fields that could potentially be used as
a primary key. It must be unique and not contain null values. For example, in a table of
employees, the employee ID and the social security number could both be candidate
keys.
Super key: A super key is a combination of fields that uniquely identifies a record in a
table. It may contain more fields than are strictly needed for uniqueness; a candidate
key is a super key with no redundant fields. For example, in a table of customers, a
super key could be a combination of the customer ID, name, and address.
Composite key: A composite key is a combination of two or more fields that together
uniquely identifies a record in a table. For example, in a table of sales, a composite key
could be a combination of the date and the product ID.
Overall, keys are essential for maintaining the integrity and consistency of a database
by ensuring that each record is uniquely identified and related to other records in a
consistent and accurate manner.
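To illustrate, here is a hedged SQL sketch combining the key types above, reusing the examples from this list (column types are assumptions):

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,   -- primary key
    ssn         CHAR(9) UNIQUE,    -- candidate key not chosen as primary, i.e., an alternate key
    name        VARCHAR(100)
);
CREATE TABLE sales (
    sale_date   DATE,
    product_id  INT,
    customer_id INT REFERENCES customers(customer_id),  -- foreign key
    PRIMARY KEY (sale_date, product_id)                 -- composite key
);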
Create a relation for each entity type, with the same attributes as the entity type.
Choose one of the candidate keys of the entity type as the primary key of the relation.
If the entity type has a multi-valued attribute, create a new relation to represent that
attribute.
Example:
Consider an ER diagram containing two entity types - Student and Course. The Student
entity has attributes Student_ID, Name, and Address, while the Course entity has
attributes Course_ID and Title. The primary key for the Student entity is Student_ID,
while the primary key for the Course entity is Course_ID.
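A direct SQL translation of this example might look like the following (column types are assumptions):

CREATE TABLE Student (
    Student_ID INT PRIMARY KEY,
    Name       VARCHAR(100),
    Address    VARCHAR(200)
);
CREATE TABLE Course (
    Course_ID INT PRIMARY KEY,
    Title     VARCHAR(100)
);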
Overall, the mapping process from ER/EER models to the relational database involves
careful consideration of the relationships and dependencies among the entities,
attributes, and relationships in the ER/EER model to produce an efficient and reliable
relational database schema.
Explain all relational algebra operators in detail?
Relational algebra is a procedural query language used to retrieve data from a
relational database. There are six primary relational algebra operators: select (σ),
project (π), union (∪), set difference (−), Cartesian product (×), and rename (ρ).
In addition, aggregation is often provided as an extended operator:
Aggregation:
Aggregation is used to summarize the data by performing some mathematical
functions on the data of a relation. Commonly used aggregation functions are
COUNT, SUM, MAX, MIN, and AVG.
Example: Suppose we have a relation R(A, B, C) and we want to find the total
sum of the values of attribute B. The aggregation operation SUM(B)(R) returns a
single value: the sum of all values of B in relation R.
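In SQL, the same aggregation would be written as (assuming a table named R with a column B):

SELECT SUM(B) FROM R;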
SQL (Structured Query Language)
Characteristics of SQL:
It is a standard language used to communicate with relational
databases.
It is a declarative language, meaning that you only have to specify what
you want to do, and the database management system will take care of
the how.
It is highly expressive, allowing for complex queries and data
manipulation.
It is easy to learn and use.
Advantages of SQL:
It allows for efficient and effective management of large volumes of data.
It is a standard language, which means that databases can be easily
migrated or shared between systems.
It allows for easy data retrieval and manipulation, enabling users to
quickly and easily generate reports and analyze data.
It supports multiple users and can be used in both client-server and
web-based applications.
In summary, DDL is used to define and manage the structure of the database,
DML is used to manipulate data within the tables, TCL is used to control
transactions, and DCL is used to manage the access and privileges of users on
the database objects.
TRUNCATE: This command is used to delete all the data from a table
while keeping its structure intact. For example, the following command
deletes all the data from the "employees" table:
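TRUNCATE TABLE employees;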
In summary, DDL commands are used to create, modify, and delete the
structure of the database objects, and to add comments to these objects.
INSERT: This command is used to insert new data into a table. For
example, the following command inserts a new row into the
"employees" table:
INSERT INTO employees (id, name, salary) VALUES (1, 'John', 50000);
DELETE: This command is used to delete the existing data from a table.
For example, the following command deletes the employee with id=1
from the "employees" table:
DELETE FROM employees WHERE id = 1;
DCL commands are used to grant or revoke privileges to database users, while
TCL commands are used to control the transactions in a database. Some
commonly used DCL and TCL commands are:
GRANT: This command is used to grant privileges to a user or a group of
users in a database. For example, the following command grants the SELECT
privilege on the "employees" table to the user "john":
GRANT SELECT ON employees TO john;
TCL Commands:
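Commonly used TCL commands are COMMIT, which makes the changes of the
current transaction permanent; ROLLBACK, which undoes them; and SAVEPOINT,
which marks a point within a transaction that can later be rolled back to.
For example:

COMMIT;
ROLLBACK;
SAVEPOINT before_update;
ROLLBACK TO SAVEPOINT before_update;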
Aggregate functions can be used with the GROUP BY clause to group the
results by one or more columns. This allows you to perform aggregate
calculations on subsets of rows within a table.
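For example, assuming an "Orders" table like the one described below, the following query counts the orders placed by each customer:

SELECT customer_id, COUNT(*) AS order_count
FROM Orders
GROUP BY customer_id;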
Table: Customers
Table: Orders
In this example, the "Customers" table has a primary key constraint on the
"customer_id" column to ensure that each customer has a unique identifier.
The "email" column also has a unique constraint to ensure that each email
address is associated with only one customer. The "date_of_birth" column has
a check constraint to ensure that the date is not in the future.
The "Orders" table has a foreign key constraint on the "customer_id" column
to ensure that each order is associated with a valid customer from the
"Customers" table. The "order_date" and "total_amount" columns are also
required and cannot be null.
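Based on that description, the two tables could be sketched as follows (column types, and the order_id column, are assumptions):

CREATE TABLE Customers (
    customer_id   INT PRIMARY KEY,
    email         VARCHAR(100) UNIQUE,
    date_of_birth DATE CHECK (date_of_birth <= CURRENT_DATE)
);
CREATE TABLE Orders (
    order_id     INT PRIMARY KEY,
    customer_id  INT NOT NULL REFERENCES Customers(customer_id),
    order_date   DATE NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL
);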
{ For more practice, solve examples of writing queries on the given data; these
questions are very important and can come for 10 marks }
Relational database Design
There are several normalization techniques, which are listed below along with
examples:
First Normal Form (1NF):
A table is in 1NF if it has no repeating groups, i.e., each column contains atomic values.
For example, a table of students with a column for their favorite courses violates 1NF
since it may contain multiple values. To normalize it, we can create a separate table
for the courses and relate it to the student table using a foreign key.
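A hedged SQL sketch of that fix (names are illustrative):

-- Violates 1NF: one column may hold several course values per student
-- students(student_id, name, favorite_courses)

CREATE TABLE students (
    student_id INT PRIMARY KEY,
    name       VARCHAR(100)
);
CREATE TABLE student_courses (            -- one row per student-course pair
    student_id INT REFERENCES students(student_id),
    course     VARCHAR(100),
    PRIMARY KEY (student_id, course)
);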
Second Normal Form (2NF):
A table is in 2NF if it is in 1NF and every non-key column depends on the entire primary
key, not just part of it. For example, a table of order items keyed by (order ID, product
ID) that also stores product name and price violates 2NF, since product name and price
depend only on the product ID, not on the entire composite key. To normalize it, we can
create a separate table for products and relate it to the order table using a foreign key.
Third Normal Form (3NF):
A table is in 3NF if it is in 2NF and all non-key columns depend only on the primary
key and not on other non-key columns (i.e., there are no transitive dependencies). For
example, a table of customers with customer name, address, and zip code violates 3NF
because the zip code is determined by the address, a non-key column, rather than
directly by the primary key. To normalize it, we can create a separate table for
addresses and zip codes and relate it to the customer table using a foreign key.
Boyce-Codd Normal Form (BCNF):
A table is in BCNF if for every non-trivial functional dependency X → Y, X is a superkey.
This means that every determinant (X) must be a candidate key. For example, a table
of (student, course, instructor) in which each instructor teaches only one course
(instructor → course) violates BCNF, because instructor is a determinant but not a
candidate key. To normalize it, we can decompose the table into (instructor, course)
and (student, instructor).
Fourth Normal Form (4NF):
A table is in 4NF if it is in BCNF and has no multi-valued dependencies. Multi-valued
dependencies occur when a single value in one table corresponds to multiple values in
another table. For example, a table of employees with skills as a multi-valued attribute
violates 4NF since each employee can have multiple skills. To normalize it, we can
create a separate table for skills and relate it to the employee table using a foreign key.
What is functional dependency?
Functional dependency is a constraint between two sets of attributes in
a relation.
It states that the value of one set of attributes (the determinant)
uniquely determines the value of another set of attributes (the
dependent).
The constraint is often written as A → B, where A is the determinant and
B is the dependent.
Functional dependencies are used in database design to eliminate
redundancy and ensure data integrity.
They can be used to check if a relation is in a normal form, such as 1NF,
2NF, or 3NF.
Violations of functional dependencies can result in data anomalies, such
as update anomalies, insertion anomalies, and deletion anomalies.
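For example, in a relation Employee(emp_id, name, dept), the dependency
emp_id → name holds if every employee ID is associated with exactly one
name; two rows with the same emp_id but different names would violate it.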
Explain Armstrong’s Axioms ?
Armstrong's axioms are a set of rules used in relational database design and
normalization to derive all the functional dependencies in a relation. The
axioms are as follows:
Reflexivity: If B is a subset of A, then A → B. In particular, any set of
attributes is functionally dependent on itself (A → A).
Augmentation: If A → B, then AC → BC for any set of attributes C. This
means that adding the same attributes to both sides of a dependency
preserves it.
Transitivity: If A → B and B → C, then A → C. This means that if A
determines B and B in turn determines C, then A also determines C.
These axioms can be used to derive all the functional dependencies in a
relation, which can then be used to determine the appropriate level of
normalization for the relation.
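As a short worked example: given a relation R(A, B, C) with A → B and B → C,
transitivity yields A → C, and augmentation yields AC → BC; the closure of {A}
under these dependencies is therefore {A, B, C}.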
What are the properties of Armstrong’s Axioms?
The properties of Armstrong's Axioms are as follows:
Closure: The closure of a set of attributes is the set of all attributes that
are functionally dependent on the original set. Armstrong's axioms can
be used to derive the closure of any set of attributes.
Soundness: Any functional dependency that can be derived using
Armstrong's axioms is a valid dependency in the relation.
Completeness: Every functional dependency that logically holds in the
relation can be derived using Armstrong's axioms; together with
soundness, this means the axioms derive exactly the valid dependencies.
What is decomposition and explain lossless join decomposition and
dependency preservation decomposition?
Decomposition is the process of breaking down a relation into two or more smaller
relations that can be used to store the same information. There are different types of
decomposition, but two common ones are lossless join decomposition and
dependency preservation decomposition.
Lossless join decomposition:
The original relation can be reconstructed exactly by joining the smaller
relations using a join operation.
No information is lost during the decomposition process.
A decomposition is said to be lossless if and only if the join of the
smaller relations results in the original relation.
It is important for maintaining data integrity and consistency.
Dependency preservation decomposition:
This decomposition technique preserves all the functional dependencies
that hold in the original relation.
It ensures that no new dependencies are introduced as a result of
decomposition.
A decomposition is said to preserve dependencies if and only if every
functional dependency that holds in the original relation also holds in at
least one of the smaller relations created by the decomposition.
It is important for avoiding anomalies and maintaining data consistency.
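As a worked example: consider R(A, B, C) with the dependency A → B, decomposed
into R1(A, B) and R2(A, C). The decomposition is lossless because the common
attribute A is a key of R1, so joining R1 and R2 on A reconstructs exactly the
original tuples; it also preserves dependencies, since A → B holds entirely
within R1.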
Transaction Management and Concurrency
What is transaction?
A transaction is a logical unit of work performed in a database
management system (DBMS).
A transaction consists of a sequence of database operations (such as
reads and writes) that are executed as a single, indivisible unit.
Transactions ensure that a set of related operations either complete
successfully or fail together as a group.
The ACID properties define the characteristics of a reliable transaction.
ACID stands for Atomicity, Consistency, Isolation, and Durability.
Atomicity ensures that a transaction is treated as a single, indivisible
unit of work. If any part of the transaction fails, the entire transaction is
rolled back and the database is restored to its state prior to the start of
the transaction.
Consistency ensures that a transaction brings the database from one
valid state to another.
Isolation ensures that each transaction is executed in isolation from
other transactions. This means that a transaction cannot see the
intermediate states of other transactions.
Durability ensures that once a transaction is committed, its changes to
the database are permanent and will survive subsequent system
failures.
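As a hedged SQL sketch of a transaction (an illustrative funds transfer; table and column names are assumptions):

BEGIN;  -- or START TRANSACTION
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';
COMMIT;  -- both updates take effect together, or neither does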
Discuss ACID properties of transaction ?
ACID (Atomicity, Consistency, Isolation, Durability) properties are the key
characteristics of a transaction in a database management system. Here are
the explanations of each property:
Atomicity: A transaction is considered as a single logical unit of work,
which means that either all the operations of the transaction must be
executed or none of them should be executed. This property ensures
that if any part of a transaction fails, the entire transaction is aborted,
and the database is rolled back to its previous consistent state.
Consistency: This property ensures that a transaction transforms the
database from one consistent state to another. A transaction must
maintain the integrity constraints, which means that the database must
be in a consistent state before and after the transaction execution.
Isolation: This property ensures that the execution of multiple
transactions concurrently will result in the same state as if they had
executed serially in some order. The transactions should be isolated
from each other so that they don't interfere with each other's
operations. This is achieved by using locking mechanisms, which prevent
concurrent transactions from accessing the same data items in conflicting
ways.
Durability: Once a transaction is committed, its changes must be
permanent and persistent, even in the case of power failure or system
crash. The changes made by a committed transaction must be written to
non-volatile memory like a hard disk so that they are not lost.
These four properties ensure that a database is consistent, reliable, and
resilient to failures.
What is concurrent execution and list features of concurrent execution?
Concurrent execution refers to the execution of multiple transactions
simultaneously in a database system. The advantages of concurrent execution
are:
Improved throughput: Concurrent execution allows multiple
transactions to be processed simultaneously, resulting in better overall
system performance and higher throughput.
Improved response time: Concurrent execution enables multiple users
to access the database system simultaneously, resulting in faster
response times for queries and updates.
Resource sharing: Concurrent execution enables multiple transactions
to share system resources such as CPU time, disk I/O, and memory,
resulting in more efficient use of system resources.
Increased scalability: Concurrent execution enables a database system
to handle a larger number of transactions, which can help to increase
the scalability of the system.
Improved reliability: Concurrent execution can improve the reliability of
a database system by providing mechanisms for ensuring data
consistency and preventing data corruption due to concurrent access.
Reduced contention: Concurrent execution can help to reduce
contention for system resources by allowing multiple transactions to
execute simultaneously and by providing mechanisms for resolving
conflicts.
Explain the concept of serializability with its types ?
In database management, serializability is a concept used to ensure that
concurrent transactions do not interfere with each other in a way that violates
the consistency of the database. It ensures that the final state of the database
is the same as if the transactions were executed serially in some order.
There are two types of serializability:
Conflict serializability: In conflict serializability, transactions are serialized
based on their conflict with each other. A conflict occurs when two
transactions try to access the same data item, and at least one of them is a
write operation. The transactions are said to be conflict-equivalent if they have
the same set of conflicts.
View serializability: In view serializability, transactions are serialized based
on view equivalence: a schedule is view serializable if it is view-equivalent to
some serial schedule, meaning each transaction reads the same values and the
final write on each data item is the same as in that serial order.
Features of serializability:
Consistency: Serializability ensures that the database remains in a
consistent state before and after the execution of transactions.
Isolation: Serializability ensures that transactions are executed in
isolation, and the results of one transaction do not affect the results of
other transactions.
Durability: Serializability ensures that once a transaction is committed,
its effects become permanent and cannot be rolled back.
Atomicity: Serializability ensures that transactions are executed as
atomic units, which means that either all the operations of a transaction
are executed, or none of them are.
Advantages of serializability:
Data integrity: Serializability ensures that the data in the database
remains consistent and accurate, even when multiple transactions are
executed concurrently.
Reliable: Serializability ensures that the results of the transactions are
reliable and consistent, and do not depend on the order of execution.
Efficient: Serializability ensures that the transactions are executed
efficiently, without any unnecessary delays or conflicts.
Difference between serial and serializable schedule.
In a serial schedule, transactions execute one after another with no interleaving, so
correctness is guaranteed but concurrency is lost. A serializable schedule allows
operations of different transactions to interleave, but its overall effect is equivalent
to that of some serial schedule, giving the benefits of concurrency while preserving
correctness.
Discuss conflict serializability with example ?
Conflict serializability is a property of schedules in database systems that ensures that
the outcome of executing concurrent transactions is equivalent to some serial execution
of those transactions.
Consider two transactions T1 and T2, where r(x) denotes a read operation on data item x
and w(x) denotes a write operation on x. Suppose that these transactions execute
concurrently, and the following schedule S is produced:
S: r1(x), r2(y), w1(y), w2(x), w1(z), r2(z)
In S, r1(x) conflicts with w2(x), which requires T1 to precede T2, while r2(y) conflicts
with w1(y), which requires T2 to precede T1. The precedence graph therefore contains a
cycle, so the schedule S is not conflict serializable: no serial ordering of T1 and T2
preserves the order of all conflicting operations. By contrast, a schedule that runs all
of T1's operations before T2's, such as S': r1(x), w1(y), w1(z), r2(y), w2(x), r2(z), is
conflict serializable, because it is conflict-equivalent to the serial schedule T1
followed by T2.
In log-based recovery, the database management system maintains a log file of all
transactions executed on the system. Each transaction is identified by a unique
transaction ID, and the log file records all updates made by the transaction to the
database. The log file is periodically flushed to disk to ensure durability.
When a system failure occurs, the database management system uses the log file to
restore the database to a consistent state. The recovery process is performed in two
phases: redo and undo.
The redo phase involves applying all committed transactions that were not yet written
to disk before the failure occurred. This is done by scanning the log file and applying all
changes made by these transactions to the database.
The undo phase involves rolling back all transactions that were not committed before
the failure occurred. This is done by scanning the log file backwards and undoing all
changes made by these transactions to the database.
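As an illustration using common textbook log notation (the values are made up), an update by transaction T1 might produce records like:

<T1 start>
<T1, X, 500, 400>   -- T1 changed item X from old value 500 to new value 400
<T1 commit>

During the redo phase, T1's changes are reapplied using the new values because its commit record is present; a transaction with a start record but no commit record would instead be undone during the undo phase, using the old values.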
Lock-based protocols: These protocols use locks to control access to shared resources
in the database. They ensure that conflicting accesses to a resource are serialized,
preventing conflicts between transactions. The most widely used lock-based protocol is
Two-Phase Locking (2PL), along with its stricter variant, Strict 2PL.
Optimistic protocols: These protocols assume that conflicts between transactions are
rare and allow multiple transactions to proceed without locking. They validate the
transactions after they have completed and roll back any transactions that conflict
with others. Two popular non-locking approaches are Timestamp Ordering and
Multi-Version Concurrency Control (MVCC).
Shadow paging is a technique used in database systems to provide support for efficient
and reliable recovery from system failures. Here are some key points about shadow
paging:
The main advantage of shadow paging is that it eliminates the need for a log file,
which can improve the performance of the system by reducing the overhead
associated with logging.
In this technique, the DBMS maintains two page tables during a transaction: a
shadow page table, which continues to point to the original, unmodified pages,
and a current page table. When a page is modified, the change is made to a
copy of the page, and the current page table is updated to point to that copy;
the shadow pages themselves are never touched.
Once the transaction is committed, the changes are applied to the database by
atomically making the current page table the new shadow page table.
If a system failure occurs before the transaction is committed, the changes
can be undone by simply discarding the modified copies; the shadow page table
still describes a consistent database state.
Overall, shadow paging is a useful technique for providing reliable recovery in database
systems, especially in environments where high performance is a key requirement.
In conclusion, each of these approaches has its strengths and weaknesses, and the
choice of approach depends on the specific requirements and constraints of the system.
It is essential to carefully consider each approach and their trade-offs before selecting a
particular approach.