CS3492 QB
CS3492 QB
3 0 0 3
OBJECTIVES
To learn the fundamentals of data models, relational algebra and SQL
To represent a database system using ER diagrams and to learn normalization techniques
To understand the fundamental concepts of transaction, concurrency and recovery processing
To understand the internal storage structures using different file and indexing techniques whichwill
help in physical DB design
To have an introductory knowledge about the Distributed databases, NOSQL and database
security
UNIT 3 TRANSACTIONS
Transaction Concepts – ACID Properties – Schedules – Serializability – Transaction support in
SQL – Need for Concurrency – Concurrency control –Two Phase Locking- Timestamp –
Multiversion – Validation and Snapshot isolation– Multiple Granularity locking – Deadlock
Handling – Recovery Concepts – Recovery based on deferred and immediate update – Shadow
paging – ARIES Algorithm
TEXT BOOKS:
1. Abraham Silberschatz, Henry F. Korth, S. Sudharshan, ―Database System Concepts‖, Seventh
Edition, McGraw Hill, 2020.
2. Ramez Elmasri, Shamkant B. Navathe, ―Fundamentals of Database Systems‖, Seventh Edition,
Pearson Education, 2017
REFERENCES:
1. C.J.Date, A.Kannan, S.Swamynathan, ―An Introduction to Database Systems‖, Eighth
Edition,Pearson Education, 2006.
COURSE OUTCOME
At the end of course, students will be able to
UNIT I
RELATIONAL DATABASE :
Purpose of Database System – Views of data – Data Models – Database System Architecture –
Introduction to relational databases – Relational Model – Keys – Relational Algebra – SQL fundamentals
Advanced SQL features – Embedded SQL– Dynamic SQL
PART A
1. What is database?(R)
A database is a basic electronic storage with collection of interrelated data, organized to provide
efficient retrieval. Databases are organized by fields, records and files or tables.
14. What are the different data models? (R) (MAY 2012)
Relational Data Model
The Entity-Relationship Data Model
Object-Based Data Model
Semi structured Data Model
31. What is the use of embedded SQL? (U) (MAY 2012) (NOV 2014)
A fundamental principle underlying embedded SQL, which we call the dual-mode
principle, is that any SQL statement that can be used interactively can also be embedded in an
application program.
32.Write short note on OPEN, FETCH and CLOSE statements. (R)
OPEN:
EXEC SQL OPEN < Cursor name>;
Opens the specified cursor. A set of rows is thus identified and becomes the current active set for
the cursor.
FETCH:
EXEC SQL FETCH <Cursor name>
INTO <host variable reference
commalist>; Advances the specified cursor to the next
row in the active set.
CLOSE:
49. Give the usage of the rename operation with an example.(MAY 2010) (U)
1.LosslessDecomposition
2.DependencyPreservation
3. Lack of Data Redundancy
Key A key is a single or combination of multiple fields. Its purpose is to access or retrieve
1. Explain the system structure of a database system with neat block diagram. (R) (DEC 2007), (MAY
2010) (MAY 2012)
2. What is a data model? Explain various data model for describing the design of a database at the logical
level. (U) (APRIL 2008), (MAY 2010)
3. Explain the different between physical and logical data independence with an example. (U)(APRIL 2008)
5. Describe the three-schema architecture. Why do we need mapping between schema Levels? How do
different schema definition languages support this architecture? (R) (DEC 2008)
1. What is the notation used in E-R diagram? Explain the E-R model structure with Example.(U)(NOV
2014)
2. Explain the role and functions of the database administrator. (R)
iv. Views
4. Develop an Entity Relationship model for a library management system. Clearly State the
problem Definition, Description, Business Rule and any assumption you make. (AP)
(MAY 2009) (NOV 2014)
12. (i)State and explain the command DDL,DML,DCL with suitable example(7).
(ii) Justify the need of embedded SQL.Consider the relation
Student (student no, name, mark and grade).Write the embedded SQL statement in C language to
retrieve all the students records whose mark than 90. (6) (Nov/Dec-2017)
13.Discuss about Tuple Relational Calculus and Domain Relational Calculus. (R) (DEC 2008)
(MAY 2012)
14.What are aggregate functions? Explain five built-in aggregate functions. (U) (MAY 2008)
15.Consider the following relations for a company database Application.(AP)(MAY 2009)
(ii) Develop an SQL query to list the department number and the
number of Employee in each department.
(iii) Develop a View that will keep track of the department number, the number of
employees in the department, and the total basic pay expenditure for each
department.
(iv) Develop an SQL query to list the details of employees who have worked in more
than three projects on a day.
16. Explain how dangling tuple may arise and also explain problems that they Cause. (U)
(MAY 2008)
17. Briefly present a survey on Integrity and Security. (U) (MAY 2012)
18. Create an EMPLOYEE table and write the steps for various functions in an SQL Like add,
update, delete, save and join the various attributes of an EMPLOYEE Table.
(C)(MAY 2007)
19. Explain the use of trigger with your own example. (U) (MAY 2010)
20. What is a view? How can it be created? Explain with an example. (R) (MAY 2010)
21.Briefly present a survey on Integrity and Security. (U) (MAY 2012)
Course File
Professor File
Professor Number Name Office
Registration File
Consider a suitable example of tuples/records for the above mentioned tables and write
DML statements (SQL) to answer for the queries listed below:
(iv) For a specific student number, in which courses is the student registered and what is
his/her name?
23.Let relations r1(A,B,C)and r2(C,D,E) have the following properties r1 has 20000 tuples r2
has 45000 tupes ,25 tuples of r1, fit on one block and 30 tuples of r2 fit on one block .Estimate
the number of block transfers and seeks required,using each of the following join strategies for
r1∞r2:
(i)Nested loop join
(ii)Block Nested
loop join
(iii) Merge join
II / III 129
8. What do you mean by functional dependencies? (U) (MAY
2010)(nov2013)(Dec2015)
Let R be a relation variable and let X and Y be arbitrary subsets of the
set of attributes of R. then we say that Y is functionally dependent on-in
symbols,
XY
II / III 130
Let R be a relvar, and let A, B and C be subsets of the attributes of R.
Then we say that B is multi-dependent on A-in symbols, AB
*{A, B …Z}
if and only if every legal value of R is equal to the join of its projections on A,
B,……Z.
II / III 131
Structural aspect: The data in the database is perceived by
the user as tables, and nothing but tables.
Integrity aspect: Those tables satisfy certain integrity constraints.
Manipulative aspect: The operators available to the user for
manipulating those tables- for example, for purposes of data
retrieval – are operators that derive tables from tables. Of
those operators, three particularly important ones are select,
project and join.
25.Mention the important points about relational databases. (R)
A set of important points about relational databases:
a. Relational databases store data in the form of tables (logically).
b. The rows of a table are called as tuples.
c. The columns of a table are known as attributes.
d. Every attribute has a data type associated with it.
e. Every attribute has a domain which provides the set of all possible
values that can be stored as values for that attribute.
f. Tables are called as relations.
g. The table names are called as relational variable
28.What is the difference between Data Integrity and Data Security? (NOV 2013)
(U)
Data integrity and data security are two different aspects that make sure
the usability of data is preserved all the time. Main difference between integrity
and security is that integrity deals with the validity of data, while security deals
with protection of data. Backing up, designing suitable user interfaces and error
detection/correction in data are some of the means to preserve integrity, while
authentication/authorization, encryptions and masking are some of the popular
means of data security. Suitable control mechanisms can be used for both
security and integrity.
29.Which operators are called as unary operators and why are they called so
(NOV 2013) (U)
II / III 132
An operator that takes a single operand in an expression or a statement.
The unary operators in C# are +, -,!, ~, ++, -- and the cast operator.
PART B
1. What is the notation used in E-R diagram? Explain the E-R model structure with
Example.
(U)(NOV 2014)
2. Develop an Entity Relationship model for a library management system. Clearly
State the problem
Definition, Description, Business Rule and any assumption you make. (AP)
(MAY 2009) (NOV 2014)
3.Develop an Entity Relationship model preparation staff (chef) and finalize the
customer‗s bill. The Food preparation staffs (Chefs), with their touch-display
interfaces to the system, are able to view orders sent to the kitchen by waiters.
During preparation, they are able to let the waiter know the status of each item,
and can send notifications when items are completed. The system should have
full accountability and logging facilities, and should support Supervisor actions
to account for exceptional circumstances, such as a meal being refunded or
walked out on. (AP) (Dec2015)
4. Distinguish between lossless-join decomposition and dependency preserving
decomposition(7)
5. State and explain the architecture of DBMS.Draw the ER diagram for banking
systems.(Home loan Application)(13)(Nov/Dec-2017)
II / III 133
UNIT III
TRANSACTIONS
Transaction Concepts – ACID Properties – Schedules – Serializability –
Transaction support in SQL – Need for Concurrency – Concurrency
control –Two Phase Locking- Timestamp – Multiversion – Validation and
Snapshot isolation– Multiple Granularity locking – Deadlock Handling –
Recovery Concepts – Recovery based on deferred and immediate update
– Shadow paging – ARIES Algorithm
PART – A
II / III 134
d. Durability: Once a transaction commits, its updates persist in the
database, even if there is a subsequent system crash.
II / III 135
12. Define concurrency (MAY 2012) (R)
Concurrency refers to the fact that DBMSs typically allow many
transactions to access the same database at the same time.
13. What are the three problems that any concurrency control mechanism must
address? (R)
The three problems are:
Two –phase locking theorem is ―If all transactions obey the two-phase
locking protocol, then all possible interleaved schedules are serializable‖.
II / III 136
acquire any more locks.
20.What is an isolation level? (R)
The isolation level that applies to a given transaction might be defined
as the degree of interference the transaction in question is prepared to tolerate
on the part of concurrent transactions.
Hold and Wait:There must exist a process that is holding at least one
resource and is waiting to acquire additional resources that are
currently being held by other processes.
Circular Wait: There must exist a set {p0, p1,.....pn} of waiting processes
such that p0 is
waiting for a resource which is held by p1, p1 is waiting for a resource which
is held by p2,. .............................................................................................................................
pn-1 is waiting for a resource which is held by pn and pn is waiting for
a resource which is held by p0.
II / III 137
It is simple, but efficient, scheme called the shadow copy schemes. It is based o n
making copies of the database called shadow copies that one transaction is active
at a time. The scheme also
25. What type of locking needed for insert and delete operations (April/May-
2017)
When you execute an INSERT, UPDATE, or DELETE statement, the database server
uses
exclusive locks. An exclusive lock means that no other users can update or delete the
item until the
PART – B
3. What is Deadlock? List and discuss the four conditions for Deadlock. (R)
II / III 138
a. Two-phase locking protocol (2PL).(U)(NOV 2016)
b. Lock based protocol.(U)
10. Explain the deferred and immediate-modification version of the log based
Recovery scheme.
(U)(MAY 2007)
11. Write short notes on shadow paging.(R)( DEC 2007)
12. Describe about the testing of serializability. (U)( DEC 2007) (MAY 2012)
13. What is concurrency control? How is it implemented in DBMS? Explain. (R)(DEC
2007)
14. Explain various recovery techniques during transaction in detail. (R)(MAY 2007)
15. Explain the two-phase commit and three-phase commit protocols.(R)(Dec2015)
16. Consider the following schedules. The actions are listed in the order
they are scheduled, and prefixed with the transaction name.(AN)
(Dec2015)
S1:T1:R(X),T2:R(X),T1:W(Y),T2:W(Y),T1:R(Y),T2:R(Y)
S2:T3:W(X),T1:R(X),T1:W(Y),T2:R(Z),T2:W(Z),T3:R(Z)
For each of the schedule, answer the following questions:
(i) what is the precedence graph for the schedule?(2)
(ii) is the schedule conflict –serializable?if so,what are all the conflict
equivalent serial schedules?(7) (iii)is the schedule view-serializable?if so,
what are all the view equivalent serial schedules?(7)
A
=
0
t
h
e
n
II / III 139
B
:
=
B
+
1
;
W
r
i
t
e
(
B
)
.
T2:read(B);
Read (A);
If B=0 then A:=A+1;
Write (A).
Add lock and unlock instruction to transactions T1 and T2,so that they
observe the two phase locking protocol. Can the execution of these
transactions result in a deadlock?(AP)
(NOV 2016)
UNIT IV
IMPLEMENTATION TECHNIQUES
RAID – File Organization – Organization of Records in Files – Data dictionary Storage –
Column Oriented Storage– Indexing and Hashing –Ordered Indices – B+ tree Index Files –
B tree Index Files – Static Hashing – Dynamic Hashing – Query Processing Overview –
Algorithms for Selection, Sorting and join operations – Query optimization using Heuristics
- Cost Estimation.
PART – A
II / III 140
2. What is main memory?(U)
The storage medium used for data that are available to be operated on
is main memory. The general-purpose machine instructions operate on main
memory. Although main memory may contain many megabytes of data, or
even hundreds of gigabytes of data in large server systems, it is generally too
small for storing the entire database. The contents of main memory are
usually lost if a power failure or system crash occurs.
II / III 141
Used as protection from disk failures.
7. Draw the storage device hierarchy. (MAY 2010)(R)
Cache
Main memory
Flash memory
Magnetic disk
Optical disk
Magnetic tapes
9. What is NAS?(R)
Network attached storage (NAS) is an alternative to SAN. NAS is
much like SAN, except that instead of the networked storage appearing to be
a large disk, it provides a file system interface using networked file system
protocols such as NFS or CIFS.
II / III 142
13. Define rotational latency time.(U)
Once the head has reached the desired track, the time spent
waiting for the sector to be accessed to appear under the head is
called the rotational latency time
17. What are the factors to be taken into account in choosing a RAID level?(R)
The factors to be taken into account in choosing a RAID level are
II / III 143
using one block read.
II / III 144
All files are ordered sequentially on some search key. Such files, with a
clustering index on the search key, are called index-sequential files.
II / III 145
If there is a key value Ki =V
Then pointer Pi directs us to the
desired record or bucket Else no
record with key value k exists.
II / III 146
they do, add tr.ts to the
result
E
n
d
E
n
d
For each
block
Br of r
do
begin
For
each
block
Bs of s
do
begin
For each
tuple tr
in Br
do
begin
For
each
tuple ts
in Bs
do
begin
Test pair (tr, ts) to see if they
satisfy the join condition If
they do, add tr.ts to the result.
E
n
II / III 147
d
E
n
d
E
n
d
E
n
d
Can move records on page without changing rid; so, attractive for
fixed-length records too. Page is full when data space and slot
array meet.
II / III 148
... 2Rid = (i,2)
Rid = (i,1)
16
24
II / III 149
Dynamic Hashing
Searching any data in a B+ tree is very easy because all data are found in leaf
nodes. In a B tree, data cannot be found in leaf nodes.
B+ trees store redundant search key but B tree has no redundant value.
In a B+ tree, leaf nodes data are ordered as a sequential linked list but in B
tree the leaf node cannot be stored using a linked list. Many database
systems' implementations prefer the structural simplicity of a B+ tree.
57. What is a query execution plan?(Apr/May-2017)
Execution plan will be generated by Query optimizer with the help of statistics
and Algebrizer\processor tree.
It is the result of Query optimizer and tells how to do\perform your
work\requirement.
There are two different execution plans - Estimated and Actual.
Estimated execution plan indicates optimizer view.
Actual execution plan indicates what executed the query and how was it done.
58. Which cost component are used most often as the basis for cost function?
(Apr/May-2017)
(i)Access cost to secondary storage
II / III 150
(ii)Memory usage of cost
(iii)Storage cost
59.What is replication transparency?(Apr/May-2017)(U)
Distribution Transparency. Distribution transparency is the property of distributed
databases by the
virtue of which the internal details of the distribution are hidden from the users. The
DDBMS designer may choose to fragment tables, replicate the fragments and store them
at different sites.
60. State the Need for Query Optimization.
PART B
2. What are the steps involved in query processing? How would you estimate the
cost of the query? (U) (MAY 2007)
9. Explain how the RAID systems improve performance and reliability. (U)(DEC
2007)
10. Describe the structure of B+ tree and list the characteristics of a B+ tree.(U)
II / III 151
12. Describe static hashing and dynamic hashing.(U)(DEC 2008). (NOV 2014)
13. Describe in detail about how records are represented in a file and how to
organize them in a file. (AP)(MAY 2012).
14. Explain about spatial and mobile database (U)(NOV 2014)(NOV 2016)
A
D
V
A
N
C
E
D
T
O
P
I
C
S
II / III 152
Unit-V Advanced Topics
Distributed Databases: Architecture, Data Storage, Transaction Processing, Query
processing and optimization – NOSQL Databases: Introduction – CAP Theorem –
Document Based systems – Key value Stores – Column Based Systems – Graph
Databases. Database Security: Security issues – Access control based on privileges –
Role Based access control – SQL Injection – Statistical Database security – Flow
control – Encryption and Public Key infrastructures – Challenges
PART-A
Request(source text)
Terminal from which the operation was invoked
User who invoked the operation
Date and time of the operation
Relvar(s), tuples(s),attribute(s) affected
Before images(old values)
After images(new values)
II / III 153
5. What is entity integrity?(U)
No component of the primary key of any base relvar is allowed to
accept nulls are called entity integrity.
II / III 154
The fundamental principle of distributed database is to the user, a
distributed system should look exactly like a non-distributed system.
II / III 155
16. What is meant by Data warehousing? (NOV 2014)(U)
A data warehouse is a relational database that is designed for query and
analysis rather than for transaction processing. It usually contains historical data
derived from transaction data, but it can include data from other sources. It
separates analysis workload from transaction workload and enables an
organization to consolidate data from several sources.
II / III 156
20. Define Crawling. (NOV 2014)(R)
Web crawling is the process of search engines combing through web
pages in order to properly index them. These ―web crawlers‖ systematically
crawl pages and look at the keywords contained on the page, the kind of content,
all the links on the page, and then returns that information to the search engine‗s
server for indexing. Then they follow all the hyperlinks on the website to get to
other websites. When a search engine user enters a query, the search engine will
go to its index and return the most relevant search results based on the keywords
in the search term. Web crawling is an automated process and provides quick, up
to date data.
II / III 157
26. Define Statistical database.(R)
A statistical database is a database used for statistical analysis
purposes. It is an OLAP (online analytical processing), instead of OLTP
(online transaction processing) system. Modern decision, and classical
statistical databases are often closer to the relational model than the
multidimensional model commonly used in OLAP systems.
28. Write about the four types (Star, Snowflake, Galaxy and Fast
constellation) of Data warehouse schemas.(DEC2015)(R)
1. STAR SCHEMA:Centralized Fact table connect the one or more denormalized data
3. STAR FLAKE SCHEMA:One or more centralized fact table connect the single
denormolized data
Applications:
Cross-Marketing
Basket Data Analysis
Catalog design
II / III 158
PART B
8. Neatly write the K-means algorithm and show the intermediate results in
clustering the below given points into two clusters using K-means algorithm.
P10,0),P21,10),P3:(2,20),P4:(1,15),P5:(1000,2000),P6:(1500,1500),P7:(1
000,1250). (U) (Dec2015)
11. Suppose that you have been hired as a consultant to choose a database system
for your client‗s application .For each of the following applications, state what
type of database system (relational,persistent programming language based
OODB,object relational;do not specify a commercial product)you whould
recommend.Justify your recommendation
12.Trace the results of using the Apriori algorithm on grocery store example to
support threshold s=33.34% and confidence threshold c=60%.Show the candidate
and frequent itemsets foreach database scan.Enumerate all the final frequent
itemsets.Also indicate the association rules that are generated and highlight the
strong ones,sort them by confidence.
Transaction Id Items
T1 HotDogs,Buns,Ketchup
II / III 159
T2 HotDogs,Buns
T3 HotDogs,Coke,Chips
T4 Chips,Coke
T5 Chips,Ketchup
T6 HotDogs,Coke,Chips
II / III 160