
UNIT V Imp Questions

Uploaded by Poonam Chanakya
Copyright © All Rights Reserved

UNIT-V (1 Mark)

1. Define an unclustered index.


An unclustered (nonclustered) index has a structure separate from the data
rows. It contains the nonclustered index key values, and each key value
entry has a pointer to the data row that contains the key value. The
pointer from an index row in a nonclustered index to a data row is called
a row locator.
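The row-locator idea can be made concrete with a minimal Python sketch. It does not reproduce any particular DBMS's on-disk format. It assumes a heap file modelled as a list of pages, with a row locator represented as a hypothetical (page_no, slot_no) pair. The names `pages`, `index` and `lookup` are illustrative only.

```python
from bisect import bisect_left

# Heap file: rows stored in arrival order, NOT sorted by emp_id.
# A row locator is modelled as (page_no, slot_no), a hypothetical layout.
pages = [
    [(42, "Ravi"), (7, "Asha")],      # page 0
    [(19, "Kiran"), (3, "Meena")],    # page 1
]

# Nonclustered index: a separate structure of (key, row_locator) pairs sorted by key.
index = sorted(
    (row[0], (page_no, slot_no))
    for page_no, page in enumerate(pages)
    for slot_no, row in enumerate(page)
)
keys = [k for k, _ in index]

def lookup(emp_id):
    """Binary-search the index, then follow the row locator into the heap."""
    i = bisect_left(keys, emp_id)
    if i == len(keys) or keys[i] != emp_id:
        return None
    page_no, slot_no = index[i][1]
    return pages[page_no][slot_no]

print(index)        # [(3, (1, 1)), (7, (0, 1)), (19, (1, 0)), (42, (0, 0))]
print(lookup(19))   # (19, 'Kiran')
```

The index order (3, 7, 19, 42) is independent of the physical row order, which is what separates a nonclustered index from a clustered one.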

2. What are Indexing and Hashing?


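Indexing: Indexing builds a pre-organized auxiliary data structure (an
index) on one or more columns, so records can be located without scanning
the whole file. It offers faster data retrieval, especially for range
queries and ordered records.
Hashing: Hashing applies a hash function to the search key to calculate
the location (bucket) of the record directly. Thanks to this direct
calculation, hashing outperforms indexing when searching for specific
items (exact-match lookups), especially in large databases.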

3. What is an index? Give an example.


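An index is a data structure that maps search-key values to the locations
of the records that contain them, so a record can be found without
scanning the whole file. Indices are usually sorted to make searching
faster; sorted indices are known as ordered indices.
Example: Suppose we have an employee table with thousands of records,
each 10 bytes long, whose IDs start at 1, 2, 3, ... and so on, and we have
to search for the employee with ID 543. Without an index, the DBMS must
read the file from the beginning until it reaches record 543, that is,
543 × 10 = 5,430 bytes. With an ordered index on ID, it searches the much
smaller index and follows the stored pointer directly to the record.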
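The ID-543 example can be made concrete with a small Python sketch. With 10-byte records, a sequential scan that reaches record 543 reads 543 × 10 = 5,430 bytes. A binary search over an ordered index on ID needs only a handful of probes. The sketch assumes a hypothetical table of 1,000 records, since the text only says "thousands". It also models the index as an in-memory sorted list, whereas a real DBMS keeps it on disk pages.

```python
RECORD_SIZE = 10      # bytes per record, from the example
TARGET_ID = 543
N_RECORDS = 1000      # hypothetical table size ("thousands of records")

# No index: read records 1..543 sequentially from the start of the file.
bytes_scanned = TARGET_ID * RECORD_SIZE

# Ordered index on ID: binary search over the sorted key list.
ids = list(range(1, N_RECORDS + 1))

def binary_search(sorted_keys, target):
    """Return (position, number_of_probes); position is -1 if absent."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

pos, probes = binary_search(ids, TARGET_ID)
print(bytes_scanned)   # 5430
print(pos, probes)     # position 542; at most 10 probes for 1,000 keys
```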

4. Discuss primary indexes.


A primary index is an ordered file whose records are of fixed length and
have two fields: the first field is the same as the primary key of the
data file, and the second field is a pointer to the data block where that
key is available.
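Here is a minimal Python sketch of such a primary index. It assumes a hypothetical block size of three records. The index keeps one (anchor key, block number) entry per block, where the anchor is the first key stored in that block. A lookup binary-searches the anchors and then scans a single block.

```python
from bisect import bisect_right

BLOCK_SIZE = 3   # hypothetical number of records per block

# Data file sorted on the primary key, split into fixed-size blocks.
records = [(k, f"row{k}") for k in (2, 5, 8, 11, 14, 17, 20, 23)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Primary index: one (anchor_key, block_no) entry per block.
primary_index = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]
anchors = [key for key, _ in primary_index]

def find(key):
    # The target block is the last one whose anchor is <= key.
    i = bisect_right(anchors, key) - 1
    if i < 0:
        return None
    block_no = primary_index[i][1]
    for k, row in blocks[block_no]:   # scan inside one block only
        if k == key:
            return row
    return None

print(primary_index)  # [(2, 0), (11, 1), (20, 2)]
print(find(14))       # row14
```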

5. What is meant by secondary index?


A secondary index, put simply, is a way to efficiently access records in
a database by means of some piece of information other than the usual
(primary) key, for example finding employees by department instead of by
employee ID.
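As a minimal sketch, the Python code below builds a secondary index on a hypothetical non-key column `dept` of an employee file ordered by `emp_id`. For brevity it uses a Python dict. A real secondary index would more typically be an ordered file or a B+ tree.

```python
from collections import defaultdict

# Employee file ordered on the primary key emp_id; dept is a non-key column.
employees = [
    (101, "Asha", "Sales"),
    (102, "Ravi", "HR"),
    (103, "Kiran", "Sales"),
    (104, "Meena", "IT"),
]

# Secondary index: dept value -> positions of ALL matching records.
# One pointer per record (dense), since the file is not sorted on dept.
dept_index = defaultdict(list)
for pos, (_, _, dept) in enumerate(employees):
    dept_index[dept].append(pos)

def by_dept(dept):
    # .get avoids inserting an empty entry for unknown departments
    return [employees[pos] for pos in dept_index.get(dept, [])]

print([name for _, name, _ in by_dept("Sales")])   # ['Asha', 'Kiran']
```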
UNIT-V (3 Marks)
1. Explain the differences between tree-based and hash-based indexes.
B-tree Indexing
B-tree indexing is widely used in relational database management
systems (RDBMS). It organizes data in a balanced tree structure,
allowing efficient searching, insertion, and deletion operations. Because
the keys are kept in sorted order, a tree-based index supports both
equality and range queries.
Hash Indexing
Hash indexing uses hash functions to map keys to specific locations in a
hash table. It is commonly used where exact-match queries are prevalent;
because the hash function does not preserve key order, it cannot answer
range queries efficiently.
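The practical difference shows up with range queries. In this Python sketch, a sorted list searched with the standard `bisect` module stands in for a tree-based index. It gives the same ordered access, although it is not a real B-tree. A dict stands in for a hash index, and the `ages` values are hypothetical.

```python
from bisect import bisect_left, bisect_right

ages = [28, 35, 40, 22, 31, 35]   # hypothetical column values
rows = list(range(len(ages)))      # row ids

# Ordered (tree-like) index: sorted (key, row) pairs, so ranges are contiguous.
ordered = sorted(zip(ages, rows))
keys = [k for k, _ in ordered]

def range_query(lo, hi):
    """Rows with lo <= age <= hi, found with two binary searches."""
    i, j = bisect_left(keys, lo), bisect_right(keys, hi)
    return [row for _, row in ordered[i:j]]

# Hash index: key -> rows; fast equality lookups, but no key order to exploit.
hashed = {}
for age, row in zip(ages, rows):
    hashed.setdefault(age, []).append(row)

print(range_query(30, 36))   # [4, 1, 5]
print(hashed.get(35, []))    # [1, 5]
```

To answer the same range query, the hash index would have to examine every bucket.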

2. What are the advantages and disadvantages of B+ trees?


Advantages of B+Trees
• A B+ tree with l levels can store more entries in its internal nodes
than a B-tree with the same l levels, because its internal nodes hold only
keys and child pointers. This significantly improves the search time for
any given key. Having fewer levels, and the Pnext pointers that link the
leaf nodes, make B+ trees very quick and efficient at accessing records
from disk.
• Data stored in a B+ tree can be accessed both sequentially and directly.
• Every record takes the same number of disk accesses to fetch, because
all records are stored at the leaf level.
• B+ trees have redundant search keys (a key in an internal node is
repeated in a leaf), whereas a B-tree cannot store a search key more
than once.
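Disadvantages of B+ Trees
• Search keys in internal nodes are repeated in the leaves, which costs
extra space, and every search must descend all the way to a leaf even if
the key appears in an internal node.
• By contrast, the major drawback of the plain B-tree is the difficulty of
traversing the keys sequentially; the B+ tree removes it, retaining the
rapid random access of the B-tree while also allowing rapid sequential
access.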

3. What is the difference between Indexing and Hashing?


1. Indexing:
Indexing, as the name suggests, is a technique generally used to speed up
access to data. An index is a data structure that is used to locate and
access data in a database table quickly. Indexes can be created on one or
more columns of a database table.
2. Hashing:
Hashing is a technique that uses a hash function, with the search key as
its parameter, to generate the address of a data record. It calculates the
direct location of the record on disk without using an index structure.
The hash value generally cannot be converted back into the original key,
so the key is stored with the record and compared once the bucket is
located. In simple words, hashing is the process of converting a given key
into another value, known as the hash value or simply the hash.

4. What is the main difference between ISAM and B+ tree indexes?


Indexed Sequential Access Method (ISAM)
ISAM uses an index file whose entries are (key, pointer) pairs, where the
key is the minimum key value on the page that the pointer points to.
Each index page holds many entries, and each key serves as a separator
between the contents of the pages pointed to by the pointers on its left
and right. The ISAM index is static: it is built when the file is created,
and records inserted later are placed on overflow pages chained to the
leaf pages.
B+ tree
The B+ tree is derived from the ISAM tree, but is fully dynamic with
respect to updates:
o Search performance depends only on the height of the B+ tree.
o There are no overflow pages; the B+ tree remains balanced.
o The B+ tree offers efficient insert/delete procedures, so the underlying
data file can grow and shrink dynamically.
o B+ tree nodes (except the root page) are guaranteed to have a minimum
occupancy of 50%.

5. What are the advantages of using tree structured indexes?


• Efficient searching: Trees are particularly efficient for searching and
retrieving data. The time complexity of searching a balanced tree is
O(log n), so it stays fast even for very large data sets.
• Flexible size: Trees can grow or shrink dynamically depending on the
number of nodes that are added or removed. This makes them
particularly useful for applications where the data size may change over
time.
• Easy to traverse: Traversing a tree is a simple operation, and it can be
done in several different ways depending on the requirements of the
application. This makes it easy to retrieve and process data from a tree
structure.
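The O(log n) property is what keeps tree indexes fast on large data. This Python sketch counts how many levels a tree needs to cover one million keys. It compares a binary tree with a hypothetical fanout of 100 children per node, as a disk-based B+ tree node might have.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels L with fanout**L >= num_keys (integer arithmetic)."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

print(levels_needed(1_000_000, 2))    # 20  (binary tree)
print(levels_needed(1_000_000, 100))  # 3   (hypothetical fanout of 100)
```

A wide node lets the tree reach the data in a few page reads, which is why multi-level tree indexes minimize disk I/O.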

UNIT-V (5 Marks)
1. Explain the insertion and deletion operations in B+ trees with an
example.
Steps for insertion in B+ Tree
Every element is inserted into a leaf node, so first go to the appropriate
leaf node.
Insert the key into the leaf node in increasing order. If the leaf does
not overflow, the insertion is complete; if it overflows, handle the
overflow with the following steps, which maintain the B+ tree properties.
Cases for insertion in B+ Tree (m is the order of the tree: a node has at most m children and m-1 keys)
Case 1: Overflow in leaf node
Split the leaf node into two nodes.
First node contains ceil((m-1)/2) values.
Second node contains the remaining values.
Copy the smallest search key value from the second node to the parent
node (right biased).
[Figure: Illustration of inserting 8 into a B+ tree of order 5]
Case 2: Overflow in non-leaf node
Split the non-leaf node into two nodes.
First node contains ceil(m/2)-1 values.
Move the smallest of the remaining keys up to the parent (it is moved, not copied).
Second node contains the remaining keys.
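The two split cases can be sketched in Python. This is a minimal, in-memory illustration, not a production B+ tree. It assumes unique keys and stores only keys, with no record pointers. The class and method names are hypothetical. Leaf splits copy the separator up (Case 1) and internal splits move it up (Case 2), using m = 5.

```python
import math
from bisect import bisect_right, insort

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (internal nodes only)
        self.next = None     # link to the right sibling (leaf nodes only)

class BPlusTree:
    def __init__(self, m=5):
        self.m = m           # order: at most m children and m - 1 keys per node
        self.root = Node(leaf=True)

    def search(self, key):
        node = self.root
        while not node.leaf:
            node = node.children[bisect_right(node.keys, key)]
        return key in node.keys

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # the root itself split: tree grows upward
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below node; return (separator, new_right_node) if node split."""
        if node.leaf:
            insort(node.keys, key)
            return self._split_leaf(node) if len(node.keys) > self.m - 1 else None
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        return self._split_internal(node) if len(node.keys) > self.m - 1 else None

    def _split_leaf(self, leaf):
        # Case 1: first node keeps ceil((m-1)/2) keys, second gets the rest.
        cut = math.ceil((self.m - 1) / 2)
        right = Node(leaf=True)
        right.keys, leaf.keys = leaf.keys[cut:], leaf.keys[:cut]
        right.next, leaf.next = leaf.next, right
        return right.keys[0], right      # smallest key of the second node is COPIED up

    def _split_internal(self, node):
        # Case 2: first node keeps ceil(m/2) - 1 keys; the next key MOVES up.
        cut = math.ceil(self.m / 2) - 1
        sep = node.keys[cut]
        right = Node(leaf=False)
        right.keys, right.children = node.keys[cut + 1:], node.children[cut + 1:]
        node.keys, node.children = node.keys[:cut], node.children[:cut + 1]
        return sep, right

    def leaves(self):
        """Follow the leaf-level next pointers from left to right."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.append(node.keys)
            node = node.next
        return out

tree = BPlusTree(m=5)
for key in range(5, 126, 10):            # 5, 15, ..., 125
    tree.insert(key)
print(tree.root.keys)                    # [65]
print(tree.leaves())
```

In this run, the fifth key (45) causes the first leaf split and makes 25 the root. The thirteenth key (125) finally overflows the root, which splits and moves 65 up into a new root.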

2. Explain the deletion and insertion operations in ISAM.


In ISAM the index pages are static, so a record that does not fit on its
leaf page is placed on an overflow page chained to that leaf, and deleting
a record simply frees space on its page (an overflow page that becomes
empty is removed). The split and merge steps below are how the dynamic
B+ tree, which is derived from ISAM, handles the same situations.
Insert
If we want to insert a record, we should:
o find the leaf page p where it belongs.
o if the page has enough space, simply insert it.
o otherwise, split page p into p and p', distribute the entries of p and
the new entry onto pages p and p', and insert a new separator into the
parent of p. Splitting happens recursively and may eventually lead to a
split of the root node.
Delete
Deletion is the opposite of insertion.
Here m denotes the number of entries on page p, and d is the order of the
B+ tree (every page except the root holds at least d entries).
To delete a record with key k:
o find the page p where k belongs.
o if m > d, page p has enough occupancy, so simply delete k from page p.
o otherwise, borrow an entry from its right (or left) sibling.
o if that sibling has only d entries and cannot lend one, merge p with
the sibling.
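Classic ISAM avoids splits altogether. The Python sketch below shows that static behaviour, assuming a hypothetical page capacity of two entries. Separator keys are fixed when the file is built, a full leaf receives overflow pages, and deletion only frees space, dropping an overflow page once it is empty. The class names are illustrative.

```python
from bisect import bisect_right

PAGE_CAPACITY = 2   # hypothetical number of entries per page

class LeafPage:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.overflow = []      # chain of overflow pages (each a list of keys)

class ISAM:
    """Static ISAM sketch: separators are fixed when the file is built (needs >= 1 key)."""

    def __init__(self, sorted_keys):
        chunks = [sorted_keys[i:i + PAGE_CAPACITY]
                  for i in range(0, len(sorted_keys), PAGE_CAPACITY)]
        self.leaves = [LeafPage(c) for c in chunks]
        # Separator = minimum key of each leaf after the first; never updated.
        self.separators = [c[0] for c in chunks[1:]]

    def _leaf_for(self, key):
        return self.leaves[bisect_right(self.separators, key)]

    def insert(self, key):
        leaf = self._leaf_for(key)
        if len(leaf.keys) < PAGE_CAPACITY:
            leaf.keys.append(key)
            leaf.keys.sort()
            return
        # Leaf is full: no split. Use the last overflow page or chain a new one.
        if not leaf.overflow or len(leaf.overflow[-1]) == PAGE_CAPACITY:
            leaf.overflow.append([])
        leaf.overflow[-1].append(key)

    def delete(self, key):
        leaf = self._leaf_for(key)
        if key in leaf.keys:
            leaf.keys.remove(key)       # space is freed; the index is unchanged
            return True
        for i, page in enumerate(leaf.overflow):
            if key in page:
                page.remove(key)
                if not page:            # drop an overflow page once it is empty
                    del leaf.overflow[i]
                return True
        return False

    def search(self, key):
        leaf = self._leaf_for(key)
        return key in leaf.keys or any(key in page for page in leaf.overflow)

isam = ISAM([10, 20, 30, 40])           # leaves [10, 20] and [30, 40]
for key in (25, 15, 12):                # all belong to the first, already full leaf
    isam.insert(key)
print(isam.separators)                  # [30]  (index unchanged)
print(isam.leaves[0].overflow)          # [[25, 15], [12]]
```

Long overflow chains are exactly why ISAM performance degrades under heavy inserts. The B+ tree splits pages instead, which keeps the tree balanced.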

3. What are the Pros and Cons of ISAM?


Advantages:
It combines both sequential and direct access.
Suitable for sequential access and random access
Provides quick access to records
Disadvantages:
It uses special software and is expensive
Extra time is taken to maintain index
Extra storage for index files
Expensive hardware is required.
Pros of ISAM
o Because each record has the address of its data block, searching for a
record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records.
Since the index is based on the primary key values, we can retrieve the
data for a given range of values. In the same way, a partial value can
also be easily searched, i.e., student names starting with 'JA' can be
easily searched.
Cons of ISAM
o This method requires extra space on the disk to store the index values.
o When new records are inserted, these files have to be reconstructed
to maintain the sequence.
o When a record is deleted, the space it used needs to be released;
otherwise, the performance of the database slows down.
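Range and partial ('JA') retrieval work because the index is sorted, so all matching keys are contiguous. Here is a minimal Python sketch over a hypothetical sorted index of student names, using the standard `bisect` module.

```python
from bisect import bisect_left, bisect_right

# Index on student name, kept in sorted order (names are hypothetical).
names = sorted(["JAMES", "ANITA", "JANAKI", "RAVI", "JAYA", "JOHN", "MEENA"])

def range_search(lo, hi):
    """Range retrieval: all names with lo <= name <= hi."""
    return names[bisect_left(names, lo):bisect_right(names, hi)]

def prefix_search(prefix):
    """Partial retrieval: names starting with prefix (they are contiguous)."""
    start = bisect_left(names, prefix)
    end = start
    while end < len(names) and names[end].startswith(prefix):
        end += 1
    return names[start:end]

print(prefix_search("JA"))        # ['JAMES', 'JANAKI', 'JAYA']
print(range_search("J", "K"))     # ['JAMES', 'JANAKI', 'JAYA', 'JOHN']
```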

4. Explain Primary and Secondary Indexes.


Primary indexing is defined mainly on the primary key of the data file,
in which the data file is already ordered based on the primary key.
A primary index is an ordered file whose records are of fixed length with
two fields. The first field of the index replicates the primary key of the
data file in an ordered manner, and the second field contains a pointer to
the data block where a record containing the key is available. The first
record of each block is called the anchor record or block anchor. There is
one entry in the primary index file for every block of the data file. The
number of block accesses needed to find a record using the primary index
is $\lceil \log_2 B \rceil + 1$, where B is the number of index blocks: a
binary search over the index takes $\lceil \log_2 B \rceil$ accesses, and
one more access reads the data block.
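Here is a quick worked example of the primary-index cost in Python. The file sizes (3,000 data blocks, 300 index blocks) are hypothetical. The sketch compares binary search on the index plus one data-block access with binary search on the ordered data file itself.

```python
def ceil_log2(n):
    # Exact integer ceil(log2 n) for n >= 1 (avoids floating-point rounding).
    return (n - 1).bit_length()

def primary_index_accesses(index_blocks):
    # Binary search over the index blocks, then one access for the data block.
    return ceil_log2(index_blocks) + 1

def data_file_accesses(data_blocks):
    # Binary search directly on the ordered data file, without an index.
    return ceil_log2(data_blocks)

print(primary_index_accesses(300))   # 10
print(data_file_accesses(3000))      # 12
```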
Secondary Indexing
Secondary indexing is a database management technique used to create
additional indexes on data stored in a database. The main purpose of
secondary indexing is to improve the performance of queries and to
simplify the search for specific records within a database. A secondary
index provides an alternate means of accessing data in a database, in
addition to the primary index.
The primary index is typically created when the database is created and
is used as the primary means of accessing data in the database.
Secondary indexes, on the other hand, can be created and dropped at
any time, allowing for greater flexibility in managing the database.

5. Explain clustered index organization.


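A clustered index is created only when both of the following conditions
are satisfied:
• The data or file that you are moving into secondary memory is in
sequential (sorted) order.
• There is a search key on which the records are ordered. In the textbook
definition of a clustering index this ordering field is a non-key field, so
repeated values are allowed; when the ordering field is a unique key, the
index is a primary index.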
Applying a clustered index to a table physically sorts the rows of that
table by the index key. A table can have only one clustered index, just as
it has only one primary key, because its rows can be stored in only one
order. A clustered index is like a dictionary, where the data is arranged
in alphabetical order.
In a clustered index, the index holds a pointer to a block rather than the
data itself.

Example of Clustered Index
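In SQL Server, for example, declaring a primary key on a column
automatically creates a clustered index on it (unless the table already
has one); MySQL's InnoDB engine likewise stores each table clustered on
its primary key.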
Create Table
Create table Student ( Roll_No int primary key,
Name varchar(50),
Gender varchar(30),
Mob_No bigint );
insert into Student values (4, 'ankita', 'female', 9876543210 );
insert into Student values (3, 'anita', 'female', 9675432890 );
insert into Student values (5, 'mahima', 'female', 8976453201 );
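The effect of clustering on the Student rows above can be mimicked in Python. This is a simplification: it assumes the table is just a list kept sorted on Roll_No, whereas a real clustered index stores rows in B+ tree leaf pages. Rows inserted in the order 4, 3, 5 end up stored as 3, 4, 5. The function name `insert_student` is hypothetical.

```python
from bisect import insort

# Clustered organisation modelled as a list kept sorted on the key Roll_No.
student_table = []

def insert_student(roll_no, name, gender, mob_no):
    # insort places the row at its key position, so the rows themselves
    # are always stored in Roll_No order.
    insort(student_table, (roll_no, name, gender, mob_no))

insert_student(4, "ankita", "female", 9876543210)
insert_student(3, "anita", "female", 9675432890)
insert_student(5, "mahima", "female", 8976453201)

print([row[0] for row in student_table])   # [3, 4, 5]
```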

UNIT-V (10 Marks)


1. Explain hash-based indexing.
In hash-based indexing, a hash function is used to convert a key into a hash
code. This hash code serves as an index where the value associated with that
key is stored. The goal is to distribute the keys uniformly across an array, so
that access time is, on average, constant.

Let's break down some of these elements to further understand how hash-
based indexing works in practice:
Buckets
In hash-based indexing, the data space is divided into a fixed number of slots
known as "buckets." A bucket usually contains a single page (also known as a
block), but it may have additional pages linked in a chain if the primary page
becomes full. This is known as overflow.
Hash Function
The hash function is a mapping function that takes the search key as an input
and returns the bucket number where the record should be located. Hash functions
aim to distribute records uniformly across buckets to minimize the number of
collisions (two different keys hashing to the same bucket).
Disk I/O Efficiency
Hash-based indexing is particularly efficient when it comes to disk I/O
operations. Given a search key, the hash function quickly identifies the bucket (and
thereby the disk page) where the desired record is located. This often requires only
one or two disk I/Os, making the retrieval process very fast.
Insert Operations
When a new record is inserted into the dataset, its search key is hashed to find
the appropriate bucket. If the primary page of the bucket is full, an additional
overflow page is allocated and linked to the primary page. The new record is then
stored on this overflow page.
Search Operations
To find a record with a specific search key, the hash function is applied to the
search key to identify the bucket. All pages (primary and overflow) in that bucket
are then examined to find the desired record.
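The insert and search steps above can be sketched in Python as a static hash index. The bucket count, the page capacity, and the use of Python's built-in `hash()` as the hash function are all assumptions for illustration. The class name is hypothetical.

```python
class StaticHashIndex:
    """Static hashing sketch: N fixed buckets, each a chain of fixed-size pages."""

    def __init__(self, num_buckets=4, page_capacity=2):
        self.num_buckets = num_buckets
        self.page_capacity = page_capacity
        # Each bucket starts with one primary page; overflow pages are appended.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket_no(self, key):
        # Assumed hash function: Python's built-in hash() mod the bucket count.
        return hash(key) % self.num_buckets

    def insert(self, key, record):
        pages = self.buckets[self._bucket_no(key)]
        if len(pages[-1]) == self.page_capacity:
            pages.append([])          # last page full: chain a new overflow page
        pages[-1].append((key, record))

    def search(self, key):
        pages = self.buckets[self._bucket_no(key)]
        # Examine the primary page and every overflow page of this one bucket.
        return [rec for page in pages for k, rec in page if k == key]

index = StaticHashIndex(num_buckets=4, page_capacity=2)
for key in (1, 5, 9):          # small ints hash to themselves, so all map to bucket 1
    index.insert(key, f"record-{key}")
print(len(index.buckets[1]))   # 2 (primary page + one overflow page)
print(index.search(9))         # ['record-9']
```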
Limitations
Hash-based indexing is not suitable for range queries or when the search key is
not known. In such cases, a full scan of all pages is required, which is resource-
intensive.
Hash-Based Indexing Example
Let's consider a simple example using employee names as the search key.
Employee Records
| Name  | Age | Salary |
|-------|-----|--------|
| Alice | 28  | 50000  |
| Bob   | 35  | 60000  |
| Carol | 40  | 70000  |
Hash Function: H(x) = ASCII value of first letter of the name mod 3
• Alice: 65 mod 3 = 2
• Bob: 66 mod 3 = 0
• Carol: 67 mod 3 = 1
Buckets:
Bucket 0: Bob
Bucket 1: Carol
Bucket 2: Alice
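The bucket assignment above can be checked with a few lines of Python that apply H(x) = ASCII value of the first letter mod 3 exactly, using `ord`. The helper names `h` and `find` are illustrative.

```python
employees = [("Alice", 28, 50000), ("Bob", 35, 60000), ("Carol", 40, 70000)]

def h(name, num_buckets=3):
    # H(x) = ASCII value of the first letter of the name mod 3
    return ord(name[0]) % num_buckets

buckets = {b: [] for b in range(3)}
for emp in employees:
    buckets[h(emp[0])].append(emp)

def find(name):
    # Only the one bucket chosen by H(name) has to be examined.
    return [emp for emp in buckets[h(name)] if emp[0] == name]

print({b: [e[0] for e in rows] for b, rows in buckets.items()})
# {0: ['Bob'], 1: ['Carol'], 2: ['Alice']}
print(find("Carol"))   # [('Carol', 40, 70000)]
```

Any other name starting with A, B or C would collide with Alice, Bob or Carol. This shows how a poor hash function concentrates records in a few buckets.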
Pros of Hash-Based Indexing
• Extremely fast for exact match queries.
• Well-suited for equality comparisons.
Cons of Hash-Based Indexing
• Not suitable for range queries (e.g., "SELECT * FROM table WHERE age
BETWEEN 20 AND 30").
• Performance can be severely affected by poor hash functions or a large number
of collisions.

2. Explain tree-based indexing.


The most commonly used tree-based index structure is the B-Tree, and its variations like B+
Trees and B* Trees. In tree-based indexing, data is organized into a tree-like structure. Each node
represents a range of key values, and leaf nodes contain the actual data or pointers to the data.
Why Tree-based Indexing?
Tree-based indexes like B-Trees offer a number of advantages:
• Sorted Data: They maintain data in sorted order, making it easier to perform range
queries.
• Balanced Tree: B-Trees and their variants are balanced, meaning the path from the root
node to any leaf node is of the same length. This balancing keeps data retrieval
times consistently fast, even as the dataset grows.
• Multi-level Index: Tree-based indexes can be multi-level, which helps to minimize the
number of disk I/Os required to find an item.
• Dynamic Nature: B-Trees are dynamic, meaning they handle inserting and deleting
records without requiring full reorganization.
• Versatility: They are useful for both exact-match and range queries.

Tree-based Indexing Example


Consider a "Students" table:
| ID | Name    |
|----|---------|
| 1  | Abhi    |
| 2  | Bharath |
| 3  | Chinni  |
| 4  | Devid   |
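A simplified B+ tree index on ID could look like this:
        [3]
       /   \
  [1, 2] -> [3, 4]
IDs less than 3 are reached through the left pointer of the root and IDs of
3 or more through the right pointer. The leaves hold the IDs (with pointers
to the student records) and are linked from left to right. Navigating from
the root to a leaf node leads us to the desired data record.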
Pros of Tree-based Indexing:
• Efficient for range queries.
• Good for both exact and partial matches.
• Keeps data sorted.
Cons of Tree-based Indexing:
• Slower than hash-based indexing for exact queries.
• More complex to implement and maintain.
3. Explain the comparison of file organizations.
File organization, indexing, and performance tuning are three interconnected areas in the realm
of Database Management Systems, each contributing to the overall efficiency and effectiveness of data
storage and retrieval. Below is a comparison of these three concepts, focusing on their objectives,
methodologies, and implications.
File Organizations
1. Objective: To physically store records on storage media in an organized manner.
2. Methodologies: Includes sequential, random (or direct), and hashed file organizations, among
others.
3. Implications:
• Sequential organization is suitable for batch processing but inefficient for random access.
• Direct or random organization allows fast access but can be inefficient in terms of storage
space.
• Hashed file organization is excellent for equality searches but not for range-based
queries.
4. Real-world Examples: Ledger systems, log files, archival systems.
Indexing
1. Objective: To create a data structure that improves the speed of data retrieval operations.
2. Methodologies: Includes clustered, non-clustered, primary, secondary, composite, bitmap,
and hash indexes, among others.
3. Implications:
• Clustered indexes are excellent for range-based queries but slow down insert/update
operations.
• Non-clustered indexes improve data retrieval speed but can take up additional storage.
• Bitmap indexes are useful for low-cardinality columns.
4. Real-world Examples: Search engines, e-commerce websites, any application that requires
fast data retrieval.
Performance Tuning
1. Objective: To optimize the resources used by the database for efficient transaction processing.
2. Methodologies: Query optimization, index tuning, denormalization, database sharding,
caching, partitioning, etc.
3. Implications:
• Query optimization can dramatically reduce the resources needed for query processing.
• Proper indexing can mitigate the need for full-table scans.
• Denormalization and caching can improve read operations but may compromise data
integrity or consistency.
4. Real-world Examples: Financial trading systems, real-time analytics, high-performance
computing.
