0% found this document useful (0 votes)
2 views

DBMS Series Part-2

The document provides an overview of Database Management Systems (DBMS), focusing on query optimization techniques, physical storage, file organization, and indexing methods. It discusses strategies for improving query performance, such as using indexes efficiently, optimizing JOIN operations, and caching results. Additionally, it explains the differences between sparse and dense indexing, their use cases, and the concept of multilevel indexing for managing large datasets.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

DBMS Series Part-2

The document provides an overview of Database Management Systems (DBMS), focusing on query optimization techniques, physical storage, file organization, and indexing methods. It discusses strategies for improving query performance, such as using indexes efficiently, optimizing JOIN operations, and caching results. Additionally, it explains the differences between sparse and dense indexing, their use cases, and the concept of multilevel indexing for managing large datasets.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 80

LET'S LEARN

DATABASE MANAGEMENT
SYSTEM (DBMS)
-By Riti Kumari
LET’S START WITH DBMS :)
Query Optimization
Whenever we want to improve the efficiency of a query, make it execute faster and
consume fewer resources we use query optimization techniques.

Optimizing SQL queries is essential for improving the performance of database


applications

1.Use Indexes Efficiently: Index the columns frequently used in WHERE, JOIN,
ORDER BY, and GROUP BY clauses to speed up data retrieval.

2.Select Only Necessary Columns : Avoid SELECT *, specify only the columns you
need in your query. This reduces the amount of data transferred and processed.
LET’S START WITH DBMS :)
Query Optimization
3. Optimize JOIN Operations : Choose the Right JOIN Type and use indexed
columns in Join conditions.

4. Partition Large Tables: For very large tables, consider partitioning them to
improve query performance. Partitioning allows the DBMS to scan only relevant
partitions, reducing I/O and improving response times.

5. Cache results : Cache the results of frequently executed queries to avoid


redundant calculations.
LET’S START WITH DBMS :)
Physical Storage And File Organization

Physical Storage

Storage device : For storing the data, there are different types of storage options
available.
Primary Storage -> Main memory and cache (fastest and expensive and as soon
as the system leads to a power cut or a crash, the data also get lost. )

Secondary Storage -> Flash Memory and Magnetic Disk Storage (non-
volatile)save and store data permanently.

Tertiary Storage-> Optical disk and Magnetic tape (used for data backup and
storing large data offline)
LET’S START WITH DBMS :)
Physical Storage And File Organization

Databases store data in files on disk. Each table or index may correspond to one or
more files. Each file has sequence of records.
Accessing data from RAM is much faster than accessing data from a hard disk or
SSD. This is because RAM is designed for high-speed read and write operations. Data
retrieval in RAM typically involves fetching data in nanoseconds.

RAM : Provides fast, volatile memory for quick data retrieval. Data is accessed via
addressable locations and cache memory enhances speed.

Hard Disk: Provides slower, non-volatile storage with mechanical components


that impact access speed. Data retrieval involves moving the read/write head and
waiting for disk rotation.
LET’S START WITH DBMS :)
Physical Storage And File Organization

When a program or application needs to access data, it first checks if the data is
available in RAM. If the data is not in RAM, it must be loaded from a slower storage
device like a hard disk or SSD. Index files in a Database Management System
(DBMS) are stored on disk but are also managed in RAM to optimize performance

Query

RAM
CPU Hardisk
(Volatile)
(non-volatile)
LET’S START WITH DBMS :)
Physical Storage And File Organization
How file is organized?

Snam
Rollno SAge
e
Files are allocated on the
hard disk in contiguous
blocks to reduce
fragmentation.

DB HARDISK
No of records stored in each block= Size of block/size of record
Records can be stored in 2 ways-> sorted(seraching is fast) or
unsorted(searching is slow, insertion is fast)
LET’S START WITH DBMS :)
Indexing and its types
Why do we need indexing?
Imagine you have a 1,000-page textbook and you want to serach a topic “xyz”.

When indexing is not present-> You need to manually search for each term in
the book, which could be frustrating and time-consuming.

When indexing is present->The index allows you to quickly locate each topic.
You can look up "xyz," find that it's covered on pages 448-490.This makes it
more efficient and less time-consuming.

Indexing is a critical technique in database management that significantly improves


the performance of query operations by minimizing the no of disk access. Index
table is always sorted
LET’S START WITH DBMS :)
Indexing and its types
Why do we need indexing?
In the same way how we have used indexing in book , it helps us to find specific
records or data entries in a large table or dataset.

Search key Data acccess

It contains the address


where the data item is
It is used to find the record
stored in memory for the
provided search key

It helps in Index table set of pointer


finding what a holding address
user is looking of a block
for
LET’S START WITH DBMS :)
Indexing and its types
How it helps?

Improved Query Performance: Indexes allow the database management


system (DBMS) to locate and retrieve data much more quickly than it could by
scanning the entire table.

Indexes can be used to optimize queries that sort data (ORDER BY) or group
data (GROUP BY) : When an index is available on the columns being sorted or
grouped, the DBMS can retrieve the data in the desired order directly from the
index, eliminating the need for additional sorting operations.

We have certain indexing methods like Dense(index entry for all the records)
and Sparse(index entries for only a subset of records) index.
LET’S START WITH DBMS :)
Indexing and its types

Sparse Index: A sparse index is a type of database indexing technique where the
index does not include an entry for every single record in the database. Instead, it
contains entries for only some of the records, typically one entry per block (or
page) of data in the storage. This makes sparse indexes smaller and faster to
search through compared to dense indexes, which have an entry for every record.

It is used when we have an ordered data set.

EX: If a table has 1,000 rows divided into 100 blocks, and you create a sparse
index, the index might only have 100 entries, with each entry pointing to the first
record in each block.
Index table
RollNo Name
1 Ram
Search key data ref
2 Shyam B1
3 1 Ram
1 B1

4 2 Shyam
4 B2 5 B2
6 3 Raghu

Sparse Index
4 Riti
no of records in index disk
file=no of blocks 5 Raj

6 Rahul

How Sparse Indexing Works:


Primary Index: Sparse indexes are often used with primary indexes, where the
table is sorted on the indexed column. The sparse index only includes an entry
for the first record in each block or page.

Searching: When searching for a specific value, the database uses the sparse
index to quickly locate the block where the record might reside. It then searches
within the block to find the exact record.
Index table
1 RollNo Name
Search key data ref
2 B1
3 1 Ram
1 B1

Disadvantages of Sparse Indexes 4 B2


4
5 B2
2 Shyam

Additional I/O Operations: After using the index Sparse Index


3 Raghu

no of records in index disk 4 Riti

to find the correct block, an additional search file=no of blocks 5 Raj

within the block is required to find the specific 6 Rahul

record.

Less Efficient for Random Access: If queries


often need to retrieve random, non-sequential
records, a dense index might be more efficient.

Use Cases:
Primary Indexing on Large Tables: Sparse indexes are ideal for primary
indexes on large tables where records are sequentially stored.
Range Queries: Sparse indexes can be effective for range queries where
the query needs to retrieve a range of records rather than a single record.
LET’S START WITH DBMS :)
Indexing and its types

Dense Index: A dense index is a type of indexing technique used in databases where
there is an index entry for every single record in the data file. This means that the index
contains all the search keys and corresponding pointers (or addresses) to the actual
data records, making it highly efficient for direct lookups.

It is used when we have an un-ordered data set.

EX : Consider a table with 1,000 rows, and you create a dense index on a non-primary
key column. The dense index will have no ofunique non key records, each pointing to a
specific row in the table.

These are useful in scenarios where random access to individual records is quiet
frequent .
18 Ram Age Name
Search key data ref 18. Shyam B1
19 18 Ram
18 B1
20 18 Shyam
20 B2
19 B1
21
19 Raghu
20 B2
Dense Index 20 Riti
21 B2
no of records in index file=no 20 Raj

Index table of blocks unique non key


21 Rahul
records

How Dense Indexing Works:


In a dense index, each index entry includes:
A search key (the value of the indexed column).
A pointer (or address) to the actual record in the data file.
When a query searches for a specific value, the database uses the dense index to
quickly locate the corresponding record.
18 Ram Age Name
Search key data ref 18. Shyam B1
19 18 Ram
18 B1
Disadvantages of Dense Indexes 19 B1
20
20 B2
18 Shyam

Storage Intensive: Requires significant storage


21
19 Raghu
20 B2
Dense Index
space because it maintains an entry for every
20 Riti
21 B2
no of records in index file=no 20 Raj
record in the table. Index table of blocks unique non key
21 Rahul
records

Maintenance Overhead: Inserting, updating, or


deleting records requires updating the dense
index, which can be costly in terms of
performance, especially for large tables.
Use Cases:
Primary Indexing: Dense indexes are often used for primary indexing when
the table is small to medium-sized, and quick access to individual records
is required.
Exact Match Queries: Ideal for situations where queries frequently request
individual records based on an exact key match.
LET’S START WITH DBMS :)
Indexing and its types
Index table
B1 B2
Search key data ref
1-10 11-20
1 B1
B3 B4
21-30 31-40
11 B2
Hardisk
21 B3

When a query is run, the database engine checks the index to find the pointers
to the rows that contain the desired data. It then retrieves these rows directly,
without scanning the entire table.
LET’S START WITH DBMS :)
Indexing and its types
How indexing works?
B1 B2
It takes a search key as input and then returns a collection of all the matching
1-10 11-20
records. Now in index there are 2 columns, the first one stores a duplicate of the
key attribute/ non-key attribute(serach key) from table while the second one
stores pointer which hold the disk block address of the corresponding key-value.
LET’S START WITH DBMS :)
Indexing and its types
Indexing

Clustered Index Multilevel Indexing Non clustered Index

Primary index When there is one more


level of indexing in the index
Secondary index
When key= PK table we use inner and outer
When key= non-key/key
order= sorted index.
Follows sparse indexing order= unsorted
Mostly used for large files
Follows dense indexing

Clustering Index
When key= non-key
order= sorted
Follows dense indexing
LET’S START WITH DBMS :)
Indexing and its types

Key data: Unique identifiers for each record. Often used to distinguish one
record from another.
Non-key data: All other data in the record besides the key.

Single-Level Indexing is straightforward and works well for smaller


databases, offering a direct approach to speeding up query performance.
Multi-Level Indexing is more complex but necessary for larger databases, as
it effectively manages large indexes by creating additional layers of indexing,
thereby improving data retrieval times.
LET’S START WITH DBMS :)
Primary Index
It enhances the efficiency of retrieving records by using their primary key values. It establishes
an index structure that associates primary key values with disk block addresses. This index is
composed of a sorted list of primary key values paired with their corresponding disk block
pointers, thereby accelerating data retrieval by reducing search time. It is especially beneficial
for queries based on primary keys. It is done on sorted data
1 RollNo Name
Search key data ref 2 B1
3 1 Ram
1 B1
4 2 Shyam
4 B2 5 B2
6
3 Raghu
It creates an index
structure that maps Sparse Index 4 Riti

primary key values to no of reocrds in index 5 Raj


disk block addresses. file=no of blocks
6 Rahul
Index table
LET’S START WITH DBMS :)
Primary Index

Example : In a Users table with a UserID column as the primary key, a


primary index on UserID allows for quick retrieval of user records when you
know the UserID

The primary index ensures that the database can immediately locate the
row associated with a given UserID, making operations like SELECT,
UPDATE, and DELETE highly efficient.
LET’S START WITH DBMS :)
Cluster Index
The cluster index is generally on a non- key attribute. The data in the table is typically
ordered according to the order defined by the clustered index. Mostly each index entry
points directly to a row. It is mostly used where we are using GROUP BY. It physically
order records in a table based on the indexed columns.
Age Name
Search key data ref 18
18 B1
19 18 Ram
18 B1

20 18 Shyam
19 B1
20 B2
21 19 Raghu
20 B2

21 B2 Dense Index 20 Riti

no of reocrds in index file=no 20 Raj

Index table of blocks unique non key


21 Rahul
records
LET’S START WITH DBMS :)
Cluster Index
Example : In an Employees table, if you often need to produce lists of employees
ordered by LastName, creating a clustered index on LastName can make these
queries faster.

The clustered index keeps the rows in LastName order, reducing the need for the
database to perform additional sorting when executing queries.
LET’S START WITH DBMS :)
Multilevel Index
A multilevel index is an advanced indexing technique used in database management
systems to manage large indexes more efficiently. When a single-level index
becomes too large to fit into memory, multilevel indexing helps by breaking down
the index into multiple levels, reducing the number of disk I/O operations needed to
search through the index

First Level (Primary Index): The first level is the original index, where each entry
points to a block of data or a page in the table.
Second Level (Secondary Index): If the first level index is too large to fit in
memory, a second-level index is created. This index points to blocks of the first-
level index, effectively indexing the index itself.
LET’S START WITH DBMS :)
Multilevel Index
Eid Pointer Eid Data

Eid Pointer E101 B1 E101


E102
E151 B2 .
E101
.

E201 Eid Pointer


E151
E201 B3 E152
E301 .
.
E251 B4

Outer index E201


Eid Pointer E202
.
.
E301 B5

E351 B6

Inner index
LET’S START WITH DBMS :)
Secondary Index
Secondary indexing speeds up searches for non-key columns in a database.
Unlike primary indexing, which uses the primary key, secondary indexing
focuses on other columns. It creates an index that maps column values to the
locations where the records are stored, making it faster to find specific records
without scanning the entire table.

The data is mostly unsorted and it can be performed on both key and non-key
attributes.
LET’S START WITH DBMS :)
Secondary Index
1. when serach key is key attribute and data is unsorted

Search key data ref Age


18
18 B1 20
B1
Dense Index 45
67
no of reocrds in index 20 B1 68

file=no of blocks unique 22 B2


22
non key records 24 B2
24 B2 30

.
.
.
.
.
.
.

Index table
LET’S START WITH DBMS :)
Secondary Index
2. when serach key is non- key attribute and data is unsorted

Search key data ref Age


18
18 B1 18
B1
Dense Index 20
67
no of reocrds in index 20 B1 22

file=no of blocks unique 22 B1->B2


22
non key records 24 B2
24 B2 30

.
.
.
.
.
.
.

Index table
LET’S START WITH DBMS :)
Indexing and its types
How to create indexes
Non-Clustered index: Clustered index:
(On one column) Automatically created with the primary
CREATE INDEX index_name key. MySQL does not support explicit
ON table_name (column_name); creation of additional clustered indexes.

(On multiple column)


CREATE INDEX index_name
ON table_name (column_name1, column_name2,.....)
Remove an index
DROP INDEX index_name ON table_name
LET’S START WITH DBMS :)
B and B+ trees
Clustered and non-clustered indexes are concepts that describe how data is stored and
accessed in a database, but B-trees (and B+ trees) are the data structures that actually
implement these indexes. Knowing B-trees gives you insight into how these indexes work under
the hood.

Understanding B-trees helps you understand why certain queries perform well or poorly based
on the structure of the index. For example, how a B-tree's balanced nature affects search times
or why range queries are efficient with B-tree indexes

B-trees provide the balanced, efficient structure that makes these types of indexes performant,
ensuring that operations like search, insert, delete, and update are done in logarithmic time
(O(log n)). This makes B-tree indexing crucial for optimizing database queries, whether in
clustered or non-clustered index scenarios.
LET’S START WITH DBMS :)
B trees(M-way tree)

B-trees are self-balancing tree data structures that maintain sorted data and allow
searches, sequential access, insertions, and deletions in logarithmic time. All leaf
nodes are at the same level.

In B trees you can have minimum 2 children and max x children.


Now B-tree is a generalisation of Binary Search trees. In BST every node can have
atmost 2 children(0,1,2) and only one key

key root key

key
key key

Binary tree
B-tree
Root node
LET’S START WITH DBMS :)
internal node
B trees
Dp-> record/data pointer(where the record is present
in secondary memory(disk) Leaf node
Tp-> block/tree pointer(links to the children nodes)

Structure of B-Tree:
Nodes: A B-tree is composed of nodes, each containing keys and pointers (references) to
child nodes. The keys within a node are sorted in ascending order.
Root, Internal Nodes, and Leaves:
keys Rp
1. The root node is the topmost node in the tree. Tp
2. Internal nodes contain keys and child pointers.
3. Leaf nodes contain keys and possibly pointers to records or other data.
In indexing, each key in a B-tree node typically represents a value or range of values, and
the associated pointer directs to a data block where records corresponding to the key(s)
can be found.
For example, in a database, the key might be a value in a column, and the pointer might
direct to the location of a row or a set of rows in a table.
Root node
LET’S START WITH DBMS :)
internal node
B trees
A B-tree of order x can have:
Leaf node
The max no. of children
For every node -> x i.e the order of tree.
The max no. of keys Ceiling-> upper value
For every node -> x-1 floor-> lower value
The min no of keys
a. root node -> 1 K1 K2
b. other nodes apart from root-> ceiling(m/2) -1
The min no. of children
a. root node -> 2
b. Leaf node ->0
c. internal node-> ceiling(m/2)

Insertion in B-tree always happens from leaf node.


Root node
LET’S START WITH DBMS :)
internal node
B trees
To determine the order of a B-tree when the
Leaf node
block size, block pointer size, and data pointer
size are given :
For a B-tree node with m children:
m×Pb​+(m−1)×(K+Pd​)≤B Number of Keys: A node with m children can
have a maximum of m-1 keys.
B- Block size Number of Block Pointers: Each node has m
m- Order of tree block pointers (pointers to child nodes).
Pb- Block pointer size Number of Data Pointers: Each key has an
K- Key size associated data pointer, so there are m-1
Pd- Data pointer size data pointers.
To determine the order of a B-tree when the

LET’S START WITH DBMS :) block size, block pointer size, and data pointer
size are given :
B trees
m×Pb​+(m−1)×(K+Pd​)≤B
Let's say you have the following:
Block size (B) = 1024 bytes B- Block size
Block pointer size (Pb) = 8 bytes m- Order of tree
Pb- Block pointer size
Data pointer size (Pd) = 12 bytes K- Key size
Key size (K) = 16 bytes Pd- Data pointer size
Find the order of the tree.

Formula to find order : m×Pb​+(m−1)×(K+Pd​)≤B


mx8+(m-1)x(16+12) <=1024

So, the order of the B-tree is m = 29


LET’S START WITH DBMS :)
Insertion in B-TREE
Steps to insert values in B-tree
1.Start at the root and recursively move down the tree to find the appropriate leaf node where the new
value should be inserted.

2. Insert the value into the leaf node in sorted order. If the leaf node has fewer than the maximum
allowed keys (order - 1), this step is simple.

3. If the leaf node contains the maximum number of keys after the insertion, it causes an overflow.
Split the Node:
Divide the node into two nodes. The middle key (median) is pushed up to the parent node.
The left half of the original node stays in place, while the right half forms a new node.
Insert the Median into the Parent:
If the parent node also overflows after this insertion, recursively split the parent node and
propagate the median up the tree.

4.If the root node overflows (which can happen if it already has the maximum number of keys), split it
into two nodes, and the median becomes the new root. This increases the height of the B-tree by one.
A B-tree of order x can have:

LET’S START WITH DBMS :)


Every node will have max x children i.e the
order of tree.
Every node can have max (x-1) keys
The min no of keys
Insertion in B-TREE a. root node -> 1
b. other nodes apart from root-> ceiling(m/2) -1
Create a B-tree of Order 3 and insert values from 1,4,6,8,10,12,14,16 Min children
a. root node -> 2

Max children= order of tree=3 b. Leaf node ->0


c. internal node-> ceiling(m/2)

Max keys node can have=order-1=3-1=2

Insertion in B-tree happens from leaf node and values are also inserted
in sorted order. The element in left of root would be less than root and the
element in right would be greater than root

Step 1: Start at the root and recursively move down the tree to find the appropriate
leaf node where the new value should be inserted. Insert the value into the leaf
node in sorted order. If the leaf node has fewer than the maximum allowed keys
(order - 1), this step is simple.
1 4
Max children= order of tree=3
LET’S START WITH DBMS :) Max keys node can have=order-1=3-1=2
Insertion in B-TREE
Create a B-tree of Order 3 and insert values from 1,4,6,8,10,12,14,16

Step 2: If the leaf node contains the maximum number of keys after the insertion, it
causes an overflow.
Split the Node: Divide the node into two nodes. The middle key (median) is pushed up
to the parent node.
The left half of the original node stays in place, while the right half forms a new
node.
Insert the Median into the Parent:
If the parent node also overflows after this insertion, recursively split the parent
node and propagate the median up the tree.
4 4 (left- 1 , right- 6)

1 6
Max children= order of tree=3
LET’S START WITH DBMS :) Max keys node can have=order-1=3-1=2
Insertion in B-TREE
Create a B-tree of Order 3 and insert values from 1,4,6,8,10,12,14,16
4 (left- 1 , right- 6,8)
4

1 6 8

Follow step-2 again

4 (left- 1 , right- 6)
4 8
8(right -10,12)
1 6 10 12
Max children= order of tree=3
LET’S START WITH DBMS :) Max keys node can have=order-1=3-1=2
Insertion in B-TREE
Create a B-tree of Order 3 and insert values from 1,4,6,8,10,12,14,16
Step 3:If the root node overflows (which can happen if it already has the maximum
number of keys), split it into two nodes, and the median becomes the new root. This
increases the height of the B-tree by one.
4 (left- 1 , right- 6)
8
8(left-4, right-12)
4 12 12(left -10,right -14)

1 6 10 14
maximum of (m-1)=3 keys
LET’S START WITH DBMS :) and minimum of (ceil(m/2)-1)=1 key
Deletion in B-TREE
Consider a B-tree of order 4
]
Step 1: Begin at the root and recursively move 40, 50
down the tree to find the node that contains
the key to be deleted. 10, 20 44, 47 52, 59

Step 2 :
Case 1: The Key is in a Leaf Node (Delete 10)
Simply remove the key from the leaf node. ]
If the node still has the minimum required number 40, 50
of keys (i.e., at least ceil(order/2) - 1 keys), the
deletion is complete. 20 44, 47 52, 59
maximum of (m-1)=3 keys
LET’S START WITH DBMS :) and minimum of (ceil(m/2)-1)=1 key
Deletion in B-TREE
Consider a B-tree of order 4 ]
40, 50
Step 2 :
Case 2: The Key is in a Leaf Node (Delete
20) 20 44, 47 52, 59
If the deletion causes the node to have
fewer than the minimum number of keys,
proceed to the Borrowing or Merging
step.

1. If the node has a sibling with more than ]


47, 50
the minimum number of keys, you can
borrow a key from this sibling. The
parent key between the node and the 40 44 52, 59
sibling moves down to the node, and a
key from the sibling moves up to the
parent.
maximum of (m-1)=3 keys
LET’S START WITH DBMS :) and minimum of (ceil(m/2)-1)=1 key
Deletion in B-TREE
Consider a B-tree of order 4
]
47, 50
Step 2 :
Case 2: The Key is in a Leaf Node (Delete 40)
40 44 52, 59
2. If borrowing is not possible (i.e., the sibling
also has the minimum number of keys), merge
the node with a sibling. The key from the
parent that separates the two nodes moves
]
down into the newly merged node.If this 50
causes the parent to have too few keys,
repeat the borrowing or merging process at 44 47 52, 59
the parent level.
]
40, 50
LET’S START WITH DBMS :)
Deletion in B-TREE 20 44, 47 52, 59
Steps on how we can delete elements in B-
tree
Step 2: 10,11 22,23 43 48 60
Case 2: The Key is in an Internal Node (Delete 20)
Replace the key with its predecessor (the largest key in the left subtree) or its
successor (the smallest key in the right subtree).
Delete the predecessor or successor key from the corresponding subtree.
If this causes an underflow (i.e., a node has too few keys), proceed to the
Borrowing or Merging step. 40, 50

11 44, 47 52, 59

10 22,23 43 48 60
LET’S START WITH DBMS :)
B+ Tree
A B+ tree is an extension of the B-tree and is commonly used in databases and
file systems to maintain sorted data and allow for efficient insertion, deletion, and
search operations. B+ tree is a balanced tree, meaning all leaf nodes are at the
same level

The key difference between a B+ tree and a B-tree lies in how they store data
and how leaf nodes are structured.
1. In a B+ tree, all actual data (or references to data) are stored in the leaf nodes.
Internal nodes only store keys

leaf node
LET’S START WITH DBMS :)
B+ Tree

2. In a B+ tree, Leaf nodes are linked together in a linked list fashion, allowing for
efficient sequential access, one leaf node will have only 1 Bp.

leaf nodes

During inserting value in B+ tree, a copy of the key is always stored in


the leaf node.
LET’S START WITH DBMS :)
B+ Tree

3. Internal nodes do not store data pointers, only keys and child pointers. This allows
more keys to be stored in each internal node, leading to a lower height and more
efficient operations.

internal nodes
LET’S START WITH DBMS :)
B+ Tree

Structure of a B+ Tree:
Root Node:
The top node of the B+ tree, which points to the first level of
internal nodes or directly to leaf nodes if the tree has only one
level.
Internal Nodes:
These nodes contain only keys and pointers to child nodes.
They guide the search process down to the correct leaf node.
Leaf Nodes:
These nodes contain keys and data pointers. Each leaf node
stores a pointer to the next leaf node, enabling quick traversal
of records.
LET’S START WITH DBMS :)
B+ Tree
Advantages :

1. B+ trees have a balanced structure, meaning all leaf nodes are at the same
level. This balance ensures that search operations require logarithmic time
relative to the number of keys, making it very efficient even for large
datasets.
2. The structure of the B+ tree allows for direct access to data. Since the
internal nodes contain only keys, searching for a specific value can be done
quickly by navigating through the tree down to the leaf node where the data
is stored.
B- Block size
LET’S START WITH DBMS :) m- Order of tree
B+ Tree Pb- Block pointer size
Order of B+ tree K- Key size
Pd- Data pointer size
Order of Leaf node :
leaf nodes
1xPb+M(k+Pd)<=B

Order of non-Leaf node(internal,root) :


internal nodes
mxPb+(m-1)k<=B
LET’S START WITH DBMS :)
Difference between B and B+ Tree

B-Tree Example: Think of a B-Tree as a directory structure on your computer


where folders (nodes) contain both names (keys) and pointers to subfolders
or files (child pointers).

B+ Tree Example: Imagine a library catalog where all book records are listed
in a sorted, linked list (leaf nodes), and the internal nodes only contain
pointers to guide you to the right part of the catalog.
LET’S START WITH DBMS :)
Difference between B and B+ Tree
Case B tree B+ tree

keys and associated pointers to data are all actual data is stored only in the leaf
Data Storage stored in all nodes (internal and leaf nodes.Internal nodes contain only keys
nodes) and pointers to child nodes

Leaf nodes are linked together in a linked


There is no inherent linked structure list. This linked structure allows for
Leaf Node Linking
between the leaf nodes efficient sequential access, making range
queries faster and easier

Access times can be slower for certain types


Leaf nodes are linked, allowing for efficient
Search Performance of queries because you may need to traverse
sequential access and range queries.
multiple levels of the tree to find the data.

B+ trees typically have better space utilization


Since data is stored throughout the tree, B-
because internal nodes are used only for
Space Utilization trees might have lower space utilization in the
keys, allowing more keys to be stored per
nodes.
node.
LET’S START WITH DBMS :)
Scaling in Databases

Process of user sending a request to access data on the internet


LET’S START WITH DBMS :)
Scaling in Databases

Process of user sending a request to access data on the internet

But this system is not scalable


as the number of user grows
it will require scalability in our systems
LET’S START WITH DBMS :)
Scaling in Databases

How to scale a Database?


LET’S START WITH DBMS :)
Scaling in Databases

How to scale a Database?


LET’S START WITH DBMS :)
Scaling in Databases

User Journey

1. User Action: Sending a Request


2. Browser Parses the URL
3. DNS Lookup
4. Establishing a Connection
5. Sending the HTTP Request
6. Server Processing
7. Browser Receives the Response
8. Displaying the Web Page
9. Caching
LET’S START WITH DBMS :)
Scaling in Databases

The user sends a request by entering a URL or clicking a link. The browser
then translates the domain name into an IP address through DNS lookup,
establishes a connection with the server, sends an HTTP request, and
receives a response.

The browser then processes this response, rendering the webpage and
displaying it to the user. The process involves multiple layers of networking,
data transfer, and client-server interaction, all working together to deliver the
desired content to the user’s screen.
LET’S START WITH DBMS :)
Scaling in Databases

Scaling in databases is a crucial topic in development, mostly when


applications grow and the volume of data increases.

Vertical Scaling (Scaling Up)


Horizontal Scaling (Scaling Out)
Database Partitioning
Replication
Sharding
Caching
Load Balancing
Increase the
LET’S START WITH DBMS :) capacity of
instances
Scaling in Databases RAM,CPU

Vertical Scaling (Scaling Up)

Vertical scaling involves increasing the capacity of a single database server.


Adding More Resources: Upgrading the server by adding more RAM, faster
processors, or larger and quicker storage. This helps the server handle more
requests and store more data.

Improving Performance: Tweaking the database settings, optimizing queries, and


indexing tables to make better use of the hardware.

Upgrading to a Stronger Machine: Moving the database to a more powerful server,


like switching from a regular server to a high-performance cloud instance.
LET’S START WITH DBMS :)
Scaling in Databases

Vertical Scaling (Scaling Up)

Advantages:
Easy to set up because it usually doesn't need changes to the app or
database structure.
Simple to manage since everything is kept on one server.

Disadvantages:
Limited by the hardware's maximum capacity.
Expensive, as high-end hardware can be very costly.
Single point of failure: if the server fails, the entire database goes offline.
LET’S START WITH DBMS :) add more instances

Scaling in Databases
Horizontal Scaling (Scaling Out):

Horizontal scaling involves spreading the database workload across multiple servers. This
can be done in different ways:

a. Database Replication:
Replication is the process of copying data from one main server (master) to one or more
backup servers (slaves).
Master-Slave Replication: The master server manages all write operations, while read
operations are distributed among the slave servers. This setup improves read
performance and provides redundancy.

Multi-Master Replication: Multiple servers handle both read and write operations,
increasing availability and write capacity. However, it also introduces challenges in
managing conflicts.
Shard
LET’S START WITH DBMS :) T1 T2 T3
Scaling in Databases
Divide the data between multiple tables
Horizontal Scaling (Scaling Out): in diff db instances

b. Database Sharding
Sharding involves partitioning the database into smaller, more manageable pieces
(shards), with each shard stored on a separate server.

Range-Based Sharding: Data is divided based on ranges of a particular key, such as


date or user ID.
Hash-Based Sharding: A hash function is applied to a key (e.g., user ID) to determine
which shard the data should go to.
LET’S START WITH DBMS :)
Scaling in Databases
Caching: Implement caching layers (e.g., Redis, Memcached) to store frequently
accessed data in memory, reducing the load on the database.

Load Balancing: Distribute database queries across multiple servers using load
balancers to prevent any single server from becoming a bottleneck

Partitioning: Divide large tables into smaller partitions that can be queried
independently, improving query performance.

Partitioning : Divide the data


T1
into multiple table in same
T2 DB instance
T3
LET’S START WITH DBMS :)
RBAC(Role-based access control )

RBAC is a security system that controls who can access a database or


system based on their role in the organization.

Instead of giving permissions directly to each user, the system creates


roles with specific permissions, and users are assigned to these roles.

For example, a database administrator might create roles like DB_Read for
reading data and DB_Write for modifying data. Only people with the right
role can read or change the information.
LET’S START WITH DBMS :)
RBAC(Role-based access control )

Roles: A set of permissions associated with a job function (e.g., admin,


manager, developer)
Permissions: The rights granted to perform certain actions (e.g., read,
write, delete).
Users: Individuals who are assigned one or more roles.

Advantages:
Makes it easier to manage permissions, especially in big systems.
Lowers the risk of misuse by making sure users only have access to
what they need for their role.
Improves security and helps keep rules consistent across the system.
LET’S START WITH DBMS :)
Encryption

Encryption in Databases refers to the process of converting data into a


secure format that can only be read or accessed by someone with the
correct decryption key. This helps protect sensitive information from
unauthorized access
For example, an MNC might use encryption to protect records in their
database. Even if a hacker gains access to the database, they wouldn't be
able to read the data without the decryption key.

Types of Encryption
1. At-Rest Encryption
2. In-Transit Encryption
3. Column-Level Encryption
4. Full Disk Encryption:
LET’S START WITH DBMS :)
Encryption

At-Rest Encryption: At-rest encryption protects data stored in the


database by encrypting it when it is saved to disk. This means that if
someone gains access to the physical storage or the database files,
they won't be able to read the data without the encryption key.

Example: A company encrypts employee salary records stored in their


database. If someone accesses the database files directly, they see
encrypted data like xYz1234!@# instead of the actual salary figures.
LET’S START WITH DBMS :)
Encryption

In-Transit Encryption: In-transit encryption secures data as it moves


between the database and applications or users. This is typically done
using secure communication protocols like SSL/TLS to prevent
eavesdropping or tampering during data transmission.

Example: When you send an email, in-transit encryption ensures that


the email content is scrambled while traveling to the recipient's server,
so even if intercepted, it appears as random characters like qW9rTy!@
and is unreadable.
LET’S START WITH DBMS :)
Encryption

Column-Level Encryption: Column-level encryption encrypts specific


columns within a database. This is used when only certain pieces of
data, like credit card numbers or social security numbers, need to be
protected, leaving the rest of the data unencrypted for easier access.

Example: A database holds user information with a column for Social


Security Numbers (SSNs). This column is encrypted, so an SSN like
123-45-6789 might be stored as aBcXyZ!@3 in the database.
LET’S START WITH DBMS :)
Encryption

Full Disk Encryption: Full disk encryption encrypts the entire disk
where the database is stored. This ensures that all data on the disk is
protected, not just the database but also any files, logs, or backups
stored on that disk.

Example: A laptop used by an employee is encrypted. If the laptop is


stolen, the entire disk is protected, and the thief would see only
encrypted data like r9T!@2d3Xy instead of readable files.
LET’S START WITH DBMS :)
Encryption
Advantages
It ensures that sensitive information remains secure, even if the database is
breached.

Helps organizations meet regulatory requirements for data security, such as


GDPR, HIPAA

Limits who can view or modify encrypted data to those with the appropriate
decryption keys.

Challenges
Encryption can slow down database operations because of the additional
processing required.

If decryption keys are lost, the data may become permanently inaccessible.
LET’S START WITH DBMS :)
Data masking techniques

Data masking means hiding or changing sensitive information so it’s


protected from unauthorized access. This allows the data to be used for
things like testing, analytics, or training without exposing the real data. The
goal is to create data that looks real but isn’t actually the original
information.

For example : A bank needs to train its machine learning models to detect
fraudulent transactions. To protect customers' personal information, the
bank uses data masking to replace real account numbers and transaction
details with fake but realistic-looking data. This way, the models can be
trained effectively without risking exposure of actual customer data.
LET’S START WITH DBMS :)
Data masking techniques
Common Data Masking Techniques

Substitution:
Replacing sensitive data with random but realistic-looking values. For
example, swapping real names with random names from a list.

Example: A company's employee database contains Social Security Numbers


(SSNs). To mask these, the company replaces real SSNs with randomly
generated fake SSNs. So, 123-45-6789 might be substituted with 987-65-
4321.
LET’S START WITH DBMS :)
Data masking techniques
Common Data Masking Techniques

Shuffling:
Rearranging the values within a column so that the data remains realistic but
is not linked to the original records.

Example: A hospital has a list of patients with their birth dates. To protect
privacy, the hospital shuffles the birth dates among the patients. For instance,
Patient A's birth date of 01/15/1980 might be swapped with Patient B's birth
date of 07/22/1985.
LET’S START WITH DBMS :)
Data masking techniques
Common Data Masking Techniques

Encryption:
Encrypting data so that it is unreadable without the decryption key. This can be
used as a form of masking if the data doesn't need to be human-readable.

Example: A retail company encrypts credit card numbers in its database. A


number like 4111 1111 1111 1111 might be encrypted to something like
Ae34Bf98Gh65Jk21, which can’t be read without the decryption key.
LET’S START WITH DBMS :)
Data masking techniques
Common Data Masking Techniques

Nulling Out:
Replacing sensitive data with null values or placeholders like "XXXX". This
removes the original data but may reduce the usefulness of the dataset.

Example: An HR department wants to share employee records for analysis but


needs to hide salary information. They replace the salary field with null values or
placeholders like N/A or XXXX.
LET’S START WITH DBMS :)
Data masking techniques
Common Data Masking Techniques

Number Variance:
Altering numeric data by adding or subtracting a random value within a certain
range. For example, changing salary figures slightly to mask the exact amounts

Example: A finance department needs to protect the exact sales figures but still
wants to share data for trend analysis. They slightly alter the figures by adding or
subtracting a small random amount. For example, a sales figure of $10,000
might be changed to $10,020 or $9,980.
LET’S START WITH DBMS :)
Data masking techniques
Common Data Masking Techniques

Tokenization:
Tokenization is a technique used to replace sensitive data with non-sensitive
equivalents, called "tokens." These tokens can be stored and processed without
exposing the original sensitive data, while maintaining a mapping to the original
values when needed.

Example: Imagine a database that stores customer credit card information. To


protect this sensitive data, tokenization can be applied. Each original credit card
number is replaced with a unique token (e.g., TKN_XYZ123). These tokens have
no meaningful relationship to the actual credit card numbers.

You might also like