Dsa Merged
Data Structures and Algorithms
Dr. L. Rajya Lakshmi
Analysis: open addressing
• How many distinct probe sequences are possible with
• Linear probing?
• The initial probe determines all subsequent probes, so there are only N distinct probe sequences
• Quadratic probing?
• Here also, the subsequent probes depend only on the initial probe, again giving N distinct probe sequences
• Double hashing?
• When N is a prime or a power of 2, double hashing uses Θ(N²) probe sequences, since
each possible (h′(k), h′′(k)) pair yields a distinct probe sequence
• For the analysis, we assume uniform hashing
• Uniform hashing: the probe sequence for each key is equally likely to
be any one of N! permutations of (0, 1, . . ., N-1)
• When the values of its parameters are selected appropriately, double
hashing performs close to the ideal scheme of uniform hashing
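The probe-sequence counts above can be seen concretely. Below is a minimal sketch (the function name and the common h1/h2 forms are my own illustration, not from the slides) of how double hashing generates a probe sequence; distinct (h1, h2) pairs give distinct sequences, roughly N·(N−1) of them, i.e. Θ(N²), versus N for linear or quadratic probing:

```python
def probe_sequence(k, N):
    """Yield the double-hashing probe sequence for key k in a table of
    capacity N. h1 picks the start; h2 (never 0) picks the stride."""
    h1 = k % N
    h2 = 1 + (k % (N - 1))          # stride in 1..N-1, so it is never 0
    for i in range(N):
        yield (h1 + i * h2) % N

# For prime N the stride is coprime to N, so every bucket is visited once.
print(list(probe_sequence(14, 7)))  # a permutation of 0..6
```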
Analysis: open addressing
• Analysis is in terms of load factor α = n/N (n is the number of items in
the hash table and N is the capacity of the hash table)
• With open addressing, n ≤ N, so α ≤ 1
• Assume that we are using uniform hashing
• The probe sequence (h(k,0), h(k,1), . . ., h(k, N-1)) used for key k is
equally likely to be any of the permutation of (0, 1, . . ., N-1)
Analysis: open addressing
Theorem: Given an open addressing hash table with load factor α =
n/N < 1, the expected number of probes in an unsuccessful search is at
most 1/(1- α), assuming uniform hashing.
Proof:
Every probe except for the last one accesses an occupied bucket
that does not contain the required item
The last bucket accessed is an empty one
X denotes the number of probes made in an unsuccessful search
Ai: the event that an ith probe occurs and it is to an occupied
bucket
Analysis: open addressing
• Now consider the event {X ≥ i}
• The 1st probe occurs and is to an occupied bucket, the 2nd probe occurs and
is to an occupied bucket, . . ., the (i-1)th probe occurs and is to an
occupied bucket
• The event {X ≥ i} is the intersection of the events A1, A2, . . ., Ai-1
• {X ≥ i} is A1 ∩ A2 ∩ . . . ∩ Ai-1
Analysis: open addressing
For a collection of events A1, A2, . . ., Ai-1 the following relation
holds:
Pr{A1 ∩ A2 ∩ . . . ∩ Ai-1} = Pr{A1} · Pr{A2 | A1} · Pr{A3 | A1 ∩ A2} · · ·
Pr{Ai-1 | A1 ∩ A2 ∩ . . . ∩ Ai-2}
Pr{A1} = n/N
Consider the event that the jth probe occurs and is to an occupied
bucket, given that the first j-1 probes were to occupied buckets, j > 1
In the jth probe we would find one of the remaining (n-(j-1))
items in one of the remaining (N-(j-1)) buckets (uniform hashing)
The probability of this event is (n-(j-1))/(N-(j-1))
Observing that n < N, (n-j)/(N-j) ≤ n/N for all j such that 0 ≤ j < N
Pr{A2 | A1} = (n-1)/(N-1); Pr{A3 | A1 ∩ A2} = (n-2)/(N-2), . . .,
Pr{Ai-1 | A1 ∩ A2 ∩ . . . ∩ Ai-2} = (n-i+2)/(N-i+2)
Analysis: open addressing
Since the event {X ≥ i} is A1 ∩ A2 ∩ . . . ∩ Ai-1,
Pr{X ≥ i} = (n/N) · ((n-1)/(N-1)) · ((n-2)/(N-2)) · · · ((n-i+2)/(N-i+2))   (for all i such that 0 ≤ i ≤ N)
          ≤ (n/N)^(i-1)
          = α^(i-1)
When a random variable X takes values from the set of natural
numbers {0, 1, . . .}, we have a formula for its expectation:
E[X] = Σ_{i=0}^{∞} i · Pr{X = i}
     = Σ_{i=0}^{∞} i · (Pr{X ≥ i} − Pr{X ≥ i+1})
     = Σ_{i=1}^{∞} Pr{X ≥ i}
Analysis: open addressing
Using the above relation,
E[X] = Σ_{i=1}^{∞} Pr{X ≥ i}
     ≤ Σ_{i=1}^{∞} α^(i-1)
     = Σ_{i=0}^{∞} α^i
     = 1/(1−α)
• Unsuccessful search runs in O(1) if α is constant.
• If the hash table is half full, then the average number of probes in an
unsuccessful search is at most 2
• If the hash table is 90% full, then the average number of probes in an
unsuccessful search is at most 10
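As a quick numeric check of the bound (a sketch; the function name is mine), the two figures above fall straight out of 1/(1−α):

```python
def expected_unsuccessful_probes(alpha):
    """Upper bound 1/(1 - alpha) on expected probes, valid for alpha < 1."""
    assert 0 <= alpha < 1
    return 1 / (1 - alpha)

print(expected_unsuccessful_probes(0.5))   # half full: about 2 probes
print(expected_unsuccessful_probes(0.9))   # 90% full: about 10 probes
```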
Analysis: open addressing
Corollary: Inserting an item into an open addressing hash table with
load factor α requires at most 1/(1- α) probes on average assuming
uniform hashing.
Proof:
An item is inserted into a hash table if and only if there is room,
that is α < 1
Inserting an item: an unsuccessful search followed by placing the
item into the empty bucket found
The expected number of probes is at most 1/(1- α)
Analysis: open addressing
Theorem: Given an open addressing hash table with load factor α < 1,
the expected number of probes in a successful search is at most
(1/α) · ln(1/(1−α))
assuming uniform hashing and assuming that each key in the table is
equally likely to be searched for.
Proof:
A search for key k reproduces the probe sequence that was used
while inserting that key into the hash table
By the corollary, if key k was the (i+1)st item inserted into the
hash table, then the expected number of probes in a search for it is at most
1/(1 − i/N), that is, N/(N−i)
Analysis: open addressing
Averaging over all n items in the table,
(1/n) Σ_{i=0}^{n-1} N/(N−i) = (N/n) Σ_{i=0}^{n-1} 1/(N−i)
A = (1/α) Σ_{k=N−n+1}^{N} 1/k
For a monotonically decreasing f,
∫_{p}^{q+1} f(x) dx ≤ Σ_{k=p}^{q} f(k) ≤ ∫_{p−1}^{q} f(x) dx
A ≤ (1/α) ∫_{N−n}^{N} (1/x) dx
  = (1/α) ln(N/(N−n))
  = (1/α) ln(1/(1−α))
Analysis: open addressing
• If the table is half full, then the expected number of probes in a
successful search is at most (1/0.5)·ln 2 ≈ 1.387
• If the table is 90% full, then the expected number of probes in a
successful search is at most (1/0.9)·ln 10 ≈ 2.559
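The same check works for the successful-search bound (a sketch; the function name is mine):

```python
import math

def expected_successful_probes(alpha):
    """Upper bound (1/alpha) * ln(1/(1 - alpha)), valid for 0 < alpha < 1."""
    return (1 / alpha) * math.log(1 / (1 - alpha))

print(round(expected_successful_probes(0.5), 3))   # half full
print(round(expected_successful_probes(0.9), 3))   # 90% full
```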
Selection sort
• As in insertion sort, the input array can be divided into two
parts: a sorted part and an unsorted part
• Basic principle: take the smallest element from the unsorted part
and move it to the end of the sorted part
Selection sort
Algorithm Selection_Sort(A[0..n-1], n)
    for i ← 0 to n-2 do
        m ← i
        for j ← i+1 to n-1 do
            if A[j] < A[m]
                m ← j
        swap A[i] and A[m]
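The pseudocode above translates directly to Python (a sketch; the example array is mine):

```python
def selection_sort(A):
    n = len(A)
    for i in range(n - 1):
        m = i                      # index of the smallest element seen so far
        for j in range(i + 1, n):
            if A[j] < A[m]:
                m = j
        A[i], A[m] = A[m], A[i]    # move it to the end of the sorted part
    return A

print(selection_sort([4, 2, 7, 1, 3]))
```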
Selection sort
(The slides trace selection sort on an example array, showing the positions
of i and the current-minimum index m at each step.)
Selection sort
Algorithm Selection_Sort(A[0..n-1], n)
    for i ← 0 to n-2 do
        m ← i
        for j ← i+1 to n-1 do
            if A[j] < A[m]
                m ← j
        swap A[i] and A[m]
• T(n) = Σ_{i=0}^{n-2} Σ_{j=i+1}^{n-1} c
       = Σ_{i=0}^{n-2} c(n − i − 1)
       = c Σ_{i=0}^{n-2} (n − 1) − c Σ_{i=0}^{n-2} i
       = c(n−1)(n−1) − c(n−2)(n−1)/2
       = cn(n−1)/2
Heapsort
• Runs in O(n log n) time
• An example of in-place sorting algorithm
• A new data structure “heap” is used (different from the heap that we
use for garbage collection)
Heapsort
• A binary tree: An ordered rooted tree where each internal node can have
at most two children (left child and right child)
• Leaf nodes, level of a node
• A nearly complete binary tree: A binary tree where except for the last level,
the other levels are completely filled
(Figure: a nearly complete binary tree with root 16; children 14 and 10;
next level 8, 7, 9, 3; leaves 2, 4, 1.)
Heapsort
• The binary heap data structure is an array object which can be viewed
as a nearly complete binary tree
(The same tree, stored level by level in an array: [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].)
Heapsort
• An array A that represents a heap is an object with two attributes:
• A.length (represents the size of A)
• A.heap_size (represents the number of elements in the heap that are stored
within A)
• Though A[1 . . A.length] may contain numbers, only the elements in
A[1 . . A.heap_size], where 0 ≤ A.heap_size ≤ A.length, are valid
elements of the heap
• A[1] is the root
• Given an index i, we can easily locate its parent, left child, and right child
Heapsort
Algorithm Parent(i)
    return ⌊i/2⌋
Algorithm Left(i)
    return 2i
Algorithm Right(i)
    return 2i + 1
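A sketch of the same 1-based index arithmetic in Python (names are mine; integer division gives the floor):

```python
def parent(i):
    return i // 2          # floor(i/2)

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

# With A[1] as the root, node 5's parent is node 2 and its
# children are nodes 10 and 11.
print(parent(5), left(5), right(5))
```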
Heapsort
• There are two kinds of binary heaps: max-heaps and min-heaps
• The elements stored in nodes satisfy a heap order property
• These two types of heaps differ in terms of heap order property
Heapsort
• Max-heap property:
• For every node i other than the root, A[Parent(i)] ≥ A[i]
• The maximum element is stored at the root
• The subtree rooted at a node stores elements no larger than the one
stored at that node
Heapsort
(Figure: the example max-heap 16, 14, 10, 8, 7, 9, 3, 2, 4, 1 again.)
Heapsort
• Min-heap property:
• For every node i other than the root, A[Parent(i)] ≤ A[i]
• The minimum element is stored at the root
• The subtree rooted at a node stores elements no smaller than the one
stored at that node
Heapsort
(Figure: an example min-heap with the minimum at the root.)
Heapsort
• Height of a node: the length (number of edges) of the longest
downward path from that node to a leaf
• Height of the tree: height of the root
• A heap of n elements is a nearly complete binary tree, so its height is
Θ(log n)
• Basic operations on heap run in time proportional to the height of the
tree, thus the time complexity of these operations is O(log n)
Heapsort
(Figures: Max_Heapify in action. A root that violates the max-heap
property, e.g. 2 sitting above 16, is swapped downward with its larger
child until the property is restored.)
Heapsort
Algorithm Max_Heapify(A, i)
    l ← Left(i)
    r ← Right(i)
    if l ≤ A.heap_size and A[l] > A[i]
        largest ← l
    else
        largest ← i
    if r ≤ A.heap_size and A[r] > A[largest]
        largest ← r
    if largest ≠ i
        swap A[i] and A[largest]
        Max_Heapify(A, largest)
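A runnable sketch of Max_Heapify in Python, using the slides' example where 2 sits at the root above an otherwise valid max-heap (1-based indexing; A[0] is unused padding):

```python
def max_heapify(A, i, heap_size):
    """Sift A[i] down; assumes the subtrees at 2i and 2i+1 are max-heaps."""
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= heap_size and A[l] > A[i] else i
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)

A = [None, 2, 16, 10, 14, 7, 9, 3, 8, 4, 1]   # root 2 violates the property
max_heapify(A, 1, 10)
print(A[1:])   # 2 has sifted down; 16 is now the root
```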
Heapsort
• Max_Heapify assumes that the binary trees rooted at Left(i) and
Right(i) are max-heaps
• The running time on a subtree of size n rooted at node i is:
• the time required to fix up the relationships among the elements A[i], A[Left(i)],
A[Right(i)], plus
• the time to run Max_Heapify on a subtree rooted at one of the children of node i
• A child's subtree has size at most 2n/3 (the worst case occurs when the
bottom level of the tree is exactly half full)
• T(n) ≤ T(2n/3) + Θ(1)
• The solution to this recurrence is O(log n)
Analysis of insertion sort
for j ← 1 to n-1 do                                    cost: n
    key ← A[j]                                         n-1
    {insert A[j] into the sorted sequence A[0..j-1]}
    i ← j-1                                            n-1
    while i ≥ 0 and A[i] > key do                      Σ_{j=1}^{n-1} t_j
        A[i+1] ← A[i]                                  Σ_{j=1}^{n-1} (t_j − 1)
        i ← i-1                                        Σ_{j=1}^{n-1} (t_j − 1)
Heapsort
(Figures: Build_Max_Heap traced on an example array; Max_Heapify is applied
from the last internal node up to the root, producing a max-heap.)
Heapsort
Algorithm Build_Max_Heap(A)
    A.heap_size ← A.length
    for i ← ⌊A.length/2⌋ downto 1
        Max_Heapify(A, i)
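A self-contained Python sketch of Build_Max_Heap (the example array is the standard one and is my choice, not from these slides):

```python
def max_heapify(A, i, heap_size):
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= heap_size and A[l] > A[i] else i
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)

def build_max_heap(A, n):
    """Heapify bottom-up, from the last internal node floor(n/2) to the root."""
    for i in range(n // 2, 0, -1):
        max_heapify(A, i, n)

A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]   # 1-based; A[0] unused
build_max_heap(A, 10)
print(A[1:])
```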
Heapsort
• The time to run Max_Heapify at a node varies with the height of the node
• The time required at a node of height h is O(h)
• An n-element heap has height ⌊log n⌋
• At height h, there are at most ⌈n/2^(h+1)⌉ nodes
• The total cost of Build_Max_Heap:
Σ_{h=0}^{⌊log n⌋} ⌈n/2^(h+1)⌉ · O(h), which is O(n Σ_{h=0}^{⌊log n⌋} h/2^h)
Heapsort
Σ_{i=0}^{∞} x^i = 1/(1−x)   if |x| < 1   {differentiate both sides}
Σ_{i=1}^{∞} i·x^(i−1) = 1/(1−x)²   {multiply both sides by x}
Σ_{i=1}^{∞} i·x^i = x/(1−x)²
Taking x = 1/2 yields
Σ_{i=1}^{∞} i/2^i = (1/2)/(1 − 1/2)² = 2
O(n Σ_{h=0}^{⌊log n⌋} h/2^h) is O(n Σ_{h=0}^{∞} h/2^h), which can be
written as O(n · (1/2)/(1 − 1/2)²), which is O(n)
Heapsort
• Consider an array A[1 . . n], n is A.length
• Run Build_Max_Heap on A
• The largest element is sitting in A[1]
• Swap A[1] and A[n], decrement A.heap_size by 1, and run
Max_Heapify on the new root
• Repeat the above step until A.heap_size becomes 1
Heapsort
Algorithm Heapsort(A)
    Build_Max_Heap(A)
    for i ← A.length downto 2
        swap A[1] and A[i]
        A.heap_size ← A.heap_size − 1
        Max_Heapify(A, 1)
• Build_Max_Heap() takes O(n) time
• Each of the n-1 calls to Max_Heapify() takes O(log n) time
• The running time is O(n log n)
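Putting the pieces together, a runnable Python sketch of the full algorithm, applied to the slides' example array:

```python
def max_heapify(A, i, heap_size):
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= heap_size and A[l] > A[i] else i
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)

def heapsort(A):
    n = len(A) - 1                       # A is 1-based; A[0] is unused
    for i in range(n // 2, 0, -1):       # Build_Max_Heap: O(n)
        max_heapify(A, i, n)
    for i in range(n, 1, -1):            # n-1 extractions: O(n log n)
        A[1], A[i] = A[i], A[1]          # move the max to its final place
        max_heapify(A, 1, i - 1)         # heap size shrinks by one

A = [None, 5, 13, 2, 25, 7, 17, 20, 8, 4, 51]
heapsort(A)
print(A[1:])
```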
Heapsort
(The slides trace Heapsort on the array [5, 13, 2, 25, 7, 17, 20, 8, 4, 51]:
Build_Max_Heap yields [51, 25, 20, 8, 13, 17, 2, 5, 4, 7]; each iteration then
swaps the root with the last heap element, shrinks the heap, and re-heapifies,
ending with the sorted array [2, 4, 5, 7, 8, 13, 17, 20, 25, 51].)
Data Structures and Algorithms
CS F211
Vishal Gupta
Department of Computer Science and Information Systems
Birla Institute of Technology and Science (BITS Pilani), Pilani Campus, Pilani
Examples of trees:
- Directory tree
- Family tree
- Company organization chart
- Table of contents
(Figure: a tree with its parts labelled: the root at level 0, internal nodes,
tree edges, parent and child nodes, and leaves at the lowest levels.)
Tree Terminology (1)
• A vertex (or node) is an object that can have a name and can
carry other associated information.
• The first or top node in a tree is called the root node.
• An edge is a connection between two vertices.
• A path in a tree is a list of distinct vertices in which successive
vertices are connected by edges in the tree.
• The defining property of a tree is that there is precisely one
path connecting any two nodes.
• A disjoint set of trees is called a forest.
• Nodes with no children are leaves, terminal or external nodes.
Tree Terminology (2)
Each node except the root has exactly one node above it in the tree
(i.e., its parent), and we extend the family analogy, talking of
children, siblings, or grandparents.
Nodes that share a parent are called siblings.
(Figure: an example binary tree.)
Complete Binary Trees: Array Representation
(Figure: the complete binary tree with root 14 stored in an array by level:
indices 1..11 hold 14, 10, 16, 8, 12, 15, 18, 7, 9, 11, 13; a node at index i
has its children at 2i and 2i+1.)
(Example: for the three-node tree with root 1, left child 2, and right child 3:
preorder: 1 2 3; inorder: 2 1 3; postorder: 2 3 1.)
Tree Traversal: InOrder
In-order traversal
• Visit the left subtree (if there is one) in order
• print the key of the current node
• Visit the right subtree (if there is one) in order
Tree Traversal: PreOrder
Another common traversal is PreOrder.
It goes as deep as possible (visiting as it goes) then left to
right
• print root
• Visit left subtree in PreOrder
• Visit right subtree in PreOrder
Tree Traversal: PostOrder
PostOrder traversal also goes as deep as possible, but
visits each node only during backtracking.
recursive:
• Visit the left subtree in PostOrder
• Visit the right subtree in PostOrder
• print the root
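The three traversals can be sketched in a few lines of Python, verified against the 1-2-3 example above (the Node class and function names are mine):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def preorder(t):
    return [t.key] + preorder(t.left) + preorder(t.right) if t else []

def inorder(t):
    return inorder(t.left) + [t.key] + inorder(t.right) if t else []

def postorder(t):
    return postorder(t.left) + postorder(t.right) + [t.key] if t else []

t = Node(1, Node(2), Node(3))        # root 1, left child 2, right child 3
print(preorder(t), inorder(t), postorder(t))
```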
(Figures: building a binary search tree by successive insertions. Inserting
C, then A, then B, then L, then M produces the tree with root C; A, with
right child B, on the left; and L, with right child M, on the right. A
second figure shows an example BST with root 14.)
• Find(2)
10 > 2, go left
5 > 2, go left
2 = 2, found
(Figure: the search path in a BST with root 10, left child 5, and a leaf 2.)
Binary Search Tree - Insertion
Insert Algorithm
• If the value we want to insert is less than the key of the current
node, go to the left subtree
• Otherwise, go to the right subtree
• If the current node is empty (does not exist), create a node with
the value we are inserting and place it here
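A minimal Python sketch of BST insert and find (class and function names are mine), using the slides' insertion sequence C, A, B, L, M:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Walk left on smaller keys, right otherwise; attach at the empty spot."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def find(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

root = None
for k in "CABLM":                    # the slides' insertion sequence
    root = insert(root, k)
print(find(root, "L"), find(root, "Z"))
```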
Red-black properties:
1. Every node is either red or black
2. Every leaf (NULL pointer) is black
3. If a node is red, both children are black
4. Every path from node to descendent leaf
contains the same number of black nodes
5. The root is always black
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Red-Black Trees:
The Problem With Insertion
• Insert 8
– Where does it go?
(Example tree: 7 at the root with children 5 and 9; 12 is the right child
of 9. 8 becomes the left child of 9.)
Red-Black Trees:
The Problem With Insertion
• Insert 11
– Where does it go? (As the left child of 12.)
– What color?
• Can't be red! (violates property 3)
Red-Black Trees:
The Problem With Insertion
• Insert 11
– Where does it go?
– What color?
• Can't be red! (violates property 3)
• Can't be black! (violates property 4)
Red-Black Trees:
The Problem With Insertion
• Insert 11
– Where does it go?
– What color?
• Solution: recolor the tree
Red-Black Trees:
The Problem With Insertion
• Insert 10
– Where does it go? (As the left child of 11.)
Red-Black Trees:
The Problem With Insertion
• Insert 10
– Where does it go?
– What color?
Red-Black Trees:
The Problem With Insertion
• Insert 10
– Where does it go?
– What color?
• A: no color! The tree is too imbalanced
• Must change the tree structure to allow recoloring
– Goal: restructure the tree in O(lg n) time
(Figure: rotations. rightRotate(y) transforms a node y with left child x,
where x holds subtrees A and B and y holds subtree C, into x with right
child y; leftRotate(x) is the inverse. The in-order sequence A, x, B, y, C
is preserved.)
• Move up the tree until there are no violations or we are at the root.
• In the following discussion we will assume the parent is a left child
(if the parent is a right child, perform the same steps swapping
"right" and "left")
Red-Black Trees: Insertion
RB-Insert(T,x)
Case I: x's uncle is Red
• Change x's grandparent to Red
• Change x's uncle and parent to Black
• Change x to x's grandparent
Agenda: B Trees
B Tree: Motivation
Motivation (cont.)
• Assume that we use an AVL tree to store about
20 million records
• We end up with a very deep binary tree with lots
of different disk accesses; log2 20,000,000 is
about 24, so this takes about 0.2 seconds
• We know we can’t improve on the log n lower
bound on search for a binary tree
• But, the solution is to use more branches and
thus reduce the height of the tree!
– As branching increases, depth decreases
B-Trees
B-Trees are useful in the following cases:
(Figures: the slides step through inserting B, Q, L, and F into an example
B-tree, splitting full nodes as needed.)
Deletion: B-Tree-Delete(x, k)
1) Search down the tree for the node containing k
2) When B-Tree-Delete is called recursively, the number of keys in x must
be at least the minimum degree t (the root may have fewer than t keys)
• If x is a leaf, just remove key k; x still has at least t-1 keys
• If the relevant child does not have at least t keys, borrow keys from
other nodes first
Deletion Example
a) If the child y that precedes k in node x has at least t keys, then find
the predecessor k' of k in the subtree rooted at y. Recursively
delete k', and replace k by k' in x. (Finding k' and deleting it can be
performed in a single downward pass.)
b) If ci[x] has only t - 1 keys but has a sibling with t keys, give ci[x]
an extra key by moving a key from x down into ci[x], moving a
key from ci[x]'s immediate left or right sibling up into x, and
moving the appropriate child from the sibling into ci[x].
c) If ci[x] and all of ci[x]'s siblings have t - 1 keys, merge ci[x] with one
sibling, which involves moving a key from x down into the new
merged node to become the median key for that node.
Deletion Example
(Figures: the slides delete F, M, G, D, and B in turn from an example
B-tree, illustrating the cases above.)
Comparing Trees
• Binary trees
– Can become unbalanced and lose their good time complexity
(big O)
– AVL trees are strict binary trees that overcome the balance
problem
– Heaps remain balanced but only prioritise (not order) the keys
• Multi-way trees
– B-Trees are m-way: a node can have any (odd) number of
children
– One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a
permanently balanced binary tree, exchanging the AVL tree's
balancing operations for insertion and (more complex) deletion
operations
• EXTRACT-MIN (H)
– Deletes the node from heap H whose key is
minimum. Returns a pointer to the node.
• DECREASE-KEY (H, x, k)
– Assigns to node x within heap H the new value k
where k is smaller than its current key value.
Binomial Trees
(Figures: the binomial trees B0 through B4. Bk is formed by linking two
B(k-1) trees: one becomes the leftmost child of the other's root, so the
children of Bk's root are, from left to right, B(k-1), B(k-2), . . ., B0.)
Properties of Binomial Trees
LEMMA: For the binomial tree Bk:
1. There are 2^k nodes,
2. The height of the tree is k,
3. There are exactly C(k, i) nodes at depth i for i = 0, 1, . . ., k, and
4. The root has degree k, which is greater than the degree of any other
node; if the children of the root are numbered from left to right as
k-1, k-2, . . ., 0, child i is the root of a subtree Bi.
(Proof sketch: by induction on k, since Bk consists of two linked copies
of B(k-1); for property 3, the number of nodes at depth i satisfies
D(k, i) = D(k-1, i-1) + D(k-1, i) = C(k-1, i-1) + C(k-1, i) = C(k, i).)
(Example: n = 13 = <1, 1, 0, 1>₂, so the binomial heap consists of B0, B2,
and B3, with head[H] pointing to the root list. A second figure shows a
node's fields: parent, key, degree, child, and sibling.)
MAKE-BINOMIAL-HEAP()
    allocate H
    head[H] ← NIL
    return H
RUNNING-TIME = Θ(1)

BINOMIAL-LINK(y, z)
    p[y] ← z
    sibling[y] ← child[z]
    child[z] ← y
    degree[z] ← degree[z] + 1
end
(BINOMIAL-LINK runs in Θ(1); the O(lg n) running time noted on the slide
applies to operations that scan the root list, which has at most
⌊lg n⌋ + 1 roots.)
Uniting Two Binomial Heaps: Cases
We maintain three pointers into the root list: prev-x, x, and next-x.
Case 1: degree[x] ≠ degree[next-x]: march the pointers down the root list.
Case 2: degree[x] = degree[next-x] = degree[sibling[next-x]] (three equal
degrees in a row): march as well; the latter two roots are handled later.
Case 3: degree[x] = degree[next-x] and key[x] ≤ key[next-x]: link next-x
under x.
Case 4: degree[x] = degree[next-x] and key[x] > key[next-x]: link x under
next-x.
(Figures illustrate each case on root lists of Bk and Bl trees, l > k.)
BINOMIAL-HEAP-MERGE PROCEDURE
(Figures: merging H1 with n1 nodes and H2 with n2 nodes is analogous to
adding n1 and n2 in binary; the slides trace an example in which
equal-degree trees are repeatedly linked, with links playing the role of
carries.)
Extracting the Node with the Minimum Key
BINOMIAL-HEAP-EXTRACT-MIN(H)
    (1) find the root x with the minimum key in the root list of H,
        and remove x from the root list of H
    (2) H' ← MAKE-BINOMIAL-HEAP()
    (3) reverse the order of the linked list of x's children,
        and set head[H'] to the head of the resulting list
    (4) H ← BINOMIAL-HEAP-UNION(H, H')
    return x
end
(Figures: an example heap before and after removing x; x's children, in
reversed order, form the root list of H'.)
• G1
– V(G1) = {0, 1, 2, 3}
– E(G1) = {(0,1), (0,2), (0,3), (1,2), (1,3), (2,3)}
• G2
– V(G2) = {0, 1, 2, 3, 4, 5, 6}
– E(G2) = {(0,1), (0,2), (1,3), (1,4), (2,5), (2,6)}
• G2 is also a tree
– A tree is a special case of a graph
• G3
– V(G3) = {0, 1, 2}
– E(G3) = {<0,1>, <1,0>, <1,2>}
• G3 is a directed graph (digraph)
• Cycle
– a simple path whose first and last vertices are the same
• 0, 1, 2, 0 is a cycle
• Acyclic graph
– a graph containing no cycle
(Figures: drawings of G1, G2, and G3.)
(Figures: adjacency matrices for G1, G2, and an 8-vertex graph G4. For an
undirected graph the matrix is symmetric, so about n²/2 entries suffice;
a directed graph needs n² entries.)
Merits of Adjacency Matrix
ind(vi) = Σ_{j=0}^{n-1} A[j][i]        outd(vi) = Σ_{j=0}^{n-1} A[i][j]
Adjacency Lists
• Replace the n rows of the adjacency matrix with n
linked lists
#define MAX_VERTICES 50
typedef struct node *node_pointer;
struct node {
    int vertex;
    struct node *link;
};
node_pointer graph[MAX_VERTICES];
int n = 0; /* vertices currently in use */
Example(1)
(Figures: adjacency lists. G1: 0 → 1, 2, 3; 1 → 0, 2, 3; 2 → 0, 1, 3;
3 → 0, 1, 2. G3: 0 → 1; 1 → 0, 2; 2 → (empty). A larger example G4 is
also shown.)
Interesting Operations
Breadth First Search
Breadth_First_Search(G, s)
    for each vertex u ∈ G.V − {s}
        u.color = WHITE
        u.d = ∞
        u.π = NIL
    s.color = GRAY
    s.d = 0
    s.π = NIL
    Q = ∅
    ENQUEUE(Q, s)
    while Q ≠ ∅
        u = DEQUEUE(Q)
        for each v ∈ G.Adj[u]
            if v.color == WHITE
                v.color = GRAY
                v.d = u.d + 1
                v.π = u
                ENQUEUE(Q, v)
        u.color = BLACK
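A compact Python sketch of BFS (names and the example graph are mine); the WHITE/GRAY/BLACK colouring collapses to "discovered or not":

```python
from collections import deque

def bfs(adj, s):
    """Return a dict of shortest-path distances from s.
    adj maps each vertex to a list of its neighbours."""
    d = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in d:            # v is WHITE: first discovery
                d[v] = d[u] + 1
                q.append(v)
    return d

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs(adj, 0))
```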
Depth first search
Global variable: time

DEPTH_FIRST_SEARCH(G)
    for each vertex u ∈ G.V
        u.color = WHITE
        u.π = NIL
    time = 0
    for each vertex u ∈ G.V
        if u.color == WHITE
            DFS-VISIT(G, u)

DFS-VISIT(G, u)
    time = time + 1
    u.d = time
    u.color = GRAY
    for each v ∈ G.Adj[u]
        if v.color == WHITE
            v.π = u
            DFS-VISIT(G, v)
    u.color = BLACK
    time = time + 1
    u.f = time
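A Python sketch of DFS with discovery/finish timestamps (names and the three-vertex example are mine); the timestamps exhibit the parenthesis structure discussed next:

```python
def dfs(adj):
    """Return (discovery, finish) timestamp dicts.
    adj maps each vertex to a list of its neighbours."""
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1; d[u] = time            # u turns GRAY
        for v in adj[u]:
            if v not in d:                # v is WHITE
                visit(v)
        time += 1; f[u] = time            # u turns BLACK

    for u in adj:
        if u not in d:
            visit(u)
    return d, f

adj = {"u": ["v"], "v": [], "w": ["v"]}
d, f = dfs(adj)
print(d, f)   # [d[u], f[u]] properly contains [d[v], f[v]]
```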
DFS: Properties
• Discovery and finishing times have a parenthesis structure.
Parenthesis Theorem
In any depth first search of G = (V, E), for any two vertices u and v, exactly
one of the following three conditions holds:
a) The intervals [u.d, u.f] and [v.d, v.f] are entirely disjoint, and neither u nor
v is a descendant of the other in the depth-first forest;
b) the interval [u.d, u.f] is contained entirely within the interval [v.d, v.f],
and u is a descendant of v in a depth-first tree; or
c) the interval [v.d, v.f] is contained entirely within the interval [u.d, u.f],
and v is a descendant of u in a depth-first tree.
Classification of Edges
• DFS can be used to classify the edges of G into:
a) Tree Edge: Edge (u, v) is a tree edge if v was first discovered by exploring edge
(u, v).
b) Back edges: Edges (u, v) connecting a vertex u to an ancestor v in a depth-first
tree. Self-loops, which may occur in directed graphs, are considered to be back
edges.
c) Forward edges: Those non-tree edges (u, v) connecting a vertex u to a
descendant v in a depth-first tree.
d) Cross edges are all other edges. They can go between vertices in the same
depth-first tree, as long as one vertex is not an ancestor of the other, or they can
go between vertices in different depth-first trees.
• Edge (u, v) can be classified by the color of the vertex v that is reached when
the edge is first explored
1. WHITE indicates a tree edge,
2. GRAY indicates a back edge, and
3. BLACK indicates a forward or cross edge.
Properties
• In a depth-first search of an undirected graph G, every edge of
G is either a tree edge or a back edge.
• A directed graph is acyclic if and only if a depth-first search
yields no "back" edges.
• DFS of a connected graph produces a spanning tree; note that for a
weighted graph this is not, in general, a minimum spanning tree or a
shortest-path tree.
• We can specialize the DFS algorithm to find a path between
two given vertices u and z:
i. Call DFS(G, u) with u as the start vertex.
ii. Use a stack S to keep track of the path between the start
vertex and the current vertex.
iii. As soon as destination vertex z is encountered, return the
path as the contents of the stack.
Topological sort
• A topological sort of a DAG G = (V, E) is a linear ordering of all its vertices
such that if G contains an edge (u, v), then u appears before v in the
ordering.
• A topological sort of a graph can be viewed as an ordering of its vertices
along a horizontal line so that all directed edges go from left to right.
• If the graph is not acyclic, then no linear ordering is possible.
TOPOLOGICAL-SORT (G)
1. call DFS(G) to compute finishing times u.f for each vertex u
2. As each vertex is finished, insert it onto the front of a linked list
3. Return the linked list of vertices
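The three steps above can be sketched in Python (names and the example DAG edges are mine; the example mimics the classic getting-dressed ordering):

```python
def topological_sort(adj):
    """DFS; prepend each vertex to the output as it finishes."""
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.insert(0, u)            # finished: front of the linked list

    for u in adj:
        if u not in seen:
            visit(u)
    return order

adj = {"undershorts": ["pants"], "pants": ["shoes"],
       "socks": ["shoes"], "shoes": []}
order = topological_sort(adj)
print(order)   # every edge goes left to right in this ordering
```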
A wide array of graph problems that can be solved in polynomial time are
variants of the problems above.
In this class, we'll cover the first two problems: shortest path and minimum
spanning tree.
Minimum Spanning Trees
• It's the 1920's. Your friend at the electric company needs to choose where to
build wires to connect all these cities to the plant. She knows how much it
would cost to lay electric wires between any pair of locations, and wants the
cheapest way to make sure electricity gets from the plant to every city.
• It's the 1950's. Your boss at the phone company needs to choose where to
build wires to connect all these cities to each other, so that everyone can
call everyone else, as cheaply as possible.
• It's today. Your ISP needs to choose where to lay fiber optic cable so that
everyone can reach the server, and the total cost is minimized.
(Figures: the same weighted graph on cities A through F, with edge weights
0 through 9, accompanies each scenario.)
Generic MST Algorithm
GENERIC_MST (G, w)
A=
while A does not form a spanning tree
find an edge (u, v) that is safe for A
A = A U {(u, v)}
return A
Definitions
• A cut (S, V - S) of an undirected graph G = (V, E) is a partition of V.
• We say that an edge (u, v) ∈ E crosses the cut (S, V - S) if one of its
endpoints is in S and the other is in V - S.
• We say that a cut respects a set A of edges if no edge in A crosses the cut
• An edge is a light edge crossing a cut if its weight is the minimum of any
edge crossing the cut. Note that there can be more than one light edge
crossing a cut in the case of ties.
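As a quick illustration of the definition, a minimal sketch that finds the light edge(s) crossing a given cut (the function name and the (u, v, weight) edge format are assumptions, not from the slides):

```python
def light_edges(edges, S):
    """Return the light edge(s) crossing the cut (S, V - S): among
    edges with exactly one endpoint in S, those of minimum weight.
    `edges` is a list of (u, v, weight) triples; ties give
    more than one light edge."""
    crossing = [(u, v, w) for (u, v, w) in edges
                if (u in S) != (v in S)]
    if not crossing:
        return []
    min_w = min(w for (_, _, w) in crossing)
    return [e for e in crossing if e[2] == min_w]
```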
Theorem
Let G = (V, E) be a connected, undirected graph with a real-valued weight
function w defined on E. Let A be a subset of E that is included in some
minimum spanning tree for G, let (S, V - S) be any cut of G that respects A,
and let (u, v) be a light edge crossing (S, V - S). Then, edge (u, v) is safe for A.
Kruskal’s Algorithm
MST_Kruskal (G, w)
A = ∅
for each vertex v ∈ G.V
MAKE-SET (v)
Sort the edges of G.E into non-decreasing order by weight w
for each edge (u, v) ∈ G.E taken in non-decreasing order by weight
if FIND-SET(u) ≠ FIND-SET (v)
A = A U {(u, v)}
UNION (u, v)
return A
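A runnable sketch of MST_Kruskal, using a simple union-find in place of MAKE-SET / FIND-SET / UNION (the function name and the (weight, u, v) edge format are illustrative):

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm with a dictionary-based union-find.
    `edges` is a list of (weight, u, v) triples."""
    parent = {v: v for v in vertices}   # MAKE-SET for each vertex

    def find(x):                        # FIND-SET with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    A = []
    for w, u, v in sorted(edges):       # non-decreasing order by weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # different trees: edge is safe
            A.append((u, v))
            parent[ru] = rv             # UNION
    return A
```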
Prim’s Algorithm
MST_PRIM (G, w, r)
for each u ∈ G.V
u.key = ∞
u.π = NIL
r.key = 0
Q = G.V
while Q ≠ ∅
u = EXTRACT_MIN (Q)
for each v ∈ G.adj[u]
if v ∈ Q and w(u, v) < v.key
v.π = u
v.key = w(u, v)
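A runnable sketch of MST_PRIM. Since Python's heapq has no DECREASE-KEY, this version pushes duplicate entries and skips stale ones, a common substitution for the min-priority queue Q (the adjacency-list input format is an assumption):

```python
import heapq

def prim(adj, r):
    """Prim's algorithm from root r. `adj` maps each vertex to a
    list of (neighbor, weight) pairs; returns the parent (pi) map."""
    key = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    key[r] = 0
    in_q = set(adj)                     # Q = G.V
    heap = [(0, r)]
    while heap:
        k, u = heapq.heappop(heap)      # EXTRACT_MIN
        if u not in in_q or k > key[u]:
            continue                    # stale entry, skip
        in_q.discard(u)
        for v, w in adj[u]:
            if v in in_q and w < key[v]:
                pi[v] = u
                key[v] = w              # "decrease key" by re-pushing
                heapq.heappush(heap, (w, v))
    return pi
```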
► The breadth-first-search algorithm is a shortest-paths algorithm that
works on unweighted graphs (or on graphs in which each edge has unit weight).
Variants
Single-destination shortest-paths problem: Find a shortest path to a
given destination vertex t from each vertex v.
Lemma 24.1: (Subpaths of shortest paths are shortest paths)
Given a weighted, directed graph G = (V, E) with weight function w, let
p = 〈v1, v2, ..., vk〉 be a shortest path from v1 to vk. Then, for any i, j with
1 ≤ i ≤ j ≤ k, the subpath pij = 〈vi, vi+1, ..., vj〉 is a shortest path from vi to vj.
Negative-weight edges
Cycles
► Since any acyclic path in a graph G = (V, E) contains at most |V| distinct
vertices, it also contains at most |V| - 1 edges. Thus, we can restrict our
attention to shortest paths of at most |V| - 1 edges.
Representing shortest paths
► Besides finding the shortest path weight, we wish to find the vertices on
shortest paths.
► Given a graph G = (V, E), π[v] denotes the predecessor of v ∈ V (as in BFS).
PRINT-PATH(G, s, v) can print a shortest path from s to v, assuming π[v] has
been computed for every v.
PRINT-PATH(G, s, v)
1 if v = s
2 then print s
3 else if π[v] = NIL
4 then print "no path from" s "to" v "exists"
5 else PRINT-PATH(G, s, π[v])
6 print v
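A direct Python transcription of PRINT-PATH, with π represented as a dictionary (this representation is an assumption, not from the slides):

```python
def print_path(pi, s, v):
    """Recursive PRINT-PATH: pi maps each vertex to its
    predecessor (pi[s] is assumed to be None)."""
    if v == s:
        print(s)
    elif pi.get(v) is None:
        print("no path from", s, "to", v, "exists")
    else:
        print_path(pi, s, pi[v])    # print the path to v's predecessor
        print(v)                    # then print v itself
```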
► A shortest-paths tree rooted at s is a directed subgraph G' = (V', E'), where
V' ⊆ V and E' ⊆ E, such that
1. V' is the set of vertices reachable from s in G,
2. G' forms a rooted tree with root s, and
3. for all v∈V', the unique simple path from s to v in G' is a shortest
path from s to v in G.
Relaxation
► For each vertex v, d[v] is an upper bound on the weight of a shortest path
from s to v and is called a shortest-path estimate.
INITIALIZE-SINGLE-SOURCE(G, s)
1 for each vertex v ∈ V[G]
2 do d[v] ← ∞
3 π[v] ← NIL
4 d[s] ← 0

RELAX(u, v, w)
1 if d[v] > d[u] + w(u, v)
2 then d[v] ← d[u] + w(u, v)
3 π[v] ← u
► The process of relaxing an edge (u, v) consists of testing whether we can
improve the shortest path to v found so far by going through u and, if so,
updating d[v] and π[v].
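The relaxation test just described, as a minimal sketch (representing d, π, and w as dictionaries is an assumed encoding):

```python
def relax(u, v, w, d, pi):
    """RELAX(u, v, w): if the path to v through u improves on the
    current shortest-path estimate d[v], update d[v] and pi[v]."""
    if d[v] > d[u] + w[(u, v)]:
        d[v] = d[u] + w[(u, v)]
        pi[v] = u
```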
No-path property (Corollary 24.12)
If there is no path from s to v, then we always have d[v] = δ(s, v) = ∞.
1. The Bellman-Ford algorithm
► Bellman-Ford algorithm:
1 INITIALIZE-SINGLE-SOURCE(G, s)
2 for i ← 1 to |G.V| - 1
3 do for each edge (u, v) ∈ G.E
4 do RELAX(u, v, w)
5 for each edge (u, v) ∈ G.E
6 do if d[v] > d[u] + w(u, v)
7 then return FALSE
8 return TRUE
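A runnable sketch of the algorithm, including the negative-cycle check (the (u, v, weight) edge-list format is an assumption):

```python
def bellman_ford(vertices, edges, s):
    """Bellman-Ford: |V| - 1 passes relaxing every edge, then one
    more pass to detect a reachable negative-weight cycle.
    Returns the distance map, or None if a negative cycle exists."""
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:           # relax every edge, each pass
            if d[v] > d[u] + w:
                d[v] = d[u] + w
    for u, v, w in edges:               # negative-cycle check
        if d[v] > d[u] + w:
            return None
    return d
```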
► Each pass relaxes the edges in the order (t, x), (t, y), (t, z), (x, t), (y, x), (y, z),
(z, x), (z, s), (s, t), (s, y). There are four passes!!
► Analysis of running time:
The Bellman-Ford algorithm runs in time O(V E).
The initialization: Θ(V) time
Each pass: Θ(E) time
The for loop of lines 5-7 takes O(E) time.
2. Single-source shortest paths in directed acyclic graphs
DAG-SHORTEST-PATHS(G, w, s)
1 topologically sort the vertices of G
2 INITIALIZE-SINGLE-SOURCE(G, s)
3 for each vertex u, taken in topologically sorted order
4 do for each vertex v ∈ G.Adj[u]
5 do RELAX(u, v, w)
► Running time: Θ(V + E) time.
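A runnable sketch under the assumption that a topological order of the vertices is supplied (e.g., computed by DFS as in the topological-sort slides):

```python
def dag_shortest_paths(adj, order, s):
    """Shortest paths in a DAG: relax the out-edges of each vertex
    in topologically sorted order. `adj` maps u to (v, weight)
    pairs; `order` is a topological order of all vertices."""
    d = {u: float('inf') for u in adj}
    d[s] = 0
    for u in order:
        for v, w in adj[u]:
            if d[v] > d[u] + w:     # RELAX(u, v, w)
                d[v] = d[u] + w
    return d
```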
Theorem 24.5
If a weighted, directed graph G = (V, E) has source vertex s and
no cycles, DAG-SHORTEST-PATHS returns d[v] = δ(s, v)
for all vertices v ∈ V, and Gπ is a shortest-paths tree.
DIJKSTRA’s ALGORITHM
DIJKSTRA (G, w, s)
INITIALIZE-SINGLE-SOURCE (G,s)
S = ∅
Q = G.V
while Q ≠ ∅
u = EXTRACT-MIN (Q)
S = S U {u}
for each vertex v ∈ G.Adj[u]
RELAX (u, v, w)
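A runnable sketch of DIJKSTRA using Python's heapq for the min-priority queue; like the Prim sketch, it pushes duplicate entries and skips stale ones instead of using DECREASE-KEY (the adjacency-list input format is an assumption):

```python
import heapq

def dijkstra(adj, s):
    """Dijkstra's algorithm. `adj` maps u to (v, weight) pairs;
    edge weights must be non-negative. Returns the distance map."""
    d = {u: float('inf') for u in adj}
    d[s] = 0
    done = set()                        # the set S of finished vertices
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)     # EXTRACT-MIN
        if u in done:
            continue                    # stale heap entry, skip
        done.add(u)
        for v, w in adj[u]:
            if d[v] > du + w:           # RELAX(u, v, w)
                d[v] = du + w
                heapq.heappush(heap, (d[v], v))
    return d
```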
3. Dijkstra's algorithm
► Dijkstra's algorithm uses a greedy strategy (always chooses the "lightest"
or "closest" vertex in V - S to add to set S).
Proof Claim: For each vertex u ∈ V, we have d[u] = δ(s, u) at the time when
u is added to set S.
Suppose this is not true. Let u be the first vertex for which d[u] ≠ δ(s, u)
when it is added to S. Let P be a shortest path from s to u.
Let y be the first vertex along P such that y ∉ S, and let x ∈ S be y’s
predecessor.