CS1

The document provides an overview of algorithm analysis using Big-O notation, defining key concepts such as time and space complexity. It discusses various algorithms for addition, multiplication, and sorting, including divide-and-conquer strategies like mergesort and binary search, while also introducing graph theory concepts. Additionally, it explains the master theorem for analyzing recursive algorithms and presents a randomized selection algorithm for finding the kth smallest element.

0 Introduction

0.1 Big-O notation


We will express the running time by counting the number of basic steps, as a function of the size of the input.
Say that an algorithm takes 5n^3 + 4n + 3 steps on an input of size n; it is much simpler to leave out lower-order terms
such as 4n and 3 (which become insignificant as n grows), and even the detail of the coefficient 5 in the leading term, and simply say that the running time is O(n^3).

Definition: Let f(n) and g(n) be functions from positive integers to positive reals.
We say f = O(g) (which means that “f grows no faster than g”) if there is a constant c > 0 such that f(n) ≤ c · g(n) for all n.

Big-O notation is a way to describe how an algorithm's cost grows as its input size n grows.
As we give an algorithm larger inputs, two things may grow: its time complexity (it may take longer) and its space
complexity (it may require more memory).

Remark: If f (n) = O(g) and h(n) = O(l) , then f (n) + h(n) = O(max(g(n), l(n))) , and f (n) · h(n) = O(g(n) · l(n)).

Definition: Let f (n) and g(n) be functions from positive integers to positive reals.
We say f = Ω(g) if there is a constant c > 0 such that f (n) ≥ c · g(n) , in other words f = Ω(g) means g = O(f ).

Definition: Let f(n) and g(n) be functions from positive integers to positive reals.
We say f = Θ(g) if there are two constants c1, c2 > 0 such that c2 · g(n) ≤ f(n) ≤ c1 · g(n);
in symbols, we have both f = O(g) and f = Ω(g).

1 Algorithms with numbers


Bases and logs
- How many digits are needed to represent a number N in base b?
With k digits in base b we can express numbers up to b^k − 1; for example, in decimal (b = 10), 3 digits can express numbers
up to 10^3 − 1 = 999. Solving for k, we find that we need ⌈log_b(N + 1)⌉ digits to write a number N in base b.

- How much does the size of a number change when we change bases?
Recall the rule for converting logarithms from base a to base b: log_b N = (log_a N) / (log_a b).
So the size of integer N in base a is the same as its size in base b, times a constant factor log_a b; therefore, in big-O
notation the base is irrelevant, and we write the size simply as O(log N).

1.1 Addition
We align their right-hand ends, and then perform a single right-to-left pass in which the sum is computed digit by
digit, maintaining the overflow as a carry.
110101 (53)
+ 100011 (35)
1011000 (88)

- Given two binary numbers x and y, how long does our algorithm take to add them?
We want the answer expressed as a function of the size of the input: the number of bits of x and y.
Suppose x and y are each n bits long; then the sum of x and y is at most n + 1 bits, and the running time is therefore
of the form c0 + c1·n (it is linear), in other words O(n).
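As an illustration, here is a minimal Python sketch (the function name and the bit-string representation are my own choices, not from the notes) of this right-to-left, digit-by-digit addition:

def add_binary(x: str, y: str) -> str:
    i, j = len(x) - 1, len(y) - 1
    carry = 0
    digits = []
    while i >= 0 or j >= 0 or carry:
        s = carry
        if i >= 0:
            s += int(x[i]); i -= 1
        if j >= 0:
            s += int(y[j]); j -= 1
        digits.append(str(s % 2))   # current digit of the sum
        carry = s // 2              # overflow carried to the next position
    return "".join(reversed(digits))

print(add_binary("110101", "100011"))   # '1011000', i.e. 53 + 35 = 88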

1.2 Multiplication
In order to multiply two numbers x and y we have to create an array of intermediate sums, each representing the
product of x by a single digit of y. These values are appropriately left-shifted and then added up.
      1101    (13)
    × 1011    (11)
      1101
     1101
    0000
   1101
  10001111    (143)

Note that appending a zero on the right means multiplying by the base 2 (a left shift).
There are n intermediate rows, each up to 2n bits long (taking the left-shifting into account).
The total time is O(n) + O(n) + · · · + O(n), repeated n − 1 times (since there are n − 1 additions to perform), which is O(n^2).

Egyptian multiplication
To multiply two decimal numbers x and y, write them next to each other.
Divide the first number by 2, rounding down the result, and double the second number, and keep going till the first
number gets down to 1.
Then strike out all the rows in which the first number is even, and add up whatever remains in the second column.

11    13
 5    26
 2    52    (struck out: 2 is even)
 1   104
     ---
     143
The three numbers added are precisely the multiples of 13 by powers of 2 that were added in the binary method, so
we are doing the exact same thing.
This algorithm can also be written recursively:
x · y = 2 · (x · ⌊y/2⌋)        if y is even
x · y = x + 2 · (x · ⌊y/2⌋)    if y is odd
The total time taken is thus O(n^2), just as before.
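A minimal Python sketch of this recursive halve-and-double rule (the function name is my own; it is not part of the notes):

def multiply(x: int, y: int) -> int:
    if y == 0:
        return 0
    z = multiply(x, y // 2)        # x * floor(y/2)
    if y % 2 == 0:
        return 2 * z               # y even: x*y = 2*(x * floor(y/2))
    return x + 2 * z               # y odd:  x*y = x + 2*(x * floor(y/2))

print(multiply(11, 13))   # 143, matching the table above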

2 Divide-and-conquer algorithms
The mathematician Gauss once noticed that although the product (a + bi)(c + di) = (ac − bd) + (bc + ad)i
seems to require four multiplications, it can also be done with just three: ac, bd, and (a + b)(c + d),
since bc + ad = (a + b)(c + d) − ac − bd.
In our big-O way of thinking, reducing the number of multiplications from four to three seems wasted ingenuity, but
this improvement becomes very significant when applied recursively.

2.1 Multiplication
Suppose x and y are two n-bit integers, and assume for convenience that n is a power of 2.
As a first step toward multiplying x and y, we split each of them into their left and right halves, n/2 bits long.
x = xL | xR = 2^(n/2) · xL + xR
y = yL | yR = 2^(n/2) · yL + yR

The product of x and y can then be written as:
xy = (2^(n/2) xL + xR)(2^(n/2) yL + yR) = 2^n xL·yL + 2^(n/2) (xL·yR + xR·yL) + xR·yR.

The significant operations are the four (n/2)-bit multiplications xL·yL, xL·yR, xR·yL, xR·yR,
and we can handle them with four recursive calls, so the running time is T(n) = 4T(n/2) + O(n).

We now have a radically new algorithm, but we have not yet made any progress in efficiency, since this recurrence
works out to T(n) = O(n^2).

Big improvement → This is where Gauss's trick comes into play.


Although the expression for xy seems to demand four (n/2)-bit multiplications, as before just three will do:
xL·yL, xR·yR, and (xL + xR)(yL + yR), since xL·yR + xR·yL = (xL + xR)(yL + yR) − xL·yL − xR·yR.
Now we have obtained an improved running time of T(n) = 3T(n/2) + O(n).
This improvement leads to a dramatically lower time bound of O(n^1.59).
Explanation: The recursion forms a tree: at depth k there are 3^k sub-problems, each of size n/2^k.
(Recursion tree: the root has size n, its three children have size n/2, their nine children have size n/4, and so on.)
At the bottom, when k = log2 n, there are O(3^(log2 n)) = O(n^(log2 3)) leaves; this bottom level dominates and gives the overall running time, equal to O(n^1.59).

In the absence of Gauss's trick, each node would have had 4 sub-problems instead of 3, and this would have led to a
recursion tree with the same height (log2 n), but with more nodes at each level.
Specifically, at the bottom level there would have been 4^(log2 n) = n^2 nodes.
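A minimal Python sketch of this three-multiplication scheme (known as Karatsuba's algorithm; splitting on the bit length and the helper name are my own choices):

def karatsuba(x: int, y: int) -> int:
    if x < 2 or y < 2:                              # single-bit base case
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    xL, xR = x >> half, x & ((1 << half) - 1)       # x = 2^half * xL + xR
    yL, yR = y >> half, y & ((1 << half) - 1)
    A = karatsuba(xL, yL)
    B = karatsuba(xR, yR)
    C = karatsuba(xL + xR, yL + yR)
    # Gauss's trick: xL*yR + xR*yL = C - A - B, so three recursive calls suffice
    return (A << (2 * half)) + ((C - A - B) << half) + B

print(karatsuba(13, 11))   # 143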

2.2 The master theorem


Let T(n) = a·T(n/b) + O(n^d) for some constants a > 0, b > 1, and d ≥ 0. Then:

T(n) = O(n^d)            if d > logb a
T(n) = O(n^d · log n)    if d = logb a
T(n) = O(n^(logb a))     if d < logb a

This theorem tells us the running times of most of the divide-and-conquer procedures we are likely to use.

Binary Search
The ultimate divide-and-conquer algorithm is, of course, binary search: to find a key k in a large file containing keys
z[0, 1, ..., n − 1] in sorted order, we first compare k with z[n/2], and depending on the result we recurse either on the
first half of the file, z[0, ..., n/2 − 1], or on the second half, z[n/2, ..., n − 1].
The recurrence now is T(n) = T(n/2) + O(1), which is the case a = 1, b = 2, d = 0, and plugging into the master
theorem we get a running time of just O(log n).
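A minimal iterative Python sketch of binary search (names are my own) on a sorted list z:

def binary_search(z, k):
    lo, hi = 0, len(z) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if z[mid] == k:
            return mid        # key found at index mid
        elif z[mid] < k:
            lo = mid + 1      # recurse on the second half
        else:
            hi = mid - 1      # recurse on the first half
    return -1                 # key not present

print(binary_search([1, 3, 5, 8, 13, 21], 8))   # 3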

2.3 Mergesort
This algorithm is based on a divide-and-conquer strategy: we split the list that we want to sort into two halves,
recursively sort each half, and then merge the two sorted sublists.

Function mergesort:
Input: An array of numbers a[1 ... n]
Output: A sorted version of this array
if n > 1:
    return merge(mergesort(a[1 ... n/2]), mergesort(a[n/2 + 1 ... n]))
else:
    return a

Function merge:
Input: Two sorted arrays x[1 ... k] and y[1 ... l]
Output: A sorted version of the merged array of x and y
if k = 0: return y[1 ... l]
if l = 0: return x[1 ... k]
if x[1] ≤ y[1]:
    return x[1] ◦ merge(x[2 ... k], y[1 ... l])
else:
    return y[1] ◦ merge(x[1 ... k], y[2 ... l])

The merge procedure does a constant amount of work per recursive call, so merging two lists of total length n takes O(n) time;
mergesort therefore satisfies T(n) = 2T(n/2) + O(n), and plugging a = 2, b = 2, d = 1 into the master theorem we get a running time of O(n log n).
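A runnable Python version of the pseudocode above (0-indexed; names my own):

def merge(x, y):
    out = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i]); i += 1
        else:
            out.append(y[j]); j += 1
    return out + x[i:] + y[j:]          # one of the two lists is already exhausted

def mergesort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))

print(mergesort([5, 2, 8, 1, 9, 3]))    # [1, 2, 3, 5, 8, 9]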

Remark: Sorting algorithms can be depicted as decision trees, where the number of comparisons on the longest path from
root to leaf corresponds to the worst-case number of comparisons made by the algorithm. Let's do an example with an array of three
elements a1, a2, a3 (each leaf gives the sorted order; for example, 213 means a2 < a1 < a3):

a1 < a2?
  no:  a1 < a3?
         no:  a2 < a3?   (no: 321, yes: 231)
         yes: 213
  yes: a2 < a3?
         no:  a1 < a3?   (no: 312, yes: 132)
         yes: 123

In this case the longest root-to-leaf path has 3 comparisons, which is the worst case.

Theorem: Any deterministic comparison-based sorting algorithm must perform Ω(n log n) comparisons to sort n
distinct elements in the worst case (this corresponds to the height of the decision tree).
More precisely, for any deterministic comparison-based sorting algorithm A and every n ≥ 2, there exists an
input of size n on which A makes at least log2(n!) = Ω(n log n) comparisons (for n = 1 there is nothing to sort).

2.4 Medians
The median of a sorted list of numbers is its middle element; if the list has even length there are two choices for
the middle element (we pick the smaller of the two).
Computing the median of n numbers is easy: just sort them, which takes O(n log n) time.
However, sorting does far more work than we really need: we only want the middle element and do not care about
the relative order of the rest.

Selection
Input: A list of numbers S, and an integer k
Output: The kth smallest element of S
If k = 1, the algorithm searches for the minimum of S, whereas if k = |S|/2 it searches for the median.

A randomized divide-and-conquer algorithm for selection


For any number v, we create three sublists:
SL = {x ∈ S : x < v} , elements smaller than v.
SV = {x ∈ S : x = v} , elements equal to v.
SR = {x ∈ S : x > v} , elements greater than v.

Example: The array is split on v=5


S : 2 , 36 , 5 , 21 , 8 , 13 , 11 , 20 , 5 , 4 , 1.
SL = 2 , 4 , 1 .
SV = 5 , 5 .
SR = 36 , 21 , 8 , 13 , 11 , 20 .
If we want the eighth-smallest element of S, we know it must be the third-smallest element of SR since |SL | + |SV | = 5,
that is selection(S, 8) = selection(SR , 3).

selection(S, k) = selection(SL, k)                      if k ≤ |SL|
selection(S, k) = v                                     if |SL| < k ≤ |SL| + |SV|
selection(S, k) = selection(SR, k − |SL| − |SV|)        if k > |SL| + |SV|

The ideal situation is |SL|, |SR| ≈ |S|/2; if this is the case, we get a running time of T(n) = T(n/2) + O(n), which is O(n).

Lemma: After two split operations on average, the array will shrink to at most three-fourths of its size.
Let T(n) be the expected running time; on an array of size n we get T(n) ≤ T(3n/4) + O(n).
(Time taken on an array of size n) ≤ (time taken on an array of size 3n/4) + (time to reduce the array size to ≤ 3n/4),
and from this recurrence we conclude that T(n) = O(n) on any input.

Explanation: A randomly chosen pivot v is “good” (it lies between the 25th and 75th percentiles of the array) with
probability 1/2, and splitting on a good pivot leaves a sublist of size at most three-quarters of the original.
Since on average a good pivot is found within two attempts, after two split operations the array has shrunk, in expectation,
to at most three-quarters of its original size.
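A minimal Python sketch of the randomized selection algorithm (names my own); it returns the kth smallest element of S, with k = 1 meaning the minimum:

import random

def selection(S, k):
    v = random.choice(S)                    # random pivot
    SL = [x for x in S if x < v]
    SV = [x for x in S if x == v]
    SR = [x for x in S if x > v]
    if k <= len(SL):
        return selection(SL, k)
    elif k <= len(SL) + len(SV):
        return v
    else:
        return selection(SR, k - len(SL) - len(SV))

S = [2, 36, 5, 21, 8, 13, 11, 20, 5, 4, 1]
print(selection(S, 8))   # 13, the eighth-smallest element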

3 Decompositions of graphs
Definition of graph: Formally, a graph G = (V, E) is a mathematical structure consisting of two finite sets V
(vertices) and E (edges), where E is a subset of V × V .
If e = (u, v) ∈ E then we say that u and v are the endpoints of e, or that u is adjacent to v, and vice versa.

Definition of neighbour: A vertex joined by an edge to a vertex v is called a neighbour of v (the set of all vertices
adjacent to v is the neighbourhood of v).

Definition of degree: The degree of a vertex v, denoted d(v), represents the number of edges incident with v
(an edge is said to be incident with a vertex v if v is one of the endpoints of that edge.).

Definition of “a walk on a graph”: It is a sequence of vertices and edges where each edge connects two adjacent
vertices in the sequence (every edge is incident with the vertices before and after it).

Definition of path: It is a walk where all the vertices are distinct (no vertex is repeated).

Definition of cycle: It is a walk whose first and last vertices coincide; a graph that contains no cycle is called acyclic.

Definition of bipartite graph: A bipartite graph G = (V, E) is an undirected graph in which the set V can be
partitioned into two sets V1 and V2, such that for every (u, v) ∈ E we have u ∈ V1 and v ∈ V2, or vice versa.

How can a graph be represented? We can represent a graph by an adjacency matrix:


Aij = 1   if there is an edge from vi to vj
Aij = 0   otherwise

The matrix takes up O(n2 ) space, which is wasteful if the graph does not have very many edges.
An alternative representation, with size proportional to the number of edges, is the adjacency list, which consists of
|V | linked lists, one per vertex, where the total size of the data structure is O(|V | + |E|).
The linked list for vertex u holds the names of vertices to which u has an outgoing edge, so each edge appears in exactly
one of the linked lists if the graph is directed, or in two of the lists if the graph is undirected.
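A small Python sketch of the two representations (the example graph is my own, not from the notes):

V = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]        # directed edges

# Adjacency matrix: O(|V|^2) space.
A = [[0] * V for _ in range(V)]
for u, v in edges:
    A[u][v] = 1

# Adjacency list: O(|V| + |E|) space, one list per vertex.
adj = [[] for _ in range(V)]
for u, v in edges:
    adj[u].append(v)        # for an undirected graph, also append u to adj[v]

print(A[0][2], adj[0])      # 1 [1, 2]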

3.1 Depth-first search in undirected graphs


Definition of undirected graph: In an undirected graph the edges do not have a direction associated with them,
therefore, if there is an edge connecting vertex A to vertex B, then there is also an edge connecting B to A.
An undirected graph is connected if every vertex is reachable from all other vertices.

The question from which we start is “What parts of the graph are reachable from a given vertex?”
The reachability problem is rather like exploring a labyrinth, and everybody knows that to explore a labyrinth all you
need is a ball of string and a piece of chalk.
The chalk helps you remember which paths you’ve already visited, so you don’t get lost in loops, while the string
ensures you can always find your way back to the starting point to explore passages you may have missed.

How can we simulate these two primitives, chalk and string, on a computer?
The chalk marks are easy: for each vertex we maintain a Boolean variable indicating whether it has been visited already.
For the ball of string we can use a stack, pushing a new vertex when we advance along an edge and popping when we backtrack
(in the recursive procedure below, this stack is simply the call stack).

Setup:
for all v ∈ V:
    visited[v] = False
    pre[v] = None    (time at which we enter the node)
    post[v] = None   (time at which we exit the node)
clock = 1

Algorithm:
Input: G = (V, E) is a graph, v ∈ V
Output: visited[u] is set to True for all nodes u reachable from v
explore(G, v):
    visited[v] = True
    pre[v] = clock; clock = clock + 1        (previsit)
    for all edges (v, u) ∈ E:
        if not visited[u]:
            explore(G, u)                    (recursive call to explore)
    post[v] = clock; clock = clock + 1       (postvisit)

Theorem: After explore(G, v) completes, visited[u] is set to True if and only if u is reachable from v.
Proof:
→ We start with all nodes set to False, and the only way for a node to get visited is to be explored.
The only way visited[u] is set to True is that u is explored during the exploration of some node w, so there is an edge (w, u) and w was itself visited.
If w is v we are done; if w ≠ v, then visited[w] was set to True earlier and we repeat the same reasoning from w
(the process stops because there is a finite number of nodes), producing a path from v to u.

← By contradiction assume that u is reachable from v but there is a run of explore that misses u.
We choose any path from v to u, and look at the last vertex on that path that the procedure actually visited.
Call this node z, and let w be the node immediately after it on the same path.
(path: v → ⋯ → z → w → ⋯ → u)

So z was visited but w was not, and this is a contradiction: while the explore procedure was at node z,
it would have noticed w and moved on to it.

Note, however, that the explore procedure visits only the portion of the graph reachable from its starting point.
To examine the rest of the graph, we need to restart the procedure at some vertex that has not yet been visited.

DFS Algorithm:
for all v ∈ V:
    visited[v] = False
for all v ∈ V:                (repeat until the entire graph has been traversed)
    if not visited[v]:
        explore(G, v)

Running time of DFS: explore is called exactly once on each vertex (thanks to the visited flags), and each edge is examined
exactly twice, once from each endpoint's point of view; the running time is therefore proportional to the size of the graph, O(|V| + |E|).
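A minimal Python sketch of depth-first search with previsit/postvisit numbers (adjacency lists stored in a dict; names my own); the recursion plays the role of the ball of string:

def dfs(adj):
    visited = {v: False for v in adj}
    pre, post = {}, {}
    clock = 1

    def explore(v):
        nonlocal clock
        visited[v] = True
        pre[v] = clock; clock += 1        # time we enter the node
        for u in adj[v]:
            if not visited[u]:
                explore(u)
        post[v] = clock; clock += 1       # time we exit the node

    for v in adj:                         # restart at not-yet-visited vertices
        if not visited[v]:
            explore(v)
    return pre, post

adj = {'A': ['B', 'E'], 'B': ['A'], 'E': ['A'], 'C': ['D'], 'D': ['C']}
print(dfs(adj))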

Connectivity in undirected graphs
In the case of disconnected graphs, if we want to count the connected components (CC), how can we do it?
CC = 0
for all v ∈ V:
    if not visited[v]:
        CC = CC + 1
        explore(G, v)    (inside explore, set ccnum[u] = CC for every vertex u it visits, assigning each node an integer identifying the connected component to which it belongs)

(Figure: an example undirected graph on the vertices A through L.)

In this example the connected components (CC) are: {A, B, E, I, J} , {C, D, G, H, K, L} , {F }.


When explore is started at a particular vertex, it identifies precisely the connected component containing that vertex,
and each time the DFS outer loop calls explore, a new connected component is picked out.

Theorem: Previsit and postvisit orderings


For all nodes u and v, the intervals [pre[u], post[u]] and [pre[v], post[v]] are either disjoint or one is contained within the other.

Proof:
Assume pre[u] < pre[v], we have to distinguish between two cases:
1) pre[v] < post[u]: the exploration of v started before the exploration of u was over, so v was discovered during explore(G, u);
that call must finish before explore(G, u) can resume, so post[v] < post[u] and we have pre[u] < pre[v] < post[v] < post[u]
(the interval of v is entirely contained in the interval of u). The interleaved situation pre[u] < pre[v] < post[u] < post[v]
cannot occur, because the call on v is nested inside the call on u.
2) pre[v] > post[u]: this is the case where the intervals are disjoint.

3.2 Depth-first search in directed graphs


Definition of directed graph: In a directed graph, each edge has a direction associated with it, therefore, an edge
connecting vertex A to vertex B does not necessarily imply the existence of an edge from vertex B to vertex A.

Tree edges are actually part of the DFS forest.


Forward edges lead from a node to a nonchild descendant in the DFS tree.
Back edges lead from a node to an ancestor in the DFS tree.
Cross edges lead to neither descendant nor ancestor.

Ancestor and descendant relationships, as well as edge types, can be read off directly from the pre and post numbers.
Vertex u is an ancestor of vertex v exactly when u is discovered first and v is discovered during explore(G, u), which is
equivalent to saying that pre[u] < pre[v] < post[v] < post[u]; edges (u, v) with this nesting are tree/forward edges, back
edges (u, v) show the reverse nesting pre[v] < pre[u] < post[u] < post[v], and cross edges (u, v) have disjoint intervals with
v's interval entirely before u's.

Theorem:
A directed graph has a cycle if and only if its depth-first search reveals a back edge.
Proof:
← If (u, v) is a back edge it means that u is a descendant of v, therefore in the original graph there must exist a path
from v to u, and so there is a cycle consisting of the back edge together with the path from v to u in the search tree.
→ Conversely, if the graph has a cycle v0 → v1 → ... → vk → v0 , we look at the first node on this cycle to be discovered.
Suppose it is v0 , then we know that all other vertices on the cycle are reachable from it, and they will be its descendants.
In particular we will have the edge vk → v0 , which leads from a node to its ancestor, and therefore it is a back edge.

Definition of topological sort: Let G = (V, E) be a directed graph, we say that a list L of the vertices is a
topological sort of G if for every edge (u, v) ∈ E, the vertex u comes before the vertex v in L.

Theorem:
If a directed graph G = (V, E) contains a cycle, then G cannot have a topological sort.
Proof:
By contradiction suppose that G = (V, E) is a directed graph with a cycle, and that L is a topological sort of G.
Let v be the first vertex of the cycle to appear in L, but since v has a predecessor u in the cycle, such that (u, v) ∈ E,
we have a contradiction because (u, v) is an edge and u comes after v in L.

Theorem:
If a directed graph G = (V, E) does not contain a cycle, then G has a topological sort.
Proof:
We describe an algorithm for finding a topological sort, and we will argue that it works on all graphs that are acyclic.

L = empty list
while V is not empty
let v be a vertex with zero incoming edges
L=L+v
remove v from V, and remove all the edges incident on v from E
return L

If a vertex has no incoming edges, then it is correct to put it at the beginning of a topological sort.
We can then remove that vertex and the edges incident on it from the graph, and continue with the rest of the graph.
If the algorithm terminates, it terminates with a valid topological sort; the only way it can get stuck is if every
remaining vertex has at least one incoming edge, which cannot happen in an acyclic graph (see the lemma below).

Lemma: (It is a consequence of the previous theorem)


Let G = (V, E) be a directed acyclic graph (DAG). If we run the DFS on G, then for every edge (u, v) ∈ E we have
that the call to explore (G, u) terminates after the call to explore (G, v).
The only edges (u, v) in a graph for which post[u] < post[v] are back edges, and we cannot have them in a DAG.

Explanation: Every edge leads to a vertex with a lower post number; indeed, for every (u, v) ∈ E we have post[v] < post[u].
Listing the vertices in decreasing order of post number is therefore a topological sort, and it is exactly the list L built by the modified explore below.

explore(G, u):
    visited[u] = True
    for all edges (u, v) ∈ E:
        if not visited[v]:
            explore(G, v)
    L = u + L        (prepend u to L once all of its out-neighbours have been finished)
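A runnable Python sketch of this DFS-based topological sort (dict adjacency list; names my own):

def topological_sort(adj):
    visited = {v: False for v in adj}
    L = []

    def explore(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                explore(v)
        L.insert(0, u)        # prepend u: it goes before everything it can reach

    for v in adj:
        if not visited[v]:
            explore(v)
    return L

dag = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(topological_sort(dag))   # ['a', 'c', 'b', 'd'], a valid topological sort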

Lemma: If G = (V, E) is a directed acyclic graph, then G has at least one vertex with no incoming edges (a source), and at
least one vertex with no outgoing edges (a sink).
Proof:
By contradiction, assume that every vertex of G has at least one incoming edge (the argument for outgoing edges is symmetric).
Start at any vertex in the graph and repeatedly step backwards along an incoming edge to a predecessor; since every vertex has
at least one incoming edge, this walk never gets stuck and can be continued indefinitely.
But the graph has only finitely many vertices, so some vertex must eventually be revisited, and the portion of the walk between
its two occurrences is a cycle, contradicting the assumption that G is acyclic.

3.3 Strongly connected components


Definition of strongly connected components: A directed graph is strongly connected if every two vertices u and v are reachable
from each other, that is, there exist a path from u to v and a path from v to u. The strongly connected components (SCCs) of a
directed graph are its maximal strongly connected subsets of vertices.

Definition of reverse: Given a directed graph G = (V, E), then the graph GR = (V, E R ), where (u, v) ∈ E R if and
only if (v, u) ∈ E, is called the reverse of G.
Note that GR and G have the same strongly connected components (SCCS ).

Theorem: Let C and C′ be distinct SCCs in a directed graph G = (V, E), and let u, v ∈ C and u′, v′ ∈ C′.
If there is a path u → u′, then G cannot also contain a path v′ → v.
Consequence: shrinking each SCC to a single meta-node yields a DAG (the graph of strongly connected components is a DAG).

Theorem: Let C and C′ be distinct SCCs in a directed graph G = (V, E). If there is an edge from a node
in C to a node in C′, then the highest post number in C is bigger than the highest post number in C′.

Remark: The node with the highest post number lies in a source SCC (an SCC with no incoming edges in the DAG of components).
Proof: There are two cases to consider:
If the depth-first search visits component C before component C ′ , then clearly the first node visited in C will have a
higher post number than any node in C ′ .
On the other hand, if C ′ gets visited first, then the depth-first search will get stuck after seeing all of C ′ but before
seeing any of C, in which case the property follows immediately.

Algorithm: (How to find the SCCs of a directed graph G)
1. Run depth-first search on GR and record the post numbers of the vertices.
2. Run the undirected connected components algorithm on G, processing the vertices in decreasing order of the post numbers from step 1: each call to explore picks out exactly one SCC.
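A minimal Python sketch of this two-pass algorithm (dict adjacency list; names my own): a DFS on GR records post numbers, then explore is run on G in decreasing post-number order, one component per call.

def strongly_connected_components(adj):
    rev = {v: [] for v in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)                    # build the reverse graph GR

    visited = {v: False for v in adj}
    order = []                                  # vertices in increasing post-number order on GR
    def explore(graph, u, out):
        visited[u] = True
        for v in graph[u]:
            if not visited[v]:
                explore(graph, v, out)
        out.append(u)                           # postvisit

    for v in adj:
        if not visited[v]:
            explore(rev, v, order)

    visited = {v: False for v in adj}
    sccs = []
    for v in reversed(order):                   # decreasing post number
        if not visited[v]:
            component = []
            explore(adj, v, component)
            sccs.append(component)
    return sccs

g = {'a': ['b'], 'b': ['c'], 'c': ['a', 'd'], 'd': []}
print(strongly_connected_components(g))         # [['d'], ['c', 'b', 'a']]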

4 Paths in graphs
Depth-first search identifies all the vertices of a graph that can be reached from a designated starting point, and it also
finds explicit paths to these vertices, summarized in its search tree.
However, these paths might not be the most economical ones possible.

Definition of shortest path: Let G = (V, E) be a graph and s ∈ V a start vertex. The shortest-path distance
d(v) from s to v is the minimum number of edges in any path from s to v; if there is no such path, d(v) = ∞.

4.1 Breadth-first search


The idea is to compute distances from a starting vertex s to the other vertices, proceeding layer by layer: once
we have picked out the nodes at distance 0, 1, 2, ..., d, the ones at distance d + 1 are precisely the as-yet-unseen nodes adjacent
to the layer at distance d.
This suggests an iterative algorithm in which two layers are active at any given time: some layer d, which has been
fully identified, and d + 1, which is being discovered by scanning the neighbors of layer d.

Algorithm:
for all u ∈ V:
    dist(u) = ∞
dist(s) = 0
Q = [s]                              (a queue containing just s)
while Q is not empty:
    u = eject(Q)                     (take one element off the front of the queue)
    for all edges (u, v) ∈ E:        (explore its neighbours)
        if dist(v) = ∞:
            dist(v) = dist(u) + 1    (update the distance of v)
            inject(Q, v)             (put v at the back of the queue)

Explanation: Initially the queue Q consists only of s, the one node at distance 0.
Then, there is a point in time at which Q contains all the nodes at distance d and nothing else, and as these nodes are
processed (ejected off the front of the queue), their as-yet-unseen neighbors are injected into the end of the queue.
Unlike the DFS tree we saw earlier, the BFS tree has the property that all of its paths from s are shortest possible:
it is therefore a shortest-path tree.
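A minimal Python sketch of BFS (dict adjacency list; names my own):

from collections import deque

def bfs(adj, s):
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    Q = deque([s])
    while Q:
        u = Q.popleft()                  # eject from the front of the queue
        for v in adj[u]:
            if dist[v] == float('inf'):  # not yet seen
                dist[v] = dist[u] + 1
                Q.append(v)              # inject at the back of the queue
    return dist

adj = {'s': ['a', 'b'], 'a': ['c'], 'b': ['c'], 'c': []}
print(bfs(adj, 's'))   # {'s': 0, 'a': 1, 'b': 1, 'c': 2}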

Correctness and efficiency


For each d = 0, 1, 2,... , there is a moment at which (1) all nodes at distance ≤ d from s have their distances correctly
set; (2) all other nodes have their distances set to ∞ and (3) the queue contains exactly the nodes at distance d.

Proof: (By induction)


Base Case: For d = 0, the starting node s has distance 0 from itself, so it satisfies all three conditions:
(1) The distance of the starting node s from itself is correctly set to 0. (2) All other nodes have their distances set to
∞ as they are not reachable yet. (3)The queue contains only the starting node s, which is at distance 0.

Inductive step:
When the algorithm processes nodes at distance k, it adds all nodes adjacent to nodes at distance k to the queue.
Since all nodes at distance k have their distances correctly set, and all other nodes have distances set to ∞, the algo-
rithm maintains this property by moving outward from the initial node s.
Therefore, it accurately sets the distances for nodes at distance k + 1, and after processing all nodes at distance k and
injecting their neighbors, the queue will contain exactly the nodes at distance k + 1.

Conclusion: By induction, we have shown that if the BFS algorithm correctly sets the distances for all nodes at dis-
tance k from the starting node s, then it also correctly sets the distances for nodes at distance k + 1.

Running time of BFS: For each vertex v, the BFS explores all its adjacent vertices, so in the worst case, each edge
will be traversed exactly twice (total time is O(|E|)).
BFS uses a queue to keep track of vertices, and for each vertex, it performs enqueue and dequeue operations, so each
vertex will be enqueued and dequeued once (total time is O(|V |)).
The overall running time is O(|V | + |E|).

4.2 Dijkstra’s algorithm


Breadth-first search finds shortest paths in any graph whose edges have unit length,
but can we adapt it to a more general graph G = (V, E) whose edge lengths le are positive integers?

Algorithm:
for all v ∈ V:
    dist(v) = ∞
    π(v) = ∅
dist(s) = 0
Q = [V]                                      (priority queue containing all vertices v ∈ V, keyed by dist)
while Q is not empty:
    u = remove min from Q
    for all edges (u, v) ∈ E:
        if dist(v) > dist(u) + l(u, v):      (it is cheaper to reach v by going through u)
            dist(v) = dist(u) + l(u, v)      (update the distance of v with the cheaper route just found)
            π(v) = u
            update Q                          (decreasekey on v)

Running time of Dijkstra’s algorithm: Dijkstra’s algorithm is structurally identical to the Breadth-first search
algorithm, however, it is slower because of the priority queue (usually implemented by a binary heap).

Operation              Count    Binary heap       Total
Initialize Q           1        O(|V| log|V|)     O(|V| log|V|)
Check Q empty          |V|      O(1)              O(|V|)
Find min / remove      |V|      O(log|V|)         O(|V| log|V|)
Update (decreasekey)   |E|      O(log|V|)         O(|E| log|V|)

Therefore, the overall running time is O((|V| + |E|) log|V|).

Note that Dijkstra's algorithm does not work on graphs with negative edges: the greedy choice of the minimum-distance vertex can be
finalised too early, so the answer it returns may be wrong. With negative cycles the problem itself breaks down, since a path can be
made arbitrarily cheap by going around the cycle repeatedly.
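A minimal Python sketch of Dijkstra's algorithm (edge lengths stored in a dict of dicts; names my own). Instead of a decreasekey operation, stale heap entries are simply skipped when popped, a common implementation choice:

import heapq

def dijkstra(adj, s):
    dist = {v: float('inf') for v in adj}
    prev = {v: None for v in adj}
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)          # remove min
        if d > dist[u]:
            continue                        # stale entry, distance already improved
        for v, length in adj[u].items():
            if dist[v] > dist[u] + length:
                dist[v] = dist[u] + length  # cheaper route through u
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev

adj = {'s': {'a': 4, 'b': 1}, 'a': {'c': 1}, 'b': {'a': 2, 'c': 6}, 'c': {}}
print(dijkstra(adj, 's')[0])   # {'s': 0, 'a': 3, 'b': 1, 'c': 4}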

4.3 Priority queue implementations
Array: The simplest implementation of a priority queue is as an unordered array of key values for all potential ele-
ments (the vertices of the graph, in the case of Dijkstra’s algorithm).
Binary heap: Here elements are stored in a full binary tree, in which each level is filled in from left to right, and
must be full before the next level is started (except possibly for the leaves in the bottom layer).
Min heap: It is a binary heap with a special ordering constraint, indeed the key value of any node of the tree is less
than or equal to that of its children, therefore, the root always contains the smallest element.

Implementation     Remove min    Insert       Decrease key
Array (unsorted)   O(|V|)        O(1)         O(1)
Sorted list        O(1)          O(|V|)       O(|V|)
Binary heap        O(log|V|)     O(log|V|)    O(log|V|)

- For an array (unsorted list), insert/decreasekey is fast, because it just involves adjusting a key value, so it is an O(1)
operation. A deletemin, however, requires a linear-time scan of the list, and so it takes O(|V|).

- For a binary heap the situation is totally different, to insert we place the new element at the bottom of the tree (in
the first available position), and let it “bubble up”: if it is smaller than its parent, swap the two and repeat.
The number of swaps is at most the height of the tree, which is log2 n when there are n elements, so it takes O(log n).
A decreasekey is similar, except that the element is already in the tree, again it takes O(log n).
To deletemin we return the root value and remove it from the heap, then we take the last node in the tree and place it
at the root, and if it is bigger than either child, swap it with the smaller child and repeat, again this takes O(log n) time.
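A minimal array-based Python sketch of these two heap operations (0-indexed; names my own):

class MinHeap:
    def __init__(self):
        self.a = []

    def insert(self, x):
        self.a.append(x)                            # first available position at the bottom
        i = len(self.a) - 1
        while i > 0 and self.a[i] < self.a[(i - 1) // 2]:
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2                        # bubble up towards the root

    def deletemin(self):
        root, last = self.a[0], self.a.pop()
        if self.a:
            self.a[0] = last                        # move the last node to the root
            i = 0
            while True:
                c = 2 * i + 1                       # pick the smaller child
                if c + 1 < len(self.a) and self.a[c + 1] < self.a[c]:
                    c += 1
                if c >= len(self.a) or self.a[i] <= self.a[c]:
                    break
                self.a[i], self.a[c] = self.a[c], self.a[i]
                i = c                               # sift the new root down
        return root

h = MinHeap()
for x in [5, 3, 8, 1]:
    h.insert(x)
print(h.deletemin(), h.deletemin())   # 1 3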

5 Greedy algorithms
Definition of greedy algorithm: A greedy algorithm is an approach to problem-solving that makes locally optimal
choices at each step, with the hope of finding a global optimum.

5.1 Minimum spanning tree


Definition: Let G = (V, E) be an undirected connected graph, a minimum spanning tree is a subset S ⊆ E,
whose total cost is as small as possible, and suffices to connect all the vertices.
Remark: An MST is connected and contains no cycle (it is a tree): in any cycle there are two ways to reach a vertex,
and one of the two edges is unnecessary, since removing it keeps the graph connected while lowering the cost.

Kruskal's algorithm: it constructs the tree edge by edge, always picking whichever usable edge is cheapest at the moment.
S = ∅
sort the edges by their cost c(u, v) in ascending order
for each edge e = (u, v) ∈ E, in that order:
    if e connects two previously unconnected vertices:    (adding e does not create a cycle)
        S = S ∪ {e}
return S        (the resulting set S is an MST)

Theorem: Any optimal solution to the problem of finding the MST is a tree (undirected, connected and acyclic).
Proof: If a connected undirected graph contains a cycle, then one edge can be removed and the graph is still connected.
If an optimal solution contained a cycle, we could remove any one edge of that cycle and obtain a new spanning solution whose cost is
less than before, but this is a contradiction: the previous “optimal” solution was not optimal.

Some properties:
1. Removing a cycle edge cannot disconnect a graph.
2. A tree with n nodes has n − 1 edges.
3. Any connected, undirected graph G = (V, E) , with |E| = |V | − 1 is a tree.
4. An undirected graph is a tree iff there is a unique path between any pair of nodes (in other words, iff it is connected and acyclic).

Remark: If we add an edge e to an MST T , since T is connected, it already has a path between the endpoints of e,
therefore adding e would create a cycle.

Prim's algorithm:
for all u ∈ V:
    c(u) = ∞
    π(u) = ∅
c(s) = 0                    (s is an arbitrary start vertex)
Q = [V]                     (priority queue over all vertices, keyed by c)
while Q is not empty:
    u = remove min from Q
    for each neighbor v of u:
        if v ∈ Q and c(v) > w(u, v):
            c(v) = w(u, v)
            π(v) = u
            update Q

Remark: The difference between Kruskal's and Prim's algorithms lies in their approach to finding the MST:
Kruskal's algorithm considers the edges of the entire graph in order of cost, whereas Prim's algorithm grows a single tree outward from a specific start node.
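A minimal Python sketch of Prim's algorithm (weighted graph as a dict of dicts; names my own), using a heap with lazy deletion instead of an explicit decreasekey:

import heapq

def prim(adj, s):
    in_tree = set()
    parent = {s: None}
    heap = [(0, s, None)]                     # (cost to attach, vertex, parent)
    total = 0
    while heap and len(in_tree) < len(adj):
        w, u, p = heapq.heappop(heap)
        if u in in_tree:
            continue                          # stale entry
        in_tree.add(u)
        parent[u] = p
        total += w
        for v, weight in adj[u].items():
            if v not in in_tree:
                heapq.heappush(heap, (weight, v, u))
    return parent, total

g = {'a': {'b': 2, 'c': 3}, 'b': {'a': 2, 'c': 1}, 'c': {'a': 3, 'b': 1}}
print(prim(g, 'a'))   # MST edges a-b (cost 2) and b-c (cost 1), total cost 3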

5.2 Huffman encoding


Huffman encoding is widely used for data compression. Objective: we want to find a prefix-free way of mapping each
symbol v ∈ V to a bit string C(v) of length l(v), such that we minimize the total encoding length Σ_{v ∈ V} f(v) · l(v).

Huffman tree: It is a binary tree based on the frequency f of the symbols, where each leaf node represents a symbol,
and the path from the root to the leaf represents the binary code for that symbol.
The more frequently occurring symbols have shorter binary codes, while less frequent symbols have longer binary codes.

Algorithm:
Input: An array f[1 ... n] of frequencies
Output: An encoding tree with n leaves
Let H be a priority queue of integers, ordered by f
for i = 1 to n:
    insert(H, i)
for k = n + 1 to 2n − 1:
    i = deletemin(H), j = deletemin(H)
    create a node numbered k with children i, j
    f[k] = f[i] + f[j]
    insert(H, k)

Explanation:
Create a priority queue H, ordered by the frequencies f, and for each symbol i from 1 to n insert i into H;
the priority queue initially contains the individual symbols with their corresponding frequencies.
For each iteration k = n + 1 to 2n − 1, perform the following steps:
- Extract the two nodes, i and j, with the lowest frequencies from the priority queue.
- Create a new node numbered k with children i and j, and set the frequency of k to be the sum of the frequencies of nodes i and j.
- Insert the newly created node k back into the priority queue H.
After these n − 1 iterations, H contains only one element, which is the root of the Huffman encoding tree.

Note that Huffman’s algorithm is a greedy algorithm since it merges the two least frequent elements recursively.
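A minimal Python sketch of the procedure (names my own): repeatedly merge the two least frequent nodes using a heap, then read the codes off the tree (0 = left, 1 = right):

import heapq

def huffman(freqs):
    # heap entries: (frequency, tie-breaker, tree), where a tree is a symbol or a (left, right) pair
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)       # the two smallest frequencies
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
        else:
            codes[tree] = code or "0"         # edge case: a one-symbol alphabet
    walk(heap[0][2], "")
    return codes

print(huffman({'a': 49, 'b': 105, 'c': 33, 'd': 48, 'e': 50}))
# {'b': '0', 'c': '100', 'd': '101', 'a': '110', 'e': '111'}: same code lengths
# as in the example below, up to relabelling of the 0/1 branches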

Example: We start from five letters and their frequency, and we apply the Huffman procedure.
Given an “alphabet” of symbols V , and a positive integer frequency f (v), we want to find a prefix-free way of mapping
each symbol v ∈ V , to a bit string C(v) of length l(v), in such a way that we minimize the total encoding length.

Frequencies:       a: 49, b: 105, c: 33, d: 48, e: 50
Merge c and d:     a: 49, b: 105, cd: 81, e: 50
Merge a and e:     ae: 99, b: 105, cd: 81
Merge ae and cd:   aecd: 180, b: 105
(One final merge of aecd and b produces the root; the leaves a, e, c, d end up at depth 3 and b at depth 1.)

Letter    Code    Frequency    Code length (bits)    Frequency × length
a         000     49           3                     147
b         1       105          1                     105
c         010     33           3                     99
d         011     48           3                     144
e         001     50           3                     150
                                                      Total: 645

The prefix-free property: It says that no codeword can be a prefix of another codeword; in other words, for any
pair of distinct codewords ci and cj in the code, neither ci nor cj is a prefix of the other.
A bit string s is said to be a prefix of a bit string s′ if s′ is obtained by appending one or more bits to the end of s
(for example, 010 is a prefix of 01001).
When constructing a Huffman tree, the prefix-free property emerges naturally from the algorithm's design, because the symbols sit only at the leaves of the tree.

5.3 Union-Find data structure
We can apply the Union-Find data structure to Kruskal's algorithm.

Algorithm:
U = Union-Find data structure over V   (*1)
S = ∅
sort the edges E by weight
for all (u, v) ∈ E:
    if Find(u) ≠ Find(v):   (*2)
        Union(u, v)   (*3)
        S = S ∪ {(u, v)}

*1: for all x ∈ V: Makeset(x), which creates a singleton set containing just x.
*2: Find(x): to which set does x belong?
*3: Union(x, y): merge the sets containing x and y.

Explanation:
We create a Union-Find data structure U with all the vertices, each vertex initially in its own disjoint (singleton) set,
and initialize an empty set S to store the edges of the MST.
Then, for each edge (u, v) in sorted order, we check whether the endpoints u and v belong to the same disjoint set in the Union-Find data structure.
If they belong to different sets, adding this edge won't create a cycle, so we perform a Union operation to merge the sets containing u and v, and add the edge (u, v) to the set S.

Union by rank
One way to store a set is as a directed tree, where the nodes of the tree are elements of the set, arranged in no particular
order, and each has parent pointers that eventually lead up to the root of the tree (a convenient representative).
Merging two sets is easy: we make the root of one point to the root of the other. But we have a choice here: if the
representative roots of the sets are rx and ry, do we make rx point to ry or the other way around?
Since tree height is the main impediment to computational efficiency, a good strategy is to make the root of the shorter
tree point to the root of the taller one, this way, the overall height increases only if the trees merged are equally tall.

Procedure union(x, y):
rx = find(x)
ry = find(y)
if rx = ry: return
if rank(rx) > rank(ry):
    π(ry) = rx
else:
    π(rx) = ry
    if rank(rx) = rank(ry):
        rank(ry) = rank(ry) + 1

Note that, by design, the rank of a node is exactly the height of the subtree rooted at that node.

Path compression:
Path compression is an optimization technique used in the Union-Find data structure, particularly during the Find
operation, and its purpose is to reduce the height of the trees in the data structure.
In particular, when the Find operation is performed on a node to determine its representative (root) in the disjoint
set, it follows the parent pointers from the node upward until it reaches the root of the tree.
During this process, path compression modifies the parent pointers along the path: instead of simply finding the root,
it links each node encountered along the path directly to the root.
After path compression, the path from any node to the root becomes much shorter (the ranks are left unchanged).
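A minimal Python sketch (names my own) of Union-Find with union by rank and path compression, used to run Kruskal's algorithm on a small example graph:

parent, rank = {}, {}

def makeset(x):
    parent[x] = x
    rank[x] = 0

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])     # path compression: link x directly to the root
    return parent[x]

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if rank[rx] > rank[ry]:
        parent[ry] = rx                 # shorter tree points to the taller one
    else:
        parent[rx] = ry
        if rank[rx] == rank[ry]:
            rank[ry] += 1

def kruskal(vertices, edges):           # edges given as (weight, u, v) triples
    for x in vertices:
        makeset(x)
    S = []
    for w, u, v in sorted(edges):
        if find(u) != find(v):          # adding (u, v) does not create a cycle
            union(u, v)
            S.append((u, v, w))
    return S

print(kruskal(['a', 'b', 'c'], [(2, 'a', 'b'), (1, 'b', 'c'), (3, 'a', 'c')]))
# [('b', 'c', 1), ('a', 'b', 2)]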
