End Sem
Example: A Scheduling Problem
Now if s2 can co-exist with all the activities that follow it, then g2, with an earlier or equal end time, can too!
So S′ = (g1, g2, s3, s4, …) is valid (optimal).
Repeating this exchange, we can obtain more optimal solutions and end with the greedy solution, without ever losing optimality.
Formal proof by contradiction:
Suppose G is NOT optimal.
Pick an optimal S which is closest to G (maximizing |S ∩ G|); S ≠ G.
We can modify S into S′ so that:
S′ is also optimal & |S′ ∩ G| > |S ∩ G|
Contradiction!
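The exchange argument above justifies the earliest-finish-time rule. A minimal runnable sketch (the activity list is made-up example data):

```python
# Greedy interval scheduling: repeatedly pick the activity with the
# earliest finish time among those compatible with what is already chosen.

def select_activities(activities):
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(activities, key=lambda a: a[1]):  # by end time
        if start >= last_end:          # compatible with everything chosen
            chosen.append((start, end))
            last_end = end
    return chosen

print(select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
                         (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]))
# -> [(1, 4), (5, 7), (8, 11), (12, 16)]
```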
Set Cover
I = {S_i : i = 1, …, m}, with ∪_{k=1..m} S_k = B and S_k ≠ ∅
Minimize |J| (J ⊆ {1, …, m}, so |J| ≤ m) such that ∪_{i∈J} S_i = B
Example: B is a universal set of topics covered by m available ALGO
courses. S_j is the set of topics covered by the j-th ALGO course. Taking
the minimum number of which ALGO courses will allow us to learn all
the topics covered by all m of them?
[Figure: courses on one side, topics on the other; more than one optimal solution exists!]
A graph? (number of points (n) = number of sets (m))
Example: B is a universal set of n towns. S_j is the set of towns (that will be)
covered by the school of the j-th town (m = n; symmetric, not transitive). What
minimum number of schools can cover all the n towns? [Vertex Cover]
A school in a town will also cover every adjacent (edge-connected) town.
Algo Skeleton:
1. Set J = ∅
2. While ∪_{i∈J} S_i ≠ B: pick i ∉ J (S_i ∈ I \ J) and add it to J
3. Return J
Strategy?
Choose the set covering the largest number of uncovered elements: argmax_{i∉J} |S_i \ ∪_{k∈J} S_k|
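The skeleton with that strategy plugged in, as a runnable sketch (B and the S_i below are made-up example data):

```python
# Greedy set cover: while uncovered elements remain, pick the set
# covering the most uncovered elements (max |S_i \ covered|).

def greedy_set_cover(B, sets):
    covered, J = set(), []
    while covered != B:
        i = max((i for i in range(len(sets)) if i not in J),
                key=lambda i: len(sets[i] - covered))
        if not (sets[i] - covered):    # nothing left to gain: B not coverable
            break
        J.append(i)
        covered |= sets[i]
    return J

B = {1, 2, 3, 4, 5, 6}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5, 6}]
print(greedy_set_cover(B, S))  # -> [0, 3]
```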
How close to optimal? How large can the greedy |J_G| grow compared with the (small) optimal |J*|? We want an upper bound: |J*| ≤ |J_G| ≤ ? ≤ m.
n_t → number of uncovered points after t iterations
So n_0 = |B| = n, and n_0 > n_1 > n_2 > ⋯ > n_T
After some finite (T) iterations, we hope to get this small enough: n_T < 1
Property: n_{t+1} ≤ n_t − n_t/|J*|
In the (t + 1)-th iteration, at least n_t/|J*| points are covered by
the strategy of choosing the largest uncovered set.
Proof:
The |J*| many sets of an optimal cover span the entire B, that is, all n elements.
Therefore, one set among the |J*| must have at least n/|J*| elements!
Likewise, at most all |J*| sets cover the n_t still-uncovered elements, so one set
must contain at least n_t/|J*| of those n_t elements; greedy picks a set at least that good.
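A short supplementary derivation (standard analysis; the final bound is not stated on the slide) showing how fast n_t shrinks under the property above:

```latex
n_{t+1} \le n_t\left(1 - \frac{1}{|J^*|}\right)
\;\Rightarrow\;
n_T \le n\left(1 - \frac{1}{|J^*|}\right)^{T} < n\, e^{-T/|J^*|}
```

So after T = |J*| ln n iterations, n_T < 1, i.e., everything is covered: greedy uses at most |J*| ln n sets, an O(ln n)-approximation.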
Minimum Spanning Trees (MST)
Remove edges without compromising connectivity!
Observations:
- No 2 different paths between the same nodes (so the result is a tree).
Greedy Algorithms
Proof:
Pick any MST: T
Assume that e (the lightest edge across the cut X) ∉ T.
But the MST must cross the cut, via some edge e′!
Swap: T′ = (T \ {e′}) ∪ {e}. So T′ is a tree!
And since w(e) ≤ w(e′), T′ is an MST!
The approach:
1. Sort the edges by weight (in increasing order)
2. For each e in this order, if e does not create a cycle, add it to X
(start with 0 edges)
3. Output T = X
Correctness:
First step: ∅ ∪ {e_1} → the cheapest edge is part of some MST!
Inductive step (i-th step): X ∪ {e_i} = X′.
X can represent more than one connected component (CC) on either side of a cut. e_i can connect two already formed CCs, or one CC and one singleton, or two singletons. It will never connect 2 nodes inside 1 CC (cycle!).
Therefore, e_i goes across a cut & is the cheapest edge outside X that does not create a cycle. After |V| − 1 such additions, X spans the graph.
Time complexity: Step 1: O(|E| log |E|)
Step 2: O(|E| (|V| + |E|)) with a DFS cycle check per edge
But the DFS is possibly doing a lot of redundant work every time it
is run, as the graph only changes by one edge per iteration!
A way out is to use a special data structure.
Union Find
Faster Kruskal: the # of CCs at the start is |V|.
makeset for all vertices: O(|V|); each find/union: O(log |V|),
so the |E| cycle checks cost O(|E| log |V|) in total.
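Putting the pieces together, a minimal sketch of Kruskal with union-find (union by rank); the edge list is made-up example data:

```python
# Kruskal with union-find: sort edges, add each edge whose endpoints
# are in different CCs (union succeeds), skip edges that would close a cycle.

def kruskal(n, edges):                # edges: (weight, u, v), vertices 0..n-1
    parent = list(range(n))
    rank = [0] * n

    def find(x):                      # walk up to the root of x's tree
        while parent[x] != x:
            x = parent[x]
        return x

    def union(a, b):                  # union by rank; False if same CC
        ra, rb = find(a), find(b)
        if ra == rb:
            return False              # edge (a, b) would create a cycle
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra               # hang lower-rank root under higher
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    mst = []
    for w, u, v in sorted(edges):     # Step 1: sort by weight
        if union(u, v):               # Step 2: add if no cycle
            mst.append((w, u, v))
    return mst

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))  # -> [(1, 0, 1), (2, 1, 3), (3, 1, 2)]
```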
In Prim’s
- X always forms a subtree of an MST.
- S is chosen as the set of X’s vertices.
- X grows into X ← X ∪ {e}, where e is the lightest
edge between a node in S and a node outside S.
Equivalently: S grows by including the vertex v ∉ S that minimizes cost(v) = min_{u∈S} w(u, v).
Very similar in procedure (not purpose) to Dijkstra’s!
With a simple array holding the cost values: each of the n iterations takes O(n) to extract the minimum and O(n) to update the remaining costs, which here are edge weights (not path lengths as in Dijkstra’s).
An example overall cost: O(n²)
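The array-based O(n²) variant as a runnable sketch (the adjacency matrix is made-up example data; INF marks a missing edge):

```python
# Prim's with a plain array of keys: n iterations, each scanning all keys,
# O(n^2) overall. key[v] holds the lightest edge weight from S to v.

INF = float("inf")

def prim(adj):                        # adj[u][v] = weight, INF if no edge
    n = len(adj)
    key = [INF] * n                   # lightest edge from S to each vertex
    parent = [None] * n
    in_S = [False] * n
    key[0] = 0                        # start growing S from vertex 0
    total = 0
    for _ in range(n):
        u = min((v for v in range(n) if not in_S[v]), key=lambda v: key[v])
        in_S[u] = True
        total += key[u]
        for v in range(n):            # relax keys of vertices outside S
            if not in_S[v] and adj[u][v] < key[v]:
                key[v] = adj[u][v]
                parent[v] = u
    return total, parent

adj = [[INF, 1, 4, INF],
       [1, INF, 3, 2],
       [4, 3, INF, 5],
       [INF, 2, 5, INF]]
print(prim(adj))  # -> (6, [None, 0, 1, 1])
```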
3
MST growth:
http://cs.brown.edu/research/pubs/pdfs/1995/Karger-1995-RLT.pdf
http://csclub.uwaterloo.ca/~gzsong/papers/Trans-dichotomous%20Algorithms%20for%20Minimum%20Spanning%20Trees%20and%20Shortest%20Paths.pdf
---
Huffman Encoding
Sampling and Quantization leads to:
a string of length T over an alphabet Γ.
With fixed-length codewords 00, 01, 10, 11 (all the same length),
the sequence at hand takes 260 Megabits!
But decoding?
Union Find
Every node also has a rank value.
Let us consider that as the height of the subtree hanging from
that node.
Initially every node is a singleton, so makeset is O(1) per node;
the parent pointers π and ranks are stored in arrays.
Property: For any x, rank(x) ≤ rank(π(x)), with equality only at a root.
By definition, when we move up a path in a tree toward a root
node, the rank values along the way are strictly increasing.
Property: Any root node of rank k has at least 2^k nodes in its tree
- As we start from singletons, union creates all further trees.
- In the union function, the root rank increases only when the ranks of the
two merging roots are the same.
- A root node with rank k is created by merging two trees with roots of
rank k − 1.
- By induction:
Two rank-0 roots (singletons) produce one rank-1 root (# nodes = 2)
Two rank-1 roots produce one rank-2 root (min # nodes = 4)
Two rank-2 roots produce one rank-3 root (min # nodes = 8)
Two rank-(k − 1) roots produce one rank-k root (min # nodes = 2^{k−1} + 2^{k−1} = 2^k)
* one rank-k root and one rank < k root can also produce a rank-k root,
so # nodes ≥ 2^k still holds
- Different rank-k nodes do not have common descendants! (by
Property 1 as well)
Therefore:
Property: If there are n elements overall, there can be at most
n/2^k nodes of rank k.
Therefore, the maximum value of k (rank) is log₂ n, and the “height” ≤ log n.
Now, if we assume that the edges available are already sorted, then
can we make find and union even faster?
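One standard answer (a sketch, not taken from the slide) is path compression: during find, every visited node is pointed directly at the root, flattening the tree so later operations become nearly constant amortized:

```python
# Union-find with union by rank plus path compression during find.

parent = {}
rank = {}

def makeset(x):
    parent[x] = x
    rank[x] = 0

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])   # compress: hang x directly off the root
    return parent[x]

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return
    if rank[ra] < rank[rb]:
        ra, rb = rb, ra
    parent[rb] = ra                   # union by rank as before
    if rank[ra] == rank[rb]:
        rank[ra] += 1

for x in range(6):
    makeset(x)
union(0, 1); union(2, 3); union(1, 3)
print(find(3) == find(0))  # -> True
```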
Consider Fibonacci: that’s Dynamic Programming!
Iterative, bottom-up.
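The two flavours side by side, as a minimal sketch:

```python
# Fibonacci two ways: top-down memoization vs. iterative bottom-up DP.
# Both avoid the exponential blow-up of naive recursion.
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):                      # top-down: recurse, cache answers
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_bottom_up(n):                 # bottom-up: smallest subproblems first
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_memo(50), fib_bottom_up(50))  # -> 12586269025 12586269025
```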
Shortest Path in DAG (from starting node)
Example:
Bookkeeping needed to get the path (not just the distance till the node)!
Iteration?! Number the nodes in topological order: S, C, A, B, D, E → 0, 1, 2, 3, 4, 5.
Recursion needs the implicit DAG (a topological order) to avoid infinite recursion.
function Sspdag(n)                    # node 0 (S) is the starting node
    create an array dist[0 … |V| − 1] with all values ∞
    return Sspdagmemo(n, dist)

Careful: m & n do not denote # of edges & nodes here

function Sspdagmemo(n, d)
    if n = 0: return 0
    elif d[n] ≠ ∞: return d[n]        # memo hit (comment this line out & see)
    elif in-degree(n) = 0: return ∞   # unreachable from the start node
    else:
        for every edge (m, n):        # use the adjacency list of the reverse graph
            d[n] = min(d[n], l(m, n) + Sspdagmemo(m, d))
        return d[n]
Complexity? Topological sort: linear. Reverse graph: linear, so in-degree: linear;
edge presence: linear. PLUS the loop over all edges: linear (lookup, add, min).
Time: O(|V| + |E|); memory for memoization: O(|V|).
Dynamic programming is a very powerful algorithmic paradigm in
which a problem is solved by identifying a collection of subproblems
and tackling them one by one, smallest first, using the answers to
small problems to help figure out larger ones, until the whole lot of
them is solved.
Conceptually, the DAG is implicit.
The path…

function Sspdag(n)
    create an array dist[0 … |V| − 1] with all values ∞
    create a list prev[0 … |V| − 1] with all empty lists [ ]
    D, P = Sspdagmemo(n, dist, prev)
    return D, P

function Sspdagmemo(n, d, p)
    if n = 0: return 0, [n]
    elif d[n] ≠ ∞: return d[n], p[n]
    elif in-degree(n) = 0: return ∞, [ ]
    else:
        for every edge (m, n):
            δ, ρ = Sspdagmemo(m, d, p)
            v = l(m, n) + δ
            d[n] = min(d[n], v)
            if d[n] == v:
                p[n] = ρ + [n]        # best path to m, then n
        return d[n], p[n]

We can also find:
Longest path, smallest product (instead of sum)
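The pseudocode above as runnable Python (the small DAG and its edge lengths are made-up example data; rev[n] lists the (m, l(m, n)) edges of the reverse graph):

```python
# Memoized shortest path in a DAG with path recovery.

INF = float("inf")

def sp_dag(rev, target, source=0):
    d = {v: INF for v in rev}         # memo: shortest distance to each node
    p = {v: [] for v in rev}          # bookkeeping: best path to each node

    def memo(n):
        if n == source:
            return 0, [source]
        if d[n] != INF:
            return d[n], p[n]         # memo hit
        for m, w in rev[n]:           # all edges (m, n) via the reverse graph
            delta, rho = memo(m)
            if w + delta < d[n]:
                d[n] = w + delta
                p[n] = rho + [n]      # best path to m, then n
        return d[n], p[n]

    return memo(target)

# S=0, C=1, A=2, B=3, D=4, E=5 (hypothetical edge lengths)
rev = {0: [], 1: [(0, 2)], 2: [(0, 1)], 3: [(2, 6), (1, 4)],
       4: [(3, 1), (1, 3)], 5: [(4, 2)]}
print(sp_dag(rev, 5))  # -> (7, [0, 1, 4, 5])
```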
Longest Increasing Subsequence
Sequence: a₁, a₂, …, aₙ
Subsequence: a_{i₁}, …, a_{i_k} with i₁ < i₂ < ⋯ < i_k
Increasing subsequence:
a_{i₁} < a_{i₂} < ⋯ < a_{i_k}
Find: the longest one.
Example…
Let’s build a graph: one node per element, an edge (i, j) whenever i < j and a_i < a_j; the LIS is a longest path in this DAG.
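The longest-path-in-DAG view gives the classic O(n²) DP; a minimal sketch with predecessor bookkeeping (the input sequence is example data):

```python
# LIS: L[j] = 1 + max over edges (i, j) of L[i], i.e. i < j and a[i] < a[j].

def lis(a):
    n = len(a)
    L = [1] * n                       # L[j]: LIS length ending at j
    prev = [None] * n                 # bookkeeping to recover the sequence
    for j in range(n):
        for i in range(j):
            if a[i] < a[j] and L[i] + 1 > L[j]:
                L[j] = L[i] + 1
                prev[j] = i
    j = max(range(n), key=lambda j: L[j])
    seq = []
    while j is not None:              # walk the predecessors back
        seq.append(a[j])
        j = prev[j]
    return seq[::-1]

print(lis([5, 2, 8, 6, 3, 6, 9, 7]))  # -> [2, 3, 6, 9]
```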
Worst implementation:
Redundant computations! You compute the same subproblems again & again!
Edit Distance
Operations allowed:
Insert a char into x, delete a char from x, substitute a char for another in x.
A DP solution:
The following edits are possible among the right-most entities to match
the strings: Del, Ins, Sub.
We can also record which operation was chosen at each cell, to recover the alignment.
Base cases: E(i, 0) = i and E(0, j) = j.
Top down:
Start from E(m, n) and recursively call functions
giving the ‘top’, ‘left’ and ‘top-left’ values. [very similar]
Run-time complexity: O(mn) (an m × n table, constant work per cell).
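A bottom-up sketch of the table, with the three moves marked:

```python
# Edit distance: E[i][j] from 'top' E[i-1][j] (Del), 'left' E[i][j-1] (Ins),
# and 'top-left' E[i-1][j-1] (Sub, free on a match). O(mn) time and space.

def edit_distance(x, y):
    m, n = len(x), len(y)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i                   # base case: delete all i chars
    for j in range(n + 1):
        E[0][j] = j                   # base case: insert all j chars
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(E[i - 1][j] + 1,          # Del
                          E[i][j - 1] + 1,          # Ins
                          E[i - 1][j - 1] + diff)   # Sub / match
    return E[m][n]

print(edit_distance("SNOWY", "SUNNY"))  # -> 3
```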
Knapsack
Input: a list of n items with weights w_i & values v_i, and a
knapsack with capacity W
Task: Pack items into the knapsack maximizing the total value
Algorithm (with repetition): K(w) = max_{i : w_i ≤ w} { K(w − w_i) + v_i },
with bookkeeping to recover the chosen items.
Memory: O(W)
Knapsack in this form is just a variant of finding the longest path in a DAG!
Solution (without repetition):
2D table of size (n + 1) × (W + 1)
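The 2D table for the without-repetition case, as a sketch (weights/values are example data):

```python
# 0/1 knapsack: K[i][w] = best value using the first i items within capacity w.

def knapsack(weights, values, W):
    n = len(weights)
    K = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            K[i][w] = K[i - 1][w]                 # skip item i
            if weights[i - 1] <= w:               # or take it, if it fits
                K[i][w] = max(K[i][w],
                              K[i - 1][w - weights[i - 1]] + values[i - 1])
    return K[n][W]

print(knapsack([6, 3, 4, 2], [30, 14, 16, 9], 10))  # -> 46
```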
Chain Matrix Multiplication
The greedy approach of smallest cost first does not work! Consider a counterexample.
Sol: subproblems C(i, j) = cheapest cost of multiplying A_i ⋯ A_j:
C(i, j) = min_{i ≤ k < j} { C(i, k) + C(k + 1, j) + d_{i−1} · d_k · d_j }
O(n²) subproblems, storing the best split k (i ≤ k ≤ j).
Optimal: the recorded splits give the optimal parenthesization.
Running complexity: O(n³) (each subproblem tries O(n) splits).
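A bottom-up sketch, solving shorter subchains first (the dimension list is example data; A_i is d[i-1] × d[i]):

```python
# Chain matrix multiplication DP:
# C[i][j] = min over splits k of C[i][k] + C[k+1][j] + d[i-1]*d[k]*d[j].

def matrix_chain(d):
    n = len(d) - 1                    # number of matrices
    C = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):    # subchain length, smallest first
        for i in range(1, n - length + 2):
            j = i + length - 1
            C[i][j] = min(C[i][k] + C[k + 1][j] + d[i - 1] * d[k] * d[j]
                          for k in range(i, j))
    return C[1][n]

print(matrix_chain([50, 20, 1, 10, 100]))  # -> 7000
```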
Goal: Starting from place 1, visit all the places exactly once by
travelling the minimum distance in total. [TSP]
Bookkeeping needed to recover the tour; each of the n·2ⁿ
subproblems takes linear time.
Memory: O(n·2ⁿ)
Dynamic Programming
Floyd-Warshall Algorithm
Subproblem:
Let’s start with shortest paths between all pairs involving no
intermediate nodes, and grow from there, considering one more
node of the graph at a time till we consider all.
Vertices: number them 1 … n; after allowing {1, …, k} as intermediates:
dist_k(u, v) = min( dist_{k−1}(u, v), dist_{k−1}(u, k) + dist_{k−1}(k, v) )
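The grow-one-node-per-round idea as a runnable sketch (the adjacency matrix is made-up example data; INF marks a missing edge):

```python
# Floyd-Warshall: after round k, dist[u][v] may use intermediates {0..k}.
# O(n^3) time, O(n^2) space.

INF = float("inf")

def floyd_warshall(adj):
    n = len(adj)
    dist = [row[:] for row in adj]    # round "before 0": no intermediates
    for k in range(n):                # allow node k as an intermediate
        for u in range(n):
            for v in range(n):
                if dist[u][k] + dist[k][v] < dist[u][v]:
                    dist[u][v] = dist[u][k] + dist[k][v]
    return dist

adj = [[0, 3, INF, 7],
       [8, 0, 2, INF],
       [5, INF, 0, 1],
       [2, INF, INF, 0]]
print(floyd_warshall(adj))
```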
Reliable Shortest Path
Shortest path algos do not consider the number of hops
(edges / nodes) in them. In some applications, more hops
may represent a higher chance of connection drop!
Given a starting node s, get the shortest path to every node v (∀v) that
uses only k (or fewer) edges.
Optimal: dist(v, i) = min( dist(v, i − 1), min_{(u,v)∈E} { dist(u, i − 1) + l(u, v) } )
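The hop-bounded recurrence implemented Bellman-Ford style, one relaxation round per allowed hop (the small graph is example data):

```python
# dist after i rounds = shortest s->v path using at most i edges.

INF = float("inf")

def k_edge_shortest(n, edges, s, k):  # edges: (u, v, length)
    dist = [INF] * n
    dist[s] = 0
    for _ in range(k):                # one relaxation round per allowed hop
        nxt = dist[:]                 # dist(., i) built from dist(., i-1)
        for u, v, l in edges:
            if dist[u] + l < nxt[v]:
                nxt[v] = dist[u] + l
        dist = nxt
    return dist

edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 10)]
print(k_edge_shortest(4, edges, 0, 2))  # -> [0, 1, 2, 10]
```

With k = 3 the cheaper three-hop route to node 3 becomes available and the last entry drops to 3.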
Flow in Networks
Goal: Given a directed graph, send as much quantity as possible from
source s to sink t. There is a maximum capacity associated
with every edge in the graph.
* A feasible flow respects the edge capacities and conserves flow at every intermediate node.
Task: Find the maximum flow.
This is a Linear Program: a set of linear constraints and a linear
objective function to maximize!
Basically, apply a path-finding algorithm (like DFS). Choose a
path and send as much quantity through it as its
bottleneck(s) allow. Reduce the edge capacities by the corresponding
current flows (remove the saturated bottleneck edges), and then
repeat. Greedy, guaranteed to stop!
O(m(m + n)), but optimality is not guaranteed!
For every edge (u, v) in a chosen s-t path of the flow at an iteration, an
edge (v, u) is introduced with zero capacity & negative flow. Then all the
positive and negative flows are subtracted from their corresponding
edge capacities to get the available capacities for the next iteration:
the reverse edge gets capacity 0 − (−f_uv) = f_uv.
The network thus formed in every iteration is called the
residual network.
- If an edge already existed in the opposite direction, that can
easily be incorporated in the framework.
- The s-t path finding, say using linear-time DFS, happens on
the residual network.
Cut:
An (s, t)-cut partitions the vertices into two disjoint
groups L and R such that s is in L and t is in R. Its
capacity is the total capacity of the edges from L to
R, and is an upper bound on any flow.
Tight upper bound: (7)
Loose upper bound: (19)
Edmonds-Karp Approach Complexity:
Pick augmenting paths by BFS on the residual graph (edge weights taken as 1), i.e., by shortest hop distance d_f(s, ·).
Suppose edge (u, v) is a bottleneck (critical) for flow f: then d_f(s, v) = d_f(s, u) + 1, (u, v) ∈ E, u, v ∈ V. Augmenting removes (u, v) and introduces (v, u) (if not already present). For (u, v) to become critical again, some later flow f′ must first choose (v, u):
d_{f′}(s, u) = d_{f′}(s, v) + 1 ≥ d_f(s, v) + 1 = d_f(s, u) + 2
The maximum shortest-path (no looping!) distance from s to any u ≠ t is n − 2, and d_f(s, u) ≥ 0, so any edge (u, v) can be a bottleneck at most (n − 2)/2 times. Someone becomes critical in each iteration, so # of iterations: O(mn).
Notes:
• Edge weights taken as 1 in BFS.
• A shortest path from s to t contains shortest paths from s to all the intermediate nodes.
• A shortest path distance for f ≤ the same for f′.
Example (huge capacities with a unit cross edge): any path: 2 × 10¹⁰ iterations; shortest path: 2 iterations.
Annexure: Proof of “A shortest path distance for f ≤ the same for f′”
f¹ is the flow obtained from f using the chosen augmenting path, which changes G_f to G_{f¹}.
Base case: d_f(s, s) = d_{f¹}(s, s) = 0, satisfying the claim.
Let us assume there is a vertex v for which d_f(s, v) > d_{f¹}(s, v) [for contradiction],
taking such a v with the smallest d_{f¹}(s, v) = k.
In the residual graph G_{f¹} take the u so that: d_{f¹}(s, v) = d_{f¹}(s, u) + 1
[u is the node just before v on the shortest path from s to v; u is not a violator since d_{f¹}(s, u) = k − 1 < k].
If (u, v) existed in G_f, then d_f(s, v) ≤ d_f(s, u) + 1 ≤ d_{f¹}(s, u) + 1 = d_{f¹}(s, v): already a contradiction. So (u, v) does not exist in G_f but it does in G_{f¹}! Hence (v, u) existed in G_f and became critical while augmenting f, which yielded (u, v) in G_{f¹} ((v, u) was removed).
So d_f(s, u) = d_f(s, v) + 1, giving
d_f(s, v) = d_f(s, u) − 1 ≤ d_{f¹}(s, u) − 1 = d_{f¹}(s, v) − 2
indicating d_f(s, v) ≤ d_{f¹}(s, v), which REJECTS our assumption!
As f¹ is the flow just after augmenting the path to flow f, by induction we have:
d_f(s, v) ≤ d_{f′}(s, v) for any f′ that is obtained after f.
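Edmonds-Karp end to end, as a minimal sketch (the capacity matrix is made-up example data; cap[u][v] acts as residual capacity, so pushing along (u, v) grows cap[v][u]):

```python
# Ford-Fulkerson with BFS-shortest augmenting paths (Edmonds-Karp).
from collections import deque

def max_flow(cap, s, t):
    n = len(cap)
    flow = 0
    while True:
        parent = [None] * n           # BFS for a shortest augmenting path
        parent[s] = s
        q = deque([s])
        while q and parent[t] is None:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] is None:
                    parent[v] = u
                    q.append(v)
        if parent[t] is None:         # no s-t path left: done
            return flow
        b, v = float("inf"), t        # bottleneck along the found path
        while v != s:
            b = min(b, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                 # update the residual network
            u = parent[v]
            cap[u][v] -= b
            cap[v][u] += b            # reverse edge gains capacity
            v = u
        flow += b

cap = [[0, 10, 10, 0],
       [0, 0, 2, 4],
       [0, 0, 0, 9],
       [0, 0, 0, 0]]
print(max_flow(cap, 0, 3))  # -> 13
```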
NP-Complete & NP-Hard
Introduction
We saw efficient POLYNOMIAL TIME algorithms for finding a
shortest path in a graph, a graph’s minimum spanning tree, etc.
A graph with n vertices can have up to n^(n−2) spanning trees (complete
graph) [Kirchhoff’s Theorem], and a typical graph has an exponential
number of paths from s to t (complete graph: on the order of (n − 2)!).
Yet these algorithms run in time polynomial in the input size.
Travelling Salesman Problem (TSP)
Seeing TSP as a search problem (slightly different!):
Check whether a tour of total length at most a given budget b exists.
Exhaustive search would be worse than exponential! Our DP TSP
algorithm was also exponential!
Non-deterministic* polynomial time
We denote the class of all search problems by NP.
Examples: (Set Cover), (Knapsack),
and the largest vertex subset that forms a
complete graph (all edges present) [Clique].
Why only P, NP? Subset to superset? P, then NP?
Starting point:
Cost (lower bound): cost of the completed partial tour + cost of completing the
partial tour (at least).
Lower bound on the cost of completing the partial tour: a sum.
Example:
TSP sol.
Approximation for TSP:
Assume a complete graph and that the distances (edge lengths)
satisfy the triangle inequality.
If we remove an edge from the TSP solution, it will
be a spanning tree! So cost(MST) ≤ cost(optimal tour).
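That observation powers the standard MST-based 2-approximation (a sketch under the stated assumptions; the distance matrix below is made-up metric data, points on a line): build an MST, then shortcut a preorder walk of it. The walk costs at most 2·MST ≤ 2·OPT by the triangle inequality.

```python
# Metric TSP 2-approximation: Prim's MST + preorder walk with shortcuts.

def tsp_2approx(dist):                # dist: symmetric matrix obeying triangle ineq.
    n = len(dist)
    in_tree, key, parent = [False] * n, [float("inf")] * n, [None] * n
    key[0] = 0
    children = [[] for _ in range(n)]
    for _ in range(n):                # Prim's MST, array version
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: key[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < key[v]:
                key[v], parent[v] = dist[u][v], u
    tour = []
    def preorder(u):                  # walk the MST, shortcutting repeats
        tour.append(u)
        for c in children[u]:
            preorder(c)
    preorder(0)
    return tour

dist = [[0, 2, 5, 6], [2, 0, 3, 4], [5, 3, 0, 1], [6, 4, 1, 0]]
print(tsp_2approx(dist))  # -> [0, 1, 2, 3]
```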