Advanced Algorithms
Mohsen Ghaffari
These notes will be updated regularly. Please read critically; there are typos
throughout, but there might also be mistakes. Feedback and comments would be
greatly appreciated and should be emailed to [email protected].
Last update: September 20, 2023
Contents
2 Approximation schemes
  2.1 Knapsack
  2.2 Bin packing
  2.3 Minimum makespan scheduling
16 Paging
  16.1 Types of adversaries
  16.2 Random Marking Algorithm (RMA)
  16.3 Lower Bound for Paging via Yao's Principle
With high probability means Pr[X] ≥ 1 − 1/poly(n), say, Pr[X] ≥ 1 − 1/n^c for some constant c ≥ 2.

For independent exponential random variables x_i ∼ Exp(λ_i):
• min{x_1, . . . , x_n} ∼ Exp(λ_1 + · · · + λ_n)
• Pr[x_k = min{x_1, . . . , x_n}] = λ_k / (λ_1 + · · · + λ_n)
Useful inequalities
• (n/k)^k ≤ (n choose k) ≤ (en/k)^k
• (n choose k) ≤ n^k
• lim_{n→∞} (1 − 1/n)^n = e^{−1}
• Σ_{i=1}^∞ 1/i² = π²/6
• 1 + x/2 ≥ e^x, for x ∈ [−1, 0]
• 1/(1 − x) ≤ 1 + 2x, for x ≤ 1/2
Theorem (Chebyshev's inequality). Pr[|X − µ| ≥ kσ] ≤ 1/k².

Theorem (Chernoff bounds). For a sum X of independent indicator random variables:

Pr[X ≥ (1 + ε) · E(X)] ≤ exp(−ε² E(X)/3) for 0 < ε ≤ 1
Pr[X ≤ (1 − ε) · E(X)] ≤ exp(−ε² E(X)/2) for 0 < ε < 1
Pr[|X − E(X)| ≥ ε · E(X)] ≤ 2 exp(−ε² E(X)/3)
∀ε > 0, Pr[X ≥ (1 + ε) E(X)] ≤ (e^ε / (1 + ε)^{1+ε})^{E(X)}
Pr[ω] > 0 ⇐⇒ ∃ω ∈ Ω
Combinatorics of taking k elements out of n:
• no repetition, no ordering: (n choose k)
• no repetition, ordering: n!/(n − k)!
• repetition, ordering: n^k
Part I
Basics of Approximation
Algorithms
Chapter 1
Greedy algorithms
Example (Figure: a set cover instance with sets S_1, S_2, S_3, S_4 over the elements e_1, . . . , e_5.)
Algorithm 1 GreedySetCover(U, S, c)
  T ← ∅                                                ▷ Selected subset of S
  C ← ∅                                                ▷ Covered vertices
  while C ≠ U do
    S_i ← arg min_{S_i ∈ S \ T} c(S_i) / |S_i \ C|     ▷ Pick set with lowest price-per-item
    T ← T ∪ {S_i}                                      ▷ Add S_i to selection
    C ← C ∪ S_i                                        ▷ Update covered vertices
  end while
  return T
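To make the pseudocode concrete, here is a minimal Python sketch of GreedySetCover; the dictionary-based representation of S and c, and the small instance at the end, are illustrative assumptions rather than part of the original notes.

def greedy_set_cover(U, S, c):
    """Greedy set cover: U is a set of items, S maps set names to sets of items,
    c maps set names to positive costs. Returns the list of selected set names."""
    covered = set()
    selection = []
    while covered != U:
        # Pick the set with the lowest price-per-item c(S_i) / |S_i \ C|.
        best = min((name for name in S if name not in selection and S[name] - covered),
                   key=lambda name: c[name] / len(S[name] - covered))
        selection.append(best)
        covered |= S[best]
    return selection

# Hypothetical usage:
U = {1, 2, 3, 4, 5}
S = {"S1": {1, 2, 3}, "S2": {2, 4}, "S3": {3, 4, 5}, "S4": {5}}
c = {"S1": 3.0, "S2": 1.0, "S3": 2.0, "S4": 1.0}
print(greedy_set_cover(U, S, c))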
H_n = Σ_{i=1}^n 1/i ≈ ln(n) + γ ≤ ln(n) + 0.6 ∈ O(log(n)), where γ ≈ 0.577 is the Euler-Mascheroni constant.

(Figure: the elements e_1, . . . , e_n of U in the order in which the greedy algorithm covers them, together with the optimal cover OPT = {O_1, . . . , O_p}.)

n − k + 1 = |U \ C_{k−1}|
          ≤ |O_1 ∩ (U \ C_{k−1})| + · · · + |O_p ∩ (U \ C_{k−1})|
          = Σ_{j=1}^p |O_j ∩ (U \ C_{k−1})|

3. By definition, for each j ∈ {1, . . . , p}, ppi(O_j) = c(O_j) / |O_j ∩ (U \ C_{k−1})|.
Since the greedy algorithm will pick a set in S \ Tk with the lowest price-per-
item, price(ek ) ≤ ppi(Oj ) for all j ∈ {1, . . . , p}. Substituting this expression
into the last equation and rearranging the terms we get:
(Figure: a tight example in which the sets S_1, S_2, S_3, . . . , S_k contain 2, 4, 8 = 2·2², . . . , 2·2^{k−1} elements respectively, together with one additional set S_{k+1}.)
Remark We apply the same greedy algorithm for small ∆ but analyzed in
a more localized manner. Crucially, in this analysis, we always work with
the exact degree d and only use the fact d ≤ ∆ after summation. Observe
that ∆ ≤ n and the approximation factor equals that of Theorem 1.3 when
∆ = n.
(Figure: an example graph on the vertices a, b, c, d, e, f.)

Algorithm 2 GreedyMaximalMatching(V, E)
  M ← ∅                                                ▷ Selected edges
  C ← ∅                                                ▷ Set of incident vertices
  while E ≠ ∅ do
    e_i = {u, v} ← Pick any edge from E
    M ← M ∪ {e_i}                                      ▷ Add e_i to the matching
    C ← C ∪ {u, v}                                     ▷ Add endpoints to incident vertices
    Remove all edges in E that are incident to u or v
  end while
  return M
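A minimal Python sketch of GreedyMaximalMatching (the edge-list interface is an assumption made for illustration); processing the edges once and skipping edges with an already-matched endpoint is equivalent to removing incident edges from E.

def greedy_maximal_matching(edges):
    """edges: iterable of 2-tuples of vertices. Returns the matching M and the
    set C of matched endpoints, which is a vertex cover of size 2|M|."""
    M, C = [], set()
    for (u, v) in edges:
        # Keep the edge only if neither endpoint is already matched.
        if u not in C and v not in C:
            M.append((u, v))
            C.update((u, v))
    return M, C

M, C = greedy_maximal_matching([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")])
# C is a vertex cover with |C| = 2|M|, hence a 2-approximation of the minimum vertex cover.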
(Figure: a maximal matching M and the corresponding vertex cover C consisting of all matched endpoints, where |C| = 2 · |M|.)
Sketch of Proof Let C be the set of all vertices involved in the greedily
selected hyperedges. In a similar manner as in the proof of Theorem 1.8, C can be shown to be an f-approximation.
Chapter 2
Approximation schemes
2.1 Knapsack
Definition 2.3 (Knapsack problem). Consider a set S with n items. Each item i has size(i) ∈ Z⁺ and profit(i) ∈ Z⁺. Given a budget B, find a subset S* ⊆ S such that the total size Σ_{i∈S*} size(i) is at most B and the total profit Σ_{i∈S*} profit(i) is maximized.

Let p_max = max_{i∈{1,...,n}} profit(i) denote the highest profit of an item. Also, notice that any item with size(i) > B cannot be chosen, due to the size constraint, and therefore we can discard it. In O(n) time, we can remove all such items and relabel the remaining ones as items 1, 2, 3, . . . Thus, without loss of generality, we can assume that size(i) ≤ B, ∀i ∈ {1, . . . , n}.
Observe that pmax ≤ profit(OP T (I)) because we can always pick at
least one item, namely the highest valued one.
Since each cell can be computed in O(1) via the above recurrence, the matrix M can be filled in O(n² p_max) time and S* may be extracted from the table M[·, ·]: we find the maximum value j ∈ [p_max, n·p_max] for which M[n, j] ≤ B and we back-track from there to extract the optimal set S*.
Algorithm 3 FPTAS-Knapsack(S, B, ε)
  k ← max{1, ⌊ε · p_max / n⌋}                          ▷ Choice of k to be justified later
  for i ∈ {1, . . . , n} do
    profit′(i) ← ⌊profit(i) / k⌋                       ▷ Round and scale the profits
  end for
  Run the DP of Section 2.1.1 with B, size(i), and the re-scaled profits profit′(i)
  return Items selected by the DP
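The sketch below combines the rounding step with one standard choice of dynamic program ("minimum total size for a given rounded profit"); it is only meant to illustrate Algorithm 3, and the table layout is one of several equivalent options.

import math

def fptas_knapsack(sizes, profits, B, eps):
    """FPTAS sketch: returns a set of item indices with profit >= (1 - eps) * OPT."""
    n = len(sizes)
    items = [i for i in range(n) if sizes[i] <= B]    # oversized items can never be chosen
    if not items:
        return set()
    pmax = max(profits[i] for i in items)
    k = max(1, int(eps * pmax / n))                   # rounding granularity, as in Algorithm 3
    rounded = {i: profits[i] // k for i in items}
    total = sum(rounded.values())
    INF = float("inf")
    min_size = [0.0] + [INF] * total                  # min_size[j] = least size reaching rounded profit j
    best_set = {0: frozenset()}
    for i in items:
        for j in range(total, rounded[i] - 1, -1):
            prev = j - rounded[i]
            if min_size[prev] + sizes[i] < min_size[j]:
                min_size[j] = min_size[prev] + sizes[i]
                best_set[j] = best_set[prev] | {i}
    best_j = max(j for j in range(total + 1) if min_size[j] <= B)
    return set(best_set[best_j])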
Proof. Suppose we are given a knapsack instance I = (S, B). Let loss(i)
denote the decrease in value by using rounded profit0 (i) for item i. By the
profit rounding definition, for each item i,
loss(i) = profit(i) − k · ⌊profit(i)/k⌋ ≤ k

Summing over all items,

Σ_{i=1}^n loss(i) ≤ nk                         Since loss(i) ≤ k for any item i
                  ≤ ε · p_max                  Since k ≤ ε · p_max / n
                  ≤ ε · profit(OPT(I))         Since p_max ≤ profit(OPT(I))
Example Consider S = {0.5, 0.1, 0.1, 0.1, 0.5, 0.4, 0.5, 0.4, 0.4}, where |S| = n = 9. Since Σ_{i=1}^n size(i) = 3, at least 3 bins are needed. One can verify that 3 bins suffice: b_1 = b_2 = b_3 = {0.5, 0.4, 0.1}. Hence, |OPT(S)| = 3.
Algorithm 4 FirstFit(S)
  B ← ∅                                                ▷ Collection of bins
  for i ∈ {1, . . . , n} do
    if size(i) ≤ free(b) for some bin b ∈ B then
      Pick the smallest-indexed such bin b
      free(b) ← free(b) − size(i)                      ▷ Put item i into existing bin b
    else
      B ← B ∪ {b′}                                     ▷ Put item i into a fresh bin b′
      free(b′) ← 1 − size(i)
    end if
  end for
  return B
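A direct Python sketch of FirstFit, where each bin is tracked by its remaining free space; run on the example S above, it reproduces the 4-bin packing discussed below.

def first_fit(sizes):
    """Place each item into the first bin with enough free space,
    opening a new bin when none fits. Returns the list of bins."""
    bins, free = [], []
    for s in sizes:
        for b, f in enumerate(free):
            if s <= f:                # first (lowest-index) bin with room
                bins[b].append(s)
                free[b] -= s
                break
        else:
            bins.append([s])          # open a fresh bin
            free.append(1.0 - s)
    return bins

S = [0.5, 0.1, 0.1, 0.1, 0.5, 0.4, 0.5, 0.4, 0.4]
print(first_fit(S))   # uses 4 bins on this example, while |OPT(S)| = 3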
Lemma 2.6. Using FirstFit, at most one bin is less than half-full. That is, |{b ∈ B : size(b) ≤ 1/2}| ≤ 1, where B is the output of FirstFit.

Proof. Suppose, for contradiction, that there are two bins b_i and b_j such that i < j, size(b_i) ≤ 1/2 and size(b_j) ≤ 1/2. Then, FirstFit could have put all items in b_j into b_i, and would not have created b_j. This is a contradiction.
Theorem 2.7. FirstFit is a 2-approximation algorithm for bin packing.
Proof. Suppose FirstFit terminates with |B| = m bins. By Lemma 2.6, Σ_{i=1}^n size(i) > (m − 1)/2, as m − 1 bins are at least half-full. Since Σ_{i=1}^n size(i) ≤ |OPT(I)|, we get (m − 1)/2 < |OPT(I)|, so m < 2 · |OPT(I)| + 1. That is, m ≤ 2 · |OPT(I)| since both m and |OPT(I)| are integers.
Recall the example with S = {0.5, 0.1, 0.1, 0.1, 0.5, 0.4, 0.5, 0.4, 0.4}. FirstFit will use 4 bins: b_1 = {0.5, 0.1, 0.1, 0.1}, b_2 = b_3 = {0.5, 0.4}, b_4 = {0.4}. As expected, 4 = |FirstFit(S)| ≤ 2 · |OPT(S)| = 6.
Remark If we first sort the item weights in non-increasing order, then one can show that running FirstFit on the sorted item weights yields a 3/2-approximation algorithm for bin packing. See footnote for details¹.
It is natural to wonder whether we can do better than a 3/2-approximation. Unfortunately, unless P = NP, we cannot do so efficiently. To prove this, we show that if we can efficiently derive a (3/2 − ε)-approximation for bin packing, then the partition problem (which is NP-hard) can be solved efficiently.
Assumption (1) All items have size at least ε, for some ε > 0.

arrangements in one bin. Since at most n bins are needed, the total number of bin configurations is at most (n + R choose R) ≤ (n + R)^R = O(n^R). Since k and ε are constant, R is also constant and one can enumerate over all possible bin configurations (denote this algorithm as A_ε) to exactly solve bin packing, in this special case, in O(n^R) ∈ poly(n) time.
Algorithm 5 PTAS-BinPacking(I = S, ε)
  k ← ⌈1/ε²⌉
  Q ← ⌊nε²⌋
  Partition the n items into k non-overlapping groups, each with ≤ Q items
  for i ∈ {1, . . . , k} do
    i_max ← max_{item j in group i} size(j)
    for item j in group i do
      size(j) ← i_max
    end for
  end for
  Denote the modified instance as J
  return A_ε(J)
Figure 2.1: Partition the items into k groups J_1, . . . , J_k, each with ≤ Q items; label the groups in ascending order of size; J rounds item sizes up, J′ rounds item sizes down.
Proof. By Lemma 2.10 and the fact that |OP T (J 0 )| ≤ |OP T (I)|.
Proof. By Assumption (1), all item sizes are at least ε, so |OPT(I)| ≥ εn. Then, Q = ⌊nε²⌋ ≤ ε · |OPT(I)|. Apply Lemma 2.11.
Algorithm 6 Full-PTAS-BinPacking(I = S, ε)
  ε′ ← min{1/2, ε/2}                                   ▷ See the analysis for why we choose this ε′
  X ← Items with size < ε′                             ▷ Ignore small items
  P ← PTAS-BinPacking(S \ X, ε′)                       ▷ By Theorem 2.12, |P| ≤ (1 + ε′) · |OPT(S \ X)|
  P′ ← Using FirstFit, add items in X to P             ▷ Handle small items
  return Resultant packing P′
Proof. If FirstFit does not open a new bin, the theorem trivially holds. Suppose FirstFit opens a new bin (using m bins in total); then we know that at least (m − 1) bins are strictly more than (1 − ε′)-full.

|OPT(I)| ≥ Σ_{i=1}^n size(i)                           Lower bound on |OPT(I)|
         > (m − 1)(1 − ε′)                             From the above observation

Hence,

m < |OPT(I)| / (1 − ε′) + 1                            Rearranging
  < |OPT(I)| · (1 + 2ε′) + 1                           Since 1/(1 − ε′) ≤ 1 + 2ε′, for ε′ ≤ 1/2
  ≤ (1 + ε) · |OPT(I)| + 1                             By the choice of ε′ = min{1/2, ε/2}
(Figure: a Graham schedule of the jobs p_1, . . . , p_7 on machines M_1, M_2, M_3 with makespan 11.)
As Graham assigns greedily to the least loaded machine, all machines take
at least t time, hence
t ≤ (1/m) Σ_{i=1}^n p_i ≤ |OPT(I)|.
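For reference, here is a short Python sketch of Graham's greedy list scheduling (assign each job, in the given order, to the currently least loaded machine); the heap-based implementation and the sample job list are illustrative choices. Sorting the jobs in non-increasing order first gives the ModifiedGraham variant analyzed below.

import heapq

def graham_schedule(jobs, m):
    """Greedy list scheduling: each job goes to the least loaded machine.
    Returns the makespan and the assignment of jobs to machines."""
    loads = [(0.0, i) for i in range(m)]   # (current load, machine index)
    heapq.heapify(loads)
    assignment = [[] for _ in range(m)]
    for p in jobs:
        load, i = heapq.heappop(loads)     # least loaded machine
        assignment[i].append(p)
        heapq.heappush(loads, (load + p, i))
    makespan = max(load for load, _ in loads)
    return makespan, assignment

# ModifiedGraham: process the jobs in non-increasing order of size.
print(graham_schedule(sorted([3, 2, 2, 2, 3, 3, 4], reverse=True), 3))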
(Figure: another schedule of the jobs p_1, . . . , p_7 on machines M_1, M_2, M_3 with makespan 14.)
Let p_last be the last job that finishes running. We consider the two cases p_last > (1/3) · |OPT(I)| and p_last ≤ (1/3) · |OPT(I)| separately for the analysis.
Lemma 2.17. If p_last > (1/3) · |OPT(I)|, then |ModifiedGraham(I)| = |OPT(I)|.
Proof. For m ≥ n, |ModifiedGraham(I)| = |OP T (I)| by trivially putting
one job on each machine. For m < n, without loss of generality3 , we can
assume that every machine has a job.
Suppose, for a contradiction, that |ModifiedGraham(I)| > |OP T (I)|.
Then, there exists a sequence of jobs with descending sizes I = {p1 , . . . , pn }
such that the last, smallest job pn causes ModifiedGraham(I) to have a
makespan larger than OP T (I)4 . That is, |ModifiedGraham(I \ {pn })| ≤
|OP T (I)| and plast = pn . Let C be the configuration of machines after
ModifiedGraham assigned {p1 , . . . , pn−1 }.
Observation 1 In C, each machine has either 1 or 2 jobs.
If there exists a machine M_i with ≥ 3 jobs, M_i will take > |OPT(I)| time because all jobs take > (1/3) · |OPT(I)| time. This contradicts the assumption |ModifiedGraham(I \ {p_n})| ≤ |OPT(I)|.
Let us denote the jobs that are alone in C as heavy jobs, and the machines
they are on as heavy machines.
Observation 2 In OP T (I), all heavy jobs are alone.
By assumption on pn , we know that assigning pn to any machine (in
particular, the heavy machines) in C causes the makespan to exceed
|OPT(I)|. Since p_n is the smallest job, no other job can be assigned to the heavy machines, as otherwise |OPT(I)| cannot be attained by OPT(I).
Suppose there are k heavy jobs occupying a machine each in OP T (I). Then,
there are 2(m − k) + 1 jobs (two non-heavy jobs per machine in C, and pn ) to
be distributed across m − k machines. By the pigeonhole principle, at least
one machine M* will get ≥ 3 jobs in OPT(I). However, since the smallest job p_n takes > (1/3) · |OPT(I)| time, M* will spend > |OPT(I)| time. This is
a contradiction.
Theorem 2.18. ModifiedGraham is a 4/3-approximation algorithm.

Proof. By similar arguments as in Theorem 2.15, |ModifiedGraham(I)| = t + p_last ≤ (4/3) · |OPT(I)| when p_last ≤ (1/3) · |OPT(I)|. Meanwhile, when p_last > (1/3) · |OPT(I)|, |ModifiedGraham(I)| = |OPT(I)| by Lemma 2.17.
³ Suppose there is a machine M_i without a job; then there must be another machine M_j with more than one job (by the pigeonhole principle). Shifting one of the jobs from M_j to M_i will not increase the makespan.
⁴ If adding p_j for some j < n already causes |ModifiedGraham({p_1, . . . , p_j})| > |OPT(I)|, we can truncate I to {p_1, . . . , p_j} so that p_last = p_j. Since p_j ≥ p_n > (1/3) · |OPT(I)|, the antecedent still holds.
(Figure: a schedule of the jobs p_1, . . . , p_7 on machines M_1, M_2, M_3 with makespan 13.)
inequality). Note that running binary search on t also works, but we only
care about poly-time.
Lemma 2.19. For any t > 0, |α(I, t, ε)| ≤ Bin(I, t).

Proof. If FirstFit does not open a new bin, then |α(I, t, ε)| ≤ Bin(I, t) since α(I, t, ε) uses an additional (1 + ε) buffer on each bin. If FirstFit opens a new bin (say, totalling b bins), then there are at least (b − 1) produced bins from A_ε (exact solving on the rounded-down items of size > εt) that are more than (t(1 + ε) − εt) = t full. Hence, any bin packing algorithm must use strictly more than (b − 1)t / t = b − 1 bins. In particular, Bin(I, t) ≥ b = |α(I, t, ε)|.
Theorem 2.20. PTAS-Makespan is a (1 + ε)-approximation for the minimum makespan scheduling problem.

Proof. Let t* = |OPT(I)| and let t_α be the minimum t ∈ {L, L + εL, L + 2εL, . . . , 2L} such that |α(I, t, ε)| ≤ m. It follows that t_α ≤ t* + εL. Since L ≤ |OPT(I)| and since we consider bins of final size t_α(1 + ε) to accommodate the original sizes, we have |PTAS-Makespan(I)| = t_α(1 + ε) ≤ (t* + εL)(1 + ε) ≤ (1 + ε)² · |OPT(I)|. For ε ∈ [0, 1] we have (1 + ε)² ≤ (1 + 3ε) and thus the statement follows.
Theorem 2.21. PTAS-Makespan runs in poly(|I|, m) time.

Proof. Since t ranges over {L, L + εL, . . . , 2L}, where L = max{p_max, (1/m) Σ_{i=1}^n p_i}, there are O(1/ε) values of t to try. Filtering small jobs and rounding the remaining jobs takes O(n). From the previous lecture, A_ε runs in O((1/ε) · n^{h+1}) time and FirstFit runs in O(nm).
Chapter 3
Randomized approximation
schemes
• A runs in poly(|I|, 1/ε) time
Any clause with both xi and ¬xi is trivially false. As they can be removed
in a single scan of F , we assume that F does not contain such trivial clauses.
However, there are exponentially many terms and there exist instances where
truncating the sum yields arbitrarily bad approximation.
        α_1   α_2   . . .   α_{f(F)}
C_1      0     1    . . .      0
C_2      1     1    . . .      1
C_3      0     0    . . .      0
. . .   . . . . . .  . . .   . . .
C_m      0     1    . . .      1

Table 3.1: Visual representation of the matrix M. Red 1's indicate the topmost clause C_i satisfied by each assignment α_j.
Let |M| denote the total number of 1's in M; it is the sum of the number of clauses satisfied by each assignment that satisfies F. Recall that S_i is the set of assignments that satisfy C_i. Since |S_i| = 2^{n−|C_i|}, we have |M| = Σ_{i=1}^m |S_i| = Σ_{i=1}^m 2^{n−|C_i|}.
We are now interested in the number of “topmost” 1’s in the matrix, where
“topmost” is defined column-wise. As every column represents a satisfying
assignment, at least one clause must be satisfied for each assignment and this
proves that there are exactly f (F ) “topmost” 1’s in the matrix M (i.e. one
by column).
DNF-Count estimates the fraction of “topmost” 1’s in M , then returns
this fraction times |M | as an estimate of f (F ).
To estimate the fraction of “topmost” 1’s:
• Pick a clause according to its length: shorter clauses are more likely.
Algorithm 10 DNF-Count(F, ε)
  X ← 0                                                ▷ Empirical number of "topmost" 1's sampled
  for k = 9m/ε² times do
    C_i ← Sample one of the m clauses, where Pr[C_i chosen] = 2^{n−|C_i|} / |M|
    α_j ← Sample one of the 2^{n−|C_i|} satisfying assignments of C_i
    IsTopmost ← True
    for l ∈ {1, . . . , i − 1} do                       ▷ Check if α_j is "topmost"
      if C_l[α_j] = 1 then                              ▷ Checkable in O(n) time
        IsTopmost ← False
      end if
    end for
    if IsTopmost then
      X ← X + 1
    end if
  end for
  return |M| · X / k
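A Python sketch of DNF-Count; representing each clause as a dictionary from variable index to required truth value is an assumption made for this illustration.

import random

def dnf_count(clauses, n, eps):
    """Estimate the number of satisfying assignments of a DNF formula.
    clauses: list of dicts {variable index: required value}, variables 0..n-1."""
    weights = [2 ** (n - len(C)) for C in clauses]     # |S_i| = 2^(n - |C_i|)
    M = sum(weights)                                   # total number of 1's in the matrix
    m = len(clauses)
    k = int(9 * m / eps ** 2)
    topmost = 0
    for _ in range(k):
        # Sample clause C_i with probability |S_i|/|M|, then a uniform satisfying assignment of C_i.
        i = random.choices(range(m), weights=weights)[0]
        alpha = [random.randint(0, 1) for _ in range(n)]
        for var, val in clauses[i].items():
            alpha[var] = val
        # alpha is "topmost" if no earlier clause is satisfied by it.
        if not any(all(alpha[v] == val for v, val in clauses[l].items()) for l in range(i)):
            topmost += 1
    return M * topmost / k

# Example: F = (x0 AND x1) OR (NOT x0 AND x2) over n = 3 variables; f(F) = 4.
print(dnf_count([{0: 1, 1: 1}, {0: 0, 2: 1}], 3, 0.1))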
Lemma 3.4. In DNF-Count, Pr[ | |M| · X / k − f(F) | ≤ ε · f(F) ] ≥ 3/4.

Proof. Let X_i be the indicator variable of whether the i-th sampled assignment is "topmost", where p = Pr[X_i = 1]. By Lemma 3.3, p = Pr[X_i = 1] = f(F) / |M|. Let X = Σ_{i=1}^k X_i be the empirical number of "topmost" 1's. Then, E(X) = kp by linearity of expectation. By picking k = 9m/ε²,

Pr[ | |M| · X / k − f(F) | ≥ ε · f(F) ]
  = Pr[ |X − k · f(F)/|M|| ≥ ε · k · f(F)/|M| ]        Multiply by k/|M|
  = Pr[ |X − kp| ≥ ε · kp ]                            Since p = f(F)/|M|
  ≤ 2 exp(−ε² kp / 3)                                  By the Chernoff bound
  = 2 exp(−3m · f(F) / |M|)                            Since k = 9m/ε² and p = f(F)/|M|
  ≤ 2 exp(−3)                                          Since |M| ≤ m · f(F)
  ≤ 1/4

Negating, we get:

Pr[ | |M| · X / k − f(F) | ≤ ε · f(F) ] ≥ 1 − 1/4 = 3/4
Observation 3.7. For an edge cut of size k edges, the probability of its disconnection is p^k.
would be a natural and easy way to formulate the problem as a variant of
DNF counting: each edge would be represented by a variable, and every
clause would correspond to a cut postulating that all of its edges have failed.
The probability of disconnecting can then be inferred from the fraction of
satisfying assignments, when each variable is true with probability p and false
otherwise. We note that the latter can be computed by an easy extension of
the DNF counting algorithm discussed in the previous section.
large interest, since the motivation behind studying this problem is to understand how to build reliable networks, and a network with a rather small cut is not reliable. Nevertheless, since the probability of disconnection is rather high in this case, the Monte-Carlo method of sampling subsets of edges and checking whether they disconnect the graph is sufficient, since we only need Õ(n⁴) samples to achieve concentration.
• When p^c ≤ 1/n⁴, we show that the large cuts do not contribute too much to the probability of disconnection, and therefore they can be safely ignored. Recall that the number of cuts of size αc for α ≥ 1 is at most O(n^{2α})¹. When one selects a threshold γ = max{O(log_n ε^{−1}), Θ(1)} on the cut size, we can express the contribution of large cuts — those of size γc and higher — to the overall probability as

∫_γ^∞ n^{2α} · p^{αc} dα < ε p^c.
and has mixing time at most O(n log n). Therefore, after k = O(n log(n/ε)) resampling steps, the distribution of the coloring will be ε-close to the uniform distribution on all valid q-colorings.
|Ω_i| ≥ (∆ + 1) · |Ω_{i−1} \ Ω_i|
⇐⇒ |Ω_i| ≥ (∆ + 1) · (|Ω_{i−1}| − |Ω_i|)
⇐⇒ |Ω_i| + (∆ + 1) · |Ω_i| ≥ (∆ + 1) · |Ω_{i−1}|
⇐⇒ (∆ + 2) · |Ω_i| ≥ (∆ + 1) · |Ω_{i−1}|
⇐⇒ |Ω_i| / |Ω_{i−1}| ≥ (∆ + 1) / (∆ + 2)

This implies that r_i = |Ω_i| / |Ω_{i−1}| ≥ (∆ + 1)/(∆ + 2) ≥ 3/4 since ∆ ≥ 2.

Since f(G) = |Ω_m| = |Ω_0| · (|Ω_1|/|Ω_0|) · · · (|Ω_m|/|Ω_{m−1}|) = |Ω_0| · Π_{i=1}^m r_i = q^n · Π_{i=1}^m r_i, if we can find a good estimate for each r_i with high probability, then we have an FPRAS for counting the number of valid graph colorings of G.
Algorithm 12 Color-Count(G, ε)
  r̂_1, . . . , r̂_m ← 0                                ▷ Estimates for the r_i
  for i = 1, . . . , m do
    for k = 128m³/ε² times do
      c ← Sample a coloring of G_{i−1}                 ▷ Using SampleColor
      if c is a valid coloring for G_i then
        r̂_i ← r̂_i + 1/k                               ▷ Update the empirical estimate of r_i = |Ω_i| / |Ω_{i−1}|
      end if
    end for
  end for
  return q^n · Π_{i=1}^m r̂_i
Proof. Let X_j be the indicator variable of whether the j-th sampled coloring for Ω_{i−1} is a valid coloring for Ω_i, where p = Pr[X_j = 1]. From above, we know that p = Pr[X_j = 1] = |Ω_i| / |Ω_{i−1}| ≥ 3/4. Let X = Σ_{j=1}^k X_j be the empirical number of colorings that are valid for both Ω_{i−1} and Ω_i, captured by k · r̂_i. Then, E(X) = kp by linearity of expectation. Picking k = 128m³/ε²,

Pr[ |X − kp| ≥ (ε/2m) · kp ] ≤ 2 exp(−(ε/2m)² · kp / 3)      By the Chernoff bound
                             = 2 exp(−32mp/3)                 Since k = 128m³/ε²
                             ≤ 2 exp(−8m)                     Since p ≥ 3/4
                             ≤ 1/(4m)                         Since exp(−x) ≤ 1/x for x > 0

Dividing by k and negating, we have:

Pr[ |r̂_i − r_i| ≤ (ε/2m) · r_i ] = 1 − Pr[ |X − kp| ≥ (ε/2m) · kp ] ≥ 1 − 1/(4m)

Pr[ |q^n Π_{i=1}^m r̂_i − f(G)| > ε · f(G) ] ≤ Σ_{i=1}^m Pr[ |r̂_i − r_i| > (ε/2m) · r_i ]
                                             ≤ m · 1/(4m) = 1/4

Hence, Pr[ |q^n Π_{i=1}^m r̂_i − f(G)| ≤ ε · f(G) ] ≥ 3/4.
Remark Recall from Claim 3.9 that SampleColor actually gives an ap-
proximate uniform coloring. A more careful analysis can absorb the approx-
imation of SampleColor under Color-Count’s factor.
³ See https://www.wolframalpha.com/input/?i=e%5Ex+%3C%3D+1%2B2x
Chapter 4
Rounding linear program solutions
Linear programming (LP) and integer linear programming (ILP) are versa-
tile models but with different solving complexities — LPs are solvable in
polynomial time while ILPs are N P-hard.
minimize c^T x
subject to Ax ≥ b
           x ≥ 0
ILPs are defined similarly with the additional constraint that variables
take on integer values. As we will be relaxing ILPs into LPs, to avoid confu-
sion, we use y for ILP variables to contrast against the x variables in LPs.
minimize c^T y
subject to Ay ≥ b
           y ≥ 0
           y ∈ Z^n
Remark We can define LPs and ILPs for maximization problems similarly. One can also solve a maximization problem with a minimization LP using the same constraints but a negated objective function. The optimal value of the solved LP is then the negation of the maximum value of the original problem.
In this chapter, we illustrate how one can model set cover and multi-
commodity routing as ILPs, and how to perform rounding to yield approx-
imations for these problems. As before, Chernoff bounds will be a useful
inequality in our analysis toolbox.
Example (Figure: the running set cover instance with sets S_1, S_2, S_3, S_4 and elements e_1, . . . , e_5.)
ILP_{Set cover}

minimize Σ_{i=1}^m y_i · c(S_i)                        / Cost of chosen set cover
subject to Σ_{i : e_j ∈ S_i} y_i ≥ 1    ∀j ∈ [n]       / Every item e_j is covered
           y_i ∈ {0, 1}                 ∀i ∈ [m]       / Binary indicator variables
Upon solving ILP_{Set cover}, the set {S_i : i ∈ [m] ∧ y*_i = 1} is the optimal solution for a given set cover instance. However, as solving ILPs is NP-hard, we consider relaxing the integrality constraint by replacing the binary y_i variables by real-valued/fractional x_i ∈ [0, 1]. Such a relaxation yields the corresponding LP:
LP_{Set cover}

minimize Σ_{i=1}^m x_i · c(S_i)                        / Cost of chosen fractional set cover
subject to Σ_{i : e_j ∈ S_i} x_i ≥ 1    ∀j ∈ [n]       / Every item e_j is fractionally covered
           0 ≤ x_i ≤ 1                  ∀i ∈ [m]
Since LPs can be solved in polynomial time, we can find the optimal
fractional solution to LPSet cover in polynomial time.
Example The corresponding ILP for the example set cover instance is:
After relaxing:
Proof. Since x* is a feasible (not to mention, optimal) solution for LP_{Set cover}, in each constraint there is at least one x*_i that is greater than or equal to 1/f. Hence, every element is covered by some set y_i in the rounding.
¹ Using Microsoft Excel. See tutorial: http://faculty.sfasu.edu/fisherwarre/lp_solver.html. Or, use an online LP solver such as http://online-optimizer.appspot.com/?model=builtin:default.mod
Since e^{−1} ≈ 0.37, we would expect the rounded y not to cover several items. However, one can amplify the success probability by taking the union of several independent roundings (see ApxSetCoverILP).
Algorithm 13 ApxSetCoverILP(U, S, c)
ILPSet cover ← Construct ILP of problem instance
LPSet cover ← Relax integral constraints on indicator variables y to x
x∗ ← Solve LPSet cover
T ←∅ . Selected subset of S
for k · ln(n) times (for any constant k > 1) do
for i ∈ [m] do
yi ← Set to 1 with probability x∗i
if yi = 1 then
T ← T ∪ {Si } . Add to selected sets T
end if
end for
end for
return T
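The rounding loop of ApxSetCoverILP can be sketched in Python as follows, assuming the fractional optimum x* has already been obtained from an LP solver (solving the LP itself is not shown).

import math
import random

def round_set_cover(x_star, sets, universe, k=2):
    """Randomized rounding: repeat k*ln(n) independent roundings of the
    fractional solution x_star and take the union of all selected sets."""
    n = len(universe)
    rounds = max(1, math.ceil(k * math.log(n)))
    selected = set()
    for _ in range(rounds):
        for i, xi in enumerate(x_star):
            if random.random() < xi:          # set y_i = 1 with probability x*_i
                selected.add(i)
    covered = set().union(*(sets[i] for i in selected)) if selected else set()
    return selected, covered == set(universe)

With high probability the union of the k·ln(n) roundings covers every item, and its expected cost is O(log n) times the LP optimum.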
(iv) (Single path): All demand for commodity i passes through a single path
pi (no repeated vertices).
(v) (Congestion factor): ∀e ∈ E, Σ_{i=1}^k d_i · 1_{e∈p_i} ≤ λ · c(e), where the indicator 1_{e∈p_i} = 1 ⇐⇒ e ∈ p_i.
(vi) (Minimum congestion): λ is minimized.
(Figures: an example multi-commodity routing instance on the vertices s_1, s_2, s_3, a, b, c, t_1, t_2, t_3 with edge capacities, followed by several possible routings of the three commodities and the congestion each of them induces.)
ILP_{MCR-Given-Paths}

minimize λ                                                                 / (1)
subject to Σ_{i=1}^k Σ_{p∈P_i : e∈p} d_i · y_{i,p} ≤ λ · c(e)   ∀e ∈ E     / (2)
           Σ_{p∈P_i} y_{i,p} = 1                                 ∀i ∈ [k]   / (3)
Relax the integrality constraint on y_{i,p} to x_{i,p} ∈ [0, 1] and solve the corresponding LP. Define λ* = obj(LP_{MCR-Given-Paths}) and denote by x* a fractional path selection that achieves λ*. To obtain a valid path selection, for each commodity i ∈ [k], pick path p ∈ P_i with weighted probability x*_{i,p} / Σ_{p∈P_i} x*_{i,p} = x*_{i,p}. Note that by constraint (3), Σ_{p∈P_i} x*_{i,p} = 1.
Remark 1 For a fixed i, a path is selected exclusively (only one!) (cf. set
cover’s roundings where we may pick multiple sets for an item).
Theorem 4.9. Pr[obj(y) ≥ (2c log m / log log m) · max{1, λ*}] ≤ 1/m^{c−1}
E(Y_e) = E(Σ_{i=1}^k d_i · Y_{e,i})
       = Σ_{i=1}^k d_i · E(Y_{e,i})                          By linearity of expectation
       = Σ_{i=1}^k d_i Σ_{p∈P_i : e∈p} x_{i,p}               Since Pr[Y_{e,i} = 1] = Σ_{p∈P_i : e∈p} x_{i,p}

For every edge e ∈ E, applying² the tight form of the Chernoff bound with (1 + ε) = 2c log m / log log m to the variable Y_e / c(e) gives

Pr[ Y_e / c(e) ≥ (2c log m / log log m) · max{1, λ*} ] ≤ 1/m^c
ILP_{MCR-Given-Network}

minimize λ                                                                                    / (1)
subject to Σ_{e∈out(s_i)} f(e, i) − Σ_{e∈in(s_i)} f(e, i) = 1     ∀i ∈ [k]                    / (2)
           Σ_{e∈in(t_i)} f(e, i) − Σ_{e∈out(t_i)} f(e, i) = 1     ∀i ∈ [k]                    / (3)
           Σ_{e∈out(v)} f(e, i) − Σ_{e∈in(v)} f(e, i) = 0         ∀i ∈ [k], ∀v ∈ V \ {s_i, t_i}   / (4)
           Σ_{i=1}^k Σ_{p∈P_i : e∈p} d_i · y_{i,p} ≤ λ · c(e)     ∀e ∈ E                      As before
           Σ_{p∈P_i} y_{i,p} = 1                                  ∀i ∈ [k]                    As before
min_{e∈p_i} f(e, i) on the path as the selection probability (as per x_{i,p} in the previous section). By selecting the path p_i with probability min_{e∈p_i} f(e, i), one can show by similar arguments as before that E(obj(y)) ≤ obj(x*) ≤ obj(y*).
Naive LP. The most obvious way is to take xij to be an indicator for
assigning job i to machine j and then optimizing the following objective:
min t
s.t. ∀ machine j:   Σ_{i=1}^n t_{ij} · x_{ij} ≤ t
     ∀ job i:       Σ_{j=1}^m x_{ij} ≥ 1
Here, t is an additional variable giving an upper bound for the finishing time
of the last job.
The problem with this LP is that the best fractional solution and the best integral solution can be far apart, i.e., the LP has a large "integrality gap". Namely, the fractional solution to the LP is allowed to distribute a single long job evenly across all m machines, whereas any integral solution must place it on one machine.
LP(λ):

∀ machine j:        Σ_{(i,j)∈S_λ} t_{ij} · x_{ij} ≤ λ
∀ job i:            Σ_{j : (i,j)∈S_λ} x_{ij} ≥ 1
∀ (i, j) ∈ S_λ:     x_{ij} ≥ 0
This time, we just want to check for feasibility. We have constraints (defining
a polytope), but there is no objective function. Using binary search4 , we can
find the smallest λ∗ for which we can find a fractional solution of LP(λ∗ ).
Note that it is easy to initialize the binary search, as there are trivial lower
and upper bounds on λ∗ , for example 0 and the sum of all processing times.
Now, somebody gives us a fractional solution of LP(λ*): Suppose this solution is x* ∈ [0, 1]^{|S_λ|}. Instead of just assuming that x* is an arbitrary solution, we will also assume that x* is a vertex of the polytope.⁵
⁴ Feasibility of LP(λ) is a monotone property in λ: for λ ≤ λ′, a solution of LP(λ) is also a feasible solution of LP(λ′).
1. For edges (i, j) such that xij = 1, assign job i to machine j. Let I be
the set of jobs assigned in this step. (I ⊆ J.)
2. Let H be the bipartite graph on jobs and machines where job i and
machine j are connected iff xi,j ∈ (0, 1).
Jobs assigned in step 1 take at most time λ∗ to complete. With the matching,
each machine gets at most one more job, whose cost is also at most λ∗ (by
definition of the set Sλ∗ ). Therefore, we can construct a solution with cost
at most 2 · λ∗ . As λ∗ is a lower bound on the optimal cost, this proves that
the algorithm is a 2-approximation. This is a much more careful rounding
method than randomized rounding.
Selected Topics in
Approximation Algorithms
Chapter 5
Distance-preserving tree
embedding
Many hard graph problems become easy if the graph is a tree: in particular, some NP-hard problems are known to admit exact polynomial-time solutions on trees, and for some other problems, we can obtain much better approximations on trees. Motivated by this fact, one hopes to design the following framework for a general graph G = (V, E) with distance metric d_G(u, v) between vertices u, v ∈ V:
1. Construct a tree T
• (T is a probability space): Σ_{T∈T} Pr[T] = 1

(B) ∀u, v ∈ V, Pr[u and v not in the same partition] ≤ α · d_G(u, v)/D, for some α
(Figure: a ball-carving partition of the vertex set into parts N_1, . . . , N_8.)
Observation 5.5. If Vi is the first partition that cuts B(u, r), a necessary
condition is that in the random permutation π, vi appears before any vj with
j < i. (i.e. π(vi ) < π(vj ), ∀1 ≤ j < i).
Proof. Consider the largest 1 ≤ j < i such that π(vj ) < π(vi ):
• If B(u, r) ⊆ B(vj , θ), then vertices in B(u, r) would have been removed
before vi is considered.
• If B(u, r) ∩ B(v_j, θ) ≠ ∅ and B(u, r) ⊄ B(v_j, θ), then V_i is not the first
partition that cuts B(u, r) since Vj (or possibly an earlier partition)
has already cut B(u, r).
In any case, if there is a 1 ≤ j < i such that π(vj ) < π(vi ), Vi does not cut
B(u, r).
Observation 5.6. Pr[V_i cuts B(u, r)] ≤ 2r / (D/8).

Proof. We ignore all the other partitions and only consider what must happen for V_i to cut the ball. That V_i cuts B(u, r) means ∃u_1 ∈ B(u, r) with u_1 ∈ B(v_i, θ) and ∃u_2 ∈ B(u, r) with u_2 ∉ B(v_i, θ). Therefore, Pr[V_i cuts B(u, r)] ≤ Pr[θ ∈ [d_G(u, v_i) − r, d_G(u, v_i) + r]] ≤ 2r / (D/8).
Thus,
Pr[B(u, r) is cut]
= Pr[ ∪_{i=1}^n {V_i first cuts B(u, r)} ]
≤ Σ_{i=1}^n Pr[V_i first cuts B(u, r)]                                 Union bound
= Σ_{i=1}^n Pr[π(v_i) = min_{j≤i} π(v_j)] · Pr[V_i cuts B(u, r)]        Require v_i to appear first
= Σ_{i=1}^n (1/i) · Pr[V_i cuts B(u, r)]                                By the random permutation π
≤ Σ_{i=1}^n (1/i) · 2r/(D/8)                                            diam(B(u, r)) ≤ 2r, θ ∈ [D/8, D/4]
= 16 (r/D) H_n                                                          H_n = Σ_{i=1}^n 1/i
∈ O(log(n)) · r/D
5.1.3 Construction of T
Using ball carving, ConstructT recursively partitions the vertices of a given graph until there is only one vertex remaining. At each step, the upper bound D on the diameter of the parts is halved.
2D/2^i ≤ d_T(u, v) = 2 · (D/2^i + D/2^{i+1} + · · · ) ≤ 4D/2^i

See the picture — r is the auxiliary node at level i which splits the nodes u and v.

(Figure: the subtrees containing u ∈ V_u and v ∈ V_v hang off the auxiliary node r via edges of weight D/2^i, D/2^{i+1}, . . ., so u and v are first separated at level i.)
Figure 5.2: Recursive ball carving with ⌈log₂(D)⌉ levels. Red vertices are auxiliary nodes that are not in the original graph G. Denoting the root as the 0-th level, edges from level i to level i + 1 have weight D/2^i.
E[d_T(u, v)] = Σ_{i=0}^{log(D)−1} Pr[E_i] · [d_T(u, v), given E_i]        Definition of expectation
             ≤ Σ_{i=0}^{log(D)−1} Pr[E_i] · 4D/2^i                         By Lemma 5.8
             ≤ Σ_{i=0}^{log(D)−1} (α · d_G(u, v) / (D/2^i)) · 4D/2^i       Property (B) of ball carving
             = 4α log(D) · d_G(u, v)                                       Simplifying
We can remove the log(D) factor, and prove that the tree embedding built
by the algorithm has stretch factor c = O(log n). For that, we need a tighter
analysis of the ball carving process, by only considering vertices that may
cut B(u, dG (u, v)) instead of all n vertices, in each level of the recursive
partitioning. This sharper analysis is presented as a separate section below.
See Theorem 5.13 in Section 5.1.4.
Remark Although Lemma 5.11 is not a very useful inequality on its own (any probability is at most 1), we use it to partition the value range of r so that we can say something stronger in the next lemma.
Lemma 5.12. For i ∈ N, if r ≤ D/16, then

Pr[B(u, r) is cut] ≤ O(log(|B(u, D/2)| / |B(u, D/16)|)) · r/D
Proof. V_i cuts B(u, r) only if D/8 − r ≤ d_G(u, v_i) ≤ D/4 + r, so d_G(u, v_i) ∈ [D/16, 5D/16] ⊆ [D/16, D/2].

(Figure: the vertices v_1, v_2, . . . sorted by their distance from u; only those with distance in [D/16, D/2] can cut B(u, r).)

• j − 1 = |B(u, D/16)| is the number of nodes that have distance ≤ D/16 from u

We see that only the vertices v_j, v_{j+1}, . . . , v_k have distances from u in the range [D/16, D/2]. Pictorially, only the vertices in the shaded region could possibly cut B(u, r). As before, let π(v) be the ordering in which vertex v appears in
Pr[B(u, r) is cut]
= Pr[ ∪_{i=j}^k {V_i cuts B(u, r)} ]                                    Only V_j, V_{j+1}, . . . , V_k can cut
≤ Σ_{i=j}^k Pr[π(v_i) < min_{z<i} π(v_z)] · Pr[V_i cuts B(u, r)]         Union bound
= Σ_{i=j}^k (1/i) · Pr[V_i cuts B(u, r)]                                 By the random permutation π
≤ Σ_{i=j}^k (1/i) · 2r/(D/8)                                             diam(B(u, r)) ≤ 2r, θ ∈ [D/8, D/4]
= 16 (r/D) (H_k − H_j)                                                   where H_k = Σ_{i=1}^k 1/i
∈ O(log(|B(u, D/2)| / |B(u, D/16)|)) · r/D                               since H_k − H_j ∈ Θ(log(k/j))
Proof. As before, let E_i be the event that vertices u and v get separated at the i-th level. For E_i to happen, the ball B(u, r) = B(u, d_G(u, v)) must be cut at level i, so Pr[E_i] ≤ Pr[B(u, r) is cut at level i].
(6) Simplifying
(7) Since Σ_{i=i*−4}^{log(D)−1} 2^{i*−i} ≤ 2^5
Algorithm 16 Contract(T )
while T has an edge (u, w) such that u ∈ V and w is an auxiliary node
do
Contract edge (u, w) by merging subtree rooted at u into w
Identify the new node as u
end while
Multiply weight of every edge by 4
return Modified T′
Since we do not contract actual vertices, at least one of the (u, w) or (v, w) edges of weight D/2^i will remain. Multiplying the weights of all remaining edges by 4, we get d_T(u, v) ≤ 4 · D/2^i ≤ d_{T′}(u, v).
Suppose we only multiplied the weights by 4 without contracting; then we would have d_{T′}(u, v) = 4 · d_T(u, v). Since we contract edges, d_{T′}(u, v) can only decrease, so d_{T′}(u, v) ≤ 4 · d_T(u, v).
Remark Claim 5.14 tells us that one can construct a tree T′ without auxiliary nodes by incurring only an additional constant-factor overhead.
• Σ_{e∈E} f(c_e) · ℓ_e is minimized
Let us denote by I = (G, f, {s_i, t_i, d_i}_{i=1}^k) the given instance. Let OPT(I, G) be the optimal solution on G. The general idea of our algorithm NetworkDesign is to first transform the original graph G into a tree T by the probabilistic tree embedding method, contract the tree into T′, then find an optimal solution on T′ and map it back to the graph G. Let A(I, G) be the solution produced by our algorithm on graph G. Denote the costs by |OPT(I, G)| and |A(I, T)| respectively.
We now compare the solutions OP T (I, G) and A(I, T ) by comparing
edge costs (u, v) ∈ E in G and tree embedding T . For the three claims
below, we provide just proof sketches, without diving into the notation-heavy
calculations. Please refer to Section 8.6 in [WS11] for the formal arguments.
Proof. (Sketch) This follows from two facts: (1) For any edge xy ∈ T, all of the paths sent in A(I, T) along the edge xy are now sent along the shortest path connecting x and y in G, which by the first property of the tree embedding has length at most equal to the length of the xy edge. (2) Several paths of A(I, T), corresponding to different edges in the tree T, might end up being routed through the same edge e of G. But by subadditivity, the cost on edge e is at most the summation of the costs of those paths.
Claim 5.18. E[|OP T (I, T )| using edges in T ] ≤ O(log n) · |OP T (I, G)|
Proof. (Sketch) Using subadditivity, we can upper bound the cost of OP T (I, T )
by the summation over all edges e ∈ G of the cost for the capacity of this
edge in the optimal solution OP T (I, G) multiplied by the length of the path
connecting the two endpoints of e in the tree T . We know that T stretches
edges by at most a factor of O(log n), in expectation. Hence, the cost is in
expectation upper bounded by the summation over all edges e ∈ G of the
cost for the capacity of this edge in the optimal solution OP T (I, G) mul-
tiplied by the length of the edge e in G. The latter is simply the cost of
OP T (I, G).
By the three claims above, NetworkDesign gives an O(log n)-approximation to the buy-at-bulk network design problem, in expectation.
Chapter 6
L1 metric embedding & sparsest cut
In this section we will study the minimum s-t cut problem in undirected
graphs. Given a graph G = (V, E), we define a cut as a partition of the vertices into two sets (S, V \ S). We will typically work with connected graphs; in this situation any non-trivial cut (S ≠ ∅, V \ S ≠ ∅) will have some edges going across the cut. The minimum s-t cut problem searches for cuts that separate a source s ∈ V from a target t ∈ V while minimizing the capacity of the edges across the cut.
An intuitive way of thinking of the minimum s-t cut problem is considering
the capacities as prices to pay to remove the edges and trying to disconnect
the node s from t while minimizing the price paid.
The minimum s-t cut problem can be solved in polynomial time by formulating it as a linear program. We will follow a slightly different approach, solving a relaxation of the problem and constructing a minimum cost cut
using the solution of the relaxed problem. The solution that we present is
not computationally better than solving the minimum s-t cut directly but
presents a framework that can be generalized to more complex cut problems.
With the solution of the linear program d∗ we can find the minimum s-t
cut.
Claim 6.2. For the minimum s-t cut problem, the relaxed optimization prob-
lem has the same optimum value as the original problem.
Proof. First we note that since d∗ is a relaxation of the original problem, for
any cut (S, V \ S) it holds that
Such a cut S* will be a minimum s-t cut. To find it, we first observe that since d* is the optimum of the linear program it must satisfy the constraint d*(s, t) = 1. Then we order the vertices according to their distance to the source, defining v_i ∈ V as the i-th closest vertex to the source and x_i = d*(s, v_i) as its corresponding distance to the source. We also define the increments y_i = x_{i+1} − x_i.
Now we can define the natural cuts as the cuts that separate the vertices according to their distance to the source, S_i = {v ∈ V | d*(s, v) ≤ x_i}. Figure 6.1 shows the vertices and the natural cuts.
We now show that one of the natural cuts achieves the optimum and therefore is a minimum s-t cut.
We now show that one of the natural cuts achieves the optimum and there-
v4
s v1 v2 v3 v5 v6 t
d∗ (s, vi )
0 x1 x2 x3 x4 x5 1
y2
First we observe that for any edge e = {v_i, v_j} with x_i ≤ x_j (w.l.o.g.), the triangle inequality gives us d(s, v_j) ≤ d(s, v_i) + d(v_i, v_j), hence

d(e) = d(v_i, v_j) ≥ d(s, v_j) − d(s, v_i) = x_j − x_i = Σ_{cuts k crossed by the edge e} y_k
Since d*(s, t) = 1 and the vertices are ordered, Σ_{k=0}^{n−1} y_k = x_n ≥ x_t = 1, hence

Σ_{e∈E} c_e d(e) ≥ Σ_k y_k ( Σ_{edges e that cross the k-th cut} c_e ) ≥ min_k { Σ_{edges e that cross the k-th cut} c_e },

where S* is defined as the natural cut with the lowest capacity, i.e. it minimizes Σ_{edges e that cross the k-th cut} c_e. Therefore the cut S* achieves a cost no larger than the cost of the relaxed problem and is a minimum s-t cut.
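The extraction of the best natural cut can be sketched as follows; the function assumes the optimal LP distances d*(s, v) are already available (computing them is not shown), and the input format is an illustrative choice.

def best_natural_cut(dist_from_s, edges, capacities):
    """dist_from_s: dict v -> d*(s, v); edges: list of (u, v); capacities: dict edge -> c_e.
    Returns the natural cut (as a vertex set containing s) of smallest capacity."""
    thresholds = sorted(set(dist_from_s.values()))
    best_S, best_cap = None, float("inf")
    for x in thresholds:
        if x >= 1:                      # d*(s, t) = 1, so t must stay outside the cut
            continue
        S = {v for v, d in dist_from_s.items() if d <= x}
        cap = sum(capacities[(u, v)] for (u, v) in edges
                  if (u in S) != (v in S))
        if cap < best_cap:
            best_S, best_cap = S, cap
    return best_S, best_cap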
where E(S, V \ S) denotes the set of edges that cross the cut {e = (u, v) ∈
E|u ∈ S and v ∈ V \ S}.
Observation 6.3 (Sparse cuts in complete graphs). For complete graphs,
the number of edges between any cut S and V \ S is exactly |S| · |V \ S|, so
L(S) = 1.
We can interpret the objective function (6.2) as a comparison between the cut S in the given graph and the cut S in a complete graph on the same vertices. We also note that sparse cuts penalize cuts where the two sets have very different sizes.
Now, as we did in the minimum s-t cut problem, we will relax the problem. We observe that for any cut (S, V \ S) the expression d_S(u, v) = |1_S(u) − 1_S(v)| / Σ_{u,v∈V} |1_S(u) − 1_S(v)| defines a pseudo-metric and satisfies Σ_{u,v∈V} d_S(u, v) = 1. Therefore we will relax the problem to any pseudo-metric that satisfies Σ_{u,v∈V} d_{uv} = 1. We write the relaxed problem as the following linear program:
Definition 6.5 (Sparsest cut relaxation).

minimise Σ_{e={u,v}∈E} d_{uv}                                      (6.5)
subject to d being a pseudo-metric with Σ_{u,v∈V} d_{uv} = 1
Now from the optimal pseudo-metric d* we would like to obtain a cut that approximates the sparsest cut. Finding such a cut from d* directly is not easy; therefore we approximate the optimal pseudo-metric d* by the L1 distance of an R^k space for some k. This approximation can be done using the following embedding result.
≥ ( min_{j=1}^{k+1} (a_j / b_j) ) · (1 + Σ_{i=1}^k b_i / b_{k+1}) / (1 + Σ_{i=1}^k b_i / b_{k+1})
= min_{j=1}^{k+1} a_j / b_j.
Let j_min be the index that minimizes the quotient on the right hand side. Since the objective function is invariant under affine transformations of f (f̂(u) = a · f(u) + b), we may assume that

max_{u∈V} f_{j_min}(u) = 1 and min_{u∈V} f_{j_min}(u) = 0.

Now let τ ∈ [0, 1] be a uniformly distributed threshold and define the cut

S_τ = {v ∈ V : f_{j_min}(v) ≤ τ}                                     (6.8)

Note that

|1_{S_τ}(u) − 1_{S_τ}(v)| = 0 if f_{j_min}(u), f_{j_min}(v) ≤ τ
                          = 1 if min{f_{j_min}(u), f_{j_min}(v)} ≤ τ < max{f_{j_min}(u), f_{j_min}(v)}
                          = 0 if f_{j_min}(u), f_{j_min}(v) > τ

and

E[|1_{S_τ}(u) − 1_{S_τ}(v)|] = |f_{j_min}(u) − f_{j_min}(v)|.

Putting everything together, we obtain

Σ_{{u,v}∈E} ||f(u) − f(v)||₁ / Σ_{u,v∈V} ||f(u) − f(v)||₁ ≥ Σ_{{u,v}∈E} |f_{j_min}(u) − f_{j_min}(v)| / Σ_{u,v∈V} |f_{j_min}(u) − f_{j_min}(v)|
                                                          = Σ_{{u,v}∈E} E[|1_{S_τ}(u) − 1_{S_τ}(v)|] / Σ_{u,v∈V} E[|1_{S_τ}(u) − 1_{S_τ}(v)|]

The last step follows from choosing the minimal among n different cuts S_τ, and using Lemma 6.7 again.
Finally, we can show that the cut obtained from the L1 metric is a good
approximation of the sparsest cut.
Proof of Theorem 6.4. Let d* be an optimal pseudo-metric solution to the linear program (6.5), f : V → R^k an embedding as in Theorem 6.6, and S ⊆ V the cut extracted as described in Claim 6.8. Then with high probability we have

L(S) ≤ Σ_{{u,v}∈E} ||f(u) − f(v)||₁ / Σ_{u,v∈V} ||f(u) − f(v)||₁
     ≤ O(log(n)) · Σ_{{u,v}∈E} d*(u, v) / Σ_{u,v∈V} d*(u, v)
     ≤ O(log(n)) · L(S_OPT),

where the first inequality comes from the cut extraction (Claim 6.8), the second inequality comes from the L1 embedding theorem, and the last inequality comes from d* being the solution of the relaxed optimization problem.
Note that random threshold cuts S_τ can be defined along any dimension j ∈ {1, . . . , k} of f. The theorem shows that among these at most n · k = O(n log²(n)) cuts S_τ, one of them is an O(log(n))-approximation of the sparsest cut, with high probability.
6.3 L1 Embedding
In the previous section, we saw how to find a Θ(log n)-approximation of the sparsest cut by using, in a black-box manner, an embedding that maps the points to a space with the L1 norm while stretching pairwise distances by at most a Θ(log n) factor. In this section, we prove the existence of this embedding, i.e.,

Lemma 6.9. Given a pseudo-metric d : V × V → R⁺ on an n-point space V, we can construct a mapping f : V → R^k for k = Θ(log² n) such that for any two vertices u, v ∈ V, we have d(u, v)/Θ(log n) ≤ ||f(u) − f(v)||₁ ≤ d(u, v).
Algorithm 18 L1-Embedding
  for i = 1 to L = log n do
    for h = 1 to H = 1000 log n do                     ▷ Probability amplification
      Define S_ih by including each v ∈ V in it independently with probability 1/2^i
      Define the coordinate f_{(i−1)H+h}(u) = d(u, S_ih) / (LH)
    end for
  end for
  The embedding is then given as f : V → R^{LH}, where each coordinate f_i is defined above
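Algorithm 18 translates almost directly into Python; the sketch below assumes the pseudo-metric is given as a function d(u, v) over a list of points, and keeps the constant 1000 from the analysis (in practice one would use a much smaller H for experimentation).

import math
import random

def l1_embedding(points, d):
    """Coordinate ((i-1)*H + h) of f(u) is d(u, S_ih) / (L*H),
    where S_ih contains each point independently with probability 1/2^i."""
    n = len(points)
    L = max(1, math.ceil(math.log2(n)))
    H = max(1, math.ceil(1000 * math.log2(n)))
    coords = {u: [] for u in points}
    for i in range(1, L + 1):
        for _ in range(H):
            S = [v for v in points if random.random() < 1.0 / 2 ** i]
            for u in points:
                dist = min((d(u, v) for v in S), default=0.0)   # convention: d(u, S) = 0 if S is empty
                coords[u].append(dist / (L * H))
    return coords

def l1_distance(coords, u, v):
    return sum(abs(a - b) for a, b in zip(coords[u], coords[v]))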
2. After showing that this result holds for one component we will augment
the probability to make it hold for components fih for all i ∈ {1, . . . , L}
and a fraction of h ∈ {1, . . . H}. This will allow us to provide the
desired bound for the L1 embedding.
Where Br (u) and Br (v) are closed balls of radius r. From the definition of
ρt there exists an index j such that ρj < d(u, v)/2 and ρj+1 ≥ d(u, v)/2. We
define the truncated sequence,
ρ̂_i = ρ_i for i = 0, 1, . . . , j, and ρ̂_i = d(u, v)/2 for i = j + 1, . . . , L.
Where S_ih is the set of nodes that defines the coordinate f_ih. Because the two balls are disjoint, the events A_ih and B_ih are independent. When the events A_ih and B_ih both happen we have d(u, S_ih) ≥ ρ̂_i and d(v, S_ih) ≤ ρ̂_{i−1}, so

|f_ih(u) − f_ih(v)| = |d(S_ih, u) − d(S_ih, v)| / (LH) ≥ (ρ̂_i − ρ̂_{i−1}) / (LH)
To conclude the proof of the first step it remains to show that the event A_ih ∩ B_ih happens with a constant probability. The probabilities of S_ih not sampling a node in B^open_{ρ̂_i}(u) and of S_ih sampling a node in B_{ρ̂_{i−1}}(v) can be calculated as

Pr[A_ih] = (1 − 1/2^i)^{|B^open_{ρ̂_i}(u)|} ≥ (1 − 1/2^i)^{2^i} ≥ 4^{−1} = 1/4,
Pr[B_ih] = 1 − (1 − 1/2^i)^{|B_{ρ̂_{i−1}}(v)|} ≥ 1 − (1 − 1/2^i)^{2^{i−1}} ≥ 1 − e^{−1/2}.

In the preceding calculation, we use the fact that 4^{−x} ≤ 1 − x ≤ e^{−x} for x ∈ [0, 1/2]. Since A_ih and B_ih are independent events,

Pr[A_ih ∩ B_ih] = Pr[A_ih] · Pr[B_ih] ≥ (1 − e^{−1/2})/4,

where we have defined c := (1 − e^{−1/2})/4. This finishes the first step of the proof: the inequality |f_ih(u) − f_ih(v)| ≥ (ρ̂_i − ρ̂_{i−1})/(LH) holds in a component f_ih with constant probability c.
In the second step of the proof we amplify the success probability. For every i ∈ {1, . . . , L}, let x_i be the number of indices h for which the inequality holds. Using the Chernoff bound we have

Pr[x_i ≤ (c/2)·H] ≤ Pr[|x_i − c·H| ≥ (c/2)·H] ≤ 2 exp(−c·H/12) ≤ 2/n^{1000c/12} ≤ 2/n^5

So for every i ∈ {1, . . . , L} the inequality proved in Step 1 holds for at least cH/2 indices with high probability 1 − 1/n^5. To make this result hold for all i ∈ {1, . . . , L}, the probability of failure is amplified to Θ(log n / n^5) ≤ Θ(1/n^4).
Now, since the result holds for every index i and a significant fraction of the indices h, we have

||f(u) − f(v)||₁ = Σ_{i=1}^L Σ_{h=1}^H |f_ih(u) − f_ih(v)|
                 ≥ (c/2) · H · Σ_{i=1}^L (ρ̂_i − ρ̂_{i−1}) / (LH) = (c/2) · d(u, v) / (2L),

where the last equality comes from a telescoping sum and the definitions ρ̂_0 = 0 and ρ̂_L = d(u, v)/2.
This proves that the embedding is valid for the pair of vertices u, v ∈ V that we fixed at the beginning of the proof. If we consider the probability that the embedding works for every pair of vertices, the failure probability is amplified to Θ(n²/n⁴) = Θ(1/n²). Therefore the L1 embedding satisfies the distance contraction for every pair of nodes with high probability.
Chapter 7
Oblivious Routing, Cut-Preserving Tree Embedding, and Balanced Cut
In this case, we can achieve oblivious routing since for all vertices u, v ∈ V
there is a unique shortest path between them, and all other paths from u to
v must cover this path as well. The solution is therefore to send all of the
demand duv along this path, as outlined below.
General Graphs, routing via one tree Our hope is to use a similar idea
for any arbitrary graph. Consider the following graph G.
One strategy would be to pick a spanning tree of G and require that all de-
mands are routed through this tree. For example take the following spanning
tree T ⊆ G.
Note that if we pick an edge {x, y} from the edge set of T then removing
this edge from T disconnects the tree into two connected components. Let
S(x, y) denote the vertices which are in the same connected component as x,
as in the following figure.
(Figure: removing the edge {x, y} from T splits it into two components; S(x, y) denotes the component containing x.)
Now any demand duv that has exactly one endpoint in S(x, y) will be routed
through {x, y} in our spanning tree. These are the demands that have to go
through cut(S(x, y), V \ S(x, y)). Since we are routing only along the chosen
tree, all these demands have to traverse through the edge {x, y} in the tree.
We therefore define
D(x, y) = Σ_{u∈S(x,y), v∈V\S(x,y)} d_{uv},
as the amount that will be passed through the edge {x, y} in our scheme. Hence, in our routing, the congestion on an edge e = {x, y} ∈ T is exactly D(x, y)/c_{xy}. On the other hand, in any routing scheme on G, this demand D(x, y) has to be sent through edges in cut(S(x, y), V \ S(x, y)). We can therefore lower bound the optimum congestion in any scheme by

OPT ≥ D(x, y) / C(x, y),

where

C(x, y) = Σ_{e∈cut(S(x,y), V\S(x,y))} c_e.
Thus, if we had the following edge condition, then our routing would be α-competitive:

(Edge condition) for every tree edge {x, y} ∈ T, C(x, y) / c_{xy} ≤ α.
Claim 7.3. The edge condition is necessary for the routing scheme on the
tree to be α-competitive in congestion.
Proof. We wish to show that if our scheme is α-competitive, then the edge
condition holds. Note that if our scheme is α-competitive, then it must be
so for any possible demand. Consider the case in which the demands which
we want to send are equal to the capacities, i.e. duv = cuv for all {u, v} ∈ G.
This problem can be routed optimally with congestion 1, by sending each
demand completely along its corresponding edge. For some edge {u, v} ∈ T ,
our scheme would try to send D(u, v), which in this case is C(u, v), along
the edge. Therefore the congestion for all edges {u, v} ∈ T is C(u,v)
cuv
. So if
C(u,v)
our scheme is α-competitive, we must have that cuv ≤ α.
1. Instead of routing along one tree, we route along many trees, namely a collection T = {T_1, T_2, ..., T_k} where each tree T_i has probability λ_i of being chosen. So Σ_{i=1}^k λ_i = 1, and with probability λ_i we pick T_i and route through it.

(Figure: the path P_i(x, y) in G corresponding to the tree edge {x, y} ∈ T_i.)
Claim 7.5. The updated edge condition is necessary for the routing scheme to be α-competitive in congestion.
Proof. As in the proof of claim 7.3, we consider the case in which all demands
duv are equal to the capacities cuv . The optimal congestion in this case is 1.
We now consider the congestion achieved by our scheme. Consider an edge
{u, v} ∈ G. Suppose our scheme chooses tree Ti out of the collection of trees
T . Note that for every edge {x, y} ∈ Ti , if the path Pi (x, y) goes through
{u, v} then our scheme will send Di (x, y), which in this case is Ci (x, y),
through {u, v}. So the demand routed through {u, v} in Ti will be
Σ_{{x,y}∈T_i : {u,v}∈P_i(x,y)} C_i(x, y).
But each tree Ti is chosen with probability λi out of our collection of trees T .
Therefore we can write the expected demand which will be routed through
{u, v} as

Σ_i λ_i Σ_{{x,y}∈T_i : {u,v}∈P_i(x,y)} C_i(x, y).
Next, we will show that this updated tree condition is also sufficient for
achieving α-competitiveness, i.e. that having inequality 7.4 satisfied is enough
to imply our routing scheme to be α-competitive in congestion.
Claim 7.6. The updated edge condition is sufficient for the routing scheme
being α-competitive in congestion.
Proof. Assume the updated edge condition is satisfied, i.e. inequality 7.4
holds. Let’s consider an arbitrary set of demands and see how they must
be routed through our collection T of trees T1 , . . . , Tk as compared to the
optimal routing in G.
It is clear that the amount routed through some tree edge {x, y} ∈ Ti
must in G be routed through any edge {u, v} that lies on the fixed path
corresponding to {x, y}, i.e. through any {u, v} ∈ Pi (x, y).
If a tree Ti is chosen, we decide to route Di (x, y) amount of flow through
{u, v} ∈ Pi (x, y). With this strategy, we end up sending a total expected
amount of

Σ_i λ_i Σ_{{x,y}∈T_i : {u,v}∈P_i(x,y)} D_i(x, y)
through edge {u, v} ∈ G, where the expectation is taken over the possible choices of a tree T_i. Together with our assumption of 7.4 being satisfied, we can now upper bound the congestion on edge {u, v} by

( Σ_i λ_i Σ_{{x,y}∈T_i : {u,v}∈P_i(x,y)} D_i(x, y) ) / ( (1/α) Σ_i λ_i Σ_{{x,y}∈T_i : {u,v}∈P_i(x,y)} C_i(x, y) )  ≤(∗)  α · max_i D_i(x, y)/C_i(x, y)  ≤(∗∗)  α · OPT

Here, (∗) follows from (a_1 + · · · + a_k)/(b_1 + · · · + b_k) ≤ max_i a_i/b_i for any a_1, . . . , a_k, b_1, . . . , b_k > 0, and (∗∗) is because, as seen before, max_i D_i(x, y)/C_i(x, y) is a lower bound on OPT.
LP_{Oblivious Routing}:

Dual-LP_{Oblivious Routing}:

maximize z                                                                               / (1)
subject to Σ_{{u,v}∈G} c_{uv} ℓ_{uv} ≤ 1                                                  / (2)
           z − Σ_{{u,v}∈G} ( Σ_{{x,y}∈T_i : {u,v}∈P_i(x,y)} C_i(x, y) ) · ℓ_{uv} ≤ 0   ∀i ∈ [k]   / (3)

Constraint (3) can be rewritten as

z ≤ Σ_{{x,y}∈T_i} C_i(x, y) Σ_{{u,v}∈P_i(x,y)} ℓ_{uv} =: A_i        ∀i ∈ [k].
To arrive at the tightest condition on z, and thus also the tightest lower
bound on OP T , we are interested in the tree Ti that minimizes Ai under the
constraint that our distances satisfy condition (2). In the following, we show
that there exists such a tree Ti among our collection T with Ai ≤ O(log n).
From ETi [Ai ] = O(log n) it follows directly that one of the Ai ’s must be in
O(log n). Therefore, OP T of Dual-LPOblivious Routing is in O(log n) and
OP T of LPOblivious Routing is in O(log n).
Since any valid solution to LPOblivious Routing with objective value α cor-
responds to an α-congestion-competitive oblivious routing scheme, we have
shown that O(log n)-competitiveness can be achieved.
Notice that we have only given an existence proof for a satisfactory mix-
ture of trees, not a construction. Additionally, we have not shown yet how
many trees are necessary in order to achieve O(log n)-competitiveness. In
section 8.3, we show how to construct a polynomially large mixture of trees
in polynomial time.
2. Virtually compute the edge costs as the weight of the induced cut in T_i:

cost_{T_i}(x, y) = C_i(x, y) = Σ_{e∈cut(S_i(x,y), V\S_i(x,y))} c_e

4. Take the tree T_{i*} with minimal cost on the graph G as the solution
Definition 7.9 (Cut-preserving tree embedding). We call a collection of trees T with Σ_i λ_i = 1, λ_i ≥ 0 a cut-preserving tree embedding if and only if
Claim 7.10. The minimum balanced cut of a cut preserving tree embedding
yields an O(log(n)) approximation for the minimum balanced cut of the whole
graph.
The collection of trees we have used for the oblivious routing problem in
the previous section fulfills both properties of a cut preserving tree embed-
ding. We will prove this in lemmas 7.11 and 7.12.
Lemma 7.11. For every tree T_i in the collection of trees T and every cut (S, V \ S) it holds that Cut_G(S, V \ S) ≤ Cut_{T_i}(S, V \ S).

Proof. The key insight to why the inequality holds is that each edge in the original cut (on the left) gets counted at least once on the right hand side. Let's look at an arbitrary edge u ∈ S, v ∈ V \ S over the cut (S, V \ S). On the unique path from u to v in the tree T_i there exists at least one
Lemma 7.12. For every mixture of trees satisfying condition 7.4 with Σ_i λ_i = 1, λ_i ≥ 0 it holds that

Σ_i λ_i Cut_{T_i}(S, V \ S) ≤ O(log(n)) · Cut_G(S, V \ S)

Σ_{xy∈T_i : xy∈(S,V\S)} C_i(x, y) ≤ Σ_{uv∈(S,V\S)} Σ_{xy∈T_i : uv∈P_i(x,y)} C_i(x, y)
Now we can apply the definitions and combine them with the two attained
Chapter 8
Multiplicative Weights Update (MWU)
Toy setting Consider a stock market with only a single stock. Every day,
we decide whether to buy the stock or not. At the end of the day, the stock
value will be revealed and we incur a mistake/loss of 1 if we did not buy
when the stock value rose, or bought when the stock value fell. Let σ be the
sequence of true outcomes. Furthermore, we denote the true outcome on day
j as σj .
Days/σ 1 1 0 0 1
e1 1 1 0 0 1
e2 1 0 0 0 1
e3 1 1 1 1 0
Warm up: Perfect expert exists As a warm up, suppose there exists a
perfect expert. Then the problem would be easy to solve: Do the following
on each day:
• Make a decision by taking the majority vote of the remaining experts.
• At the end of the day, discard every expert that made a mistake.
Let m* denote the number of mistakes made by the best expert e* and m the number of mistakes made by the algorithm. Comparing e*'s weight to the total weight,

(1 − ε)^{m*} ≤ n (1 − ε/2)^m                        Expert e*'s weight is part of the overall weight
⇒ m* log(1 − ε) ≤ log n + m log(1 − ε/2)            Taking log on both sides
⇒ m* (−ε − ε²) ≤ log n + m (−ε/2)                   Since −x − x² ≤ log(1 − x) ≤ −x for x ∈ (0, 1/2)
⇒ m ≤ 2(1 + ε) m* + (2 log n)/ε                     Rearranging
For the randomized version, let F_j denote the fraction of the total weight on experts that erred on day j, so the expected number of mistakes is E[m] = Σ_{j=1}^{|σ|} F_j. Then

(1 − ε)^{m*} ≤ n · Π_{j=1}^{|σ|} (1 − ε · F_j)       Expert e*'s weight is part of the overall weight
⇒ (1 − ε)^{m*} ≤ n · e^{Σ_{j=1}^{|σ|} (−ε·F_j)}       Since (1 − x) ≤ e^{−x}
⇒ (1 − ε)^{m*} ≤ n · e^{−ε·E[m]}                       Since E[m] = Σ_{j=1}^{|σ|} F_j
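A compact Python sketch of the randomized weighted majority scheme analyzed above, using the 0/1 loss encoding of the toy setting; the expected loss it returns is exactly E[m] = Σ_j F_j.

def randomized_weighted_majority(predictions, outcomes, eps):
    """predictions[i][t] in {0, 1} is expert i's advice on day t; outcomes[t] is the truth.
    Follows a random expert chosen proportionally to its weight; returns the expected loss."""
    n = len(predictions)
    w = [1.0] * n
    expected_loss = 0.0
    for t, truth in enumerate(outcomes):
        total = sum(w)
        # F_t = fraction of the weight on experts that are wrong today.
        wrong_weight = sum(w[i] for i in range(n) if predictions[i][t] != truth)
        expected_loss += wrong_weight / total
        # Penalize every expert that made a mistake.
        for i in range(n):
            if predictions[i][t] != truth:
                w[i] *= (1 - eps)
    return expected_loss

preds = [[1, 1, 0, 0, 1], [1, 0, 0, 0, 1], [1, 1, 1, 1, 0]]   # experts e1, e2, e3 from the table
print(randomized_weighted_majority(preds, [1, 1, 0, 0, 1], 0.1))   # e1 is perfect, so the loss stays small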
Generalization
The above results can be generalized in a straightforward manner. Denote the loss of expert i on day t by m_i^t ∈ [−ρ, ρ], for some constant ρ. When we incur a loss, update the weights of the affected experts from w_i to (1 − ε · m_i^t/ρ) w_i. Note that m_i^t/ρ is essentially the normalized loss, which lies in [−1, 1].

Remark If each expert has a different ρ_i, one can modify the update rule to use ρ_i instead of a uniform ρ accordingly.
Covering LP:

minimize Σ_{i=1}^n c_i x_i
subject to Σ_{i=1}^n a_{ij} x_i ≥ b_j     ∀j ∈ {1, 2, . . . , m}
           x_i ≥ 0                        ∀i ∈ {1, 2, . . . , n}

In the covering LP, x_i can be viewed as the number of objects of type i that are bought. The goal is to minimize the total cost of all bought objects such that all covering constraints are satisfied. For set cover we assumed all a and b to be 1 (it is enough if any chosen set covers an object).
Σ_{i=1}^n c_i x_i^t = K, and

Σ_{j=1}^m p_j^t Σ_{i=1}^n a_{ij} x_i^t ≥ Σ_{j=1}^m p_j^t · b_j
This is a much simpler problem with just two constraints. If the aforemen-
tioned feasibility problem has a YES answer—i.e. if it is feasible—then the
same solution satisfies the above inequalities. We can easily find a solution
xt = (xt1 , xt2 , . . . , xtn ) that satisfies these two constraints (assuming the feasi-
bility of the original problem for objective value K). We can maximize the
left hand side of the inequality, subject to the equality constraint, using a
greedy approach. We’ll find the index P t
i with the best value/cost ratio, i.e.,
p aij
the one which maximizes the ratio j cij . Intuitively, the numerator can
be seen as gain and the denominator as cost per unit. Then, we set variable
xti = K/ci to satisfy the equality constraint, while we set all other variables
as xti0 = 0.
Weights update Once we have found such a solution xt = (xt1 , xt2 , . . . , xtn ),
we have to see how it fits the original feasibility problem. If this xt already
satisfies the original feasibility problem, we are done. More interestingly,
suppose that this is not the case; then we need to update the weights of con-
straints. Intuitively, we want to increase the importance of the constraints
that are violated by this solution xt and decrease the importance of those
that are satisfied. This should also take into account how strongly the con-
straint is violated/satisfied. Concretely, for the j-th constraint, we define
m_j^t = (Σ_{i=1}^{n} a_{ij} x_i^t) − b_j. Notice that m_j^t is positive if we fulfill the inequal-
ity and negative if not. Then, we update the weight of this constraint as
w_j ← w_j (1 − ε · m_j^t / ρ). Here, ε is a small positive constant; we will discuss later
how to set it so that we get the aforementioned approximation with additive error δ.
Note that ρ does not depend on the current iteration t and can be computed
using only the problem input.
Now that we have updated the weights we can start the next iteration, up-
dating the ptj first and then using the same greedy approach again.
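The loop described above can be written down compactly. The following Python sketch is one possible way to implement it; the choice of ρ, the constants, and the infeasibility check are illustrative assumptions rather than the notes' exact parameters.

def mwu_covering_feasibility(a, b, c, K, eps, T):
    # a[i][j], b[j], c[i] define the covering LP; K is the guessed objective value.
    # Returns the averaged solution x_bar, or None if some iteration certifies
    # that objective value K is infeasible.
    n, m = len(c), len(b)
    w = [1.0] * m                                    # one weight per constraint
    # rho: an upper bound on |m_j^t|, computable from the input alone (illustrative bound)
    rho = max(K * a[i][j] / c[i] for i in range(n) for j in range(m)) + max(b)
    x_bar = [0.0] * n
    for _ in range(T):
        total = sum(w)
        p = [wj / total for wj in w]
        # greedy oracle: pick the index with the best value/cost ratio
        i_star = max(range(n),
                     key=lambda i: sum(p[j] * a[i][j] for j in range(m)) / c[i])
        x = [0.0] * n
        x[i_star] = K / c[i_star]                    # spend the whole budget K on i_star
        gain = sum(p[j] * (sum(a[i][j] * x[i] for i in range(n)) - b[j]) for j in range(m))
        if gain < 0:                                 # even the best greedy choice fails
            return None
        for i in range(n):
            x_bar[i] += x[i] / T                     # x_bar is the average of the x^t
        for j in range(m):                           # reweight: violated constraints gain importance
            m_j = sum(a[i][j] * x[i] for i in range(n)) - b[j]
            w[j] *= (1 - eps * m_j / rho)
    return x_bar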
    Σ_j p_j^t (Σ_{i=1}^{n} a_{ij} x_i^t − b_j) = p^t · m^t ≥ 0

Hence, we conclude that the left hand side of 8.1 is non-negative, i.e., Σ_t p^t · m^t ≥ 0.
Therefore, so is the right hand side. In particular, for any j, we have
    0 ≤ Σ_t m_j^t + ε Σ_t |m_j^t| + (ρ log m)/ε

⇒   0 ≤ Σ_t m_j^t + ε T ρ + (ρ log m)/ε                                    ρ ≥ |m_j^t| for all j

⇒   0 ≤ Σ_t (Σ_{i=1}^{n} a_{ij} x_i^t − b_j) + ε T ρ + (ρ log m)/ε         by definition of m_j^t

⇒   0 ≤ Σ_t (Σ_{i=1}^{n} a_{ij} x_i^t − b_j)/T + ε ρ + (ρ log m)/(ε·T)     divide by T
Now, since ε = δ/(4ρ) and T = (10 ρ² log m)/δ², we see that:

    0 ≤ Σ_t (Σ_{i=1}^{n} a_{ij} x_i^t − b_j)/T + δ/4 + (4 ρ² log m)/(δ·T)   plug in ε

⇒   0 ≤ Σ_t (Σ_{i=1}^{n} a_{ij} x_i^t − b_j)/T + δ/4 + 2δ/5                 plug in T

⇒   0 ≤ Σ_t (Σ_{i=1}^{n} a_{ij} x_i^t − b_j)/T + δ                           add and round δ

⇒   Σ_t b_j/T − δ ≤ Σ_t (Σ_{i=1}^{n} a_{ij} x_i^t)/T                         split up sum and move to LHS

⇒   b_j − δ ≤ Σ_{i=1}^{n} a_{ij} x̄_i                                         LHS: sum is independent of t; RHS: definition of x̄

That is, the output x̄ = (x̄_1, x̄_2, . . . , x̄_n) satisfies the j-th inequality up to
additive error δ. This is proved for all constraints.
Note that if the LP is not feasible, this will be detected during one of the
iterations of the algorithm, when no feasible greedy solution can be found.
    Σ_i λ_i Σ_{{x,y} ∈ T_i s.t. {u,v} ∈ P_i(x,y)} C_i(x, y) ≤ O(log n) · c_{uv}

Recall that in the above, for each tree T_i and each tree edge {x, y} ∈ T_i,
we use P_i(x, y) to denote the path that corresponds to the virtual edge {x, y}
and connects x and y in G. Moreover, we have

    C_i(x, y) = Σ_{e ∈ cut(S(x,y), V \ S(x,y))} c_e

where S(x, y) is one side of the cut that results from removing the edge {x, y}
from the tree T_i. For convenience, let us define

    load_i(u, v) = (Σ_{{x,y} ∈ T_i s.t. {u,v} ∈ P_i(x,y)} C_i(x, y)) / c_{uv}

as the relative load that the i-th tree in the collection places on the edge {u, v}.
Thus, our task is to find a collection so that

    Σ_i λ_i · load_i(u, v) ≤ O(log n)
Construction Plan We start with an empty collection and add trees to the
collection. In iteration j, we add a new tree T_j with a coefficient λ_j ∈ (0, 1].
The construction ends once we find j_end such that Σ_{i=1}^{j_end} λ_i ≥ 1. During the
construction, we think of each constraint Σ_i λ_i · load_i(u, v) ≤ O(log n) as
one of our experts. We have one constraint for each edge {u, v} ∈ G. To
track progress, we use the potential function Φ_j = Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j} λ_i · load_i(u, v)).
The initial potential is equal to the number of the edges of the graph. Thus,
Φ_0 < n². When we add the tree T_j with coefficient λ_j in iteration j of the
construction, we want:

    Φ_j / Φ_{j−1} = (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j} λ_i · load_i(u, v))) / (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v))) ≤ exp(λ_j · O(log n))

This ensures Φ_{j_end} ≤ n² exp(Σ_i λ_i · O(log n)). Note that at this point we
have 1 ≤ Σ_i λ_i ≤ 2 and thus Φ_{j_end} ≤ n² exp(O(log n)) = exp(O(log n)).
Monotonicity of the logarithm then implies for each edge {u, v} ∈ G:

    Σ_{i=1}^{j_end} λ_i · load_i(u, v) ≤ O(log n)
It remains to show how we find a tree in each iteration and to bound the
number of iterations.
We choose λ_j small enough such that for each edge {u, v} ∈ G, we have
λ_j · load_j(u, v) ≤ 1. Observing this, we can make use of the inequality
e^z ≤ 1 + 2z, for all z ∈ [0, 1]. We can now upper bound

    Φ_j / Φ_{j−1} = 1 + (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v)) · (exp(λ_j · load_j(u, v)) − 1)) / (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v)))

                  ≤ 1 + (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v)) · 2 λ_j · load_j(u, v)) / (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v)))

Hence, it suffices to find a tree T_j such that

    (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v)) · 2 · load_j(u, v)) / (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v))) ≤ O(log n)

This then gives Φ_j / Φ_{j−1} ≤ exp(λ_j · O(log n)), with the use of the inequality
1 + y ≤ e^y.
For each edge {u, v} ∈ G, define the length

    ℓ(u, v) = exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v)) / (c_{uv} · Σ_{{u',v'} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u', v')))

After inserting the definition of the relative load of an edge, the condition on
tree T_j from above translates to:

    Σ_{{u,v} ∈ G} ℓ(u, v) · Σ_{{x,y} ∈ T_j s.t. {u,v} ∈ P_j(x,y)} C_j(x, y) ≤ O(log n)
Consider sampling a tree T from a probabilistic tree embedding distribution T built with
respect to the edge lengths ℓ. Taking the expectation of the left-hand side over T ∈ T,

    E_{T∈T} [ Σ_{{u,v} ∈ G} ℓ(u, v) · Σ_{{x,y} ∈ T s.t. {u,v} ∈ P(x,y)} C(x, y) ]
    = E_{T∈T} [ Σ_{{u,v} ∈ G} ℓ(u, v) · Σ_{{x,y} ∈ T s.t. {u,v} ∈ P(x,y)} Σ_{e ∈ cut(S(x,y), V\S(x,y))} c_e ]
    = E_{T∈T} [ Σ_e c_e Σ_{{x,y} ∈ T s.t. e ∈ cut(S(x,y), V\S(x,y))} Σ_{{u,v} ∈ P(x,y)} ℓ(u, v) ]
    = E_{T∈T} [ Σ_e c_e · dist_T(e) ] = Σ_e c_e · E_{T∈T}[dist_T(e)]
    ≤ Σ_e c_e · O(log n) · ℓ(e) = O(log n) · Σ_e c_e · ℓ(e)
    = O(log n) · Σ_e c_e · exp(Σ_{i=1}^{j−1} λ_i · load_i(e)) / (c_e · Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v)))
    = O(log n) · Σ_e exp(Σ_{i=1}^{j−1} λ_i · load_i(e)) / (Σ_{{u,v} ∈ G} exp(Σ_{i=1}^{j−1} λ_i · load_i(u, v)))
    = O(log n)
The equations above show that taking one probabilistic tree from the dis-
tribution satisfies the inequality in expectation. Hence, thanks to Markov's
inequality and by relaxing the right-hand side bound by a constant factor, we know
that the inequality is satisfied with probability at least 1/2. Therefore, if we
run the algorithm for O(log n) independent repetitions, with high probability,
we find one tree that satisfies the inequality. This is the tree T_j that we add to
our collection. Moreover, we set λ_j such that max_{{u,v} ∈ G} λ_j · load_j(u, v) = 1.
[Figure: example network on vertices v_1, . . . , v_5 with edge capacities 13, 10, 20, 11, 8.]
Upon seeing σ(1) = hv1 , v4 , 5i, in an online algorithm (red edges) we commit
to P1 = v1 – v3 – v4 as it minimizes the congestion to 5/10. When σ(2) =
hv5 , v2 , 8i appears, P2 = v5 – v3 – v2 minimizes the congestion given that we
committed to P1 . This causes the congestion to be 8/8 = 1. On the other
hand, the optimal offline algorithm (blue edges) can attain a congestion of
8/10 via P1 = v1 – v3 – v5 – v4 and P2 = v5 – v4 – v3 – v2 .
[Figure: relative edge loads after each request, for the online algorithm's choices.]
• le∗ (j) is the optimal offline algorithm’s relative load of edge e after re-
quest j.
In other words, the objective is to minimize max_{e∈E} l_e(|σ|) for a given se-
quence σ. Denoting Λ as the (unknown) optimal congestion factor, we nor-
malize p̃_e(i) = p_e(i)/Λ, l̃_e(j) = l_e(j)/Λ, and l̃_e*(j) = l_e*(j)/Λ. Let a be a constant to be
determined. Consider the algorithm A which does the following on request
i + 1:

• Denote the cost of edge e by c_e = a^{l̃_e(i) + p̃_e(i+1)} − a^{l̃_e(i)}.
• Route request i + 1 along a shortest s(i+1)–t(i+1) path with respect to the edge costs c_e.
Define the potential function Φ(j) = Σ_{e∈E} a^{l̃_e(j)} (γ − l̃_e*(j)), for some con-
stant γ ≥ 2. Because of normalization, l̃_e*(j) ≤ 1, so γ − l̃_e*(j) ≥ 1. Initially,
we have Φ(0) = Σ_{e∈E} γ = mγ.

Lemma 8.9. For γ ≥ 1 and 0 ≤ x ≤ 1, we have (1 + 1/(2γ))^x < 1 + x/γ.

Proof. By Taylor series², (1 + 1/(2γ))^x = 1 + x/(2γ) + O((x/(2γ))²) < 1 + x/γ.

Lemma 8.10. For a = 1 + 1/(2γ) and γ ≥ 1, we have Φ(j + 1) − Φ(j) ≤ 0.
∗
Proof. Let Pj+1 be the path that algorithm A found and let Pj+1 be the
th
path that the optimal offline algorithm assigned to the (j + 1) request
hs(j + 1), t(j + 1), d(j + 1)i. For any edge e, observe the following:
• If e 6∈ Pj+1
∗
, the load on e caused by the optimal offline algorithm
remains unchanged. That is e le∗ (j + 1) = ele∗ (j). On the other hand, if
∗
e ∈ Pj+1 le∗ (j + 1) = e
, then e le∗ (j) + pee (j + 1).
Using the observations above together with Lemma 8.9 and the fact that A
computes a shortest path, one can show that Φ(j + 1) − Φ(j) ≤ 0. In detail:
    Φ(j + 1) − Φ(j)
    = Σ_{e∈E} [ a^{l̃_e(j+1)} (γ − l̃_e*(j + 1)) − a^{l̃_e(j)} (γ − l̃_e*(j)) ]
    = Σ_{e ∈ P_{j+1} \ P*_{j+1}} (a^{l̃_e(j+1)} − a^{l̃_e(j)}) (γ − l̃_e*(j))                                      (1)
      + Σ_{e ∈ P*_{j+1}} [ a^{l̃_e(j+1)} (γ − l̃_e*(j) − p̃_e(j + 1)) − a^{l̃_e(j)} (γ − l̃_e*(j)) ]
    = Σ_{e ∈ P_{j+1}} (a^{l̃_e(j+1)} − a^{l̃_e(j)}) (γ − l̃_e*(j)) − Σ_{e ∈ P*_{j+1}} a^{l̃_e(j+1)} p̃_e(j + 1)
    ≤ Σ_{e ∈ P_{j+1}} (a^{l̃_e(j+1)} − a^{l̃_e(j)}) γ − Σ_{e ∈ P*_{j+1}} a^{l̃_e(j+1)} p̃_e(j + 1)                  (2)
    ≤ Σ_{e ∈ P_{j+1}} (a^{l̃_e(j+1)} − a^{l̃_e(j)}) γ − Σ_{e ∈ P*_{j+1}} a^{l̃_e(j)} p̃_e(j + 1)                    (3)
    = Σ_{e ∈ P_{j+1}} (a^{l̃_e(j) + p̃_e(j+1)} − a^{l̃_e(j)}) γ − Σ_{e ∈ P*_{j+1}} a^{l̃_e(j)} p̃_e(j + 1)          (4)
    = Σ_{e ∈ P_{j+1}} c_e γ − Σ_{e ∈ P*_{j+1}} a^{l̃_e(j)} p̃_e(j + 1)                                              (5)
    ≤ Σ_{e ∈ P*_{j+1}} c_e γ − Σ_{e ∈ P*_{j+1}} a^{l̃_e(j)} p̃_e(j + 1)                                             (6)
    = Σ_{e ∈ P*_{j+1}} [ (a^{l̃_e(j) + p̃_e(j+1)} − a^{l̃_e(j)}) γ − a^{l̃_e(j)} p̃_e(j + 1) ]
    = Σ_{e ∈ P*_{j+1}} a^{l̃_e(j)} [ (a^{p̃_e(j+1)} − 1) γ − p̃_e(j + 1) ]
    = Σ_{e ∈ P*_{j+1}} a^{l̃_e(j)} [ ((1 + 1/(2γ))^{p̃_e(j+1)} − 1) γ − p̃_e(j + 1) ]                                (7)
    ≤ 0                                                                                                             (8)

(2) l̃_e*(j) ≥ 0
(3) l̃_e(j + 1) ≥ l̃_e(j) and a > 1
Chapter 9
Basics and warm up with majority element
Thus far, we have been ensuring that our algorithms run fast. What if our
system does not have sufficient memory to store all data to post-process it?
For example, a router has relatively small amount of memory while tremen-
dous amount of routing data flows through it. In a memory constrained set-
ting, can one compute something meaningful, possibly approximately, with
a limited amount of memory?
More formally, we now look at a slightly different class of algorithms
where data elements from [n] = {1, . . . , n} arrive one at a time, in a stream
S = a_1, . . . , a_m, where a_i ∈ [n] arrives in the i-th time step. At each step, our
algorithm performs some computation1 and discards the item ai . At the end
of the stream2 , the algorithm should give us a value that approximates some
value of interest.
Algorithm 19 Robust(A, I, ε, δ)
  C ← ∅                                    . Initialize candidate outputs
  for k = O(log(1/δ)) times do
      sum ← 0
      for j = O(1/ε²) times do
          sum ← sum + A(I)
      end for
      Add sum/j to candidates C             . Include new sample of the mean
  end for
  return Median of C                        . Return median
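For concreteness, here is a small Python version of the Robust routine; the constants 8 and 4 in the repetition counts are illustrative placeholders for the O(log 1/δ) and O(1/ε²) factors.

import math
import statistics

def robust(estimator, eps, delta):
    # `estimator` is a zero-argument function returning one run of A on I.
    k = max(1, math.ceil(8 * math.log(1 / delta)))     # O(log 1/delta) candidates
    j = max(1, math.ceil(4 / eps ** 2))                # O(1/eps^2) runs per candidate
    candidates = []
    for _ in range(k):
        total = sum(estimator() for _ in range(j))
        candidates.append(total / j)                   # mean of j runs (Trick 1)
    return statistics.median(candidates)               # median of the means (Trick 2)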
    E[2^{X_{m+1}}] = Σ_{j=1}^{m} E[2^{X_{m+1}} | X_m = j] · Pr[X_m = j]                    Condition on X_m
                   = Σ_{j=1}^{m} (2^{j+1} · 2^{−j} + 2^{j} · (1 − 2^{−j})) · Pr[X_m = j]   Increment X w.p. 2^{−j}
                   = Σ_{j=1}^{m} (2^{j} + 1) · Pr[X_m = j]                                  Simplifying
                   = Σ_{j=1}^{m} 2^{j} · Pr[X_m = j] + Σ_{j=1}^{m} Pr[X_m = j]              Splitting the sum
                   = E[2^{X_m}] + Σ_{j=1}^{m} Pr[X_m = j]                                   Definition of E[2^{X_m}]
                   = E[2^{X_m}] + 1                                                         Σ_{j=1}^{m} Pr[X_m = j] = 1
                   = (m + 1) + 1                                                            Induction hypothesis
                   = m + 2
Proof. Exercise.
Claim 10.3. Var(2^{X_m} − 1) = E[(2^{X_m} − 1 − m)²] ≤ m²/2

Proof.

    Pr[|(2^{X_m} − 1) − m| > εm] ≤ Var(2^{X_m} − 1) / (εm)²      Chebyshev's inequality
                                 ≤ (m²/2) / (ε² m²)               By Claim 10.3
                                 = 1/(2ε²)
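Before discussing amplification, the basic counter itself is only a few lines of code. The Python sketch below (illustrative, not from the notes) keeps the single counter X and returns the estimate 2^X − 1.

import random

def morris(m):
    # Process a stream of m increments; E[2^X - 1] = m by the computation above.
    x = 0
    for _ in range(m):
        if random.random() < 2 ** (-x):   # increment X with probability 2^{-X}
            x += 1
    return 2 ** x - 1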
Remark Using the discussion in Section 9.1, we can run Morris multiple
times to obtain a (1 ± ε)-approximation of the first moment of a stream that
succeeds with probability > 1 − δ. For instance, repeating Morris 10/ε² times
and reporting the mean m̂, we get Pr[|m̂ − m| > εm] ≤ 1/20 because the variance is
reduced by a factor of 10/ε².
Since we are randomly hashing elements into the range [0, 1], we expect
the minimum hash output to be 1/(D + 1), so E[1/z − 1] = D. Unfortunately,
storing a uniformly random hash function that maps to the interval [0, 1] is
infeasible. As storing real numbers is memory intensive, one possible fix is to
discretize the interval [0, 1], using O(log n) bits per hash output. However,
storing this hash function would still require O(n log n) space.
1
See https://en.wikipedia.org/wiki/Order_statistic
and X_r = Σ_{i=1}^{m} X_{i,r} = |{a_i ∈ S : zeros(h(a_i)) ≥ r}|. Notice that X_n ≤ X_{n−1} ≤ · · · ≤ X_1
since zeros(h(a_i)) ≥ r + 1 ⇒ zeros(h(a_i)) ≥ r. Now,

    E[X_r] = E[Σ_{i=1}^{m} X_{i,r}]           Since X_r = Σ_{i=1}^{m} X_{i,r}
           = Σ_{i=1}^{m} E[X_{i,r}]           By linearity of expectation
           = Σ_{i=1}^{m} Pr[X_{i,r} = 1]      Since X_{i,r} are indicator variables
           = Σ_{i=1}^{m} 1/2^r                h is a uniform hash
           = D/2^r                            Since h hashes the same elements to the same value
Denote τ_1 as the smallest integer such that 2^{τ_1} · √2 > 3D, and τ_2 as the
largest integer such that 2^{τ_2} · √2 < D/3. We see that if τ_2 < Z < τ_1, then
2^Z · √2 is a 3-approximation of D.

• If Z ≥ τ_1, then 2^Z · √2 ≥ 2^{τ_1} · √2 > 3D
• If Z ≤ τ_2, then 2^Z · √2 ≤ 2^{τ_2} · √2 < D/3
    Pr[(D/3 > 2^Z · √2) or (2^Z · √2 > 3D)]
    ≤ Pr[D/3 ≥ 2^Z · √2] + Pr[2^Z · √2 ≥ 3D]      By union bound
    ≤ 2√2/3                                        From above
    = 1 − C                                        For C = 1 − 2√2/3 > 0
probability C > 0.5 (for instance, after t ≥ 17 repetitions), one can then call
the routine k times independently and return the median (Recall Trick 2).
While Tricks 1 and 2 allow us to strengthen the success probability C, more
work needs to be done to improve the approximation factor from 3 to (1 + ε).
To do this, we look at a slight modification of FM, due to [BYJK+ 02].
If tN/Z > (1 + ε)D, then tN/((1 + ε)D) > Z = the t-th smallest hash value, implying
that there are ≥ t hashes smaller than tN/((1 + ε)D). Since the hash uniformly
distributes [n] over [N], for each element a_i,

    Pr[h(a_i) ≤ tN/((1 + ε)D)] = (tN/((1 + ε)D)) / N = t/((1 + ε)D)

Let X_i be the indicator that the i-th distinct element hashes below tN/((1 + ε)D), and let
X = Σ_{i=1}^{D} X_i; then E[X] = t/(1 + ε) and, by Lemma 10.7, Var(X) ≤ E[X]. Now,

    Pr[tN/Z > (1 + ε)D] ≤ Pr[X ≥ t]                Since the former implies the latter
    = Pr[X − E[X] ≥ t − E[X]]                      Subtracting E[X] from both sides
    ≤ Pr[X − E[X] ≥ (ε/2)·t]                       Since E[X] = t/(1 + ε) ≤ (1 − ε/2)t
    ≤ Pr[|X − E[X]| ≥ (ε/2)·t]                     Adding absolute sign
    ≤ Var(X) / (εt/2)²                             By Chebyshev's inequality
    ≤ E[X] / (εt/2)²                               Since Var(X) ≤ E[X]
    ≤ 4(1 − ε/2)t / (ε² t²)                         Since E[X] = t/(1 + ε) ≤ (1 − ε/2)t
    ≤ 4/c                                          Simplifying with t = c/ε² and (1 − ε/2) < 1
Similarly, if tN/Z < (1 − ε)D, then tN/((1 − ε)D) < Z = the t-th smallest hash value,
implying that there are < t hashes smaller than tN/((1 − ε)D). Since the hash
uniformly distributes [n] over [N], for each element a_i,

    Pr[h(a_i) ≤ tN/((1 − ε)D)] = (tN/((1 − ε)D)) / N = t/((1 − ε)D)

and Y = Σ_{i=1}^{D} Y_i is the number of hashes that are smaller than tN/((1 − ε)D). From
above, Pr[Y_i = 1] = t/((1 − ε)D). By linearity of expectation, E[Y] = t/(1 − ε). Then,
by Lemma 10.7, Var(Y) ≤ E[Y]. Now,
    Pr[tN/Z < (1 − ε)D]
    ≤ Pr[Y ≤ t]                                   Since the former implies the latter
    = Pr[Y − E[Y] ≤ t − E[Y]]                     Subtracting E[Y] from both sides
    ≤ Pr[Y − E[Y] ≤ −εt]                          Since E[Y] = t/(1 − ε) ≥ (1 + ε)t
    ≤ Pr[−(Y − E[Y]) ≥ εt]                        Swap sides
    ≤ Pr[|Y − E[Y]| ≥ εt]                         Adding absolute sign
    ≤ Var(Y) / (εt)²                              By Chebyshev's inequality
    ≤ E[Y] / (εt)²                                Since Var(Y) ≤ E[Y]
    ≤ (1 + 2ε)t / (ε² t²)                          Since E[Y] = t/(1 − ε) ≤ (1 + 2ε)t
    ≤ 3/c                                         Simplifying with t = c/ε² and (1 + 2ε) < 3

Putting together,

    Pr[|tN/Z − D| > εD] ≤ Pr[tN/Z > (1 + ε)D] + Pr[tN/Z < (1 − ε)D]      By union bound
                         ≤ 4/c + 3/c                                      From above
                         = 7/c                                            Simplifying
                         ≤ 1/4                                            For c ≥ 28
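The modified estimator is easy to prototype. In the Python sketch below (illustrative; assumes integer stream elements), the constant c = 28, the hash range N, and the pairwise-independent hash (a random linear map modulo a prime) are choices consistent with the analysis above, not the notes' exact parameters.

import random

def distinct_estimate(stream, eps, N=2**31 - 1, seed=0):
    t = max(1, int(28 / eps ** 2))            # t = c/eps^2 with c = 28
    rng = random.Random(seed)
    p = 2305843009213693951                   # a prime larger than N
    a, b = rng.randrange(1, p), rng.randrange(p)
    smallest = set()                          # the t smallest distinct hash values so far
    for x in stream:
        h = (a * x + b) % p % N + 1
        smallest.add(h)
        if len(smallest) > t:
            smallest.remove(max(smallest))
    if len(smallest) < t:
        return len(smallest)                  # fewer than t distinct elements: exact count
    Z = max(smallest)                         # the t-th smallest hash value
    return t * N / Z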
10.3.1  k = 2

For each element i ∈ [n], we associate a random variable r_i ∈_{u.a.r.} {−1, +1}.

Lemma 10.10. In AMS-2, if the random variables {r_i}_{i∈[n]} are pairwise in-
dependent, then E[Z²] = Σ_{i=1}^{n} f_i² = F_2. That is, AMS-2 is an unbiased
estimator for the second moment.
Proof.

    E[Z²] = E[(Σ_{i=1}^{n} r_i f_i)²]                                        Since Z = Σ_{i=1}^{n} r_i f_i at the end
          = E[Σ_{i=1}^{n} r_i² f_i² + 2 Σ_{1≤i<j≤n} r_i r_j f_i f_j]          Expanding (Σ_{i=1}^{n} r_i f_i)²
          = Σ_{i=1}^{n} E[r_i² f_i²] + 2 Σ_{1≤i<j≤n} E[r_i r_j f_i f_j]       Linearity of expectation
          = Σ_{i=1}^{n} E[r_i²] f_i² + 2 Σ_{1≤i<j≤n} E[r_i r_j] f_i f_j       f_i's are (unknown) constants
          = Σ_{i=1}^{n} f_i² + 2 Σ_{1≤i<j≤n} E[r_i r_j] f_i f_j               Since r_i² = 1, ∀i ∈ [n]
          = Σ_{i=1}^{n} f_i² + 2 Σ_{1≤i<j≤n} E[r_i] E[r_j] f_i f_j            Since {r_i}_{i∈[n]} are pairwise independent
          = Σ_{i=1}^{n} f_i²                                                  Since E[r_i] = 0, ∀i ∈ [n]
          = F_2                                                               Since F_2 = Σ_{i=1}^{n} f_i²
So we have an unbiased estimator for the second moment but we are also
interested in the probability of error. We want a small probability for the
output Z² to deviate by more than an ε-fraction from the true value, i.e.,
Pr[|Z² − F_2| > εF_2] should be small.
Lemma 10.11. In AMS-2, if the random variables {r_i}_{i∈[n]} are 4-wise inde-
pendent², then Var[Z²] ≤ 2(E[Z²])².

Proof. As before, E[r_i] = 0 and r_i² = 1 for all i ∈ [n]. By 4-wise independence,
the expectation of any product of at most 4 different r_i's is the product of
their expectations. Thus we get E[r_i r_j r_k r_l] = E[r_i] E[r_j] E[r_k] E[r_l] = 0, as well
as E[r_i³ r_j] = E[r_i r_j] = 0 and E[r_i² r_j r_k] = E[r_j r_k] = 0, where the indices
i, j, k, l are pairwise different. This allows us to compute E[Z⁴]:

    E[Z⁴] = E[(Σ_{i=1}^{n} r_i f_i)⁴]                                            Since Z = Σ_{i=1}^{n} r_i f_i at the end
          = Σ_{i=1}^{n} E[r_i⁴] f_i⁴ + 6 Σ_{1≤i<j≤n} E[r_i² r_j²] f_i² f_j²       L.o.E. and 4-wise independence
          = Σ_{i=1}^{n} f_i⁴ + 6 Σ_{1≤i<j≤n} f_i² f_j²                            Since r_i⁴ = r_i² = 1, ∀i ∈ [n]

Note that the coefficient of Σ_{1≤i<j≤n} E[r_i² r_j²] f_i² f_j² is (4 choose 2) = 6 and that all
other terms vanish by the computation above.
Proof.
We can again apply the mean trick to decrease the variance by a factor
of k and have a smaller upper bound on the probability of error.
In particular, if we pick k = 10/ε² repetitions of AMS-2 and output the mean
value of the output Z², we have:

    Pr[error] ≤ (Var[Z²]/k) / (ε² (E[Z²])²) ≤ (1/k) · (2/ε²) = 1/5
Claim 10.13. O(k log n) bits of randomness suffice to obtain a set of k-wise
independent random variables.

    {h_{a_{k−1}, a_{k−2}, . . . , a_1, a_0} : h(x) = Σ_{i=0}^{k−1} a_i x^i mod p
                                                    = a_{k−1} x^{k−1} + a_{k−2} x^{k−2} + · · · + a_1 x + a_0 mod p,
                                              ∀ a_{k−1}, a_{k−2}, . . . , a_1, a_0 ∈ Z_p}

This requires k random coefficients, which can be stored with O(k log n)
bits.
3
See https://en.wikipedia.org/wiki/K-independent_hashing
Observe that the above analysis only requires {r_i}_{i∈[n]} to be 4-wise in-
dependent. Claim 10.13 implies that AMS-2 only needs O(4 log n) bits to
represent {r_i}_{i∈[n]}.
Although the failure probability 2/ε² is large for small ε, one can repeat t
times and output the mean (Recall Trick 1). With t ∈ O(1/ε²) samples, the
failure probability drops to 2/(tε²) ∈ O(1). When the failure probability is less
than 1/2, one can then call the routine k times independently, and return the
median (Recall Trick 2). On the whole, for any given ε > 0 and δ > 0,
O(log(n) log(1/δ)/ε²) space suffices to yield a (1 ± ε)-approximation algorithm that
succeeds with probability > 1 − δ.
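The following Python sketch of AMS-2 averages several independent copies of Z². For simplicity it draws fully independent signs r_i and stores them explicitly, whereas the analysis above only needs 4-wise independence (which a degree-3 polynomial hash over O(log n) bits would give); treat the constants as illustrative.

import random

def ams_f2(stream, repetitions=10, seed=0):
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        signs = {}                        # r_i in {-1,+1}, drawn lazily per element
        z = 0
        for x in stream:
            if x not in signs:
                signs[x] = rng.choice((-1, 1))
            z += signs[x]                 # Z = sum_i r_i f_i at the end of the stream
        estimates.append(z * z)
    return sum(estimates) / repetitions   # mean of independent Z^2 values (Trick 1)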
10.3.2 General k
E[Z | aJ = i]
1 1 1
= [m(fik − (fi − 1)k )] + [m((fi − 1)k − (fi − 2)k )] + · · · + [m(1k − 0k )]
fi fi fi
m k
= [(fi − (fi − 1)k ) + ((fi − 1)k − (fi − 2)k ) + · · · + (1k − 0k )]
fi
m k
= fi .
fi
Thus,

    E[Z] = Σ_{i=1}^{n} E[Z | a_J = i] · Pr[a_J = i]       Condition on the choice of J
         = Σ_{i=1}^{n} E[Z | a_J = i] · (f_i/m)            Since the choice of J is uniform at random
         = Σ_{i=1}^{n} (m/f_i)·f_i^k · (f_i/m)             From above
         = Σ_{i=1}^{n} f_i^k                               Simplifying
         = F_k                                             Since F_k = Σ_{i=1}^{n} f_i^k.
Proof. Let M = max_{i∈[n]} f_i; then f_i ≤ M for any i ∈ [n] and M^k ≤ Σ_{i=1}^{n} f_i^k.
Hence,
    (Σ_{i=1}^{n} f_i)(Σ_{i=1}^{n} f_i^{2k−1}) ≤ (Σ_{i=1}^{n} f_i)(M^{k−1} Σ_{i=1}^{n} f_i^k)        Since f_i^{2k−1} ≤ M^{k−1} f_i^k
    ≤ (Σ_{i=1}^{n} f_i)(Σ_{i=1}^{n} f_i^k)^{(k−1)/k} (Σ_{i=1}^{n} f_i^k)                              Since M^k ≤ Σ_{i=1}^{n} f_i^k
    = (Σ_{i=1}^{n} f_i)(Σ_{i=1}^{n} f_i^k)^{(2k−1)/k}                                                 Merging the last two terms
    ≤ n^{1−1/k} (Σ_{i=1}^{n} f_i^k)^{1/k} (Σ_{i=1}^{n} f_i^k)^{(2k−1)/k}                              Fact: (Σ_{i=1}^{n} f_i)/n ≤ (Σ_{i=1}^{n} f_i^k / n)^{1/k}
    = n^{1−1/k} (Σ_{i=1}^{n} f_i^k)²                                                                   Merging the last two terms.
Remark  f_1 = n^{1/k}, f_2 = · · · = f_n = 1 is a tight example for Lemma 10.15,
up to a constant factor.

Theorem 10.16. In AMS-k, Var(Z) ≤ k n^{1−1/k} (E[Z])².
    E[Z²] = (m²/m)·[ (1^k − 0^k)² + (2^k − 1^k)² + · · · + (f_1^k − (f_1 − 1)^k)²                        (1)
                    + (1^k − 0^k)² + (2^k − 1^k)² + · · · + (f_2^k − (f_2 − 1)^k)²
                    + . . .
                    + (1^k − 0^k)² + (2^k − 1^k)² + · · · + (f_n^k − (f_n − 1)^k)² ]
          ≤ m·[ k·1^{k−1}(1^k − 0^k) + k·2^{k−1}(2^k − 1^k) + · · · + k·f_1^{k−1}(f_1^k − (f_1 − 1)^k)   (2)
               + k·1^{k−1}(1^k − 0^k) + k·2^{k−1}(2^k − 1^k) + · · · + k·f_2^{k−1}(f_2^k − (f_2 − 1)^k)
               + . . .
               + k·1^{k−1}(1^k − 0^k) + k·2^{k−1}(2^k − 1^k) + · · · + k·f_n^{k−1}(f_n^k − (f_n − 1)^k) ]
          ≤ m·[ k f_1^{2k−1} + k f_2^{2k−1} + · · · + k f_n^{2k−1} ]                                      (3)
          = k m F_{2k−1}                                                                                   (4)
          = k F_1 F_{2k−1}                                                                                 (5)

(5) F_1 = Σ_{i=1}^{n} f_i = m

Then, Var(Z) ≤ E[Z²] ≤ k F_1 F_{2k−1} ≤ k n^{1−1/k} F_k² = k n^{1−1/k} (E[Z])², where the last
inequality uses Lemma 10.15.
Remark Proofs for Lemma 10.15 and Theorem 10.16 were omitted in class.
The above proofs are presented in a style consistent with the rest of the scribe
notes. Interested readers can refer to [AMS96] for details.
Remark One can apply an analysis similar to the case when k = 2, then
use Tricks 1 and 2.
Claim 10.17. For k > 2, a lower bound of Θ̃(n^{1−2/k}) is known.

Proof. Theorem 3.1 in [BYJKS04] gives the lower bound. See [IW05] for an
algorithm that achieves it.
Chapter 11
Graph sketching
Thus XORu represents the bit-wise XOR of the identifiers of all edges
that are adjacent to u.
• All vertices send the coordinator their value XORu and the coordinator
computes
XORA = ⊕{XORu : u ∈ A}
3
In reality, the algorithm simulates all the vertices’ actions so it is not a real multi-party
computation setup.
S = {h{v1 , v2 }, +i, h{v2 , v3 }, +i, h{v1 , v3 }, +i, h{v4 , v5 }, +i, h{v2 , v5 }, +i, h{v1 , v2 }, −i}
and we query for the cut edge {v2 , v5 } with A = {v1 , v2 , v3 } at t = |S|. The
figure below shows the graph G6 when t = 6:
[Figure: the graph G_6 on vertices v_1, . . . , v_5 at time t = 6.]
Vertex v_1 sees ⟨{v_1, v_2}, +⟩, ⟨{v_1, v_3}, +⟩, and ⟨{v_1, v_2}, −⟩. So,
XOR_{v_1} = id({v_1, v_2}) ⊕ id({v_1, v_3}) ⊕ id({v_1, v_2}) = id({v_1, v_3}).
Proof. For any edge e = (a, b) such that a, b ∈ A, id(e) contributes to both
XORa and XORb . So, XORa ⊕ XORb will cancel out the contribution
of id(e) because id(e) ⊕ id(e) = 0. Hence, the only remaining value in
XORA = ⊕{XORu : u ∈ A} will be the identifier of the cut edge since only
one of its endpoints lies in A.
Remark Bit tricks are often used in the random linear network coding
literature (e.g. [HMK+ 06]).
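As a concrete illustration of the trick, the Python sketch below computes XOR_u for every vertex from an insertion/deletion stream and then XORs over A; the edge-identifier encoding (packing both endpoints into one integer, assuming integer vertex ids below 2^20) is an illustrative choice.

def xor_cut_edge(stream, A):
    # `stream` is a list of ((u, v), sign) pairs; sign is '+' or '-'.
    def edge_id(u, v):
        u, v = min(u, v), max(u, v)
        return (u << 20) | v                 # assumes vertex ids fit in 20 bits

    xors = {}                                # XOR_u for every vertex u
    for (u, v), _ in stream:                 # a deletion XORs the same id again,
        eid = edge_id(u, v)                  # cancelling the earlier insertion
        xors[u] = xors.get(u, 0) ^ eid
        xors[v] = xors.get(v, 0) ^ eid

    xor_A = 0
    for u in A:                              # ids of edges inside A cancel in pairs
        xor_A ^= xors.get(u, 0)
    return xor_A                             # id of the unique cut edge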
using O(log² n) bits of memory, with high probability?
    Pr[|E'| = 1]
    = k · Pr[Cut edge {u, v} is marked; others are not]
    = k · (1/k̂)(1 − 1/k̂)^{k−1}               Edges marked independently w.p. 1/k̂
    ≥ (k̂/2)(1/k̂)(1 − 1/k̂)^{k̂}               Since k̂/2 ≤ k ≤ k̂
    ≥ (1/2) · 4^{−1}                           Since 1 − x ≥ 4^{−x} for x ≤ 1/2
    ≥ 1/10
Remark The above analysis assumes that vertices can locally mark the
edges in a consistent manner (i.e. both endpoints of any edge make the same
decision whether to mark the edge or not). This can be achieved with a
sufficiently large string of randomness, shared across all the nodes.
Therefore, if we sample two or more edges from the cut, with high proba-
bility the XOR of their identifiers will be distinguishable from the identifier
of another edge, allowing us to determine whether |E 0 | = 1.
The probability of sampling a single edge across the cut is Pr[|E'| = 1] ≥ 1/10.
To amplify it we can perform t = C log(n) = O(log n) parallel repetitions,
each time sampling the edges independently and computing for every node
the XOR of the sampled edges. We succeed to find an edge across the cut
if at least one repetition succeeds, which happens with probability at least
1 − (9/10)^{C log n} ≥ 1 − 1/n^{10}, by setting the constant term C appropriately.
Overall, the size of the message that each node sends to the coordinator
is O(log2 n) bits.
probability at least 1 − 1/n^{10}.
For the values of k̂ that are much bigger than k, the sampling probability
is such that in expectation no edges across the cut are sampled. Conversely,
if k̂ is much smaller than k, more than one edge across the cut is expected to
be sampled, but with high probability the XOR of their identifiers will not
be a valid edge ID.
In total, there are ⌈log n²⌉ + 1 = O(log n) powers of 2 which are used
as an estimate for k, so overall each node sends an O(log³ n)-bit message
to the coordinator: for each estimate k̂, we perform O(log n) independent
samplings of edges (to amplify the success probability) and every time each node
computes the XOR of the identifiers of the incident sampled edges, which has
size O(log n).
Since here we do not care about edge weights, the step of finding the
cheapest edge leaving each component amounts to finding one out of many
cut edges, which we solved in section 11.3.
Therefore, each node sends to the coordinator a message of size O(log⁴ n),
obtained by repeating log n times independently the message construction
outlined in section 11.3. The entire procedure is presented in ComputeSketch.
Algorithm 26 ComputeSketch(v ∈ V )
for h = 1 to log n do . Iterations of Borůvka
for i ∈ {0, 1, . . . , dlog n2 e} do . dlog n2 e + 1 guesses of k
for t = C log n times do . Amplify success probability
Sample each edge w.p. p = 1/2^i
Send XOR of sampled edges incident to v
end for
end for
end for
When the coordinator receives all the messages from the nodes, it is able
to compute a maximal forest by simulating the steps of Borůvka’s algorithm:
• initially, every node is a single component of the graph;
• for each step h ∈ {1, . . . , log n}:
– use part h of the messages (of size O(log3 n)) to find, for every
component, a single edge connecting it to another component (if
there is one) with high probability;
– merge the newly formed components and exclude any edges form-
ing a cycle;
This scheme can also be implemented in the streaming setting, by main-
taining a message of size O(log⁴ n) for every node as described above, and
updating them after every edge insertion or deletion. ComputeSketches
and StreamingMaximalForest outline this procedure.
More precisely, every vertex in ComputeSketches maintains O(log3 n)
copies of edge XORs using random edge IDs and marking probabilities. In
order to have consistent edge IDs and marking probabilities among vertices,
we use a source of shared randomness R. Then, StreamingMaximalFor-
est simulates Borůvka using the output of ComputeSketches. In total,
this requires O(n log4 n) memory to compute a maximal forest of the graph.
At each step, we fail to find one cut edge leaving a connected component
with probability ≤ (1 − 1/10)^t, which can be made to be in O(1/n^{10}). Applying a
union bound over all O(log³ n) computations of XOR_A, we see that

    Pr[Any XOR_A corresponds wrongly to some edge ID] ≤ O(log³ n / n^{18}) ⊆ O(1/n^{10})
So, StreamingMaximalForest succeeds with high probability.
Remark One can drop the memory requirement per vertex from O(log⁴ n)
to O(log³ n) by using a constant t instead of t ∈ O(log n), such that the
success probability is a constant larger than 1/2. Then, simulate Borůvka
for ⌈2 log n⌉ steps. See [AGM12] (note that they use a slightly different
sketch).

Theorem 11.5. Any randomized distributed sketching protocol for computing
a spanning forest with success probability ε > 0 must have expected average
sketch size Ω(log³ n), for any constant ε > 0.

Proof. See [NY18].
5
See https://en.wikipedia.org/wiki/Small-bias_sample_space
Part IV
Graph sparsification
Chapter 12
Preserving distances
Remark The first inequality is because G′ has fewer edges than G. The
second inequality upper bounds how much the distances “blow up” in the
sparser graph G′.
Remark One way to prove the existence of an (α, β)-spanner is to use the
probabilistic method: instead of giving an explicit construction, one designs
a random process and argues that the probability that the spanner exists
is strictly larger than 0. However, this may be somewhat unsatisfying as such
proofs do not usually yield a usable construction. On the other hand, the
randomized constructions shown later are explicit and will yield a spanner
with high probability¹.
• g(G″) ≥ g(G′) ≥ 2k + 1, since girth does not decrease with fewer edges.

    n ≥ |V″|                                                             By construction
      ≥ |{v}| + Σ_{i=1}^{k} |{u ∈ V″ : d_{G″}(u, v) = i}|                Look only at the k-hop neighbourhood of v
      ≥ 1 + Σ_{i=1}^{k} (n^{1/k} + 1)(n^{1/k})^{i−1}                      Vertices are distinct and have degree ≥ n^{1/k} + 1
      = 1 + (n^{1/k} + 1) · ((n^{1/k})^k − 1)/(n^{1/k} − 1)               Sum of geometric series
      > 1 + (n − 1)                                                       Since (n^{1/k} + 1) > (n^{1/k} − 1)
      = n
Let us consider the family of graphs G on n vertices with girth > 2k. It can
be shown by contradiction that a graph G with n vertices with girth > 2k can-
not have a proper (2k − 1)-spanner2 : Assume G0 is a proper (2k − 1)-spanner
with edge {u, v} removed. Since G0 is a (2k − 1)-spanner, dG0 (u, v) ≤ 2k − 1.
Adding {u, v} to G0 will form a cycle of length at most 2k, contradicting the
assumption that G has girth > 2k.
Let g(n, k) be the maximum possible number of edges in a graph from G.
By the above argument, a graph on n vertices with g(n, k) edges cannot have
a proper (2k−1)-spanner. Note that the greedy construction of Theorem 12.2
will always produce a (2k − 1)-spanner with ≤ g(n, k) edges. The size of the
spanner is asymptotically tight if Conjecture 12.3 holds.
1. E[|S|] = np

1.
    E[|S|] = E[Σ_{v∈V} X_v]           By construction of S
           = Σ_{v∈V} E[X_v]           Linearity of expectation
           = Σ_{v∈V} p                Since E[X_v] = Pr[X_v = 1] = p
           = np                       Since |V| = n

2.
    E[|N(v) ∩ S|] = E[Σ_{u∈N(v)} X_u]     By definition of N(v) ∩ S
                  = Σ_{u∈N(v)} E[X_u]     Linearity of expectation
                  = Σ_{u∈N(v)} p          Since E[X_u] = Pr[X_u = 1] = p
                  = p · |N(v)|
2. Initialize E20 = ∅.
Stretch factor Consider two arbitrary vertices u and v with the shortest
path Pu,v in G. Let h be the number of heavy vertices in Pu,v . We split the
analysis into two cases: (i) h ≤ 1; (ii) h ≥ 2. Recall that a heavy vertex has
degree at least n1/2 .
Case (i) All edges in Pu,v are adjacent to a light vertex and are thus in E10 .
Hence, dG0 (u, v) = dG (u, v), with additive stretch 0.
Case (ii)
Claim 12.6. Suppose there exists a vertex w ∈ Pu,v such that (w, s) ∈
E for some s ∈ S, then dG0 (u, v) ≤ dG (u, v) + 2.
[Figure: shortest path P_{u,v} with a heavy vertex w adjacent to some s ∈ S.]
Proof.
Let w be a heavy vertex in P_{u,v} with degree d(w) > n^{1/2}. By Claim
12.4 with p = 10 n^{−1/2} log n, Pr[|N(w) ∩ S| = 0] ≤ e^{−10 log n} = n^{−10}.
Taking a union bound over all possible pairs of vertices u and v,

    Pr[∃u, v ∈ V : P_{u,v} has h ≥ 2 and no neighbour in S] ≤ (n choose 2) · n^{−10} ≤ n^{−8}

Then, Claim 12.6 tells us that the additive stretch factor is at most 2
with probability ≥ 1 − 1/n⁸.
Therefore, with high probability (≥ 1 − 1/n⁸), the construction yields a 2-
additive spanner.
Remark A way to remove log factors from Theorem 12.5 is to sample only
n1/2 nodes into S, and then add all edges incident to nodes that don’t have an
adjacent node in S. The same argument then shows that this costs O(n3/2 )
edges in expectation.
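Putting the pieces together, a possible implementation of the 2-additive spanner construction of Theorem 12.5 looks as follows (Python, assuming integer vertex labels). The excerpt above does not spell out how E_2′ is built from the sampled set S; the sketch uses the standard choice of adding a BFS tree rooted at every sampled vertex, so treat that step and the constants as assumptions.

import math
import random
from collections import deque

def two_additive_spanner(adj, seed=0):
    # adj: dict mapping each vertex to a set of neighbours (undirected graph).
    rng = random.Random(seed)
    n = len(adj)
    threshold = math.sqrt(n)                        # heavy iff degree >= sqrt(n)
    p = min(1.0, 10 * math.log(n + 1) / threshold)  # ~ 10 n^{-1/2} log n

    spanner = set()
    for u in adj:                                   # E1': every edge touching a light vertex
        for v in adj[u]:
            if len(adj[u]) < threshold or len(adj[v]) < threshold:
                spanner.add((min(u, v), max(u, v)))

    S = [u for u in adj if rng.random() < p]        # random sample S
    for s in S:                                     # E2': a BFS tree rooted at each s in S
        parent, queue = {s: None}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    spanner.add((min(u, v), max(u, v)))
                    queue.append(v)
    return spanner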
Theorem 12.7. [Che13] Every graph G on n vertices has a 4-additive span-
ner with Õ(n^{7/5}) edges.
Proof.
Construction Partition the vertex set V into light vertices L and heavy vertices
H, where H = {v ∈ V : deg(v) ≥ n^{2/5}} and L = V \ H.

2. Initialize E′_2 = ∅.
3. Initialize E′_3 = ∅.
Number of edges

• Since there are ≤ n heavy vertices, ≤ n edges of the form (v, s′) for
  v ∈ H, s′ ∈ S′ will be added to E′_3. Then, for shortest s − s′ paths
  with ≤ n^{1/5} heavy internal vertices, only edges adjacent to the heavy
  vertices need to be counted because those adjacent to light vertices are
  already accounted for in E′_1. By Claim 12.4 with p = 10 n^{−2/5} log n,
  E[|S′|] = n · 10 n^{−2/5} log n = 10 n^{3/5} log n. As |S′| is highly concentrated
  around its expectation, we have E[|S′|²] ∈ Õ(n^{6/5}). So, E′_3 contributes
  ≤ n + (|S′| choose 2) · n^{1/5} ∈ Õ(n^{7/5}) edges³ to the count of |E′|.
3
Though we may have repeated edges
Stretch factor Consider two arbitrary vertices u and v with the shortest
path Pu,v in G. Let h be the number of heavy vertices in Pu,v . We split the
analysis into three cases: (i) h ≤ 1; (ii) 2 ≤ h ≤ n1/5 ; (iii) h > n1/5 . Recall
that a heavy vertex has degree at least n2/5 .
Case (i) All edges in Pu,v are adjacent to a light vertex and are thus in E10 .
Hence, dG0 (u, v) = dG (u, v), with additive stretch 0.
Case (ii) Denote the first and last heavy vertices in P_{u,v} as w and w′ re-
spectively. Recall that in Case (ii), including w and w′, there are
at most n^{1/5} heavy vertices between w and w′. By Claim 12.4, with
p = 10 n^{−2/5} log n,

    Pr[|N(w) ∩ S′| = 0], Pr[|N(w′) ∩ S′| = 0] ≤ e^{−n^{2/5} · 10 n^{−2/5} log n} = n^{−10}
• By definition of P*_{s,s′}, we have l* ≤ d_G(s, w) + d_G(w, w′) + d_G(w′, s′) =
  d_G(w, w′) + 2.
• Since there are no internal heavy vertices between u − w and
w0 − v, Case (i) tells us that dG0 (u, w) = dG (u, w) and dG0 (w0 , v) =
dG (w0 , v).
Thus,
    d_{G′}(u, v)
    ≤ d_{G′}(u, w) + d_{G′}(w, w′) + d_{G′}(w′, v)                                       (1)
    ≤ d_{G′}(u, w) + d_{G′}(w, s) + d_{G′}(s, s′) + d_{G′}(s′, w′) + d_{G′}(w′, v)       (2)
    ≤ d_{G′}(u, w) + d_{G′}(w, s) + l* + d_{G′}(s′, w′) + d_{G′}(w′, v)                  (3)
    ≤ d_{G′}(u, w) + d_{G′}(w, s) + d_G(w, w′) + 2 + d_{G′}(s′, w′) + d_{G′}(w′, v)      (4)
    = d_{G′}(u, w) + 1 + d_G(w, w′) + 2 + 1 + d_{G′}(w′, v)                              (5)
    = d_G(u, w) + 1 + d_G(w, w′) + 2 + 1 + d_G(w′, v)                                    (6)
    ≤ d_G(u, v) + 4                                                                       (7)
[Figure: vertices s, s′ ∈ S′ joined by a path P*_{s,s′} of length l*.]
Case (iii)
Claim 12.8. There cannot be a vertex y that is a common neighbour
to more than 3 heavy vertices in Pu,v .
Claim 12.8 tells us that |⋃_{w ∈ H∩P_{u,v}} N(w)| ≥ Σ_{w ∈ H∩P_{u,v}} |N(w)| · (1/3). Let
N_{u,v} = ⋃_{w ∈ H∩P_{u,v}} N(w).
Applying Claim 12.4 with p = 30 · n^{−3/5} · log n and Claim 12.8, we get

    Pr[|N_{u,v} ∩ S| = 0] ≤ e^{−p·|N_{u,v}|} ≤ e^{−p · (1/3) · |H∩P_{u,v}| · n^{2/5}} = e^{−10 log n} = n^{−10}.
Then, Claim 12.6 tells us that the additive stretch factor is at most 4
with probability ≥ 1 − 1/n⁸.

Therefore, with high probability (≥ 1 − 1/n⁸), the construction yields a 4-
additive spanner.
Concluding remarks
Preserving cuts
Definition 13.1 (Cut and minimum cut). Consider a graph G = (V, E).
for every S ⊆ V, S 6= ∅, S 6= V .
1
This can also be generalized to weighted graphs.
2
For now, sparse means an almost linear number of edges in n; we will make this concrete
soon.
1. Let p = Ω(log n / (ε² n)).
2. Sample each edge e ∈ E independently with probability p; let E′ be the set of sampled edges.
3. Define w(e) = 1/p for each edge e ∈ E′.
One can check that this suffices for G = Kn . For that, fix an arbitrary cut
(any cut size is ≥ n − 1), and analyze the probability of the above condition
on the size of the cut being badly estimated in H. Then, take a union bound
over all cuts. In the exercise, we discuss an even more general form of this
warm up, where G is a graph with a constant edge expansion.
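The warm-up construction itself is only a few lines of Python; the constant c in the sampling probability below is an illustrative placeholder for the Ω(·) above.

import math
import random

def uniform_cut_sparsifier(edges, n, eps, c=3, seed=0):
    # Sample every edge with the same probability p and reweight by 1/p.
    rng = random.Random(seed)
    p = min(1.0, c * math.log(n) / (eps ** 2 * n))
    return [(e, 1.0 / p) for e in edges if rng.random() < p]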
The rest of this section is devoted to proving a similar result for general
graphs. For that, we first need to review cut counting results that you might
have seen in previous courses (e.g., Algorithms, Probability, and Computing).
Theorem 13.2. For a fixed minimum cut S* in the graph, RandomCon-
traction returns it with probability ≥ 1/(n choose 2).
gives a cut with size smaller than k), so there are at least (n − i)k/2 edges
in the graph. Thus,
    Pr[Success] ≥ (1 − k/(nk/2)) · (1 − k/((n−1)k/2)) · · · (1 − k/(3k/2))
                = (1 − 2/n) · (1 − 2/(n−1)) · · · (1 − 2/3)
                = (n−2)/n · (n−3)/(n−1) · · · 1/3
                = 2/(n(n−1))
                = 1/(n choose 2)
Corollary 13.3. There are at most (n choose 2) minimum cuts in a graph.

    Pr[C_i is found by RandomContraction(G)] ≥ 1/(n choose 2).
Now, we observe that for each two distinct indices i, j ∈ [N ], the events “Ci is
found by RandomContraction(G)” and “Cj is found by RandomCon-
traction(G)” are disjoint. Therefore, it follows that
We can generalize the bound to the number of cuts that are
of size at most α · µ(G), for α ≥ 1.
Proof. The proof is analogous to that of Theorem 13.2, except that we now want
the probability of any fixed α-minimum cut being output. We continue contracting
until there are r = ⌈2α⌉ vertices remaining, and then pick one of the 2^{r−1} cuts of
the resulting graph uniformly at random. Following the calculations of Theorem 13.2,
a more careful lower bound on the success probability shows that the probability of
picking the fixed cut is at least 1/n^{2α}, which thus means the number of α-minimum
cuts is at most n^{2α}. For more details, see Lemma 2.2 and Appendix A (in
particular, Corollary A.7) of a version⁴ of [Kar99].
3. Define w(e) = 1/p for each edge e ∈ E′.
[Figure: a cut (S, V \ S) of size k′.]
Using Theorem 13.4 and union bound over all possible cuts in G,
Theorem 13.6 can be proved by using a variant of the earlier proof. In-
terested readers can see Theorem 2.1 of [Kar99].
Running the uniform edge sampling will not sparsify the above dumbbell
graph as µ(G) = 1 leads to large sampling probability p.
Definition 13.9 (edge strength). Given an edge e, its strength (or strong
connectivity) ke is the maximum k such that e is in a k-strong component.
We say an edge is k-strong if ke ≥ k.
4. Σ_{e∈E} 1/k_e ≤ n − 1.

   Intuition: If the graph is a tree, each k_e is equal to one. Because there
   are n − 1 edges in a tree, the sum is equal to n − 1. If there are a lot of
   edges (the graph is not a tree), then many of them have high strength
   and the sum therefore stays below n − 1.
Proof.
[Figure: nested k_1-strong and k_2-strong components.]
4. Consider a minimum cut C_G(S, V \ S). Since k_e ≥ µ(G) for all edges
   e ∈ C_G(S, V \ S), these edges contribute at most µ(G) · (1/k_e) ≤ µ(G) · (1/µ(G)) = 1
   to the summation. Remove these edges from G and repeat
   the argument on the remaining connected components (excluding isolated
   vertices). Since each cut removal contributes at most 1 to the
   summation and the process stops when we reach n components, we get
   Σ_{e∈E} 1/k_e ≤ n − 1.
For a graph G with minimum cut size µ(G) = k, consider the following
procedure to construct H:

1. Set q = c log n / ε² for some constant c.
2. Sample each edge e ∈ E independently with probability p_e = q/k_e; let E′ be the sampled edges.
3. Define w(e) = 1/p_e for each edge e ∈ E′.
    E[|E′|] = E[Σ_{e∈E} X_e]                    By definition
            = Σ_{e∈E} E[X_e]                    Linearity of expectation
            = Σ_{e∈E} p_e                       Since E[X_e] = Pr[X_e = 1] = p_e
            = Σ_{e∈E} q/k_e                     Since p_e = q/k_e
            ≤ q(n − 1)                          Since Σ_{e∈E} 1/k_e ≤ n − 1
            ∈ O(n log n / ε²)                   Since q = c log n / ε² for some constant c
• F_1 = G

    = c log n / ε²                              Since q = c log n / ε²

Since this holds for any cut in F_1, we can apply Theorem 13.6 to conclude
that, with high probability, all cuts in F_1 have size within (1 ± ε) of their
expectation. Note that the same holds after scaling the edge weights in
((k_1 − k_0)/q) · F_1 = (k_1/q) · F_1.
In a similar way, consider any other subgraph Fi ⊆ G as previously
defined. Since an Fi contains the edges from the ki -strong components of
Chapter 14
Warm up: Ski rental
We now study the class of online problems where one has to commit to
provably good decisions as data arrive in an online fashion. To measure the
effectiveness of online algorithms, we compare the quality of the produced
solution against the solution from an optimal offline algorithm that knows
the whole sequence of information a priori. The tool we will use for doing
such a comparison is competitive analysis.
Definition 14.2 (Ski rental problem). Suppose we wish to ski every day but
we do not have any skiing equipment initially. On each day, we can choose
between:

• renting the skiing equipment for that day, at a cost of 1, or
• buying the skiing equipment once and for all, at a cost of B.

In the toy setting where we may break our leg on each day (and cannot ski
thereafter), let d be the (unknown) total number of days we ski. What is the
best online strategy for renting/buying?
Claim 14.3. A = “Rent for B days, then buy on day B+1” is a 2-competitive
algorithm.
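Assuming the rent/buy costs of 1 per day and B as above, the claim can be checked mechanically; the small Python snippet below compares the break-even strategy against the offline optimum for every horizon d.

def break_even_cost(d, B):
    return d if d <= B else 2 * B          # rent for B days, then pay B to buy

def opt_cost(d, B):
    return min(d, B)                       # offline: rent throughout, or buy on day 1

# the online cost never exceeds twice the optimum
assert all(break_even_cost(d, 10) <= 2 * opt_cost(d, 10) for d in range(1, 200))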
Linear search
Free swap Move the queried paper from position i to the top of the stack
for 0 cost.
Paid swap For any consecutive pair of items (a, b) before i, swap their rel-
ative order to (b, a) for 1 cost.
What is the best online strategy for manipulating the stack to minimize total
cost on a sequence of queries?
Remark One can reason that the free swap costs 0 because we already
incurred a cost of i to reach the queried paper.
O(1) unless the hash table is almost full or almost empty, in which case we
double or halve the hash table of size m, incurring a runtime of O(m).
Worst case analysis tells us that dynamic resizing will incur O(m) run
time per operation. However, resizing only occurs after O(m) insertion/dele-
tion operations, each costing O(1). Amortized analysis allows us to conclude
that this dynamic resizing runs in amortized O(1) time. There are two equiv-
alent ways to see it:
• Split the O(m) resizing overhead and “charge” O(1) to each of the
earlier O(m) operations.
• The total run time for every sequential chunk of m operations is O(m).
Hence, each step takes O(m)/m = O(1) amortized run time.
15.2 Move-to-Front
Move-to-Front (MTF) [ST85] is an online algorithm for the linear search
problem where we move the queried item to the top of the stack (and do no
other swaps). We will show that MTF is a 2-competitive algorithm for linear
search. Before we analyze MTF, let us first define a potential function Φ and
look at examples to gain some intuition.
Let Φt be the number of pairs of papers (i, j) that are ordered differently
in MTF’s stack and OPT’s stack at time step t. By definition, Φt ≥ 0 for
any t. We also know that Φ0 = 0 since MTF and OPT operate on the same
initial stack sequence.
1 2 3 4 5 6
MTF’s stack a b c d e f
OPT’s stack a e b d c f
Now, we have the inversions (b, e), (c, d), (c, e) and (d, e), so Φ = 4.
Scenario 2 We swap (e, d) in OPT’s stack — The inversion (d, e) was de-
stroyed due to the swap.
1 2 3 4 5 6
MTF’s stack a b c d e f
OPT’s stack a b d e c f
In either case, we see that any paid swap results in ±1 inversions, which
changes Φ by ±1.
Proof. We will consider the potential function Φ as before and perform amor-
tized analysis on any given input sequence σ. Let at = cM T F (t) + (Φt − Φt−1 )
be the amortized cost of MTF at time step t, where cM T F (t) is the cost MTF
incurs at time t. Suppose the queried item x at time step t is at position k
in MTF’s stack. Denote:
[Figure: the stacks of MTF and OPT; the items in front of the queried item x in MTF's stack are partitioned into F ∪ B, with |F| = f and |B| = b.]
With at = cM T F (t) + (Φt − Φt−1 ) and using the fact that the sum over the
change in potential is telescoping, we get:
    Σ_{t=1}^{|σ|} a_t = Σ_{t=1}^{|σ|} c_MTF(t) + (Φ_t − Φ_{t−1})
                      = Σ_{t=1}^{|σ|} c_MTF(t) + (Φ_{|σ|} − Φ_0)

Since Φ_t ≥ 0 = Φ_0 and c_MTF(σ) = Σ_{t=1}^{|σ|} c_MTF(t):

    Σ_{t=1}^{|σ|} c_MTF(t) + (Φ_{|σ|} − Φ_0) ≥ Σ_{t=1}^{|σ|} c_MTF(t) = c_MTF(σ)
We have shown that cM T F (σ) ≤ 2 · cOP T (σ) which completes the proof.
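A direct simulation of MTF is handy for sanity-checking the analysis; here is a minimal Python version (illustrative), with the access cost of position i equal to i and a free move to the front.

def mtf_cost(initial_stack, queries):
    stack = list(initial_stack)
    total = 0
    for q in queries:
        i = stack.index(q)                 # 0-indexed position of the queried item
        total += i + 1                     # accessing position i+1 costs i+1
        stack.insert(0, stack.pop(i))      # free swap: move the item to the top
    return total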
Chapter 16
Paging
On the other hand, since OP T can see the entire sequence σ, OP T can
choose to evict the page i that is requested furthest in the future. The next
request for page i has to be at least k requests ahead in the future, since by
definition of i all other pages j 6= i ∈ {1, ..., k +1} have to be requested before
i. Thus, in every k steps, OPT has ≤ 1 cache miss. Therefore, c_OPT ≤ |σ|/k,
which implies: k · c_OPT ≤ |σ| = c_A(σ).
Proof. For any given input sequence σ, partition σ into m maximal phases
— P1 , P2 , . . . , Pm — where each phase has k distinct pages, and a new phase
is created only if the next element is different from the ones in the current
phase. Let xi be the first item that does not belong in Phase i.
Adaptive At each time step t, the adversary knows all randomness used
by algorithm A thus far. In particular, it knows the exact state of the
algorithm. With these in mind, it then picks the (t + 1)-th element in
the input sequence.
Fully adaptive The adversary knows all possible randomness that will be
used by the algorithm A when running on the full input sequence σ. For
– If p is not in cache,
∗ If all pages in cache are marked, unmark all
∗ Evict a random unmarked page
– Mark page p
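Spelled out in Python, the Random Marking Algorithm looks as follows; this sketch counts cache misses, assumes the cache starts empty, and assumes pages are comparable identifiers (e.g. integers).

import random

def rma_misses(requests, k, seed=0):
    rng = random.Random(seed)
    cache, marked = set(), set()
    misses = 0
    for p in requests:
        if p not in cache:
            misses += 1
            if len(cache) >= k:
                if marked >= cache:                       # every cached page is marked
                    marked.clear()                        # unmark all (new phase)
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)                      # evict a random unmarked page
            cache.add(p)
        marked.add(p)                                     # mark the requested page
    return misses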
that is when an old page is requested that has already been kicked out upon
the arrival of a new request.
Order the old requests in the order which they appear in the phase and
let xj be the j th old request, for j ∈ {1, . . . , oi }. Define lj as the number of
distinct new requests before xj .
For j ∈ {1, . . . , oi }, consider the first time the j th old request xj occurs.
Since the adversary is oblivious, xj is equally likely to be in any position in
the cache at the start of the phase. After seeing (j − 1) old requests and
marking their cache positions, there are k − (j − 1) initial positions in the
cache that xj could be in. Since we have only seen lj new requests and (j −1)
old requests, there are at least1 k − lj − (j − 1) old pages remaining in the
cache. So, the probability that x_j is in the cache when requested is at least
(k − l_j − (j−1)) / (k − (j−1)). Then,
    Cost due to old requests = Σ_{j=1}^{o_i} Pr[x_j is not in cache when requested]      Sum over old requests
                             ≤ Σ_{j=1}^{o_i} l_j / (k − (j−1))                           From above
                             ≤ Σ_{j=1}^{o_i} m_i / (k − (j−1))                           Since l_j ≤ m_i = |N|
                             ≤ m_i · Σ_{j=1}^{k} 1/(k − (j−1))                           Since o_i ≤ k
                             = m_i · Σ_{j=1}^{k} 1/j                                     Rewriting
                             = m_i · H_k                                                 Since Σ_{i=1}^{n} 1/i = H_n
Since every new request incurs a unit cost, the cost due to these requests
is mi .
Together for new and old requests, we get cRM A (Phase i) ≤ mi + mi · Hk .
We now analyze OPT’s performance. By definition of phases, among all
requests between two consecutive phases (say, i − 1 and i), a total of k + mi
distinct pages are requested. So, OPT has to incur a cost of at least m_i to bring in
1
We get an equality if all these requests kicked out an old page.
Therefore, we have:

    c_RMA(σ) ≤ Σ_i (m_i + m_i · H_k) = O(log k) · Σ_i m_i ≤ O(log k) · c_OPT(σ)
Proof.

    C = Σ_x q_x · C                       Sum over all possible inputs x
      ≥ Σ_x q_x E_p[c(A, x)]              Since C = max_{x∈X} E_p[c(A, x)]
      = Σ_x q_x Σ_a p_a c(a, x)           Definition of E_p[c(A, x)]
      = Σ_a p_a Σ_x q_x c(a, x)           Swap summations
      = Σ_a p_a E_q[c(a, X)]              Definition of E_q[c(a, X)]
      ≥ Σ_a p_a · D                       Since D = min_{a∈A} E_q[c(a, X)]
      = D                                 Sum over all possible algorithms a
Remark We do not fix the starting positions of the k servers, but we com-
pare the performance of OPT on σ with same initial starting positions.
The paging problem is a special case of the k-server problem where the
points are all possible pages, the distance metric is unit cost between any
two different points, and the servers represent the pages in cache of size k.
[Figure: servers lying to the left of 0 on a line; request points 1+ε and 2+ε; server s* serves both.]
Without loss of generality, suppose all servers currently lie on the left of “0”.
For ε > 0, consider the sequence σ = (1 + ε, 2 + ε, 1 + ε, 2 + ε, . . .). The first
request will move a single server s* to “1 + ε”. By the greedy algorithm,
subsequent requests then repeatedly use s* to satisfy requests from both
“1 + ε” and “2 + ε” since s* is the closest server. This incurs a total cost of
≥ |σ| while OPT could station 2 servers on “1 + ε” and “2 + ε” and incur a
constant total cost on input sequence σ.
• If request r is on one side of all servers, move the closest server to cover
it
[Figure: Double Coverage server positions before and after serving request r, in both cases.]
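For completeness, here is a Python sketch of Double Coverage on the line. The excerpt above only states the rule for requests outside the servers' range; the in-between rule used below (move the two neighbouring servers toward r at equal speed until one reaches it) is the standard DC rule, so treat it as recalled rather than quoted from the notes.

def double_coverage_cost(servers, requests):
    pos = sorted(servers)
    cost = 0.0
    for r in requests:
        if r <= pos[0]:                      # request to the left of all servers
            cost += pos[0] - r
            pos[0] = r
        elif r >= pos[-1]:                   # request to the right of all servers
            cost += r - pos[-1]
            pos[-1] = r
        else:                                # request between two adjacent servers
            i = max(j for j in range(len(pos) - 1) if pos[j] <= r)
            z = min(r - pos[i], pos[i + 1] - r)
            pos[i] += z                      # both neighbours move z towards r;
            pos[i + 1] -= z                  # the closer one ends exactly on r
            cost += 2 * z
        pos.sort()
    return cost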
Proof. Suppose, for a contradiction, that both xi and xi+1 are moving
away from their partners. That means yi ≤ xi < r < xi+1 ≤ yi+1 at
the end of OPT’s action (before DC moved xi and xi+1 ). This is a
contradiction since OPT must have a server at r but there is no server
between yi and yi+1 by definition.
Since at least one of xi or xi+1 is moving closer to its partner, ∆DC (Φt,1 ) ≤
z − z = 0.
Meanwhile, since x_i and x_{i+1} are moved a distance of z towards each
other, the distance (x_{i+1} − x_i) decreases by 2z, while the total change over the
other pairwise distances cancels out, so Δ_DC(Φ_{t,2}) = −2z.
Hence,
cDC (t)+∆(Φt ) = cDC (t)+∆DC (Φt )+∆OP T (Φt ) ≤ 2z−2z+kx = kx ≤ k·cOP T (t)
In all cases, we see that c_DC(t) + Δ(Φ_t) ≤ k · c_OPT(t). Hence,

    Σ_{t=1}^{|σ|} (c_DC(t) + Δ(Φ_t)) ≤ Σ_{t=1}^{|σ|} k · c_OPT(t)      Summing over σ

⇒   Σ_{t=1}^{|σ|} c_DC(t) + (Φ_{|σ|} − Φ_0) ≤ k · c_OPT(σ)             Telescoping

⇒   Σ_{t=1}^{|σ|} c_DC(t) − Φ_0 ≤ k · c_OPT(σ)                         Since Φ_t ≥ 0

⇒   c_DC(σ) ≤ k · c_OPT(σ) + Φ_0                                       Since c_DC(σ) = Σ_{t=1}^{|σ|} c_DC(t)
[AB17]    Amir Abboud and Greg Bodwin. The 4/3 additive spanner exponent is tight.
          Journal of the ACM (JACM), 64(4):28, 2017.

[ADD+93]  Ingo Althöfer, Gautam Das, David Dobkin, Deborah Joseph, and José Soares.
          On sparse spanners of weighted graphs. Discrete & Computational Geometry,
          9(1):81–100, 1993.

[AGM12]   Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Analyzing graph
          structure via linear measurements. In Proceedings of the twenty-third annual
          ACM-SIAM symposium on Discrete Algorithms, pages 459–467. SIAM, 2012.

[AHK12]   Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights
          update method: a meta-algorithm and applications. Theory of Computing,
          8(1):121–164, 2012.

[AMS96]   Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of
          approximating the frequency moments. In Proceedings of the twenty-eighth
          annual ACM symposium on Theory of computing, pages 20–29. ACM, 1996.