Spectral Graph Theory
Spectral graph theory is the study of a graph through the properties of the eigenvalues and eigenvectors of its associated Laplacian matrix. In the following, we use G = (V, E) to represent an undirected n-vertex graph with no self-loops, and write V = {1, . . . , n}, with the degree of vertex i denoted $d_i$. For undirected graphs our convention will be that if there is an edge then both (i, j) ∈ E and (j, i) ∈ E. Thus $\sum_{(i,j)\in E} 1 = 2|E|$. If we wish to sum over edges only once, we will write {i, j} ∈ E for the unordered pair. Thus $\sum_{\{i,j\}\in E} 1 = |E|$.
Note that this is not necessarily a symmetric matrix. But it is a column-stochastic matrix:
each column has non-negative entries that sum to 1. This means that if p ∈ Rn+ is a
probability vector defined over the vertices, then Ap is another probability vector, obtained
by “randomly walking” along an edge, starting from a vertex chosen at random according to
p. We will explore the connection between A and random walks on G in much more detail
in the next lecture.
Finally we introduce the Laplacian matrix, which will provide us with a very useful
quadratic form associated to G:
Definition 4.4 (Laplacian and normalized Laplacian Matrix). The Laplacian matrix is defined as
$$L = D - A.$$
The normalized Laplacian is defined as
$$\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}.$$
Note that L and $\mathcal{L}$ are always symmetric. They are best thought of as quadratic forms: for any $x \in \mathbb{R}^n$,
$$x^T L x = \sum_i d_i x_i^2 - \sum_{(i,j)\in E} x_i x_j = \sum_{\{i,j\}\in E} (x_i - x_j)^2.$$
A similar computation holds for the normalized Laplacian:
Claim 4.5. For any $x \in \mathbb{R}^n$,
$$x^T \mathcal{L} x = \sum_{\{i,j\}\in E}\Big(\frac{x_i}{\sqrt{d_i}} - \frac{x_j}{\sqrt{d_j}}\Big)^2. \qquad (4.1)$$
Proof.
$$x^T \mathcal{L} x = x^T x - x^T D^{-1/2} A D^{-1/2} x = \sum_i x_i^2 - \sum_{i,j} \frac{x_i}{\sqrt{d_i}} A_{ij} \frac{x_j}{\sqrt{d_j}} = \sum_i d_i \Big(\frac{x_i}{\sqrt{d_i}}\Big)^2 - \sum_{(i,j)\in E} \frac{x_i}{\sqrt{d_i}}\cdot\frac{x_j}{\sqrt{d_j}} = \sum_{\{i,j\}\in E}\Big(\frac{x_i}{\sqrt{d_i}} - \frac{x_j}{\sqrt{d_j}}\Big)^2.$$
Claim 4.5 provides the following interpretation of the Laplacian: if we think of the vector x as assigning a weight, or "potential", $x_i \in \mathbb{R}$ to every vertex $i \in V$, then the Laplacian measures the average variation of the potential over all edges. The expression $x^T \mathcal{L} x$ will be small when the potential x is close to constant across all edges (when appropriately weighted by the corresponding degrees), and large when it varies a lot, for instance when the potentials associated with the endpoints of an edge have different signs.
We will return to this interpretation soon. Let’s first see some examples. It will be
convenient to always order the eigenvalues of A in decreasing order, µ1 ≥ · · · ≥ µn , and those
of L in increasing order, λ1 ≤ · · · ≤ λn . So what do the eigenvalues of A or L have to say?
Example 4.6. Consider the graph shown in Figure 4.1.
The adjacency matrix is given by
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$
We can also compute
$$D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} \quad\text{and}\quad \mathcal{L} = \begin{pmatrix} 1 & -1/2 & -1/2 \\ -1/2 & 1 & -1/2 \\ -1/2 & -1/2 & 1 \end{pmatrix}.$$
The eigenvalues of $\mathcal{L}$ are 0 and 3/2 (the latter with multiplicity two), where since the second eigenvalue 3/2 is degenerate we have freedom in choosing a basis for the associated 2-dimensional subspace.
Example 4.8. As a last example, consider the path of length two, pictured in Figure 4.3.
We’ve seen three examples — do you notice any pattern? 0 seems to always be the
smallest eigenvalue. Moreover, in two cases the associated eigenvector has all its coefficients
equal. In the case of the path, the middle coefficient is larger — this seems to reflect the
degree distribution in some way. Anything else? The largest eigenvalue is not always the
same. Sometimes there is a degenerate eigenvalue.
Exercise 1. Show that the largest eigenvalue λn of the normalized Laplacian of a connected
graph G is such that λn = 2 if and only if G is bipartite.
We will see that much more can be read about combinatorial properties of G from L in
a systematic way. The main connection is provided by the Courant-Fischer theorem:
Theorem 4.9 (Variational Characterization of Eigenvalues). Let $M \in \mathbb{R}^{n\times n}$ be a symmetric matrix with eigenvalues µ1 ≥ · · · ≥ µn, and let the corresponding eigenvectors be v1, . . . , vn. Then
$$\mu_1 = \sup_{\substack{x\in\mathbb{R}^n \\ x\neq 0}} \frac{x^T M x}{x^T x}, \qquad \mu_2 = \sup_{\substack{x\in\mathbb{R}^n \\ x\perp v_1}} \frac{x^T M x}{x^T x}, \qquad \ldots \qquad \mu_n = \sup_{\substack{x\in\mathbb{R}^n \\ x\perp v_1,\ldots,v_{n-1}}} \frac{x^T M x}{x^T x} = \inf_{\substack{x\in\mathbb{R}^n \\ x\neq 0}} \frac{x^T M x}{x^T x}.$$
One direction is immediate: the k-th supremum is at least $\mu_k$, because by taking $x = v_k$ and using (4.2) together with $v_i^T v_k = 0$ for $i \neq k$ we immediately get
$$\frac{x^T M x}{x^T x} = \mu_k.$$
To show the reverse inequality, observe that any x such that $x^T x = 1$ and $x \perp v_1, \ldots, v_{k-1}$ can be decomposed as $x = \sum_{j=k}^n \alpha_j v_j$ with $\sum_j \alpha_j^2 = 1$. Now
$$x^T M x = \sum_{i,j=k}^{n} \sum_{l=1}^{n} \mu_l\, \alpha_i \alpha_j\, v_i^T v_l v_l^T v_j = \sum_{l=k}^{n} \mu_l \alpha_l^2 \leq \mu_k.$$
4.2 Eigenvalues of the Laplacian
Using the variational characterization of eigenvalues given in Theorem 4.9, we can connect
the quadratic form associated with the normalized Laplacian in Claim 4.5 to the eigenvalues
of L.
Claim 4.10. For any graph G with normalized Laplacian L, 0 ≤ L ≤ 2I. Moreover, if λ1 is
the smallest eigenvalue of L then λ1 = 0 with multiplicity equal to the number of connected
components of G.
Proof. From (4.1) we see that $x^T \mathcal{L} x \geq 0$ for any x, and using $(a-b)^2 \leq 2(a^2 + b^2)$ we also have $x^T \mathcal{L} x \leq 2 x^T x$. Using the variational characterization
$$\lambda_1 = \inf_{x\neq 0} \frac{x^T \mathcal{L} x}{x^T x}, \qquad \lambda_n = \sup_{x\neq 0} \frac{x^T \mathcal{L} x}{x^T x},$$
where $\lambda_n$ is the largest eigenvalue, we see that $0 \preceq \mathcal{L} \preceq 2I$. To see that $\lambda_1 = 0$ always with multiplicity at least 1 it suffices to consider the vector
$$v_1 = \begin{pmatrix} \sqrt{d_1} \\ \vdots \\ \sqrt{d_n} \end{pmatrix},$$
which by (4.1) satisfies $v_1^T \mathcal{L} v_1 = 0$, and hence $\mathcal{L} v_1 = 0$ since $\mathcal{L} \succeq 0$.
The conductance of S is
$$\phi(S) = \frac{|\partial S|}{\min(d(S), d(V\setminus S))},$$
where $d(S) := \sum_{i\in S} d_i$ is a natural measure of volume: the total number of edges incident on vertices in S. If G is d-regular, then this simplifies to
$$\phi(S) = \frac{|\partial S|}{d\cdot\min(|S|, |V\setminus S|)}.$$
Definition 4.12 (Conductance). The conductance of a graph G is defined as
$$\phi(G) = \min_{\emptyset \subsetneq S \subsetneq V} \phi(S).$$
• If G is a clique, then
$$\phi(G) = \min_{1\leq k\leq n/2} \frac{k(n-k)}{(n-1)k} = \frac{n}{2(n-1)} \approx \frac{1}{2}.$$
• If G is a cycle, then
$$\phi(G) = \min_{1\leq k\leq n/2} \frac{2}{2k} = \frac{2}{n}.$$
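For very small graphs, Definition 4.12 can be checked directly by enumerating all cuts. The following brute-force sketch (my own illustration, exponential in n and meant only for sanity checks) computes φ(G) from an adjacency matrix and reproduces the value 2/n for the cycle.

```python
import itertools
import numpy as np

def conductance(A):
    """Brute-force phi(G) for a small undirected graph with adjacency matrix A."""
    n = A.shape[0]
    d = A.sum(axis=1)
    d_total = d.sum()
    best = float("inf")
    for size in range(1, n):
        for S in itertools.combinations(range(n), size):
            S = set(S)
            boundary = sum(A[i, j] for i in S for j in range(n) if j not in S)
            vol_S = sum(d[i] for i in S)
            best = min(best, boundary / min(vol_S, d_total - vol_S))
    return best

# Example: the 6-cycle; Definition 4.12 predicts phi = 2/6 = 1/3
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
print(conductance(A))   # 0.333...
```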
Exercise 2. Compute the conductance of the hypercube G = (V, E) where $V = \{0,1\}^n$ and $E = \{\{u, v\} : u, v \in V,\ d_H(u, v) = 1\}$, where $d_H$ is the Hamming distance.
The following theorem is a fundamental result relating conductance and the second smallest eigenvalue of the normalized Laplacian.
Theorem 4.14 (Cheeger's inequality). Let G be an undirected graph with normalized Laplacian $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$. Let $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ be the eigenvalues of $\mathcal{L}$. Then
$$\frac{\lambda_2}{2} \leq \phi(G) \leq \sqrt{2\lambda_2}.$$
Remark 4.15. • Both sides of the inequality are interesting. The left-hand side says that if there is a good cut, that is, a cut of small conductance, then there is an eigenvector, orthogonal to the eigenvector associated with the smallest eigenvalue, whose eigenvalue is small. This is called the "easy" side of Cheeger.
• The right-hand side says that if λ2 is small, then there must exist a poorly connected
set. This is called the “hard” side of Cheeger.
• We will give “algorithmic” proofs of both inequalities: for the left-hand side, given a
set S of low conductance we will show how to construct a vector v ⊥ v1 that achieves
a low value in (4.1). For the right-hand side, given a vector v2 ⊥ v1 achieving a low
value in (4.1) we will construct a set S of low conductance.
• The next exercise shows that both sides of the inequality are tight.
Exercise 3. Show that the left-hand side of Cheeger’s inequality is tight by computing the
eigenvalues and eigenvectors of the hypercube (hint: Fourier basis). Show that the right-hand
side is also tight by considering the example of the n-cycle.
Proof of Cheeger's inequality. We first prove the "easy side". Let S be a set of vertices such that φ(S) = φ(G). We claim
$$\lambda_2 = \min_{\substack{x\in\mathbb{R}^n \\ x\perp v_1 = (\sqrt{d_1},\ldots,\sqrt{d_n})^T}} \frac{x^T \mathcal{L} x}{x^T x} \leq 2\phi(G).$$
To see this, set $\sigma = d(S)/d(V\setminus S)$ and consider the test vector x with coordinates $x_i = \sqrt{d_i}$ for $i\in S$ and $x_i = -\sigma\sqrt{d_i}$ for $i\in V\setminus S$; this choice of σ ensures $x \perp v_1$. We then have
$$x^T x = \sum_{i\in S} d_i + \sigma^2 \sum_{i\in V\setminus S} d_i = d(S) + \frac{d(S)^2}{d(V\setminus S)^2}\cdot d(V\setminus S) = \frac{d(S)\, d(V)}{d(V\setminus S)},$$
and
$$x^T \mathcal{L} x = \sum_{\{i,j\}\in E}\Big(\frac{x_i}{\sqrt{d_i}} - \frac{x_j}{\sqrt{d_j}}\Big)^2 = \sum_{\{i,j\}\in \partial S} (1+\sigma)^2 = \sum_{\{i,j\}\in \partial S}\Big(\frac{d(V\setminus S)+d(S)}{d(V\setminus S)}\Big)^2 = |\partial S|\,\frac{d(V)^2}{d(V\setminus S)^2}.$$
This finally implies
$$\frac{x^T \mathcal{L} x}{x^T x} = \frac{|\partial S|\, d(V)}{d(S)\, d(V\setminus S)} \leq \frac{2|\partial S|}{\min(d(S), d(V\setminus S))} = 2\phi(G),$$
where the inequality can be seen by considering the cases $d(S) \leq d(V\setminus S)$ and $d(S) > d(V\setminus S)$ separately.
Now let's turn to the "hard side" of the inequality. Let $y \in \mathbb{R}^n$ be such that
$$\frac{y^T \mathcal{L} y}{y^T y} \leq \lambda_2 \qquad (4.4)$$
and $y \perp v_1 = (\sqrt{d_1}, \ldots, \sqrt{d_n})^T$. Our main idea is going to be to think of the coordinates of y as providing an ordering of the vertices, with each coordinate telling us how likely (small, negative $y_i$) or unlikely (large, positive $y_i$) the vertex is to be in a set with small conductance: this intuition comes from the form of the vector we found in the proof of the easy case, which provides such an ordering.
Rather than use the inequality (4.4) as a starting point, it will be more convenient to work with the analogous formulation for the Laplacian L, so we start with a few manipulations. Let $z = D^{-1/2} y - \sigma \mathbf{1}$ for some σ to be determined soon, where $\mathbf{1} = (1, \ldots, 1)^T$. Since $\mathbf{1}^T L \mathbf{1} = 0$, we see that $z^T L z = y^T \mathcal{L} y$. Moreover, $D^{1/2}\mathbf{1} = v_1$, thus
$$z^T D z = y^T y - 2\sigma \underbrace{v_1^T y}_{=0} + \sigma^2 d(V) \geq y^T y,$$
so that
$$\frac{z^T L z}{z^T D z} \leq \lambda_2.$$
We make the following conventions, without loss of generality: the vertices are ordered so that $z_1 \leq \cdots \leq z_n$; the shift σ is chosen such that $z_{i_0} = 0$, where $i_0$ is an index satisfying
$$d(\{1, \ldots, i_0 - 1\}) \leq \frac{d(V)}{2} \quad\text{and}\quad d(\{i_0, \ldots, n\}) \leq \frac{d(V)}{2}; \qquad (4.5)$$
and z is rescaled so that $z_1^2 + z_n^2 = 1$.
Let $t \in [z_1, z_n]$ be chosen according to the distribution with density $2|t|$ (the scaling on z assumed above ensures that this is a properly normalized probability measure). Observe
that for any a < b,
$$\Pr(t\in[a,b]) = \int_a^b 2|t|\,dt = \begin{cases} b^2 + a^2 & \text{if } a < 0 < b, \\ b^2 - a^2 & \text{if } b > a > 0, \\ a^2 - b^2 & \text{otherwise,}\end{cases} \qquad\text{so that}\quad \Pr(t\in[a,b]) \leq |b-a|\,(|a|+|b|),$$
an inequality that is easily verified in all three cases. For any t, let $S_t = \{i : z_i \leq t\}$. Then
$$\mathbb{E}_t\, d(S_t) = \sum_i \Pr(i\in S_t)\, d_i = \sum_i \Pr(z_i \leq t)\, d_i.$$
Our choice of the index $i_0$ in (4.5) ensures that, if t < 0 then $\min(d(S_t), d(V\setminus S_t)) = d(S_t)$, while if t ≥ 0 then $\min(d(S_t), d(V\setminus S_t)) = d(V\setminus S_t)$. Thus
$$\mathbb{E}_t \min\big(d(S_t), d(V\setminus S_t)\big) = \sum_{i<i_0} \Pr(z_i \leq t \wedge t < 0)\, d_i + \sum_{i\geq i_0} \Pr(z_i > t \wedge t \geq 0)\, d_i = \sum_{i<i_0} z_i^2 d_i + \sum_{i\geq i_0} z_i^2 d_i \qquad (4.6)$$
$$= z^T D z. \qquad (4.7)$$
Next we compute
$$\mathbb{E}_t\, |\partial S_t| = \sum_{\{i,j\}\in E} \Pr(z_i \leq t \leq z_j) \leq \sum_{\{i,j\}\in E} |z_j - z_i|\,(|z_i|+|z_j|) \leq \underbrace{\sqrt{\sum_{\{i,j\}\in E} (z_i - z_j)^2}}_{\sqrt{z^T L z}}\; \underbrace{\sqrt{\sum_{\{i,j\}\in E} (|z_i|+|z_j|)^2}}_{\leq \sqrt{2\sum_{\{i,j\}\in E}(|z_i|^2+|z_j|^2)} = \sqrt{2 z^T D z}} \leq \sqrt{2\lambda_2}\; z^T D z, \qquad (4.8)$$
where the second inequality is Cauchy-Schwarz.
From there we deduce that there exists a choice of t such that
$$\phi(S_t) \leq \sqrt{2\lambda_2},$$
which immediately gives us that $\phi(G) \leq \sqrt{2\lambda_2}$, as desired. (Indeed, since $\mathbb{E}_t |\partial S_t| \leq \sqrt{2\lambda_2}\, \mathbb{E}_t \min(d(S_t), d(V\setminus S_t))$, there must be some t for which $|\partial S_t| \leq \sqrt{2\lambda_2}\, \min(d(S_t), d(V\setminus S_t))$.)
We note that the proof given above is algorithmic, in that it describes an efficient algorithm that, given a graph, will produce a set with conductance at most $\sqrt{2\lambda_2}$: simply compute an eigenvector associated with the second smallest eigenvalue (using, e.g., the power method), and output the set $S_t$ which has smallest conductance among the n possibilities; this can be checked efficiently.
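Here is a sketch of the resulting "sweep cut" procedure in NumPy (my own code, not from the notes; for simplicity it uses a dense eigensolver instead of the power method mentioned above). It follows the proof: compute the second eigenvector y of the normalized Laplacian, sort the vertices by $z = D^{-1/2} y$, and return the best of the n threshold cuts $S_t$.

```python
import numpy as np

def sweep_cut(A):
    """Return a cut S with phi(S) <= sqrt(2 * lambda_2), following the proof above."""
    n = A.shape[0]
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt     # normalized Laplacian

    eigvals, eigvecs = np.linalg.eigh(L_norm)            # eigenvalues in ascending order
    y = eigvecs[:, 1]                                    # eigenvector for lambda_2
    z = y / np.sqrt(d)                                   # z = D^{-1/2} y, as in the proof

    order = np.argsort(z)                                # sweep vertices by increasing z_i
    d_total = d.sum()
    best_phi, best_S, vol = np.inf, None, 0.0
    for k in range(n - 1):
        S = order[: k + 1]
        vol += d[order[k]]
        boundary = A[np.ix_(S, order[k + 1:])].sum()     # edges leaving S
        phi = boundary / min(vol, d_total - vol)
        if phi < best_phi:
            best_phi, best_S = phi, set(S.tolist())
    return best_S, best_phi
```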
4.3 Sparsity
The sparsity is another natural measure of "disconnectedness" of a graph, which turns out to be very closely related to the conductance.
Definition 4.16. For a d-regular graph G and ∅ ⊊ S ⊊ V, the sparsity of S is defined as
$$\sigma(S) = \frac{\mathbb{E}_{(i,j)\in E}\,|1_S(i) - 1_S(j)|}{\mathbb{E}_{(i,j)\in V^2}\,|1_S(i) - 1_S(j)|} = \frac{\frac{1}{dn}|\partial S|}{\frac{2}{n^2}|S|\cdot|V\setminus S|} = \frac{|V|\,|\partial S|}{2d\,|S|\,|V\setminus S|},$$
and so
$$\frac{\phi(S)}{2} \leq \sigma(S) \leq \phi(S).$$
Note that the definition of the sparsity lets us give a different, almost immediate proof of the "easy" side of Cheeger's inequality:
$$\sigma(G) = \min_S \sigma(S) = \min_{\substack{x\in\{0,1\}^n \\ x\neq 0,\, \mathbf{1}}} \frac{\frac{1}{dn}\sum_{(i,j)\in E} |x_i - x_j|^2}{\frac{2}{n^2}\sum_{(i,j)\in V^2} |x_i - x_j|^2} \geq \min_{\substack{x\in\mathbb{R}^n \\ x\neq 0,\, x\perp \mathbf{1}}} \frac{\frac{2}{dn}\sum_{\{i,j\}\in E} (x_i - x_j)^2}{\frac{2}{n^2}\sum_{(i,j)\in V^2} (x_i - x_j)^2}$$
$$= \min_{\substack{x\in\mathbb{R}^n \\ x\neq 0,\, x\perp \mathbf{1}}} \frac{x^T \mathcal{L} x}{\frac{1}{n}\sum_{(i,j)\in V^2} (x_i - x_j)^2} = \min_{\substack{x\in\mathbb{R}^n \\ x\neq 0,\, x\perp \mathbf{1}}} \frac{x^T \mathcal{L} x}{\frac{1}{n}\big(2n\sum_{i\in V} x_i^2 - 2\sum_{i,j} x_i x_j\big)} = \min_{\substack{x\in\mathbb{R}^n \\ x\neq 0,\, x\perp \mathbf{1}}} \frac{x^T \mathcal{L} x}{2\, x^T x} = \frac{\lambda_2}{2},$$
where the inequality follows since we are taking the minimum over a larger set (the constraint x ⊥ 1 is without loss of generality since the whole expression is invariant under translation of the vector x by an additive constant multiple of 1).
Using the above inequality σ(G) ≤ φ(G), we have re-proven the left-hand side of Cheeger's inequality, simply by observing that the second eigenvalue of $\mathcal{L}$ could be seen as a natural relaxation of the sparsity, itself very closely related to the conductance.
The second eigenvalue gives us a good approximation to the conductance in case λ2 is not too small, say it is a constant. If however λ2 goes to 0 with n, say λ2 ∼ 1/n, then the approximation can be very bad: it can be off by a multiplicative factor $\lambda_2^{-1/2} \sim \sqrt{n}$.
Here are two other relaxations that have been considered, and do much better. The first one is due to Leighton and Rao, and can be defined as
$$\mathrm{LR}(G) = \min_{\substack{w\in\mathbb{R}^{n\times n},\; w_{ij}\geq 0,\; w_{ii}=0 \\ w_{ij}\leq w_{ik}+w_{kj}}} \frac{\mathbb{E}_{(i,j)\in E}\, w_{ij}}{\mathbb{E}_{(i,j)\in V^2}\, w_{ij}}. \qquad (4.9)$$
This can be interpreted as a minimization over all semi-metrics: distance measures $d(i,j) = w_{ij}$ on the graph that are always non-negative and satisfy the triangle inequality. One example of such a metric is $(i,j)\mapsto w_{ij} = |x_i - x_j|$ (for any fixed vector x), but there are others. The advantage of allowing all semi-metrics is that LR(G) can be computed using a linear program. Moreover, Leighton and Rao showed that
$$O(\log n)\,\mathrm{LR}(G) \;\geq\; \sigma(G) \;\geq\; \mathrm{LR}(G),$$
thus we get a much tighter approximation to the sparsity than the one given by λ2 in cases when the sparsity is small. An even tighter relaxation was introduced by Arora, Rao and Vazirani, who considered
$$\mathrm{ARV}(G) = \min_{\substack{w\in\mathbb{R}^{n\times n},\; w_{ij}\geq 0,\; w_{ii}=0 \\ w_{ij}^2\leq w_{ik}^2+w_{kj}^2}} \frac{\mathbb{E}_{(i,j)\in E}\, w_{ij}}{\mathbb{E}_{(i,j)\in V^2}\, w_{ij}}. \qquad (4.10)$$
its negation), is it possible to find an assignment to the variables that satisfies all clauses?
For k ≥ 3 the problem is NP-hard, but for k = 2 there are efficient algorithms.
Here is a simple candidate. Start with a random assignment to the variables. At each
step, choose a clause that is not satisfied. Pick one of the two variables it acts on at random,
and flip it. Repeat.
Is this going to work? And if so, how long will it take? Here is the key idea. Suppose there exists a satisfying assignment, and fix it. Now consider the distance between the current assignment, maintained by the algorithm, and this satisfying assignment. This distance is an integer between 0 and n, and at each step it either increases or decreases by 1. If a clause is violated, at least one of the two variables involved must have a different value in the current assignment than in the satisfying assignment. With probability 1/2 we flip this variable, so that at each step with probability at least 1/2 we decrease the distance by 1. How many steps will it take to find the satisfying assignment? The answer is O(n²), and we'll see how to show this very easily once we've covered some of the basics of the analysis of random walks on arbitrary graphs. (Note the algorithm we just described is not the best, and it is possible to solve 2SAT in deterministic linear time...)
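A minimal sketch of the random-walk algorithm just described (my own code; the clause representation as pairs of signed literals and the 2n² step budget are choices made for the illustration, not part of the notes):

```python
import random

def random_walk_2sat(n, clauses, max_steps=None):
    """Clauses are pairs of literals; literal +i / -i means x_i is true / false (1-indexed).
    Returns a satisfying assignment, or None if none is found within the step budget."""
    if max_steps is None:
        max_steps = 2 * n * n                  # O(n^2) steps suffice with good probability
    assign = {i: random.random() < 0.5 for i in range(1, n + 1)}

    def satisfied(lit):
        return assign[abs(lit)] == (lit > 0)

    for _ in range(max_steps):
        unsat = [c for c in clauses if not (satisfied(c[0]) or satisfied(c[1]))]
        if not unsat:
            return assign
        clause = random.choice(unsat)          # pick a violated clause
        var = abs(random.choice(clause))       # flip one of its two variables at random
        assign[var] = not assign[var]
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(random_walk_2sat(3, [(1, 2), (-1, 3), (-2, -3)]))
```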
which in matrix form can be written as p(1) = AD−1 p(0) . This version of the random walk
has one major drawback, which is that it does not always converge: consider for instance a
graph with a single edge, or more generally a bipartite graph; the walk started at a vertex on
the left will continue hopping back and forth between left and right without ever converging.
To overcome this issue it is customary to consider instead the lazy random walk: with
probability 1/2, do not move, and with probability 1/2, do as before. The update rule is
then
$$p^{(t)} = \Big(\frac{I}{2} + \frac{1}{2} A D^{-1}\Big) p^{(t-1)},$$
which in matrix form is $p^{(t)} = W p^{(t-1)}$ where W is the random walk matrix:
Definition 4.17. Let G = (V, E) be an n-vertex weighted, undirected graph with weights $w_{ij} \geq 0$. Define the lazy random walk on G: if $p \in \mathbb{R}^n_+$ is a distribution on the vertices V = {1, . . . , n}, one step of the random walk brings p to Wp where
$$W = \frac{1}{2} I + \frac{1}{2} A D^{-1}.$$
Note that W is not symmetric, so the spectral theorem does not directly apply to it. But observe that
$$W = D^{1/2}\Big(I - \frac{1}{2}\big(I - D^{-1/2} A D^{-1/2}\big)\Big) D^{-1/2} = D^{1/2}\Big(I - \frac{\mathcal{L}}{2}\Big) D^{-1/2},$$
where $\mathcal{L}$ is the normalized Laplacian associated with G. Thus if v is an eigenvector for $\mathcal{L}$ with eigenvalue λ, then $w = D^{1/2} v$ is a right eigenvector for W with eigenvalue (1 − λ/2); thus W has n eigenvalues $w_i = 1 - \lambda_i/2$ that are directly related to those of the normalized Laplacian. This will let us transfer our understanding of the $\lambda_i$ to derive convergence properties of the random walk.
Therefore,
$$\|W^t p - \pi\|_1 = \Big\|\sum_{i\geq 2} w_i^t \alpha_i D^{1/2} v_i\Big\|_1 \leq \sqrt{n}\,\Big\|\sum_{i\geq 2} \alpha_i w_i^t D^{1/2} v_i\Big\|_2 \leq \sqrt{n}\,\sqrt{d_{\max}}\Big(\sum_{i\geq 2} \alpha_i^2 w_i^{2t}\Big)^{1/2} \leq \sqrt{n}\,\sqrt{n}\; w_2^t,$$
where for the last step we bounded the maximum degree by n and used that $\sum_{i\geq 2}\alpha_i^2 \leq 1$ together with $0 \leq w_i \leq w_2$ for all $i \geq 2$.
• For the n-vertex path $P_n$, we saw $\lambda_2 \sim 1/n^2$, and you can check that $\tau_\varepsilon \geq n^2 \log(\frac{1}{\varepsilon})$: starting on the leftmost vertex, it takes $\sim n^2$ steps before we hit the rightmost vertex, and the log(1/ε) term is overhead before the walk becomes sufficiently close to uniform. Note this proves the bound on the convergence time for the randomized algorithm for 2-SAT we saw earlier.
• For the dumbbell graph $K_{n/3}$–$P_{n/3}$–$K_{n/3}$ (two (n/3)-cliques linked by a path of length n/3), the conductance is at most $\sim 1/n^2$ (cut in the middle), so by Cheeger $\lambda_2 = O(1/n^2)$. Consider a random walk starting on the left-most vertex. Then, one step leads to a uniform distribution on the left $K_{n/3}$ clique. From there the probability to enter the bridge is $\sim 1/n^2$, and moreover the probability of making it through the bridge without falling back into the left clique is about $\sim 1/n$. Using this intuition you can show that the mixing time is of order $n^3$, so that in fact $\lambda_2 = O(1/n^3)$.
[Figure: a path on vertices $v_1, v_2, \ldots, v_n$]
How bad can it get? From the definition we see that the conductance φ(G) is always at least $1/n^2$, so by Cheeger's inequality as long as G is connected we have $\lambda_2 = \Omega(n^{-4})$. In fact we can prove something slightly better.
Claim 4.22. Let G be a connected, unweighted graph, λ2 the second smallest eigenvalue of the normalized Laplacian and r the diameter of G. Then
$$\lambda_2 \geq \frac{2}{r(n-1)^2}.$$
Proof. For any two vertices u, v let $E^{u,v}$ be the Laplacian associated to the graph having the single edge {u, v}, and $P^{u,v}$ the Laplacian of a path of length at most r from u to v in G. Then $E^{u,v} \preceq r\, P^{u,v}$, as can be seen from the associated quadratic forms: $(x_u - x_v)^2 \leq r\big((x_u - x_{u_1})^2 + \cdots + (x_{u_{r-1}} - x_v)^2\big)$, by Cauchy-Schwarz. Since any pair of vertices is connected by a path of length at most r in G, $L_{K_n} \preceq r\binom{n}{2} L_G$, where $K_n$ is the clique on n vertices. Since the normalized Laplacian for $K_n$ has second smallest eigenvalue $\frac{n}{n-1}$, we get the claimed bound on λ2.
that is distributed according to a distribution p = p(x, δ) such that $\|p - \pi\|_1 \leq \delta$, where f(x) = (S, π).
A famous theorem by Jerrum, Valiant and Vazirani shows that approximate counting
and approximate sampling are essentially equivalent:
Theorem 4.24. For “nicely behaved” counting problems (the technical term is “downward
self-reducible”), the existence of an FPRAS is equivalent to the existence of an FPAUS.
Proof sketch. For concreteness we prove the theorem for the problem of counting the number
of satisfying assignments to any formula ϕ.
FPAUS =⇒ FPRAS. Take a polynomial number of satisfying assignments for ϕ sampled by the FPAUS. This lets us estimate p0 and p1, the fraction of satisfying assignments in which x1 = 0 and x1 = 1 respectively. Assume p0 ≥ 1/2, the other case being symmetric. Make a recursive call to approximate the number of satisfying assignments to ϕ|x1=0. Let N̂0 be the estimate returned, and output N̂0/p0.
It is clear that this is correct in expectation. Moreover, using that p0 is not too small,
a Chernoff bound shows that the estimate obtained from the FPAUS samples will be very
accurate with good probability. It is then not hard to prove by induction that provided a
polynomial number of samples are taken at each of the n recursive calls (where n is the
number of variables), the overall estimate can be made sufficiently accurate, with good
probability.
FPRAS =⇒ FPAUS. First we run the FPRAS to obtain good estimates for the number of
satisfying assignments N to ϕ and N0 to ϕ|x1 =0 . We can assume the estimates returned are
such that N̂0 /N̂ ≥ 1/2, as otherwise we exchange the roles of 0 and 1. Next we flip a coin
with bias N̂0 /N̂ . If it comes up heads we set x1 = 0 and recurse; if it comes up tails we set
x1 = 1 and recurse. Provided the estimates N̂ , N̂0 , . . . are accurate enough the distribution
on assignments produced by this procedure will be close in statistical distance to the uniform
distribution on satisfying assignments.
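As a rough illustration of the FPAUS ⇒ FPRAS direction, here is a sketch in Python. The sampler `sample_satisfying(phi)` (returning an approximately uniform satisfying assignment as a dict) and the helper `restrict(phi, var, val)` are hypothetical placeholders standing in for the FPAUS and for formula restriction; error handling and the precise number of samples are glossed over.

```python
def approx_count(phi, variables, sample_satisfying, restrict, samples=10000):
    """Estimate the number of satisfying assignments of `phi`, given an approximately
    uniform sampler over its satisfying assignments (the FPAUS)."""
    if not variables:
        return 1                                            # empty formula: one (empty) assignment
    x1, rest = variables[0], variables[1:]
    draws = [sample_satisfying(phi) for _ in range(samples)]
    p0 = sum(1 for a in draws if a[x1] == 0) / samples      # fraction of assignments with x1 = 0
    if p0 >= 0.5:
        return approx_count(restrict(phi, x1, 0), rest, sample_satisfying, restrict, samples) / p0
    return approx_count(restrict(phi, x1, 1), rest, sample_satisfying, restrict, samples) / (1 - p0)
```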
• Membership oracle (resp. weak membership oracle): given a query x ∈ Rn , the oracle
answers whether x ∈ K (resp. x ∈ K or d(x, K) ≥ ε; the oracle is allowed to fail
whenever neither condition is satisfied).
• Separation oracle (resp. weak separation oracle): given a query x, the oracle returns the same answer as the membership oracle, but in case x ∉ K (resp. d(x, K) ≥ ε) it also returns a $y \in \mathbb{R}^n$ such that $y^T z > y^T x$ for all z ∈ K (resp. $y^T z > y^T x - \varepsilon/2$ for all z ∈ K).
It is always possible to derive a weak separation oracle from a weak membership oracle,
and in this lecture we won’t worry about the difference — in fact all we’ll need is a weak
membership oracle.
But is this enough? What if we query x, and we learn x ∉ K, with y = (1, 0, . . . , 0)? Then we know we should increase x1. Say we double every coordinate, to x′ = 2x. But suppose we get the same answer, again and again. We never know how far K is! It seems like a boundedness assumption is necessary, so we'll assume the following: there exist (known) values r, R > 0 such that $B_\infty(0, r) \subseteq K \subseteq B_\infty(0, R)$ with $R/r < 2^{\mathrm{poly}(n)}$, where $B_\infty(0, r)$ denotes the ball of radius r for the $\ell_\infty$ norm (a square box with sides of length 2r). In fact, by scaling we may as well assume r = 1, and for simplicity we'll also assume R = n². It is not so obvious at first that this is without loss of generality, but it is; with some further re-scaling and shifting of things around it is not too hard to reduce to this case.
(The idea is to slowly grow a simplex inside K. At each step we can perform a change of basis so that $\mathrm{Conv}(e_1, \ldots, e_n) \subseteq K$. Then for i = 1, . . . , n we check if there is a point $x \in K$ such that $|x_i| \geq 1 + 1/n^2$. If so we include it, rescale, and end up with a simplex of volume at least $1 + 1/n^2$ times the previous one, and this guarantees we won't have to go through too many steps. If there is no such point, we stop, as we've achieved the desired ratio. Finally we need to check that we can find such a point, if it exists, in polynomial time; this is the case as if $x \in K$ then $(0, \ldots, 0, x_i, 0, \ldots, 0) \in K$ as well (using convexity and the assumption that K contains the simplex), so it suffices to call the membership oracle n times.)
Now that we have a proper setup in place, can we estimate Vol(K)? Say we only want
a multiplicative approximation. An idea would be to use the separation oracle to get some
kind of a rough approximation of the boundary of K using hyperplanes, then do some kind
of triangulation, and estimate the volume by counting tetrahedrons (or, in dimension n,
simplices). In fact there is a very strong no-go theorem for deterministic algorithms:
Theorem 4.25. For any deterministic polynomial-time algorithm such that on input a convex body $K \subseteq \mathbb{R}^n$ (specified via a separation oracle) returns α(K), β(K) such that α(K) ≤ Vol(K) ≤ β(K), it must be that there exists a constant c > 0 and a sequence of convex bodies $\{K_n \subseteq \mathbb{R}^n\}_{n\geq 1}$ such that for all n ≥ 1,
$$\frac{\beta(K_n)}{\alpha(K_n)} \geq \Big(\frac{c\, n}{\log n}\Big)^n.$$
This is very bad: even an approximation within an exponential factor is ruled out! Note
however that a key to the above result is that the only access to K is given by a separation
oracle — if we have more knowledge about K then a polynomial-time algorithm might be
feasible (though we don’t know any).
Proof idea. The idea for the proof is to design an oracle that answers the queries made by
any deterministic algorithm in a way that is consistent with the final convex body being one
of two possible bodies, K or K ◦ , whose volume ratio is exponentially large; if we manage to
do this then the algorithm cannot provide an estimate that will be accurate for both K and
K ◦.
The oracle is very simple: upon any query $x \in \mathbb{R}^n$ it is very generous and says that $x/\|x\| \in K$, $-x/\|x\| \in K$, and moreover K is included in the "slab" $\{y : -\|x\| \leq \langle y, x\rangle \leq \|x\|\}$. Note that these answers are all consistent with K being the Euclidean unit ball. Now, if points $x_1, \ldots, x_m$ have been queried, define K to be the convex hull of the $\pm x_i/\|x_i\|$ and the $\pm e_j$, where the $e_j$ are the unit basis vectors. Define $K^\circ = \{y : \langle y, z\rangle \leq 1\ \forall z \in K\}$. Then you can check that the oracle's answers are all consistent with both K and $K^\circ$. But their volumes are very different, and one can show that $\mathrm{Vol}(K^\circ)/\mathrm{Vol}(K)$ is roughly of order $(n/\log(m/n))^n$; as long as m is not exponential in n this is exponentially large.
If we allow randomized algorithms the situation is much better:
Theorem 4.26 (Dyer, Frieze, Kannan 1991). There exists a fully polynomial randomized
approximation scheme for approximating Vol(K).
A fully polynomial randomized approximation scheme (FPRAS) means that ∀ε, δ > 0
the algorithm returns a (1 ± ε)-multiplicative approximation to the volume with probability
at least 1 − δ, and runs in time poly(n, 1/ε, log 1/δ). Volume estimation is one of these
relatively rare problems for which we have strong indication that randomized algorithms can
be exponentially more efficient than deterministic ones (primality testing used to be another
such problem before the AKS algorithm was discovered!).
The algorithm of Dyer, Frieze and Kannan had a running time that scaled like $\sim n^{23}$. Since then a lot of work has been done on the problem, and the current record is $\sim n^5$. In principle it's possible this could be lowered even more, to say $\sim n^2$; there are no good lower bounds for this problem.
Proof sketch. The main idea is to use random sampling. For instance, suppose we’d place
K inside a large, fine grid, as in Figure 4.4.
Figure 4.4: The region K is the feasible region.
We could then run a random walk on the grid until it mixes to uniform. This will take time roughly $n(R/\delta)^2$, where δ is the grid spacing; given we assumed r = 1, something like $\delta = 1/n^2$ would be reasonable.
Exercise 4. Show that the mixing time of the lazy random walk on $\{1, \ldots, N\}^n$, the n-dimensional grid with sides of length N, is $O(n^2 N^2)$.
At the end of the walk we can call the membership oracle to check if we are in K. Since $\Pr(x \in K) \sim \frac{\mathrm{Vol}_n(K)}{\mathrm{Vol}_n(\mathrm{grid})}$, by repeating the walk sufficiently many times we'd get a good estimate. While this works fine in two dimensions (you can estimate π = Area(unit disk) in this way!), in higher dimensions it fails dramatically, as all we know about Vol(K) is that it is at least $2^n$ (since it contains the unit ball for $\ell_\infty$), but the grid could have volume as large as $(2R)^n = (2n^2)^n$, so even assuming perfectly uniform mixing the probability that we actually obtain a point in K is tiny: it's exponentially small.
This is still roughly how we’ll proceed, but we’re going to have to be more careful. There
are three important steps. Here is a sketch:
• Step 1: Subdivision.
Set $K_0 = B_\infty(0,1)\cap K = B_\infty(0,1)$, $K_1 = B_\infty(0, 2^{1/n})\cap K$, . . . , $K_{2n\log n} = B_\infty(0, n^2)\cap K = K$. Then
$$\mathrm{Vol}(K) = \mathrm{Vol}(K_{2n\log n}) = \frac{\mathrm{Vol}(K_{2n\log n})}{\mathrm{Vol}(K_{2n\log n - 1})}\cdot\frac{\mathrm{Vol}(K_{2n\log n - 1})}{\mathrm{Vol}(K_{2n\log n - 2})}\cdots\frac{\mathrm{Vol}(K_1)}{\mathrm{Vol}(K_0)}\cdot 2^n,$$
since $\mathrm{Vol}(K_0) = 2^n$. So, we have reduced our problem to the following: given K ⊆ L both convex such that Vol(K) ≥ ½ Vol(L), estimate Vol(K)/Vol(L). This eliminates the "tiny ratio" issue we had initially, but now we have another problem: the enclosing set L is no longer a nice grid, but is an arbitrary convex set itself. Are we making any progress?
• Step 2: A random walk.
Our strategy will be as follows. Run a random walk on a grid that contains L, such that the stationary distribution of the random walk satisfies the two conditions that Pr(x ∈ L) is not too small, and the stationary distribution is close to uniform, conditioned on lying in L. If we can do this we're done: we repeatedly sample from the stationary distribution sufficiently many times that we obtain many samples in L, and we check the fraction of these samples that are also in K:
$$\frac{\mathrm{Vol}(K)}{\mathrm{Vol}(L)} \sim \frac{\Pr(x\in K)}{\Pr(x\in L)} = \Pr(x\in K \mid x\in L).$$
So the challenge is to figure out how to define this random walk around L. Here is a natural attempt. Start at an arbitrary point $x^{(0)} \in L$, say the origin. Set $x^{(1)}$ to be a random neighbor of $x^{(0)}$ on the grid, subject to $x^{(1)} \in L$ (we have 2n neighbors to consider, and for each we can call the membership oracle for L). Repeat sufficiently many times. This is the right idea (note that we really want to stay as close to L as possible, because if we allow ourselves to go outside too much we'll get this "tiny ratio" issue once more), but the boundary causes a lot of problems:
(a) Some points are never reached. L could be very pointy, in which case there could
be a grid point that lies in L, but none of its neighbors does. And this cannot be
solved just by making the grid finer; it is really an issue with the kinds of angles
that are permitted in L.
(b) The degree of the graph underlying our walk is not constant (it tends to be smaller
close to the boundary), so the stationary distribution will not be uniform.
(c) Some grid cubes have much bigger intersection with L than others.
It turns out we can fix all of these issues by doing a bit of "smoothing out" on L. Let δ be the width of the grid, and consider $L' = (1 + \delta\sqrt{n})L$, where $\delta\sqrt{n}$ is the diameter of a cube. Assuming $\delta \leq n^{-2}$, this doesn't blow up the volume by much, so in terms of volume ratio we're fine. Moreover, you can check easily that:
– Any grid point inside L has all of its neighbors in L′,
– All p ∈ L belong to a grid cube ⊆ L′.
These two points get rid of issue (a) above: all points in L are now reached by the walk. Moreover, we can easily get rid of the degree issue by adding self-loops. This guarantees that the stationary distribution will be uniform on grid points in L, clearing (b). There remains (c), the issue of uneven intersection between grid cubes and L. For this we do the following:
– Do a random walk on $\delta\mathbb{Z}^n \cap L'$ as described before.
– Arrive at a random grid point p. Choose a random vector $q \in B_\infty(0, 1)$ and output the point $p + \delta q$ if it is in L. Otherwise, restart the walk.
(A rough sketch of this sampler, under illustrative assumptions, appears right after this list.)
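The sketch below is my own illustration, with several simplifications: `in_body` is a placeholder membership oracle for the smoothed body L′, the lazy step stands in for the self-loops mentioned above, and the parameters in the toy example are arbitrary.

```python
import numpy as np

def grid_walk_sample(in_body, n, delta, start, num_steps):
    """Random walk on the grid delta*Z^n restricted to the (smoothed) body.
    Returns an approximately uniform point of the body, or None if the final
    smoothing step leaves the body (meaning: restart the walk)."""
    x = np.array(start, dtype=float)
    for _ in range(num_steps):
        if np.random.rand() < 0.5:            # self-loop (plays the role of the added self-loops)
            continue
        i = np.random.randint(n)              # pick one of the 2n grid neighbors
        step = delta if np.random.rand() < 0.5 else -delta
        y = x.copy()
        y[i] += step
        if in_body(y):                        # only move if the neighbor is inside the body
            x = y
    q = np.random.uniform(-1.0, 1.0, size=n)  # q in the ball B_infinity(0, 1)
    z = x + delta * q                         # smooth over the grid cube around the endpoint
    return z if in_body(z) else None          # None signals "restart the walk"

# Toy example: the body is the box [-2, 2]^3
sample = grid_walk_sample(lambda z: bool(np.all(np.abs(z) <= 2.0)),
                          n=3, delta=0.25, start=np.zeros(3), num_steps=5000)
print(sample)
```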
If we look at a body L that is a very thin needle, we see that for S that cuts across half
the needle the volume will be roughly (d/2) times the area, so the estimate provided
in the theorem is essentially optimal. The proof of the theorem is not hard but it does
involve quite a bit of re-arranging to argue that the needle is indeed the “worst-case
scenario”, and we’ll skip it.
In our setting we have $B_\infty(0,1) \subseteq K \subseteq L \subseteq L' = (1+\delta\sqrt{n})L \subseteq (1+\delta\sqrt{n})B_\infty(0, n^2)$. So, our grid has sides of length at most $\lesssim \frac{2(1+\delta\sqrt{n})n^2}{\delta} = O(n^4)$, and the diameter is $O(n^4)$. Thus the isoperimetry theorem implies that our random walk will mix in polynomial time.
a graph on V and select the points with a random walk. What types of graphs would be good for this procedure? We could throw in all edges; this would optimize the mixing time. But then at any step we need to choose amongst N possible neighbors, and this can be a complicated task, especially in situations such as we encountered in the last lecture, where determining if a point is a neighbor requires some computation (in that case, we had to make a call to the membership oracle, which could be expensive). To reduce the number of choices that the random walk has to make at each step, we want to minimize the degree. But we don't want to sacrifice the mixing time either. Expander graphs are often used because they reach the optimal tradeoff, achieving the best possible mixing time (or more precisely, the largest possible second eigenvalue) for a fixed degree.
Definition 4.28. Given d ∈ N and γ ∈ (0, 1), a one-sided (resp. two-sided) (d, γ) spectral expander is a graph G such that:
• G is d-regular.
• $|\lambda_2(\mathcal{L}) - 1| \leq \gamma$ (resp. $\forall i \geq 2$, $|\lambda_i(\mathcal{L}) - 1| \leq \gamma$), where $\mathcal{L}$ is the normalized Laplacian of G.
Two-sided expanders have all their eigenvalues close to 1, except λ1 = 0; one-sided ex-
panders can also have larger eigenvalues, up to the maximum possible of 2 (and in particular
they can be bipartite).
Remark 4.29. Using the mixing lemma from last lecture we find that the lazy random walk on expanders mixes fast:
$$\tau = O\Big(\frac{\log n}{1-\gamma}\Big).$$
For the case of expanders (and especially two-sided expanders, which are guaranteed not to be bipartite) we will usually run a normal random walk with walk matrix $W = AD^{-1}$, rather than the lazy random walk with walk matrix $W = I - \frac{1}{2}D^{1/2}\mathcal{L}D^{-1/2}$ that we considered previously.
Definition 4.31. A (d, γv ) vertex expander is a graph G such that:
Therefore $d^2(n-1)\gamma^2 \geq nd - d^2$, i.e.
$$\gamma^2 \geq \frac{n}{d(n-1)} - \frac{1}{n-1}.$$
For large n the last term is very small, and we get the bound $\gamma \geq (1 - o_n(1))\sqrt{1/d}$, which is off by just a factor 2.
Ramanujan graphs are graphs that achieve the optimal expansion for a given degree:
Unfortunately random graphs are not so useful in practice because they are just that: random. In particular, working with a random graph requires first computing the whole graph, then storing it in memory and performing the random walk. But recall that for our typical applications of expanders we are thinking of working with potentially large graphs for which we'd like to be able to compute the neighborhood structure locally very efficiently. Such a construction was presented by Margulis; efficiency aside, it is the first explicit construction of a good (meaning that both d and γ are constants independent of the size of the graph) family of expanders:
Example 4.35 (Margulis '73). Take $V = \mathbb{Z}_m \times \mathbb{Z}_m$ for some integer m. For each vertex $(x, y) \in V$ connect it to the following eight vertices:
$$N((x,y)) = \big\{(x,\, y\pm x),\; (x,\, y\pm(x+1)),\; (x\pm y,\, y),\; (x\pm(y+1),\, y)\big\}.$$
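The construction is completely explicit and local; the following few lines (my own illustration) list the eight neighbors of a given vertex, which is all a random walk on this graph ever needs to compute.

```python
def margulis_neighbors(x, y, m):
    """The eight neighbors of (x, y) in the Margulis graph on Z_m x Z_m (Example 4.35)."""
    return [(x % m, (y + x) % m),        (x % m, (y - x) % m),
            (x % m, (y + x + 1) % m),    (x % m, (y - x - 1) % m),
            ((x + y) % m, y % m),        ((x - y) % m, y % m),
            ((x + y + 1) % m, y % m),    ((x - y - 1) % m, y % m)]

print(margulis_neighbors(2, 3, m=5))
```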
Theorem 4.36 (Margulis). The graph given above is an (8, γ) two-sided spectral expander for some γ < 1 independent of m.
The construction of the graph is very simple, but the proof of the theorem is very difficult.
This construction also provides a very fast mixing time: it provides a way to mix a 2-dimensional m × m grid in time O(log m), whereas as we've seen the regular random walk would require time $O(m^2)$. This only requires us to double the degree and throw in a few "long-distance hops".
The first construction of explicit Ramanujan expanders was given by Lubotzky, Phillips and Sarnak:
Theorem 4.37 (LPS '88). For every n = p + 1 where:
• p is a prime such that p ≡ 1 (mod 4),
• d = q^m + 1, q prime, m an integer,
there exists a $\big(d,\ \gamma = 2\sqrt{\tfrac{1}{d} - \tfrac{1}{d^2}}\big)$ spectral expander.
Very recent works by Marcus, Spielman and Srivastava give explicit constructions of
bipartite expanders for every possible degree d and size n. It is still an open problem whether
(non-bipartite) Ramanujan expanders exist for all possible degrees.
4.7 Derandomization
A couple lectures ago we saw a randomized algorithm that could efficiently solve a problem,
volume estimation, that we also argued was too hard to solve deterministically. One might
ask whether it is still possible to remove the randomness from such algorithms to obtain
efficient deterministic algorithms. The process of reducing or eliminating randomness from
an algorithm is called derandomization. The question of derandomizing all polynomial-
time algorithms is the question as to whether P = BPP, where:
Definition 4.38. P is the class of problems that can be solved deterministically in polynomial time. BPP is the class of problems for which there is a randomized polynomial-time algorithm that makes the correct decision for every input with probability ≥ 2/3.
In the last lecture with volume estimation we seemed to show that for a specific problem in BPP it was impossible to have an equivalent deterministic algorithm; does this mean P ≠ BPP?
However the impossibility result relied on the assumption that the only access to the convex
set was through a membership/separation oracle. In “real life” it will in general be the case
that other details of the problem are available, and may be used to construct a deterministic
polynomial-time algorithm. This is what makes proving separations of complexity classes
difficult!
Definition 4.39. A pseudorandom generator (PRG) is a function $g : \{0,1\}^s \to \{0,1\}^n$, where n ≥ s and s is called the seed length. We say that a PRG g ε-fools a class of functions $\mathcal{C} \subseteq \{f : \{0,1\}^n \to \{0,1\}\}$ if for all $f \in \mathcal{C}$:
$$\Big|\Pr_{x\in\{0,1\}^n}\big[f(x) = 1\big] - \Pr_{y\in\{0,1\}^s}\big[f(g(y)) = 1\big]\Big| \leq \varepsilon,$$
where both probabilities are taken over a uniformly random choice of x, y respectively. (Intuitively, this means that from the point of view of any function f ∈ $\mathcal{C}$, the output of g looks random.)
Here is how we might use this to prove that P = BPP. Let $\mathcal{C}$ be the class of all functions that can be computed by a polynomial-time randomized algorithm with success probability at least 2/3. Such algorithms can be understood as functions f of two variables, the input x to the problem and the randomness r. Fixing all possible inputs x of a certain length gives us a large class of functions $\mathcal{C}'$ which are functions of the randomness only. If we design a PRG that ε-fools the class $\mathcal{C}'$ with ε < 1/3 and s = O(log n), then we could derandomize BPP by trying all $2^s = \mathrm{poly}(n)$ possible seeds to our PRG and computing a good estimate of the probability that f would accept on any input.
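Schematically, the derandomization described in this paragraph looks as follows (a toy sketch; `g` is a hypothetical PRG with seed length s and `f(x, r)` a hypothetical 0/1-valued randomized algorithm taking its random bits as the second argument).

```python
from itertools import product

def derandomized(f, x, g, s):
    """Run the randomized algorithm f(x, r) on the PRG output g(seed) for every one of
    the 2^s seeds, and accept iff a majority of seeds lead to acceptance."""
    accept = 0
    for seed in product([0, 1], repeat=s):
        r = g(seed)                 # pseudorandom bits fed to f in place of true randomness
        accept += f(x, r)
    return 2 * accept > 2 ** s      # majority vote over all seeds
```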
Pseudorandom generators are hard to construct. Nisan and Wigderson in 94 proved the
following “hardness vs randomness” trade-off. Here we state a strong form of their theorem
that incorporates a worst-case to average-case reduction by Impagliazzo and Wigderson.
Theorem 4.40 (NW'94, IW'97). Suppose there is a language L in EXP and δ > 0 such that for all n large enough, the minimal size of a Boolean circuit that computes $L_n$ is at least $2^{\delta n}$. Then there is a family of generators $g_m : \{0,1\}^{O(\log m)} \to \{0,1\}^m$ that are computable in poly(m) time and that 1/8-fool the class of functions computable by circuits of size at most 2m (in particular, P = BPP).
Theorem 4.41. Given any BPP algorithm which requires r random bits and has error ≤ 1/100, we can construct an algorithm solving the same problem with error $\leq \big(\frac{2}{\sqrt{5}}\big)^t$ and using only r + 9t random bits.
The idea for the algorithm is very simple. We fix a (d, γ) expander G on $V = \{0,1\}^r$ with d ≤ 400 and γ ≤ 1/10. We haven't seen how to construct such a thing but I promise you that it exists, and it's not that hard to get. We then pick a random vertex to start from using r bits, and for each of t iterations we perform a random walk step to a neighbor using $\log_2(400) \approx 9$ bits. For each of the vertices traversed we run the algorithm with the corresponding bits and take the majority outcome at the end.
Note that we do not need to actually compute the whole graph (which would have size exponential in r); we only need to be able to access a small number of vertices and their neighbors. The hope is that running the BPP algorithm on the highly correlated pseudo-random bits generated by this procedure will give similar results to uncorrelated random bits. This is not trivial, and in particular it does not follow from the analysis of the mixing time given in the previous lecture. That analysis only shows that mixing happens in $O(\log 2^r) = O(r)$ steps, but here t could be much smaller than r.
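Schematically, the procedure looks as follows (a sketch only; `expander_neighbor(v, j)` is a hypothetical function returning the j-th neighbor of vertex v in the fixed (d, γ) expander on {0,1}^r, and `alg(x, bits)` is the given BPP algorithm returning 0 or 1).

```python
import random

def amplified(alg, x, r, t, expander_neighbor, d=400):
    """Run a BPP algorithm `alg(x, bits)` along a length-t walk on a degree-d expander
    over {0,1}^r, using roughly r + 9t random bits, and return the majority answer."""
    v = random.getrandbits(r)              # r truly random bits choose the starting vertex
    outcomes = [alg(x, v)]
    for _ in range(t):
        j = random.randrange(d)            # about 9 fresh random bits choose a neighbor index
        v = expander_neighbor(v, j)
        outcomes.append(alg(x, v))
    return sum(outcomes) > len(outcomes) // 2   # majority over the t + 1 runs
```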
Proof. Given an input x, we know that
$$\Pr_{y\in\{0,1\}^r}\big(\text{Alg fails on input } x \text{ and randomness } y\big) \leq \frac{1}{100}.$$
Fix an input x, and let $X = \{y : \text{Alg fails on } y\}$, $Y = \{0,1\}^r \setminus X$, $v_0, v_1, \ldots, v_t$ the vertices selected by the random walk, and $S = \{i : v_i \in X\}$. Note that X is at most a fraction 1/100 of the graph and that since we are taking a majority at the end, $\Pr(\text{fail}) = \Pr(|S| \geq t/2)$.
We will use a regular "non-lazy" random walk, as there is no point in staying at a vertex. This gives us a random walk matrix $W = AD^{-1} = \frac{1}{d}A$ since the graph is d-regular.
Let $D_X$ be the diagonal matrix where entry i is 1 if $i \in X$ and 0 otherwise, and $D_Y$ the diagonal matrix where entry i is 1 if $i \in Y$ and 0 otherwise, so $D_X + D_Y = I$. Given a certain set of designated walk steps $R \subseteq \{0, \ldots, t\}$, the probability that all such steps are errors and all other steps are correct is
$$\Pr(R = S) = \frac{1}{n}\,\mathbf{1}^T D_t W \cdots D_1 W D_0\, \mathbf{1}, \qquad\text{where } D_i = \begin{cases} D_X & \text{if } i\in R, \\ D_Y & \text{if } i\notin R.\end{cases}$$
To bound each factor we bound the operator norm $\|D_X W\|$. Decompose an arbitrary $x \in \mathbb{R}^n$ as $x = \alpha\mathbf{1} + y$ with $y \perp \mathbf{1}$; then
$$\|x\|^2 = \alpha^2 n + \|y\|^2,$$
and
$$D_X W x = \alpha D_X W \mathbf{1} + D_X W y.$$
To bound the first term, use that $W\mathbf{1} = \mathbf{1}$ and so
$$\|\alpha D_X W \mathbf{1}\| = |\alpha|\sqrt{|X|} \leq \frac{\|x\|}{\sqrt{n}}\sqrt{|X|} \leq \frac{\|x\|}{10}.$$
For the second term,
$$\|D_X W y\| \leq \|W y\| \leq \frac{1}{10}\|y\| \leq \frac{\|x\|}{10},$$
where the second inequality follows since y ⊥ 1 (and G is an expander with γ ≤ 1/10). Thus we have shown that $\|D_X W\| \leq \frac{1}{5}$.
Finally, putting everything together,
$$\Pr(\text{fail}) = \Pr\Big(|S| \geq \frac{t}{2}\Big) = \sum_{R : |R| \geq t/2} \Pr[R = S] \leq \sum_{R : |R| \geq t/2} \|D_X W\|^{|R|} \leq 2^t \Big(\frac{1}{5}\Big)^{t/2} = \Big(\frac{2}{\sqrt{5}}\Big)^t.$$
Definition 4.42 (Polynomial Identity Testing). Given a (large) finite field F and an n-variate polynomial $p \in \mathbb{F}[x_1, \ldots, x_n]$ provided as an arithmetic circuit (e.g. $(x_1 + 3x_2 - x_3)(3x_1 + x_4 - 1)$), determine whether p ≡ 0, i.e. whether all coefficients of p when fully expanded are zero.
To solve the above problem we could simply multiply all the terms out and check whether
the final coefficient of every term is 0, but that could take exponential time in the size of the
circuit specifying p. No known deterministic algorithm can solve PIT in polynomial time,
but there is a simple randomized algorithm that can.
Lemma 4.43 (Schwartz-Zippel). Given a non-zero $p \in \mathbb{F}[x_1, \ldots, x_n]$ with $\deg(p) \leq d$, let $S \subseteq \mathbb{F}$ and $(s_1, \ldots, s_n)$ be n points selected independently and uniformly at random from S. Then:
$$\Pr\big(p(s_1, \ldots, s_n) = 0\big) \leq \frac{d}{|S|}.$$
Proof. The proof is by induction on n. For the base case, n = 1, p is a polynomial in one
variable and thus has at most d roots. Hence Pr(p(s1 ) = 0) ≤ d/|S|.
For the inductive step, we let k be the largest degree of $x_1$ in p and write
$$p(x_1, \ldots, x_n) = p_1(x_2, \ldots, x_n)\, x_1^k + p_2(x_1, \ldots, x_n),$$
where the total degree of $p_1$ is at most d − k and the degree of $x_1$ in $p_2$ is strictly less than k. For the purposes of analysis, we can assume that $s_2, \ldots, s_n$ are chosen first. Then, we let E be the event that $p_1(s_2, \ldots, s_n) = 0$. There are two cases:
• Case 1: E happens. From the induction hypothesis applied to $p_1$ (a non-zero polynomial in n − 1 variables of degree at most d − k), we know that $\Pr(E) \leq (d-k)/|S|$.
• Case 2: E does not happen. In this case, we let $p'$ be the polynomial in the one variable $x_1$ that remains after $x_2 = s_2, \ldots, x_n = s_n$ are substituted in $p(x_1, \ldots, x_n)$. Since $p_1(s_2, \ldots, s_n) \neq 0$, the coefficient of $x_1^k$ is non-zero, so $p'$ is a non-zero polynomial of degree k in one variable. It thus has at most k roots, so $\Pr\big(p'(s_1) = p(s_1, \ldots, s_n) = 0 \mid \neg E\big) \leq k/|S|$.
Putting the two cases together, we have
$$\Pr\big(p(s_1,\ldots,s_n) = 0\big) = \Pr\big(p(s_1,\ldots,s_n)=0 \mid E\big)\Pr(E) + \Pr\big(p(s_1,\ldots,s_n)=0 \mid \neg E\big)\Pr(\neg E) \leq (d-k)/|S| + k/|S| = d/|S|.$$
Thus, if we take the set S to have cardinality at least twice the degree of our polynomial, we can bound the probability of error by 1/2. This can be reduced to any desired small number by repeated trials, as usual. Note that the algorithm also works over finite fields, provided the field size is larger than the degree of the polynomial. Otherwise, the algorithm could not possibly work: for example, the polynomial $p(x) = x^3 - x$ is not the zero polynomial, but it evaluates to 0 at every point of $\mathbb{F}_3$.
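Here is a minimal sketch of the resulting randomized identity test (my own code, not from the notes; the polynomial is given as a Python callable standing in for the arithmetic circuit, the evaluation is over the integers for simplicity, and the degree bound d is assumed known).

```python
import random

def is_identically_zero(p, n, d, trials=20):
    """Randomized identity test via Schwartz-Zippel.
    p: a callable in n variables (standing in for the circuit); d: a bound on its total degree.
    If p is non-zero, each trial detects this with probability >= 1 - d/|S| = 1/2."""
    S = range(2 * d)                                   # |S| = 2d, so error per trial <= 1/2
    for _ in range(trials):
        point = [random.choice(S) for _ in range(n)]
        if p(*point) != 0:
            return False                               # a non-zero evaluation certifies p != 0
    return True                                        # zero with probability >= 1 - 2**(-trials)

# Example from Definition 4.42: (x1 + 3*x2 - x3) * (3*x1 + x4 - 1)
p = lambda x1, x2, x3, x4: (x1 + 3 * x2 - x3) * (3 * x1 + x4 - 1)
print(is_identically_zero(p, n=4, d=2))   # False
```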
Is there an efficient deterministic algorithm for identity testing? A rather devastating negative result was proved by Kabanets and Impagliazzo, who showed that if there exists a deterministic polynomial time algorithm for checking polynomial identities, then either:
• NEXP does not have polynomial size circuits; or
• the Permanent does not have polynomial-size arithmetic circuits.
This means that an efficient derandomization of the above Schwartz-Zippel algorithm (or
indeed, any efficient deterministic algorithm for identity testing) would necessarily entail a
major breakthrough in complexity theory.
Definition 4.44 (Tutte matrix). The Tutte matrix AG corresponding to the graph G is the
n × n matrix [aij ] such that aij is a variable xij if (i, j) ∈ E, and 0 otherwise.
Claim. (Here G is bipartite with n vertices on each side, rows of $A_G$ indexing one side and columns the other.) $\det(A_G) \not\equiv 0$ if and only if G contains a perfect matching.
Proof. By definition, $\det(A_G) = \sum_\sigma \mathrm{sgn}(\sigma)\prod_{i=1}^n a_{i\sigma(i)}$, where the sum is over all permutations σ of {1, . . . , n}. Note that each monomial in this sum corresponds to a possible perfect matching in G, and that the monomial will be non-zero if and only if the corresponding matching is present in G. Moreover, every pair of monomials differs in at least one (actually, at least two) variables, so there can be no cancellations between monomials. This implies that $\det(A_G) \not\equiv 0$ iff G contains a perfect matching.
The above Claim immediately yields an efficient algorithm for testing whether G contains a perfect matching: simply run the Schwartz-Zippel algorithm on $\det(A_G)$, which is a polynomial in $n^2$ variables of degree n. Note that the determinant can be computed in $O(n^3)$ time by Gaussian elimination. Moreover, the algorithm can be efficiently parallelized using the standard fact that an n × n determinant can be computed in $O(\log^2 n)$ time on $O(n^{3.5})$ processors.
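Here is a sketch of the resulting matching test for the bipartite case (my own code, not from the notes): random field elements are substituted for the variables $x_{ij}$, and the determinant is computed over a large prime field by Gaussian elimination, so a non-zero value certifies a perfect matching with high probability.

```python
import random

def det_mod_p(M, p):
    """Determinant of a square integer matrix over GF(p), by Gaussian elimination."""
    M = [row[:] for row in M]
    n = len(M)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] % p != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det = (det * M[col][col]) % p
        inv = pow(M[col][col], p - 2, p)         # modular inverse, valid since p is prime
        for r in range(col + 1, n):
            factor = (M[r][col] * inv) % p
            for c in range(col, n):
                M[r][c] = (M[r][c] - factor * M[col][c]) % p
    return det % p

def has_perfect_matching_bipartite(edges, n, p=2**31 - 1, trials=10):
    """Bipartite G with left/right vertex sets {0,...,n-1}; edges are (left, right) pairs.
    Substitutes random field elements for the x_ij and applies Schwartz-Zippel to det(A_G)."""
    for _ in range(trials):
        M = [[0] * n for _ in range(n)]
        for (i, j) in edges:
            M[i][j] = random.randrange(1, p)
        if det_mod_p(M, p) != 0:
            return True                          # non-zero determinant certifies a matching
    return False                                 # with high probability, no perfect matching

print(has_perfect_matching_bipartite([(0, 0), (0, 1), (1, 1)], n=2))   # True
```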
Exercise 5. Generalize the above to a non-bipartite graph G. For this, you will need the skew-symmetric matrix $A_G = [a_{ij}]$ defined as:
$$a_{ij} = \begin{cases} x_{ij} & \text{if } (i,j)\in E \text{ and } i < j, \\ -x_{ij} & \text{if } (i,j)\in E \text{ and } i > j, \\ 0 & \text{otherwise.}\end{cases}$$
[Hint: You should show that the above Claim still holds, with this modified definition
of AG . This requires a bit more care than in the bipartite case, because the monomials in
det(AG ) do not necessarily correspond to perfect matchings, but to cycle covers in G. In this
case some cancellations will occur.]