An O(log n/ log log n)-approximation Algorithm for the Asymmetric Traveling Salesman Problem
Abstract

We consider the Asymmetric Traveling Salesman problem for costs satisfying the triangle inequality. We derive a randomized algorithm which delivers a solution within a factor O(log n/ log log n) of the optimum with high probability.

1 Introduction

In the Asymmetric Traveling Salesman problem (ATSP) we are given a set V of n points and a cost function c : V × V → R+. The goal is to find a minimum cost tour that visits every vertex at least once. Since we can replace every arc (u, v) in the tour with the shortest path from u to v, we can assume c satisfies the triangle inequality.

When the costs are symmetric, i.e. when for every u, v ∈ V, c(u, v) = c(v, u), there is a factor 3/2 approximation algorithm due to Christofides [8]. This algorithm first finds a minimum cost spanning tree T on V, then finds the minimum cost Eulerian augmentation of that tree, and finally shortcuts the corresponding Eulerian walk into a tour.

In this paper, we give an O(log n/ log log n) approximation algorithm for the general asymmetric version. This factor finally breaks the Θ(log n) barrier from Frieze et al. [12] and subsequent improvements [3, 16, 11]. Our approach for ATSP has similarities with Christofides' algorithm; we first construct a spanning tree with special properties. Then we find a minimum cost Eulerian augmentation of this tree, and finally, shortcut the resulting Eulerian walk. For undirected graphs, being Eulerian means being connected and having even degrees, while for directed graphs it means being (strongly) connected and having the indegree of every vertex equal to its outdegree.

A simple flow argument using Hoffman's circulation theorem [24] shows that if the tree chosen in the first step is "thin" then the cost of the Eulerian augmentation is within a factor of the "thinness" of the (asymmetric) Held-Karp linear programming (LP) relaxation value (OPT_HK) [17]. This flow argument works irrespective of the actual directions of the (directed) arcs corresponding to the (undirected) edges of the tree. Roughly speaking, a thin tree with respect to the optimum solution x∗ of the Held-Karp relaxation is a spanning tree that, for every cut, contains a small multiple (the thinness) of the corresponding value of x∗ in this cut when the directions of the arcs are disregarded.

A key step of our algorithm is to find a thin tree of small cost compared to the LP relaxation value OPT_HK. For this purpose, we consider the distribution with maximum entropy among all those with marginal probabilities obtained from the symmetrized LP solution (scaled by 1 − 1/n). From the optimality conditions of a convex programming formulation, we derive that this maximum entropy distribution corresponds to sampling a tree T with probability proportional to ∏_{e∈T} λ_e for appropriately defined λ_e's for e ∈ E. We develop a simple iterative algorithm for approximately computing these λ_e's efficiently. An important property of this scheme is that the events corresponding to edges being present in the sampled tree are negatively correlated. This means that the well-known Chernoff bound for the independent setting still holds; see Panconesi and Srinivasan [23]. The proof of the O(log n/ log log n) thinness of the sampled tree is based on this tail bound.

The high level description of our algorithm can be found in Figure 1. The proof of our main Theorem 6.3 also gives a more formal overview of the algorithm.

∗ Stanford University, Department of Management Science and Engineering. [email protected].
† MIT, Department of Mathematics. Supported by NSF contract CCF-0829878 and by ONR grant N00014-05-1-0148. [email protected].
‡ MIT, Computer Science and Artificial Intelligence Laboratory. Supported by Fulbright Science and Technology Award, by NSF contract CCF-0829878, and by ONR grant N00014-05-1-0148. [email protected].
§ Stanford University, Department of Management Science and Engineering. [email protected].
¶ Stanford University, Department of Management Science and Engineering. [email protected].

2 Notation

Before describing our approximation algorithm for ATSP in detail, we need to introduce some notation. Throughout this paper, we use a = (u, v) to denote the arc (directed edge) from u to v and e = {u, v} for an undirected edge. Also we use A (resp. E) for the set of arcs (resp. edges) in a directed (resp. undirected) graph. For a given function f : A → R, the cost of f is defined as c(f) = ∑_{a∈A} c(a) f(a).
ALGORITHM:
1. Solve the Held-Karp LP relaxation of the ATSP instance to get an optimum extreme point solution x∗. [See LP (3.1).] Define z∗ by (3.5); z∗ can be interpreted as the marginal probabilities on the edges of a probability distribution on spanning trees.
2. Sample Θ(log n) spanning trees T_j's from a distribution p̃(·) that approximates the maximum entropy distribution among all the distributions that approximately preserve the marginal probabilities imposed by z∗. Let T∗ be the tree with minimum (undirected) cost among all the sampled trees. [See Sections 4 and 5.]
3. Orient each edge of T∗ so as to minimize its cost. Find a minimum cost integral circulation that contains the oriented tree T~∗. Shortcut this multigraph and output the resulting Hamiltonian cycle.
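As a reading aid only, the following Python skeleton mirrors the three steps above. It is a sketch under assumptions: the callables solve_held_karp, sample_spanning_tree, min_cost_integral_circulation, and shortcut_eulerian_walk are hypothetical placeholders (an LP solver, the sampling procedure of Section 4.2, a minimum cost flow routine, and the Eulerian shortcutting step), and taking the cheaper of the two orientations as the "undirected" cost of an edge is an assumption made for this sketch.

import math

def approx_atsp(V, c, solve_held_karp, sample_spanning_tree,
                min_cost_integral_circulation, shortcut_eulerian_walk):
    """Skeleton of the algorithm of Figure 1; the four callables are
    hypothetical helpers supplied by the caller, not code from the paper."""
    n = len(V)
    trials = max(1, math.ceil(math.log(n)))                      # Theta(log n) samples

    # Step 1: Held-Karp LP and symmetrized, scaled marginals z*.
    x = solve_held_karp(V, c)                                    # dict arc -> value
    z = {frozenset((u, v)): (1.0 - 1.0 / n) * (x[(u, v)] + x[(v, u)])
         for u in V for v in V if u != v}

    # Step 2: sample Theta(log n) trees, keep the one of minimum undirected cost.
    best_tree, best_cost = None, float("inf")
    for _ in range(trials):
        T = sample_spanning_tree(z)                              # list of frozenset edges
        cost_T = sum(min(c[(u, v)], c[(v, u)]) for (u, v) in map(tuple, T))
        if cost_T < best_cost:
            best_tree, best_cost = T, cost_T

    # Step 3: orient each tree edge the cheaper way, find a minimum cost
    # integral circulation containing the oriented tree, and shortcut.
    oriented = [(u, v) if c[(u, v)] <= c[(v, u)] else (v, u)
                for (u, v) in map(tuple, best_tree)]
    f = min_cost_integral_circulation(V, c, oriented)
    return shortcut_eulerian_walk(V, c, f)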
(3.7)   z(E(U)) ≤ |U| − 1   ∀U ⊂ V,
(3.8)   z_e ≥ 0             ∀e ∈ E.}

The relative interior of P corresponds to those z ∈ P satisfying all inequalities (3.7) and (3.8) strictly.

Clearly, z∗ satisfies (3.6) since:

  ∀v ∈ V, x∗(δ+(v)) = 1   ⇒   x∗(A) = n = |V|   ⇒   z∗(E) = n − 1 = |V| − 1.

Consider any set U ⊂ V. We have

  ∑_{v∈U} x∗(δ+(v)) = |U| = x∗(A(U)) + x∗(δ+(U)) ≥ x∗(A(U)) + 1.

Since x∗ satisfies (3.2) and (3.3), we have

  z∗(E(U)) = ((n − 1)/n) x∗(A(U)) < x∗(A(U)) ≤ |U| − 1,

showing that z∗ satisfies (3.7) strictly. Since E is the support of z∗, the inequalities (3.8) are satisfied strictly as well, and hence z∗ lies in the relative interior of P.

polytope). From the optimality conditions for this convex program and its dual, we show that this distribution generates a λ-random spanning tree for some vector λ ∈ R^{|E|}, where any tree T is output with probability proportional to ∏_{e∈T} λ_e.

Section 4.3 explains the main implication of such distributions. The events corresponding to the edges of G being included in a sampled λ-random tree are negatively correlated. This enables us to use Chernoff bounds on such events. We use these tail bounds to establish the thinness of a sampled tree. (Roughly speaking, a tree is said to be thin if the number of its edges in each cut is not much higher than its expected value; see Section 5 for a formal definition of thinness.)

It is possible to approximately find the γ_e's efficiently. In fact, we have a rather simple and iterative algorithm that, after a polynomial number of iterations, finds approximate λ̃_e's with new marginal probabilities z̃_e, where for all edges e, z̃_e ≤ (1 + ε)z_e. We postpone the description of this algorithm and its analysis to Section 7. Instead, in Section 4.2 we show how to efficiently sample a tree from such a distribution given any vector λ.
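The iterative computation of the γ̃_e's is deferred to Section 7 and is not reproduced in this excerpt. Purely as an illustration of a marginal-matching loop of this kind, here is a hypothetical Python sketch: the helper names, the target value (1 + ε/2)z_e, and the closed-form update are our assumptions rather than the paper's algorithm; the marginals q_e(γ) are computed with the effective-resistance formula discussed in Section 4.2.

import numpy as np

def spanning_tree_marginals(n, edges, gamma):
    """q_e(gamma) = Pr[e in T] when Pr[T] is proportional to exp(sum of gamma_e over T),
    computed as lambda_e times the effective resistance of e with conductances
    lambda_e = exp(gamma_e) (cf. Section 4.2)."""
    lam = np.exp(gamma)
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, lam):
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    Lp = np.linalg.pinv(L)                       # pseudo-inverse of the weighted Laplacian
    q = np.empty(len(edges))
    for k, (u, v) in enumerate(edges):
        q[k] = lam[k] * (Lp[u, u] + Lp[v, v] - 2 * Lp[u, v])
    return q

def fit_gammas(n, edges, z, eps, max_iter=1000):
    """Hypothetical marginal-matching loop: while some marginal exceeds
    (1 + eps) z_e, lower gamma_e so that it drops to (1 + eps/2) z_e."""
    z = np.asarray(z, dtype=float)
    gamma = np.zeros(len(edges))
    for _ in range(max_iter):
        q = spanning_tree_marginals(n, edges, gamma)
        violating = np.flatnonzero(q > (1 + eps) * z)
        if violating.size == 0:
            break
        e = violating[0]
        target = (1 + eps / 2) * z[e]
        # Multiplying lambda_e by t changes the marginal q -> t*q / (t*q + 1 - q);
        # pick t so the new marginal equals `target` (assumes e is not a bridge, q < 1).
        t = target * (1 - q[e]) / (q[e] * (1 - target))
        gamma[e] += np.log(t)
    return gamma

Each pass recomputes all marginals from the weighted Laplacian, so the sketch is polynomial but far from optimized.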
where, as usual, δ(T) = ∑_{e∈T} δ_e. Taking derivatives, we derive that

  0 = 1 + log p(T) − δ(T),

or

  (4.11)   p(T) = e^{δ(T)−1}.

Theorem 4.2. Given z in the spanning tree polytope of G = (V, E) and some ε > 0, values γ̃_e for all e ∈ E can be found, so that if we define the exponential family distribution

  p̃(T) := (1/Z̃) exp( ∑_{e∈T} γ̃_e ),   where Z̃ is the normalizing constant,
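For context, the derivative computation above is the stationarity condition of a maximum entropy program. The following display reconstructs that step under the assumption that the convex program (not shown in this excerpt) maximizes the entropy of p subject to the marginal constraints given by z, with Lagrange multipliers δ_e on those constraints:

  maximize    −∑_T p(T) log p(T)
  subject to   ∑_{T∋e} p(T) = z_e   for every e ∈ E,   p ≥ 0,

with Lagrangian

  L(p, δ) = −∑_T p(T) log p(T) + ∑_{e∈E} δ_e ( ∑_{T∋e} p(T) − z_e ),

and stationarity condition ∂L/∂p(T) = −log p(T) − 1 + δ(T) = 0, which is exactly 0 = 1 + log p(T) − δ(T) and yields (4.11).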
spanning tree in G corresponds to a uniform spanning tree in a multigraph obtained from G by letting the multiplicity of edge e be proportional to λ_e.

Observe that a tree T sampled from an exponential family distribution p(·) as given in Theorem 4.2 is λ-random for λ_e := e^{γ_e} for all e ∈ E. As a result, we can use the tools developed for λ-random trees to obtain an efficient sampling procedure, see Section 4.2, and to derive sharp concentration bounds for the distribution p(·), see Section 4.3.

4.2 Sampling a λ-Random Tree.
There is a host of results (see [15, 20, 9, 1, 5, 25, 19] and references therein) on obtaining polynomial-time algorithms for generating a uniform spanning tree, i.e. a λ-random tree for the case of all λ_e's being equal. Almost all of them can be easily modified to allow arbitrary λ; however, not all of them still guarantee a polynomial running time for general λ_e's. We use for example an iterative approach similar to [20].

The idea is to order the edges e_1, . . . , e_m of G arbitrarily and process them one by one, deciding probabilistically whether to add a given edge to the final tree or to discard it. More precisely, when we process the j-th edge e_j, we decide to add it to a final spanning tree T with probability p_j being the probability that e_j is in a λ-random tree conditioned on the decisions that were made for edges e_1, . . . , e_{j−1} in earlier iterations. Clearly, this procedure generates a λ-random tree, and its running time is polynomial as long as the computation of the probabilities p_j can be done in polynomial time.

To compute these probabilities efficiently we note that, by definition, p_1 = z_{e_1}. Now, if we choose to include e_1 in the tree then:

  p_2 = Pr[e_2 ∈ T | e_1 ∈ T]
      = ( ∑_{T′∋e_1,e_2} ∏_{e∈T′} λ_e ) / ( ∑_{T′∋e_1} ∏_{e∈T′} λ_e )
      = ( ∑_{T′∋e_1,e_2} ∏_{e∈T′\e_1} λ_e ) / ( ∑_{T′∋e_1} ∏_{e∈T′\e_1} λ_e ).

As one can see, the probability that e_2 ∈ T conditioned on the event that e_1 ∈ T is equal to the probability that e_2 is in a λ-random tree of a graph obtained from G by contracting the edge e_1. Similarly, if we choose to discard e_1, the probability p_2 is equal to the probability that e_2 is in a λ-random tree of a graph obtained from G by removing e_1. In general, p_j is equal to the probability that e_j is included in a λ-random tree of a graph obtained from G by contracting all edges that we have already decided to add to the tree, and deleting all edges that we have already decided to discard.

Therefore, the only thing we need in order to compute the p_j's is to be able to compute efficiently, for a given graph G′ with weights λ, the probability p_{G′}[λ, f] that a given edge f is in a λ-random tree of G′. This can be done using Kirchhoff's matrix tree theorem (see [4]). The matrix tree theorem states that ∑_{T∈T} ∏_{e∈T} λ_e for any graph G is equal to the absolute value of any cofactor of the weighted Laplacian L, where

  L_{i,j} =  −λ_e               if e = (i, j) ∈ E,
             ∑_{e∈δ({i})} λ_e   if i = j,
             0                  otherwise.

An alternative approach for computing p_{G′}[λ, f] is to use the fact (see e.g. Ch. 4 of [21]) that p_{G′}[λ, f] is equal to λ_f times the effective resistance of f in G′ treated as an electrical circuit with conductances of edges given by λ. The effective resistance can be expressed by an explicit linear-algebraic formula whose computation boils down to inverting a certain matrix that can be easily derived from the Laplacian of G′ (see e.g. Section 2.4 of [13] for details).

4.3 Negative Correlation and a Concentration Bound.
We derive now the following concentration bound. As discussed in the next section, this bound is instrumental in establishing the thinness of a sampled tree.

Theorem 4.3. For each edge e, let X_e be an indicator random variable associated with the event [e ∈ T], where T is a λ-random tree. Also, for any subset C of the edges of G, define X(C) = ∑_{e∈C} X_e. Then we have

  Pr[ X(C) ≥ (1 + δ)E[X(C)] ] ≤ ( e^{δ} / (1 + δ)^{1+δ} )^{E[X(C)]}.

Usually, when we want to obtain such concentration bounds, we prove that the variables {X_e}_e are independent and use the Chernoff bound. Unfortunately, in our case, the variables {X_e}_e are not independent. However, it is well-known that they are negatively correlated, i.e. for any subset F ⊆ E, Pr[∀_{e∈F} X_e = 1] ≤ ∏_{e∈F} Pr[X_e = 1], see e.g. Ch. 4 of [21].¹

Lemma 4.1. The random variables {X_e}_e are negatively correlated.

Once we have established negative correlation between the X_e's, Theorem 4.3 follows directly from the result of Panconesi and Srinivasan [23] that the upper tail part of the

¹ Lyons and Peres prove this fact only in the case of T being a uniform spanning tree, i.e. when all λ_e's are equal, but Section 4.1 of [21] contains a justification why this proof implies this property also in the case of arbitrary λ_e's. As mentioned before, for rational λ_e's, the main idea is to replace each edge e with Cλ_e edges (for an appropriate choice of C) and consider a uniform spanning tree in the corresponding multigraph. The irrational case follows from a limit argument.
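Returning to Theorem 4.3: as a rough indication of how such a bound enters the thinness argument (the exact parameters of Section 5 are not reproduced in this excerpt), write t = (1 + δ)E[X(C)] with t ≥ E[X(C)]. Since e^{δ E[X(C)]} ≤ e^{t}, the bound of Theorem 4.3 implies

  Pr[ X(C) ≥ t ] ≤ ( e/(1 + δ) )^{t} = ( e · E[X(C)] / t )^{t}.

For instance, if E[X(C)] ≤ 1 and t = β log n/ log log n, the right-hand side is at most n^{−Ω(β)} for large n; inverse-polynomial tail probabilities of this kind are what make a union bound over the cuts handled in the thinness proof possible.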
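To make the sampling procedure of Section 4.2 concrete, here is a minimal Python sketch (our own illustration, not code from the paper). It processes the edges in a fixed order; the inclusion probability of the current edge is computed as λ_e times the effective resistance of e in the current graph, in which already-accepted edges are contracted and rejected edges are deleted, using the pseudo-inverse of the weighted Laplacian. Function and variable names are ours.

import random
import numpy as np

def sample_lambda_random_tree(n, edges, lam, seed=0):
    """Sample a spanning tree T of G = (V, E) with Pr[T] proportional to
    the product of lam[e] over e in T (a lambda-random tree).
    edges: list of (u, v) pairs on vertices 0..n-1; lam: positive weights."""
    rng = random.Random(seed)
    parent = list(range(n))                     # union-find encoding contractions

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def inclusion_probability(j):
        # Current graph: the undecided edges e_j, e_{j+1}, ... with accepted
        # edges contracted (via union-find) and rejected edges deleted
        # (they simply no longer appear, since their indices are < j).
        label = {}
        def idx(x):
            r = find(x)
            if r not in label:
                label[r] = len(label)
            return label[r]
        quotient_edges = []
        for k in range(j, len(edges)):
            a, b = idx(edges[k][0]), idx(edges[k][1])
            if a != b:                          # drop self-loops created by contraction
                quotient_edges.append((a, b, lam[k]))
        m = len(label)
        L = np.zeros((m, m))
        for a, b, w in quotient_edges:
            L[a, a] += w
            L[b, b] += w
            L[a, b] -= w
            L[b, a] -= w
        Lp = np.linalg.pinv(L)                  # pseudo-inverse of the weighted Laplacian
        a, b = idx(edges[j][0]), idx(edges[j][1])
        r_eff = Lp[a, a] + Lp[b, b] - 2 * Lp[a, b]
        return lam[j] * r_eff                   # Pr[e_j in a lambda-random tree of the current graph]

    tree = []
    for j, (u, v) in enumerate(edges):
        if find(u) == find(v):
            continue                            # e_j became a self-loop: it can never be in the tree
        if rng.random() < inclusion_probability(j):
            tree.append((u, v))
            parent[find(u)] = find(v)           # accept: contract {u, v}
        # otherwise reject: e_j is deleted by never being considered again
    return tree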
c(T∗) > 2OPT_HK. This concludes the proof of the theorem.

6 Transforming a Thin Spanning Tree into an Eulerian Walk

As the final step of the algorithm, we show how one can find an Eulerian walk with small cost using a thin tree. After finding such a walk, one can use the metric property to convert this walk into a Hamiltonian cycle of no greater cost (by shortcutting). In particular, the following theorem justifies the definition of thin spanning trees.

Theorem 6.1. Assume that we are given an (α, s)-thin spanning tree T∗ with respect to the LP relaxation x∗. Then we can find a Hamiltonian cycle of cost no more than (2α + s)c(x∗) = (2α + s)OPT_HK in polynomial time.

Before proceeding to the proof of Theorem 6.1, we recall some basic network flow results related to circulations. A function f : A → R is called a circulation if f(δ+(v)) = f(δ−(v)) for each vertex v ∈ V. Hoffman's circulation theorem [24, Theorem 11.2] gives a necessary and sufficient condition for the existence of a circulation subject to lower and upper capacities on arcs.

Theorem 6.2. (Hoffman's circulation theorem) Given lower and upper capacities l, u : A → R, there exists a circulation f satisfying l(a) ≤ f(a) ≤ u(a) for all a ∈ A if and only if
1. l(a) ≤ u(a) for all a ∈ A and
2. for all subsets U ⊂ V, we have l(δ−(U)) ≤ u(δ+(U)).
Furthermore, if l and u are integer-valued, f can be chosen to be integer-valued.

Proof. [Theorem 6.1] We first orient each edge {u, v} of T∗ to arg min{c(a) : a ∈ {(u, v), (v, u)} ∩ A}, and denote the resulting directed tree by T~∗. Observe that by definition of our

cut has the same number of arcs in both directions. As H is weakly connected (as it contains T~∗), it is strongly connected and thus, H is an Eulerian directed multigraph. We can extract an Eulerian walk of H and shortcut it to obtain our Hamiltonian cycle of cost at most c(f∗) since the costs satisfy the triangle inequality.

To complete the proof of Theorem 6.1, it remains to show that c(f∗) ≤ (2α + s)c(x∗). For this purpose, we define

  u(a) =  1 + 2αx∗_a   if a ∈ T~∗,
          2αx∗_a       if a ∉ T~∗.

We claim that there exists a circulation g satisfying l(a) ≤ g(a) ≤ u(a) for every a ∈ A. To prove this claim, we use Hoffman's circulation theorem 6.2. Indeed, by construction, l(a) ≤ u(a) for every a ∈ A; furthermore, Lemma 6.1 below shows that, for every U ⊂ V, we have l(δ−(U)) ≤ u(δ+(U)). Thus the existence of the circulation g is established. Furthermore,

  c(f∗) ≤ c(g) ≤ c(u) = c(T~∗) + 2αc(x∗) ≤ (2α + s)c(x∗),

establishing the bound on the cost of f∗. This completes the proof of Theorem 6.1.

Lemma 6.1. For the capacities l and u as constructed in the proof of Theorem 6.1, the following holds for any subset U ⊂ V: l(δ−(U)) ≤ u(δ+(U)).

Proof. Irrespective of the orientation of T∗ into T~∗, the number of arcs of T~∗ in δ−(U) is at most αz∗(δ(U)) by definition of α-thinness. Thus

  l(δ−(U)) ≤ αz∗(δ(U)) < 2αx∗(δ−(U)),

due to (3.4) and (3.5). On the other hand, we have

  u(δ+(U)) ≥ 2αx∗(δ+(U)) = 2αx∗(δ−(U)) ≥ l(δ−(U)),

where we have used the fact that x∗ itself is a circulation (see (3.4)). The lemma follows.
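A sketch of step 3 of Figure 1, finding a minimum cost integral circulation that contains the oriented tree, is given below. It is our own illustration under stated assumptions: it relies on networkx's min_cost_flow after the standard reduction of arc lower bounds to node demands, it assumes a simple directed graph with nonnegative integer costs (network simplex expects integral data), and it caps every arc at |T~∗| units, which does not cut off an optimal solution when costs are nonnegative. The function name and argument names are ours.

import networkx as nx

def min_cost_circulation_containing_tree(n, arcs, cost, tree_arcs):
    """Minimum cost integral circulation f with f(a) >= 1 for every arc a of
    the oriented tree and f(a) >= 0 otherwise (cf. step 3 of Figure 1).
    arcs: iterable of (u, v) pairs on vertices 0..n-1 (no parallel arcs);
    cost: dict mapping each arc to a nonnegative integer cost;
    tree_arcs: the arcs of the oriented thin tree."""
    tree_arcs = set(tree_arcs)
    cap = max(1, len(tree_arcs))        # enough: some optimum uses at most |tree| units per arc

    # Shift out the lower bound f(a) >= 1 on tree arcs by writing f = 1 + f' there.
    # In networkx, demand(v) is the required (inflow - outflow), so the one
    # mandatory unit leaving u and entering v becomes demand(u) += 1, demand(v) -= 1.
    demand = {v: 0 for v in range(n)}
    for (u, v) in tree_arcs:
        demand[u] += 1
        demand[v] -= 1

    G = nx.DiGraph()
    for v in range(n):
        G.add_node(v, demand=demand[v])
    for (u, v) in arcs:
        lower = 1 if (u, v) in tree_arcs else 0
        G.add_edge(u, v, weight=cost[(u, v)], capacity=cap - lower)

    flow = nx.min_cost_flow(G)          # integral for integral demands, capacities, costs
    return {(u, v): flow[u][v] + (1 if (u, v) in tree_arcs else 0) for (u, v) in arcs}

From the returned circulation one builds the Eulerian multigraph H with f(a) copies of each arc, extracts an Eulerian walk, and shortcuts it into a Hamiltonian cycle, as in the proof of Theorem 6.1.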
  = ( ∑_{T: e∉T} e^{γ(T)} ) / ( e^{−δ} ∑_{T∋e} e^{γ(T)} )
  = e^{δ} ( 1/q_e(γ) − 1 ).

Before bounding the number of iterations, we collect some basic results regarding spanning trees which we need for the proof of the number of iterations.

Lemma 7.2. Let G = (V, E) be a graph with weights γ_e for e ∈ E. Let Q ⊂ E be such that for all f ∈ Q, e ∈ E \ Q, we have γ_f > γ_e + ∆ for some ∆ ≥ 0. Let r be the size of a

2. Any spanning tree T ∈ T_= can be generated by taking the union of any spanning forest F (of cardinality r) of the graph (V, Q) and a spanning tree (of cardinality n − r − 1) of the graph G/Q in which the edges of Q have been contracted.

3. Let T_max be a maximum spanning tree of G with respect to the weights γ(·), i.e. T_max = arg max_{T∈T} γ(T). Then, for any T ∈ T_<, we have γ(T) < γ(T_max) − ∆.

which yields the desired inequality.

We proceed to bounding the number of iterations.

Lemma 7.3. The algorithm executes at most O((1/ε)|E|² [|V| log(|V|) − log(ε z_min)]) iterations of the main

γ_{e∗} < −ετ/(4m). Indeed, there are m edges, and by Lemma 7.1 we know that in each iteration we decrease γ_e of one of these edges by at least ε/4. Thus, we know that, after more than τ iterations, there exists e∗ for which γ_{e∗} is as desired.

Note that we never decrease γ_e for edges e with q_e(·) smaller than (1 + ε)z_e, and Lemma 7.1 shows that reducing γ_f of edge f ≠ e can only increase q_e(·). Therefore, we know that all the edges with γ_e being negative must satisfy q_e ≥ (1 + ε/2)z_e. In other words, all edges e such that q_e < (1 + ε/2)z_e satisfy γ_e = 0. Finally, by a simple averaging argument, we know that ∑_e q_e = n − 1 < (1 + ε/2)(n − 1) = (1 + ε/2) ∑_e z_e. Hence, there exists at least one edge f∗ with

We construct Q as follows. We set threshold values Γ_i = −ετi/(4m²), for i ≥ 0, and define Q_i = {e ∈ E | γ_e ≥ Γ_i}. Let Q = Q_j where j is the first index such that Q_j = Q_{j+1}. Clearly, by construction of Q, property (II) is satisfied. Also, Q is non-empty since f∗ ∈ Q_0 ⊆ Q_j = Q. Finally, by the pigeonhole principle, since we have m different edges, we know that j < m. Thus, for each e ∈ Q we have γ_e > Γ_m = −ετ/(4m). This means that e∗ ∉ Q and thus Q has property (I).

Observe that Q satisfies the hypothesis of Lemma 7.2 with ∆ = ετ/(4m²). Thus, for any T ∈ T_<, we have

in a random spanning tree T̂ of Ĝ, where each tree T̂ is chosen with probability proportional to e^{γ(T̂)}. Since spanning trees of Ĝ have n − r − 1 edges, we have

  (7.15)   ∑_{e∈E\Q} q̂_e = n − r − 1.
It remains to prove that for any e ∉ Q, q_e < q̂_e + εz_min/2. We have that

  q_e = ( ∑_{T∈T: e∈T} e^{γ(T)} ) / ( ∑_{T∈T} e^{γ(T)} )
      = ( ∑_{T∈T_=: e∈T} e^{γ(T)} + ∑_{T∈T_<: e∈T} e^{γ(T)} ) / ( ∑_{T∈T} e^{γ(T)} )
      ≤ ( ∑_{T∈T_=: e∈T} e^{γ(T)} ) / ( ∑_{T∈T_=} e^{γ(T)} ) + ( ∑_{T∈T_<: e∈T} e^{γ(T)} ) / ( ∑_{T∈T} e^{γ(T)} )
  (7.16)  ≤ ( ∑_{T∈T_=: e∈T} e^{γ(T)} ) / ( ∑_{T∈T_=} e^{γ(T)} ) + ( ∑_{T∈T_<: e∈T} e^{γ(T)} ) / e^{γ(T_max)},

edge e - this will enable us to compute all q_e(γ)'s. This can be done using Kirchhoff's matrix tree theorem (see [4]), as discussed in Section 4.2 (with λ_e = e^{γ_e}). Observe that we can bound all entries of the weighted Laplacian matrix in terms of the input size since the proof of Lemma 7.3 actually shows that −ετ/(4|E|) ≤ γ_e ≤ 0 for all e ∈ E and any iteration of the algorithm. Therefore, we can compute these cofactors efficiently, in time polynomial in n, −log z_min and 1/ε. Finally, δ can be computed efficiently from Lemma 7.1.

Acknowledgments. We would like to thank the reviewers for many insightful comments.