Solving LP Relaxations of Large Scale Precedence Constrained Problems
Daniel Bienstock∗ and Mark Zuckerberg†
October, 2009
Abstract
We describe new algorithms for solving linear programming relaxations of very large prece-
dence constrained production scheduling problems. We present theory that motivates a new set
of algorithmic ideas that can be employed on a wide range of problems; on data sets arising in
the mining industry our algorithms prove effective on problems with many millions of variables
and constraints, obtaining provably optimal solutions in a few minutes of computation.
1 Introduction
We consider problems involving the scheduling of jobs over several periods subject to precedence
constraints among the jobs as well as side-constraints. We must choose the subset of jobs to be
performed, and, for each of these jobs, how to perform it, choosing from among a given set of
options (representing facilities or modes of operation). Finally, there are side-constraints to be
satisfied, including period-wise, per-facility processing capacity constraints, among others. There
are standard representations of these problems as (mixed) integer programs.
Our data sets originate in the mining industry, where problems typically have a small number
of side constraints (often well under one hundred), but may contain millions of jobs and tens of
millions of precedences, and may span multiple planning periods. Appropriate formulations
often achieve a small integrality gap in practice; unfortunately, the linear programming relaxations
are far beyond the practical reach of commercial software.
We present a new iterative algorithm for solving the LP relaxation of this problem. The algo-
rithm incorporates, at a low level, ideas from Lagrangian relaxation and column generation, but
is fundamentally based on observations about the underlying combinatorial structure of precedence
constrained, capacitated optimization problems. Rather than updating dual information,
the algorithm uses primal structure gleaned from the solution of subproblems in order to accelerate
convergence. The general version of our ideas should be applicable to a wide class of problems. The
algorithm can be proved to converge to optimality; in practice we have found that even for problems
with millions of variables and tens of millions of constraints, convergence to proved optimality is
usually obtained in under twenty iterations, with each iteration requiring only a few seconds on
current computer hardware.
∗ Department of Industrial Engineering and Operations Research, Columbia University. Research partially funded by a gift from BHP Billiton Ltd. and ONR Award N000140910327.
† Resource and Business Optimization Group Function, BHP Billiton Ltd.
2 Definitions and Preliminaries
2.1 The Precedence Constrained Production Scheduling Problem
Definition 1 We are given a directed graph G = (N, A), where the elements of N represent jobs,
and the arcs A represent precedence relationships among the jobs: for each (i, j) ∈ A, job j can be
performed no later than job i. Denote by F the number of facilities, and by T the number of
scheduling periods.

Let y_{j,t} ∈ {0, 1} represent the choice to process job j in period t, and let x_{j,t,f} ∈ [0, 1] represent the
proportion of job j performed in period t and processed according to processing option, or "facility", f.
The linear programming relaxation of the resulting problem, which we will refer to as PCPSP,
is as follows:

    max   Σ_{j∈N} Σ_{t=1}^{T} Σ_{f=1}^{F} c_{j,t,f} x_{j,t,f}                      (1)

    Subject to:   Σ_{τ=1}^{t} y_{i,τ} ≤ Σ_{τ=1}^{t} y_{j,τ},   ∀ (i, j) ∈ A, 1 ≤ t ≤ T   (2)

                  Dx ≤ d                                                              (3)

                  y_{j,t} = Σ_{f=1}^{F} x_{j,t,f},   ∀ j ∈ N, 1 ≤ t ≤ T               (4)

                  Σ_{t=1}^{T} y_{j,t} ≤ 1,   ∀ j ∈ N                                  (5)

                  x ≥ 0.                                                              (6)
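To make the formulation concrete, the following minimal sketch (our own toy data and variable ordering, not code from the paper) assembles (1)-(6) for a two-job, two-period, one-facility instance and solves the LP relaxation with scipy; the profit and capacity numbers are illustrative assumptions.

    import numpy as np
    from scipy.optimize import linprog

    # Toy PCPSP (illustrative data): N = 2 jobs, T = 2 periods, F = 1 facility;
    # arc (0, 1) means job 1 must be performed no later than job 0.
    N, T, F = 2, 2, 1
    arcs = [(0, 1)]
    profit = {(0, 0, 0): 5.0, (0, 1, 0): 4.0, (1, 0, 0): 3.0, (1, 1, 0): 2.5}
    capacity = [1.0, 1.0]                     # one knapsack constraint per period (Dx <= d)

    def y(j, t):                              # column index of y_{j,t}
        return j * T + t
    def x(j, t, f):                           # column index of x_{j,t,f}
        return N * T + (j * T + t) * F + f
    nvar = N * T + N * T * F

    obj = np.zeros(nvar)
    for (j, t, f), p in profit.items():
        obj[x(j, t, f)] = p                   # objective (1): total profit (to be maximized)

    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    for (i, jj) in arcs:                      # (2): sum_{tau<=t} y_{i,tau} <= sum_{tau<=t} y_{jj,tau}
        for t in range(T):
            row = np.zeros(nvar)
            for tau in range(t + 1):
                row[y(i, tau)] += 1.0
                row[y(jj, tau)] -= 1.0
            A_ub.append(row); b_ub.append(0.0)
    for t in range(T):                        # (3): per-period capacity, here one unit per job
        row = np.zeros(nvar)
        for j in range(N):
            for f in range(F):
                row[x(j, t, f)] = 1.0
        A_ub.append(row); b_ub.append(capacity[t])
    for j in range(N):
        for t in range(T):                    # (4): y_{j,t} = sum_f x_{j,t,f}
            row = np.zeros(nvar)
            row[y(j, t)] = 1.0
            for f in range(F):
                row[x(j, t, f)] = -1.0
            A_eq.append(row); b_eq.append(0.0)
        row = np.zeros(nvar)                  # (5): sum_t y_{j,t} <= 1
        for t in range(T):
            row[y(j, t)] = 1.0
        A_ub.append(row); b_ub.append(1.0)

    res = linprog(-obj, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * nvar, method="highs")
    print("LP value:", -res.fun, " y part:", res.x[:N * T].round(3))

Even on this toy instance one can see how the precedence and capacity constraints force fractional extraction in the LP relaxation.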
For precedence constrained production scheduling problems that occur in the mining industry,
typical dimensions are on the order of millions of jobs, tens of millions of precedences, 10 to 20
scheduling periods, a handful of facilities, and fewer than one hundred side constraints.
These numbers indicate that the number of constraints of the form (2), (4) and (5) can be expected
to be very large.
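As an illustrative count using these magnitudes: with |N| = 10^6 jobs, |A| = 10^7 precedence arcs and T = 15 periods, constraints (2) number |A|·T = 1.5 × 10^8, constraints (4) number |N|·T = 1.5 × 10^7, and constraints (5) number |N| = 10^6, while the side constraints (3) number well under one hundred.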
2.2 Background
2.2.1 The Open Pit Mine Scheduling Problem
The practical motivating problem behind our study is the open pit mine scheduling problem. We
are given a three-dimensional region representing a mine to be exploited; this region is divided into
“blocks” (jobs, from a scheduling perspective) corresponding to units of earth (“cubes”) that can
be extracted in one step. In order for a block to be extracted, the set of blocks located (broadly
speaking) in a cone above it must be extracted first. This gives rise to a set of precedences, i.e. to
a directed graph whose vertices are the blocks, and whose arcs represent the precedences. Finally,
the extraction of a block entails a certain (net) profit or cost.
The problem of selecting which blocks to extract so as to maximize profit can be stated as
follows:

    max { c^T x : x_i ≤ x_j  ∀ (i, j) ∈ A,  x_j ∈ {0, 1}  ∀ j },
where as before A indicates the set of precedences. This is the so-called maximum weight closure
problem – in a directed graph, a closure is a set S of vertices such that there exist no arcs (i, j) with
i ∈ S and j ∈ / S. It can be solved as a minimum s − t cut problem in a related graph of roughly the
same size. See [P76], and also [J68], [Bal70] and [R70]. Further discussion can be found in [HC00],
where the authors note (at the end of Section 3.4) that it can be shown by reduction from max
clique that adding a single cardinality constraint to a max closure problem is enough to make it
NP-hard. For additional related material see [F06], [LG65], [CH03], and references therein.
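As an illustration of this reduction, the following sketch (our own, using networkx and toy block weights) builds the auxiliary network of Picard's construction and recovers a maximum weight closure from a minimum s-t cut.

    import networkx as nx

    def max_weight_closure(weights, arcs):
        """Picard-style reduction (a sketch on toy data): arc (i, j) means that j must
        belong to the closure whenever i does; the closure of maximum total weight is
        read off a minimum s-t cut of an auxiliary capacitated network."""
        G = nx.DiGraph()
        s, t = "s", "t"
        inf = float("inf")
        for v, w in weights.items():
            if w > 0:
                G.add_edge(s, v, capacity=w)      # profitable blocks hang off the source
            elif w < 0:
                G.add_edge(v, t, capacity=-w)     # costly blocks connect to the sink
        for i, j in arcs:
            G.add_edge(i, j, capacity=inf)        # precedence arcs may never be cut
        cut_value, (source_side, _) = nx.minimum_cut(G, s, t)
        closure = source_side - {s}
        return closure, sum(w for w in weights.values() if w > 0) - cut_value

    # A 4-block toy pit: block 3 is ore (profit 10), blocks 0-2 are waste above it.
    weights = {0: -2.0, 1: -3.0, 2: -4.0, 3: 10.0}
    arcs = [(3, 0), (3, 1), (3, 2)]               # extracting 3 requires 0, 1 and 2 first
    print(max_weight_closure(weights, arcs))      # ({0, 1, 2, 3}, 1.0)

For instances of the sizes discussed below one would of course use a specialized solver (Section 5 uses an implementation of Hochbaum's pseudoflow algorithm) rather than a generic max-flow routine.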
The problem we are concerned with here, by contrast, also incorporates production scheduling.
When a block is extracted it will be processed at one of several facilities with different operating
capabilities. The processing of a given block i at a given facility f consumes a certain amount
of processing capacity v_{if} and generates a certain net profit p_{if}. This overall planning problem
spans several time periods; in each period we will have one or more knapsack (capacity) constraints
for each facility. We usually will also have additional, ad-hoc, non-knapsack constraints. In this
version the precedence constraints apply across periods as per (2): if (i, j) ∈ A, then i can be
extracted only in the same period as j or in a later one.
Typically, we need to produce schedules spanning 10 to 20 periods. Additionally, we may have
tens of thousands (or many more) blocks; this can easily make for an optimization problem with
millions of variables and tens of millions of precedence constraints, but with (say) on the order of
one hundred or fewer processing capacity constraints (since the total number of processing facilities
is typically small).
One approach that has been used in practice works with an aggregated version
of this problem and is only required to solve a sequence of linear programs with a number of
variables on the order of the number of aggregates (times the number of periods) in order to come
to a solution of the large LP. Thus, if the number of aggregates is small, the LP can be solved quickly.
Another development that has come to our attention recently is an algorithm by [CEGMR09]
which can solve the LP relaxation of even very large instances of the open pit mine scheduling
problem very efficiently. However, this algorithm is applicable only to problems with a single
processing option, in which the only side constraints are knapsack constraints, with a single such
constraint in each scheduling period. The authors note that more general problems can be relaxed
to this form in order to yield an upper bound on the solution value.
3 Our results
Empirically, it can be observed that formulation (1)-(6) frequently has a small integrality gap. We
present a new algorithm for solving the continuous relaxation of this formulation and its generaliza-
tions. Our algorithm is applicable to problems with an arbitrary number of processing options and
arbitrary side constraints, and it requires no aggregation. On very large, real-world instances our
algorithm proves very efficient.
Our algorithmic developments hinge on three ideas. In order to describe these ideas, we will
first recast PCPSP as a special case of a more general problem, to which these results (and our
solution techniques) apply: the general precedence constrained problem, GPCP, in which we
optimize a linear objective c^T x subject to precedence constraints x_i − x_j ≤ 0 for (i, j) ∈ A,
bound constraints 0 ≤ x ≤ 1, and an arbitrary (typically small) system of side constraints Dx ≤ d.
Lemma 3 Any instance of PCPSP can be reduced to an equivalent instance of GPCP with the
same number of variables and of constraints.

Proof. Consider an instance of PCPSP on G = (N, A), with T time periods, F facilities and side
constraints Dx ≤ d. Note that the y variables can be eliminated. Consider the following system of
inequalities on variables z_{j,t,f} (j ∈ N, 1 ≤ t ≤ T, 1 ≤ f ≤ F):

    z_{j,t,f} − z_{j,t,f+1} ≤ 0,   ∀ j ∈ N, 1 ≤ t ≤ T, 1 ≤ f < F,   (11)
    z_{j,t,F} − z_{j,t+1,1} ≤ 0,   ∀ j ∈ N, 1 ≤ t < T,              (12)
    z_{j,T,F} ≤ 1,                 ∀ j ∈ N,                          (13)
    z_{i,t,F} − z_{j,t,F} ≤ 0,     ∀ (i, j) ∈ A, 1 ≤ t ≤ T,          (14)
    z ≥ 0.                                                           (15)
Given a solution (x, y) to PCPSP, we obtain a solution z to (11)-(15) by setting, for all j, t and f:

    z_{j,t,f} = Σ_{τ=1}^{t−1} Σ_{f′=1}^{F} x_{j,τ,f′} + Σ_{f′=1}^{f} x_{j,t,f′},

and conversely. Thus, for an appropriate system D̄z ≤ d̄ (with the same number of rows as Dx ≤ d)
and objective c̄^T z, PCPSP is equivalent to the linear program:

    min { c̄^T z : D̄z ≤ d̄, and constraints (11)-(15) }.
Note: In Lemma 3, the number of precedences in the instance of GPCP is larger than in the
original instance of PCPSP; nevertheless, we stress that the number of constraints (and variables)
is the same in both instances.
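To make the change of variables concrete, here is a small numpy sketch (our own; the array shapes and toy data are assumptions) of the cumulative transformation used in the proof and of its inverse.

    import numpy as np

    def x_to_z(x):
        """Cumulative transformation of Lemma 3 (sketch): x has shape (N, T, F) and
        z[j, t, f] accumulates x[j, tau, f'] over all slots (tau, f') that precede or
        equal (t, f) in lexicographic order."""
        N, T, F = x.shape
        flat = x.reshape(N, T * F)                 # order slots as (t, f) lexicographically
        return np.cumsum(flat, axis=1).reshape(N, T, F)

    def z_to_x(z):
        """Inverse transformation: consecutive differences along the (t, f) order."""
        N, T, F = z.shape
        flat = z.reshape(N, T * F)
        return np.diff(flat, axis=1, prepend=0.0).reshape(N, T, F)

    x = np.zeros((1, 3, 2))                        # one toy job, T = 3, F = 2
    x[0, 0, 1] = 0.5
    x[0, 2, 0] = 0.5
    z = x_to_z(x)
    assert np.allclose(z_to_x(z), x) and z[0, 2, 1] <= 1.0   # z is monotone and z_{j,T,F} <= 1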
We will now describe ideas that apply to GPCP. First, we have the following remark.

Observation 4 For any vector π ≥ 0 of penalties, consider the Lagrangian relaxation of GPCP in
which the side constraints Dx ≤ d are dualized:

    max   c^T x + π^T (d − Dx)                      (16)
    s.t.  x_i − x_j ≤ 0,   ∀ (i, j) ∈ A             (17)
          0 ≤ x ≤ 1.                                 (18)

This problem retains only the precedence and bound constraints; it is a maximum weight closure
problem and can be solved as a minimum cut.

Note: There is a stronger version of Observation 4 in the specific case of problem PCPSP; namely,
the x variables can be eliminated from the Lagrangian (details: full paper; also see [BZ09]).
Observation 4 suggests that a Lagrangian relaxation algorithm for solving problem GPCP, that
is to say, an algorithm that iterates by solving problems of the form (16-18) for various vectors
π, would enjoy fast individual iterations. This is correct: our experiments confirm that even
extremely large max closure instances can be solved quite fast using the appropriate algorithm
(details below). However, in our experiments we also observed that traditional Lagrangian relaxation
methods (such as subgradient optimization), applied to GPCP, performed quite poorly, requiring
vast numbers of iterations and failing to converge to solutions of the desired accuracy.
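For reference, a bare-bones version of the kind of traditional scheme being discussed might look as follows (a sketch with a textbook diminishing stepsize, not the authors' implementation; solve_closure stands for a max closure / min cut oracle such as the pseudoflow code of Section 5).

    import numpy as np

    def subgradient(c, D, d, solve_closure, steps=200):
        """Plain subgradient sketch for max{c.x : precedence, 0 <= x <= 1, Dx <= d}:
        dualize the side constraints with multipliers pi >= 0 and let the supplied
        oracle solve the resulting max closure problem (16)-(18) at each iteration."""
        pi = np.zeros(len(d))
        best_bound = np.inf
        for k in range(1, steps + 1):
            x = solve_closure(c - D.T @ pi)            # Lagrangian subproblem: a max closure
            bound = (c - D.T @ pi) @ x + pi @ d        # dual value: an upper bound on the LP
            best_bound = min(best_bound, bound)
            g = D @ x - d                              # subgradient of the dual function
            pi = np.maximum(0.0, pi + (1.0 / k) * g)   # projected step, diminishing stepsize
        return pi, best_bound

Each iteration is cheap, but as noted above the multipliers typically converge slowly and the final accuracy can be poor.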
Our approach, instead, relies on leveraging combinatorial structure that optimal solutions to
GPCP must satisfy. Lemmas 5 and 6 are critical in suggesting such structure.

Lemma 5 Let P = {x ∈ R^n : Ax ≤ b, Dx ≤ d}, where Dx ≤ d consists of q side constraints, and
let x̂ be an extreme point of P. Let Ā denote the submatrix of A consisting of the rows of Ax ≤ b
that are binding at x̂. Then the null space of Ā has dimension at most q.

Proof: Since x̂ is an extreme point, n linearly independent constraints of P are binding at x̂, and
at most q of these belong to Dx ≤ d. Hence Ā must have at least n − q linearly independent rows,
and thus its null space must have dimension ≤ q.
Lemma 6 Let P be the feasible region of a GPCP with q side constraints. Denote by Ax ≤ b the
subsystem consisting of the precedence constraints and the constraints 0 ≤ x ≤ 1, and let Dx ≤ d
denote the side constraints. Let x̂ be an extreme point of P, and suppose that the entries of x̂ attain
k distinct fractional values α_1, ..., α_k. For 1 ≤ r ≤ k, let θ^r ∈ {0, 1}^n be defined, for 1 ≤ j ≤ n, by

    θ^r_j = 1 if x̂_j = α_r,  and  θ^r_j = 0 otherwise.

Let Ā be the submatrix of A containing the constraints binding at x̂. Then the vectors θ^r are linearly
independent and belong to the null space of Ā. As a consequence, k ≤ q.

Proof: First we prove that Āθ^r = 0. Given a precedence constraint x_i − x_j ≤ 0, if the constraint is
binding then x̂_i = x̂_j. Thus if x̂_i = α_r, so that θ^r_i = 1, then x̂_j = α_r also, and so θ^r_j = 1 as well, and
so θ^r_i − θ^r_j = 0. By the same token, if x̂_i ≠ α_r then x̂_j ≠ α_r and again θ^r_i − θ^r_j = 0. If a constraint
x_i ≥ 0 or x_i ≤ 1 is binding at x̂, then θ^r_i = 0 for all r, as x̂_i is not fractional. The supports
of the θ^r vectors are disjoint, yielding linear independence. Finally, k ≤ q follows from Lemma 5.
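As a quick numerical check of Lemma 6 on a toy GPCP (our own instance; scipy's dual simplex is used so that the returned optimum is an extreme point):

    import numpy as np
    from scipy.optimize import linprog

    # Jobs 1, 2, 3 with precedences x1 <= x2 <= x3 and one side constraint x1 + x2 + x3 <= 1.5.
    c = np.array([3.0, -1.0, -1.0])               # profits (linprog minimizes, so negate below)
    A_ub = np.array([
        [1.0, -1.0, 0.0],                         # x1 - x2 <= 0   (precedence)
        [0.0, 1.0, -1.0],                         # x2 - x3 <= 0   (precedence)
        [1.0, 1.0, 1.0],                          # x1 + x2 + x3 <= 1.5  (side constraint, q = 1)
    ])
    b_ub = np.array([0.0, 0.0, 1.5])
    res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * 3, method="highs-ds")
    x_hat = res.x                                 # optimal extreme point, here (0.5, 0.5, 0.5)
    frac = {round(v, 9) for v in x_hat if 1e-9 < v < 1 - 1e-9}
    print(x_hat, "distinct fractional values:", len(frac))   # 1 <= q = 1, as Lemma 6 predicts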
It can be shown that an optimal solution x^* to GPCP can be written in the form

    x^* = Σ_q µ_q v^q,   (19)

where µ ≥ 0 and, for each q, v^q ∈ {0, 1}^n is the incidence vector of a closure S^q ⊂ N. [In fact,
the S^q can be assumed to be nested.] So for any i, j ∈ N, x^*_j = x^*_i if i and j belong to precisely
the same family of sets S^q. Also, Lemma 6 states that the number of distinct values that x^*_j can
take is small if the number of side constraints is small. Therefore it can be shown that when the
number of side constraints is small the number of closures (terms) in (19) must also be small. In
the full paper we flesh out this connection further, showing that a rich relationship exists between
the max closures produced by Lagrangian problems and the optimal dual and primal solutions to
GPCP. Next, we will develop an algorithm that solves GPCP by attempting to "guess" the correct
representation (19).
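For a small illustration (our own toy numbers): if S^1 ⊃ S^2 are nested closures with incidence vectors v^1, v^2, then x^* = 0.2 v^1 + 0.5 v^2 assigns the value 0.7 to every job in S^2, the value 0.2 to every job in S^1 \ S^2, and 0 elsewhere; the number of distinct nonzero values of x^* equals the number of distinct closures in the representation, which Lemma 6 forces to be small (at most q fractional values) when there are few side constraints.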
Theorem 7 Let Ax ≤ b be a system of inequalities with A totally unimodular and b integer, let
Dx ≤ d be a system of side constraints, and let x̂ be an extreme point of {x : Ax ≤ b, Dx ≤ d} at
which q linearly independent side constraints are binding. Denote by Āx ≤ b̄ the constraints of
Ax ≤ b binding at x̂, by N^x̂ the null space of Ā, and by I^x̂ the space of vectors that vanish on every
coordinate where x̂ is integer. Then there exist an integer vector x^i, vectors θ^1, ..., θ^q and scalars
α_1, ..., α_q such that x̂ = x^i + Σ_{r=1}^{q} α_r θ^r, where, in particular,

(a) Ax^i ≤ b,

(e) the set {θ^1, ..., θ^q} spans N^x̂ ∩ I^x̂.

In the special case of the GPCP, we can choose x^i satisfying the additional condition:

(g) x^i_j = 0, for all j such that x̂_j is fractional.
Proof sketch: Let us refer to the integer coordinates of x as x_I and to the corresponding columns
of A as A_I, and to the fractional coordinates of x as x_F and the corresponding columns of A
as A_F. Let h be the number of columns of A_F. Note that b − A_I x_I is integer; since x_F satisfies
A_F x_F ≤ b − A_I x_I and Ā_F x_F = b̄ − Ā_I x_I, total unimodularity implies that there exists an integer
y ∈ R^h satisfying A_F y ≤ b − A_I x_I and Ā_F y = b̄ − Ā_I x_I. Defining now x^i = (x_I, y), we have that
x^i is integer; it is equal to x everywhere that x is integer, and it satisfies Ax^i ≤ b and Āx^i = b̄.
Clearly x − x^i belongs to I^x, and moreover Ā(x − x^i) = 0, so that it belongs to N^x as well, and
hence it can be decomposed as

    x − x^i = Σ_{r=1}^{q} α_r θ^r.   (21)

For the special case of GPCP we have already described a decomposition for which x^i equals x
everywhere that x is integer and is zero elsewhere. See the full paper for other details.
Comment: Note that rank(Ā) can be high and thus condition (d) is not quite as strong as Lemma
6; nevertheless q is small in any case and so we obtain a decomposition of x̂ into “few” terms when
the number of side-constraints is “small”. Theorem 7 can be strengthened for specific families of
totally unimodular matrices. For example, when A is the node-arc incidence matrix of a digraph,
the θ vectors are incidence vectors of cycles, which yields the following corollary.
Corollary 8 Let P be the feasible set of a minimum cost network flow problem with integer data
and side constraints. Let x̂ be an extreme point of P, and let q be the number of linearly independent
side constraints that are binding at x̂. Let ζ = {j : x̂_j integral}. Then x̂ can be decomposed into
the sum of an integer vector v satisfying all network flow (but not necessarily side) constraints,
with v_j = x̂_j for all j ∈ ζ, and a sum of no more than q fractional cycle flows over a set of cycles
disjoint from ζ.
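As a small illustration of Corollary 8 (a toy instance of our own), consider a network with nodes s, a, t and arcs e_1 = (s, a), e_2 = (a, t), e_3 = (s, t), unit capacities, one unit of flow to be sent from s to t, and the single side constraint x_{e_3} ≤ 1/2. The point x̂ = (1/2, 1/2, 1/2) is an extreme point: flow conservation at the nodes together with the binding side constraint give three independent binding constraints in R^3. Here ζ = ∅ and q = 1, and x̂ decomposes as x̂ = (1, 1, 0) + (1/2)θ, where (1, 1, 0) is an integer feasible flow and θ = (−1, −1, 1) is the incidence vector of the cycle that uses e_3 forward and e_1, e_2 backward: a single fractional cycle flow, as the corollary predicts.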
Consider a problem of the form

    (P1):  max  c^T x
           s.t. Ax ≤ b
                Dx ≤ d.   (22)

Denote by L(P1, µ) the Lagrangian relaxation in which constraints (22) are dualized with penalties
µ ≥ 0, i.e. the problem max{c^T x + µ^T (d − Dx) : Ax ≤ b}.
One can approach problem (P1 ) by means of Lagrangian relaxation, i.e. an algorithm that
iterates by solving multiple problems L(P1 , µ) for different choices of µ; the multipliers µ are up-
dated according to some procedure. A starting point for our work is the fact that traditional
Lagrangian relaxation schemes (such as subgradient optimization) can prove frustratingly slow to
converge, often requiring seemingly instance-dependent choices of algorithmic parameters, and
frequently failing to deliver a sufficiently accurate solution. However, as observed in [B02]
(see also [BA00]), Lagrangian relaxation schemes can discover useful "structure."
For example, Lagrangian relaxation can provide early information on which constraints from
among (22) are likely to be tight, and on which variables x are likely to be nonzero, even if the
actual numerical values for primal or dual variables computed by the relaxation are inaccurate.
The question then is how to use such structure in order to accelerate convergence and to obtain
higher accuracy. In [B02] the following approach was used:
• Periodically, interrupt the Lagrangian relaxation scheme to solve a restricted linear program
consisting of (P1 ) with some additional constraints used to impose the desired structure. Then
use the duals for constraints (22) obtained in the solution to the restricted LP to restart the
Lagrangian procedure.
The restricted linear program includes all of the constraints of (P1), and thus could (potentially)
still be very hard; the idea is that the structure we have imposed renders the LP much easier. Since
the LP includes all constraints, the solution we obtain is fully feasible for (P1), thus proving a
lower bound. Moreover, if our guess as to "structure" is correct, we also obtain a high-quality dual
feasible vector, and using this vector to restart the Lagrangian scheme should result in
accelerated convergence (as well as proving an upper bound on (P1)). In [B02] these observations
were experimentally verified in the context of several problem classes.
We now seek to extend, and generalize, these ideas. In the following template, at each iteration
k we employ a linear system H^k x = h^k which can be interpreted as an educated guess for conditions
that an optimal solution to P1 should satisfy. This is problem-specific; we will indicate later how
this structure is discovered in the context of GPCP.
Algorithm Template

1. Set µ^0 = 0 and set k = 1.

2. Solve the Lagrangian relaxation L(P1, µ^{k-1}), obtaining an optimal solution w^k. If k > 1 and
   H^{k-1} w^k = h^{k-1}, STOP.

3. Using w^k (and, possibly, information from prior iterations), construct a linear system H^k x = h^k.

4. Form the linear program

    (P2^k):  max  c^T x
             s.t. Ax ≤ b          (23)
                  Dx ≤ d          (24)
                  H^k x = h^k     (25)

5. Solve P2^k to obtain x^k, an optimal primal vector (with value z^k), and µ^k, an optimal
   dual vector corresponding to constraints (24). If µ^k = µ^{k-1}, STOP. Otherwise set k ← k + 1
   and go to Step 2.
Notes:
1. Ideally, imposing H^k x = h^k in Step 4 should result in an easier linear program.

2. For simplicity, in what follows we will assume that P2^k is always feasible; this requirement
   can easily be circumvented in practice (full paper).
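In outline, the template can be rendered as follows (a sketch under our own interface assumptions: solve_lagrangian, build_structure and solve_restricted are placeholders for the problem-specific Steps 2-4, not routines from the paper).

    import numpy as np

    def template(solve_lagrangian, build_structure, solve_restricted, q, max_iter=50):
        """Sketch of the algorithmic template.  solve_lagrangian(mu) returns an optimal
        w for L(P1, mu); build_structure(w) returns the guessed system (H, h);
        solve_restricted(H, h) returns an optimal primal x for P2 together with the
        duals mu of the side constraints Dx <= d."""
        mu = np.zeros(q)
        H = h = x = None
        for _ in range(max_iter):
            w = solve_lagrangian(mu)                  # Step 2: fast combinatorial subproblem
            if H is not None and np.allclose(H @ w, h):
                return x, mu                          # stop in Step 2: previous x is optimal
            H, h = build_structure(w)                 # Step 3: educated guess H^k x = h^k
            x, new_mu = solve_restricted(H, h)        # Steps 4-5: restricted LP and its duals
            if np.allclose(new_mu, mu):
                return x, new_mu                      # stop in Step 5: x is optimal
            mu = new_mu
        return x, mu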
Theorem 9 (a) If the algorithm stops at iteration k in Step 2, then x^{k-1} is optimal for P1. (b) If
it stops in Step 5, then x^k is optimal for P1.

Proof. (a) Suppose the algorithm stops in Step 2 at iteration k, so that w^k is optimal for
L(P1, µ^{k-1}) and H^{k-1} w^k = h^{k-1}. Since µ^{k-1} is an optimal dual vector for constraints (24)
in P2^{k-1}, we have

    z^{k-1} = max{ c^T x + (µ^{k-1})^T (d − Dx) : Ax ≤ b, H^{k-1} x = h^{k-1} }
            = c^T w^k + (µ^{k-1})^T (d − D w^k),

where the first equality follows by duality and the second by definition of w^k in Step 2 since
H^{k-1} w^k = h^{k-1}. The right-hand side is the value of L(P1, µ^{k-1}), which is an upper bound on
the optimal value z^* of P1. Also, clearly z^{k-1} ≤ z^*, and so in summary z^{k-1} = z^*, i.e. x^{k-1} is
optimal for P1.

(b) µ^k = µ^{k-1} implies that w^k optimally solves L(P1, µ^k), so that we could choose w^{k+1} = w^k and
so H^k w^{k+1} = h^k, obtaining case (a) again (at iteration k + 1, with x^k in place of x^{k-1}).
GPCP Algorithm

1. Set µ^0 = 0, let C^0 = {N} be the trivial partition of N, and set k = 1.

2. Solve the Lagrangian relaxation L(P1, µ^{k-1}) as a minimum cut problem, obtaining an optimal
   solution y^k ∈ {0, 1}^N (the incidence vector of a maximum closure). Let

    I^k = {j ∈ N : y^k_j = 1}   (27)

   and define

    O^k = {j ∈ N : y^k_j = 0}.   (28)

3. Let C^k = {C^k_1, ..., C^k_{r_k}} be the partition of N obtained by intersecting each class of
   C^{k-1} with I^k and with O^k, and let H^k x = h^k be the system stating that x_i = x_j whenever
   i and j belong to the same class of C^k.

4. Let P2^k consist of P1, plus the additional constraints H^k x = h^k.

5. Solve P2^k, with optimal solution x^k, and let µ^k denote the optimal duals corresponding to the
   side-constraints Dx ≤ d. If µ^k = µ^{k-1}, STOP. Otherwise set k ← k + 1 and go to Step 2.
We have:
Lemma 10 (a) For each k, problem P2^k is an instance of GPCP with r_k variables and the same
number of side-constraints as in Dx ≤ d. (b) If P2^1 is feasible, the above algorithm terminates
finitely with an optimal solution.

Comments: The above algorithm is a basic application of the template. The partition C^k can be
either refined or coarsened. In particular, x^k (Step 5) may attain fewer than r_k distinct values (it
should attain "few" values, by Lemma 10(a) and Lemma 6), and we can correspondingly merge
different sets C^k_j. The feasibility assumption in (b) of Lemma 10 can be bypassed. Details will be
provided in the full paper.
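The partition bookkeeping in Steps 2-3 can be illustrated as follows (a sketch with toy closures; the set-based representation is our own).

    def refine_partition(partition, closure):
        """Refine a partition of the job set by a closure: each class is split into its
        intersection with the closure and with its complement.  Updates of this kind are
        what drive the construction of the system H^k x = h^k above."""
        refined = []
        for cls in partition:
            inside = cls & closure
            outside = cls - closure
            refined.extend(c for c in (inside, outside) if c)
        return refined

    jobs = set(range(6))
    partition = [jobs]                      # C^0: a single class
    for closure in [{0, 1, 2}, {0, 3}]:     # closures I^1, I^2 from successive min cuts (toy data)
        partition = refine_partition(partition, closure)
    print(partition)                        # [{0}, {1, 2}, {3}, {4, 5}]

Because the classes only split (or are merged back when x^k attains fewer distinct values), the number of classes, and hence the size of each restricted LP, stays small when the representation (19) has few terms.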
5 Computational Experiments
In this section we present results from some of our experiments. A more complete set of results will
be presented in the full paper. All these tests were conducted using a single core of a dual quad-core
3.2 GHz Xeon machine with 64 GB of memory. The LP solver we used was Cplex, version 12, and
the min cut solver was our implementation of Hochbaum's pseudoflow algorithm [H08].
The tests reported on in Tables 1 and 2 are based on three real-world examples provided by
BHP Billiton², to which we refer as ’Mine1’, ’Mine2’ and ’Mine3’, and a synthetic but realistic model
called ’Marvin’ which is included with Gemcom’s Whittle [W] mine planning software. ’Mine1B’ is
a modification of Mine1 with a denser precedence graph. Mine3 comes in two versions to which we
refer as ’big’ and ’small’. Using Mine1, we also obtained smaller and larger problems by modifying
the data in a number of realistic ways. Some of the row entries in these tables are self-explanatory;
the others have the following meaning:
• Problem arcs. The number of arcs in the graph that the algorithm creates to represent the
scheduling problem (i.e., the size of the min cut problems we solve).
• Iterations, time to 10^{-5} optimality. The number of iterations (resp., the CPU time)
taken by the algorithm until it obtained a solution it could certify as having ≤ 10^{-5} relative
optimality error.
Finally, an entry of ”—” indicates that Cplex was unable to terminate after 100000 seconds of
CPU time. More detailed analyses will appear in the full paper.
² Data was masked.
Table 1: Algorithm performance (Marvin, Mine1B, Mine2, Mine3 small and big).

                                        Marvin   Mine1B   Mine2   Mine3,s   Mine3,b
    Iterations to 10^{-5} optimality         8        8       9        13        30
    Time to 10^{-5} optimality (sec)        10       60     344         1      1076
    Iterations to comb. optimality          11       12      16        15        39
    Time to comb. optimality (sec)          15       96     649         1      1583

Table 2: Algorithm performance (Mine1 variants).

                                        Mine1        Mine1    Mine1   Mine1   Mine1,3
                                        very small   medium   large   full    weekly
    Iterations to 10^{-5} optimality         6            6       8       7        10
    Time to 10^{-5} optimality (sec)         0            1       7      45      2875
    Iterations to comb. optimality           7            7      11       9        20
    Time to comb. optimality (sec)           0            2      10      61      6633
References
[Bal70] M.L. Balinski, On a selection problem, Management Science 17 (1970), 230–231.

[BA00] F. Barahona and R. Anbil, The volume algorithm: producing primal solutions with a
subgradient method, Math. Programming 87 (2000), 385–399.

[BZ09] D. Bienstock and M. Zuckerberg, A new LP algorithm for precedence constrained produc-
tion scheduling, posted on Optimization Online, August 2009.

[CH03] L. Caccetta and S.P. Hill, An application of branch and cut to open pit mine scheduling,
Journal of Global Optimization 27 (2003), 349–365.
[CH09] B. Chandran and D. Hochbaum, A Computational Study of the Pseudoflow and Push-
Relabel Algorithms for the Maximum Flow Problem, Operations Research 57 (2009), 358–376.
[F06] C. Fricke, Applications of integer programming in open pit mine planning, PhD thesis, De-
partment of Mathematics and Statistics, The University of Melbourne, 2006.
[H08] D. Hochbaum, The pseudoflow algorithm: a new algorithm for the maximum flow problem,
Operations Research 58 (2008), 992–1009.
[HC00] D. Hochbaum and A. Chen, Improved planning for the open-pit mining problem, Opera-
tions Research 48 (2000), 894–914.
[J68] T.B. Johnson, Optimum open pit mine production scheduling, PhD thesis, Operations Re-
search Department, University of California, Berkeley, 1968.
[LG65] H. Lerchs and I.F. Grossman, Optimum design of open-pit mines, Transactions C.I.M. 68
(1965), 17–24.

[P76] J.C. Picard, Maximal closure of a graph and applications to combinatorial problems, Man-
agement Science 22 (1976), 1268–1272.
[R70] J.M.W. Rhys, A selection problem of shared fixed costs and network flows, Management
Science 17 (1970), 200–207.