


N.N. Pisaruk

Mixed Integer Programming: Models and Methods

May 4, 2019

Preface

Initially, I wrote a short manual for users of the optimization library MIPCL (Mixed
Integer Programming Class Library). Later I decided to rework the manual to make
it useful for a wider audience. The result is this book, which presents almost all the
main theoretical results already implemented in modern commercial mixed-integer
programming (MIP) software. All algorithms and techniques described here (and
many others) are implemented in MIPCL. Having the experience of developing such
a complex software product as MIPCL, I dared to teach others how to use MIP in
practice.
It is clear that in a book of this size it is impossible to cover the diversity of
research in MIP. In particular, specialized algorithms for solving numerous special
cases of mixed integer programs (MIPs) are not considered here, since the number
of publications on this topic is enormous. Therefore, when selecting material, I
decided to discuss only those results that are already implemented in modern MIP
libraries or that can potentially be included in these libraries in the near future.
The strength of MIP as a modeling tool for practical problems was recognized
immediately upon its appearance in the 1950s and 1960s. Unfortunately, for a long
time the available computers and software could not solve those models. As a result,
the initial enthusiasm faded. Even today, many potential users still believe that MIP
is just a tool for writing models, with very limited capacity for solving them. In fact,
the situation has changed dramatically over the past twenty years. Today, we can
solve many difficult practical MIPs using standard software.
What is the reason for the wide use of MIP in practice? A brief answer is that,
using binary variables that take only two values, 0 or 1, we can model many types
of nonlinearities, in particular, almost any logical condition. And the latter are
present in almost every non-trivial practical application. It is also very important
that MIP models are easily extendable. When developing a decision-making system,
be careful about using highly specialized models and software. Even if you do not
encounter problems at the development stage, they can appear later, during system
operation, when the requirements change and cannot be accommodated within the
model currently in use.
For many years, the basic approach to solving MIPs remained unchanged: the
linear-programming-based branch-and-bound method proposed by Land and Doig
back in 1960. And this was despite the fact that, over the same period, there was
significant progress in the theory of linear programming and related areas of
combinatorial optimization. Many of the ideas developed there "passed" intensive
computational experiments, but until recently only a few of them were implemented
in the commercial software products used by practitioners. Nowadays, the best MIP
solvers incorporate many of these theoretical achievements: for example, they
preprocess and automatically reformulate the problems being solved, generate cuts
of various types, and use a variety of heuristics in the nodes of the search tree to
build feasible solutions. This allowed R.E. Bixby to state that the gap between
theory and practice was being closed.
Next, we briefly present the contents of this book. In the introduction, we discuss
the specific features of MIPs that distinguish them from other mathematical
programming problems. Next, we give examples of formulations of various types of
nonlinearities in MIP models. Then we try to understand why one MIP formulation
is stronger (better) than another, and also discuss some ways of strengthening
existing formulations. The understanding that not all MIP formulations are equally
good in practice has come relatively recently. Prior to this, as a rule, preference was
given to more compact formulations.
In typical situations, to use MIP in practice one does not need to be an expert
in theory. Some skill is needed only to formulate practical problems as MIPs.
This can be learned by studying applications and their formulations that have
already become classical. Chapter 2 presents a number of such applications and their
formulations. Even more applications are considered in the other chapters as
examples demonstrating some of the techniques used in MIP. Descriptions of many
applications are also found in the exercises given after each chapter. Many exercises
ask the reader to justify the validity of a proposed answer, which makes them an
additional source of information on the topic under discussion.
Because of its universality, the general MIP is a very difficult computational
problem. Many of the most complex problems of combinatorial optimization are
very simply formulated as MIPs. A number of results from computational complexity
theory indicate that efficient algorithms for solving such problems are unlikely to
be found. Nor can we expect that, in the foreseeable future, a computer program
will be developed that can solve with equal efficiency all MIPs arising in practice.
Therefore, modern MIP libraries are designed to allow the user to redefine
(reprogram) many of their functions, replacing them with versions that take into
account specific features of the problem being solved. One cannot use these libraries
effectively without knowing the theory on which they are based. The rest of the
book is devoted to the study of the algorithms implemented in modern MIP libraries.

Since MIP is based on linear programming (LP), a brief introduction to LP
is given in Chapter 3. Here we do not intend to discuss all aspects of the theory
and practice of LP. Our goal is to provide enough information to understand the
applications of LP methods in MIP.
Currently, the main method for solving MIPs is the branch-and-cut method, since
it is used in all (in all!) modern competitive MIP solvers. Briefly, the branch-and-cut
method is a combination of the branch-and-bound and cutting plane methods. In
Chapters 4–6 we study both components of the branch-and-cut method, as well as
its other important features.
Another efficient methodology widely used for solving specific MIPs is the
branch-and-price method, which can be viewed as a combination of the branch-
and-bound and column generation methods. The advantages and potential of the
branch-and-price method, as well as the difficulties of its implementation, are
discussed in Chapter 7. It is also important that the branch-and-price method
perfectly complements the branch-and-cut method: usually the branch-and-price
method is used in cases where the branch-and-cut method is not very efficient.
The final chapter discusses relatively new applications of MIP to solving
optimization problems with uncertain parameters. Such problems often arise in
economic applications (e.g., in models of long-term planning), when the decision
must be taken today, and its efficiency can be judged only after the completion of
the planning horizon. Although problems of this sort have been studied for a long
time, only the emergence of powerful modern computers has made it possible to
apply the results of these studies in practice. The material of this chapter should
in no case be regarded as a brief introduction to stochastic programming or robust
optimization. Here, we consider only those methods that reduce a problem with
uncertain parameters to its deterministic equivalent (or counterpart), which is a MIP.
In the comments to each chapter, only some publications on the topic are cited
selectively: several important original sources and also several recent surveys and
articles that can be used for more in-depth study.
This book will be useful as a primary or auxiliary source in the study of such
disciplines as integer programming, operations research, mathematical programming,
discrete optimization, decision theory, and operations management.
To understand the presented material in full, the reader is assumed to be familiar
with the foundations of linear algebra and linear programming, as well as the basic
concepts of convex analysis, graph theory, probability theory, and computational
complexity.

Minsk, May 2018 Nicolai N. Pisaruk


Contents

Abbreviations and Notations

1 Introduction
1.1 Integrality and Nonlinearity
1.1.1 Discrete Variables
1.1.2 Fixed and Variable Costs
1.1.3 Approximation of Nonlinear Functions
1.1.4 Approximation of Convex Functions
1.1.5 Logical Conditions
1.2 Multiple Alternatives and Disjunctions
1.2.1 Floor Planning
1.2.2 Linear Complementarity Problem
1.2.3 Quadratic Programming Under Linear Constraints
1.3 How an LP May Turn Into a MIP?
1.4 Polyhedra
1.5 Good and Ideal Formulations, Reformulation
1.6 Strong Inequalities
1.7 Extended Formulations
1.7.1 Single-Product Lot-Sizing Problem
1.7.2 Fixed Charge Network Flows
1.8 Alternative Formulations for Scheduling Problems
1.8.1 Continuous Time Model
1.8.2 Time-Index Formulation
1.9 Knapsack Problems
1.9.1 Integer Knapsack
1.9.2 0,1-Knapsack
1.10 Notes
1.11 Exercises

2 MIP Models
2.1 Set Packing, Partitioning, and Covering Problems
2.2 Service Facility Location
2.3 Portfolio Management: Index Fund
2.4 Multiproduct Lot-Sizing
2.5 Balancing Assembly Lines
2.6 Electricity Generation Planning
2.7 Designing Telecommunication Networks
2.8 Placement of Logic Elements on the Surface of a Crystal
2.9 Assigning Aircraft to Flights
2.10 Optimizing the Performance of a Hybrid Car
2.11 Short-Term Financial Management
2.12 Planning Treatment of Cancerous Tumors
2.13 Project Scheduling
2.14 Short-Term Scheduling in the Chemical Industry
2.15 Multidimensional Orthogonal Packing
2.15.1 Basic IP Formulation
2.15.2 Tightening the Basic Model
2.15.3 Rotations and Complex Packing Items
2.16 Single Depot Vehicle Routing Problem
2.16.1 Classical Vehicle Routing Problem
2.17 Notes
2.18 Exercises

3 Linear Programming
3.1 Basic Solutions
3.2 Primal Simplex Method
3.2.1 How to Find a Feasible Basic Solution
3.2.2 Pricing Rules
3.3 Dual Simplex Method
3.3.1 Adding New Constraints and Changing Bounds
3.3.2 How to Find a Dual Feasible Basic Solution?
3.3.3 The Dual Simplex Method Is a Cutting Plane Algorithm
3.3.4 Separation Rules
3.4 Why an LP Does Not Have a Solution?
3.5 Duality in Linear Programming
3.6 Linear Programs With Two-Sided Constraints
3.7 Notes
3.8 Exercises

4 Cutting Planes
4.1 Cutting Plane Algorithms
4.2 Chvátal-Gomory Cuts
4.3 Mixed Integer Rounding
4.4 Fractional Gomory Cuts
4.5 Disjunctive Inequalities
4.6 Lift And Project
4.7 Separation and Optimization
4.7.1 Markowitz Model for Portfolio Optimization
4.7.2 Exact Separation Procedure
4.8 Notes
4.9 Exercises

5 Cuts for Structured Mixed-Integer Sets
5.1 Knapsack Inequalities
5.1.1 Separation Problem For Cover Inequalities
5.2 Lifting Inequalities
5.2.1 Lifted Cover Inequalities
5.2.2 Lifting Feasible Set Inequalities
5.3 Mixed Knapsack Sets
5.3.1 Sequence Independent Lifting
5.4 Simple Flow Structures
5.4.1 Separation for Flow Cover Inequalities
5.5 Generalized Upper Bounds
5.5.1 Clique Inequalities
5.5.2 Odd Cycle Inequalities
5.5.3 Conflict Graphs
5.6 Notes
5.7 Exercises

6 Branch-And-Cut
6.1 Branch-And-Bound
6.2 Branch-And-Cut
6.3 Branching
6.3.1 Priorities
6.3.2 Special Ordered Sets
6.4 Global Gomory Cuts
6.5 Preprocessing
6.5.1 Disaggregation of Inequalities
6.5.2 Probing
6.6 Traveling Salesman Problem
6.7 Notes
6.8 Exercises

7 Branch-And-Price
7.1 Column Generation Algorithms
7.1.1 One-Dimensional Cutting Stock Problem
7.1.2 Column Generation Approach
7.1.3 Finding a Good Initial Solution
7.1.4 Cutting Stock Example
7.2 Dantzig-Wolfe Reformulation
7.2.1 Master and Pricing Problems
7.2.2 Branching
7.3 Generalized Assignment Problem
7.3.1 Master Problem
7.3.2 Pricing Problem
7.3.3 Branching
7.3.4 Example
7.4 Symmetry Issues
7.5 Designing Telecommunication Networks
7.5.1 Master Problem
7.5.2 Pricing Problem
7.6 Notes
7.7 Exercises

8 Optimization With Uncertain Parameters
8.1 Two-Stage Stochastic Programming Problems
8.2 Benders' Reformulation
8.3 Risks
8.3.1 Extended Two-Stage Model
8.3.2 Credit Risk
8.4 Multistage Stochastic Programming Problems
8.5 Synthetic Options
8.6 Yield Management
8.7 Robust MIPs
8.7.1 Row-Wise Uncertainties
8.7.2 Polyhedral Uncertainties
8.7.3 Combinatorial Uncertainties
8.7.4 Robust Single-Product Lot-Sizing Problem
8.8 Notes
8.9 Exercises

References

Index
Abbreviations and Notations

R: field of real numbers


R+ , R++ : sets of non-negative and positive real numbers
Z: ring of integers
Z+ , Z++ : sets of non-negative and positive integers
X ±Y : set {x ± y : x ∈ X, y ∈ Y }
X ×Y : Cartesian product, {(x, y) : x ∈ X, y ∈ Y }, of sets X and Y
∏_{i=1}^n X_i: set X_1 × X_2 × · · · × X_n = {(x_1, x_2, . . . , x_n) : x_i ∈ X_i, i = 1, . . . , n}
X^n: set {(x_1, x_2, . . . , x_n) : x_i ∈ X, i = 1, . . . , n}
X^A: set {(x_{a_1}, x_{a_2}, . . . , x_{a_n}) : x_{a_i} ∈ X, i = 1, . . . , n} for A = {a_1, a_2, . . . , a_n}
conv(X): convex hull of vectors from X ⊆ Rn
cone(X): convex cone generated by set of vectors X ⊆ Rn
I: identity matrix of required size
0: zero vector or matrix of required size
e_i: i-th unit vector, e_i = (0, . . . , 0, 1, 0, . . . , 0)^T with 1 in the i-th position
e: vector (1, 1, . . . , 1)^T of required size
A^T: transpose of matrix A
A^{-1}: inverse of a nonsingular matrix A
det(A): determinant of square matrix A
A_I^J: submatrix of matrix A formed by the elements in rows from set I and
columns from set J
A_I: submatrix of matrix A formed by the rows from set I
A^J: submatrix of matrix A formed by the columns from set J
rank(A): rank of matrix A
‖x‖ = √(x^T x): (Euclidean) norm of vector x ∈ R^n
χ^X: characteristic function (vector) of subset X of finite set S: χ^X(i) = 1 (χ^X_i = 1)
if i ∈ X, and χ^X(i) = 0 (χ^X_i = 0) if i ∈ S \ X
H(a, b): hyperplane {x ∈ Rn : ax = b}
H≤ (a, b), H≥ (a, b): half-spaces {x ∈ Rn : ax ≤ b} and {x ∈ Rn : ax ≥ b}
P(A, b): polyhedron {x ∈ Rn : Ax ≤ b}
P(A, b; S): mixed-integer set {x ∈ Rn : Ax ≤ b, x j ∈ Z for j ∈ S}


G = (V, E): graph or directed graph (digraph) with vertex set V and edge (arc)
set E
E(S, T): set of edges in graph G = (V, E) with one end in set S and the other in
set T; or set of arcs in digraph G = (V, E) leaving set S and entering set T
f(n) = O(g(n)) if there exists a constant c > 0 such that f(n) ≤ c·g(n) for
sufficiently large n ∈ Z+ (for example, 5n^2 + 7n + 100 = O(n^2))
(Ω, A, P): probability space, where
Ω: space of elementary events, or sample space,
A: algebra or σ-algebra of subsets of Ω (elements of A are called events),
P: probability measure on A (P(S) is the probability that a randomly chosen
ω ∈ Ω belongs to S ∈ A)
E(ξ): (mathematical) expectation (expected value) of random variable ξ: Ω → R,
E(ξ) = ∫_Ω ξ(ω) P(dω)
IP: Integer Programming
IP: Integer Program (IP problem)
LP: Linear Programming
LP: Linear Program (LP problem)
MIP: Mixed Integer Programming
MIP: Mixed Integer Program (MIP problem)
NP: class of decision problems (with two answers: "yes" or "no") that can be
solved by a nondeterministic Turing machine in polynomial time
P: class of decision problems (with two answers: "yes" or "no") that can be
solved by a deterministic Turing machine in polynomial time
Chapter 1
Introduction

The mixed integer program (MIP) is the following optimization problem:

max{c^T x : b^1 ≤ Ax ≤ b^2, d^1 ≤ x ≤ d^2, x_j ∈ Z for j ∈ S},      (1.1)

where b^1, b^2 ∈ R^m, c, d^1, d^2 ∈ R^n, A is a real m × n matrix, x is an n-vector of
variables (unknowns), and S ⊆ {1, . . . , n} is the set of integer variables. In the integer
program (IP), all variables are integer (|S| = n).
As compared to the linear program (LP), in which S = ∅, the MIP has variables
taking values from a discrete set. This difference makes the MIP significantly more
difficult from the algorithmic point of view. We can say that the MIP is one of the
most difficult problems of mathematical programming. And this is not surprising,
since many combinatorial optimization problems, including those considered to be
the most difficult, are very simply formulated as specific MIPs. One of the most
common applications of mixed integer programming (MIP) in everyday life is the
efficient use of limited resources.

1.1 Integrality and Nonlinearity

We will see many times later that the condition "x is integer" can be used to express
many nonlinear constraints. But first, we note that this restriction itself can be given
by means of a single smooth equation:

sin(πx) = 0.

Another important condition, "x is binary" (x can take only one of two values, 0
or 1), is written as one quadratic equation:

x^2 − x = 0.


This representation of binary variables allows us to formulate many combinatorial
optimization problems as quadratic programming problems that involve quadratic
terms in their objective functions and constraints. For example, the NP-hard set
partition problem (see also Sect. 2.1)

max{cT x : Ax = e, x ∈ {0, 1}n },

where c ∈ R^n, and A is an m × n matrix with 0 or 1 elements, is rewritten as the
following quadratic programming problem:

c^T x → max,
Ax = e,
x_i^2 = x_i,   i = 1, . . . , n.

Here and below, e denotes a vector of suitable size, all components of which are
equal to 1.
Suppose now that an integer variable x is non-negative and bounded above, that
is, 0 ≤ x ≤ d, where d is a positive integer. In the binary system, d can be written as
a k = ⌊log₂ d⌋ + 1 digit number. Therefore, introducing k new continuous variables
s_0, . . . , s_{k−1}, we can represent the condition x ∈ {0, 1, . . . , d} by the following system
of equations:

x = ∑_{i=0}^{k−1} 2^i s_i,
s_i^2 = s_i,   i = 0, . . . , k − 1.

So, we can conclude that any MIP can be reduced to a quadratic programming
problem and, consequently, the general MIP is no more difficult than the general
quadratic programming problem. But a distinctive feature of integer programming
(IP) is that the integer-valued variables are handled in a very special way at the
algorithmic level, by branching on integer variables and by generating cuts.
From a practical point of view, it is more important that, by introducing additional
integer (most often binary) variables, we can model many nonlinearities by linear
constraints.

1.1.1 Discrete Variables

A discrete variable x can take only a finite number of values v_1, . . . , v_k. For example,
in the problem of designing a communication network, the capacity of a link can be,
say, 1, 2, or 4 gigabytes. Such a discrete variable x can be represented as an ordinary
continuous variable by introducing k binary variables y_1, . . . , y_k and writing down
the constraints

x − v1 y1 − v2 y2 − . . . − vk yk = 0, (1.2a)
y1 + y2 + . . . + yk = 1, (1.2b)
yi ∈ Z+ , i = 1, . . . , k. (1.2c)

Let us also note that, instead of declaring all variables y_i integer, it is enough
to specify that (1.2b) is a generalized upper bound, i.e., a constraint in which only
one variable can take a non-zero value. Generalized upper bounds are often referred
to in MIP software manuals as special ordered sets of type 1 (SOS1), and they are
handled by performing a special type of branching (see Sect. 6.3.2).
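To make the construction concrete, here is a minimal sketch of constraints (1.2)
in Python; the PuLP modeling library and the capacity data 1, 2, 4 are illustrative
assumptions of this sketch, not part of the book (whose own examples use MIPCL).

from pulp import LpProblem, LpVariable, LpMinimize, lpSum

caps = [1, 2, 4]                       # admissible values v_1, ..., v_k (assumed data)
prob = LpProblem("discrete_variable", LpMinimize)

x = LpVariable("x", lowBound=0)        # the discrete variable, kept continuous
y = [LpVariable(f"y{i}", cat="Binary") for i in range(len(caps))]

prob += x - lpSum(v * yi for v, yi in zip(caps, y)) == 0   # (1.2a)
prob += lpSum(y) == 1                                      # (1.2b): exactly one value
prob += x                              # any objective; here: pick the cheapest capacity

prob.solve()
print(x.varValue)                      # prints 1.0, the smallest admissible value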

1.1.2 Fixed and Variable Costs

One of the most significant limitations of linear programming with respect to solving
economic problems is that linear models cannot take fixed costs into account. In
MIP, accounting for fixed costs is simple.
Let us assume that the cost of producing x units of some product is calculated as
follows:

c(x) = f + px if 0 < l ≤ x ≤ u,   and   c(x) = 0 if x = 0

(Fig. 1.1, cost function), where f is a fixed production cost, p is the cost of producing
one product unit, and l and u are the minimum and maximum production capacities.
Introducing a new binary variable y (y = 1 if the product is produced, and y = 0
otherwise) and adding the variable lower and upper bounds ly ≤ x ≤ uy, we transform
the nonlinear c(x) into a linear function, c(x, y) = px + f y, of the two variables x and y.
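A hedged sketch of this fixed-charge construction follows; PuLP and the numbers
f, p, l, u (and the demand of 12 units) are made-up assumptions of the illustration.

from pulp import LpProblem, LpVariable, LpMinimize

f, p, l, u = 100.0, 3.0, 10.0, 50.0    # assumed fixed cost, unit cost, capacities
prob = LpProblem("fixed_charge", LpMinimize)
x = LpVariable("x", lowBound=0)
y = LpVariable("y", cat="Binary")

prob += f * y + p * x       # objective: fixed plus variable cost
prob += l * y <= x          # variable lower bound: x >= l when y = 1
prob += x <= u * y          # variable upper bound: x = 0 when y = 0
prob += x >= 12             # some demand that forces production (assumed)

prob.solve()
print(y.varValue, x.varValue)   # 1.0 12.0: pay the fixed cost, produce 12 units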

1.1.3 Approximation of Nonlinear Functions

Let a nonlinear function y = f(x) be given on an interval [a, b], and let us choose a
partition of this interval:

a = x̄_1 < x̄_2 < · · · < x̄_r = b.

Connecting the neighboring break-points (x̄_k, ȳ_k = f(x̄_k)) and (x̄_{k+1}, ȳ_{k+1} = f(x̄_{k+1}))
by line segments, we obtain a piecewise-linear approximation, f̃(x), of the function
f(x) (Fig. 1.2).
Fig. 1.2 Piecewise-linear approximation of a nonlinear function

Now we can write down the following system to describe the set of points (x, y)
lying on the graph of f̃:

x = ∑_{k=1}^r λ_k x̄_k,      (1.3a)
y = ∑_{k=1}^r λ_k ȳ_k,      (1.3b)
∑_{k=1}^r λ_k = 1,      (1.3c)
λ_k ≤ δ_k,   k = 1, . . . , r,      (1.3d)
δ_i + δ_j ≤ 1,   j = 3, . . . , r, i = 1, . . . , j − 2,      (1.3e)
λ_k ≥ 0, δ_k ∈ {0, 1},   k = 1, . . . , r.      (1.3f)

Equations (1.3a)–(1.3c) ensure that the point (x, y) belongs to the convex hull of
the points (x̄_1, ȳ_1), . . . , (x̄_r, ȳ_r). The other relations, (1.3d)–(1.3f), require that no
more than two variables λ_k take nonzero values, and that the indices of these non-zero
variables be consecutive. These conditions reflect the requirement that the point
(x, y) must lie on some line segment connecting two neighboring break-points.
It should be noted that almost all modern commercial MIP solvers take Ineqs.
(1.3d) and (1.3e) into account algorithmically, organizing branching in a special
way (see Sect. 6.3.2). In this case, it is not necessary to specify these inequalities
explicitly; it suffices to indicate that Eq. (1.3c) is of type SOS2 (Special Ordered
Set of Type 2).
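The lambda-formulation (1.3) can be sketched as follows; the function x·sin(x),
the interval [0, 10], the six breakpoints, and PuLP are all assumptions of this
illustration.

import math
from pulp import LpProblem, LpVariable, LpMinimize, lpSum

f = lambda v: v * math.sin(v)                # assumed nonlinear function
xs = [10 * k / 5 for k in range(6)]          # breakpoints xbar_1, ..., xbar_6
ys = [f(v) for v in xs]
r = len(xs)

prob = LpProblem("piecewise_linear", LpMinimize)
lam = [LpVariable(f"lam{k}", lowBound=0) for k in range(r)]
dlt = [LpVariable(f"dlt{k}", cat="Binary") for k in range(r)]
x = LpVariable("x")
y = LpVariable("y")

prob += x == lpSum(xs[k] * lam[k] for k in range(r))        # (1.3a)
prob += y == lpSum(ys[k] * lam[k] for k in range(r))        # (1.3b)
prob += lpSum(lam) == 1                                     # (1.3c)
for k in range(r):
    prob += lam[k] <= dlt[k]                                # (1.3d)
for j in range(2, r):
    for i in range(j - 1):
        prob += dlt[i] + dlt[j] <= 1                        # (1.3e): adjacency

prob += y                        # minimize the approximation of f over [0, 10]
prob.solve()
print(x.varValue, y.varValue)    # the minimizing breakpoint region of f-tilde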

1.1.4 Approximation of Convex Functions

If f(x) is a convex function, then in many cases we can represent the relation y =
f(x) without introducing integer variables. As before, given a partition a = x̄_1 <
x̄_2 < · · · < x̄_r = b of the interval [a, b], we need to approximate f(x) with a piecewise
linear function f̃ (see Fig. 1.3).

Fig. 1.3 Approximation of a convex function

Let us define the numbers

d_k = x̄_{k+1} − x̄_k,   q_k = (f(x̄_{k+1}) − f(x̄_k)) / d_k,   k = 1, . . . , r − 1.
As f is convex, we have q_1 ≤ q_2 ≤ · · · ≤ q_{r−1}. Introducing auxiliary real variables
x_k (k = 1, . . . , r − 1), we can write down the following representation for the relation
y = f̃(x):

x = ∑_{k=1}^{r−1} x_k,
y = f(a) + ∑_{k=1}^{r−1} q_k x_k,      (1.4)
0 ≤ x_k ≤ d_k,   k = 1, . . . , r − 1.
It is not difficult to justify the following statement.
Proposition 1.1. If, in a MIP containing System (1.4), all coefficients of the variable
y are positive in all constraints with the sign "≤", and negative in all constraints
with the sign "≥" and in the objective function (assuming that the objective function
is maximized), then (1.4) is sufficient to represent the relation y = f̃(x).
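To see the construction at work, here is a small sketch of (1.4) under the
hypotheses of Proposition 1.1; the convex function x² on [0, 4], the profit objective
2x − y, and PuLP are assumptions of this illustration. Because the slopes q_k are
nondecreasing, the LP fills the cheap segments first, so no binaries are needed.

from pulp import LpProblem, LpVariable, LpMaximize, lpSum, value

xs = [0, 1, 2, 3, 4]                                 # assumed breakpoints
fs = [v * v for v in xs]                             # f(x) = x^2 at the breakpoints
d = [xs[k + 1] - xs[k] for k in range(4)]            # segment lengths d_k
q = [(fs[k + 1] - fs[k]) / d[k] for k in range(4)]   # slopes q_k = 1, 3, 5, 7

prob = LpProblem("convex_pwl", LpMaximize)
seg = [LpVariable(f"x{k}", lowBound=0, upBound=d[k]) for k in range(4)]
x = lpSum(seg)                                       # x = sum of segment pieces
y = fs[0] + lpSum(q[k] * seg[k] for k in range(4))   # y = f-tilde(x), per (1.4)

prob += 2 * x - y          # y has a negative objective coefficient, as required
prob.solve()
print(value(x), value(y))  # 1.0 1.0: only the cheapest slope-1 segment is used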

1.1.5 Logical Conditions

Formally, we write down logical conditions using boolean variables and formulas.
Any boolean variable can take only two values: true and false. From boolean
variables, using the binary logical operations ∨ (or) and ∧ (and), and the unary
operation ¬ (¬x means not x), we can make up boolean formulas in much the same
way that we make up algebraic expressions using arithmetic operations over real
variables. For example,

(x1 ∨ ¬x2) ∧ (¬x1 ∨ x3)      (1.5)

is a boolean formula. Substituting values for the boolean variables, we can calculate
the value of a boolean formula using the rules presented in Table 1.1.

Table 1.1 Logical operations ¬, ∧ and ∨

x     ¬x
false true
true  false

x1    x2    x1 ∧ x2  x1 ∨ x2
false false false    false
false true  false    true
true  false false    true
true  true  true     true

For example, for the truth set (x1, x2, x3) = (true, false, false), (1.5) takes the value
of false.
Any boolean formula of n boolean variables can be represented in a conjunctive
normal form (CNF):

⋀_{i=1}^m ⋁_{j∈S_i} x_j^{σ_j^i},      (1.6)

where S_i ⊆ {1, . . . , n} (i = 1, . . . , m) and all σ_j^i ∈ {0, 1}. Here we use the following
notation: x^1 = x and x^0 = ¬x. Note that (1.5) is already represented as a CNF.
CNF (1.6) takes the value of true only if every clause ⋁_{j∈S_i} x_j^{σ_j^i} contains at
least one literal (a literal is a variable or its negation) with the value of true. If we
identify false with 0 and true with 1, then the negation operation ¬ converts x into
1 − x. In view of what has been said, the truth sets on which (1.6) takes the value of
true are the solutions to the following system of inequalities:

∑_{j∈S_i^1} x_j + ∑_{j∈S_i^0} (1 − x_j) ≥ 1,   i = 1, . . . , m,      (1.7)
x_j ∈ {0, 1},   j = 1, . . . , n.

Here, for δ ∈ {0, 1}, we use the notation S_i^δ = {j ∈ S_i : σ_j^i = δ}. For example, the
CNF

(x1 ∨ x2 ∨ x3) ∧ (x1 ∨ ¬x2) ∧ (x2 ∨ ¬x3) ∧ (x3 ∨ ¬x1)

takes the value of true on the sets that are solutions to the system

x1 + x2 + x3 ≥ 1,
x1 + (1 − x2 ) ≥ 1,
x2 + (1 − x3 ) ≥ 1,
x3 + (1 − x1 ) ≥ 1,
x1 , x2 , x3 ∈ {0, 1}.
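The translation from a CNF to system (1.7) is mechanical; the short self-contained
check below (pure Python, no solver; the clause encoding is this illustration's own)
confirms the equivalence for the example above.

from itertools import product

clauses = [              # each clause: (positive literals, negated literals), 1-based
    ({1, 2, 3}, set()),
    ({1}, {2}),
    ({2}, {3}),
    ({3}, {1}),
]

def cnf_value(x):        # evaluate the CNF on a 0/1 assignment x: {1..3} -> {0, 1}
    return all(any(x[j] for j in pos) or any(1 - x[j] for j in neg)
               for pos, neg in clauses)

def system_17(x):        # inequalities (1.7): one per clause
    return all(sum(x[j] for j in pos) + sum(1 - x[j] for j in neg) >= 1
               for pos, neg in clauses)

for bits in product((0, 1), repeat=3):
    x = {1: bits[0], 2: bits[1], 3: bits[2]}
    assert cnf_value(x) == system_17(x)
print("CNF truth sets and the solutions of (1.7) coincide")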

1.2 Multiple Alternatives and Disjunctions

Suppose it is required that, out of m given inequalities

Ai x ≤ bi , i = 1, . . . , m,

at least q inequalities be satisfied. For example, if two jobs i and j are executed
on the same machine, then we must require the validity of the following disjunction:

ei − s j ≤ 0 or e j − si ≤ 0,

where si and ei are, respectively, the start and end times of job i.
Introducing binary variables y_i, i = 1, . . . , m, with y_i = 1 if A_i x ≤ b_i is valid,
and y_i = 0 otherwise, we can take the required condition into account as follows:

A_i x ≤ b_i + M(1 − y_i),   i = 1, . . . , m,
∑_{i=1}^m y_i ≥ q,
y_i ∈ {0, 1},   i = 1, . . . , m.

Here M is a sufficiently large number such that the inequalities A_i x ≤ b_i + M are
satisfied automatically for all feasible solutions x of the problem being solved.
Now let us consider the case when at least one of the two conditions must hold:

x1 ≥ a or x2 ≥ b.

For example, we want to have a workstation with x1 ≥ a processors or a single-
processor system with a processor frequency x2 ≥ b.
If both variables x1 and x2 are nonnegative, then, introducing an auxiliary binary
variable y, we can express the required disjunction by two inequalities:

x1 ≥ ay, x2 ≥ b(1 − y).
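A hedged sketch of the big-M construction above, applied to the two-job
disjunction e_i − s_j ≤ 0 or e_j − s_i ≤ 0; the processing times, the value of M, the
makespan objective, and PuLP are assumptions of this illustration.

from pulp import LpProblem, LpVariable, LpMinimize

p1, p2, M = 3, 5, 100            # assumed processing times and big-M constant
prob = LpProblem("disjunction", LpMinimize)
s1 = LpVariable("s1", lowBound=0)
s2 = LpVariable("s2", lowBound=0)
Cmax = LpVariable("Cmax", lowBound=0)
y = LpVariable("y", cat="Binary")     # y = 1: job 1 precedes job 2

prob += s1 + p1 <= s2 + M * (1 - y)   # e1 <= s2 is enforced when y = 1
prob += s2 + p2 <= s1 + M * y         # e2 <= s1 is enforced when y = 0
prob += Cmax >= s1 + p1               # makespan bounds
prob += Cmax >= s2 + p2
prob += Cmax                          # minimize the makespan

prob.solve()
print(s1.varValue, s2.varValue, Cmax.varValue)   # e.g. 0.0 3.0 8.0: jobs in sequence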

Next, we demonstrate the use of multiple alternatives and disjunctions in three
examples.

1.2.1 Floor Planning

On a rectangular chip of width W and height H, n rectangular modules must be
placed, where module i has width w_i and height h_i.
We choose a coordinate system with the origin O in the lower left corner of the
chip, the Ox axis directed to the right, and the Oy axis directed upwards. Let the pair
of real variables x_i, y_i determine the coordinates of the lower-left corner of module i,
i = 1, . . . , n. Obviously, the following inequalities must hold:

0 ≤ xi ≤ W − wi , 0 ≤ yi ≤ H − hi , i = 1, . . . , n. (1.8)

To ensure that two modules, i and j, do not intersect, at least one of the following
four inequalities must be valid:

x_i + w_i ≤ x_j (i lies to the left of j),
x_j + w_j ≤ x_i (i lies to the right of j),
y_i + h_i ≤ y_j (i lies below j),
y_j + h_j ≤ y_i (i lies above j).

Introducing four binary variables z^l_ij, z^r_ij, z^b_ij, and z^a_ij, we can represent this
disjunction by the following system of inequalities:

x_i + w_i ≤ x_j + W(1 − z^l_ij),
x_j + w_j ≤ x_i + W(1 − z^r_ij),
y_i + h_i ≤ y_j + H(1 − z^b_ij),      (1.9)
y_j + h_j ≤ y_i + H(1 − z^a_ij),
z^l_ij + z^r_ij + z^b_ij + z^a_ij ≥ 1.

Usually the modules can be rotated by 90°. For each module i = 1, . . . , n, we
introduce an additional binary variable δ_i, which takes the value 1 if the module
is rotated. Now the width and height of module i are equal, respectively, to

(1 − δ_i)w_i + δ_i h_i   and   (1 − δ_i)h_i + δ_i w_i.

In view of this, (1.8) and (1.9) are rewritten as follows:

0 ≤ x_i ≤ W − ((1 − δ_i)w_i + δ_i h_i),   i = 1, . . . , n,
0 ≤ y_i ≤ H − ((1 − δ_i)h_i + δ_i w_i),   i = 1, . . . , n,
x_i + (1 − δ_i)w_i + δ_i h_i ≤ x_j + W(1 − z^l_ij),   i = 1, . . . , n − 1; j = i + 1, . . . , n,
x_j + (1 − δ_j)w_j + δ_j h_j ≤ x_i + W(1 − z^r_ij),   i = 1, . . . , n − 1; j = i + 1, . . . , n,
y_i + (1 − δ_i)h_i + δ_i w_i ≤ y_j + H(1 − z^b_ij),   i = 1, . . . , n − 1; j = i + 1, . . . , n,
y_j + (1 − δ_j)h_j + δ_j w_j ≤ y_i + H(1 − z^a_ij),   i = 1, . . . , n − 1; j = i + 1, . . . , n,
z^l_ij + z^r_ij + z^b_ij + z^a_ij ≥ 1,   i = 1, . . . , n − 1; j = i + 1, . . . , n,
z^l_ij, z^r_ij, z^b_ij, z^a_ij ∈ {0, 1},   i = 1, . . . , n − 1; j = i + 1, . . . , n,
δ_i ∈ {0, 1},   i = 1, . . . , n.
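A compact sketch of (1.8)–(1.9) for three fixed-orientation modules (rotations
omitted for brevity); the chip and module sizes are made up, and PuLP is assumed.

from pulp import LpProblem, LpVariable, LpMinimize, lpSum

W = H = 10
w = [4, 6, 5]; h = [3, 4, 6]; n = 3    # assumed module widths and heights
prob = LpProblem("floor_planning", LpMinimize)
x = [LpVariable(f"x{i}", lowBound=0, upBound=W - w[i]) for i in range(n)]  # (1.8)
y = [LpVariable(f"y{i}", lowBound=0, upBound=H - h[i]) for i in range(n)]

for i in range(n):
    for j in range(i + 1, n):
        z = [LpVariable(f"z{t}_{i}{j}", cat="Binary") for t in range(4)]
        prob += x[i] + w[i] <= x[j] + W * (1 - z[0])   # i left of j
        prob += x[j] + w[j] <= x[i] + W * (1 - z[1])   # i right of j
        prob += y[i] + h[i] <= y[j] + H * (1 - z[2])   # i below j
        prob += y[j] + h[j] <= y[i] + H * (1 - z[3])   # i above j
        prob += lpSum(z) >= 1                          # disjunction (1.9)

prob += x[0]             # any objective: we only need a feasible placement
prob.solve()
print([(v.varValue, u.varValue) for v, u in zip(x, y)])  # corner coordinates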

1.2.2 Linear Complementarity Problem

It is necessary to solve the following system

Ax + y = b,      (1.10a)
x^T y = 0,      (1.10b)
x, y ≥ 0,      (1.10c)

where A is a nonsingular n × n matrix, b ∈ R^n, and x, y are n-vectors of continuous
variables. This problem is known as the linear complementarity problem. Its
important special cases are the linear programming problem, the quadratic
programming problem under linear constraints (see Sect. 1.2.3), and the problem of
finding a Nash equilibrium in a bimatrix game (see Exercise 1.7).
Despite its name, (1.10) is not a linear problem, because (1.10b) is not a linear
equation; due to the non-negativity of the vectors x and y, it is equivalent to the
system

x_i y_i = 0,   i = 1, . . . , n,      (1.11)

in which each equality x_i y_i = 0 expresses the disjunction: x_i = 0 or y_i = 0.
Assuming that we know upper bounds x_i ≤ g_i and y_i ≤ h_i for all variables x_i and
y_i (for an integer matrix A, such bounds can be derived from Cramer's rule — do
this as an exercise — although from a practical point of view these estimates are too
rough), we can represent (1.11) by the following system:

x_i ≤ g_i z_i,   y_i ≤ h_i (1 − z_i),   z_i ∈ {0, 1},   i = 1, . . . , n.
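A toy instance of (1.10) solved through the binary reformulation above; the 2 × 2
data and the bounds g, h are illustrative assumptions (PuLP again assumed).

from pulp import LpProblem, LpVariable, LpMinimize

A = [[2, 1], [1, 2]]; b = [5, 4]           # assumed data
g = [10, 10]; h = [10, 10]; n = 2          # assumed upper bounds g_i, h_i
prob = LpProblem("lcp", LpMinimize)
x = [LpVariable(f"x{i}", lowBound=0) for i in range(n)]
y = [LpVariable(f"y{i}", lowBound=0) for i in range(n)]
z = [LpVariable(f"z{i}", cat="Binary") for i in range(n)]

for i in range(n):
    prob += A[i][0] * x[0] + A[i][1] * x[1] + y[i] == b[i]   # (1.10a)
    prob += x[i] <= g[i] * z[i]           # z_i = 0 forces x_i = 0 ...
    prob += y[i] <= h[i] * (1 - z[i])     # ... z_i = 1 forces y_i = 0, so x_i y_i = 0

prob += x[0] + x[1]      # any objective: every feasible point is complementary
prob.solve()
print([v.varValue for v in x], [v.varValue for v in y])   # e.g. [0, 0] and [5, 4]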

1.2.3 Quadratic Programming Under Linear Constraints

The quadratic programming problem under linear constraints is formulated as
follows:

c^T x + (1/2) x^T Dx → min,
Ax ≥ b,      (1.12)
x ≥ 0,


where c ∈ R^n, b ∈ R^m, A is a real m × n matrix, D is a real symmetric n × n matrix,
and x is a vector of n variables.
If a point x is an optimal solution to (1.12), then, by the first-order necessary
optimality conditions (also known as the Karush-Kuhn-Tucker (KKT) conditions),
there exists a vector y ∈ Rm such that the following constraints hold:

x ≥ 0,   y ≥ 0,
Ax ≥ b,
c + Dx − A^T y ≥ 0,      (1.13)
y^T (Ax − b) = 0,
(c + Dx − A^T y)^T x = 0.

If (x, y) is a solution to (1.13), then the point x is called a stationary point (or KKT-
point) for (1.12). If D is a positive semi-definite matrix, then the objective function
is convex, and, consequently, every stationary point is an optimal solution to (1.12).
Let us consider the following MIP:

z → max,
0 ≤ Au − bz ≤ e − α,
0 ≤ Du − A^T v + cz ≤ e − β,
0 ≤ u ≤ β,
0 ≤ v ≤ α,      (1.14)
0 ≤ z ≤ 1,
α ∈ {0, 1}^m,
β ∈ {0, 1}^n.

We denote by z∗ the optimal objective value of (1.14). It is not difficult to verify the
following statements.
1. If z∗ = 0, then problem (1.12) has no stationary points.
2. If z∗ > 0 and (u∗, v∗, z∗) is an optimal solution to (1.14), then the vectors
x∗ = (1/z∗) u∗ and y∗ = (1/z∗) v∗ constitute a solution to (1.13), and hence x∗ is a
stationary point for (1.12).

1.3 How an LP May Turn Into a MIP?

When solving a practical problem, it is very important to foresee possible
modifications of the initial model from the very beginning. Such modifications can
easily take the model out of the class of problems that can be solved by your program.
In particular, this is very often the case with LP models, when the addition of some
simple and natural constraints transforms an LP into a MIP. Let us demonstrate this
with the transportation problem, one of the most famous LP models.
There are m suppliers and n consumers of some product. Supplier i has a_i product
units available, and consumer j wants to get b_j product units. The unit transportation
cost from supplier i to consumer j is c_ij. Let x_ij denote the quantity of the product
delivered by supplier i to consumer j. It is necessary to determine a supply plan,
X = [x_ij], for which the total transportation cost is minimum.
This transportation problem is formulated as the following LP:
∑_{i=1}^m ∑_{j=1}^n c_ij x_ij → min,      (1.15a)
∑_{j=1}^n x_ij ≤ a_i,   i = 1, . . . , m,      (1.15b)
∑_{i=1}^m x_ij = b_j,   j = 1, . . . , n,      (1.15c)
x_ij ≥ 0,   i = 1, . . . , m; j = 1, . . . , n.      (1.15d)

Objective (1.15a) is to minimize the total transportation cost. Inequalities (1.15b)
require that the total amount delivered by each supplier not exceed the quantity
available. Equations (1.15c) guarantee that each consumer receives as much as is
needed.
Below, we list several reasons why an optimal solution to (1.15) may be
unacceptable in practice.
1. In many cases, especially when small quantities are supplied, fixed costs
constitute a significant part of the transportation costs. In this case, the cost of
supplying x_ij > 0 product units from supplier i to consumer j is f_ij + c_ij x_ij, where
f_ij is a fixed cost.
2. It may turn out that, according to an optimal plan, some consumers have only
one supplier, and they may want to diversify their supplies. This gives a new
condition: each consumer j must receive the product from at least k_j suppliers.
3. It may well be that a particular consumer receives from some supplier, say,
less than one percent of its demand, and such splitting of supplies may not suit this
consumer either. Therefore, we add the requirement that the volume of any delivery
to consumer j be not less than u_j. Obviously, too small deliveries are also unprofitable
for the suppliers, so we further require that the volume of any delivery from supplier
i be not less than v_i.
To take all the above requirements into account, we introduce an additional family
of binary variables y_ij, where y_ij = 1 only if x_ij > 0. An extended formulation of
the transportation problem is written as follows:
∑_{i=1}^m ∑_{j=1}^n (f_ij y_ij + c_ij x_ij) → min,      (1.16a)
∑_{j=1}^n x_ij ≤ a_i,   i = 1, . . . , m,      (1.16b)
∑_{i=1}^m x_ij = b_j,   j = 1, . . . , n,      (1.16c)
∑_{i=1}^m y_ij ≥ k_j,   j = 1, . . . , n,      (1.16d)
x_ij ≤ min{a_i, b_j} y_ij,   i = 1, . . . , m; j = 1, . . . , n,      (1.16e)
x_ij ≥ max{v_i, u_j} y_ij,   i = 1, . . . , m; j = 1, . . . , n,      (1.16f)
y_ij ∈ {0, 1},   i = 1, . . . , m; j = 1, . . . , n.      (1.16g)

Objective (1.16a) is to minimize the sum of the fixed and variable transportation
costs. Inequalities (1.16d) ensure that each consumer j has at least k_j suppliers.
Together, (1.16e) and (1.16f) imply that y_ij = 0 only if x_ij = 0. Inequalities (1.16f)
require that the volume of each delivery be not less than the minimum delivery
volumes of the supplier and consumer involved.
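For concreteness, model (1.16) can be sketched as follows; the two-supplier,
two-consumer data (with k_j = 1 and v_i = u_j = 2) and PuLP are assumptions of this
illustration.

from pulp import LpProblem, LpVariable, LpMinimize, lpSum

a = [30, 20]; b = [25, 25]                         # assumed supplies and demands
c = [[2, 4], [3, 1]]; f = [[10, 10], [10, 10]]     # unit and fixed costs
k = [1, 1]; v = [2, 2]; u = [2, 2]
m, n = 2, 2
prob = LpProblem("transport_mip", LpMinimize)
x = [[LpVariable(f"x{i}{j}", lowBound=0) for j in range(n)] for i in range(m)]
y = [[LpVariable(f"y{i}{j}", cat="Binary") for j in range(n)] for i in range(m)]

prob += lpSum(f[i][j] * y[i][j] + c[i][j] * x[i][j]
              for i in range(m) for j in range(n))                  # (1.16a)
for i in range(m):
    prob += lpSum(x[i]) <= a[i]                                     # (1.16b)
for j in range(n):
    prob += lpSum(x[i][j] for i in range(m)) == b[j]                # (1.16c)
    prob += lpSum(y[i][j] for i in range(m)) >= k[j]                # (1.16d)
for i in range(m):
    for j in range(n):
        prob += x[i][j] <= min(a[i], b[j]) * y[i][j]                # (1.16e)
        prob += x[i][j] >= max(v[i], u[j]) * y[i][j]                # (1.16f)

prob.solve()
print([[x[i][j].varValue for j in range(n)] for i in range(m)])     # supply plan X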

1.4 Polyhedra

A polyhedron is the set of solutions of a system of linear inequalities. Bounded
polyhedra are called polytopes in order to distinguish them from unbounded
polyhedra. An alternative definition of a polyhedron is given by the following theorem.
An alternative definition of a polyhedron is given by the following theorem.
Theorem 1.1 (Weyl). A set P ⊆ Rn is a polyhedron if and only if P = conv(S) +
cone(T ) for some finite subsets of vectors S, T ⊂ Rn .
The following notation was used in the above theorem:
• conv(S): convex hull of a set S ⊆ R^n, that is, the minimum (by inclusion) convex
set containing S. Recall that a set is called convex if, together with any two of its
points x1 and x2, it contains the segment {(1 − λ)x1 + λx2 : 0 ≤ λ ≤ 1} joining them.
• cone(T): convex cone generated by a set of vectors T ⊆ R^n. If the set T is finite,
T = {y_1, . . . , y_q}, then

cone(T) = {x : x = ∑_{i=1}^q λ_i y_i, λ ∈ R^q_+}.

Theorem 1.1 also gives an alternative definition of a polytope as the convex hull of
a finite set of points.

An affine subspace of R^n is the result of a parallel translation of a linear subspace.
More precisely, for any point a of the affine subspace A ⊆ R^n, the set L_a = {x − a :
x ∈ A} is a linear subspace, and A = a + L_a.
The dimension of a set X ⊆ R^n is the dimension of the minimum affine subspace
containing X. An affine subspace of dimension n − 1 is called a hyperplane. We
can also define a hyperplane as the set H(a, β) of the solutions to a linear equation
a^T x = β, where a ∈ R^n (a ≠ 0), β ∈ R. The hyperplane H(a, β) divides the vector
space R^n into two half-spaces

H≤(a, β) = {x ∈ R^n : a^T x ≤ β}   and   H≥(a, β) = {x ∈ R^n : a^T x ≥ β}.

We can say that a polyhedron is the intersection of a finite number of half-spaces.
Let P ⊆ R^n be a polyhedron of dimension d, and let H(a, β) be a hyperplane. If P
completely belongs to one of the half-spaces H≤(a, β) or H≥(a, β) and touches the
hyperplane H(a, β) (P ∩ H(a, β) ≠ ∅), then P ∩ H(a, β) and H(a, β) are called a face
and a supporting hyperplane of the polyhedron P. We specifically distinguish three
types of faces:
• facet: face of dimension d − 1;
• vertex: face of dimension 0 (a point);
• edge: face of dimension 1 (a segment).
Two vertices of a polyhedron are called adjacent if they are connected by an edge
(lie on one edge).
Any supporting hyperplane that is tangent to a facet is also called a facet defining
hyperplane. For a full-dimensional polyhedron (of dimension n), the facet defining
hyperplanes are uniquely defined (up to multiplication by a positive scalar). If, in
a system of inequalities Ax ≤ b, a hyperplane H(A_i, b_i) is not facet defining for the
polyhedron P(A, b) = {x ∈ R^n : Ax ≤ b}, then it can be excluded from the system of
inequalities without expanding the set of its solutions. In practice, we can recognize
facet defining hyperplanes based on the following statement.
Proposition 1.2. A hyperplane is facet defining for a polyhedron P of dimension d
if and only if it contains at least d vertices of P.
A three-dimensional polytope P, shown in Fig. 1.4, is formed by the intersection
of half-spaces, which are given by the inequalities:

x1 + x2 + x3 ≤ 4,
x2 ≤ 2,
x3 ≤ 3,
3x1 + x3 ≤ 6,
x1 ≥ 0,
x2 ≥ 0,
x3 ≥ 0.

It has 7 facets, 8 vertices (depicted as bold dots), and 13 edges (segments that con-
nect the vertices).
Fig. 1.4 Example of a polytope, with vertices (0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0),
(0, 2, 2), (0, 0, 3), (1, 0, 3), and (0, 1, 3)
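As an illustration (not part of the book), the vertex count can be verified by brute
force with numpy: every vertex is the intersection of three of the seven bounding
hyperplanes that satisfies all the inequalities.

from itertools import combinations
import numpy as np

# Rows of "Ax <= b" for the seven inequalities above (x >= 0 written as -x <= 0).
A = np.array([[1, 1, 1], [0, 1, 0], [0, 0, 1], [3, 0, 1],
              [-1, 0, 0], [0, -1, 0], [0, 0, -1]], dtype=float)
b = np.array([4, 2, 3, 6, 0, 0, 0], dtype=float)

vertices = set()
for rows in combinations(range(7), 3):
    Ai, bi = A[list(rows)], b[list(rows)]
    if abs(np.linalg.det(Ai)) < 1e-9:
        continue                              # planes not in general position
    v = np.linalg.solve(Ai, bi)
    if np.all(A @ v <= b + 1e-9):             # keep feasible intersection points
        # round and add 0.0 to normalize -0.0, so duplicates collapse in the set
        vertices.add(tuple(round(float(c), 6) + 0.0 for c in v))

print(len(vertices), sorted(vertices))        # 8 vertices, from (0,0,0) to (2,2,0)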

1.5 Good and Ideal Formulations, Reformulation

In many cases, the same problem can be formulated in several different ways. Not
all formulations are equivalent. In this section we will try to understand why one
formulation is better than another.
Let us consider a mixed integer set

P(A, b; S) = {x ∈ R^n : Ax ≤ b, x_j ∈ Z for j ∈ S},      (1.17)

where A is a real m × n matrix, b ∈ R^m, and S ⊆ {1, . . . , n}. If in the definition of the
set P(A, b; S) we drop the requirement of integrality of the variables, we obtain the
relaxation polyhedron, P(A, b), for this set.
Let P(A, b; S) and P(A′, b′; S) with P(A, b; S) = P(A′, b′; S) be two different
formulations for the feasible domain of some MIP. We say that P(A, b; S) is stronger
than P(A′, b′; S) if the relaxation polyhedron P(A, b) is contained in the relaxation
polyhedron P(A′, b′).

Fig. 1.5 Three formulations for a set in the plane

Figure 1.5 presents three different formulations for a set of seven points in the
plane. The rightmost figure shows the ideal formulation, the polyhedron of which
coincides with the convex hull of the given set of points. If it is possible to write
down an ideal formulation for some MIP, then this MIP can be considered as an LP.
If a MIP is solved by the branch-and-bound method (see Sect. 6.1), the use of a
stronger formulation leads, as a rule, to a reduction in the solution time due to
the decrease in the number of branchings. An obvious way to strengthen an existing
formulation P(A, b; S) is to add new inequalities valid for P(A, b; S), but not valid for
the relaxation polyhedron P(A, b). We say that an inequality is valid for some set if
all points from this set satisfy this inequality. Let inequalities a^T x ≤ u and α^T x ≤ v
hold for all points in P(A, b; S). It is said that the inequality a^T x ≤ u is stronger than
the inequality α^T x ≤ v (or the inequality a^T x ≤ u dominates the inequality α^T x ≤ v)
if

P(A, b) ∩ H≤(a, u) ⊂ P(A, b) ∩ H≤(α, v).

Example 1.1 We need to compare two formulations

x ∈ {0, 1}^{n+1},   ∑_{j=1}^n x_j ≤ n x_{n+1}      (1.18)

and

x ∈ {0, 1}^{n+1},   x_j ≤ x_{n+1},   j = 1, . . . , n      (1.19)

for the set of points

X = {x ∈ {0, 1}^{n+1} : x_{n+1} = 0 ⇒ x_1 = x_2 = · · · = x_n = 0}.

Solution. Let

P1 = {x ∈ R^{n+1} : ∑_{j=1}^n x_j ≤ n x_{n+1}, 0 ≤ x_j ≤ 1, j = 1, . . . , n + 1},
P2 = {x ∈ R^{n+1} : x_j ≤ x_{n+1}, 0 ≤ x_j ≤ 1, j = 1, . . . , n + 1}

be the relaxation polytopes for (1.18) and (1.19), respectively. Summing the
inequalities x_j ≤ x_{n+1}, j = 1, . . . , n, we obtain the inequality ∑_{j=1}^n x_j ≤ n x_{n+1}.
Thus, we have shown that P2 ⊆ P1.


To show that P2 is a proper subset of P1 (P2 ⊂ P1), it is enough to specify a point
from P1 that does not belong to P2. There are many such points. In particular, the
vertices (e_j, 1/n), j = 1, . . . , n, of P1 do not belong to P2. Here e_j is the j-th unit
vector in R^n. Since P2 ⊂ P1, (1.19) is stronger than (1.18).
We now show that (1.19) is an ideal formulation. For this it suffices to show
that the polytope P2 is integral, that is, all its vertices are integer. Let x̄ ∈ P2. If
0 < x̄_{n+1} < 1, then, taking into account the inequalities x̄_j ≤ x̄_{n+1}, we conclude that
the point (1/x̄_{n+1}) · x̄ also belongs to P2. But since 0 ∈ P2 and

x̄ = (1 − x̄_{n+1}) · 0 + x̄_{n+1} · (1/x̄_{n+1}) x̄,

x̄ lies on the segment joining two points of the polytope P2. Hence, x̄ is not a
vertex of P2.
So, if x̄ is a vertex of P2, then its component x̄_{n+1} is 0 or 1. If x̄_{n+1} = 0, then
all other components x̄_j are also equal to zero. If x̄_{n+1} = 1, then the inequalities
x_j ≤ x_{n+1} become the inequalities x_j ≤ 1. Therefore, the point (x̄_1, . . . , x̄_n) must be
one of the vertices of the cube [0, 1]^n, which are all integer. ⊓⊔
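A quick numeric check of this example for n = 3 can be done in pure Python; the
point tested is the vertex (e_1, 1/n) named above.

n = 3
pt = [1, 0, 0, 1 / n]                          # the point (e_1, 1/n)

in_box = all(0 <= v <= 1 for v in pt)
in_P1 = in_box and sum(pt[:n]) <= n * pt[n]    # aggregated inequality of (1.18)
in_P2 = in_box and all(pt[j] <= pt[n] for j in range(n))   # inequalities of (1.19)
print(in_P1, in_P2)                            # True False: P2 is strictly smaller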

Let us note that the disaggregation of inequalities, i.e., replacing one inequality
with a system of inequalities that is stronger than the initial inequality, is a powerful
preprocessing technique (see Sect. 6.5).

1.6 Strong Inequalities

Let A be a real m × n matrix, and let b ∈ R^m. We say that an inequality α^T x ≤ β is a
consequence of the system of inequalities Ax ≤ b if α^T x ≤ β holds for all solutions
to Ax ≤ b.
Proposition 1.3. An inequality α^T x ≤ β is a consequence of a consistent system of
inequalities Ax ≤ b if and only if there exists a vector u ∈ R^m_+ such that α^T = u^T A
and β ≥ u^T b.
Proof. If α^T = u^T A and β ≥ u^T b for some u ∈ R^m_+, then, for any solution x̄ of the
system Ax ≤ b, the inequality

α^T x̄ = u^T A x̄ ≤ u^T b ≤ β

holds, and, therefore, α^T x ≤ β is a consequence of Ax ≤ b.
Now we assume that an inequality α^T x ≤ β is a consequence of a system Ax ≤ b.
By Theorem 3.1 (of LP duality), we have

β ≥ max{α^T x : Ax ≤ b} = min{b^T u : A^T u = α, u ≥ 0}.

Let ū ∈ R^m_+ be an optimal solution to the LP on the right. Then α^T = ū^T A and
β ≥ ū^T b. ⊓⊔

An inequality A_i x ≤ b_i is redundant in a system of inequalities Ax ≤ b if P(A, b) = P(A_{M\{i}}, b_{M\{i}}), where M = {1, . . . , m}. Proposition 1.3 allows us to give an algebraic equivalent of this geometric definition: the inequality A_i x ≤ b_i is redundant in Ax ≤ b if it is a consequence of the subsystem of all other inequalities A_k x ≤ b_k, k ≠ i. Note that this algebraic definition is most often used to identify redundant inequalities. We also note that the presence of redundant inequalities in the constraint system of an LP is highly undesirable, since this can significantly slow down the process of solving this LP.
Now we give a characterization of non-redundant inequalities of a system Ax ≤ b. Since a non-redundant inequality cannot be removed from the system without expanding the polyhedron P(A, b), each such inequality must touch the polyhedron P(A, b) along one of its facets (see Sect. 1.4). Let us remember that such inequalities are called facet defining.

Proposition 1.4. If the dimension of a polyhedron P ⊆ R^n is d, then an inequality α^T x ≤ β that is valid for P is facet defining for P if and only if P contains d affinely independent points² lying on the hyperplane H(α, β).
Now let us consider a mixed integer set P(A, b; S) ⊆ R^n. Given two inequalities, α^T x ≤ β and π^T x ≤ γ, that are valid for P(A, b; S), we say that α^T x ≤ β is stronger than (dominates) π^T x ≤ γ if the inclusion

P(A, b) ∩ H≤(α, β) ⊂ P(A, b) ∩ H≤(π, γ)

is valid. An inequality is called strong if there exists no other inequality that is stronger than it.
Let us call a set from R^n polyhedral if its convex hull is a polyhedron. Suppose that the set P(A, b; S) is bounded, and all elements of the matrix A and vector b are rational numbers. Under this assumption, the set X = conv(P(A, b; S)) is a polyhedron (see Exercise 4.1). Therefore, the inequalities that are facet defining for X are strong for the set P(A, b; S). Since all vertices of the polyhedron X belong to the set P(A, b; S), we can reformulate Proposition 1.4 as follows.

Proposition 1.5. If the dimension of a polyhedral set P(A, b; S) is equal to d, then an inequality α^T x ≤ β that is valid for P(A, b; S) is strong for P(A, b; S) if and only if P(A, b; S) contains d affinely independent points lying on the hyperplane H(α, β).

1.7 Extended Formulations

We can strengthen a formulation of a MIP by adding new constraints to it. Alternatively, we can try to develop an extended formulation by adding new variables and constraints to the existing formulation. In this section we consider two examples of extended formulations, trying to understand why some formulations are stronger than others.

² A set of points x^1, . . . , x^d from R^n is affinely independent if these points do not lie in an affine subspace of dimension less than d − 1. Equivalently, the points x^1, . . . , x^d are affinely independent if the vectors x^2 − x^1, . . . , x^d − x^1 are linearly independent.

1.7.1 Single-Product Lot-Sizing Problem

Consider a single-product version of the lot-sizing problem. A firm produces some product; production and storage capacities are unlimited (in comparison with the demands). The planning horizon consists of T periods. For each period t = 1, . . . , T, we know
• dt : demand for product;
• ft : fixed production cost;
• ct : unit production cost;
• ht : unit storage cost.
Inventory of the product in the warehouse before the start of the planning horizon
is s0 .
It is necessary to determine how many units of the product to produce in each
period in order to fully meet the demands and so that the total production and storage
cost over all T periods is minimum.
For t = 1, . . . , T , we introduce the following variables:
• xt : amount of product produced in period t;
• st : amount of product stored in the warehouse at the end of period t;
• yt = 1, if the product is produced in period t, and yt = 0 otherwise.
Having determined D_t = ∑_{τ=t}^T d_τ, we can write the following MIP:

∑_{t=1}^T ( f_t y_t + c_t x_t + h_t s_t ) → min,   (1.20a)
s_{t−1} + x_t = d_t + s_t,  t = 1, . . . , T,   (1.20b)
0 ≤ x_t ≤ D_t y_t,  t = 1, . . . , T,   (1.20c)
y_t ∈ {0, 1},  t = 1, . . . , T.   (1.20d)

Objective (1.20a) is to minimize total expenses over all T periods. Each balance
equation in (1.20b) relates two neighboring periods: the amount of product, st−1 , in
the warehouse at the end of period t − 1 plus the amount, xt , produced in period t
equals the demand, dt , in period t plus the amount, st , stored in the warehouse at the
end of period t. The inequalities in (1.20c) impose the implications: yt = 0 ⇒ xt = 0.
If all fixed costs, ft , are positive, then for an optimal solution (x∗ , y∗ ) of the
relaxation LP for (1.20), we have

yt∗ = xt∗ /Dt , t = 1, . . . , T.



Consequently, for all producing periods t (those with x_t^∗ > 0), except for the last one, y_t^∗ is a fractional number, since y_t^∗ = x_t^∗/D_t ≤ d_t/D_t < 1. Many integer variables taking fractional values is a clear indicator that the formulation used is weak.
To obtain an ideal formulation for our lot-sizing problem, we need to add to (1.20) the system of (l, S)-inequalities:

∑_{t∈S} x_t + ∑_{t∈S̄} d_{tl} y_t ≥ d_{1l},  S ⊆ {1, . . . , l}, l = 1, . . . , T,   (1.21)

where S̄ = {1, . . . , l} \ S and d_{ij} = ∑_{t=i}^j d_t. These inequalities reflect the following simple observation: the amount of product produced in periods t ∈ S (∑_{t∈S} x_t) plus the maximum amount of product that can be produced in periods t ∈ S̄ for use in the first l periods (∑_{t∈S̄} d_{tl} y_t) must be no less than the product demand in these first l periods (d_{1l}).
We see that the ideal formulation for the set of feasible solutions to (1.20) contains exponentially many inequalities. Although this is not an insurmountable obstacle to using the formulation in practice, such a MIP cannot be passed to standard software as is; to work with any exponentially large family of inequalities, we need to implement a specialized separation procedure (see, for example, Sect. 6.6).
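For the family (1.21) itself, exact separation is straightforward: for a fixed l, the left-hand side is minimized by putting period t into S exactly when x_t^∗ < d_{tl} y_t^∗. The sketch below (in Python, with 0-based lists; an illustrative transcription, not the book's library code) finds a most violated (l, S)-inequality, if one exists.

```python
def separate_lS(x, y, d, eps=1e-9):
    """Exact separation for the (l,S)-inequalities (1.21).

    x, y: values of an LP-relaxation solution of (1.20) (lists of length T);
    d: demands. Returns (violation, l, S) describing a most violated
    (l,S)-inequality, or None if all such inequalities are satisfied."""
    T = len(d)
    best = None
    for l in range(1, T + 1):
        d_tl = 0.0           # running value of d_{tl} = d_t + ... + d_l
        lhs, S = 0.0, []
        for t in range(l, 0, -1):
            d_tl += d[t - 1]
            # period t goes into S when x_t gives the smaller contribution
            if x[t - 1] < d_tl * y[t - 1]:
                lhs += x[t - 1]
                S.append(t)
            else:
                lhs += d_tl * y[t - 1]
        violation = sum(d[:l]) - lhs    # d_{1l} minus the minimized left side
        if violation > eps and (best is None or violation > best[0]):
            best = (violation, l, sorted(S))
    return best
```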
We can strengthen (1.20) in another way by disaggregating the decision variables: x_t = ∑_{τ=t}^T x_{tτ}, where, for t = 1, . . . , T and τ = t, . . . , T, the new variable x_{tτ} represents the amount of the product produced in period t for period τ.
First, we exclude the variables s_t from (1.20). Adding together the balance equations s_{k−1} + x_k = d_k + s_k for k = 1, . . . , t, we obtain

s_t = s_0 + ∑_{k=1}^t x_k − ∑_{k=1}^t d_k.

Using these equalities, we rewrite the objective function in the following way:

∑_{t=1}^T ( f_t y_t + c_t x_t + h_t (s_0 + ∑_{k=1}^t x_k − ∑_{k=1}^t d_k) )
   = ∑_{t=1}^T ( f_t y_t + w_t x_t ) + K = ∑_{t=1}^T f_t y_t + ∑_{t=1}^T ∑_{τ=t}^T w_t x_{tτ} + K,

where w_t = c_t + h_t + · · · + h_T and K = ∑_{t=1}^T h_t (s_0 − ∑_{k=1}^t d_k).
In the new variables x_{tτ}, (1.20) can be reformulated as follows:

∑_{t=1}^T f_t y_t + ∑_{t=1}^T ∑_{τ=t}^T w_t x_{tτ} → min,
∑_{t=1}^τ x_{tτ} = d_τ,  τ = 1, . . . , T,   (1.22)
0 ≤ x_{tτ} ≤ d_τ y_t,  t = 1, . . . , T; τ = t, . . . , T,
y_t ∈ {0, 1},  t = 1, . . . , T.

It is not difficult to show that among the solutions of the relaxation LP for (1.22) there are solutions (x^∗, y^∗) for which all components of the vector y^∗ are integer and x_{tτ}^∗ > 0 implies x_{tτ}^∗ = d_τ, i.e., the whole demand for the product in any period τ is fully produced in just one period. For this reason, (1.22) can be considered an "almost" ideal formulation.
The main drawback of many extended formulations is their large size. In our case, we replaced Formulation (1.20), having 3T variables and 2T nontrivial³ constraints, with Formulation (1.22), having T(T + 1)/2 variables and 2T constraints. For example, if T = 100, we have only 300 variables in the first case, and 5050 in the second case. The difference is huge! Sometimes it is more efficient to use in practice so-called approximate extended formulations; such a formulation is obtained by adding to the basic compact formulation only a part of the "most important" variables and constraints of the extended formulation.

³ Normally, the lower and upper bounds for variables are called trivial constraints.
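As an illustration of how compactly (1.22) can be generated, here is a sketch in Python using the open-source PuLP modeling package (the choice of PuLP is ours, purely for illustration; any MIP modeling interface would serve equally well):

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

def build_extended_lot_sizing(f, c, h, d):
    """Build formulation (1.22); f, c, h, d are lists indexed 0, ..., T-1."""
    T = len(d)
    w = [c[t] + sum(h[t:]) for t in range(T)]   # w_t = c_t + h_t + ... + h_T
    prob = LpProblem("lot_sizing_extended", LpMinimize)
    y = [LpVariable(f"y_{t}", cat=LpBinary) for t in range(T)]
    x = {(t, tau): LpVariable(f"x_{t}_{tau}", lowBound=0)
         for t in range(T) for tau in range(t, T)}
    # objective: sum_t f_t y_t + sum_{t<=tau} w_t x_{t,tau}; the constant K is dropped
    prob += (lpSum(f[t] * y[t] for t in range(T))
             + lpSum(w[t] * x[t, tau] for (t, tau) in x))
    for tau in range(T):          # the demand of period tau is produced once
        prob += lpSum(x[t, tau] for t in range(tau + 1)) == d[tau]
    for (t, tau) in x:            # variable upper bounds
        prob += x[t, tau] <= d[tau] * y[t]
    return prob, x, y
```

For T = 100 this model indeed contains 5050 x-variables, in line with the count above; solving it is then a matter of calling prob.solve().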

1.7.2 Fixed Charge Network Flows

A transportation network is given by a directed graph (digraph) G = (V, E). For each
node v ∈ V , we know the demand dv for some product. If dv > 0, then v is a demand
node; if dv < 0, then v is a supply node; dv = 0 for transit nodes. It is assumed
that supply and demand are balanced: ∑v∈V dv = 0. The capacity of an arc e ∈ E is
ue > 0, and the cost of shipping xe > 0 units of product along this arc is fe + ce xe .
Naturally, if the product is not moved through the arc (xe = 0), then nothing is paid.
The fixed charge network flow problem (FCNF) is to decide on how to transport the
product from the supply to the demand nodes so that the transportation expenses are
minimum.
The FCNF problem appears as a subproblem in many practical applications such
as designing transportation and telecommunication networks, or optimizing supply
chains.
Introducing the variables
• xe : flow (quantity of shipping product) through arc e ∈ E,
• ye = 1 if product is shipped (xe > 0) through arc e, and ye = 0 otherwise,
we formulate the FCNF problem as follows:

∑_{e∈E} ( f_e y_e + c_e x_e ) → min,   (1.23a)
∑_{e∈E(V,v)} x_e − ∑_{e∈E(v,V)} x_e = d_v,  v ∈ V,   (1.23b)
0 ≤ x_e ≤ u_e y_e,  e ∈ E,   (1.23c)
y_e ∈ {0, 1},  e ∈ E.   (1.23d)

Here E(V, v) (resp., E(v,V )) denote the sets of arcs from E that are entering (resp.,
leaving) a node v ∈ V .
Objective (1.23a) is to minimize the total transportation expenses. Each balance
equation in (1.23b) requires that the number of flow units entering a particular node
be equal to the number of flow units leaving this node. The variable upper bounds
(1.23c) are capacity restrictions with the following meaning:
• the flow through any arc cannot exceed the arc capacity;
• if some arc is not used for shipping product (ye = 0), then the flow through this
arc is zero (xe = 0).
Since, for any optimal solution of the relaxation LP, y_e = x_e/u_e, (1.23) cannot be a strong formulation if the capacities, u_e, of many arcs are greater than the flows, x_e, along these arcs. This, for example, happens in problems without capacity limitations, when the numbers u_e are rough upper estimates for the values of the arc flows x_e.
We can strengthen (1.23) by disaggregating the flow variables x_e. In what follows we assume that all the values f_e and c_e are non-negative. In the new formulation, for each flow unit along any arc, we will indicate its origin supply node and its destination demand node. Let us denote by S and T, respectively, the sets of supply (d_v < 0) and demand (d_v > 0) nodes. We introduce two new families of variables:
• q_{st}: number of product units supplied from node s ∈ S to node t ∈ T;
• z_e^{st}: the part of the flow sent from s ∈ S to t ∈ T that goes through arc e ∈ E.
With these new variables, we rewrite (1.23) as follows:

∑_{e∈E} ( f_e y_e + c_e x_e ) → min,   (1.24a)
∑_{e∈E(V,s)} z_e^{st} − ∑_{e∈E(s,V)} z_e^{st} = −q_{st},  s ∈ S, t ∈ T,   (1.24b)
∑_{e∈E(V,v)} z_e^{st} − ∑_{e∈E(v,V)} z_e^{st} = 0,  v ∈ V \ {s, t}, s ∈ S, t ∈ T,   (1.24c)
∑_{t∈T} q_{st} = −d_s,  s ∈ S,   (1.24d)
∑_{s∈S} q_{st} = d_t,  t ∈ T,   (1.24e)
∑_{(s,t)∈S×T} z_e^{st} = x_e,  e ∈ E,   (1.24f)
0 ≤ z_e^{st} ≤ min{u_e, −d_s, d_t} y_e,  e ∈ E, s ∈ S, t ∈ T,   (1.24g)
0 ≤ x_e ≤ u_e y_e,  e ∈ E,   (1.24h)
q_{st} ≥ 0,  s ∈ S, t ∈ T,   (1.24i)
y_e ∈ {0, 1},  e ∈ E.   (1.24j)

In this formulation, (1.24b) and (1.24c) are flow conservation constraints: for s ∈ S and t ∈ T, z^{st} ∈ R^E is a flow from s to t of value q_{st}, i.e., q_{st} flow units are sent from s to t, and, for all nodes other than s or t, the incoming and outgoing flows are equal. Equations (1.24d) and (1.24e) ensure that each supply node sends and each demand node receives the required quantity of product. Equations (1.24f) determine the total flow along any arc by summing up all the flows going from supply to demand nodes. The variable upper bounds (1.24g) impose the capacity limitations for the arc flows: the flow z_e^{st} along arc e sent from s to t can exceed neither the capacity u_e of arc e, nor the supply −d_s at node s, nor the demand d_t at node t. In fact, these more precise variable bounds make (1.24) stronger than (1.23).

1.8 Alternative Formulations for Scheduling Problems

We have already seen that a formulation of a MIP can be strengthened by adding new constraints and variables to it. When an existing formulation does not allow us to solve problems of the sizes required in practice, it is sometimes possible to completely change the modeling concept and develop an alternative formulation. In this section we consider two different modeling concepts for a rather general scheduling problem, which subsumes as special cases a great number of scheduling problems studied in the literature. We have to fulfill a set of jobs on a number of processors under certain constraints, such as restrictions on the job completion times and priorities between jobs (one job cannot start until another one is finished). The goal is to optimize some criterion, e.g., to minimize the total processing time, which is the completion time of the last job (assuming that the first job starts at time 0), or to maximize the number of processed jobs.
Formally, we are given n jobs to be processed on m processors (machines). Let P_j ⊆ {1, . . . , m} denote the subset of processors that can fulfill job j ∈ J = {1, . . . , n}. Each job j is characterized by the following parameters:
• w_j : weight;
• r_j, d_j : release and due dates (the job must be processed during the time interval [r_j, d_j]);
• p_{ij} : processing time on processor i ∈ P_j.
Precedence relations between jobs are given by an acyclic digraph G = (J, E) defined on the set J of jobs: for any arc (j1, j2) ∈ E, job j2 cannot start until job j1 is finished.
1.8 Alternative Formulations for Scheduling Problems 23

In general, not all jobs can be processed. For a given schedule, let U_j = 0 if job j is processed, and U_j = 1 otherwise. Then the problem is to find a job schedule for which the weighted number of unprocessed jobs, ∑_{j=1}^n w_j U_j, is minimum. Alternatively, we can say that our goal is to maximize the weighted sum of processed jobs, which is ∑_{j=1}^n w_j (1 − U_j).

1.8.1 Continuous Time Model

In models with continuous time, the main variables correspond to events that are
defined as the moments when individual jobs begin or end. In our model, we use the
following variables:
• s j : start time of job j;
• yi j = 1 if job j is accomplished by processor i, and yi j = 0 otherwise;
• xi, j1 , j2 = 1 if both jobs, j1 and j2 , are carried out on processor i, and j1 is finished
before j2 starts, and xi, j1 , j2 = 0 otherwise.
In these variables we formulate our scheduling problem as follows:

∑_{j=1}^n w_j ∑_{i∈P_j} y_{ij} → max,   (1.25a)
∑_{i∈P_j} y_{ij} ≤ 1,  j = 1, . . . , n,   (1.25b)
t_j = ∑_{i∈P_j} p_{ij} y_{ij},  j = 1, . . . , n,   (1.25c)
r_j ≤ s_j ≤ d_j − t_j,  j = 1, . . . , n,   (1.25d)
s_{j2} − s_{j1} + M(1 − x_{i,j1,j2}) ≥ p_{i,j1},  j1, j2 = 1, . . . , n, j1 ≠ j2, i ∈ P_{j1} ∩ P_{j2},   (1.25e)
x_{i,j1,j2} + x_{i,j2,j1} ≤ y_{i,j2},  j1, j2 = 1, . . . , n, j1 ≠ j2, i ∈ P_{j1} ∩ P_{j2},   (1.25f)
y_{i,j1} + y_{i,j2} − x_{i,j1,j2} − x_{i,j2,j1} ≤ 1,  j1 = 1, . . . , n − 1, j2 = j1 + 1, . . . , n, i ∈ P_{j1} ∩ P_{j2},   (1.25g)
s_{j1} + t_{j1} ≤ s_{j2},  (j1, j2) ∈ E,   (1.25h)
y_{ij} ∈ {0, 1},  j = 1, . . . , n, i ∈ P_j,   (1.25i)
x_{i,j1,j2} ∈ {0, 1},  j1, j2 = 1, . . . , n, j1 ≠ j2, i ∈ P_{j1} ∩ P_{j2}.   (1.25j)

Here M is a sufficiently large number, for example,

M = max_{1≤j≤n} d_j − min_{1≤j≤n} r_j.

Objective (1.25a) is to maximize the weighted sum of accomplished jobs. Here the sum ∑_{i∈P_j} y_{ij} takes the value of 1 only if job j is carried out by some processor. Inequalities (1.25b) ensure that any job will be assigned to at most one processor. Equations (1.25c) determine the actual processing times, t_j, of all jobs j, where t_j is the processing time of job j on the processor it is assigned to. Inequalities (1.25d) require that all jobs be processed within the given time intervals. Since M is sufficiently large, any particular inequality in (1.25e) is a real restriction only if x_{i,j1,j2} = 1. In this case, both jobs, j1 and j2, are assigned to processor i, and the inequality

s_{j2} − s_{j1} ≥ p_{i,j1}

means that job j1 is finished when job j2 starts; the latter agrees well with the equality x_{i,j1,j2} = 1. Two inequalities, one from (1.25f) and the other from (1.25g), written for particular i, j1 and j2, imply that if both jobs, j1 and j2, are accomplished by processor i (y_{i,j1} = y_{i,j2} = 1), then either x_{i,j1,j2} = 1 (j1 precedes j2) or x_{i,j2,j1} = 1 (j2 precedes j1), but not both. Inequalities (1.25h) reflect the precedence relations.
As a rule, formulations with big M are weak. Our formulation (1.25) is not an exception. For simplicity, suppose that there is only one processor (m = 1 and P_j = {1} for j = 1, . . . , n), and there are no precedence relations (E = ∅). For sufficiently large M, the relaxation LP for (1.25) has an optimal solution (s^∗, x^∗, y^∗) with y^∗_{1j} = 1 and s^∗_j = r_j for j = 1, . . . , n, and all x^∗_{1,j1,j2} = 1/2; indeed, with these values each inequality in (1.25f) reads 1/2 + 1/2 ≤ 1, and each inequality in (1.25g) reads 1 + 1 − 1/2 − 1/2 ≤ 1. Obviously, such a solution can be very far from the feasible domain of the problem. As a consequence, (1.25) cannot be used for solving scheduling problems of practical importance.

1.8.2 Time-Index Formulation

A time-index formulation is based on time-discretization, i.e., the planning horizon, from time R = min_{1≤j≤n} r_j to time D = max_{1≤j≤n} d_j, is divided into periods, and period t starts at time t − 1 and ends at time t. Now a schedule is represented by a family of binary decision variables {x_{jit}}, where x_{jit} = 1 if job j starts in period t on processor i, and x_{jit} = 0 otherwise. To formulate precedence relations we need three families of auxiliary variables, which are uniquely defined by the decision variables x_{jit}:
• y j = 1 if job j is processed, and y j = 0 otherwise;
• s j : start time of job j;
• t j : processing time of job j.
We consider the following time-index formulation:

∑_{j=1}^n w_j y_j → max,   (1.26a)
∑_{1≤j≤n: r_j≤t≤d_j−p_{ij}} ∑_{τ=max{t−p_{ij}, r_j}}^{min{t, d_j−p_{ij}}} x_{jiτ} ≤ 1,  t = R, . . . , D, i = 1, . . . , m,   (1.26b)
y_j = ∑_{i∈P_j} ∑_{t=r_j}^{d_j−p_{ij}} x_{jit},  j = 1, . . . , n,   (1.26c)
s_j = ∑_{i∈P_j} ∑_{t=r_j}^{d_j−p_{ij}} t · x_{jit},  j = 1, . . . , n,   (1.26d)
t_j = ∑_{i∈P_j} ∑_{t=r_j}^{d_j−p_{ij}} p_{ij} · x_{jit},  j = 1, . . . , n,   (1.26e)
y_{j1} − y_{j2} ≥ 0,  (j1, j2) ∈ E,   (1.26f)
s_{j2} − s_{j1} ≥ t_{j1},  (j1, j2) ∈ E,   (1.26g)
x_{jit} ∈ {0, 1},  i ∈ P_j, t = r_j, . . . , d_j − p_{ij}, j = 1, . . . , n,   (1.26h)
y_j ∈ {0, 1},  j = 1, . . . , n.   (1.26i)

Objective (1.26a) is to maximize the weighted number of processed jobs. In-


equalities (1.26b) ensure that any processor will perform at most one job in any
period. Equations (1.26c), (1.26d), and (1.26e) determine the values of all auxiliary
variables, y j , s j , and t j . Simultaneously, Eqs. (1.26c) imply that each job can start
only once (because y j ∈ {0, 1}). The precedence relations are expressed by Ineqs.
(1.26f) and (1.26g). For each pair of related jobs ( j1 , j2 ) ∈ E, (1.26f) requires that
j2 be processed only if j1 is processed, while (1.26g) requires that, if both jobs, j1
and j2 , are processed, then j1 must be finished when j2 starts.
The time-index formulation can be easily modified to model many other types of scheduling problems, and this is its important advantage. For example, if we set y_j = 1 for all j, and redefine the objective as

∑_{j=1}^n w_j ∑_{i∈P_j} ∑_{t=r_j}^{d_j−p_{ij}} (t + p_{ij}) x_{jit} → min,

then our goal is to minimize the weighted completion time.


In addition, the optimal objective value of the relaxation LP for (1.26) — this
LP is obtained from (1.26) after dropping the requirement about integrality of the
variables — provides a strong bound on the objective value of our scheduling prob-
lem, and it dominates the bounds provided by the other known IP formulations.
This is because any solution to the relaxation LP of the time-index formulation can
be interpreted as some non-preemptive relaxation schedule. More precisely, such a
26 1 Introduction

relaxation schedule is obtained by slicing jobs into pieces and then each piece is
processed without interruption.
The main disadvantage of the time-index formulation is its size: even for one-machine problems, there are n + T constraints and there may be up to nT variables. As a consequence, for instances with many jobs and long processing intervals [r_j, d_j], the relaxation LPs will be very big, and their solution times will be long. Nevertheless, the time-index formulation can be used in practice for solving scheduling problems with relatively short planning horizons.

1.9 Knapsack Problems

There are two basic variations of the knapsack problem:

integer knapsack:
  max{c^T x : a^T x ≤ b, x ∈ Z^n_+},   (1.27)

0,1-knapsack:
  max{c^T x : a^T x ≤ b, x ∈ {0, 1}^n}.   (1.28)

Here c ∈ R^n_{++}, a ∈ Z^n_{++}, and b is a positive integer.
The problem was named due to the following not very serious interpretation.
You won a prize that allows you to fill your knapsack with any items present in a
supermarket, provided that the total weight of items should not exceed the weight
limit, b, of your knapsack. Suppose that there are n types of items in the supermarket,
and the cost of one item j is c j . You will most likely want to fill your knapsack so
that the total cost of items in it is maximum. To do this, you have to solve an integer
knapsack problem. If you are allowed to take not more than one item of each type,
then you need to solve a 0,1-knapsack problem.
Of course, we could also give more important examples of the application of the
knapsack problems in practice. But in fact, the real world is not so simple, and there
are not many real situations in it that can be modeled by only one linear constraint, even with integer variables⁴. Here we study the knapsack problems for another rea-
son: the separation problems for many classes of inequalities, and the estimation
problems for a number of column generation algorithms are formulated as knap-
sack problems.
The knapsack problems are the simplest IPs, but even they, (1.27) and (1.28), are NP-hard⁵. Despite this fact, in practice we can solve knapsack problems with not very big coefficients relatively quickly using dynamic programming algorithms.
⁴ Sometimes, objecting to this statement, one recalls the aggregation of equations (see Exercise 1.10) that allows us to express several constraints with just one equation. But aggregation always weakens the formulation, and in practice it should be avoided.
⁵ An NP-complete problem is a recognition problem (with the answer "yes" or "no") solvable on a non-deterministic Turing machine in polynomial time. A problem P is called NP-hard if any NP-complete problem can be solved in polynomial time using a procedure A for solving P; it is assumed that one call to A takes constant time.

1.9.1 Integer Knapsack

Let us consider the integer knapsack problem (1.27). For β = 0, . . . , b, we define

F(β) = max{c^T x : a^T x = β, x ∈ Z^n_+}.

It is easy to verify that the following recurrence formula holds:

F(0) = 0,
F(β) = max_{j: a_j≤β} ( F(β − a_j) + c_j ),  β = 1, . . . , b.   (1.29)

As usual, we assume that the maximum over the empty set of alternatives is equal to −∞.
Calculating the values of F(β) using (1.29) is called the direct step of dynamic programming. When all values F(β) are calculated, an optimal solution, x^∗, to (1.27) can be found by performing the following reverse step.

Start with β ∈ arg max_{0≤q≤b} F(q), and set x^∗_j = 0 for j = 1, . . . , n.
While β > 0, do the following computations: find an index j such that F(β) = F(β − a_j) + c_j, and set x^∗_j := x^∗_j + 1, β := β − a_j.
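Both steps are easy to program. The following Python sketch is a literal transcription of (1.29) and of the reverse step (an illustration only; the data are passed as plain lists):

```python
def integer_knapsack(c, a, b):
    """Solve max{c^T x : a^T x <= b, x in Z_+^n} by dynamic programming."""
    n = len(c)
    NEG_INF = float("-inf")
    # Direct step: F[beta] is the best value of a knapsack of weight exactly beta.
    F = [NEG_INF] * (b + 1)
    F[0] = 0
    for beta in range(1, b + 1):
        F[beta] = max((F[beta - a[j]] + c[j] for j in range(n) if a[j] <= beta),
                      default=NEG_INF)
    # Reverse step: recover an optimal solution x*.
    beta = max(range(b + 1), key=lambda q: F[q])
    opt, x = F[beta], [0] * n
    while beta > 0:
        j = next(j for j in range(n)
                 if a[j] <= beta and F[beta] == F[beta - a[j]] + c[j])
        x[j] += 1
        beta -= a[j]
    return opt, x

# The data of Example 1.2 below: the optimum is 7 at x* = (0, 1, 0, 1).
print(integer_knapsack([4, 5, 1, 2], [5, 4, 2, 3], 7))   # (7, [0, 1, 0, 1])
```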
Example 1.2 We need to solve the problem

4x1 + 5x2 + x3 + 2x4 → max,


5x1 + 4x2 + 2x3 + 3x4 ≤ 7,
x1 , x2 , x3 , x4 ∈ Z+ .

Solution. First, we compute

F(0) = 0,
F(1) = −∞,
F(2) = F(0) + 1 = 1,
F(3) = max{F(1) + 1, F(0) + 2} = max{−∞, 2} = 2,
F(4) = max{F(0) + 5, F(2) + 1, F(1) + 2} = max{5, 2, −∞} = 5,
F(5) = max{F(0) + 4, F(1) + 5, F(3) + 1, F(2) + 2}
= max{4, −∞, 3, 3} = 4,
F(6) = max{F(1) + 4, F(2) + 5, F(4) + 1, F(3) + 2}
= max{−∞, 6, 6, 4} = 6,
F(7) = max{F(2) + 4, F(3) + 5, F(5) + 1, F(4) + 2} = max{5, 7, 5, 7} = 7.
Now we can find an optimal solution x^∗. As F(7) = max_{0≤q≤7} F(q), we start with β = 7 and x^∗ = (0, 0, 0, 0)^T. Since F(7) = F(7 − a_2) + c_2, we set x^∗_2 = 0 + 1 = 1 and β = 7 − a_2 = 3. Next, as F(3) = F(3 − a_4) + c_4, we set x^∗_4 = 0 + 1 = 1 and β = 3 − a_4 = 0.
Therefore, the point x^∗ = (0, 1, 0, 1)^T is a solution to the knapsack problem of Example 1.2. ⊓⊔

1.9.2 0,1-Knapsack

Now let us consider the 0,1-knapsack problem (1.28). For k = 1, . . . , n and β = 0, . . . , b, let us define

F_k(β) = max{ ∑_{j=1}^k c_j x_j : ∑_{j=1}^k a_j x_j = β, x_j ∈ {0, 1}, j = 1, . . . , k }.

According to this definition, the optimal objective value in (1.28) is max_{0≤β≤b} F_n(β).
For k = 1, . . . , n, the following recurrence formula holds:

F_k(β) = F_{k−1}(β)  for β = 0, . . . , a_k − 1,
F_k(β) = max{F_{k−1}(β), F_{k−1}(β − a_k) + c_k}  for β = a_k, . . . , b,   (1.30)

under the initial conditions

F_0(0) = 0,  F_0(β) = −∞ for β = 1, . . . , b.

Since we need to compute n × b values F_k(β), and computing each of these values takes one comparison and one assignment, the calculations by Formula (1.30) can be performed in O(nb) time using O(nb) memory cells.
Having calculated all values F_k(β), we can find an optimal solution, x^∗, to (1.28) by executing the following reverse step.

Start with β ∈ arg max_{0≤q≤b} F_n(q).
For k = n, . . . , 1: if F_k(β) = F_{k−1}(β), set x^∗_k = 0; otherwise, set x^∗_k = 1 and β := β − a_k.
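In code, recurrence (1.30) and its reverse step look as follows (again a plain-Python sketch for illustration):

```python
def knapsack_01(c, a, b):
    """Solve max{c^T x : a^T x <= b, x in {0,1}^n} by recurrence (1.30)."""
    n = len(c)
    NEG_INF = float("-inf")
    F = [[NEG_INF] * (b + 1) for _ in range(n + 1)]   # F[k][beta]
    F[0][0] = 0
    # Direct step: fill the table row by row.
    for k in range(1, n + 1):
        for beta in range(b + 1):
            F[k][beta] = F[k - 1][beta]
            if beta >= a[k - 1]:
                F[k][beta] = max(F[k][beta], F[k - 1][beta - a[k - 1]] + c[k - 1])
    # Reverse step: item k was taken iff F_k(beta) != F_{k-1}(beta).
    beta = max(range(b + 1), key=lambda q: F[n][q])
    opt, x = F[n][beta], [0] * n
    for k in range(n, 0, -1):
        if F[k][beta] != F[k - 1][beta]:
            x[k - 1] = 1
            beta -= a[k - 1]
    return opt, x

# The data of Example 1.3 below: the optimum is 34 at x* = (1, 0, 0, 1).
print(knapsack_01([10, 7, 25, 24], [2, 1, 6, 5], 7))   # (34, [1, 0, 0, 1])
```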

Formula (1.30) is not the only possible one. Suppose that all values c_j are integer. Let C ∈ Z be an upper bound for the optimal objective value in (1.28). For k = 1, . . . , n and z = 0, . . . , C, we define

G_k(z) = min{ ∑_{j=1}^k a_j x_j : ∑_{j=1}^k c_j x_j = z, x_j ∈ {0, 1}, j = 1, . . . , k }.
For k = 1, . . . , n, the following recurrence formula holds:

G_k(z) = G_{k−1}(z)  for z = 0, . . . , c_k − 1,
G_k(z) = min{G_{k−1}(z), G_{k−1}(z − c_k) + a_k}  for z = c_k, . . . , C,   (1.31)

under the initial conditions

G_0(0) = 0,  G_0(z) = ∞ for z = 1, . . . , C.

The optimal objective value in (1.28) is equal to

max{z : G_n(z) ≤ b}.

After calculating all values G_k(z), we can find an optimal solution, x^∗, to (1.28) by performing the following reverse step.

Start with z ∈ arg max{q : G_n(q) ≤ b}.
For k = n, . . . , 1: if G_k(z) = G_{k−1}(z), set x^∗_k = 0; otherwise, set x^∗_k = 1 and z := z − c_k.

The calculations by Formula (1.31) can be performed in O(nC) time using O(nC) memory cells.
Comparing the computational complexity of both recurrence formulas, (1.30)
and (1.31), we can conclude that (1.30) should be used if b < C, otherwise we need
to use (1.31).
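The role swap behind (1.31) requires only cosmetic changes in code; a sketch mirroring the previous function:

```python
def knapsack_01_by_value(c, a, b, C):
    """Solve (1.28) by recurrence (1.31); C is an upper bound on the optimum."""
    n = len(c)
    INF = float("inf")
    G = [[INF] * (C + 1) for _ in range(n + 1)]   # G[k][z]
    G[0][0] = 0
    for k in range(1, n + 1):
        for z in range(C + 1):
            G[k][z] = G[k - 1][z]
            if z >= c[k - 1]:
                G[k][z] = min(G[k][z], G[k - 1][z - c[k - 1]] + a[k - 1])
    z = max(q for q in range(C + 1) if G[n][q] <= b)   # optimal objective value
    opt, x = z, [0] * n
    for k in range(n, 0, -1):       # reverse step
        if G[k][z] != G[k - 1][z]:
            x[k - 1] = 1
            z -= c[k - 1]
    return opt, x

# The data of Example 1.4 below: the optimum is 5 at x* = (0, 1, 0, 1).
print(knapsack_01_by_value([2, 1, 3, 4], [35, 24, 69, 75], 100, 5))
```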
Example 1.3 We need to solve the problem

10x1 + 7x2 + 25x3 + 24x4 → max,


2x1 + 1x2 + 6x3 + 5x4 ≤ 7,
x1 , x2 , x3 , x4 ∈ {0, 1}.

Solution. Obviously, the optimal objective value in this problem is greater than b = 7 (for example, taking x_3 = 1 alone already gives 25). Therefore, we use Formula (1.30). The calculations are presented in Table 1.2.

Table 1.2 Calculations by Formula (1.30) for Example 1.3


β F0 F1 F2 F3 F4
0 0 0 0 0 0
1 −∞ −∞ 7 7 7
2 −∞ 10 10 10 10
3 −∞ −∞ 17 17 17
4 −∞ −∞ −∞ −∞ −∞
5 −∞ −∞ −∞ −∞ 24
6 −∞ −∞ −∞ 25 31
7 −∞ −∞ −∞ 32 34
The optimal objective value for our example problem is

max_{0≤q≤7} F_4(q) = F_4(7) = 34.

To find an optimal solution x^∗, we need to execute the reverse step starting with β = 7:

F_4(7) = max{F_3(7), F_3(7 − 5) + 24} = max{32, 10 + 24} = 34 ⇒ x^∗_4 = 1 and β = 7 − 5 = 2;
F_3(2) = F_2(2) = 10 ⇒ x^∗_3 = 0 and β = 2;
F_2(2) = max{F_1(2), F_1(2 − 1) + 7} = max{10, 0 + 7} = 10 ⇒ x^∗_2 = 0 and β = 2;
F_1(2) = max{F_0(2), F_0(2 − 2) + 10} = max{−∞, 0 + 10} = 10 ⇒ x^∗_1 = 1 and β = 0.

Hence, the point x^∗ = (1, 0, 0, 1)^T is an optimal solution to the 0,1-knapsack problem from Example 1.3. ⊓⊔

Example 1.4 We need to solve the problem

2x1 + x2 + 3x3 + 4x4 → max,


35x1 + 24x2 + 69x3 + 75x4 ≤ 100,
x1 , x2 , x3 , x4 ∈ {0, 1}.

Solution. First, we estimate the optimal objective value from above. To do this, we solve the relaxation LP that is obtained from the original problem by allowing all binary variables to take values from the interval [0, 1]. The algorithm for solving LPs with only one constraint is very simple (see Exercise 3.4).
1. First we sort the ratios c_j/a_j in non-increasing order:

c_1/a_1 = 2/35 ≥ c_4/a_4 = 4/75 ≥ c_3/a_3 = 3/69 ≥ c_2/a_2 = 1/24.

2. Then we compute the solution x̂:

x̂_1 = 1,  x̂_4 = (100 − 35)/75 = 13/15,  x̂_3 = x̂_2 = 0.

As an upper bound, we take the number C = ⌊c^T x̂⌋ = ⌊2 + 4 · (13/15)⌋ = 5.
Since C = 5 < 100 = b, we will use Formula (1.31). The calculations are presented in Table 1.3.
Table 1.3 Computations by Formula (1.31) for Example 1.4

z G0 G1 G2 G3 G4
0 0 0 0 0 0
1 ∞ ∞ 24 24 24
2 ∞ 35 35 35 35
3 ∞ ∞ 59 59 59
4 ∞ ∞ ∞ 93 75
5 ∞ ∞ ∞ 104 99

Since G_4(5) = 99 < 100 = b, the optimal objective value of our example problem is 5. To find an optimal solution x^∗, we need to execute the reverse step starting from z = 5:

G_4(5) = min{G_3(5), G_3(5 − 4) + 75} = min{104, 24 + 75} = 99 ⇒ x^∗_4 = 1 and z = 5 − 4 = 1;
G_3(1) = G_2(1) = 24 ⇒ x^∗_3 = 0;
G_2(1) = min{G_1(1), G_1(1 − 1) + 24} = min{∞, 0 + 24} = 24 ⇒ x^∗_2 = 1 and z = 0;
G_1(0) = G_0(0) = 0 ⇒ x^∗_1 = 0.

Thus, the point x^∗ = (0, 1, 0, 1)^T is an optimal solution to the 0,1-knapsack problem from Example 1.4. ⊓⊔

1.10 Notes

Sect. 1.2. The general linear complementarity problem is NP-hard, but a number of
special cases of this problem can be solved by simplex-like algorithms (see [95]).
Nevertheless, the formulation of the linear complementarity problem as a MIP al-
lows us to seek a solution with additional properties by assigning an appropriate
objective function.
The Karush-Kuhn-Tucker (KKT) optimality conditions for the non-linear con-
strained optimization problems can be found in almost any book on non-linear pro-
gramming, for example, in [29, 94].
Sect. 1.4. Fundamental works on systems of linear inequalities and polyhedral the-
ory are [66, 34]. Many aspects of polyhedral theory related to optimization are also
presented in [122, 102].
Sect. 1.5. The importance of strong MIP formulations was not recognized immediately. In the literature of, say, thirty years ago, we can find MIP formulations that are considered weak (bad) today. The principles on which strong formulations are built are discussed in the reviews [141, 144].
Sect. 1.7.1. A single-product lot-sizing model with unlimited production capacities,
as well as a dynamic programming algorithm for its solution are described in [135].
Approximate and extended formulations for various lot-sizing problems are dis-
cussed in [133]. Inequalities (1.21), complementing Formulation (1.20) to the ideal
one, were obtained in [17].
Sect. 1.8. The MIP formulations are known for a wide variety of scheduling prob-
lems [114, 2]. The formulations with indexing by time were first introduced in [47].
The handbook [111] provides an up-to-date coverage of theoretical models and prac-
tical applications of modern scheduling theory.
Sect. 1.9. The classic work on using dynamic programming to solve the knapsack
problems is [56]. The idea to change the roles of the objective and the constraint,
which allowed us to write Formula (1.31), was proposed in [75]. For the complexity
of the knapsack problems, see, for example, [110, 109].
Sect. 1.11. Theorem 1.4 was proved in [73] (see also more accessible sources [110,
122]). The result of Exercise 5.8 was obtained in [145].

1.11 Exercises

1.1. Describe the following sets by systems of linear inequalities, introducing, where necessary, binary variables:
a) X1 = {x ∈ R : |x| ≥ a};
b) X2 = P(A, b) \ {x̄}, where x̄ ∈ P(A, b);
c) X3 = {x ∈ R³ : x₃ = min{x₁, x₂}, 0 ≤ x₁, x₂ ≤ d};
d) X4 = {(x, y) ∈ {0, 1}^n × {0, 1}^m : ∑_{j=1}^n x_j ≥ a ⇒ ∑_{i=1}^m y_i ≥ b}.

1.2. The sum of the largest components of a vector. Consider the set

X_{r,α} = { x ∈ R^n : ∑_{i=1}^r x_{π^x(i)} ≤ α },

where r is an integer (1 ≤ r ≤ n), α is a real number, and the permutation π^x orders the components of a real n-vector x in non-increasing order: x_{π^x(1)} ≥ x_{π^x(2)} ≥ · · · ≥ x_{π^x(n)}.
For example, if a vector x represents an investment portfolio (x_i is the share of asset i), then the requirement to diversify the investments so that not more than 80% of the budget can be invested in any 10% of assets is written as the inclusion x ∈ X_{⌈0.1n⌉, 0.8}.
Prove that the set X_{r,α} is described by the system

x_{i1} + · · · + x_{ir} ≤ α,  1 ≤ i1 < i2 < · · · < ir ≤ n,

of n!/(r!(n − r)!) inequalities.



Let us also note that a compact extended formulation for the set Xr,α is given in
Exercise 3.3.
1.3. Sudoku is a popular logic puzzle. An n × n matrix A is formed from an m × m matrix by replacing each of its elements with an m × m matrix, which we call a block. So, n = m². Some positions in A are filled with some numbers:

a_{ij} = ā_{ij} for (i, j) ∈ E ⊆ {1, . . . , n}².

It is necessary to fill all empty positions with numbers from 1 to n so that each row, column, and block does not contain the same number twice. An example of such a puzzle for m = 3 is presented below, where the puzzle is on the left, and its solution is on the right.
[The 9 × 9 puzzle grid and its solved grid are displayed side by side in the original; the two grids cannot be reproduced faithfully here.]
Formulate an IP to find a solution to the Sudoku puzzle so that the sum of the
diagonal elements of the resulting matrix A is maximum.
Hint. Use binary variables xi jk with xi jk = 1 if k is written into position (i, j).
1.4. Show that the IP

max{c^T x : Ax ≤ b, x ∈ {0, 1}^n}

is equivalent to the following quadratic programming problem

max{ c^T x − M x^T (e − x) : Ax ≤ b, x ∈ [0, 1]^n },

where M is a sufficiently large number. Can you estimate the value of M for given integer-valued A, b and c?
1.5. Consider the quadratic knapsack problem

∑_{j=1}^n c_j x_j + ∑_{j=2}^n ∑_{i=1}^{j−1} c_{ij} x_i x_j → min,
∑_{j=1}^n a_j x_j ≥ b,  x ∈ {0, 1}^n.

Introducing binary variables y_{ij} to represent the products x_i x_j, formulate this problem as an IP.

1.6. The binary classification problem is among the most important problems in machine learning. We are given a set {x^1, . . . , x^k} of "positive" points from R^n, and a set {y^1, . . . , y^l} of "negative" points, also from R^n. Ideally, we would like to find a hyperplane H(a, 1), called a linear classifier, that separates the positive and negative points:

a^T x^i ≤ 1,  i = 1, . . . , k,
a^T y^j > 1,  j = 1, . . . , l.   (1.32)

In most practical cases this is impossible, and therefore we look for a hyperplane that minimizes some "classification error". Intuitively, the most natural measure of the classification error is the number of points, both positive and negative, that are on the "wrong" side of the hyperplane H(a, 1). With this measure, we need to find a vector a ∈ R^n that violates the minimum number of inequalities in (1.32)⁶. Formulate this problem as a MIP.

⁶ Since this optimization problem is NP-hard, in practice other measures of the classification error are used, which are less adequate but easier to optimize.
1.7. Consider a bimatrix game in which the gains of the first and second players are given by two matrices A = [a_{ij}]_{m×n} and B = [b_{ij}]_{m×n}. A pair of vectors (mixed strategies) (p, q) ∈ R^m × R^n is a Nash equilibrium if it satisfies the following constraints:

∑_{j=1}^n a_{ij} q_j ≤ ∑_{i=1}^m ∑_{j=1}^n a_{ij} p_i q_j,  i = 1, . . . , m,
∑_{i=1}^m b_{ij} p_i ≤ ∑_{i=1}^m ∑_{j=1}^n b_{ij} p_i q_j,  j = 1, . . . , n,
∑_{i=1}^m p_i = 1,  ∑_{j=1}^n q_j = 1,   (1.33)
p_i ≥ 0,  i = 1, . . . , m,
q_j ≥ 0,  j = 1, . . . , n.

a) Prove that for any solution (p, q) to (1.33) the following complementary slackness conditions are valid:

p_i ( ∑_{i=1}^m ∑_{j=1}^n a_{ij} p_i q_j − ∑_{j=1}^n a_{ij} q_j ) = 0,  i = 1, . . . , m,
q_j ( ∑_{i=1}^m ∑_{j=1}^n b_{ij} p_i q_j − ∑_{i=1}^m b_{ij} p_i ) = 0,  j = 1, . . . , n.

b) Defining

U_1 = max_{1≤i≤m, 1≤j≤n} a_{ij} − min_{1≤i≤m, 1≤j≤n} a_{ij}  and  U_2 = max_{1≤i≤m, 1≤j≤n} b_{ij} − min_{1≤i≤m, 1≤j≤n} b_{ij},

we formulate the following MIP:

w → max,   (1.34a)
p_i + x_i ≤ 1,  i = 1, . . . , m,   (1.34b)
−U_1 x_i ≤ ∑_{j=1}^n a_{ij} q_j − v_1 ≤ 0,  i = 1, . . . , m,   (1.34c)
q_j + y_j ≤ 1,  j = 1, . . . , n,   (1.34d)
−U_2 y_j ≤ ∑_{i=1}^m b_{ij} p_i − v_2 ≤ 0,  j = 1, . . . , n,   (1.34e)
∑_{i=1}^m p_i = 1,  ∑_{j=1}^n q_j = 1,   (1.34f)
p_i ≥ 0, x_i ∈ {0, 1},  i = 1, . . . , m,   (1.34g)
q_j ≥ 0, y_j ∈ {0, 1},  j = 1, . . . , n,   (1.34h)
w ≤ v_1,  w ≤ v_2.   (1.34i)

Let (p^∗, q^∗, x^∗, y^∗, v_1^∗, v_2^∗, w^∗) be an optimal solution to (1.34). Using the statement of item a), prove that (p^∗, q^∗) is a Nash equilibrium such that the minimum, w^∗ = min{v_1^∗, v_2^∗}, of the gains of both players is maximum.
1.8. Solve the following knapsack problems:

a) 15x_1 + 19x_2 + 24x_3 + 27x_4 → max,
   2x_1 + 3x_2 + 4x_3 + 5x_4 ≤ 7,
   x_1, x_2, x_3, x_4 ∈ Z_+;

b) 12x_1 + 5x_2 + 17x_3 + 9x_4 + 7x_5 → max,
   3x_1 + 5x_2 + 2x_3 + 4x_4 + 6x_5 ≤ 9,
   x_1, x_2, x_3, x_4, x_5 ∈ {0, 1};

c) x_1 + 2x_2 + 3x_3 + 2x_4 + 3x_5 → max,
   12x_1 + 9x_2 + 15x_3 + 11x_4 + 6x_5 ≤ 29,
   x_1, x_2, x_3, x_4, x_5 ∈ {0, 1}.

1.9. Consider the single-product lot-sizing problem from Sect. 1.7.1. Let H(t) denote the cost of the optimal solution to the subproblem in which the number of periods in the planning horizon is t (0 ≤ t ≤ T). Let us introduce the notations

w_t = c_t + ∑_{τ=t}^T h_τ,  d_{τt} = ∑_{k=τ}^t d_k,  Ĥ(t) = H(t) + ∑_{τ=1}^t h_τ d_{1τ}.

a) Prove the validity of the following recurrence formula:

Ĥ(0) = 0,
Ĥ(t) = min_{1≤τ≤t} { Ĥ(τ − 1) + f_τ + w_τ d_{τt} },  t = 1, . . . , T.   (1.35)

b) Using (1.35), solve the example of the lot-sizing problem with the following parameters: T = 4, d = (2, 4, 4, 2)^T, c = (3, 2, 2, 3)^T, h = (1, 2, 1, 1)^T and f = (10, 20, 16, 10)^T.

1.10. Aggregation of systems of linear equations with integer coefficients. Consider the system of equations

∑_{j=1}^n a_{ij} x_j = b_i,  i = 1, 2,   (1.36)

with non-negative integer coefficients a_{ij}, b_1, and b_2. Prove the following theorem.

Theorem 1.2. Let λ_1 and λ_2 be coprime integers, λ_1 does not divide b_2, and λ_2 does not divide b_1. If λ_1 > b_2 − a_min and λ_2 > b_1 − a_min, where a_min is the minimum nonzero number among the coefficients a_{ij}, then the set of non-negative integer solutions to (1.36) coincides with the set of non-negative integer solutions to their linear combination

∑_{j=1}^n (λ_1 a_{1j} + λ_2 a_{2j}) x_j = λ_1 b_1 + λ_2 b_2.

1.11. A matrix is said to be totally unimodular if all its minors (determinants of


square submatrices) are 0 or ±1. Prove the following statement.
Theorem 1.3. The polyhedron P(A, b) is integer (all its vertices are integer vectors)
for any integer vector b if and only if the matrix A is totally unimodular.

1.12. Justify the following criterion.


Theorem 1.4. An integer matrix with elements 0, ±1 is totally unimodular if each of
its columns contains at most two non-zero elements, and its rows can be partitioned
into two subsets in such a way that: (1) if both non-zero elements of some column
are of the same sign, then these elements are in the rows from different subsets; (2)
if non-zero elements of some column have different signs, then these elements are in
the rows of the same subset.
Chapter 2
MIP Models

The most reliable way to learn how to formulate complex practical problems as
MIPs is to study those MIP models which now are regarded as classical. In this
chapter, we consider examples of MIP formulations for various practical applica-
tions. Not all of our formulations are the strongest because in some cases to elab-
orate a strong formulation, we must conduct an in-depth analysis of the structure
of the problem being solved in order to take into account its specific features. Nev-
ertheless, each of the models studied here can be a starting point for developing a
solution tool for the corresponding practical application. Let us also note that many
practical MIP applications are also discussed in the other chapters. In addition, a lot
of applications are presented in the exercises.

2.1 Set Packing, Partitioning, and Covering Problems

Let a finite set S and a family E = {S_1, . . . , S_n} of its subsets be given. The pair H = (S, E) is often called a hypergraph. By analogy with graphs, the elements of the set S are called vertices, and the subsets in E are called hyperedges. A subset of hyperedges J ⊆ E is called a packing if every vertex from S belongs to at most one hyperedge from J. If every vertex of S belongs to exactly one hyperedge from J, then J is called a partition. Finally, if every vertex in S belongs to at least one hyperedge from J, then J is called a cover.
Let us assign to each hyperedge S_j its cost c_j. The set packing problem is to find a packing with the maximum total cost of its hyperedges. The set covering (resp., set partitioning) problem is to find a cover (resp., partition) with the minimum total cost of its hyperedges. Note that if H is a graph (all sets S_j contain two elements), then the set packing problem is known as the weighted matching problem.
To simplify the exposition, we assume that S = {1, . . . , m}. The incidence matrix
A of the hypergraph H = (S, E ) has m rows, n columns, and its element ai j equals 1
if i ∈ S j , and ai j = 0 otherwise. For j = 1, . . . , n, we introduce a binary variable x j
that takes the value of 1 if hyperedge j is in the packing, partitioning or covering.


Now all three problems are very simply formulated as IPs:

(set packing problem)      max{c^T x : Ax ≤ e, x ∈ {0, 1}^n},   (2.1)
(set partitioning problem) min{c^T x : Ax = e, x ∈ {0, 1}^n},   (2.2)
(set covering problem)     min{c^T x : Ax ≥ e, x ∈ {0, 1}^n}.   (2.3)

Here e is the vector of size m with all components equal to 1.
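All three models are equally easy to state in a modeling language; as an illustration, here is a sketch of (2.3) in Python with the open-source PuLP package (an illustrative choice on our part; (2.1) and (2.2) differ only in the sense of the constraints):

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

def set_covering(subsets, costs, m):
    """Build IP (2.3): cover the ground set {0, ..., m-1} at minimum cost."""
    n = len(subsets)
    prob = LpProblem("set_covering", LpMinimize)
    x = [LpVariable(f"x_{j}", cat=LpBinary) for j in range(n)]
    prob += lpSum(costs[j] * x[j] for j in range(n))
    for i in range(m):   # every vertex i must belong to some chosen hyperedge
        prob += lpSum(x[j] for j in range(n) if i in subsets[j]) >= 1
    return prob, x

# Three hyperedges over S = {0, 1, 2}; the optimal cover takes the last two.
prob, x = set_covering([{0, 1}, {0, 2}, {1, 2}], [3, 1, 1], 3)
```

Calling prob.solve() on the returned model invokes PuLP's default MIP solver.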

Forming a Team of Performers

There are n candidates to participate in some project consisting of m jobs. Candi-


date j can execute a subset of jobs, S j , and wants to get c j for this. A team is a set of
workers that, for each project job, has at least one member able to perform this job.
We need to form a team from existing candidates spending the minimum amount of
money.
Obviously, this is a set covering problem given on the hypergraph H = (S, {S_1, . . . , S_n}) with S = {1, . . . , m}.

Crew Scheduling

A number of flight legs are given (taken from the time table of some company). A
leg is a flight taking off from its departure airport at some time and landing later at
its destination airport. The problem is to partition these legs into routes, and then
assign exactly one crew to each route. A route is a sequence of flight legs such that
the destination of one leg is the departure point of the next, and the destination of
the last leg is the departure point of the first leg. For example, there might be the
following short route:
• leg 1: from Paris to Berlin, departing at 9:20 and arriving at 10:50;
• leg 2: from Berlin to Rome, departing at 12:30 and arriving at 14:00;
• leg 3: from Rome to Paris, departing at 16:30 and arriving at 18:00.
A schedule is good if the crews spend as much of the elapsed time flying as possible, subject to the safety regulations and contract terms being satisfied. These
terms regulate the maximum number of hours a pilot can fly in a day, the maximum
number of days before returning to the base, and minimum overnight rest times. The
cost of a schedule depends on several of its attributes, and the wasted times of all
crews is the main one.
In practice, the crew scheduling problem is solved in two stages. Let all the legs
be numbered from 1 to m. First, a collection of n reasonable routes S j ⊂ {1, . . . , m}
(that meet all the constraints mentioned above) are selected, and the cost c j of each
route j is calculated. This route-selection problem is far from being trivial, but it is
not our main interest here.

Given the set of potential routes, E = {S1 , . . . , Sn }, the second stage is to identify
a subset of them, J ⊆ E, so that each leg is covered by exactly one route, and the
total cost of all routes in J is minimum. Of course, this second stage problem is a
set partitioning problem.

Combinatorial Auctions

At an auction, m objects are put up for sale. Suppose that in a certain round of trades,
the auctioneer received n bids from the buyers. The difference between the combi-
natorial auction and the usual one is that any buyer in its bid is allowed to value not
only a single object, but also any group of objects. Therefore, each bid j is described
by a pair (S j , c j ), where S j is a subset of objects for which the buyer who submitted
the bid agrees to pay c j . Naturally, no object can be sold twice. Therefore, two bids,
(S_{j1}, c_{j1}) and (S_{j2}, c_{j2}) such that S_{j1} ∩ S_{j2} ≠ ∅ cannot be satisfied simultaneously.
The auctioneer must decide which of the bids to satisfy so that the seller’s profit is
maximum.
Clearly, here we have a set packing problem.

2.2 Service Facility Location

The choice of facility locations is critical to a company's eventual success. A company's decisions on locating its services are guided by a variety of criteria. Locating service centers close to the customers is especially important because this enables faster delivery of goods and services.
We are given a set of customer locations N = {1, . . . , n}, with b_j customers at location j ∈ N, and a set of potential sites M = {1, . . . , m} for locating service centers. For each i ∈ M, we know a fixed cost f_i of locating a center at site i, a capacity u_i of the center at site i, and a cost c_{ij} of serving customer j from site i during some planning horizon. The facility location problem (FLP) is to decide where to locate service centers so as to minimize the total cost of locating centers and serving customers.
Choosing the following decision variables
• yi = 1 if a service center is located at site i, and yi = 0 otherwise,
• xi j : number of customers at location j served from the service center established
at site i,
we formulate the FLP as follows:
∑_{i=1}^m ∑_{j=1}^n c_{ij} x_{ij} + ∑_{i=1}^m f_i y_i → min,   (2.4a)
∑_{i=1}^m x_{ij} = b_j,  j = 1, . . . , n,   (2.4b)
∑_{j=1}^n x_{ij} ≤ u_i y_i,  i = 1, . . . , m,   (2.4c)
x_{ij} ≤ min{u_i, b_j} y_i,  i = 1, . . . , m, j = 1, . . . , n,   (2.4d)
y_i ∈ {0, 1},  i = 1, . . . , m,   (2.4e)
x_{ij} ∈ Z_+,  i = 1, . . . , m, j = 1, . . . , n.   (2.4f)

Objective (2.4a) is to minimize the total cost of locating centers and serving customers. Let us note that if all c_{ij} = 0 and all f_i = 1, then the objective is to minimize the number of service centers needed to serve all customers. Equations (2.4b) ensure that each customer is served. Inequalities (2.4c) reflect the capacity limitations: if a service center is established at site i (y_i = 1), then at most u_i customers can be served from this site. Inequalities (2.4d), which are logically implied by (2.4c) and therefore are redundant, are introduced to strengthen the formulation.

2.3 Portfolio Management: Index Fund

Portfolio optimization is the problem of investing a given capital in a number of securities in order to maximize the return with a limited "risk".
There are two orthogonal portfolio management strategies: active and passive.
When an active strategy is used, the methods of analysis and forecasting are used to
achieve the required level of efficiency. In contrast, any passive strategy advises not
to rely on forecasts, but to diversify investments to minimize the risk. The goal is
to create and maintain a portfolio that reflects changes in a broad market population
(or market index). Such a portfolio is called an index fund.
Formation of the index fund begins with the choice of a broad market index as
an approximation of the entire market, for example, the Standard and Poor’s List of
500 stocks (S&P 500). In its pure form, the index approach consists in buying all
assets in the same proportions as they are present in the index. In practice, this is
difficult or even impossible to accomplish. Therefore, the market index is aggregated
by a relatively small index fund of shares of not more than q types, where q is
substantially smaller than the number of all types of shares in the index. Such an
approach does not necessarily lead to the formation of an optimal portfolio relative
to the return/risk ratio.
The input data for the model are given by an n × n matrix [ρ_{ij}], whose element ρ_{ij} estimates the "similarity" between the shares i and j (ρ_{ij} is smaller for more similar shares). For example, we can estimate the coefficients ρ_{ij} by the returns of shares for T previous periods. Let R_i(t) be the return (per one enclosed dollar) of share i in period t. Then we can calculate

ρ_{ij} = ∑_{t=1}^T p^{T−t} (R_i(t) − R_j(t))²,

where p ∈ (0, 1] is a discount factor, which is introduced to increase the significance of recent periods in comparison with the early ones.
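Computing the whole matrix [ρ_{ij}] from a table of historical returns takes a few lines of NumPy; the sketch below is one possible implementation of the formula above (the value p = 0.9 is just an illustrative choice):

```python
import numpy as np

def similarity_matrix(R, p=0.9):
    """Compute rho[i][j] = sum_t p^(T-t) * (R_i(t) - R_j(t))^2.

    R is an (n x T) array of per-period returns; p in (0, 1] discounts
    early periods relative to recent ones."""
    n, T = R.shape
    weights = p ** np.arange(T - 1, -1, -1)      # p^(T-t) for t = 1, ..., T
    diff = R[:, None, :] - R[None, :, :]         # (n, n, T) pairwise differences
    return np.einsum("t,ijt->ij", weights, diff ** 2)

# Two assets with identical return histories get rho = 0 (maximal similarity).
R = np.array([[0.01, 0.02, 0.03],
              [0.01, 0.02, 0.03],
              [0.05, -0.01, 0.02]])
print(similarity_matrix(R).round(4))
```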
It is necessary to determine which stocks, and in what proportions, should be present in the portfolio. The first step in solving this problem is to select shares for inclusion in the index fund. We introduce the following variables:
• y_i = 1 if share i is in the index fund, and y_i = 0 otherwise;
• x_{ij} = 1 if share i represents share j in the index fund, and x_{ij} = 0 otherwise.
Now the problem of forming the index fund is written as follows:

∑_{i=1}^n ∑_{j=1}^n ρ_{ij} x_{ij} → min,   (2.5a)
∑_{i=1}^n y_i ≤ q,   (2.5b)
∑_{i=1}^n x_{ij} = 1,  j = 1, . . . , n,   (2.5c)
x_{ij} ≤ y_i,  i, j = 1, . . . , n,   (2.5d)
x_{ij} ∈ {0, 1},  i, j = 1, . . . , n,   (2.5e)
y_i ∈ {0, 1},  i = 1, . . . , n.   (2.5f)

Objective (2.5a) is to build an index fund that most accurately represents the market index. Inequality (2.5b) does not allow the index fund to contain more than q shares. Equalities (2.5c) require that each share of the market index be represented by a share from the index fund. Inequalities (2.5d) do not allow the shares that are not in the index fund to represent other shares.
This may seem strange, but it turns out that (2.5) is a special case of (2.4), the IP formulation of the problem of locating service centers, with m = n, c_{ij} = ρ_{ij}, b_i = n, f_i = 0. It is worth noting that such coincidences between formulations of seemingly completely different problems are encountered quite often.
When (2.5) is solved and the set of shares in the index fund, I = {i : y_i = 1}, is known, we can proceed with the formation of the portfolio. First we calculate the weights w_i = ∑_{j=1}^n V_j x_{ij} of all shares i ∈ I, where V_j is the total value of all shares of type j in the market index. In other words, w_i is the total market value of all shares represented by share i in the index fund. Therefore, the proportion of capital invested in each share i from the index fund (i ∈ I) must be equal to w_i / (∑_{k∈I} w_k).

2.4 Multiproduct Lot-Sizing

We need to work out an aggregate production plan for n different products processed on machines of m types for a planning horizon that extends over T periods.
Input parameters:
• lt : duration (length) of period t;
• mit : number of machines of type i available in period t;
• fit : fixed cost of producing on one machine of type i in period t;
• Timin , Timax : minimum and maximum working time of one machine of type i;
• c jt : per unit production cost of product j in period t;
• h jt : inventory holding cost per unit of product j in period t;
• d jt : demand for product j in period t;
• ρ jk : number of units of product j used for producing one unit of product k;
• τi j : per unit production time of product j on machine of type i;
• s^i_j : initial stock of product j at the beginning of the planning horizon;
• s^f_j : final stock of product j at the end of the planning horizon.
A production plan specifies on which machines and in which quantities each
product is produced in each of the periods. The goal is to determine a production
plan that can be implemented on existing equipment and the total production and
inventory cost is minimum.
Let us introduce the following variables:
• x jt : amount of product j produced in period t;
• s jt : amount of product j in stock at the end of period t;
• yit : number of machines of type i working in period t.
Now we formulate the problem as follows:

∑_{t=1}^T ∑_{j=1}^n ( h_{jt} s_{jt} + c_{jt} x_{jt} ) + ∑_{t=1}^T ∑_{i=1}^m f_{it} y_{it} → min,   (2.6a)
s^i_j + x_{j1} = d_{j1} + s_{j1} + ∑_{k=1}^n ρ_{jk} x_{k1},  j = 1, . . . , n,   (2.6b)
s_{j,t−1} + x_{jt} = d_{jt} + s_{jt} + ∑_{k=1}^n ρ_{jk} x_{kt},  j = 1, . . . , n, t = 2, . . . , T,   (2.6c)
∑_{j=1}^n τ_{ij} x_{jt} ≤ l_t y_{it},  i = 1, . . . , m, t = 1, . . . , T,   (2.6d)
s_{jT} = s^f_j,  j = 1, . . . , n,   (2.6e)
0 ≤ s_{jt} ≤ u_j, x_{jt} ≥ 0,  j = 1, . . . , n, t = 1, . . . , T,   (2.6f)
0 ≤ y_{it} ≤ m_{it}, y_{it} ∈ Z,  i = 1, . . . , m, t = 1, . . . , T.   (2.6g)
Objective (2.6a) is to minimize the total production and inventory expenses. Each of the balance equations in (2.6b) and (2.6c) joins two adjacent periods for each of the products: the stock in period t − 1 plus the amount of product produced in period t equals the demand in period t plus the amount of product used when producing other products, plus the stock in period t. Inequalities (2.6d) require that the working times of all machines be within the given limits; besides, if no machine of type i works in period t (y_{it} = 0), then no product that requires machines of type i can be produced in this period.
2.5 Balancing Assembly Lines

Assembly lines are special product-layout production systems that are typical for
the industrial production of high quantity standardized commodities. An assembly
line consists of a number of work stations arranged along a conveyor belt. The work
pieces are consecutively launched down the conveyor belt and are moved from one
station to the next. At each station, one or several operations, which are necessary
to manufacture the product, are performed. The operations in an assembly process
usually are interdependent, i.e., there may be precedence relations that must be en-
forced. The problem of distributing the operations among the stations with respect
to some objective function is called the assembly line balancing problem (ALBP).
We will consider here the simple assembly line balancing problem which is the core
of many other ALBPs.
The manufacturing of some product consists of a set of operations O = {1, . . . , n}. We denote by t_o the processing time of operation o ∈ O. The precedence relations between the operations are represented by a digraph G = (O, E), where (o1, o2) ∈ E means that operation o1 must be finished before operation o2 starts. Suppose that the demand for the product is such that the assembly line must have a cycle time C, which means that the running time of each station on one product unit must not exceed C.
The simple assembly line balancing problem (SALBP) is to determine the minimum number of stations that suffices for a line running with the given cycle time to fulfill all the operations in an order consistent with the precedence relations.
An example of SALBP is presented in Fig. 2.1. Here we have n = 11 operations that correspond to the vertices of the digraph representing precedence relations between these operations; the numbers over the vertices are the processing times of the operations.

Fig. 2.1 Example of SALBP: a precedence digraph on the 11 operations, with processing times written over the vertices. [Diagram not reproduced here.]
To formulate SALBP as an IP, we need to know an upper bound, m, on the number of needed stations. In particular, we can set m to be the number of stations in a solution built by one of the heuristics developed for solving SALBPs.
For example, let us consider the heuristic that assigns operations, respecting precedence relations, first to Station 1, then to Station 2, and so on until all the operations are assigned to the stations (a sketch of this heuristic in code is given after the example). If we apply this heuristic to the example of Fig. 2.1 with cycle time C = 45, we get the following assignment:
• operations 1 and 2 are accomplished by station 1,

• operations 3, 4, 5, and 6 by station 2,


• operations 7, 8, and 9 by station 3,
• operations 10 and 11 by station 4.
So in this example we can set m = 4. Let us also note that this heuristic solution is
not optimal as there exists an assignment that uses only three stations:
• operations 1, 3, 5, and 6 are accomplished by station 1,
• operations 2, 4, and 7 by station 2,
• operations 8, 9, 10, and 11 by station 3.
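One possible implementation of this station-oriented heuristic is sketched below (a Python illustration; the encoding of the instance data is our own choice, and the rule for picking the next candidate operation is arbitrary):

```python
def station_oriented_heuristic(times, preds, C):
    """Station-oriented greedy heuristic for SALBP.

    times: dict op -> processing time; preds: dict op -> set of predecessors;
    C: cycle time. Returns dict op -> station (stations numbered from 1).
    Assumes times[o] <= C for every operation o, so each operation fits
    into an empty station."""
    assigned = {}            # op -> station
    station, slack = 1, C    # current station and its remaining capacity
    while len(assigned) < len(times):
        # operations whose predecessors are already assigned and that still fit
        candidates = [o for o in times
                      if o not in assigned and times[o] <= slack
                      and all(p in assigned for p in preds.get(o, ()))]
        if candidates:
            o = min(candidates)              # arbitrary tie-breaking rule
            assigned[o] = station
            slack -= times[o]
        else:
            station, slack = station + 1, C  # open the next station
    return assigned

# Hypothetical encoding of a small instance (not the data of Fig. 2.1):
# times = {1: 16, 2: 25, 3: 13}; preds = {3: {1, 2}}
# station_oriented_heuristic(times, preds, C=45) -> {1: 1, 2: 1, 3: 2}
```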
To write an IP, we introduce the following variables:
• y_s = 1 if station s is open (in use), y_s = 0 otherwise;
• x_{so} = 1 if operation o is assigned to station s, x_{so} = 0 otherwise;
• z_o = s if operation o is assigned to station s.
With these variables the formulation of SALBP is as follows:
∑_{s=1}^m ys → min,                                    (2.7a)
∑_{s=1}^m xso = 1,   o = 1, . . . , n,                 (2.7b)
∑_{o=1}^n t_o xso ≤ C ys,   s = 1, . . . , m,          (2.7c)
∑_{s=1}^m s · xso = zo,   o = 1, . . . , n,            (2.7d)
zo1 ≤ zo2,   (o1, o2) ∈ E,                             (2.7e)
y_{s−1} ≥ ys,   s = 2, . . . , m,                      (2.7f)
xso ≤ ys,   o = 1, . . . , n, s = 1, . . . , m,        (2.7g)
xso ∈ {0, 1},   s = 1, . . . , m, o = 1, . . . , n,    (2.7h)
ys ∈ {0, 1},   s = 1, . . . , m,                       (2.7i)
zo ∈ R+,   o = 1, . . . , n.                           (2.7j)

Objective (2.7a) is to minimize the number of open stations. Equations (2.7b) require
that each operation be assigned to exactly one station. Inequalities (2.7c) reflect
the capacity restrictions: the total running time of each open station must not
exceed the cycle time. Equations (2.7d) establish the relation between the
assignment variables, binary x and integer z. Each precedence relation constraint
in (2.7e) requires that, for a pair (o1, o2) ∈ E of related operations, operation o1 be
assigned to the same or an earlier station than operation o2; this guarantees that
operation o1 is finished before operation o2 starts. Inequalities (2.7f) and (2.7g) ensure
that earlier stations are opened first.
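
The station-oriented heuristic mentioned above is easy to implement. Below is a minimal Python sketch (our own illustration, not taken from any library; the function name and data layout are assumptions): it opens stations one by one and fills each station with eligible operations, an operation being eligible when all its predecessors are already assigned and its processing time still fits into the station.

    def salbp_greedy(t, E, C):
        """t: dict operation -> processing time; E: list of precedence arcs
        (o1, o2); C: cycle time. Returns the stations as lists of operations;
        the heuristic upper bound is m = len(salbp_greedy(t, E, C))."""
        preds = {o: set() for o in t}
        for o1, o2 in E:
            preds[o2].add(o1)
        unassigned = set(t)
        stations = []
        while unassigned:
            load, station = 0, []
            while True:
                # eligible: all predecessors already assigned (to this or an
                # earlier station) and the operation still fits into the station
                eligible = [o for o in unassigned
                            if preds[o].isdisjoint(unassigned)
                            and load + t[o] <= C]
                if not eligible:
                    break
                o = min(eligible)  # a simple tie-breaking rule
                station.append(o)
                load += t[o]
                unassigned.remove(o)
            if not station:
                raise ValueError("an operation does not fit into the cycle time C")
            stations.append(station)
        return stations

For the data of Fig. 2.1 one would call len(salbp_greedy(t, E, 45)) to obtain the bound m; the resulting assignment depends on the tie-breaking rule.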

2.6 Electricity Generation Planning

The unit commitment problem is to develop an hourly (or half-hourly) electricity
production schedule spanning some planning horizon (a day or a week), deciding
which generators will be producing and at what levels. The essence of this problem
is to properly balance the use of generators with different capacities: it is cheaper
to produce electricity on more powerful generators, while less powerful generators
take less time to switch on or off when necessary.
Let T be the number of periods in the planning horizon; period 1 follows period
T (the schedule is cyclic). We know the demand dt for electricity in each period t.
It is required that, in each period, the total capacity of all active generators be at
least q times the demand (q is a reliability level).
Let n be the number of generators, and let generator i have the following charac-
teristics:
• li , ui : minimum and maximum per period production levels (capacities);
• ri1, ri2: ramping parameters (when a generator is on in two successive periods, its
output cannot decrease by more than ri1 nor increase by more than ri2);
• gi : start-up cost (if a generator is off in some period, it costs gi to start it in the
next period);
• fi , pi : fixed and variable costs (if in some period a generator is producing at level
v, it costs fi + pi v).
With a natural choice of variables
• xit = 1 if generator i produces in period t, and xit = 0 otherwise,
• zit = 1 if generator i is switched on in period t, and zit = 0 otherwise,
• yit : amount of electricity produced by generator i in period t,
we write down the following formulation:
∑_{i=1}^n ∑_{t=1}^T (gi zit + fi xit + pi yit) → min,                       (2.8a)
∑_{i=1}^n yit = dt,   t = 1, . . . , T,                                     (2.8b)
∑_{i=1}^n ui xit ≥ q dt,   t = 1, . . . , T,                                (2.8c)
li xit ≤ yit ≤ ui xit,   i = 1, . . . , n, t = 1, . . . , T,                (2.8d)
−ri1 ≤ yit − y_{i,((t−2+T) mod T)+1} ≤ ri2,   i = 1, . . . , n, t = 1, . . . , T,   (2.8e)
xit − x_{i,((t−2+T) mod T)+1} ≤ zit,   i = 1, . . . , n, t = 1, . . . , T,  (2.8f)
zit ≤ xit,   i = 1, . . . , n, t = 1, . . . , T,                            (2.8g)
xit, zit ∈ {0, 1},   i = 1, . . . , n, t = 1, . . . , T,                    (2.8h)
yit ∈ R+,   i = 1, . . . , n, t = 1, . . . , T.                             (2.8i)

Objective (2.8a) is to minimize the total (over all n generators and all T periods)
cost of producing electricity plus the sum of start-up costs. Equations (2.8b) guarantee
that, in each period, the total amount of electricity produced by all working
generators meets the demand of that period. Inequalities (2.8c) require that, in any
period, the total capacity of all working generators be at least q times the demand
of that period. The lower and upper bounds in (2.8d) impose the capacity
restrictions for each generator in each period; simultaneously, these constraints
ensure that non-working generators do not produce electricity. The two-sided inequalities
(2.8e) guarantee that generators cannot increase (ramp up) or decrease (ramp down)
their outputs by more than the values of their ramping parameters. Let us note that
period ((t − 2 + T) mod T) + 1 immediately precedes period t. Inequalities
(2.8f) and (2.8g) reflect the fact that a generator works in a given period only
if it has been switched on in this period or was working in the preceding period.
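
To show how such a formulation is entered into a modeling tool, here is a minimal sketch of (2.8) written with the open-source Python package PuLP. This is our illustration only — the function and argument names are assumptions, and any MIP modeling library (including MIPCL) offers analogous facilities.

    from pulp import LpProblem, LpMinimize, LpVariable, lpSum

    def prev(t, T):
        # the period immediately preceding period t in the cyclic schedule:
        # ((t - 2 + T) mod T) + 1
        return ((t - 2 + T) % T) + 1

    def unit_commitment(T, d, q, l, u, r1, r2, g, f, p):
        """d[t]: demand for t = 1..T; the remaining arguments are lists
        indexed by generator i = 0..n-1, as in the model above."""
        n = len(u)
        I, P = range(n), range(1, T + 1)
        mdl = LpProblem("unit_commitment", LpMinimize)
        x = LpVariable.dicts("x", (I, P), cat="Binary")
        z = LpVariable.dicts("z", (I, P), cat="Binary")
        y = LpVariable.dicts("y", (I, P), lowBound=0)
        mdl += lpSum(g[i]*z[i][t] + f[i]*x[i][t] + p[i]*y[i][t]
                     for i in I for t in P)                     # (2.8a)
        for t in P:
            mdl += lpSum(y[i][t] for i in I) == d[t]            # (2.8b)
            mdl += lpSum(u[i]*x[i][t] for i in I) >= q * d[t]   # (2.8c)
            for i in I:
                s = prev(t, T)
                mdl += l[i]*x[i][t] <= y[i][t]                  # (2.8d)
                mdl += y[i][t] <= u[i]*x[i][t]
                mdl += y[i][t] - y[i][s] <= r2[i]               # (2.8e)
                mdl += y[i][s] - y[i][t] <= r1[i]
                mdl += x[i][t] - x[i][s] <= z[i][t]             # (2.8f)
                mdl += z[i][t] <= x[i][t]                       # (2.8g)
        return mdl

The model object returned by this sketch can then be handed to any solver supported by the package.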

2.7 Designing Telecommunication Networks

Let us denote by V the set of nodes (subscribers) of a telecommunications network to
be designed, and let E denote the set of all pairs of nodes that exchange information.
The pair G = (V, E) is called a demand graph. Note that this logical graph is not
a representation of the structure of the designed network. For each demand e =
(i, j) ∈ E, we know the intensity of communications, de , between nodes i and j.
In the designed telecommunications network, one can set up an unlimited number of
identical rings with a capacity (bandwidth) of U. Every demand must be assigned
entirely to one ring. The sum of the intensities of all demands assigned to one ring
must not exceed the capacity of the ring. If a demand e = (i, j) is assigned to a ring,
then two multiplexers, each of cost α, must be installed on this ring, one at node i
and the other at node j. Each multiplexer has S slots for installing special cards.
There are T different types of cards, a card of type t has a bandwidth of ct and it
costs βt . Our goal is to assign demands to the rings in such a way that all the above
requirements are met, and the total cost of all installed multiplexers and cards is
minimum.

Let R be an upper bound on the number of rings to be established. We use the
following variables:
• yer = 1 if demand e is assigned to ring r, and yer = 0 otherwise;
• xir = 1 if a multiplexer is installed at node i on ring r, and xir = 0 otherwise;
• zirt : number of cards of type t used at node i on ring r.
Now we can write the model:
α ∑_{r=1}^R ∑_{i∈V} x_ir + ∑_{r=1}^R ∑_{i∈V} ∑_{t=1}^T β_t z_irt → min,   (2.9a)
∑_{r=1}^R y_er = 1,   e ∈ E,                                   (2.9b)
∑_{e∈E} d_e y_er ≤ U,   r = 1, . . . , R,                      (2.9c)
∑_{e∈E(i,V)} d_e y_er ≤ ∑_{t=1}^T c_t z_irt,   i ∈ V, r = 1, . . . , R,   (2.9d)
∑_{t=1}^T z_irt ≤ S x_ir,   i ∈ V, r = 1, . . . , R,           (2.9e)
y_er ≤ x_ir, y_er ≤ x_jr,   e = (i, j) ∈ E, r = 1, . . . , R,  (2.9f)
x_ir ∈ {0, 1},   i ∈ V, r = 1, . . . , R,                      (2.9g)
y_er ∈ {0, 1},   e ∈ E, r = 1, . . . , R,                      (2.9h)
z_irt ∈ Z+,   i ∈ V, r = 1, . . . , R, t = 1, . . . , T.       (2.9i)

Objective (2.9a) is to minimize the total cost of installed multiplexers and cards.
Equations (2.9b) assign every demand to exactly one ring. Inequalities (2.9c) guar-
antee that the bandwidth of any ring is not exceeded. Similar inequalities (2.9d)
require that the bandwidth of any multiplexer, which is the sum of the capacities of
all cards installed on the multiplexer, be not exceeded. Here E(i,V ) stands for the
set of edges e = (i, j) ∈ E incident to node i. Inequalities (2.9e) do not allow
inserting more cards into a multiplexer than there are slots, and they also prevent
inserting cards into unused multiplexers (if xir = 0, then zirt = 0 for all
t = 1, . . . , T ). Finally, each pair of inequalities in (2.9f) implies that, if a demand
e = (i, j) is assigned to ring r (yer = 1), then the multiplexers must be installed on
this ring at both nodes i and j (xir = x jr = 1).
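
For a fixed assignment of demands to rings, the cheapest card mix at a node is a small covering problem hidden in (2.9d)–(2.9e): cover the load ∑_{e∈E(i,V)} d_e y_er by cards of total bandwidth at least that load at minimum cost. Assuming integer bandwidths and loads (and ignoring the slot limit S for the moment), this can be solved by a simple dynamic program; the sketch below is our own helper, not part of the model.

    def min_card_cost(load, c, beta):
        """Minimum total cost of a card multiset whose bandwidth covers
        'load'; c[t], beta[t]: bandwidth and cost of card type t.
        The slot limit S is ignored in this sketch."""
        INF = float("inf")
        best = [0.0] + [INF] * load
        for need in range(1, load + 1):
            for ct, bt in zip(c, beta):
                rest = max(0, need - ct)   # bandwidth still to cover
                if best[rest] + bt < best[need]:
                    best[need] = best[rest] + bt
        return best[load]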

2.8 Placement of Logic Elements on the Surface of a Crystal

We have a set C of logic elements (gates) that implement basic boolean functions.
Geometrically, the surface of a crystal of size k × q can be considered as a rectangular
area with a uniform rectangular grid on it (like a sheet of a school notebook
in a box). The cells on the crystal are numbered from 1 to kq. For simplicity of
exposition we will assume that each gate can be placed into one cell of the crystal,
i.e., each gate can be considered as a unit square1 . The gates are connected to each
other by signal circuits (hereinafter simply ”circuits”). Any circuit n ∈ N is given
as a subset N(n) ⊆ C of gates which it connects. Our goal is to place the set of
gates, C, into a subset of cells, I ⊆ {1, . . . , kq} (|I| ≥ |C|), so as to minimize
the sum of the semiperimeters of the minimal rectangles bounding the circuits.
The problem of placing a set of gates on the surface of a crystal is very complex
and usually it is solved in two stages. At the stage of global placement, the crystal
surface is divided into a set of disjoint rectangles, each of which is assigned a sub-
set of gates (without specifying specific positions). At the stage of detailed (local)
placement, for each of these rectangles, it is necessary to solve the placement prob-
lem with an indication of exact positions of all the gates. Here we will consider only
the problem of detailed placement.
Input data:
• C : set of gates;
• I : subset of crystal cells;
• N : set of circuits;
• N (n) ⊆ C : set of gates that are connected by circuit n ∈ N ;
• ai : distance from the center of cell i ∈ I to the crystal left side;
• bi : distance from the center of cell i ∈ I to the crystal top side.
Let us introduce the following variables:
• zci = 1 if gate c ∈ C is placed into cell i, and zci = 0 otherwise;
• x^min_n, x^max_n, y^min_n, y^max_n : the pairs (x^min_n, y^min_n) and (x^max_n, y^max_n) are the coordinates
of the left-bottom and right-top corners of the minimal rectangle that contains all
gates c ∈ N(n).
In these variables our IP formulation is as follows:

∑_{n∈N} (x^max_n − x^min_n + y^max_n − y^min_n) → min,      (2.10a)
∑_{c∈C} z_ci ≤ 1,   i ∈ I,                                   (2.10b)
∑_{i∈I} z_ci = 1,   c ∈ C,                                   (2.10c)
∑_{i∈I} a_i z_ci ≥ x^min_n,   c ∈ N(n), n ∈ N,               (2.10d)
∑_{i∈I} a_i z_ci ≤ x^max_n,   c ∈ N(n), n ∈ N,               (2.10e)
∑_{i∈I} b_i z_ci ≥ y^min_n,   c ∈ N(n), n ∈ N,               (2.10f)
∑_{i∈I} b_i z_ci ≤ y^max_n,   c ∈ N(n), n ∈ N,               (2.10g)
z_ci ∈ {0, 1},   c ∈ C, i ∈ I,                               (2.10h)
x^min_n, x^max_n, y^min_n, y^max_n ≥ 0,   n ∈ N.             (2.10i)

¹ In rare cases, when a gate occupies several cells, it can be represented as several unit-square gates that must be placed adjacent to one another.

Objective (2.10a) is to minimize the sum of the semiperimeters of the rectangles
that frame the circuits. Inequalities (2.10b) do not allow placing two gates into
the same cell. Equations (2.10c) ensure that each gate is placed into exactly
one cell. For each circuit n ∈ N, the four groups of inequalities (2.10d)–(2.10g) determine the
coordinates of the left-bottom, (x^min_n, y^min_n), and the right-top, (x^max_n, y^max_n), corners
of the rectangle containing all gates of circuit n.
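
For a candidate placement, objective (2.10a) is easy to evaluate directly; the following small sketch (our own helper, with the cell centers (a_i, b_i) folded into the gate positions) does exactly that.

    def semiperimeter_sum(pos, nets):
        """pos: gate -> (a, b), the center of the cell the gate occupies;
        nets: iterable of gate subsets N(n). Returns objective (2.10a)."""
        total = 0.0
        for net in nets:
            xs = [pos[c][0] for c in net]
            ys = [pos[c][1] for c in net]
            total += (max(xs) - min(xs)) + (max(ys) - min(ys))
        return total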

2.9 Assigning Aircraft to Flights

The flight schedule of even an average airline is huge and is usually stored in a
database in which the information is presented in a form similar to that shown in
Table 2.1. In this particular example, we see that there is a flight from XYZ airport

Table 2.1 Flight Schedule

Flight   Departure airport   Departure time   Arrival airport   Arrival time   Airplane type   Cost
 201          XYZ                 6:55             ZYX              9:45            734          8570
 201          XYZ                 6:55             ZYX              9:45            757         12085
 201          XYZ                 6:55             ZYX              9:45            767         13095
 202          ZYX                11:05             XYZ             14:00            734          8570
 202          ZYX                11:05             XYZ             14:00            757         12085
 202          ZYX                11:05             XYZ             14:00            767         13095

to ZYX airport departing at 6:55 and arriving at 9:45. This flight can be performed
by a Boeing-734, Boeing-757, or Boeing-767 aircraft, with flight costs of $8570,
$12085, or $13095, respectively.
Any feasible assignment of aircraft to flights obeys the following requirements:
• no more aircraft of each type can be used than are in stock;
• aircraft arriving at an airport must either fly away or remain on the ground;
• aircraft must depart from the airports where they landed earlier.
The problem of assigning aircraft to flights is to find a feasible assignment of minimum
cost.
Input data:
• n: number of flights;

• m: number of aircraft types;


• qi : number of planes of type i;
• Tj : set of aircraft types suitable for flight j;
• cij : cost of flight j performed by a plane of type i;
• l: number of airports;
• rk : number of events at airport k; an event e corresponds to the time moment
t(k, e) when at least one of the company's planes lands at or takes off from the airport;
we assume that the events are numbered from 0 to rk − 1 in the order of their
occurrence, starting from event 0;
• list of all n flights, in which flight j is described by the four-tuple (a^d_j, e^d_j; a^a_j, e^a_j),
which means that the flight departs from airport a^d_j at time t(a^d_j, e^d_j) when
event e^d_j occurs, and later arrives at destination airport a^a_j at time t(a^a_j, e^a_j) when
event e^a_j occurs.
Let us introduce two families of variables:
• xi j = 1 if a plane of type i is assigned to flight j, and xi j = 0 otherwise;
• fike : number of planes of type i at airport k at time t(k, e).
In these variables our problem is formulated as follows:

∑_{j=1}^n ∑_{i∈T_j} c_ij x_ij → min,                            (2.11a)
∑_{i∈T_j} x_ij = 1,   j = 1, . . . , n,                          (2.11b)
f_{i,k,(e+1) mod r_k} = f_ike + ∑_{j: i∈T_j, a^a_j=k, e^a_j=e} x_ij − ∑_{j: i∈T_j, a^d_j=k, e^d_j=e} x_ij,
        i = 1, . . . , m, k = 1, . . . , l, e = 0, . . . , r_k − 1,   (2.11c)
∑_{j: i∈T_j, t(a^d_j,e^d_j) > t(a^a_j,e^a_j)} x_ij + ∑_{k=1}^l f_{i,k,r_k−1} ≤ q_i,   i = 1, . . . , m,   (2.11d)
x_ij ∈ {0, 1},   j = 1, . . . , n, i ∈ T_j,                      (2.11e)
f_ike ∈ Z+,   i = 1, . . . , m, k = 1, . . . , l, e = 0, . . . , r_k − 1.   (2.11f)

Objective (2.11a) of this IP is to minimize the total cost of all flights. Equations
(2.11b) ensure that each flight is assigned to exactly one aircraft type. According
to the balance equations (2.11c), for each airport k and every event e occurring
there, the number of planes of any type i at the airport until the next event equals
their number at time t(k, e), plus the number of planes landing at time t(k, e),
minus the number of planes taking off at time t(k, e). Because of (2.11c), the
number of planes of each type remains constant during a day. Inequalities (2.11d)
require that at midnight the total number of planes of each type, in the air and on
the ground, not exceed the number q_i of available planes of that type.
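
A small preprocessing sketch (our own illustration; the data layout is an assumption) shows how the per-airport event lists used in (2.11c) can be built from the flight table: every distinct landing or take-off time at an airport becomes an event, and the event following event e at airport k is (e + 1) mod r_k.

    from collections import defaultdict

    def build_events(flights):
        """flights: iterable of tuples (dep_airport, dep_time, arr_airport,
        arr_time). Returns airport -> chronologically sorted event times;
        then r_k = len(events[k]) and t(k, e) = events[k][e]."""
        times = defaultdict(set)
        for ad, td, aa, ta in flights:
            times[ad].add(td)
            times[aa].add(ta)
        return {k: sorted(ts) for k, ts in times.items()}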

2.10 Optimizing the Performance of a Hybrid Car

A hybrid car among many other things has an internal combustion engine, a mo-
tor/generator connected to a battery, and a braking system. We will consider an
extremely simple parallel car model in which the motor/generator and the internal
combustion engine are directly connected to the driving wheels. The internal com-
bustion engine transfers mechanical energy to the wheels, and the braking system
takes away this energy from the wheels turning it into heat. The motor/generator can
work as an electric motor using the energy of the battery and feeding it to the wheels,
or as a generator when it consumes mechanical energy from the wheels or directly
from the internal combustion engine, and converts this mechanical energy into elec-
tricity charging the battery. When the generator consumes mechanical energy of the
wheels and charges the battery, it is called a regenerative brake.
A diagram illustrating energy flows in a hybrid car is presented in Fig. 2.2. Here
the arrows indicate positive directions of energy transmission. The engine power
peng is always positive and is transmitted in the direction from the engine to the
wheels. The power of the braking system, pbr , is always non-negative, and it is
positive when the car brakes. The energy consumption of the wheels, preq , is positive
when it is spent on driving the car (when the car accelerates, goes uphill or evenly
moves along the road), and preq is negative when the car brakes or goes down the
hill. We consider the motor/generator as two devices operating in turn. When the
motor is running, the energy pm is fed from it to the wheels, and when the generator
is running, it receives the energy pg from the wheels.

Fig. 2.2 Energy flows in a hybrid car

The car is tested on a track with fixed characteristics. The speed of the car on
each section of the route is predefined. Therefore, the time of passing the track is
also known and is equal to T seconds. We will build a discrete-time model with T
time intervals, each lasting one second. Because the profile of the route is known and
the speed on all route sections is also set, it is possible to calculate the power
P^req_t required to be fed to the wheels in each period t. We also know the following parameters:
• P^eng_max : maximum engine power;
• P^g_max : maximum generator power;
• P^m_max : maximum motor power;
• E^batt_max : maximum battery energy (charge);
• η: fraction of energy lost when converting mechanical energy into electricity
and then into battery charge, and vice versa;
• t1, t2: within any time interval of t2 consecutive seconds, the electric motor should run
no more than t1 seconds.
If the engine runs at a power of p, then per unit of time it consumes F(p) fuel
units. We assume that F : R+ → R+ is an increasing convex function.
For t = 1, . . . , T , we introduce the following variables:
• p^eng_t : engine power in period t;
• p^m_t : motor power in period t;
• p^g_t : generator power in period t;
• p^br_t : braking system power in period t;
• E_t : battery charge (energy) in period t;
• y_t = 1 if the motor/generator works as a motor in period t, and y_t = 0 if it
works as a generator.
We can determine an optimal operation mode of a hybrid car by solving the following
program:

∑_{t=1}^T F(p^eng_t) → min,                                     (2.12a)
p^eng_t + p^m_t − p^g_t − p^br_t = P^req_t,   t = 1, . . . , T, (2.12b)
E_t − (1 + η) p^m_t + (1 − η) p^g_t = E_{t+1},   t = 1, . . . , T,   (2.12c)
∑_{τ=t−t2+1}^t y_τ ≤ t1,   t = t2, . . . , T,                   (2.12d)
E_{T+1} ≥ E_1,                                                  (2.12e)
0 ≤ E_t ≤ E^batt_max,   t = 1, . . . , T,                       (2.12f)
0 ≤ p^eng_t ≤ P^eng_max,   t = 1, . . . , T,                    (2.12g)
0 ≤ p^m_t ≤ P^m_max y_t,   t = 1, . . . , T,                    (2.12h)
0 ≤ p^g_t ≤ P^g_max,   t = 1, . . . , T,                        (2.12i)
p^br_t ≥ 0,   t = 1, . . . , T,                                  (2.12j)
y_t ∈ {0, 1},   t = 1, . . . , T.                                (2.12k)

Objective (2.12a) is to minimize fuel consumption. Equations (2.12b) ensure that
at any time the right amount of energy is supplied to the wheels. Each balance
equation in (2.12c) relates the battery charges of two neighboring periods. Inequalities
(2.12d) do not allow the electric motor to run more than t1 seconds during any
continuous time interval of t2 seconds. Inequality (2.12e) is introduced for a fair
comparison of a hybrid car with a non-hybrid one: at the finish, the battery charge
must be no less than the battery charge at the start.
Since the objective function is nonlinear, (2.12) is not a MIP. But we can approx-
imate the convex function F with a piecewise linear function, and then, using the
method described in Sect. 1.1.4, we can represent this piecewise linear function as
linear by introducing new variables and constraints.
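
The linearization can be sketched as follows (a sketch only; the exact construction of Sect. 1.1.4 may differ in details). Choose breakpoints 0 = q_0 < q_1 < · · · < q_K = P^eng_max, and let a_k and b_k denote the slope and the intercept of the chord of F over the segment [q_{k−1}, q_k]. Introduce new variables φ_t and replace objective (2.12a) by

∑_{t=1}^T φ_t → min,   φ_t ≥ a_k p^eng_t + b_k,   k = 1, . . . , K, t = 1, . . . , T.

Since F is convex, the pointwise maximum of the chords is exactly the piecewise linear interpolant of F at the breakpoints; and since each φ_t is being minimized, at an optimum φ_t attains this value. Hence, for a convex objective that is minimized, the linearization needs no additional binary variables.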

2.11 Short-Term Financial Management

Short-term financial management is one of the tasks of the accounting department of a large
firm. If the management of finances is inefficient, the incomes will be received by
the banks in which the funds are stored, and not by their owner. Free money should
also work. Profit can be significantly increased if the firm works actively on the
securities market.
Suppose that the planning horizon is divided into T periods of usually varying
duration, and let period T + 1 represent the end of the horizon. There are n types of
securities on the market. The company’s portfolio at the beginning of the planning
horizon is represented by a vector s of size n, where si ≥ 0 is the number of securities
of type i in the portfolio. The costs of selling and buying a security of type i in
period t are c^s_it and c^b_it, respectively. Note that the values of c^s_it and c^b_it can be less
than or greater than the nominal value of a security of type i.
Short-term financial sources (other than selling securities from the portfolio) are
represented by k open credit lines. The maximum amount of borrowing along line l
is ul . Loans can be obtained at the beginning of each period, and they are to be
returned after the completion of the planning horizon. To assess the effectiveness of
all borrowing, the costs, flt , are calculated, where flt is the monthly rate of interest
along credit line l multiplied by the time (in months) remaining from the beginning
of period t to the end of the planning horizon.
Exogenous (external) cash flows are given by the values dt , t = 1, . . . , T . If dt > 0
(resp., dt < 0), then the firm will receive dt (resp., pay −dt ) at the beginning of
period t. We assume that the cash reserve at the beginning of the planning horizon is
taken into account when calculating d1 . For each period t = 1, . . . , T , the minimum
cash requirement, qt , is also specified.
It is necessary to balance the cash budget in such a way as to maximize the firm's
"wealth" (cash plus the sale values of all securities minus the total amount of all
borrowings, taking into account interest) at the end of the planning horizon.
Let us introduce the following variables:
• x_it : number of securities of type i at the end of period t;
• x^s_it : number of securities of type i sold in period t;
• x^b_it : number of securities of type i purchased in period t;
• y_t : amount of cash at the end of period t;
• z_lt : amount of money borrowed from credit line l in period t.

In these variables our problem is formulated as follows:


y_T + ∑_{i=1}^n c^s_{i,T+1} x_{i,T} − ∑_{t=1}^T ∑_{l=1}^k (1 + f_lt) z_lt → max,   (2.13a)
d_1 + ∑_{i=1}^n c^s_{i1} x^s_{i1} + ∑_{l=1}^k z_l1 = y_1 + ∑_{i=1}^n c^b_{i1} x^b_{i1},   (2.13b)
d_t + y_{t−1} + ∑_{i=1}^n c^s_it x^s_it + ∑_{l=1}^k z_lt = y_t + ∑_{i=1}^n c^b_it x^b_it,   t = 2, . . . , T,   (2.13c)
s_i + x^b_{i1} − x^s_{i1} = x_{i1},   i = 1, . . . , n,          (2.13d)
x_{i,t−1} + x^b_it − x^s_it = x_it,   i = 1, . . . , n, t = 2, . . . , T,   (2.13e)
∑_{t=1}^T z_lt ≤ u_l,   l = 1, . . . , k,                        (2.13f)
y_t ≥ q_t,   t = 1, . . . , T,                                   (2.13g)
x_it, x^s_it, x^b_it ∈ Z+,   i = 1, . . . , n, t = 1, . . . , T,  (2.13h)
y_t ∈ R+,   t = 1, . . . , T,                                    (2.13i)
z_lt ∈ R+,   l = 1, . . . , k, t = 1, . . . , T.                  (2.13j)

Objective (2.13a) is to maximize the firm's "wealth" at the end of the planning
horizon. Equations (2.13c) and (2.13e) balance the budgets of cash and securities,
respectively, in periods 2, . . . , T. The similar balance constraints (2.13b) and (2.13d)
apply to period 1. Inequalities (2.13f) ensure that the total borrowing
from any credit line does not exceed the volume of that credit line. Inequalities
(2.13g) require that the necessary minimum of cash be available in every period.

2.12 Planning Treatment of Cancerous Tumors

In the past, oncologists used devices with two or three comparatively large
beams (10 × 10 centimeters in cross section) of fixed orientation. The number of
beams in modern devices is constantly increasing, and each beam is divided into
several smaller rays, whose sizes and intensities can vary within certain limits.
The intensity of a ray at the points along its path (and to a much lesser extent at the
points closest to it) is measured in doses. The unit of the dose is Gray (Gy) defined
to be the amount of energy per unit of mass received from the beam in the ionization
process around a given point. Treatment of cancerous tumors with such devices is
known as intensive modulated radiation therapy (IMRT).
Treatment with the IMRT method begins with elaborating a treatment plan. The
aim of the planning is to guarantee that, as a result of the treatment, the cancer cells
receive the required dose, while the dose received by healthy tissues remains safe
(there must be no irreversible damage). Planning begins with the definition of the
critical area around the tumor. The critical region is covered by a three-dimensional
uniform rectangular lattice. Let us assume that the lattice points are numbered from
1 to l, and let L = {1, . . . , l}. Let T ⊂ L denote the set of lattice points inside the
tumor. The rest of the critical area is broken (by type of tissue) into subdomains. Let
K denote the number of such subdomains, and Hk ⊂ L be the set of lattice points
inside subdomain k. Table 2.2 presents the parameters that characterize an example
of planning treatment for prostate cancer. Here |T| = 2438 points belong to the
tumor, |H1| = 1566 points belong to the region immediately surrounding the tumor
("collar"), and |H2| = 1569 unclassified points (other) lie near the tumor. The
bladder (|H3| = 1292 points) and the rectum (|H4| = 1250 points) require special
attention.

Table 2.2 Example of planning prostate cancer treatment

Tissue    Number of points   Limiting dose bk (cGy)   Threshold dk (cGy)   % to threshold   Uniformity
Collar         1566                15000                     —                   —              —
Other          1569                15000                     —                   —              —
Bladder        1292                10000                    8000                80 %            —
Rectum         1250                10000                    7500                75 %            —
Tumor          2438                  —                       —                   —             0.9

Suppose that each beam can be directed under one of n possible angles. A sam-
ple is a particular way of breaking the beam into rays, indicating their dimensions
and intensities. Let Pj denote the set of possible samples for a beam directed at an
angle j. Using special software, one can calculate the dose ai j p obtained at point i
from a beam of unit intensity directed at angle j if sample p (p ∈ Pj ) is used. It is
assumed that the dose at any point is approximately equal to the sum of the doses
received from all the beams. It is necessary to determine the intensities x j p of all the
beams in order to satisfy a number of conditions for doses at points of the critical
regions. We will introduce these conditions when we explain the constraints of the
following model:

t → max,                                                         (2.14a)
t ≤ ∑_{j=1}^n ∑_{p∈P_j} a_ijp x_jp ≤ (1/α) t,   i ∈ T,           (2.14b)
∑_{j=1}^n ∑_{p∈P_j} a_ijp x_jp ≥ s,   i ∈ T′,                    (2.14c)
∑_{j=1}^n ∑_{p∈P_j} a_ijp x_jp ≤ b_k,   i ∈ H_k, k = 1, . . . , K,   (2.14d)
∑_{j=1}^n ∑_{p∈P_j} a_ijp x_jp ≤ d_k + (b_k − d_k) y_i,   i ∈ H_k, k ∈ K̂,   (2.14e)
∑_{i∈H_k} y_i ≤ ⌊(1 − f_k)|H_k|⌋,   k ∈ K̂,                       (2.14f)
x_jp ≥ 0,   p ∈ P_j, j = 1, . . . , n,                            (2.14g)
y_i ∈ {0, 1},   i ∈ ∪_{k∈K̂} H_k.                                  (2.14h)

In this model, the variable t represents the minimum dose at the tumor points.
The goal (2.14a) is to maximize this minimum dose. The uniformity coefficient, α,
in (2.14b) sets the lower bound for the ratio of the minimum and maximum doses at
the tumor points. In the example from Table 2.2 this coefficient is equal to 0.9.
In order not to miss the microscopic areas of affected tissue in the immediate
vicinity of the tumor, a set of points, T′, surrounding the tumor is selected, and it
is required that each of these points receive at least the minimum dose s. This
condition is expressed by (2.14c).
Inequalities (2.14d) limit the doses received by healthy tissues. Here bk is the
limiting dose for subdomain k. In the example from Table 2.2 the dose limits are set
for the ”collar”, bladder, rectum and other tissues.
To prevent irreversible damage to certain tissue types k ∈ K̂ ⊆ {1, . . . , K}, it is required
that at least a fraction f_k (0 < f_k < 1) of the points receive doses not exceeding a
given threshold d_k (d_k < b_k); equivalently, at most ⌊(1 − f_k)|H_k|⌋ points may exceed
the threshold. In the example from Table 2.2 such proportions are given for the bladder
and the rectum; for instance, f_k = 0.8 for the bladder allows at most ⌊0.2 · 1292⌋ = 258
points to receive more than 8000 cGy. In our model, this condition is expressed by
Ineqs. (2.14e) and (2.14f), where, for each point i ∈ H_k (k ∈ K̂), we introduce an
auxiliary binary variable y_i taking the value 1 only if the dose at point i exceeds
the threshold d_k.
Concluding the discussion of MIP (2.14), it should be noted that we cannot solve
such MIPs using standard software. Since each of the sets Pj consists of a very
large number of samples, the number of variables x j p is usually huge. To solve such
problems, it is necessary to develop a branch-and-price algorithm that is based on
the technique of column generation, which is discussed in Chap. 7.

2.13 Project Scheduling

Let us consider a project that consists of n jobs. There are q^r renewable² (non-perishable)
resources, with R^r_i units of resource i available per unit of time. There are also
q^n nonrenewable³ (perishable) resources, with R^n_i units of resource i available for
the entire project. It is assumed that all resources are available when the project starts.

² Renewable resources are available in the same quantities in any period. Manpower, machines,
and storage spaces are renewable resources.
³ In contrast to a renewable resource, whose consumption is limited in each period, the overall
consumption of a nonrenewable resource is limited for the entire project. Money, energy, and raw
materials are nonrenewable resources.
The jobs can be processed in different modes. A job cannot be interrupted;
thus, once a job is started in some mode, it has to be completed in the same mode.
If job j is processed in mode m (m = 1, . . . , M_j), then
• it takes p^m_j units of time to process the job,
• ρ^r_jmi units of renewable resource i (i = 1, . . . , q^r) are used in each period when
job j is processed,
• and ρ^n_jmi units of nonrenewable resource i (i = 1, . . . , q^n) are consumed in total.
Precedence relations between jobs are given by an acyclic digraph G = (J, R)
defined on the set of jobs J def= {1, . . . , n}: for any arc (j1, j2) ∈ R, job j2 cannot
start until job j1 is finished.
A project schedule specifies when each job starts and in which mode it is pro-
cessed. The goal is to find a schedule with the minimum makespan that is defined to
be the completion time of the last job.
Let us assume that we know an upper bound H on the optimal makespan value.
We can take as H the makespan of a schedule produced by one of numerous project
scheduling heuristics. The planning horizon is divided into H periods numbered
from 1 to H, and period t starts at time t − 1 and ends at time t.
To tighten our formulation, we can estimate (say, by the critical path method) the
earliest, es j , and latest, ls j , start times of all jobs j.
First, we define the family of decision binary variables:
• x jmt = 1 if job j is processed in mode m and starts in period t (at time t − 1), and
x jmt = 0 otherwise.
For modeling purposes, we also need the following families of auxiliary variables:
• T : schedule makespan;
• d j ∈ R: duration of job j (d j depends on the mode in which job j is processed);
• s j ∈ R: start time of job j.
In these variables the model is written as follows:

T → min,                                                          (2.15a)
∑_{m=1}^{M_j} ∑_{t=es_j}^{ls_j} x_jmt = 1,   j = 1, . . . , n,    (2.15b)
s_j = ∑_{m=1}^{M_j} ∑_{t=es_j}^{ls_j} t · x_jmt,   j = 1, . . . , n,   (2.15c)
d_j = ∑_{m=1}^{M_j} ∑_{t=es_j}^{ls_j} p^m_j x_jmt,   j = 1, . . . , n,   (2.15d)
T ≥ s_j + d_j,   j = 1, . . . , n,                                (2.15e)
∑_{j=1}^n ∑_{m=1}^{M_j} ∑_{t=max(τ−p^m_j+1, es_j)}^{min(τ, ls_j)} ρ^r_jmi x_jmt ≤ R^r_i,   i = 1, . . . , q^r, τ = 1, . . . , H,   (2.15f)
∑_{j=1}^n ∑_{m=1}^{M_j} ∑_{t=es_j}^{ls_j} ρ^n_jmi x_jmt ≤ R^n_i,   i = 1, . . . , q^n,   (2.15g)
s_{j2} − s_{j1} ≥ d_{j1},   (j1, j2) ∈ R,                         (2.15h)
x_jmt ∈ {0, 1},   t = es_j, . . . , ls_j, m = 1, . . . , M_j, j = 1, . . . , n,   (2.15i)
d_j, s_j ∈ R,   j = 1, . . . , n.                                  (2.15j)

Objective (2.15a) is to minimize the makespan. Equations (2.15b) ensure that
each job is processed in exactly one mode and starts exactly once. For the schedule
given by the values of the x_jmt variables, (2.15c) and (2.15d) compute the start times,
s_j, and durations, d_j, of all jobs j. Inequalities (2.15e) imply that, being minimized,
T is the completion time of the job finishing last, i.e., T is the makespan. The
limitations on the renewable resources are imposed by (2.15f): for each period τ and
each resource i, the total consumption of this resource by all jobs processed
in this period cannot exceed the given limit R^r_i. Due to (2.15b), each inequality in
(2.15g) restricts the usage of a particular nonrenewable resource. The precedence
relations between jobs are enforced by (2.15h).
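
The bounds es_j and ls_j can be computed by the critical path method in linear time. Below is a minimal sketch (our own illustration; it fixes one duration p[j] per job, e.g., that of the shortest mode, which keeps es_j a valid lower bound on the start time and ls_j a safe, if weaker, upper bound):

    from collections import deque

    def cpm_times(jobs, arcs, p, H):
        """jobs: iterable of job ids; arcs: precedence pairs (j1, j2);
        p[j]: a duration of job j; H: an upper bound on the makespan.
        Returns (es, ls): earliest and latest start times of all jobs."""
        succ = {j: [] for j in jobs}
        indeg = {j: 0 for j in jobs}
        for a, b in arcs:
            succ[a].append(b)
            indeg[b] += 1
        # forward pass in a topological order (Kahn's algorithm)
        es = {j: 0 for j in jobs}
        queue = deque(j for j in jobs if indeg[j] == 0)
        order = []
        while queue:
            j = queue.popleft()
            order.append(j)
            for b in succ[j]:
                es[b] = max(es[b], es[j] + p[j])
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
        # backward pass: latest starts compatible with the horizon H
        ls = {j: H - p[j] for j in jobs}
        for j in reversed(order):
            for b in succ[j]:
                ls[j] = min(ls[j], ls[b] - p[j])
        return es, ls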

2.14 Short-Term Scheduling in Chemical Industry

It is easier to start with an example. Two products, 1 and 2, are produced from three
different raw products A, B, and C according to the following technological process.
• Heating. Heat A for 1 h.
• Reaction 1. Mix 50% feed B and 50% feed C and let them react for 2 h to form
intermediate BC.
• Reaction 2. Mix 40% hot A and 60% intermediate BC and let them react for 2 h
to form intermediate AB (60%) and product 1 (40%).
• Reaction 3. Mix 20% feed C and 80% intermediate AB and let them react for 1 h
to form impure E.
• Separation. Distill impure E to separate pure product 2 (90%, after 1 h) and pure
intermediate AB (10% after 2 h). Discard the small amount of residue remaining
at the end of the distillation. Recycle the intermediate AB.
The above technological process is represented by the State-Task-Network (STN)
shown in Fig. 2.3.
The following processing equipment and storage capacities are available.
• Equipment:
– Heater: capacity 100 kg, suitable for task 1;
Fig. 2.3 State-task network for the example process

– Reactor 1: capacity 80 kg, suitable for tasks 2, 3, 4;
– Reactor 2: capacity 50 kg, suitable for tasks 2, 3, 4;
– Still: capacity 200 kg, suitable for task 5.
• Storage capacity for
– feeds A,B,C: unlimited;
– hot A: 100 kg;
– intermediate AB: 200 kg;
– intermediate BC: 150 kg;
– intermediate E: 100 kg;
– products 1,2: unlimited.
A number of parameters are associated with the tasks and the states defining the
STN, and with the available equipment items.
• Task i is defined by:
  U_i : set of units capable of performing task i;
  S^in_i : set of states that feed task i;
  S^out_i : set of states to which task i outputs its products;
  ρ^in_is : proportion of the input of task i taken from state s ∈ S^in_i, ∑_{s∈S^in_i} ρ^in_is = 1;
  ρ^out_is : proportion of the output of task i sent to state s ∈ S^out_i, ∑_{s∈S^out_i} ρ^out_is = 1;
  p_is : processing time of the output of task i sent to state s ∈ S^out_i;
  d_i : duration of task i, d_i def= max_{s∈S^out_i} p_is.

• State s is defined by:
  T^in_s : set of tasks that consume the product of state s (those tasks i with s ∈ S^in_i);
  T^out_s : set of tasks that produce the product for state s (those tasks i with s ∈ S^out_i);
  z^0_s : initial stock in state s;
  u_s : storage capacity for the product in state s;
  c_s : unit cost (price) of the product in state s;
  h_s : cost of storing a product unit in state s.
• Unit j is characterized by:
  I_j : set of tasks that can be performed by unit j;
  V^max_ij, V^min_ij : respectively, maximum and minimum loading of unit j when used
for performing task i.
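
In an implementation, the state-indexed sets T^in_s, T^out_s and the durations d_i are conveniently derived from the task data; a small sketch of this bookkeeping (our own, with a dictionary-based data layout as an assumption):

    def stn_derived(S_in, S_out, p):
        """S_in[i], S_out[i]: input and output state sets of task i;
        p[i][s]: processing time of the output of task i sent to state s.
        Returns (T_in, T_out, d) as defined above."""
        T_in, T_out = {}, {}
        for i, states in S_in.items():
            for s in states:
                T_in.setdefault(s, set()).add(i)    # tasks consuming from s
        for i, states in S_out.items():
            for s in states:
                T_out.setdefault(s, set()).add(i)   # tasks producing for s
        d = {i: max(p[i][s] for s in states) for i, states in S_out.items()}
        return T_in, T_out, d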
Let n, q, m denote, respectively, the numbers of tasks, states, and units. The
scheduling problem for a batch processing system is stated as follows.
Given: STN of a batch process and all the information associated with it, as well
as a planning horizon of interest.
Determine: schedule for each equipment unit (i.e. which task, if any, the unit per-
forms at any time during the planning horizon), as well as product flows inside
the STN.
Goal: maximize the total cost of the products produced by the end of the planning
horizon minus the total storage cost during the planning horizon.

MIP Formulation

Our formulation is based on a discrete representation of time. The planning horizon
is divided into a number of periods of equal duration. We number these periods
from 1 to T, and assume that period t starts at time t − 1 and ends at time t. Events
of any type — such as the start or end of processing a batch of a task, changes in
the availability of equipment units, etc. — happen only at the beginnings or
ends of periods.
Preemption is not allowed, and materials are transferred instantaneously
from states to tasks and vice versa.
We introduce the following variables:
• xi jt = 1 if unit j starts processing task i at the beginning of period t, and xi jt = 0
otherwise;
• yi jt : total amount of products (batch size) used to start a batch of task i in unit j
at the beginning of period t;
• zst : amount of material stored in state s at the beginning of period t.
Now the MIP model is written as follows:

∑_{s=1}^q c_s z_{s,T} − ∑_{s=1}^q ∑_{t=1}^T h_s z_st → max,       (2.16a)
∑_{i∈I_j} ∑_{τ=max{0, t−d_i+1}}^{min{t, T−d_i}} x_ijτ ≤ 1,   j = 1, . . . , m, t = 1, . . . , T,   (2.16b)
V^min_ij x_ijt ≤ y_ijt ≤ V^max_ij x_ijt,   j = 1, . . . , m, i ∈ I_j, t = 1, . . . , T,   (2.16c)
0 ≤ z_st ≤ u_s,   s = 1, . . . , q, t = 1, . . . , T,             (2.16d)
z^0_s = z_s1 + ∑_{i∈T^in_s} ρ^in_is ∑_{j∈U_i} y_ij1,   s = 1, . . . , q,   (2.16e)
z_{s,t−1} + ∑_{i∈T^out_s: t>p_is} ρ^out_is ∑_{j∈U_i} y_{ij,t−p_is} = z_st + ∑_{i∈T^in_s} ρ^in_is ∑_{j∈U_i} y_ijt,
        s = 1, . . . , q, t = 2, . . . , T,                        (2.16f)
x_ijt = 0,   t > T − d_i, j = 1, . . . , m, i ∈ I_j,               (2.16g)
x_ijt ∈ {0, 1}, y_ijt ∈ R+,   j = 1, . . . , m, i ∈ I_j, t = 1, . . . , T,   (2.16h)
z_st ∈ R+,   s = 1, . . . , q, t = 1, . . . , T.                   (2.16i)

Objective (2.16a) is to maximize the total profit, which equals the total cost of the
products in all states at the end of the planning horizon minus the expenses for
storing products during the planning horizon. Inequalities (2.16b) ensure that at any
time a unit cannot process more than one task. The variable bounds in (2.16c)
restrict the batch size of any task to be within the minimum and maximum capacities
of the unit performing the task. The stock limitations are imposed by Ineqs. (2.16d):
the amount of product stored in any state s must not exceed the storage capacity
of this state. The product balance relations (2.16f) ensure that, for any state
s in each period t > 1, the amount of product entering the state (the stock from
the previous period plus the input from the tasks ending at time t − 1) equals the
amount of product leaving the state (the stock at the end of period t plus the amount
of product consumed by the tasks that start in period t). Equations (2.16e) are
the specialization of the balance relations to period 1, which has no preceding period.
In (2.16g) we set to zero the values of those variables that would allow a task to
finish after the end of the planning horizon.

2.15 Multidimensional Orthogonal Packing

In an m-dimensional (orthogonal) packing problem we need to pack a number of
small m-dimensional rectangular boxes (items) into a large m-dimensional rectangular
box (container) so that no two items overlap, and all item edges are parallel to
the container edges.

Two-dimensional (m = 2) packing problems arise in different industries where
steel, wood, glass, or textile materials are cut. In such cases these packing problems
are also known as two-dimensional cutting stock problems. The problem of optimizing
the layout of advertisements in a newspaper is also formulated as a two-dimensional
packing problem. Three-dimensional (m = 3) packing problems — also
known as container loading problems — appear as important subproblems in
logistics and supply chain applications.
Formally, we have a large m-dimensional rectangular box (container) whose sizes
are given by a vector L = (L_1, . . . , L_m)^T ∈ Z^m, and we also have a set of n small
m-dimensional rectangular boxes (items); the sizes of item r are given by a vector
l^r = (l^r_1, . . . , l^r_m)^T ∈ Z^m.
In the (orthogonal) m-dimensional knapsack problem (m-KP), for each item r, we
also know its cost cr . The goal is to pack into the container — which is also called a
knapsack — a subset of items of maximum total cost so that no two items overlap,
and all item edges are parallel to the knapsack edges. If the items cannot be rotated
(say, when cutting decorated materials) and every edge of each item must be parallel
to the corresponding knapsack edge, we have an m-KP without rotation. Unless
otherwise stated, in what follows we shall consider the m-KP without rotation. This
restriction will be removed in Sect. 2.15.3, where we consider some extensions of
our basic IP formulation.
In the m-dimensional strip packing problem (m-SPP), it is assumed that one edge
(let us call it the height) of the container — which now is called a strip — is suffi-
ciently big so that all the items can be packed into the strip, and the objective is to
minimize the height of the occupied part of the strip.
In the m-dimensional bin packing problem (m-BPP) there are many large boxes
of equal sizes — called bins — and the objective is to pack all n items into
a minimum number of bins. We can translate any m-BPP instance into an instance
of (m + 1)-SPP, where the additional (m + 1)-st direction (the height of the strip)
is used to count bins: L_{m+1} is an upper bound on the number of needed bins, and
l^r_{m+1} = 1 for all items r.
The main requirement in each of the above packing problems is that any two
items put into the container do not overlap. There are basically two ideas on how
this non-overlapping requirement is formulated in IPs. The first one originates from
the floor-planning applications, and it is considered in Sect. 1.2.1 for the two-
dimensional floor planning. We can easily extend the disjunctive approach used
there to formulate the m-KP as a MIP. But this disjunctive approach does not result
in a tight formulation because of using large coefficients to represent the disjunc-
tions by linear inequalities. Therefore such disjunctive MIP formulations are almost
useless in practice (using the best MIP solvers one cannot hope to solve m-KPs even
of moderate size).
An alternative approach for representing the non-overlapping requirement is to
consider the container as an m-dimensional grid whose cells are m-dimensional unit
cubes. As usual, we associate each cell with its origin, defined to be its corner that
is nearest to the container origin. Let L denote the set ∏_{k=1}^m {0, 1, . . . , L_k − 1} of
grid cell origins. We say that an item is placed at a point p ∈ L if its corner that
is nearest to the container origin is placed at p. Instead of requiring that no pair of
items overlap, now we require that every cell be covered by at most one item (see
Exercise 2.12). Although this approach leads to a much tighter IP formulation, the
size of this formulation is huge, and its constraint matrix is very dense. Therefore,
this formulation cannot be used for solving packing problems even of moderate
sizes.

2.15.1 Basic IP Formulation

Here we consider a modeling approach that combines the disjunctive approach with
the discrete representation of the container. The latter will allow us to formulate the
non-overlapping disjunctions by linear inequalities with small coefficients.
First, let us define two families of decision binary variables:
• z_r = 1 if item r is placed into the knapsack, and z_r = 0 otherwise;
• y_rij = 1 if item r is placed at a point p ∈ L with p_i = j, and y_rij = 0 otherwise.
For modeling purposes, we also need two families of auxiliary variables:
• x_rij = 1 if the open unit strip U^i_j def= {w ∈ R^m : j < w_i < j + 1} intersects item r,
and x_rij = 0 otherwise;
• s_{r1,r2,i} = 1 if items r1 and r2 are separated by a hyperplane that is orthogonal to
axis i, and s_{r1,r2,i} = 0 otherwise.
In these variables the m-KP is written as follows:
∑_{r=1}^n c_r z_r → max,                                          (2.17a)
∑_{j=0}^{L_i − l^r_i} y_rij = z_r,   i = 1, . . . , m, r = 1, . . . , n,   (2.17b)
x_rij = ∑_{j1=max{0, j−l^r_i+1}}^{min{j, L_i−l^r_i}} y_{r,i,j1},   j = 0, . . . , L_i − 1, i = 1, . . . , m,
        r = 1, . . . , n,                                          (2.17c)
∑_{i=1}^m s_{r1,r2,i} ≥ z_{r1} + z_{r2} − 1,   r2 = r1 + 1, . . . , n,
        r1 = 1, . . . , n − 1,                                     (2.17d)
x_{r1,i,j} + x_{r2,i,j} + s_{r1,r2,i} ≤ z_{r1} + z_{r2},   j = 0, . . . , L_i − 1, i = 1, . . . , m,
        r2 = r1 + 1, . . . , n, r1 = 1, . . . , n − 1,             (2.17e)
z_r ∈ {0, 1},   r = 1, . . . , n,                                  (2.17f)
y_rij ∈ {0, 1},   j = 0, . . . , L_i − l^r_i, i = 1, . . . , m,
        r = 1, . . . , n,                                          (2.17g)
x_rij ∈ {0, 1},   j = 0, . . . , L_i − 1, i = 1, . . . , m,
        r = 1, . . . , n,                                          (2.17h)
s_{r1,r2,i} ∈ {0, 1},   i = 1, . . . , m, r2 = r1 + 1, . . . , n,
        r1 = 1, . . . , n − 1.                                     (2.17i)

Objective (2.17a) is to maximize the total cost of the items placed into the knapsack.
Equations (2.17b) ensure that the values of the y-variables uniquely determine the
positions of all items placed into the knapsack. Simultaneously, these equations set
to zero the values of those y-variables that correspond to items not placed into
the knapsack (z_r = 0). Equations (2.17c) reflect the relationship between the x- and
y-variables: a strip U^i_j crosses item r only if coordinate i of its corner nearest to
the origin lies between max{0, j − l^r_i + 1} and min{j, L_i − l^r_i}. These equations together
with (2.17b) also impose the restrictions on the item sizes: if item r is placed
into the knapsack, then, for any i = 1, . . . , m, the number of strips U^i_j crossing r is
l^r_i, and these strips are consecutive. The two families of inequalities, (2.17d) and (2.17e),
imply that each pair of items placed into the knapsack is separated by at least
one hyperplane that is orthogonal to a coordinate axis. The remaining relations, (2.17f)–
(2.17i), declare that all variables are binary.
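
The disjunction encoded by the s-variables rests on a simple geometric fact: two boxes do not overlap if and only if they are separated along at least one coordinate axis. A tiny sketch (our own helper) checks this condition for fixed placements:

    def boxes_overlap(o1, l1, o2, l2):
        """o1, o2: corners nearest to the origin; l1, l2: size vectors.
        True iff the two open boxes overlap, i.e., no axis separates them."""
        return all(o1[i] < o2[i] + l2[i] and o2[i] < o1[i] + l1[i]
                   for i in range(len(o1)))

For example, boxes_overlap((0, 0), (2, 3), (2, 0), (1, 1)) is False: the hyperplane w_1 = 2 separates the two items, which corresponds to setting s_{r1,r2,1} = 1.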

2.15.2 Tightening the Basic Model

Here we introduce a family of knapsack inequalities that can significantly strengthen
our IP (2.17). Let us denote by Vol def= ∏_{i=1}^m L_i the volume of the knapsack, and by
vol_r def= ∏_{i=1}^m l^r_i the volume of item r. With this notation, we introduce the following
knapsack inequalities:

∑_{r=1}^n vol_r z_r ≤ Vol,                                        (2.18)
∑_{r=1}^n (vol_r / l^r_i) x_rij ≤ Vol / L_i,   j = 0, . . . , L_i − 1, i = 1, . . . , m.   (2.19)

Inequality (2.18) imposes a natural restriction that the sum of item volumes cannot
exceed the knapsack volume. Inequalities (2.19) reflect the fact that the sum of the
volumes of the intersections of all the items with any unit strip orthogonal to some
coordinate axis cannot exceed the volume of the intersection of the knapsack with
that strip.
Computational experiments showed that these knapsack inequalities may greatly
tighten our basic IP formulation.

2.15.3 Rotations and Complex Packing Items

In this section we show how to extend our IP formulation of m-KP to cover the
cases when the rotation of items is allowed, and when the items are the unions of
rectangular boxes.
To model the first case, let us consider a version of the m-KP in which all n items are
partitioned into k groups, I_1, . . . , I_k, and from each group no more than one item can
be placed into the container. To take this additional restriction into account, we need
to add to (2.17) the following inequalities:

∑_{i∈I_q} z_i ≤ 1,   q = 1, . . . , k.

If it is allowed to rotate the items, we put into one group all the items resulting
from all possible rotations of a particular item.
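
Such a group is generated mechanically: in the axis-parallel setting, the rotations of an item are exactly the distinct permutations of its size vector. A one-function sketch (our own):

    from itertools import permutations

    def rotation_group(sizes):
        """All distinct axis-parallel orientations of an item with the given
        edge lengths; together they form one group Iq."""
        return sorted(set(permutations(sizes)))

For instance, rotation_group((2, 2, 5)) yields [(2, 2, 5), (2, 5, 2), (5, 2, 2)], a group of three items of which at most one may be packed.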
To model the case when some items are the unions of two or more rectangular
boxes, let us assume that our input n items are divided into groups of one or more
items, and all items from any group are put or not put into the knapsack together. In
each group of items we choose one item, r̄, called the base; any non-base item r in
this group is assigned a reference, ref(r) = r̄, to the base item; ref(r̄) = −1. Each
non-base item r is also assigned a vector v^r ∈ R^m: if o^r̄ is the corner of the base
item r̄ = ref(r) nearest to the origin, then o^r̄ + v^r is the corner of item r nearest to
the origin. In other words, the shape of any group is determined by its base r̄ and the
vectors v^r assigned to all non-base items r of this group.
The following equations model the group restrictions defined above:

z_r = z_{ref(r)},   r = 1, . . . , n, ref(r) ≥ 0,
o^r_i = o^{ref(r)}_i + v^r_i z_r,   i = 1, . . . , m, r = 1, . . . , n, ref(r) ≥ 0.

2.16 Single Depot Vehicle Routing Problem

There is a depot that supplies customers with some goods. The depot has a fleet
of vehicles of K different types. There are q_k vehicles of type k, and each such
vehicle has capacity u_k (the maximum weight it can carry). We also know the fixed
cost, f_k, of using one vehicle of type k during a day.
At a particular day, n customers have ordered some goods to be delivered from
the depot to their places: customer i is expecting to get goods of total weight di .
To simplify the notation, we will also consider the depot as a customer with zero
demand. For a vehicle of type k, it costs c^k_ij to travel from customer i to customer j.
A route for a vehicle is given by a list of customers, (i_0 = 0, i_1, . . . , i_r, i_{r+1} = 0),
in which no customer, except for customer 0 (the depot), appears twice. This
list determines the order of visiting customers. The route is feasible for vehicles of
type k if the total demand of the customers on the route does not exceed the vehicle
capacity: ∑_{s=1}^r d_{i_s} ≤ u_k. The cost of assigning a vehicle of type k to this route is
f_k + ∑_{s=1}^{r+1} c^k_{i_{s−1}, i_s}.
The vehicle routing problem (VRP) is to select a subset of routes such that each
customer is on exactly one route, and then to assign a vehicle of sufficient capacity
(from the depot fleet) to each selected route so that the total cost of assigning vehicles
to the routes is minimum.

Fig. 2.4 A three-route solution of an example VRP

Figure 2.4 displays a solution to some example VRP. Here we have 15 customers
(represented by the nodes numbered from 1 to 15) served from one depot (node 0);
the numbers next to the nodes are customer demands, and the number next to
each arc is the traveling cost for the car assigned to the route that contains this arc.
In our example we have three routes:
• 0 → 4 → 14 → 3 → 6 → 2 → 8 → 0 of total demand 18 and cost 9,
• 0 → 7 → 1 → 9 → 13 → 11 → 0 of total demand 15 and cost 11,
• 0 → 12 → 10 → 5 → 15 → 0 of total demand 9 and cost 7.
To formulate the VRP as an IP, we need the following family of decision binary
variables:
• x^k_ij = 1 if some vehicle of type k travels directly from customer i to customer j,
and x^k_ij = 0 otherwise.
In addition, we also need one family of auxiliary variables:
• y_ij : weight of the goods carried by the assigned vehicle when it travels directly
from customer i to customer j.
In these variables the model is written as follows:

∑_{k=1}^K ∑_{j=1}^n (f_k + c^k_{0j}) x^k_{0j} + ∑_{k=1}^K ∑_{i=1}^n ∑_{j∈{0,...,n}\{i}} c^k_ij x^k_ij → min,   (2.20a)
∑_{j=1}^n x^k_{0j} ≤ q_k,   k = 1, . . . , K,                       (2.20b)
∑_{k=1}^K ∑_{i=0}^n x^k_ij = 1,   j = 1, . . . , n,                 (2.20c)
∑_{i=0}^n x^k_ij − ∑_{i=0}^n x^k_ji = 0,   j = 1, . . . , n, k = 1, . . . , K,   (2.20d)
∑_{i=0}^n y_ij − ∑_{i=0}^n y_ji = d_j,   j = 1, . . . , n,          (2.20e)
0 ≤ y_ij ≤ ∑_{k=1}^K (u_k − d_i) x^k_ij,   i, j = 0, . . . , n,     (2.20f)
x^k_ii = 0,   i = 0, . . . , n, k = 1, . . . , K,                    (2.20g)
x^k_ij ∈ {0, 1},   i, j = 0, . . . , n, k = 1, . . . , K.            (2.20h)

Objective (2.20a) is to minimize the total fixed cost of using vehicles plus the
traveling costs of all used vehicles along their routes. Inequalities (2.20b) ensure that
the number of vehicles of any particular type that leave the depot does not exceed
the number of such vehicles in the depot fleet. Equations (2.20c) guarantee that each
customer is visited by exactly one vehicle, while (2.20d) guarantee that any vehicle
that arrives at a customer also leaves that customer. The next family of equations,
(2.20e), reflects the fact that any vehicle, before leaving a customer, must unload the
goods ordered by that customer. Each inequality in (2.20f) imposes a capacity
restriction: if a vehicle of type k travels directly from customer i to customer j,
then the total cargo weight on board cannot exceed u_k − d_i; moreover, if no
vehicle (of any type) travels directly from customer i to customer j, then y_ij = 0.
One can easily argue that these capacity restrictions guarantee that no used vehicle
ever carries goods of total weight greater than its capacity.
A note of precaution is appropriate here. Formulation (2.20) may be very weak
because the variable bound constraints in (2.20f) are usually not tight. Let us assume
that some inequality y_ij ≤ ∑_{k=1}^K (u_k − d_i) x^k_ij holds as an equality. If the capacity
u_k is big and the demand d_i is small, then u_k − d_i is big. If we further assume that the
total demand of the customers that are on the same route as customer i and
are visited after customer i is small, then y_ij is also small and, therefore, x^k_ij takes a
small fractional value. Many binary variables taking fractional values is usually an
indicator of a problem that is difficult to solve. Therefore, one can hardly expect that (2.20)
can be used for solving to optimality even VRPs of moderate size. Nevertheless, if
implemented properly, an application based on this formulation can produce rather
good approximate solutions for VRPs of practical importance.

2.16.1 Classical Vehicle Routing Problem

Here we consider a special case of the VRP that is more "uniformly" structured. Let
us assume that there are m vehicles in the depot fleet, all of the same type (K = 1)
and the same capacity U. Fixed costs of using vehicles are not taken
into account. This special VRP is known as the classical vehicle routing problem
(CVRP).
Let N def= {0, 1, . . . , n}, and let r(S) denote the minimum number of vehicles
needed to serve a subset S ⊆ N \ {0} of customers. The value of r(S) can be computed
by solving a 1-BPP (see Sect. 2.15) with all bins of capacity U and the item
set S, where the length of item i is l^i_1 = d_i. Since the 1-BPP is NP-hard in the strong
sense, in practice r(S) is approximated from below by the value ⌈(∑_{i∈S} d_i) /U⌉.
To formulate the CVRP as an IP, we use the following family of decision binary
variables:
• x_ij = 1 if some vehicle travels directly from customer i to customer j, and x_ij = 0
otherwise.
In these variables the model is written as follows:

∑_{i∈N} ∑_{j∈N\{i}} c_ij x_ij → min,                              (2.21a)
∑_{j∈N\{0}} x_{0j} ≤ m,                                           (2.21b)
∑_{j∈N\{i}} x_ji = 1,   i ∈ N \ {0},                              (2.21c)
∑_{j∈N\S} ∑_{i∈S} x_ji ≥ r(S),   S ⊆ N \ {0}, S ≠ ∅,              (2.21d)
x_ij ∈ {0, 1},   i ∈ N, j ∈ N \ {i}.                              (2.21e)

Objective (2.21a) is to minimize the total delivery cost. Inequality (2.21b) ensures
that no more than m vehicles may leave the depot and, therefore, there may be no
more than m routes. Equations (2.21c) guarantee that each customer is visited
exactly once. Inequalities (2.21d) guarantee that the routes defined by the values of
the x_ij-variables are feasible, i.e., each of them leaves the depot and, due to the definition
of r(S), the total weight of all customer demands on the route does not exceed the
vehicle capacity.
Since (2.21d) contains exponentially many inequalities, IP (2.21) can be solved
only by a cutting plane algorithm (see Chap. 4), and this is possible only if the separation
problem (see Sect. 4.7) for (2.21d) can be solved very quickly. Unfortunately,
in general this is not the case. However, when x is an integer vector that satisfies
all constraints of (2.21) except (2.21d), the problem of finding in (2.21d) an
inequality that is violated at x is trivial, and we leave it to the reader to elaborate
such a separating procedure.

2.17 Notes

Sect. 2.1. The set packing, set partitioning, and set covering problems are discussed
in more detail in [98, 123]. The polyhedral structure of the set packing problem is
also considered in Sect. 5.5.
Problems similar to the crew scheduling problem have always been a
fruitful area for applications of the set covering and set partitioning problems (see
[80]). If you are interested in how trades are organized at combinatorial auctions,
see [100].
Sect. 2.2. The problem of locating service centers was first formulated as an IP in
[15]. The polyhedral structure of this problem is studied in [98, 142].
Sect. 2.3. Exact and approximate methods for solving the index fund formation
problem are studied in [40].
Sect. 2.4. A classification of multi-product lot-sizing models is given in [143]. The
polyhedral structures of some of these models are studied in [98, 142].
Sect. 2.5. The assembly line balancing problems and the algorithms for their solu-
tion are discussed in [120].
Sect. 2.6. In the literature on optimizing the performance of energy systems,
the unit commitment problem is one of the highest-priority problems [126].
Sect. 2.7. From the sources [21, 131], you can learn more about the problems of
designing telecommunication networks and the methods for solving them.
Sect. 2.8. A good source on the detailed placement problem is the survey [125].
Sect. 2.9. The problem of assigning aircraft to flights and its IP formulation are
studied in [1].
Sect. 2.10. The model for determining an optimal operation mode of a hybrid car is
an extended version of the model from [29].
Sect. 2.11. The problem of short-term financial management was formulated back
in 1969 [101], and its essence has not changed since then.
Sect. 2.12. A historical reference on the application of MIP for the treatment of
cancer tumors by the method of intensive modulated radiation therapy is given in
[112]. A description of the column generation algorithm for solving (2.14) is also
proposed there.
Sect. 2.13. A good survey on the resource-constrained project scheduling is given
in [128].
Sect. 2.14. Using STNs for describing technological processes was proposed in [82].
A MIP formulation for the problem of optimizing technological processes was also
presented there.
Sect. 2.15. The disjunctive approach for modeling multidimensional packing problems originates from floor-planning applications [130]. The first IP formulation of a two-dimensional packing problem (namely, the cutting stock problem) based on the discretization of the container space was given by Beasley [20] (see also Exercise 2.12). Formulation (2.17) is published in this book for the first time.

Sect. 2.16. Many alternative IP formulations have been proposed for different variants of the vehicle routing problem. The single-commodity flow formulation (2.20)
was first presented in [53]. Formulation (2.21) was given in [84] as an extension of
the IP formulation for the traveling salesman problem proposed in [43] (see also
(6.13)). For a survey on vehicle routing, see [38].

2.18 Exercises

2.1. Frequency assignment. Frequency ranges numbered from 1 to k must be assigned to n radio stations. For each pair of stations i, j, we know the value p_{ij} of the interference parameter, which means that the modulus of the difference of the frequency ranges assigned to stations i and j must be at least p_{ij}. The goal is to minimize the maximum assigned range number.
Formulate this problem as an IP.
2.2. ATM allocation. A bank wants to install a certain number of automated teller
machines (ATMs) in a rural area in which there are n communities. ATMs can be
placed in any of these communities. It is known that k_i of the bank's customers reside in community i, and that it takes t_{ij} minutes to drive from community i to community j.
1) What is the minimum number of ATMs to be installed and at what locations must
they be installed so that the travel time from any community to the nearest ATM
does not exceed T minutes?
2) Where to place no more than q ATMs to maximize the number of customers that
can get to the nearest ATM for no more than T minutes?
Formulate both problems as MIPs. Compare the two formulations. Which one is more appropriate for use in practice?
2.3. Assume that in the scheduling problem considered in Sect. 1.8 it is additionally required that, if jobs j1 and j2 are both assigned to the same machine k, then, to start processing job j2 immediately after job j1, τ^k_{j1,j2} units of time are spent preparing the machine.
Extend the time-index formulation (1.26) to take this new problem feature into account.
2.4. Scheduling multiple machines. The enterprise has orders for batches of parts,
each batch contains n different parts. Each part must be successively processed by
machines 1, 2, . . . , m. The processing time of part i on machine k is pik . It is neces-
sary to determine the order of processing parts on each of the machines so that the
processing time of one batch is minimum.
Write two formulations for this problem: in one use a continuous time model, and in the other a discrete one.
2.5. Control of fuel consumption. Consider a linear dynamical system described by the following linear recursion

x(t) = Ax(t − 1) + bu(t − 1),  t = 1, . . . , T,

where an n × n-matrix A and a vector b ∈ R^n are given parameters, and x(t) ∈ R^n and u(t) ∈ R are, respectively, the state of the system and the control (signal) in period t. We need to define the controls u(0), . . . , u(T − 1) in order to transfer the system from an initial state x(0) = x^0 to a final state x(T) = x^dest consuming the minimum amount of fuel, which is determined by the formula ∑_{t=0}^{T−1} f(u(t)).
Assuming that f is a piecewise linear function with the break-points

(u1 , f1 = f (u1 )), . . . , (uk , fk = f (uk )),

formulate this problem as a) an LP if f is convex, or b) a MIP in the general case.


2.6. Single-product lot-sizing with backlogging. Let us consider the single-product lot-sizing problem studied in Sect. 1.7.1, but now assume that a part of the demand in some periods can be satisfied by deliveries in later periods: of the total demand of d_{jt} units, no more than d̄_{jt} units may be delivered late, in periods t + 1, . . . , T. Each unit of product j supplied late is sold at a discount equal to r_j.
Modify (2.6) to take these new features into account.
2.7. Prove that we can strengthen IP (2.7), which is a formulation of the simple assembly line balancing problem, by using in place of (2.7e) the following alternative formulation of the precedence relations:

∑_{i=1}^{k} x_{i,j1} ≤ ∑_{i=1}^{k} x_{i,j2},  k = 1, . . . , n − 1,  (j1, j2) ∈ E.

2.8. Modify IP (2.7), which is a formulation of the simple assembly line balancing problem, to take into account the following requirements for uniform loading of the stations: a) the work time of any open station (on one product) must be at least q1 percent of the cycle time; b) the work times of the maximally and minimally loaded stations must not differ by more than q2 percent.
2.9. Clearing problem. There are m banks in a country. The current balance of bank i
is bi . Several times a day, the Interbank Settlement Center receives a list of payments
P_k = (i_k, j_k, A^1_k, A^2_k, S_k), k = 1, . . . , n. The fields of the tuple P_k are interpreted as follows: it is necessary to transfer the sum S_k from account A^1_k in bank i_k to account A^2_k in bank j_k. The goal is to accept as many payments as possible, provided that the new balance (calculated taking into account the payments made) of each of the banks is non-negative. Note that for this optimization problem the fields A^1_k and A^2_k of the payment records are insignificant.
Formulate this clearing problem as an IP.
2.10. Sport tournament scheduling. The teams participating in the basketball cham-
pionship are divided into two conferences: Western and Eastern. There are n1 teams
in the western conference, and n2 teams in the eastern conference. One round of the
championship lasts T weeks. Each team must play no more than once a week, and must play 2k times with each team in its conference and 2q times with each team in the other conference. For each pair of opponents, half of the games must be played on the site of each team. Of all the round-robin schedules, the best is the one for which the minimum interval between games of the same pair of teams is maximum.
Formulate the problem of finding the best tournament schedule as an IP.
2.11. Nearest substring problem4 . Let A be some finite set of symbols (alphabet).
The sequence of characters s = ”s1 s2 . . . sk ” (si ∈ A ) is called a string of length
|s| = k in the alphabet A . A substring of string s is a string ”si1 si2 . . . sim ” composed
of the characters of string s and written in the same order in which they are present
in s, i.e., 1 ≤ i1 < i2 < · · · < im ≤ |s|. The distance d(s1 , s2 ) between two strings s1
and s2 of the same length is defined as the number of positions in which these strings
differ. For example, if s1 = ”ACT ” and s2 = ”CCA”, then d(s1 , s2 ) = 2. If |s1 | < |s2 |,
then the distance d(s1 , s2 ) is defined to be the maximum distance d(s1 , s̄2 ) for all
substrings s̄2 of length |s1 | in string s2 .
Given a list (s1 , s2 , . . . , sn ) of strings in some alphabet A , the length of each
string si is at least m. We need to find a string s of length m such that the maximum
distance d(s, si ) (i = 1, . . . , n) is minimum. Formulate this problem as an IP.
2.12. Formulate the 2-KP as an IP using only the following binary variables: y^r_{ij}, j = 0, . . . , L_i − l^r_i, i = 1, 2, r = 1, . . . , n, where y^r_{1,s} = y^r_{2,t} = 1 only if (s, t) is the corner of item r nearest to the knapsack origin.
2.13. Balanced airplane loading. A cargo plane has three cargo compartments: front (1st), central (2nd) and tail (3rd). The base of compartment i is a rectangle of width W_i and length L_i, and the total weight of cargo in compartment i must not exceed G_i tonnes, i = 1, 2, 3.
We need to load n containers into the plane, the containers cannot be stacked
on top of each other. The weight of container j is g j , and its base is of width w j
and length l j . The goal is to load the containers into the plane so that the weights of
cargo in different compartments are balanced: the difference between the largest and
smallest ratios of the total weight of the cargo in the compartment to the maximum
allowable weight of cargo in this compartment must be minimum.
Formulate the problem of balanced airplane loading as an IP.

4 Problems of comparing strings, in various formulations, are often encountered in applications of computational biology.
Chapter 3
Linear Programming

After the discovery of interior point methods, it seemed that the new methods would completely displace the simplex algorithms from practical use. The new methods proved to be very efficient in practice, especially when solving large-scale LPs. However, a crucial requirement for an LP algorithm used within MIP is its ability to quickly perform reoptimization: having found an optimal solution of some LP, the method must be able to quickly find an optimal solution of a "slightly" modified version of the just solved LP. None of the interior point algorithms is able to reoptimize quickly. The wide use of MIPs in practice and the ability of the dual simplex method to quickly perform reoptimization enabled the latter to survive the competition with the interior point methods. Moreover, the acute practical need for efficient MIP algorithms stimulated research that has resulted in a significant increase in the practical efficiency of the simplex algorithms. Today the best implementations of the simplex algorithms are quite competitive with the best implementations of the interior point methods. Therefore, here we will study only the simplex algorithms, and the main attention will be paid to the dual simplex method.

3.1 Basic Solutions

Let us consider the LP in canonical form:

z_P = max{c^T x : Ax ≤ b},   (3.1)

where A is a real m × n-matrix, c ∈ R^n, b ∈ R^m, and x = (x_1, . . . , x_n)^T is a vector of variables (unknowns).
In what follows we will assume that the constraint matrix A in (3.1) is of full column rank, i.e., rank(A) = n. Let M = {1, . . . , m} and N = {1, . . . , n} be, respectively, the sets of rows and columns of A. We denote by A_I^J the submatrix of A with rows from a set I ⊆ M and columns from a set J ⊆ N. If J = N (resp., I = M), then instead of A_I^N (resp., A_M^J) we write A_I (resp., A^J).
A subset I of n linearly independent rows of A is called a (row) basic set, the matrix A_I is called a basic matrix, and the unique solution x̄ = A_I^{-1} b_I of the linear system A_I x = b_I is a basic solution. If in addition x̄ is feasible, i.e., it satisfies the system of inequalities Ax ≤ b, then x̄ is called a feasible basic solution, and I is called a feasible basic set. Note also that the feasible basic solutions are nothing other than the vertices of the polyhedron P(A, b) := {x ∈ R^n : Ax ≤ b}.
To clarify the above definitions, let us consider the feasible region of an LP with n = 3 variables and m = 6 constraints:

−x1 ≤ 0,   H1
−x2 ≤ 0,   H2
−x3 ≤ 0,   H3
x1 + x2 + x3 ≤ 4,   H4
2x1 ≤ 5,   H5
3x2 ≤ 7.   H6

These constraints are shown in Fig. 3.1. The basic solutions are depicted as bold dots, and the feasible basic solutions are additionally circled.

[Figure: the hyperplanes H1-H6 and the basic solutions x_I, labeled by their basic sets I ⊆ {1, . . . , 6}.]

Fig. 3.1 Vertices and basic solutions

A feasible basic solution (vertex) x̄ is said to be degenerate if it corresponds to two different basic sets I1 and I2, i.e., A_{I1} x̄ = b_{I1}, A_{I2} x̄ = b_{I2}, |I1| = |I2| = n, I1 ≠ I2. Geometrically this means that the vertex x̄ lies on more than n facets of the polyhedron P(A, b). For example, the two vertices (1, 0, 3)^T and (2, 2, 0)^T of the polytope from Fig. 1.4 are degenerate, and all the others are nondegenerate. A polyhedron P(A, b) that has degenerate vertices is called degenerate. Similarly, an LP having degenerate feasible basic solutions is called degenerate.
A basic set I and the corresponding basic solution x̄ = A_I^{-1} b_I are called dual feasible if the vector of potentials π^T = c^T A_I^{-1} is non-negative. In this case, the point ȳ = (ȳ_I = π, ȳ_{M\I} = 0) is a feasible solution to the dual LP¹

z_D = min{b^T y : A^T y ≥ c, y ≥ 0}.   (3.2)

We have the equality

c^T x̄ = c^T A_I^{-1} b_I = π^T b_I + 0^T b_{M\I} = b^T ȳ.

On the other hand, for any feasible solution x of the primal LP (3.1) and any
feasible solution y of the dual LP (3.2), we have

cT x ≤ yT Ax ≤ yT b. (3.3)

It follows that x̄ and ȳ are optimal solutions to the primal, (3.1), and dual, (3.2),
LPs if the basic set I is simultaneously primal and dual feasible. In the context of
duality, feasible basic sets and solutions are also called primal feasible, and the
solutions to the dual LP are called dual solutions (for the primal LP). We also note
that the components of an optimal dual solution are also called shadow prices (for
the primal LP).

3.2 Primal Simplex Method

Let I be a feasible basic set, B = AI and x̄ = B−1 bI be the corresponding basic matrix
and feasible basic solution. Now and in what follows, we assume that the order of
the elements in the basic set is fixed, i.e., I is a list, and we denote the i-th element
of this list by I[i].
Let us remove from the basic set some row index I[t]. The solution set of the linear system A_{I\I[t]} x = b_{I\I[t]} is the line {x(λ) := x̄ − λ B^{-1} e_t : λ ∈ R}. Let us imagine that we put an n-dimensional chip at the point x̄ = x(0) and then, increasing λ, move this chip within the feasible polyhedron P(A, b) along the ray {x(λ) : λ ≥ 0} until it rests against some hyperplane given by A_s x = b_s for s ∉ I. Let x̂ = x(λ̂_t) be the intersection point of our ray with this hyperplane. Notice that

λ̂_t = ‖x̂ − x̄‖/‖B^{-1} e_t‖ = min{ (b_i − A_i x̄)/(A_i B^{-1} e_t) : i ∉ I, A_i B^{-1} e_t > 0 } = (b_s − A_s x̄)/(A_s B^{-1} e_t),

1Duality in linear programming is discussed in Sect. 3.5. The variables yi of the dual LP (3.2) are
dual variables for the primal LP (3.1).

and x̂ = B̂^{-1} b_Î, where B̂ = A_Î and the new basic row set Î is defined by the rule

Î[i] = { s,    i = t,
         I[i], i ≠ t.    (3.4)

It is not difficult to see that the following equality holds:

B̂^{-1} = B^{-1} I(t, u)^{-1},   (3.5)

where u^T = A_s B^{-1}, and the matrix I(t, u) is obtained from the identity matrix I by substituting the row vector u^T for row t. Notice that

I(t, u)^{-1} = I − (1/u_t) e_t (u^T − e_t^T),   (3.6)

i.e., I(t, u)^{-1} coincides with the identity matrix everywhere except for row t, which equals

( −u_1/u_t, . . . , −u_{t−1}/u_t, 1/u_t, −u_{t+1}/u_t, . . . , −u_n/u_t ).

In LP, such a change of the basis is called a pivot operation. Column t is called the pivot column, row s is the pivot row, and the element A_s B^{-1} e_t, which stands in row s and column t of the matrix AB^{-1}, is called the pivot element.
Let us illustrate the pivot operation using the example polytope from Fig. 3.1. Let I = {2, 4, 6} and t = 1. Then x̄ = x_{246}, and the ray {x(λ) : λ ≥ 0} is directed from the point x_{246} along the edge [x_{246}, x_{346}] to the point x_{346}, which lies on the hyperplane H3. Therefore, Î = {3, 4, 6} and x̂ = x_{346} are the new feasible basic set and solution.
To ensure that, after performing the pivot operation, the objective function increases,

c^T x̂ − c^T x̄ = −(‖x̂ − x̄‖/‖B^{-1} e_t‖) c^T B^{-1} e_t = −(‖x̂ − x̄‖/‖B^{-1} e_t‖) π_t > 0,   (3.7)

the index t is chosen so that the direction vector −B^{-1} e_t forms an acute angle with the gradient, c, of the objective function: c^T B^{-1} e_t = π_t < 0.
It may happen that all components of the vector AB−1 et are nonpositive. Then
λ̂t = ∞ (we assume that the minimum over the empty set of alternatives is infinite),
and this means that we can move our chip along the ray {x(λ ) : λ ≥ 0} infinitely
long, remaining within the polyhedron P(A, b). If, in addition, πt < 0, then the ob-
jective function will increase to infinity.

simplex(c, A, b, I)   // I is a feasible basic set
{
    B^{-1} := A_I^{-1};  x := B^{-1} b_I;  π^T := c^T B^{-1};
    while (π ∉ R^n_+) {   // otherwise, x is an optimal solution
        choose an index t such that π_t < 0;
        v := −B^{-1} e_t;   // −v is column t of the inverse basic matrix B^{-1}
        if (Av ≤ 0) return (false, x, v);   // objective function is unbounded
        λ := min{ (b_i − A_i x)/(A_i v) : i ∉ I, A_i v > 0 };
        choose an index s for which the value λ is attained;
        I[t] := s;  u^T := A_s B^{-1};  B^{-1} := B^{-1} I(t, u)^{-1};
        x := x + λ v;   // x = A_I^{-1} b_I
        π^T := π^T I(t, u)^{-1};   // π^T = c^T B^{-1}
    }
    return (true, x, y = (y_I = π, y_{M\I} = 0));
}

Listing 3.1. Primal simplex method

A detailed description of the (primal) simplex method is presented in Listing 3.1. The parameters of the LP being solved, A, b and c, together with an initial feasible basic set I, constitute the input of the simplex procedure, which implements the primal simplex method. The method terminates for one of two reasons.
1) The current feasible basic set I becomes also dual feasible. In this case, the output
is a triple
(true, x, y = (yI = π, yM\I = 0)),
where x is an optimal solution to the primal LP (3.1), and y is an optimal solution
of the dual LP (3.2).
2) If Av ≤ 0, then the point x(λ ) = x + λ v is a feasible solution to (3.1) for all
λ ≥ 0 and limλ →∞ cT x(λ ) = ∞, i.e., the objective function of the primal LP is
unbounded. In this case, the procedure returns the triple (false, x, v), where the
pair (x, v) forms a certificate of unboundedness (see Sect. 3.4).
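As a concrete rendering, here is a minimal NumPy sketch of Listing 3.1 (not part of the book's code; all names are ours). It uses dense data, the "most negative" pricing rule of Sect. 3.2.2, and a crude tolerance in place of proper numerics:

import numpy as np

def simplex(c, A, b, I, eps=1e-9):
    """Primal simplex method for max{c^T x : Ax <= b} following Listing 3.1.
    I is a feasible basic set: n row indices of A with A_I nonsingular and
    A_I^{-1} b_I feasible.  Returns (True, x, y) or (False, x, v)."""
    m, n = A.shape
    I = list(I)
    Binv = np.linalg.inv(A[I, :])          # inverse basic matrix B^{-1}
    x = Binv @ b[I]                        # feasible basic solution
    pi = c @ Binv                          # potentials pi^T = c^T B^{-1}
    while np.any(pi < -eps):
        t = int(np.argmin(pi))             # "most negative" pricing rule
        v = -Binv[:, t]                    # ray direction: c^T v = -pi_t > 0
        Av = A @ v
        cand = [i for i in range(m) if i not in I and Av[i] > eps]
        if not cand:                       # Av <= 0: objective is unbounded
            return False, x, v
        ratios = [(b[i] - A[i] @ x) / Av[i] for i in cand]
        k = int(np.argmin(ratios))
        lam, s = ratios[k], cand[k]
        u = A[s] @ Binv                    # u^T = A_s B^{-1}
        I[t] = s                           # pivot: row s enters at position t
        et = np.zeros(n); et[t] = 1.0
        Binv -= np.outer(Binv[:, t], u - et) / u[t]   # B^{-1} I(t,u)^{-1}
        x = x + lam * v
        pi = c @ Binv
    y = np.zeros(m); y[I] = pi
    return True, x, y                      # x primal optimal, y dual optimal

Started on the LP of Example 3.1 below with the zero-based feasible basic set I = [3, 4, 5], this sketch visits the same points x(0), . . . , x(3) and returns the same dual solution.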
Example 3.1 We need to solve the following LP:

x1 + 2x3 → max,
x1 + 2x2 + x3 ≤ 4,
x1 + x3 ≤ 3,
− x2 + x3 ≤ 1,
−x1 ≤ 0,
− x2 ≤ 0,
− x3 ≤ 0.

Solution. The polyhedron of feasible solutions for this LP is depicted in Fig. 3.2.
When the simplex method starts working with the feasible basic set I = (4, 5, 6), its
iterations are as follows.

[Figure: the feasible polyhedron with the iterates x(0), x(1), x(2), x(3) marked.]

Fig. 3.2 Polyhedron of the LP from Example 3.1

 
0. I = (4, 5, 6),  B^{-1} = [−1 0 0; 0 −1 0; 0 0 −1],  x(0) = (0, 0, 0)^T,  π = (−1, 0, −2)^T.

1. t = 3, v = (0, 0, 1)^T, and since A_1 v = 1, b_1 − A_1 x(0) = 4; A_2 v = 1, b_2 − A_2 x(0) = 3; A_3 v = 1, b_3 − A_3 x(0) = 1, we have

λ = min{4/1, 3/1, 1/1} = 1,  s = 3,
I = (4, 5, 3),  B^{-1} = [−1 0 0; 0 −1 0; 0 −1 1],
x(1) = x(0) + λv = (0, 0, 1)^T,  π = c^T B^{-1} = (−1, −2, 2)^T.

2. t = 2, v = (0, 1, 1)^T, and since A_1 v = 3, b_1 − A_1 x(1) = 3; A_2 v = 1, b_2 − A_2 x(1) = 2; A_6 v = −1, we have

λ = min{3/3, 2/1} = 1,  s = 1,
I = (4, 1, 3),  B^{-1} = [−1 0 0; 1/3 1/3 −1/3; 1/3 1/3 2/3],
x(2) = x(1) + λv = (0, 1, 2)^T,  π = (−1/3, 2/3, 4/3)^T.
3. t = 1, v = (1, −1/3, −1/3)^T, and since A_2 v = 2/3, b_2 − A_2 x(2) = 1; A_5 v = 1/3, b_5 − A_5 x(2) = 1; A_6 v = 1/3, b_6 − A_6 x(2) = 2, we have

λ = min{1/(2/3), 1/(1/3), 2/(1/3)} = 3/2,  s = 2,
I = (2, 1, 3),  B^{-1} = [3/2 −1/2 −1; −1/2 1/2 0; −1/2 1/2 1],
x(3) = x(2) + λv = (3/2, 1/2, 3/2)^T,  π = (1/2, 1/2, 1)^T.
Since all potentials π_i are non-negative, x* = x(3) = (3/2, 1/2, 3/2)^T is an optimal solution to our LP, and y* = (1/2, 1/2, 1, 0, 0, 0)^T is an optimal solution to the dual LP. ⊓⊔

3.2.1 How to Find a Feasible Basic Solution

A feasible basic set is one of the inputs to the simplex procedure. But how do we find such a set? It is possible to transform the initial LP to obtain an equivalent LP for which a feasible basic set is easily identified. There are several ways to perform such a transformation, but we will consider only one of them.
Again, let us consider LP (3.1). First, say, by the method of Gaussian elimination,
we find a set I of n linearly independent rows of the constraint matrix A. Let B = AI
and x̄ = B−1 bI . If the vector of the residuals b − Ax̄ is non-negative, then we are
finished: x̄ is a feasible solution (Ax̄ ≤ b), and I is a feasible basic set. Otherwise,
we solve the following LP:

−xn+1 → max,
Ax + axn+1 ≤ b, (3.8)
0 ≤ xn+1 ≤ 1,

where the components of the vector a ∈ Rm are defined by the rule: ai = 0 if Ai x̄ ≤ bi


and a_i = b_i − A_i x̄ − 1 otherwise. We obtain a feasible basic set Î for (3.8) by adding to the set I the index, m + 1, of the inequality x_{n+1} ≤ 1. We also note that the basic set Î determines the feasible basic solution (x̄, 1). If x_{n+1} ≠ 0 in an optimal solution to (3.8), then (3.1) does not have feasible solutions. Otherwise, removing m + 2 (the index of −x_{n+1} ≤ 0) from an optimal basic set for (3.8), we obtain a feasible basic set for (3.1).
So, we can solve (3.1) in two steps (or phases): in the first step, we solve (3.8) to find a feasible basic set for (3.1), and in the second step, we solve (3.1). These two steps can be combined into one, and then we have to solve not two, but only one LP:

cT x − Mxn+1 → max,
Ax + axn+1 ≤ b, (3.9)
0 ≤ xn+1 ≤ 1,

where M is a sufficiently large number. We could estimate the value of M, but any
theoretical estimate is usually too large, and it is not easy (for reasons of numerical
stability) to use it in practice. Therefore in practice, the solution of (3.9) begins with
a moderately large value of M. If it turns out that in the obtained optimal solution
the component xn+1 is non-zero, then M is doubled and the solution of this LP
is continued by the simplex method. This is repeated until either a solution with
xn+1 = 0 is found or the value of M exceeds a certain threshold value. In the latter
case, it is concluded that (3.1) does not have feasible solutions.
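A small sketch of this big-M construction (Python with NumPy; the function name and the default value of M are ours, not from the book). Here x_bar is a basic solution of some n linearly independent rows of A, found, e.g., by Gaussian elimination as described above:

import numpy as np

def big_m_lp(c, A, b, x_bar, M=1e4):
    """Build the big-M LP (3.9) for max{c^T x : Ax <= b}."""
    m, n = A.shape
    r = b - A @ x_bar                       # residuals
    a = np.where(r >= 0, 0.0, r - 1.0)      # a_i = b_i - A_i x_bar - 1 if violated
    # constraints: [A | a](x, x_{n+1}) <= b,  x_{n+1} <= 1,  -x_{n+1} <= 0
    A9 = np.vstack([np.hstack([A, a.reshape(-1, 1)]),
                    np.hstack([np.zeros(n), [1.0]]),
                    np.hstack([np.zeros(n), [-1.0]])])
    b9 = np.concatenate([b, [1.0, 0.0]])
    c9 = np.concatenate([c, [-M]])
    return c9, A9, b9

In practice this would be wrapped in a loop that doubles M and re-solves (e.g., with the simplex sketch above) while the optimal x_{n+1} remains non-zero, as described in the text.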

3.2.2 Pricing Rules

The calculation of the potential vector π in the simplex procedure is called pricing, or a pricing operation. Obviously, there can be many negative components π_i and, therefore, we need a rule for an unambiguous choice of the index t. The following rules (strategies) are best known:

"first negative": t = min{i : 1 ≤ i ≤ n, π_i < 0};
"most negative": t ∈ arg min_{1≤i≤n} π_i;
"steepest edge": t ∈ arg min_{1≤i≤n} π_i/‖B^{-1} e_i‖;
"maximum increase": t ∈ arg min_{1≤i≤n} λ̂_i π_i.
The meaning of these rules, with the exception of the steepest edge rule, should be clear from their names. When the steepest edge rule is used, the simplex method moves from the current vertex to the next one along the edge whose direction vector, −B^{-1} e_t, forms the most acute angle, of value φ_t, with the objective (gradient) vector c:

cos(φ_t) = −c^T B^{-1} e_t/(‖c‖ · ‖B^{-1} e_t‖) = (1/‖c‖) · (−π_t/‖B^{-1} e_t‖).

The larger cos(φ_t), the more acute the angle between the vectors −B^{-1} e_t and c.
In practice, for many years the most negative rule (also known as Dantzig's rule) prevailed. The first negative rule is the easiest to implement, but in comparison with, say, the most negative rule, the number of iterations of the simplex method can increase substantially. The maximum increase rule requires computations that take too much time at each iteration and, therefore, this rule is not practical. The same could be said about the steepest edge rule until formulas were found for recalculating the squares of the column norms.
Lemma 3.1. Let I be a feasible basic set, let Î be the basic set determined by (3.4) for some t ∈ {1, . . . , n}, and let B = A_I, B̂ = A_Î and

γ_i := ‖B^{-1} e_i‖² = e_i^T B^{-T} B^{-1} e_i,  i = 1, . . . , n.

Then

γ̂_i := ‖B̂^{-1} e_i‖² = { (1/u_t²) γ_t,                          i = t,
                          γ_i − 2 (u_i/u_t) α_i + (u_i²/u_t²) γ_t,  i ≠ t,    (3.10)

where u^T = A_s B^{-1}, v = B^{-1} e_t, α^T = v^T B^{-1}.
Proof. In view of (3.5) and (3.6), and since

I(t, u)^{-1} e_i = e_i − (u_i/u_t) e_t for i ≠ t,  I(t, u)^{-1} e_t = (1/u_t) e_t,

we obtain

γ̂_t = ‖B^{-1} I(t, u)^{-1} e_t‖² = (1/u_t²) γ_t,

and, for i ≠ t,

γ̂_i = (e_i − (u_i/u_t) e_t)^T B^{-T} B^{-1} (e_i − (u_i/u_t) e_t)
    = e_i^T B^{-T} B^{-1} e_i − 2 (u_i/u_t) e_i^T B^{-T} B^{-1} e_t + (u_i²/u_t²) e_t^T B^{-T} B^{-1} e_t
    = γ_i − 2 (u_i/u_t) α_i + (u_i²/u_t²) γ_t.  ⊓⊔
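As a quick numerical check of (3.10) (a toy illustration with random data, not from the book; it assumes u_t ≠ 0, since otherwise the pivot is not admissible):

import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.normal(size=(n, n))        # basic matrix A_I (nonsingular w.h.p.)
As = rng.normal(size=n)            # entering row A_s
t = 2                              # position leaving the basic list

Binv = np.linalg.inv(B)
gamma = np.sum(Binv**2, axis=0)    # gamma_i = ||B^{-1} e_i||^2
u = As @ Binv                      # u^T = A_s B^{-1}
v = Binv[:, t]                     # v = B^{-1} e_t
alpha = v @ Binv                   # alpha^T = v^T B^{-1}

gamma_hat = gamma - 2*(u/u[t])*alpha + (u/u[t])**2 * gamma[t]  # (3.10), i != t
gamma_hat[t] = gamma[t] / u[t]**2                              # (3.10), i = t

Bhat = B.copy(); Bhat[t] = As      # B with row t replaced by A_s (I[t] := s)
assert np.allclose(gamma_hat, np.sum(np.linalg.inv(Bhat)**2, axis=0))

The point of the update is that it costs O(n²) operations instead of recomputing the inverse (and all column norms) from scratch.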

3.3 Dual Simplex Method

In this section we consider another version of the simplex method, known as the dual simplex method. This simplex method is called dual because, when solving an LP, it essentially repeats the work of the primal simplex method applied to the dual LP. It should also be noted that the dual simplex method is the main LP method in MIP.
Again, we consider an LP of the form (3.1). Let I be a dual feasible basic set and let B = A_I, x̄ = B^{-1} b_I, π^T = c^T B^{-1} ≥ 0 and ȳ = (ȳ_I = π, ȳ_{M\I} = 0). Let us recall that ȳ is a feasible solution to the dual LP (3.2). If the basic solution x̄ is also (primal) feasible, then it is optimal. Otherwise, there is an inequality A_s x ≤ b_s, s ∉ I, that is violated at x̄. The dual objective function b^T y decreases if we move from the point ȳ along the ray y(θ) = ȳ − θv (θ ≥ 0), where, for u^T = A_s B^{-1}, the components of the vector v are determined by the rule

v_i = { −1,  i = s,
        u_j,  i = I[j], j = 1, . . . , n,
        0,    i ∈ {1, . . . , m} \ (I ∪ {s}).

dual-simplex(c, A, b, I)   // I is a dual feasible basic set
{
    B^{-1} := A_I^{-1};  π^T := c^T B^{-1};  x := B^{-1} b_I;
    while (x ∉ P(A, b)) {
        choose an inequality s violated at x: A_s x > b_s;
        u^T := A_s B^{-1};   // u^T is the row A_s written in the basis A_I
        if (u ≤ 0) {
            y_s := 1;
            y_{I[j]} := −u_j, j = 1, . . . , n;
            y_i := 0, i ∈ {1, . . . , m} \ (I ∪ {s});
            return (false, y);   // there are no feasible solutions
        }
        λ := min{ π_i/u_i : i = 1, . . . , n, u_i > 0 };
        choose an index t for which the value λ is attained;
        I[t] := s;
        π := π − λ u;  π_t := λ;   // compute π^T = c^T A_I^{-1}
        B^{-1} := B^{-1} I(t, u)^{-1};   // compute A_I^{-1}
        x := x + (b_s − A_s x) B^{-1} e_t;   // compute x = B^{-1} b_I
    }
    return (true, x, y = (y_I = π, y_{M\I} = 0));
}

Listing 3.2. Dual simplex method


−1, i = s,

vi = u j , i = I[ j], j = 1, . . . , n,

0, i ∈ {1, . . . , m} \ (I ∪ {s}).

Let θ̂_s denote the maximum value of θ such that y(θ) is still a feasible solution to the dual LP, and let ŷ = y(θ̂_s). Notice that

θ̂_s = ‖ŷ − ȳ‖/‖v‖ = ‖ŷ − ȳ‖/√(1 + ‖A_s B^{-1}‖²) = min{ π_i/u_i : 1 ≤ i ≤ n, u_i > 0 },

and if

t ∈ arg min{ π_i/u_i : 1 ≤ i ≤ n, u_i > 0 },

then ŷ = (ŷ_Î = π̂, ŷ_{M\Î} = 0), where π̂^T = c^T A_Î^{-1}, and the new dual feasible basic set Î is defined by (3.4). In this case, the dual objective function decreases by the amount

b^T ŷ − b^T ȳ = −θ̂_s (u^T b_I − b_s) = −θ̂_s (A_s B^{-1} b_I − b_s)
             = θ̂_s (b_s − A_s x̄) = ‖ŷ − ȳ‖(b_s − A_s x̄)/√(1 + ‖A_s B^{-1}‖²).   (3.11)

If all components of u are non-positive, then θ̂_s = ∞ and the dual objective function decreases indefinitely along the ray {y(θ) : θ ≥ 0}. If (3.1) had a feasible solution x, then by (3.3) we would have c^T x ≤ b^T y(θ) for any positive θ. But since c^T x is finite and lim_{θ→∞} b^T y(θ) = −∞, we conclude that (3.1) does not have feasible solutions.
A detailed description of the dual simplex method is presented in Listing 3.2. The input of the dual-simplex procedure is composed of a triple (c, A, b) describing an LP, and a dual feasible basic set I. The method terminates for one of two reasons.
1) The current dual feasible basic set I also becomes primal feasible. In this case, the output is a triple

(true, x, y = (y_I = π, y_{M\I} = 0)),

where x is an optimal solution to the primal LP (3.1), and y is an optimal solution to the dual LP (3.2).
2) If u ≤ 0, then (3.1) does not have feasible solutions. In this case, the procedure
returns a pair (false,y), where y is a ”certificate of infeasibility” (see Sect. 3.4).
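Listing 3.2 can be rendered in NumPy in the same minimal style as the primal sketch above (our names; dense data; the "most violated" rule of Sect. 3.3.4; a crude tolerance). On success the sketch also returns its final basic set, which will be convenient for the reoptimization discussed in the next subsection:

import numpy as np

def dual_simplex(c, A, b, I, eps=1e-9):
    """Dual simplex method for max{c^T x : Ax <= b} following Listing 3.2.
    I is a dual feasible basic set: n row indices with c^T A_I^{-1} >= 0.
    Returns (True, x, y, I) at optimality, (False, y) if (3.1) is infeasible."""
    m, n = A.shape
    I = list(I)
    Binv = np.linalg.inv(A[I, :])
    pi = c @ Binv                           # pi^T = c^T B^{-1} >= 0
    x = Binv @ b[I]
    while True:
        viol = A @ x - b
        s = int(np.argmax(viol))            # "most violated" separation rule
        if viol[s] <= eps:                  # x is primal feasible: optimal
            y = np.zeros(m); y[I] = pi
            return True, x, y, I
        u = A[s] @ Binv                     # row A_s written in the basis A_I
        if np.all(u <= eps):                # certificate of infeasibility
            y = np.zeros(m); y[s] = 1.0; y[I] = -u
            return False, y
        ratios = [pi[j] / u[j] if u[j] > eps else np.inf for j in range(n)]
        t = int(np.argmin(ratios)); lam = ratios[t]
        I[t] = s                            # pivot
        pi = pi - lam * u; pi[t] = lam      # pi^T = c^T A_I^{-1}
        et = np.zeros(n); et[t] = 1.0
        Binv -= np.outer(Binv[:, t], u - et) / u[t]
        x = x + (b[s] - A[s] @ x) * Binv[:, t]   # x = A_I^{-1} b_I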

3.3.1 Adding New Constraints and Changing Bounds

The dual simplex method has a specific feature that has predetermined its wide use in MIP.
Suppose that we have already solved an instance of (3.1), and now we want to solve one of the following modifications of the just solved LP:

max{cT x : Ax ≤ b, Hx ≤ β }, (3.12)
T
max{c x : Ax ≤ b̃}. (3.13)

If I is an optimal basic set for (3.1), then I will be a dual feasible basic set for
both new LPs, (3.12) and (3.13), which allows us to use I as an initial basic set in the
dual-simplex procedure. If the changes are small (just a few inequalities in Hx ≤ β ,
or kb − b̃k is small enough), we can expect that the dual simplex method will need
to perform relatively few iterations to build a solution to the modified program.
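As a toy illustration of such reoptimization (assuming the dual_simplex sketch from the end of Sect. 3.3, which returns its final basic set; the data is that of Example 3.2 below): we first solve the LP with only its bound constraints, then append the remaining rows one at a time, warm-starting every resolve from the previous basic set:

import numpy as np

c = np.array([1.0, 2.0])
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([3.0, 0.0, 3.0, 0.0])                 # the box 0 <= x <= 3
ok, x, y, I = dual_simplex(c, A, b, I=[0, 2])      # x = (3, 3)
for row, rhs in ([1.0, 1.0], 4.0), ([-1.0, 1.0], 1.0), ([-2.0, -1.0], -2.0):
    A = np.vstack([A, [row]]); b = np.append(b, rhs)
    ok, x, y, I = dual_simplex(c, A, b, I)         # warm start from old basis
print(x)                                           # -> [1.5 2.5]

Each resolve needs at most one pivot here; this is exactly the behavior that makes the dual simplex method attractive inside branch-and-cut, where rows are added constantly.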
In a very similar way, the primal simplex method can be used for reoptimization
when the objective function is changed or new columns (variables) are added. This
is due to the fact that with such changes the primal feasibility of a feasible basic
solution is preserved.

3.3.2 How to Find a Dual Feasible Basic Solution?

In practice, LPs very often appear in the following, most general, two-sided form:

max{c^T x : b^1 ≤ Ax ≤ b^2, d^1 ≤ x ≤ d^2},   (3.14)

where c, d^1, d^2 ∈ R^n, b^1, b^2 ∈ R^m, A is a real m × n-matrix, and x is an n-vector of variables. The dual-simplex procedure can easily be modified for solving LPs with two-sided constraints. To do this, let us number the inequalities in (3.14) as follows:

i :  A_i x ≤ b^2_i,
−i :  −A_i x ≤ −b^1_i,
m + j :  x_j ≤ d^2_j,
−m − j :  −x_j ≤ −d^1_j.

Then we define Ā_i = A_i, b̄_i = b^2_i, Ā_{−i} = −A_i, b̄_{−i} = −b^1_i, and Ā_{m+j} = e_j^T, b̄_{m+j} = d^2_j, Ā_{−m−j} = −e_j^T, b̄_{−m−j} = −d^1_j to rewrite the constraints of (3.14) in the form Āx ≤ b̄. Since the two row vectors Ā_i and Ā_{−i} are linearly dependent, the indices i and −i cannot simultaneously be in any basic set.
The point x^0 with the coordinates

x^0_j = { d^2_j, c_j ≥ 0,
          d^1_j, c_j < 0,

is an optimal solution to the following trivial LP:

max{c^T x : d^1 ≤ x ≤ d^2}.

From what was said in Sect. 3.3.1, it follows that x^0 is a dual feasible basic solution for (3.14). The dual feasible basic set corresponding to x^0 is I = {m + j : c_j ≥ 0} ∪ {−m − j : c_j < 0}.
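In code this initial basis is immediate (a tiny sketch; the function name is ours, and the returned indices follow the 1-based ±(m + j) numbering of the bound rows introduced above):

import numpy as np

def initial_dual_basis(c, d1, d2, m):
    """Initial dual feasible point and basic set for the two-sided LP (3.14)."""
    x0 = np.where(np.asarray(c) >= 0, d2, d1)
    I = [m + j + 1 if c[j] >= 0 else -(m + j + 1) for j in range(len(c))]
    return x0, I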

3.3.3 The Dual Simplex Method Is a Cutting Plane Algorithm

The dual simplex method for solving LP (3.1) can be considered as a cutting plane
algorithm. Let us illustrate this with an example.
Example 3.2 We need to solve the LP

x1 + 2x2 → max,
x1 + x2 ≤ 4,
−x1 + x2 ≤ 1,
(3.15)
−2x1 − x2 ≤ −2,
0 ≤ x1 ≤ 3,
0 ≤ x2 ≤ 3.

Solution. We begin with the dual feasible basic solution x(0) = (3, 3)T , at which
the objective function attains its maximum over the parallelepiped

P0 = {x ∈ R2 : 0 ≤ x1 ≤ 3, 0 ≤ x2 ≤ 3} (see Fig. 3.3.a).



[Figure: three panels (a), (b), (c) showing the box P0, the polytope P1 after adding the cut x1 + x2 = 4, and the polytope P2 after adding the cut −x1 + x2 = 1, with the iterates x(0), x(1), x(2).]

Fig. 3.3 Interpretation of the dual simplex method as a cutting plane algorithm

1. Since the point x(0) does not satisfy the first inequality of (3.15), we cut it off by the hyperplane x1 + x2 = 4 (Fig. 3.3.b). After this, we perform an iteration of the dual simplex method:

s = 1,  u = (1, 1)^T,  λ = min{1, 2} = 1,  t = 1,  I = (1, 5),
B^{-1} = [1 −1; 0 1],  x(1) = (1, 3)^T,  π = (1, 1)^T.
Note that x(1) is the maximizer of the objective function over the polytope

P1 = {x ∈ R2 : 0 ≤ x1 ≤ 3, 0 ≤ x2 ≤ 3, x1 + x2 ≤ 4}.

2. Since x(1) violates the second inequality of (3.15), we cut it off using the hyperplane −x1 + x2 = 1 (Fig. 3.3.c). Then we perform an iteration of the dual simplex method:

s = 2,  u = (−1, 2)^T,  λ = 1/2,  t = 2,  I = (1, 2),
B^{-1} = (1/2)[1 −1; 1 1],  x(2) = (3/2, 5/2)^T,  π = (3/2, 1/2)^T.
Note that x(2) is the maximizer of the objective function over the polytope

P2 = {x ∈ R2 : 0 ≤ x1 ≤ 3, 0 ≤ x2 ≤ 3, x1 + x2 ≤ 4, −x1 + x2 ≤ 1}.

Since the point x(2) satisfies all constraints of (3.15), it is an optimal solution to (3.15). ⊓⊔

3.3.4 Separation Rules

The search for a violated inequality A_s x ≤ b_s in the dual simplex method is called separation. It is clear that a point x can violate many inequalities and, therefore, we need a rule for an unambiguous choice of the index s. The following rules (strategies) are best known:

"first violated": s = min{i : A_i x > b_i, i ∉ I};
"most violated": s ∈ arg max_{i∉I} (A_i x − b_i);
"steepest edge": s ∈ arg max_{i∉I} (A_i x − b_i)/√(1 + ‖A_i B^{-1}‖²);
"maximum decrease": s ∈ arg max_{i∉I} θ̂_i (A_i x − b_i).

We can say almost the same about all these rules as was said in Sect. 3.2.2 about the corresponding pricing rules. In practice, various variations of the "most violated" and "steepest edge" rules are used. To make separation based on the steepest edge rule practical, formulas were obtained for recalculating the row norms of the matrix AB^{-1}.
Lemma 3.2. Let I be a basic set, let the basic set Î be constructed according to (3.4) for some t ∈ {1, . . . , n}, and let B = A_I, B̂ = A_Î and

η_i := 1 + ‖A_i B^{-1}‖² = 1 + A_i B^{-1} B^{-T} A_i^T,  i = 1, . . . , m.

Then

η̂_i := 1 + ‖A_i B̂^{-1}‖² = { 2,                                                  i ∈ Î,
                              (1/u_t²) η_s,                                        i = I[t],    (3.16)
                              η_i − (2/u_t)(A_i α)(A_i v) + ((A_i v)²/u_t²) η_s,   i ∉ Î ∪ {I[t]},

where v = B^{-1} e_t, u^T = A_s B^{-1}, α = B^{-1} u.


Proof. For i ∈ Î we have A_i B̂^{-1} = e_j^T for the position j with Î[j] = i, whence η̂_i = 1 + 1 = 2. Next, since

B̂^{-1} = B^{-1}(I − (1/u_t) e_t (u^T − e_t^T)) = B^{-1} − (1/u_t) v (u − e_t)^T,
η_s = 1 + A_s B^{-1} B^{-T} A_s^T = 1 + u^T u = 1 + ‖u‖²,

we have, using A_i B^{-1}(u − e_t) = A_i α − A_i v and ‖u − e_t‖² = ‖u‖² − 2u_t + 1,

η̂_i = 1 + A_i B̂^{-1} B̂^{-T} A_i^T
    = η_i − (2/u_t)(A_i α)(A_i v) + (2/u_t)(A_i v)² + ((‖u‖² − 2u_t + 1)/u_t²)(A_i v)²
    = η_i − (2/u_t)(A_i α)(A_i v) + ((A_i v)²/u_t²) η_s.

To complete the proof, it suffices to note that for i = I[t]

η_i = 2,  A_i = e_t^T B,  A_i v = e_t^T B B^{-1} e_t = 1,  A_i α = e_t^T B B^{-1} u = u_t,

and therefore η̂_i = (1/u_t²) η_s.  ⊓⊔

3.4 Why an LP Does Not Have a Solution?

Having solved an LP on the computer and received a message that the problem
did not have a solution, we would probably want to know the reason for this, in
particular, in order to try to correct possible errors in our formulation.
It is said that an LP has no solution if: 1) its constraint system is infeasible (there
are no feasible solutions) or 2) the objective value is unbounded over the set of
feasible solutions.
To understand the reason for the inconsistency of a system of linear inequalities,
let us consider a simple example:

2x1 + 5x2 + x3 ≤ 5,
x1 + 2x2 ≥ 3,
x2 ≥ 0,
x3 ≥ 0.

Summing together the first inequality, the second multiplied by −2, and the third and the fourth multiplied by −1, we obtain the false inequality 0 ≤ −1. Hence, we can conclude that the system of inequalities in question is infeasible.
Strange as it may seem, in the general case a system of linear inequalities is infeasible if and only if the false inequality 0 ≤ −1 can be derived from it in this way. Let us give a more precise formulation of this criterion, known as Farkas' lemma.
Lemma 3.3 (Farkas). A system of inequalities Ax ≤ b has no solutions if and only
if there exists a vector y ≥ 0 such that yT A = 0 and yT b < 0.
Proof. We call a vector y satisfying the conditions of Lemma 3.3 a certificate of infeasibility for the system of inequalities Ax ≤ b.
One direction is obvious: if such a y exists, then any solution x of Ax ≤ b would yield 0 = (y^T A)x ≤ y^T b < 0, a contradiction. Let us prove the other direction. First recall that the dual simplex method decides that the system of inequalities of LP (3.1) is infeasible if, performing an iteration with a basic set I, it turns out that the vector u^T = A_s B^{-1} is non-positive. At this point, we can determine a certificate of infeasibility, y ∈ R^m, by the rule

y_s = 1,  y_{I[j]} = −u_j, j = 1, . . . , n,  y_i = 0, i ∈ {1, . . . , m} \ (I ∪ {s}).

Indeed, y ≥ 0 since u ≤ 0, and

y^T A = −u^T B + A_s = −A_s B^{-1} B + A_s = 0,
y^T b = −u^T b_I + b_s = −A_s B^{-1} b_I + b_s = b_s − A_s x̄ < 0.  ⊓⊔

The objective function of LP (3.1) is not bounded if and only if there exists a
feasible ray
{x(λ ) = x0 + λ v : λ ≥ 0},
along which the objective function strictly increases, i.e., cT v > 0, Ax0 ≤ b and
Av ≤ 0. Such a pair (x0 , v) is called a certificate of unboundedness for LP (3.1).
Note that the simplex procedure from Listing 3.1, after detecting that the objective
function is unbounded, returns a certificate of unboundedness.
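Both kinds of certificates are straightforward to verify numerically; a minimal sketch (function names are ours):

import numpy as np

def is_infeasibility_certificate(A, b, y, tol=1e-9):
    """Check a Farkas certificate: y >= 0, y^T A = 0, y^T b < 0."""
    return (np.all(y >= -tol) and np.allclose(y @ A, 0, atol=tol)
            and y @ b < -tol)

def is_unboundedness_certificate(c, A, b, x0, v, tol=1e-9):
    """Check a certificate of unboundedness: Ax0 <= b, Av <= 0, c^T v > 0."""
    return (np.all(A @ x0 <= b + tol) and np.all(A @ v <= tol)
            and c @ v > tol)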

3.5 Duality in Linear Programming

Justifying the correctness of the primal and dual simplex methods, we established
that there is a close relationship between the dual LPs (3.1) and (3.2). This relation-
ship is expressed in the following theorems.
Theorem 3.1 (duality). For the pair of dual LPs (3.1) and (3.2), exactly one of the following alternatives takes place:
1) both LPs have optimal solutions, and then z_P = z_D;
2) one of the LPs, (3.1) or (3.2), has feasible solutions and the other has not; then the objective function of the LP having feasible solutions is unbounded;
3) neither of the two LPs has feasible solutions.

Proof. We have actually proved assertions 1) and 2) when justifying the correctness of the primal and dual simplex methods. To prove that alternative 3) can occur, it suffices to give an example of a pair of dual LPs for which it holds:

max{−x : 0 · x ≤ −1},  min{−y : 0 · y = −1, y ≥ 0}.  ⊓⊔

Theorem 3.2. Let x̄ and ȳ be feasible solutions of the primal, (3.1), and dual, (3.2),
LPs respectively. Then the following conditions are equivalent:
a) x̄ and ȳ are optimal solutions to the primal and dual LPs;
b) cT x̄ = bT ȳ;
c) (complementary slackness condition)

ȳT (b − Ax̄) = 0 and x̄T (c − AT ȳ) = 0.



Proof. The equivalence of conditions a) and b) follows from the duality theorem. Let us prove the equivalence of conditions b) and c). Taking into account the inequalities Ax̄ ≤ b and ȳ^T A ≥ c^T, we have

c^T x̄ ≤ ȳ^T Ax̄ ≤ ȳ^T b.

If b) holds, both inequalities in this chain hold as equalities,

c^T x̄ = ȳ^T Ax̄ = ȳ^T b,

whence x̄^T(c − A^T ȳ) = 0 and ȳ^T(b − Ax̄) = 0; the converse is obtained by reading this argument backwards.  ⊓⊔
Informally, we say that two LPs are dual to each other if all the statements of
Theorems 3.1 and 3.2 hold for them. A formal rule for writing the dual LP for a
given LP is presented in Exercise 3.1.

3.6 Linear Programs With Two-Sided Constraints

As we noted in Sect. 3.3.2, in practice it is most convenient to write LPs in the form (3.14) with two-sided constraints. It is not at all necessary to reduce such an LP to the canonical form (3.1): we can easily modify both the primal and the dual simplex methods to solve LPs with two-sided constraints directly.
Now a basis is defined by a quadruple (I, b; J, d), where

• I ⊆ M = {1, . . . , m} is a row basic set;
• b ∈ R^m, where b_i equals b^1_i or b^2_i for i ∈ I;
• J ⊆ N = {1, . . . , n} is a column basic set;
• d ∈ R^n, where d_j equals d^1_j or d^2_j for j ∈ N \ J.

The basis (I, b; J, d) uniquely determines

• the basic matrix B = A_I^J,
• the basic solution x̄ = (x̄_J, x̄_{N\J}) = (B^{-1}(b_I − A_I^{N\J} d_{N\J}), d_{N\J}),
• the vectors of shadow prices, ȳ^T = (ȳ_I, ȳ_{M\I}) = (c_J^T B^{-1}, 0), and reduced costs, c̄^T = c^T − ȳ^T A.
The basic solution x̄ is feasible if

d 1j ≤ x̄ j ≤ d 2j , j ∈ J,
b1i ≤ Ai x̄ ≤ b2i , i ∈ M \ I.

The feasible basic solution x̄ is optimal if the following conditions are satisfied:
• for i ∈ I, if bi = b2i > b1i , then ȳi ≥ 0, and if bi = b1i < b2i , then ȳi ≤ 0;
• for j ∈ N \ J, if d j = d 2j > d 1j , then c̄ j ≥ 0, and if d j = d 1j < d 2j , then c̄ j ≤ 0.

For the LP in canonical form, the pivot operation substitutes a non-basic row for a basic one. For the LP with two-sided constraints, the pivot operation is more complicated. Recalling the indexing of two-sided inequalities introduced in Sect. 3.3.2, which made it possible to transform an LP with two-sided constraints into an LP in canonical form, it is not difficult to see that the following options are possible:
• substitution of a non-basic row for a basic one if |s| ≤ m and |I[t]| ≤ m;
• substitution of a non-basic column for a basic one if |s| > m and |I[t]| > m;
• addition of a non-basic row and a non-basic column if |s| ≤ m and |I[t]| > m;
• deletion of a basic row and a basic column if |s| > m and |I[t]| ≤ m.
In particular, for the LP in standard form

max{cT x : Ax = b, x ≥ 0},

considered in most LP manuals, the row basic set I contains all m rows, and each
pivot operation always consists in substituting a non-basic column for a basic one.

3.7 Notes

The first to propose an algorithm for solving a general LP was L.V. Kantorovich, the 1975 Nobel Prize winner in Economics (see Exercise 3.11). But still G.B. Dantzig is deservedly considered to be the father of linear programming for his invention of the simplex method. Dantzig's book [42] remains a classic text on linear programming. The author's views on linear programming changed significantly after studying an excellent brochure by L.G. Khachiyan [81].
Those who want to learn more about linear programming are referred to one of the sources [37, 102, 110, 117, 122, 134].
Sects. 3.2.2 and 3.3.4. The rules for recalculating column and row norms were obtained in [57]. The computational efficiency of the steepest edge rules was demonstrated in [51].
Sect. 3.8. The statements of Exercises 3.11 and 3.14 were taken respectively from
[122] and [78]. The DEA method described in Exercise 3.13 was proposed in [33].

3.8 Exercises

3.1. The rules for writing the dual LP for a given LP are presented in Table 3.1.
Prove that the statements of Theorems 3.1 and 3.2 are also true for the pair of LPs
from this table.
Hint. Convert the primal LP to the canonical form, write down the dual to the
obtained LP, compare the new pair of dual LPs with the original pair from the table.

Table 3.1 Dual LPs

Primal LP                      Dual LP
z_P = max c^T x                z_D = min b^T y
A_i x ≤ b_i, i ∈ M1            y_i ≥ 0, i ∈ M1
A_i x = b_i, i ∈ M2            y_i ∈ R, i ∈ M2
A_i x ≥ b_i, i ∈ M3            y_i ≤ 0, i ∈ M3
x_j ≥ 0, j ∈ N1                y^T A^j ≥ c_j, j ∈ N1
x_j ∈ R, j ∈ N2                y^T A^j = c_j, j ∈ N2
x_j ≤ 0, j ∈ N3                y^T A^j ≤ c_j, j ∈ N3

3.2. Using the rules from Table 3.1, write down the dual LPs for the following LPs:

a) 2x1 − 4x2 + 3x3 → max,      b) 5x1 − x2 + 4x3 → max,
   x1 + x2 − x3 = 9,              x1 + x2 + x3 = 12,
   −2x1 + x2 ≤ 5,                 3x1 − 2x3 ≥ 1,
   x1 − 3x3 ≥ 4,                  x2 − x3 ≤ 2,
   x1 ≥ 0, x3 ≤ 0;                x1, x3 ≥ 0;

c) formulation (1.15) of the transportation problem.


3.3. Prove that the set X_{r,α} from Exercise 1.2 is the projection, onto the space of x-variables, of the solution set of the system of inequalities

rλ + ∑_{i=1}^n y_i ≤ α,
x_i − y_i − λ ≤ 0,  i = 1, . . . , n,
y_i ≥ 0,  i = 1, . . . , n,

with 2n + 1 variables (λ ∈ R, x, y ∈ R^n) and 2n + 1 constraints.
Hint. First, show that x ∈ X_{r,α} if and only if

α ≥ max{ ∑_{i=1}^n x_i v_i : ∑_{i=1}^n v_i = r, 0 ≤ v_i ≤ 1, i = 1, . . . , n }.

Then write down the dual to the LP in the right-hand side of this inequality.
3.4. Consider the LP

max{ ∑_{j=1}^n c_j x_j : ∑_{j=1}^n a_j x_j ≤ b, 0 ≤ x_j ≤ u_j, j = 1, . . . , n }   (3.17)

with c_j, a_j > 0 for j = 1, . . . , n. Let a permutation π : {1, . . . , n} → {1, . . . , n} be such that

c_{π(1)}/a_{π(1)} ≥ c_{π(2)}/a_{π(2)} ≥ · · · ≥ c_{π(n)}/a_{π(n)},

and let an integer r be chosen to satisfy

∑_{j=1}^{r−1} a_{π(j)} u_{π(j)} ≤ b  and  ∑_{j=1}^{r} a_{π(j)} u_{π(j)} > b.

Show that the components of an optimal solution to (3.17) are defined by the rule:

x_{π(j)} = u_{π(j)},  j = 1, . . . , r − 1,
x_{π(r)} = (b − ∑_{j=1}^{r−1} a_{π(j)} u_{π(j)}) / a_{π(r)},
x_{π(j)} = 0,  j = r + 1, . . . , n.
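This greedy rule is easy to implement; a minimal sketch (the function name is ours, and it presumes c_j, a_j > 0 as stated):

def greedy_knapsack_lp(c, a, u, b):
    """Optimal solution of the LP (3.17) by the greedy rule of Exercise 3.4."""
    n = len(c)
    order = sorted(range(n), key=lambda j: c[j] / a[j], reverse=True)
    x, cap = [0.0] * n, b
    for j in order:
        take = min(u[j], cap / a[j])   # fill items by decreasing c_j/a_j
        x[j] = take
        cap -= a[j] * take
        if cap <= 0:
            break
    return x

# e.g. greedy_knapsack_lp([4, 3], [2, 3], [1.0, 2.0], 5.0) -> [1.0, 1.0]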

3.5. Can we solve LPs using a computer program that solves the systems of linear
inequalities Ax ≤ b?
3.6. Use Theorem 3.1 to prove the following important result from game theory.

Theorem 3.3 (von Neumann). For every real m × n-matrix A,

max_{x∈Σ_m} min_{1≤j≤n} ∑_{i=1}^m a_{ij} x_i = min_{y∈Σ_n} max_{1≤i≤m} ∑_{j=1}^n a_{ij} y_j,

where Σ_m denotes the m-dimensional simplex {x ∈ R^m_+ : ∑_{i=1}^m x_i = 1}.

3.7. Arbitrage. We have at our disposal n financial assets, the price of j-th of them
at the beginning of the investment period is p j . At the end of the investment period,
the price of asset j is a random variable v j . Suppose that m scenarios (outcomes)
are possible at the end of the investment period, and then v j is a discrete random
variable. Let vi j be the value of v j when scenario i occurs. From the elements vi j ,
we compose the m × n-matrix V = [vi j ].
A trading strategy is represented by a vector x = (x1 , . . . , xn )T : if x j > 0, then we
buy x j units of asset j, and if x j < 0, then −x j units of asset j are sold. A trading
strategy is called an arbitrage, if it allows us to earn today without any risk of losses
at the end of the investment period:

pT x < 0, (3.18)
V x ≥ 0. (3.19)

The strict inequality (3.18) means that at the beginning of the investment period we
get more than spend. And the validity of all inequalities ∑nj=1 vi j x j ≥ 0 from (3.19)
means that the reverse trading strategy, −x, will not be loss-making at the end of the
period in any of m possible scenarios.
Since prices adjust very quickly on the market, an opportunity to earn from an
arbitrage also disappears very quickly. Therefore, in mathematical financial models,
it is often assumed that arbitrage does not exist. Prove the following statements.

a) There exists no arbitrage if and only if the system V^T y = p, y ≥ 0 is consistent.
b) If for fixed prices p_1, . . . , p_{j−1}, p_{j+1}, . . . , p_n both LPs

p^min_j = min{p_j : V^T y = p, y ≥ 0, p_j ≥ 0},
p^max_j = max{p_j : V^T y = p, y ≥ 0, p_j ≥ 0}

have solutions, then there exists no arbitrage if and only if p^min_j ≤ p_j ≤ p^max_j.

3.8. Cycling in a simplex method is a situation where a sequence of basic sets repeats cyclically. Cycling can occur only when solving degenerate LPs (see Sect. 3.1 for the definition of degeneracy). In practice, cycling often occurs when solving combinatorial optimization problems.
Solve the following LP

(3/4)x1 − 20x2 + (1/2)x3 − 6x4 → max,
1:  (1/4)x1 − 8x2 − x3 + 9x4 ≤ 0,
2:  (1/2)x1 − 12x2 − (1/2)x3 + 3x4 ≤ 0,
3:  x3 ≤ 1,
4:  −x1 ≤ 0,
5:  −x2 ≤ 0,
6:  −x3 ≤ 0,
7:  −x4 ≤ 0

by the primal simplex method starting with the basic set I = (4, 5, 6, 7) and using the following rules for resolving ambiguities when selecting rows for entering and leaving the basic set:
a) the entering row is the row I[s] for s ∈ arg min_{1≤i≤n} π_i;
b) the leaving row has the minimum index, t, among the row indices on which the value of λ is attained (see Listing 3.1).

3.9. In the literature, several rules have been proposed for eliminating cycling in the simplex algorithms. Perhaps the most useful in practice is the lexicographic rule. A nonzero vector x ∈ R^n is said to be lexicographically positive if its first non-zero component is positive. A vector x is lexicographically greater than a vector y ∈ R^n if the vector x − y is lexicographically positive. Prove the following theorem, which conveys the essence of the lexicographic rule as applied to the dual simplex method.

Theorem 3.4. Suppose that when the dual simplex method starts, all columns of the (m + 1) × n matrix

A(I) := ( c^T ; A ) A_I^{-1}

are lexicographically positive. Then, during the execution of the algorithm, the columns of the matrix A(I) remain lexicographically positive, the vector of residuals

r(x) := ( −c^T x ; b − Ax )

strictly lexicographically increases from iteration to iteration, and the method terminates after a finite number of iterations if the row I[t] leaving the basic set is selected according to the following rule:

t ∈ arg lexmin{ A(I)^j / u_j : u_j > 0, j = 1, . . . , n }.

Here A(I)^j denotes column j of A(I), and the operator lexmin means the choice of the lexicographically minimal vector.

3.10. Overdetermined systems of linear equations. Given a real m × n-matrix A and a vector b ∈ R^m. If m > n, then the system Ax = b may have no solutions. In such cases, as a solution to the system, we seek a vector x for which the vector of residuals, Ax − b, has the minimum norm. In practice, three norms, l2, l1 and l∞, are most often used. Depending on the norm, one of the following unconstrained optimization problems must be solved:

‖Ax − b‖²_2 = ∑_{i=1}^m (A_i x − b_i)² → min,   (3.20)
‖Ax − b‖_1 = ∑_{i=1}^m |A_i x − b_i| → min,   (3.21)
‖Ax − b‖_∞ = max_{1≤i≤m} |A_i x − b_i| → min.   (3.22)

Problem (3.20) is solved simply: its solutions are the solutions to the system of linear equations A^T A x = A^T b. The two other problems, which are not smooth optimization problems, are more difficult to solve. Formulate (3.21) and (3.22) as LPs.
3.11. Show that each LP can be reduced to the LP

λ → max,
∑_{j=1}^m t_{ij} = 1,  i = 1, . . . , n,
∑_{j=1}^m ∑_{i=1}^n a_{ijk} t_{ij} = λ,  k = 1, . . . , q,
t_{ij} ≥ 0,  i = 1, . . . , n; j = 1, . . . , m,

which was investigated by L.V. Kantorovich. This LP admits the following interpretation. To produce a unit of some final product, one unit of each of q intermediate products is used. There are n machines that can perform m tasks. When machine i performs task j, a_{ijk} units of intermediate product k are produced per shift. If t_{ij} is the fraction of time machine i spends performing task j, then λ is the number of final product units produced.

3.12. Hypothesis testing. Let X be a discrete random variable taking values from the set {1, . . . , n} and having a probability distribution that depends on the value of a parameter θ ∈ {1, . . . , m}. The probability distributions of X for the m possible values of θ are given by an n × m-matrix P with elements p_{ki} = P(X = k | θ = i), i.e., the i-th column of P defines the probability distribution of X provided that θ = i.
We consider the problem of estimating the value of parameter θ by observing
(sampling) values of X. In other words, a value of X is generated for one of the
m possible distributions (values of θ ), and we need to determine which distribu-
tion (value of θ ) was used in this case. The values of θ are called hypotheses, and
guessing, which of m hypotheses is true, is called hypothesis testing.
A probabilistic classifier for θ is a discrete random variable θ̂ , which depends
on the observed value of X and takes values from {1, . . . , m}. Such a classifier can
be represented by an m × n-matrix T with the elements tik = P(θ̂ = i|X = k). If we
observe the value X = k, then the classifier with probability tik chooses the value θ̂ =
i as an estimate of the parameter θ . The quality of the classifier can be determined
by the m × m-matrix D = T P with the elements di j = P(θ̂ = i|θ = j), i.e., di j is the
probability of predicting θ̂ = i when θ = j.
It is necessary to determine a probabilistic classifier for which the maximum of
the probabilities of classification errors,

1 − dii = ∑ di j = P(θ̂ 6= i|θ = i), i = 1, . . . , m,


j6=i

is minimum. Formulate this problem as an LP.


3.13. Data Envelopment Analysis (DEA) is used to compare the performance of a
number of similar service units (bank branches, restaurants, educational institutions,
health care and many others). DEA does not require a cost valuation of the services
provided. Suppose that there are n + 1 departments that are numbered from 0 to
n. For some test period, department i (i = 0, 1, . . . , n) used ri j units of resource j
( j = 1, . . . , m), and rendered sik services of type k (k = 1, . . . , l). The efficiency of
department i is estimated by the ratio

E_i(u, v) := (∑_{k=1}^l s_{ik} u_k) / (∑_{j=1}^m r_{ij} v_j),

where u_k and v_j are weights that need to be determined.


To compare department 0 with the other departments, the following problem of
fractional linear programming is solved:

γ_0 = max{E_0(u, v) : E_i(u, v) ≤ 1 for i = 1, . . . , n, u ∈ R^l_+, v ∈ R^m_+}.   (3.23)

If γ0 < 1, then department 0 does not work efficiently, and it should adopt the expe-
rience of other departments i for which Ei (u∗ , v∗ ) = 1, where (u∗ , v∗ ) is an optimal
solution to (3.23).
Formulate (3.23) as an LP.

3.14. In a Markov decision process (with a finite number of states and discrete time),
if at a given period of time the system is in state i from a finite set of states S, we
choose an action a from a finite set of actions A(i), which provides profit ria ; then the
system passes to a new state j ∈ S with a given probability pia j that depends on the
action a undertaken in state i. Our goal is to find a decision strategy that maximizes
the expected average profit for one period during an infinite time horizon. It is known
that an optimal strategy can be sought among stationary strategies. A stationary
strategy (or policy) π, every time the system is in state i, prescribes to undertake
the same action π(i) ∈ A(i). We also note that there is a stationary strategy, which
is optimal for all initial states of the system, and therefore, such a strategy is called
uniform.
Let (x*, y*) be an optimal solution to the following LP:

∑_{i∈S} ∑_{a∈A(i)} r_{ia} x_{ia} → max,
∑_{i∈S} ∑_{a∈A(i)} (δ_{ij} − p_{iaj}) x_{ia} = 0,  j ∈ S,
∑_{a∈A(j)} x_{ja} + ∑_{i∈S} ∑_{a∈A(i)} (δ_{ij} − p_{iaj}) y_{ia} = 1/|S|,  j ∈ S,   (3.24)
x_{ia}, y_{ia} ≥ 0,  a ∈ A(i), i ∈ S.

Here δ_{ij} is the Kronecker delta (δ_{ij} = 1 if i = j, and δ_{ij} = 0 if i ≠ j). Let us define

A*(i) := { {a : x*_{ia} > 0}, if ∑_{a∈A(i)} x*_{ia} > 0,
           {a : y*_{ia} > 0}, if ∑_{a∈A(i)} x*_{ia} = 0.

Show that any stationary strategy π* such that π*(i) ∈ A*(i) is optimal.
3.15. A firm produces some products using a number of identical machine. At the
beginning of each week, each machine is in one of the following states: excellent,
good, average, or bad. Working a week, the machine generates the following income
depending on its state: $100 in excellent state, $80 in good state, $50 in average
state, and $10 in bad state. After inspecting each of the machines at the end of the
week, the firm can decide to replace it with a new one in excellent state. A new
machine costs $200. The state of any machine deteriorates over time as shown in the table below (rows: current state; columns: state one week later).

           Excellent  Good  Average  Bad
Excellent     0.7     0.3    0.0     0.0
Good          0.0     0.7    0.3     0.0
Average       0.0     0.0    0.6     0.4
Bad           0.0     0.0    0.0     1.0
It is necessary to determine a strategy of replacing machines that generates the
maximum per week profit in the long run. Write down LP (3.24) for this example,
and solve that LP using your favorite LP solver.
Chapter 4
Cutting Planes

As noted in Sect. 1.5, we can strengthen a MIP formulation by adding to it new in-
equalities valid for all feasible solutions, but invalid for the relaxation polyhedron.
Such inequalities are called cuts. We know that cuts can be added when formulat-
ing (reformulating) the problem. New inequalities can also be added in the process
of solving the problem. The cutting plane method for solving MIPs can be viewed as an extension of the dual simplex method in which the separation procedure (searching for violated inequalities) is not limited to verifying the constraints of the current formulation, but can also generate new cuts.
In this chapter, we study cuts that are used for solving general IPs and MIPs.
Here we also demonstrate the use of these cuts in the cutting plane algorithms. Any
cuts are useful in practice only if they can be generated (computed) by very fast
separation procedures. In the last section of this chapter we discuss the relationship
between the optimization and separation problems.

4.1 Cutting Plane Algorithms

Let us demonstrate how a cutting plane algorithm works on the following simple
example:
x1 + 2x2 → max,
3x1 + 2x2 ≤ 9,
(4.1)
x2 ≤ 2,
x1 , x2 ∈ Z+ .
First, we solve the relaxation LP for IP (4.1), which is obtained from (4.1) by allowing the integer variables to also take real values. The feasible polytope, P0, of this relaxation LP and its optimal solution, x(0) = (5/3, 2)^T, are depicted in Fig. 4.1.a. Since x(0) is not an integer point, it is not a solution to (4.1).
A cutting plane, or simply cut, is an inequality that "cuts off" the point x(0) from the set

[Figure: (a) the relaxation polytope P0 of (4.1) with its optimal solution x(0) = (5/3, 2)^T; (b) the polytope P1 after adding the cut x1 + x2 ≤ 3, with the cut-off piece of P0 shaded and the new optimum x(1) = (1, 2)^T.]

Fig. 4.1 Geometrical interpretation of cutting planes

X = {x ∈ Z^2_+ : 3x1 + 2x2 ≤ 9, x2 ≤ 2}


of feasible solutions to (4.1). There are several ways to build (generate) cuts. Some
of these methods will be considered later in this and the next chapters.
Here we cut off the point x(0), starting from a very simple observation: both inequalities, 3x1 + 2x2 ≤ 9 and x2 ≤ 2, cannot simultaneously be satisfied as equalities at the points of X. Therefore, the inequality

3x1 + 2x2 + x2 ≤ 9 + 2 − 1,  or  3x1 + 3x2 ≤ 10,

holds for X, but not for x(0) (3 · (5/3) + 3 · 2 = 11 > 10). We can strengthen this inequality if we first divide it by 3 and then round down the right-hand side:

x1 + x2 ≤ ⌊10/3⌋ = 3.

So, we have found the cut x1 + x2 ≤ 3, and now we add it to (4.1) as an additional constraint. As a result, a piece of the relaxation polytope P0 is cut off. Note that the cut-off area (shaded in Fig. 4.1.b) contains no integer points and, therefore, no feasible point of X has been cut off: X is contained in the feasible polytope P1 of the new relaxation LP (with the added cut). An optimal solution to this LP is the point x(1) = (1, 2)^T, which is integer and, therefore, x(1) is an optimal solution to (4.1).

4.2 Chvátal-Gomory Cuts

Let A be a real m × n-matrix, b ∈ Rn , and let us suppose that the polyhedron P(A, b)
belongs to Rn+ . Surprisingly, but we can build all inequalities that define the convex
hull of the set P(A, b) ∩ Zn using a procedure based on the following very simple
observation.
Rounding principle: the inequality x ≤ bbc is valid for the set {x ∈ Z : x ≤ b}.
4.2 Chvátal-Gomory Cuts 99

Proceeding from this principle, we present the following very general procedure for
constructing cuts for the set X = P(A, b) ∩ Zn .
Chvátal-Gomory procedure:
1) choose u ∈ Rm +;
2) since u ≥ 0, the inequality uT Ax ≤ uT b is valid for X;
3) since x ≥ 0, the inequality ∑nj=1 buT A j cx ≤ uT b is valid for X;
4) since x is integer, the inequality
n
∑ buT A j cx ≤ buT bc (4.2)
j=1

is valid for X.

Theorem 4.1. If P(A, b) ⊆ Rn+ , then each valid for P(A, b) ∩ Zn inequality can be
obtained by applying the Chvátal-Gomory procedure a finite number of times.

Theorem 4.1 motivates the following definition. For a given set X = P(A, b) ∩ Zn
the Chvátal rank of a valid for X inequality α T x ≤ β is the minimum number of
applications of the Chvátal-Gomory procedure necessary to obtain an inequality
that is not weaker than α T x ≤ β . The Chvátal rank of the set X is the maximum
Chvátal rank of an inequality in the description of its convex hull.
For example, consider an integer set

X = {x ∈ Z4+ : 2x1 + x2 + x3 + x4 ≤ 4, x2 + x3 ≤ 1 x3 + x4 ≤ 1, x2 + x4 ≤ 1}.

The inequality x2 + x3 + x4 ≤ 1 is a Chvátal-Gomory cut (of Chvátal rank 1) with


the vector of multipliers 12 (0, 1, 1, 1)T . Adding this cut to the system of inequalities
describing X as the fifth inequality and using the multiplier vector 12 (1, 0, 0, 0, 1), we
obtain the cut x1 + x2 + x3 + x4 ≤ 2, which Chatal rank is 2.
It should also be noted that many classes of cuts for well-known combinatorial
sets are Chvátal-Gomory cuts of Chvátal rank 1.
Example 4.1 Given a graph G = (V, E), each edge e ∈ E is assigned a costs ce . A
subset of edges M ⊆ E is called a matching if no two edges from M have a com-
mon vertex. The maximum weighted matching problem is to find a matching M of
def
maximum cost c(M) = ∑e∈M ce (see also Sect. 2.1). Introducing binary variables
xe for e ∈ E with xe = 1 only if e ∈ M, we formulate this combinatorial optimization
problem as the following IP:

∑ ce xe → max, (4.3a)
e∈E

∑ xe ≤ 1, v ∈ V, (4.3b)
e∈E(v,V )

xe ∈ {0, 1}, e ∈ E, (4.3c)


100 4 Cutting Planes

where E(S, T ) denotes the set of edges with one end in S, and the other in T . The
convex hull of vectors x satisfying (4.3b) and (4.3c) is called a matching polytope.
We need to show that, for each subset S ⊆ V of odd cardinality, the inequality

|S| − 1
∑ xe ≤ (4.4)
e∈E(S,S)
2

is valid for the matching polytope.


Solution. Let S ⊆ V and |S| be odd. Summing up the inequalities

∑ xe ≤ 1, v ∈ S,
e∈E(v,V )

we have
∑ xe = 2 ∑ xe + ∑ xe ≤ |S|.
e∈E(S,V ) e∈E(S,S) e∈E(S,V \S)

Dividing the result by 2, we obtain the inequality

1 |S|
∑ xe + ∑ xe ≤ .
e∈E(S,S)
2 e∈E(S,V \S) 2

Rounding down first the right and then the left-hand sides of this inequality, we
derive (4.4). t
u

From a practical point of view, a class of cuts is useful only if one can efficiently
solve the separation problem for this class. With respect to Ineqs. (4.2), the separa-
tion problem is formulated as follows:

Given a point x̃ ∈ Rn+ ; find u ∈ Rn+ such that the corresponding inequality in
(4.2) is violated at x̃, or prove that x̃ satisfies all inequalities from (4.2).
It is known that this separation problem is NP-hard. But, on the other hand, there are
a number of important special cases when it is solved efficiently. We consider only
one of such special cases when x̃ is a vertex of the relaxation polyhedron P(A, b) ⊆
Rn+ .

Theorem 4.2. Let x̃ = A−1 I bI be a feasible basic solution to the system of inequalities
Ax ≤ b (a vertex of P(A, b)), where I ⊆ {1, . . . , m} is a feasible basic set. Suppose
that x̃i 6∈ Z, and denote by vT = eTi A−1
I the i-th row of the inverse basic matrix. Then
the inequality (Gomory cut)
n j k
(v − bvc)T AIj x j ≤ (v − bvc)T bI
 
∑ (4.5)
j=1

is valid for P(A, b) ∩ Zn but is violated at x̃.


4.2 Chvátal-Gomory Cuts 101

Proof. Since (4.5) is a Chvátal-Gomory cut, it is valid for X. Let us prove that
this inequality is violated at x̃:
n j k
∑ (v − bvc)T AIj x̃ j − b(v − bvc)bI c =
j=1
n
x̃i − ∑ bvcT AIj x̃ j − x̃i − bvcT bI =
 
j=1

x̃i − bvcT bI − x̃i − bvcT bI > 0.


 
t
u

Example 4.2 We need to solve the IP

x1 + 2x2 → max,
1: 3x1 + 2x2 ≤ 6,
2: −3x1 + 2x2 ≤ 0,
3: 0 ≤ x1 ≤ 2,
4: 0 ≤ x2 ≤ 2,
x1 , x2 ∈ Z

by the cutting plane algorithm that generates only Gomory cuts.


Solution. First, we solve the relaxation LP by the dual simplex method.
0. We start with the dual feasible basic solution x(0) = (2, 2)T that corresponds to
the basic set I = (3, 4). The inverse basic matrix B−1 and the potential vector π are
the following:    
10 1
B−1 = B = , π= .
01 2
1. The first inequality 3x1 + 2x2 ≤ 6 (s = 1) is violated at x(0) by β = 6 − 3 · 2 −
2 · 2 = −4 . Therefore, we calculate

uT = (3, 2) · B−1 = (3, 2),


 
1 2 1
λ = min , = ⇒ t =1 ⇒ I = (1, 4)
3 2 3

x2 x2 x2
6 6 6
2 x(2) 2 2
se @ (3) (4)
1
J 1 s sex
@ 1 s sex

J
@J
@
s
Js - s

@Js - s
@s -
0 1 2 x1 0 1 2 x1 0 1 2 x1
a b c

Fig. 4.2 Illustration for example 4.2


102 4 Cutting Planes

and then we perform the pivot operation:


1
− 23

B−1 := B−1 · I(1, u)−1 = 3 ,
0 1
! !
1 1
3 3
π= = ,
2 − 31 ·2 4
3
  1 2
(1) (0) −1 2
x =x + β B e1 = −4 3 = 3 .
2 0 2

2. Now the second inequality −3x1 +2x2 ≤ 0 (s = 2) is violated at x(1) by β = −2.


Therefore, we calculate
1 2

uT = (−3, 2) 3 3 = (−1, 4),
0 1
1
λ = min{−, (4/3)/4} = ⇒ t = 2 ⇒ I = (1, 2),
3
and then we perform the pivot operation:
 "1 1# 1
− 23 −
 
−1 −1 1 0 −1
B := B · I(2, u) = · 1 1 = 61 16 , 3
0 1 4 4 4 4
! !
1 1 2
− · (−1)
π = 3 31 = 31 ,
3 3
!
− 61
2  
(2) (1) −1 3 1
x =x + β B e2 = −2 1
= 3 .
2 4 2

3. The point x(2) satisfies all constraints of the relaxation LP. We have only one
fractional component x2 , which will be used to build the Gomory cut. For
   
T 1 1 T 1 1
v = , , (v − bvc) = , ,
4 4 4 4
 
3 2
taking into account that B = , we write down the cut
−3 2
           
1 1 3 1 1 2 1 1 6
, x1 + , x2 ≤ , .
4 4 −3 4 4 2 4 4 0

Having carried out the calculations, we obtain the inequality x2 ≤ 1 (Fig. 4.2.a),
which is added to our program as the 5-th constraint. This inequality is violated at
x(2) by β = 1 − 23 = − 12 .
Having the violated inequality (s = 5), we perform the next iteration of the dual
simplex method. First we calculate
4.2 Chvátal-Gomory Cuts 103
" #
1
− 16
 
T 6 1 1
u = (0, 1) 1 1
= , ,
4 4
4 4
 
4 4
λ = min 3, = ⇒ t =2 ⇒ I = (1, 5),
3 3

and then we perform the pivot operation:


" # 
1
− 16
 1 2
−1 −1 −1 6 1 0 −
B := B · I(2, u) = · = 3 3 ,
1 1 −1 4 0 1
4 4
! !
2
3 − 43 · 14 1
3
π= 4
= 4
,
3 3
1 − 23
    4
1
x(3) = x(2) + β B−1 e2 = 3 − = 3 .
2 2 1 1

4. The point x(3) satisfies all constraints of the relaxation LP (including one cut)
but is not integer. We have only one fractional component x1 , which we will use to
build the Gomory cut. For
   
T 1 2 T 1 1
v = ,− , (v − bvc) = , ,
3 3 3 3
 
32
taking into account that B = , we write down the cut
01
           
1 1 3 1 1 2 1 1 6
, x1 + , x2 ≤ , .
3 3 0 3 3 1 3 3 1

After the calculations, we obtain the inequality x1 + x2 ≤ 2 (Fig. 4.2.b), which will
be the 6-th in our IP. This new inequality is violated at x(3) by β = 2 − 34 − 1 = − 31 .
Having the violated inequality (s = 6), we perform the next iteration of the dual
simplex method. First we calculate
1 2  
− 1 1
uT = (1, 1) 3 3 = , ,
0 1 3 3
λ = min{1, 4} = 1 ⇒ t =1 ⇒ I = (6, 5),

and then we perform the pivot operation:


1 2    
− 3 −1 1 −1
B−1 := B−1 · I(1, u)−1 = 3 3 · = ,
0 1 0 1 0 1
   
1 1
π= 4 1 = ,
3 − 3 1
104 4 Cutting Planes
4    
1 1 1
x(4) = x(3) + β B−1 e1 = 3 − = .
1 3 0 1

The point x(4) is integer and it satisfies all constraints (including cuts) of our IP
(Fig. 4.2.c). Therefore, x(4) is an optimal solution of our example IP. t
u

4.3 Mixed Integer Rounding

It is clear that the rounding principle is not applicable when deriving cuts for mixed
integer sets, i.e., when there are variables of both types, integer and continuous.
Another simple observation will be useful here.
Disjunctive principle: if an inequality is valid for both sets X1 and X2 , then it
is also valid for their union X1 ∪ X2 .

Lemma 4.1. Let X = {(x, y) ∈ Z × R+ : x − y ≤ b}, f = b − bbc. The inequality


y
x− ≤ bbc (4.6)
1− f
is valid for X.

Proof. Let

X1 = X ∩ {(x, y) : x ≤ bbc} and X2 = X ∩ {(x, y) : x ≥ bbc + 1}.

For (x, y) ∈ X1 , summing up the inequalities (1 − f ) · (x − bbc) ≤ 0 and 0 ≤ y, we


obtain (1 − f ) · (x − bbc) ≤ y.
For (x, y) ∈ X2 , summing up the inequalities − f · (x − bbc) ≤ − f and x − bbc ≤
f + y, we again obtain (1 − f ) · (x − bbc) ≤ y.
Now, by virtue of the disjunctive principle, (4.6) is valid for the union X = X1 ∪
X2 . t
u

The statement of Lemma 4.1 is illustrated in Fig. 4.3, where the set X is repre-
sented by the straight horizontal lines. Inequality (4.6) cuts off from the relaxation
polyhedron, {(x, y) ∈ R × R+ : x − y ≤ b}, the shaded triangle.

Theorem 4.3. Let Y = {(x, y) ∈ Zn+ × R2+ : ∑nj=1 a j x j + y1 − y2 ≤ b}, f = b − bbc


and f j = a j − ba j c for j = 1, . . . , n. The inequality
n  
max{0, f j − f } 1
∑ j ba c + xj − y2 ≤ bbc (4.7)
j=1 1 − f 1 − f

is valid for Y .
4.3 Mixed Integer Rounding 105

y
6

x−y = b

x − 1−y f = bbc



 -
bbc b x

Fig. 4.3 Mixed integer rounding

Proof. Let us weaken the inequality ∑nj=1 a j x j + y1 − y2 ≤ b to

∑ ba j cx j + ∑ a j x j − y2 ≤ b,
j∈N1 j∈N2

where N1 = { j : 1 ≤ j ≤ n, f j ≤ f } and N2 = {1, . . . , n} \ N1 . Applying Lemma 4.1


to the inequality w − z ≤ b with

w= ∑ ba j cx j + ∑ da j ex j ∈ Z and z = y2 + ∑ (1 − f j )x j ≥ 0,
j∈N1 j∈N2 j∈N2

we get the inequality


z
w− ≤ bbc.
1− f
Substituting the expressions for w and z into this inequality, we obtain (4.7). t
u

Inequality (4.7) is also known as the mixed integer rounding of the inequality
∑nj=1 a j x j + y1 − y2 ≤ b.
Note that, for an inequality ∑nj=1 a j x j ≤ b with non-negative integer variables, its
mixed integer rounding
n  
max{0, f j − f }
∑ j ba c + x j ≤ bbc
j=1 1− f

is stronger than the integer Chvatal-Gomory cut,


n
∑ ba j cx j ≤ bbc,
j=1

if at least one of the numbers f j − f is positive.


Example 4.3 We need to separate the point (x̃, ỹ) = 32 , 27 from the mixed integer


set
X = {(x, y) ∈ Z+ × R+ : x + y ≤ 5, y − x ≤ 2}.
106 4 Cutting Planes

Solution. In practice, mixed integer rounding is used in conjunction with the


technique of mixing constraints. Introducing two non-negative slack variables, s1
and s2 , we rewrite two inequalities defining X as the equations

x + y + s1 = 5,
−x + y + s2 = 2.

In the extended space, the point (x̃, ỹ) corresponds to the point
 
3 7
(x̃, ỹ, s̃1 , s̃2 ) = , , 0, 0 .
2 2

Of three non-integer variables, y, s1 and s2 , we choose the variable y whose current


value is farthest from its nearest bound (in case of y, this is its lower bound). Then
we mix our two equations to exclude y, i.e., we subtract the second equation from
the first one to get the equation

2x + s1 − s2 = 3,

which is divided by 2:
1 1 3
x + s1 − s2 = .
2 2 2
Let us note that in general, the equation is divided by the coefficient of an integer
variable taking a fractional value. If there are several such variables, then as a divisor
we can try each of their coefficients.
Applying Theorem 4.3 to the inequality
1 1 3
x + s1 − s2 ≤ ,
2 2 2
we obtain its mixed integer rounding

y 6 x̃b
@
3 @ cut
@
@
2 @
@
1
@
@
@
@r -
1 2 3 4 5 x

Fig. 4.4 Illustration for Example 4.3


4.4 Fractional Gomory Cuts 107
1
2 s2
x− ≤1 or x − s2 ≤ 1.
1 − 12

Substituting s2 = 2+x −y into the last inequality, we obtain the cut y ≤ 3. In Fig. 4.4
the set X is represented by five bold lines and the point (5, 0); the region that is cut
off (by y ≤ 3) from the relaxation polyhedron is shaded. t
u

4.4 Fractional Gomory Cuts

In this section we will learn how to construct fractional Gomory cuts for a mixed
integer set P(A, b; S) ⊆ Rn+ . Let x̃ = A−1
I bI be a feasible basic solution, where I is a
feasible basic set of rows for the system of linear inequalities Ax ≤ b. Introducing
a vector of slack variables, s ∈ Rn+ , we rewrite the subsystem of basic inequalities,
AI x ≤ bI in the equality form AI x + s = bI , or

x + A−1 −1
I s = AI bI = x̃. (4.8)

We will denote the elements of the inverse basic matrix A−1 I by āi j .
Let us pick up an integer variable xi which current value x̃i is not integer. Let N1 (i)
and N2 (i) be, respectively, the index sets of the integer and non-integer variables s j
in the i-th equation from (4.8). We consider a slack variable s j to be integer if all
variables and coefficients in both parts of the inequality AI[ j] x ≤ bI[ j] are integers.
Now, let us rewrite the i-th equation from (4.8) in the form:

xi + ∑ āi j s j + ∑ āi j s j = x̃i . (4.9)


j∈N1 (i) j∈N2 (i)

Theorem 4.4. If xi is an integer variable and its value x̃i is not integer, f0 = x̃i − bx̃i c
and f j = āi j − bāi j c for j ∈ N1 (i) ∪ N2 (i), then the following fractional Gomory cut

f0 (1 − f j )
∑ f js j + ∑ sj +
j∈N1 (i): f j ≤ f0 j∈N1 (i): f j > f0
1 − f0
(4.10)
f0
∑ āi j s j − ∑ āi j s j ≥ f0
j∈N2 (i): āi j >0 j∈N2 (i): āi j <0
1 − f0

is valid for all (xi , s) ∈ Z+ × Rn+ such that (4.9) is satisfied and s j ∈ Z for all j ∈
N1 (i).
Proof. Let us write down the mixed integer rounding of (4.9):
108 4 Cutting Planes
 
f j − f0
xi + ∑ bāi j cs j + ∑ bāi j c +
j∈N1 (i): f j ≤ f0 j∈N1 (i): f j > f0
1 − f0
(4.11)
1
+ ∑ āi j s j ≤ bx̃i c.
j∈N2 (i): āi j <0
1 − f0

Using (4.9), we express the variable xi in terms of the variables s j , and then substi-
tute the resulting expression into (4.11); as a result, we obtain (4.10). t
u

Gomory fractional cuts (4.10) are written in terms of the slack variables s j . To
return to the original variables xi , we need to substitute s = bI − AI x into (4.10) to
get a cut for the set P(A, b; S).
Example 4.4 We need to solve the IP

x1 + x2 → max
1: 2x1 + 2x2 + x3 ≤ 9,
2: x2 − x3 ≤ 0,
3: 0 ≤ x1 ≤ 4, (4.12)
4: 0 ≤ x2 ≤ 3,
5: 0 ≤ x3 ≤ 5,
x1 , x2 ∈ Z

by the cutting plane algorithm that generates only fractional Gomory cuts.
Solution. This time, we will not practice the dual simplex method to solve the
relaxation LP for (4.12). We just start with its optimal basic solution:
    1
1 0 0 4 3
1 1 2 1
I = (1, 2, 3), B−1 =  3 3 − 3  , x(1) =  3  , π =  13  .
 
1 2 2 1
3 −3 −3
1
3 3

Let us choose the variable x2 , which is integer and which current value is not integer,
to start computing the first cut:
1 1 2 1
x2 + s1 + s2 − s3 = .
3 3 3 3
Here s3 is the only integer slack variable. Next we compute the coefficients
1 2 1
f0 = , f3 = − − (−1) = ,
3 3 3
and then we write down the cut
1 1 1 1
s1 + s2 + s3 ≥ ,
3 3 3 3
or
4.5 Disjunctive Inequalities 109

s1 + s2 + s3 ≥ 1. (4.13)
Substituting the expressions

s1 = 9 − 2x1 − 2x2 − x3 , s2 = 0 − x2 + x3 , s3 = 4 − x1 ,

into (4.13), after simplification, we obtain the cut in the original variables x1 , x2 , x3 :

6: x1 + x2 ≤ 4,

which is violated at x(1) by β = 4 − 4 − 31 = − 13 .


Next we apply the dual simplex method to carry out reoptimization. Compute
 
0 0 1  
1 1 2 1 1 1
uT = (1, 1, 0)  3 3 − 3  = , , ,
1 2 2 3 3 3
3 −3 −3
λ = min{1, 1, 1} = 1 ⇒ t =1 ⇒ I = (6, 2, 3)

and perform the pivot operation:


  
0 0 1
  
3 −1 −1 0 0 1
1 1 2
B−1 := B−1 · I(1, u)−1 =  3 3 − 3  · 0 1 0  = 1 0 −1 ,
 
1 2 2 0 0 1 1 −1 −1
3 −3 −3
 
4
   
0 4
(2) (1) −1 1 1    
x = x + β B e1 =  3  − 1 = 0 ,
1 3
3
1 0
   
1 1
1 − 1  
π = 3 3 = 0 .
1 1 0
3−3

The point x(2) is integer and satisfies all constraints in (4.12). Therefore, x(2) is
an optimal solution to (4.12). t
u

4.5 Disjunctive Inequalities

The union X1 ∪ X2 of two sets X1 and X2 is also called the disjunction of X1 and X2 ,
since the condition x ∈ X1 ∪ X2 is also written as the disjunction x ∈ X1 or x ∈ X2 .
The disjunctive principle was introduced in Sect. 4.3 where we studied the mixed
integer rounding cuts. Let us recall the essence of this simple principle: if an in-
equality holds for both sets, X1 and X2 , then it is also valid for X1 ∪ X2 . In this
110 4 Cutting Planes

section proceeding from this disjunctive principle, we will develop another version
of the disjunctive cuts.
From the algorithmic point of view, the disjunction of polyhedra is of special
interest.
Theorem 4.5. Let Pi = {x ∈ Rn+ : Ai x ≤ bi } for i = 1, 2. If both polyhedra, P1 and
P2 , are non-empty, then an inequality α T x ≤ β is valid for the union P1 ∪ P2 if and
only if there exists a pair of vectors y1 , y2 ≥ 0 such that (Ai )T yi ≥ α and (yi )T bi ≤ β
for i = 1, 2.
Proof. The sufficiency is verified simply. Suppose that there exists a pair of vec-
tors y1 , y2 ≥ 0 such that

(Ai )T yi ≥ α and (yi )T bi ≤ β for i = 1, 2.

If x ∈ P1 ∪ P2 , then Ai x ≤ bi for some i ∈ {1, 2}. Therefore

α T x ≤ (yi )T Ai x ≤ (yi )T b ≤ β .

Now we prove the necessity. For i = 1, 2, let an inequality α T x ≤ β be valid for


Pi and let xi and yi be optimal solutions for the pair of dual LPs

max{α T x : Ai x ≤ b, x ≥ 0},
min{bT y : (Ai )T y ≥ α, y ≥ 0}.

By the duality theorem (Theorem 3.1), we have bT yi = α T xi ≤ β . t


u

From Theorem 4.5, it directly follows the next method of solving the separation
problem for the disjunction of two polyhedra.
Separation procedure for conv(P1 ∪ P2 ):
Check whether the point x̃ ∈ Rn belongs to the set P1 ∪ P2 , and if not, find the
separating hyperplane by solving the next LP

x̃T α − β → max,
(Ai )T yi ≥ α, i = 1, 2,
i T i (4.14)
(b ) y ≤ β , i = 1, 2,
i
y ∈ Rn+ , i = 1, 2,
−1 ≤ β ≤ 1.

If α ∗ and β ∗ are components of an optimal solution to (4.14) and (α ∗ )T x̃ > β ,


then the inequality (α ∗ )T x ≤ β separates x̃ from conv(P1 ∪ P2 ). Otherwise,
x̃ ∈ conv(P1 ∪ P2 ).
Solving (4.14) we seek an inequality α T x ≤ β that holds for conv(P1 ∪ P2 ) and
that is most violated at the point x̃ under the normalization |β | ≤ 1. Without this
4.5 Disjunctive Inequalities 111

normalization condition, due to the homogeneity of the remaining constraints, the


objective function in (4.14) would be unbounded.
Example 4.5 We need to solve the IP

x1 + 2x2 → max,
1: −2x1 + 3x2 ≤ 4,
2: x1 + x2 ≤ 5,
(4.15)
3: x1 ≥ 0,
4: x2 ≥ 0,
x1 , x2 ∈ Z

by the cutting plane algorithm that generates only disjunctive cuts.


Solution. Let us start immediately with an optimal basic solution to the relaxation
LP for (4.15):
" # ! !
−1 − 51 35 (1)
11
5
1
5
I = (1, 2), B = 1 2 , x = 14 , π = 7 .
5 5 5 5

The solution x(1) is non-integer, and we will try to cut it off. The polytope P(1) of
feasible solutions of the relaxation LP is shown in Fig. 4.5.a. Since there are no
integer points in the strip 2 < x1 < 3, we can remove from P(1) all points from
this strip (in Fig. 4.5.a the deleted area is shaded). Now the feasible domain of our
(1) (1)
problem is contained in the union of two polyhedra, P1 and P2 , that are given by
the following systems of inequalities:

−2x1 + 3x2 ≤ 4,
1: 1 : −2x1 + 3x2 ≤ 4,
x1 + x2 ≤ 5,
2: 2: x1 + x2 ≤ 5,
(1) (1)
P1 : 3:
x1 ≤ 2, P2 : 3 : −x1 ≤ −3,
x1 ≥ 0, x1 ≥ 0,
x2 ≥ 0, x2 ≥ 0.
 
(1) (1)
To separate x(1) from conv P1 ∪ P2 , we need to solve the following LP:

x2 x2 x2
x(1) x(2)
rc
6 6 6
3 3 cr 3
x(3)
 Qr 2 r cr
 Q
2  @ 2
r (1) @ r
 (2) @
r
 @
1 P1 (1)
@ 1 P1 @ 1 @
r P2 @r - r @r - r @r -
0 1 2 3 4 5 x1 0 1 2 3 4 5 x1 0 1 2 3 4 5 x1
a b c

Fig. 4.5 Illustration for Example 4.5


112 4 Cutting Planes
11 14
5 α1 + 5 α2 − β → max,
−2y11 + y12 + y13 − α1 ≥ 0,
3y11 + y12 − α2 ≥ 0,
4y11 + 5y12 + 2y13 − β ≤ 0,
− 2y21 + y22 − y23 − α1 ≥ 0,
3y21 + y22 − α2 ≥ 0,
4y21 + 5y22 − 3y23 − β ≤ 0,
−1 ≤ β ≤ 1,
y11 , y12 , y13 , y21 , y22 , y23 ≥ 0.

Using any LP solver available to you, you can check that this LP has an optimal
solution for which the variables α1 , α2 , and β take the following values:
1 1
α1∗ = , α2∗ = , β ∗ = 1.
6 4
Thus, we have found the cut
1 1
x1 + x2 ≤ 1,
6 4
or

5: 2x1 + 3x2 ≤ 12.

We add this cut to (4.15) as the 5-th constraint. The polytope P(2) of the feasible
solutions of the new relaxation LP is shown in Fig. 4.5.b.
Now let us proceed to the reoptimization. At the point x(1) , the added inequality
(s = 5) is violated by 12 − 2 · 11 14 4
5 − 3 · 5 = − 5 . We calculate
" # 
− 15 35

T 1 12
u = (2, 3) · 1 2 = , ,
5 5
5 5
7
λ= ⇒ t = 2, I = (1, 5),
12
and then perform the pivot operation:
" #   " 1 #
−1 −1 −1 −1 3
1 0 −4 1
B := B · I(2, u) = 15 5
· 1 5 =
4
,
2 − 12 12 1 1
5 5 6 6
! !
1 7 1 1
5 − 12 ·5 12
π= 7
= 7
,
12 12
! !
11 1  
(2) (1) −1 5 4 4 2
x =x + β B e2 = 14
− · 1
= 8 .
5
5 6 3
4.5 Disjunctive Inequalities 113

The new solution x(2) is still non-integer, and we will try to cut it off. Since there
are no integer points in the strip 2 < x2 < 3, we can remove from P(2) all points
from this strip (the deleted area is shaded in Fig. 4.5.b). Now the feasible domain of
(2) (2)
our IP is contained in the union of two polyhedra, P1 and P2 , that are the solution
sets to the following systems of inequalities:

1: −2x1 + 3x2 ≤ 4, 1: −2x1 + 3x2 ≤ 4,


2: x1 + x2 ≤ 5, 2: x1 + x2 ≤ 5,
(2) 3: 2x1 + 3x2 ≤ 12, (2) 3: 2x1 + 3x2 ≤ 12,
P1 : P2 :
4: x2 ≤ 2, 4: −x2 ≤ −3,
x1 ≥ 0, x1 ≥ 0,
x2 ≥ 0, x2 ≥ 0.

(2)
From Fig. 4.5.b it is clear that the polyhedron P2 is empty and, therefore, the
second system of inequalities
 is incompatible.
 But we will not rely on the drawing.
(2) (2) (2)
To separate x from conv P1 ∪ P2 , we solve the following LP:

2α1 + 38 α2 − β → max,
−2y11 + y12 + 2y13 − α1 ≥ 0,
3y11 + y12 + 3y13 + y14 − α2 ≥ 0,
1 1 1
4y1 + 5y2 + 12y3 + 2y41 − β ≤ 0,
− 2y21 + y22 + 2y23 − α1 ≥ 0,
2 2 2
3y1 + y2 + 3y3 − y4 2 − α2 ≥ 0,
4y21 + 5y22 + 12y23 − 3y24 − β ≤ 0,
−1 ≤ β ≤ 1,
y11 , y12 , y13 , y14 , y21 , y22 , y23 , y24 ≥ 0.

This program has an optimal solution with the following components:


1
α1∗ = 0, α2∗ = , β ∗ = 1.
2
Therefore, our second cut is the inequality

6: x2 ≤ 2,

which is added to our already extended IP as the 6-th constraint. The feasible poly-
tope, P(3) , of the relaxation LP for this new IP is shown in Fig. 4.5.c.
Now let us proceed to the reoptimization. At the point x(2) , the last added in-
equality (s = 6) is violated by 2 − 83 = − 32 . We calculate
" # 
− 14 14

T 1 1
u = (0, 1) · 1 1 = , ,
6 6
6 6
114 4 Cutting Planes
 
1 7 1
λ = min , = ⇒ t = 1, I = (6, 5),
2 2 2

and then perform the pivot operation:


" # 
− 41 1   3 1
−1 −1 −1 4 6 −1 −2 2
B := B · I(1, u) = 1 1
· = ,
0 1 1 0
6 6
! !
1 1
2 2
π= 7
= ,
12 − 12 · 61 1
2
2 − 32
     
2 3
x(3) = x(2) + β B−1 e1 = 8 − · = .
3 3 1 2

Since the point x(3) is integral, then it is an optimal solution to (4.15). t


u

4.6 Lift And Project

In this section we will consider MIPs in which all integer variables are binary:

cT x → max,
Ax ≤ b, (4.16)
x j ∈ {0, 1}, j = 1, . . . , p,

where A is a real m × n-matrix, c ∈ Rn , b ∈ Rm , x is an n-vector of variables, and


0 < p ≤ n. To simplify the discussion, we assume that the inequalities −x j ≤ 0 and
x j ≤ 1 ( j = 1, . . . , p) are already included in the system Ax ≤ b.
Let X denote the set of feasible solution to (4.16), then P = P(A, b) is a relaxation
polyhedron for X (X ⊆ P). Next we describe a procedure that, for any j ∈ {1, . . . , p},
constructs a polyhedron Pj such that X ⊆ Pj ⊆ P.
Lift-and-project:
(Lift) Linearize the system of nonlinear inequalities

x j (Ax − b) ≤ 0, (1 − x j )(Ax − b) ≤ 0,

by substituting x j for x2j , and a new continuous variable yi for xi x j (i 6= j). Let M j
denote the polyhedron of the solutions of the resulting system of linear inequali-
ties.
Project the polyhedron M j ⊆ Rn × Rn−1 onto the space of x-variables. Let Pj =
lfpr(P) denotes the resulting polyhedron.
Recall that the projection of a set Q ⊆ U ×V onto the set U is the set
4.6 Lift And Project 115

def
projU (Q) = {u ∈ U : ∃ (u, v) ∈ Q}.

The inclusion X ⊆ Pj follows from the fact that, for x ∈ X, the point (x, y) belongs
to M j if we define yi = xi x j for all i 6= j. The inclusion Pj ⊆ P is valid because each
of the inequalities Ai x ≤ b is the sum of two inequalities defining M j :

x j Ai x ≤ bi x j and (1 − x j )Ai x ≤ bi (1 − x j ).

The above lift-and-project procedure does not give an explicit (in the form of a
system of linear inequalities) descriptions of the polyhedrons Pj . But we can still
describe each polyhedron Pj implicitly by providing a separation procedure. The
polyhedron M j is described by the following system of inequalities:

(A j − b)x j + AN\{ j} y ≤ 0,
AN\{ j} xN\{ j} + bx j − AN\{ j} y ≤ b,

def
where N = {1, . . . , n}. The point x̄ ∈ Rn belongs to Pj if and only if the following
system of inequalities is compatible:

AN\{ j} y ≤ x̄ j (b − A j ),
(4.17)
−AN\{ j} y ≤ (1 − x̄ j )b − AN\{ j} x̄N\{ j} .

By Farkas’ lemma (Lemma 3.3) the system of inequalities (4.17) has a solution if
and only if
x̄ j (b − A j )T u + ((1 − x̄ j )b − AN\{ j} x̄N\{ j} )T v ≥ 0
for all u, v ∈ Rm
+ such that

uT AN\{ j} − vT AN\{ j} = 0.

We can verify this condition of Farkas’ lemma by solving the following LP:

x̄ j (b − A j )T u + ((1 − x̄ j )b − AN\{ j} x̄N\{ j} )T v → min, (4.18a)


T N\{ j} T N\{ j}
u A −v A = 0, (4.18b)
m m
∑ ui + ∑ vi = 1, (4.18c)
i=1 i=1
u ≥ 0, v ≥ 0. (4.18d)

Here, (4.18c) is a normalization restriction, which is introduced to ensure that the


objective function in (4.18) is bounded.
Let z̄ and (ū, v̄) be the optimal objective value and an optimal solution of (4.18).
If z̄ ≥ 0, then the point x̄ belongs to Pj , and if z̄ < 0, then the inequality

x j (b − A j )T ū + ((1 − x j )b − AN\{ j} xN\{ j} )T v̄ ≥ 0,


116 4 Cutting Planes

or after rearranging,

(v̄T b − ūT (b − A j ))x j + v̄T AN\{ j} xN\{ j} ≤ v̄T b (4.19)

separates x̄ from Pj .
Example 4.6 We need to separate the point x̄ = (1/3, 2) from the solution set of the
following system:
1: 3x1 + 2x2 ≤ 5,
2: −x1 ≤ 0,
3: x1 ≤ 1,
4: −x2 ≤ 0,
5: x2 ≤ 2,
x1 ∈ {0, 1}.
Solution. First, for j = 1, we write down (4.18) applied to our problem instance:
2 1 2 2 2 2
z = u1 + u2 + u5 − v1 + v3 + 2v4 − v5 → min,
3 3 3 3 3 3
2u1 − u4 + u5 − 2v1 + v2 − v5 = 0,
u1 + u2 + u3 + u4 + u5 + v1 + v2 + v3 + v4 + v5 = 1,
u1 , u2 , u3 , u4 , u5 , v1 , v2 , v3 , v4 , v5 ≥ 0.
T T
The vectors ū = 31 , 0, 0, 0, 0 , v̄ = 0, 0, 0, 0, 23 constitute an optimal solution to
this LP, and the optimal objective value is z̄ = − 19 . Since z̄ < 0, by (4.19), we can
write the cut:  
4 2 2 4
− x1 + x2 ≤ or x1 + x2 ≤ 2.
3 3 3 3
t
u

The following theorem shows that each polyhedron Pj build by the lift-and-
project procedure is the convex hull of the union of two polyhedra. This means
that the lift-and-project cuts are specialized disjunctive cuts.
def
Theorem 4.6. Pj = Pj∗ = conv ((P ∩ {x ∈ Rn : x j = 0}) ∪ (P ∩ {x ∈ Rn : x j = 1})).
Proof. We assume that P 6= 0./ Otherwise, the result is trivial. First we prove the
inclusion Pj ⊆ Pj∗ .
/ then Pj∗ = P ∩ {x ∈ Rn : x j = 1}. We already know
If P ∩ {x ∈ Rn : x j = 0} = 0,
that Pj ⊆ P. Therefore, to prove the inclusion Pj ⊆ Pj∗ , it suffices to show that the
def
inequality x j ≥ 1 holds for Pj . Since P∩{x ∈ Rn : x j = 0} = 0,
/ then ε = min{x j : x ∈
P} > 0 and the inequality x j ≥ ε is valid for P. Since x j ≥ ε is a linear combination
of some inequalities from Ax ≤ b, the inequality (1 − x j )x j ≥ (1 − x j )ε is valid for
the nonlinear system constructed in the lift step. Then x2j is replaced with x j , and we
obtain the inequality (1 − x j )ε ≥ 0, from which it follows that x j ≥ 1.
The inclusion Pj ⊆ Pj∗ is proved similarly when P ∩ {x ∈ Rn : x j = 1} = 0. /
4.6 Lift And Project 117

Now we consider the case when both sets, P ∩ {x ∈


Rn : x j = 0} and P ∩ {x ∈ Rn : x j = 1}, are non-empty
@
P @@
(this is illustrated in Fig. 4.6). Suppose that the inequality 
 
α T x ≤ β is valid for Pj∗ . Since this inequality is also valid 
 
Pj
n n
for P ∩ {x ∈ R : x j ≤ 0} and P ∩ {x ∈ R : x j ≥ 1}, then,
x =0 xj = 1
by virtue of Proposition 1.3, there exist vectors λ 0 , λ 1 ∈ Rn+ , j
and numbers γ0 , γ1 ≥ 0 such that Fig. 4.6

α T = (λ 0 )T A + γ0 e j and β ≥ (λ 0 )T b,
α T = (λ 1 )T A − γ1 e j and β ≥ (λ 1 )T b − γ1 .

Consequently, the inequalities α T x − γ0 x j ≤ β and α T x − γ1 (1 − x j ) ≤ β are valid


for P. Therefore, the inequalities

(1 − x j )(α T x − γ0 x j − β ) ≤ 0,
x j (α T x − γ1 (1 − x j ) − β ) ≤ 0

and their sum


α T x − (γ0 + γ1 )(x j − x2j ) − β ≤ 0
are valid for the nonlinear system build in the lift step. Then x2j is replaced with x j
giving the inequality α T x ≤ b that is valid for M j , and, consequently, for Pj . This
completes the proof of the inclusion Pj ⊆ Pj∗ .
It remains to justify the validity of the inverse inclusion Pj∗ ⊆ Pj . We assume that

Pj 6= 0;/ otherwise, the result is trivial. Let the point x̄ belong to at least one of the
def
sets P ∩ {x ∈ Rn : x j = 0} or P ∩ {x ∈ Rn : x j = 1}. Define ȳi = x̄i x̄ j for i 6= j.
As x̄2j = x̄ j , then (x̄, ȳ) ∈ M j and x̄ ∈ Pj . Since Pj is a convex set, we conclude that
Pj∗ ⊆ Pj . t
u

Applying the lift-and-project procedure successively for j = 1, . . . , p, we can


compute the convex hull of the feasible set X.
Theorem 4.7. Defining P̄0 = P and P̄k = lfprk (P̄k−1 ) for k ≥ 1, we have the equality
P̄p = conv(X).
Proof. By induction, we prove the equality P̄j = conv(X j ), where

def
X j = {x ∈ Rn : Ax ≤ b, xi ∈ {0, 1} for i = 1, . . . , j}.

By definition, P̄0 = P = X0 . Let j ≥ 1 and assume that P̄j−1 = conv(X j−1 ). By The-
orem 4.6, we have

P̄j = conv((conv(X j−1 ) ∩ {x : x j = 0}) ∪ (conv(X j−1 ) ∩ {x : x j = 1}))


= conv(conv(X j−1 ∩ {x : x j = 0}) ∪ conv(X j−1 ∩ {x : x j = 1}))
= conv(X j ). t
u
118 4 Cutting Planes

4.7 Separation and Optimization

In the most general form, the separation problem is formulated as follows. Given
a set X ⊂ Rn and a point x̃ ∈ Rn , we need to prove that x̃ ∈ conv(X), or find a
hyperplane H(a, β ) separating x̃ from X, i.e., such that

aT x ≤ β , x ∈ X,
T
a x̃ > β .

We have already considered some special cases of this general problem. In par-
ticular, studying the Gomory cuts, we solved the problem of separating a vertex x̃ of
a polyhedron P(A, b) from a mixed integer set X = P(A, b; S). Note that the problem
of separating an arbitrary point x̃ (not a vertex of P(A, b)) from the set P(A, b; S) is
NP-hard. It is not surprising that the general separation problem is also NP-hard,
and we can hardly hope to develop an efficient algorithm for solving it. Here we
describe a procedure for solving the general separation problem, which is efficient
in practice for a number of special sets X. Let us also note that in practice we need
very fast separation procedures, since they are repeatedly called by the cutting plane
algorithms. For this reason, in the modern MIP solvers, very often exact separation
procedures are replaced with fast heuristics.
It is known that the separation problem for the set X is polynomially equivalent
to the optimization problem

max{cT x : x ∈ X}. (4.20)

Here our main interest is not in investigating the complexity aspects of this equiv-
alence. We are going to present two LP based approaches for solving each of the
problems, optimization or separation, provided that there is an efficient procedure
for solving the other problem.

4.7.1 Markowitz Model for Portfolio Optimization

We already know how separation procedures are used for solving MIPs. In this sec-
tion on a particular quadratic optimization problem we demonstrate how to apply the
cutting plane approach for solving optimization problems with convex constraints
that are represented by separation procedures.
We want to invest some amount of money in some of n assets (stocks, bonds,
etc.). Let pi be the relative change in price of asset i during some planning hori-
zon (for example, one year), i.e., pi is the change in the price of this asset during
the planning horizon divided by its price at the beginning of the horizon (return
per one enclosed dollar). We assume that p1 , . . . , pn are dependent normal random
variables, and p = (p1 , . . . pn )T is a random price vector with known mean (mathe-
4.7 Separation and Optimization 119

matical expectation) p̄ ∈ Rn+ and covariance matrix Σ , which is a symmetric positive


semidefined n × n-matrix.
A portfolio is a vector x = (x1 , . . . , xn )T , where xi ≥ 0 is the share of funds in-
vested in asset i (i = 1, . . . , n), ∑ni=1 xi = 1. We will assume that the portfolio is
formed in order to remain unchanged for the planning horizon. Therefore, the re-
turn of the portfolio x at the end of the planning horizon is a random variable pT x
with the mean value p̄T x and the variance xT Σ x. Markowitz formulated the problem
of portfolio optimization as the following quadratic programming problem:

p̄T x → max, (4.21a)


T 2
x Σx ≤ r , (4.21b)
n
∑ xi = 1, (4.21c)
i=1
xi ≥ 0, i = 1, . . . , n. (4.21d)

In this problem, we maximize the average return of the portfolio at a limited risk
of r2 (see Sect. 8.3 for a discussion of risk measures).
To solve (4.21) with the dual simplex method, we need to represent the convex
set
def 
XΣ = x ∈ Rn : xT Σ x ≤ r2

by a separation procedure. In this particular case, the separation algorithm is simple.


Since Σ is positive semidefined, it can be factored as Σ = BT B, where B is some
m × n-matrix. Introducing new variables y = Bx, we can rewrite (4.21b) as follows:

yT y ≤ r2 , y = Bx.

Now, to separate a given point x̃ ∈ Rn from the set XΣ  , we first compute


 ỹ = Bx̃.
T r
If kỹk ≤ r, then x̃ ∈ XΣ . Otherwise, the hyperplane ỹ y − kỹk ỹ = 0 — which
r
touches the sphere given by kyk = r at the intersection point, kỹk ỹ, of this sphere
with the ray going from the origin through the point ỹ — separates ỹ from the ball
def
B(r) = {y ∈ Rn : kyk ≤ 1}, and B(r) is in the half-space given by the inequality
ỹ y ≤ rkỹk. Therefore, the inequality (ỹT B)x ≤ rkỹk is valid for XΣ but not for x̃.
T

Example 4.7 Consider the problem of forming a portfolio of four assets. The mean
values and standard deviations of the future random returns of these assets are
presented in the following table
Asset 1 2 3 4
p̄i 1.03 1.06 1.08 1.1
σi 0 0.05 0.1 0.2
Here asset 1 is a risk-free asset with a return of 3%. The correlation coefficients
between risky assets are the following: ρ24 = −0.04, ρ34 = 0.03, and ρ23 = 0.
We need to find an approximately optimal portfolio which risk is not greater than
r2 for r = 0.04.
120 4 Cutting Planes

Solution. First, using equations Σii = σi2 , Σi j = ρi j σi σ j , we compute the covari-


ance matrix:  
0 0 0 0
0 0.0025 0 −0.0004
Σ = 0
.
0 0.01 0.0006 
0 −0.0004 0.0006 0.04
So we have to solve the following optimization problem:

1.03x1 + 1.06x2 + 1.08x3 + 1.1x4 → max,


0.0025x22 + 0.01x32 + 0.04x32 − 0.0008x2 x4 + 0.0012x3 x4 ≤ r2 ,
(4.22)
x1 + x2 + x3 + x4 = 1,
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0.

You can directly verify that Σ ≈ BT B for


 
0 0 0 0
0 0.05 0 −0.008
B= 0 0 0.1 0.006  .

0 0 0 0.1997

Therefore,

y1 = 0, y2 = 0.05x2 − 0.008x4 , y3 = 0.1x3 + 0.006x4 , y4 = 0.1997x4 .

We cannot numerically find an exact optimal solution to (4.22). Let us agree


to stop calculations as soon as we have a portfolio x such that kBxk ≤ r + ε for
ε = 10−5 , and then

xT Σ x ≤ (r + ε)2 = r2 + 2 × 0.04 × 10−5 + 10−10 < r2 + 10−6 .

We start with solving the LP (LP(0) in what follows) obtained from (4.22) after
removing the quadratic inequality. The point x(0) = (0, 0, 0, 1)T is the only opti-
mal solution to this LP. Then y(0) = Bx(0) = (0, −0.008, 0.006, 0.1997)T , and since
ky(0) k = 0.19995 > r = 0.04, we compute the first cut:

1  (0) T
y Bx = −0.01x2 + 0.015x3 + 0.999502x4 ≤ ky(0) k = 0.19995.
r
Adding this inequality to LP(0) and reoptimizing the extended LP (denoted as
LP(1)), we get its optimal solution x(1) , which is presented in Table 4.1. Column 2
of this table contains optimal solutions to all the LPs solved by our cutting plane
algorithm: LP(i + 1) is obtained from LP(i) by adding to the latter the inequality
 T
1 (i)
r y Bx ≤ ky(i) k, which is presented in Column 3 of Row i, that separates from
def
XΣ an optimal solution, x(i) , to LP(i). Here y(i) = Bx(i) .
4.7 Separation and Optimization 121

Table 4.1 Cutting planes for portfolio problem example


1 (i) T
i (x(i) )T (i)
r (y ) Bx ≤ ky )k
0 (0, 0, 0, 1) −0.010000x1 + 0.015000x2 + 0.999502x3 ≤ 0.199950
1 (0.000000, 0.000000, 0.812139, 0.187861) −0.001879x2 + 0.205853x3 + 0.199950x4 ≤ 0.090497
2 (0.000000, 0.549577, 0.248606, 0.201817) 0.032330x2 + 0.065179x3 + 0.199950x4 ≤ 0.054525
3 (0.000000, 0.553723, 0.390366, 0.055911) 0.034049x2 + 0.098430x3 + 0.056202x4 ≤ 0.049161
4 (0.000000, 0.704495, 0.202844, 0.092661) 0.043104x2 + 0.052101x3 + 0.088612x4 ≤ 0.044338
5 (0.140220, 0.481318, 0.272385, 0.106078) 0.029022x2 + 0.069687x3 + 0.105297x4 ≤ 0.042010
6 (0.096975, 0.573079, 0.262967, 0.066979) 0.035148x2 + 0.066747x3 + 0.065160x4 ≤ 0.041017
7 (0.117786, 0.560763, 0.227810, 0.093641) 0.034111x2 + 0.058357x3 + 0.091404x4 ≤ 0.040488
8 (0.096812, 0.606008, 0.222317, 0.074862) 0.037127x2 + 0.056702x3 + 0.072099x4 ≤ 0.040251
9 (0.117884, 0.561278, 0.241584, 0.079254) 0.034287x2 + 0.061585x3 + 0.077225x4 ≤ 0.040121
10 (0.112453, 0.580639, 0.223040, 0.083868) 0.035451x2 + 0.057018x3 + 0.081366x4 ≤ 0.040063
11 (0.122751, 0.558533, 0.232882, 0.085834) 0.034050x2 + 0.059508x3 + 0.083699x4 ≤ 0.040030
12 (0.116129, 0.570827, 0.231828, 0.081216) 0.034865x2 + 0.059175x3 + 0.078945x4 ≤ 0.040016
13 (0.118105, 0.569426, 0.227866, 0.084603) ky(13) k = 0.040008 < 0.04 + 10−5

We stopped calculations after 13 iterates because ky(13) k = 0.040008 < r + 10−5 .


So, we declare the portfolio x(13) as an approximate optimal solution to our example,
the expected return of this portfolio is p̄T x(13) = 1.064398. t
u

Let us note that an efficient implementation of the dual simplex method as a


cutting plane algorithm not only adds cuts to the problem constraints but it also
deletes cuts added at some previous stages if those cuts are not tight at an optimal
solution to the currently solved LP.
In conclusion, let us also note that the cutting plane algorithm based on the dual
simplex method is not the best choice for solving convex (in particular, quadratic)
optimization problems. Nevertheless, when an optimization problem with convex
constraints involves integer variables, then the representation of these convex con-
straints by separation procedures withing a branch-and-cut algorithm — which is a
combinations of the cutting plane and branch-and-bound methods (see Sect. 6.2) —
may be a reasonable solution approach.

4.7.2 Exact Separation Procedure

Here we show how to separate a given point x̃ ∈ Rn from a given set X ⊆ Rn using
an algorithm for solving (4.20).
First, solving (4.20) with different objective vectors c = c1 , . . . , cm , we find a
family of vectors {x1 , . . . , xm } ∈ X. We can also start with an empty family by setting
m = 0. Then we look for a hyperplane H(a, β ) separating our point x̃ from the set
{x1 , . . . , xm }. To do this we solve the following LP:
122 4 Cutting Planes

x̃T v − α → max,
(xi )T v − α ≤ 0, i = 1, . . . , m,
(4.23)
−1 ≤ vi ≤ 1, i = 1, . . . , n,
−1 ≤ α ≤ 1.

Let a = v∗ , β = α ∗ denote the components of an optimal solution to (4.23). Since


v = 0 and α = 0 is a feasible solution to (4.23), then aT x̃ ≥ β . Let us consider two
cases.
1. First we consider the case when aT x̃ > β . To verify that the entire set X lies
in the half-space aT x ≤ β , we find an optimal solution x∗ to (4.20) when c = a. If
aT x∗ ≤ β , then aT x = β is a separating hyperplane. Otherwise, we set xm+1 = x∗
and increment m by 1, then we again solve (4.23) to find a new hyperplane aT x = β .
We continue to act this way until we find a separating hyperplane, or until we get
the equality aT x̃ = β .
2. Now we consider the case when aT x̃ = β . Let y∗ ∈ Rm+2n+2 be a solution to
the dual LP for (4.23):
2n+2
∑ ym+i → min,
i=1
m
∑ xij yi + ym+i − ym+n+i = x̃ j , j = 1, . . . , n,
i=1
m
− ∑ yi + ym+2n+1 − ym+2n+2 = −1,
i=1
yi ≥ 0, i = 1, . . . , m + 2n + 2.

Since the optimal primal and dual objective values are equal, then y∗m+i = 0 for
i = 1, . . . , 2n+2 and, therefore, the vector equality ∑m ∗ i
i=1 yi x = x̃ holds, which means
that
x̃ ∈ conv({x1 , . . . , xm }) ⊆ conv(X).
It is clear that in practice the separation procedure described above can be used
only for those sets X for which (4.20) is solved easily.
T
Example 4.8 We need to separate the point x̃ = 1, 45 , 15 from the knapsack set
X1 = {x ∈ {0, 1}3 : 4x1 + 5x2 + 5x3 ≤ 9}.
Solution. First, we set m = 0 and solve the next LP:
4 1
v1 + v2 + v3 − α → max,
5 5
−1 ≤ v1 , v2 , v3 ≤ 1,
−1 ≤ α ≤ 1.

Its solution is v∗ = (1, 1, 1)T , α ∗ = −1, and therefore we set a = v∗ = (1, 1, 1)T , β =
α ∗ = −1. Since aT x̃ = 2 > −1 = β , we need to solve the following 0,1-knapsack
4.7 Separation and Optimization 123

problem:

x1 + x2 + x3 → max,
4x1 + 5x2 + 5x3 ≤ 9,
x1 , x2 , x3 ∈ {0, 1}.

Its solution is x∗ = (1, 1, 0)T . As aT x∗ = 2 > −1 = β , we set x1 = (1, 1, 0)T and


m = 1.
Now we solve the following LP:
4 1
v1 + v2 + v3 − α → max,
5 5
v1 + v2 − α ≤ 0,
−1 ≤ v1 , v2 , v3 ≤ 1,
−1 ≤ α ≤ 1.

Its solution is v∗ = (1, 0, 1)T , α ∗ = 1, and therefore we set a = (1, 0, 1)T , β = 1.


Since aT x̃ = 65 > 1 = β , we need to solve the following 0,1-knapsack problem:

x1 + x3 → max,
4x1 + 5x2 + 5x3 ≤ 9,
x1 , x2 , x3 ∈ {0, 1}.

Its solution is x∗ = (1, 0, 1)T . As aT x∗ = 2 > 1 = β , we set x2 = (1, 0, 1)T and m = 2.


Now we solve the next LP:
4 1
v1 + v2 + v3 − α → max,
5 5
v1 + v2 − α ≤ 0,
v1 + v3 − α ≤ 0,
−1 ≤ v1 , v2 , v3 ≤ 1,
−1 ≤ α ≤ 1.

Its solution is v∗ = (1, 0, 0)T , α ∗ = 1, and therefore we set a = (1, 0, 1)T , β = 1. As


aT x̃ = 1 = β , then x̃ ∈ conv({x1 , x2 }). This can also be seen directly:
   
1
 
1 1
4 1 1 2 4   1   4
x + x = 1 + 0 = 5.
5 5 5 5 1
0 1 5

t
u

We will solve one more similar example, but with the other outcome.
124 4 Cutting Planes

3 4 1 T

Example 4.9 We need to separate the point x̃ = 4, 5, 2 from the knapsack set
X2 = {x ∈ {0, 1}3 : 4x1 + 5x2 + 2x3 ≤ 9}.
Solution. We set m = 0 and solve the next LP:
3 4 1
v1 + v2 + v3 − α → max,
4 5 2
−1 ≤ v1 , v2 , v3 ≤ 1,
−1 ≤ α ≤ 1.

Its solution is v∗ = (1, 1, 1)T , α ∗ = −1, and therefore a = v∗ = (1, 1, 1)T , β = α ∗ =


41
−1. Since aT x̃ = 20 > −1 = β , we need to solve the following 0,1-knapsack prob-
lem:

x1 + x2 + x3 → max,
4x1 + 5x2 + 2x3 ≤ 9,
x1 , x2 , x3 ∈ {0, 1}.

Its solution is x∗ = (1, 1, 0)T . As aT x∗ = 2 > −1 = β , we set x1 = (1, 1, 0)T and


m = 1.
Now we solve the following LP:
3 4 1
v1 + v2 + v3 − α → max,
4 5 2
v1 + v2 − α ≤ 0,
−1 ≤ v1 , v2 , v3 ≤ 1,
−1 ≤ α ≤ 1.

Its solution is v∗ = (−1, 0, 1)T , α ∗ = −1, and therefore we set a = (−1, 0, 1)T ,
β = −1. Since aT x̃ = − 41 > −1 = β , we need to solve the following 0,1-knapsack
problem:

−x1 + x3 → max,
4x1 + 5x2 + 2x3 ≤ 9,
x1 , x2 , x3 ∈ {0, 1}.

Its solution is x∗ = (0, 0, 1)T . As aT x∗ = 1 > −1 = β , we set x2 = (0, 0, 1)T and


m = 2.
Now we solve the following LP:
3 4 1
v1 + v2 + v3 − α → max,
4 5 2
v1 + v2 − α ≤ 0,
v3 − α ≤ 0,
4.7 Separation and Optimization 125

−1 ≤ v1 , v2 , v3 ≤ 1,
−1 ≤ α ≤ 1.

Its solution is v∗ = (0, 1, 1)T , α ∗ = 1, and therefore we set a = (0, 1, 1)T , β = 1.


13
Since aT x̃ = 10 > 1 = β , we need to solve the following 0,1-knapsack problem:

x2 + x3 → max,
4x1 + 5x2 + 2x3 ≤ 9,
x1 , x2 , x3 ∈ {0, 1}.

Its solution is x∗ = (0, 1, 1)T . As aT x∗ = 2 > 1 = β , we set x3 = (0, 1, 1)T and m = 3.


Now we solve the next LP:
3 4 1
v1 + v2 + v3 − α → max,
4 5 2
v1 + v2 − α ≤ 0,
v3 − α ≤ 0,
v2 + v3 − α ≤ 0,
−1 ≤ v1 , v2 , v3 ≤ 1,
−1 ≤ α ≤ 1.

Its solution is v∗ = (1, 0, 1)T , α ∗ = 1, and therefore we set a = (1, 0, 1)T , β = 1.


Since aT x̃ = 45 > 1 = β , we need to solve the following 0,1-knapsack problem:

x1 + x3 → max,
4x1 + 5x2 + 2x3 ≤ 9,
x1 , x2 , x3 ∈ {0, 1}.

Its solution is x∗ = (1, 0, 1)T . As aT x∗ = 2 > 1 = β , we set x4 = (1, 0, 1)T and m = 4.


Now we solve the following LP:
3 4 1
v1 + v2 + v3 − α → max,
4 5 2
v1 + v2 − α ≤ 0,
v3 − α ≤ 0,
v2 + v3 − α ≤ 0,
v1 + v3 − α ≤ 0,
−1 ≤ v1 , v2 , v3 ≤ 1,
−1 ≤ α ≤ 1.
T T
Its solution is v∗ = 21 , 21 , 12 , α ∗ = 1, and therefore we set a = 12 , 12 , 21 , β = 1.
Since aT x̃ = 41
40 > 1 = β , we need to solve the following 0,1-knapsack problem:
126 4 Cutting Planes

1 1 1
x1 + x2 + x3 → max,
2 2 2
4x1 + 5x2 + 2x3 ≤ 9,
x1 , x2 , x3 ∈ {0, 1}.

Its solution is x∗ = (1, 1, 0)T . Since aT x∗ = 1 = β , then the hyperplane

1 1 1
x1 + x2 + x3 = 1
2 2 2
separates x̃ from X2 , and the inequality x1 + x2 + x3 ≤ 2 is valid for X2 . t
u

4.8 Notes

The reviews [88], [144] and [39] are a good addition to the material of this and the
next chapters.
Sect. 4.2. Theorem 4.1 was proved by Chvátal [35] for the case when P(A, b) is a
polytope. Schrijver showed in [121] that this result is also valid for arbitrary poly-
hedra P(A, b). Historically, the first cutting plane algorithm for solving MIPs was
developed by Gomory [58, 60]. This algorithm uses the cuts described in Theo-
rem 4.2. In the general case, the separation problem for the Chvátal-Gomory in-
equalities is NP-hard in the strong sense [50]. Efficient separation procedures have
been proposed for totally tight cuts [86] (see Exercise 4.7), as well as for some
special polyhedra, provided that all components of the vector u are 0 or 1/2 [32].
The inequalities (4.4), known as the flower inequalities, were introduced in [48].
Sect. 4.3. Mixed integer rounding was studied in [98, 99]. Many known classes of
strong cuts for a series of structured mixed integer sets can be obtained by applying
mixed integer rounding [89].
Sect. 4.4. The fractional Gomory cuts were introduced in [59]. In the same paper it
was proved that the cutting plane algorithm based on these cuts solves any general
MIP in a finite number of steps, provided that the objective function takes integral
values on all feasible solutions of this MIP.
Sect. 4.5. The disjunctive principle was first implicitly present in the derivation of
the fractional Gomory cuts. The approach we outlined is based on the Balas char-
acterization of the disjunction of polyhedra [11]. The disjunction of polyhedra from
different spaces can be represented more efficiently [12] (see Exercise 4.12). In [85]
it is shown that many classes of facet defining inequalities for some well known
combinatorial optimization problems can be obtained by applying the disjunctive
technique.
Sect. 4.6. The lift-and-project procedure presented here was proposed in [13]. A
more powerful procedure, in which lifting is performed simultaneously for all binary
variables, was studied in [127]. An even more powerful lift-and-project procedure
4.9 Exercises 127

was proposed in [87], but here the separation procedure is reduced to solving a
semidefined programming problem.
Sect. 4.7. The equivalence of the optimization and separation problems was estab-
lished in [63] (see also [64, 122]).
Markowitz received the 1990 Nobel Prize in Economics for his portfolio opti-
mization model [90].
Sect. 4.9. The statements of exercises 4.2, 4.5, 4.6, 4.7, 4.12 are, respectively, taken
from [91], [31], [49], [86], [12].

4.9 Exercises

4.1. Let A be a real m × n-matrix, b ∈ Rm , S ⊆ {1, . . . , n}. Prove that if the mixed-
integer set P(A, b; S) is bounded, then its convex hull, conv(P(A, b; S)), is a polytope.
4.2. Let us consider a very simple generalization of the mixed-integer set X from
Lemma 4.1:

X̄ = {(x, y) ∈ Z × Rm : x − yi ≤ bi , i = 1, . . . , m}.

Prove that conv(X̄) is the solution set of the following system of inequalities:

x − yi ≤ bi , i = 1, . . . , m,
yi
x− ≤ bbc, i = 1, . . . , m,
1 − fi
yi ≥ 0, i = 1, . . . , m.

def
Here fi = bi − bbi c for i = 1, . . . , m.
def
4.3. Let A be a real m × n-matrix, and b ∈ Rm . Let f (r) = r − brc denote the frac-
tional part of r ∈ R. Assume that f (uT b) < 12 holds for all u ∈ Rm + , and t is a positive
1 T
integer such that 2 ≤ t f (u b) < 1. Let ū = ( f (t u1 ), . . . , f (t um )). Prove that, for the
set P(A, b)∩Zn+ , the cut būT Acx ≤ būT bc is not weaker than the cut buT Acx ≤ buT bc.
4.4. A function h : R → R is called superadditive, if

h(r1 ) + h(r2 ) ≤ h(r1 + r2 ) fot all r1 , r2 ∈ R.

Let β ∈ (0, 1) and f (r) = r − brc. Prove that the functions brc and

def max{0, f (r) − f (β )}


gβ (r) = brc +
1 − f (β )

are superadditive.
4.5. Prove the validity of the following generalization of the rounding principle: if
g is a non-decreasing superadditive function and g(0) = 0, then the inequality
128 4 Cutting Planes
n
∑ g(α j )x j ≤ g(β )
j=1

is valid for all non-negative integer solutions of the inequality


n
∑ α jx j ≤ β .
j=1

4.6. Prove the validity of the following analogue of Farkas’ lemma.


Theorem 4.8. Let A be a rational m × n-matrix, and b be a rational m-vector. A
linear system Ax = b has no integer solutions if and only if there exists a rational
m-vector y such that yT A is an integer-valued n-vector, but yT b is not integer.

4.7. Let A be a rational m × n-matrix, and b be a rational m-vector. A Chvátal-


Gomory cut ∑nj=1 buT A j cx ≤ buT bc that cuts off a point x̃ ∈ P(A, b) ⊆ Rn+ from
the set P(A, b) ∩ Zn is called totally tight if uT (b − Ax̃) = 0, i.e., this cut is derived
from the inequalities that are satisfied at x̃ as equalities. How can we use the result
of Theorem 4.8 to construct a separation procedure for the class of totally tight
Chvátal-Gomory cuts?
4.8. For the set

X = {x ∈ Zn+ : xi + x j ≤ 1, i, j = 1, . . . , n, i 6= j},

prove that ( )
n
conv(X) = x ∈ Rn+ : ∑ xj ≤ 1
j=1

and the Chvátal rank of the inequality ∑nj=1 x j ≤ 1 is O(n log n).
4.9. Prove that the Chvátal rank of the solution set of the following system

tx1 + x2 ≤ 1 + t,
−tx1 + x2 ≤ 1,
x1 ≤ 1,
x1 , x2 ≥ 0

is equal to t − 1 for t = 1, 2, . . . .
4.10. Apply the cutting plane algorithm that generates only Chvátal-Gomory cuts to
solve the following IPs:

a) x1 + x2 → max, b) x1 + x2 + x3 → max,
−x1 + x2 ≤ 1, 2x1 + 2x2 + x3 ≤ 6,
3x1 + 2x2 ≤ 4, x1 + x3 ≤ 2,
x1 , x2 ∈ Z+ ; x2 + x3 ≤ 2,
x1 , x2 , x3 ∈ Z+ .
4.9 Exercises 129

4.11. Apply the cutting plane algorithm that generates only fractional Gomory cuts
to solve the following MIPs:

a) x1 + 2x2 → max, b) x3 → max,


x1 + x2 ≤ 4, x1 + x2 + x3 ≤ 2,
−x1 + x2 ≤ 1, −x1 + x3 ≤ 0,
x1 ∈ Z+ , −x2 + x3 ≤ 0,
x2 ≥ 0; x1 , x2 ∈ Z+ ,
x3 ≥ 0.

4.12. For k = 1, 2, let Pk = {x ∈ [0, 1]nk : Ak x ≥ 1} be a monotone polyhedron, where


def
Ak = [akij ] is a non-negative real mk × nk -matrix (akij ≥ 0). Let Nk = {1, . . . , nk },
n o
def def
Mk = {1, . . . , mk }, and Fik = S ⊆ Nk : ∑ j∈Nk \S akij < 1 for i ∈ Mk .
a) Prove that the convex hull of the union (P1 × [0, 1]n2 ) ∪ ([0, 1]n1 × P2 ) is de-
scribed by the following system:

0 ≤ xkj ≤ 1, j = 1, . . . , nk , k = 1, 2,
a1i1 , j a2i2 , j
∑ x1j + ∑ x2j ≥ 1, (4.24)
j∈S1 1 − ∑ a1i1 ,l j∈S2 1 − ∑ a2i2 ,l
l∈N1 \S1 l∈N2 \S2

S1 ⊆ Fi11 , S2 ⊆ Fi22 , i1 ∈ M1 , i2 ∈ M2 .

def
b) For k ∈ {1, 2}, x ∈ [0, 1]nk and α ∈ [0, 1], let Sk (α, x) = { j ∈ Nk : x j ≤ α} and,
for i ∈ Mk , let
  
 
def
δik (x) = max x j : ∑ akil xl ≤ x j 1 − ∑ ail  ,
 k l∈S (x j ,x) k 
l∈Nk \S (x j ,x)

def akij x j
αik (x) = ∑ .
j∈Sk (δik (x),x)
1− ∑ akil
l∈Nk \Sk (δik (x),x)

Prove that a given point (x̃1 , x̃2 ) ∈ [0, 1]n1 × [0, 1]n2 satisfies all inequalities in (4.24)
if and only if αi11 (x̃1 ) + αi22 (x̃2 ) ≥ 1 for all pairs (i1 , i2 ) ∈ M1 × M2 . If for some
(i1 , i2 ) ∈ M1 × M2 , αi11 (x̃1 ) + αi22 (x̃2 ) < 1, then for S1 = S1 (δi11 (x̃1 ), x̃1 ) and S2 =
S2 (δi22 (x̃2 ), x̃2 ) the corresponding inequality in (4.12) is violated at (x̃1 , x̃2 ).
4.13. Elaborate separation procedures for the following sets:
a) (solution set of a convex quadratic constraint)

X1 = x ∈ Rn : xT Qx + 2cT x ≤ d ,


where Q is a real symmetric positive defined n × n-matrix, c ∈ Rn , d ∈ R;


130 4 Cutting Planes

b) (norm cone) X2 = {(x,t) ∈ Rn × R+ : kxk ≤ t}.


Hint. a) Use the substitution y = LT x + L−1 c, where Q = LLT and L is a non-
degenerate n × n-matrix. b) Given a point (x̃, t˜) such that kx̃k > t˜, compute its pro-
jection (x̂, tˆ) onto X2 , then the hyperplane passing through (x̂, tˆ) and orthogonal to
the vector (x̃ − x̂, t˜ − tˆ) separates (x̃, t˜) from X2 .
Chapter 5
Cuts for Structured Mixed-Integer Sets

Many optimization problems include various specific ”local” structures. For exam-
ple, very often, MIPs contain inequalities involving only binary variables, or, some-
times, a part of the problem constraints formulate a network flow problem. For such
local structures, we can get stronger inequalities using specific features of these
structures. It should be noted that there exist a great deal of such special structures.
In this chapter we consider a number of structural cuts for those mixed-integer sets
that are most often encountered in practical problems, and the separation procedures
for which have already been included in many modern MIP solvers.

5.1 Knapsack Inequalities

Constraints involving binary variables are so common in practice that they deserve
special attention. Consider a knapsack set of 0, 1-vectors
( )
n
K= x ∈ {0, 1}n : ∑ a jx j ≤ b (5.1)
j=1

def
when all coefficients, b and a j , are positive. A set C ⊆ N = {1, . . . , n} is called a
def
knapsack cover (or simply cover) if its excess λ (C) = ∑ j∈C a j − b is positive. Each
knapsack cover C defines the cover inequality:

∑ x j ≤ |C| − 1, (5.2)
j∈C

which is valid for K, and which implies that not all variables x j for j ∈ C can simul-
taneously be equal to 1.
A knapsack cover C is called minimal if a j ≥ λ (C) for all j ∈ C. If C is not a
minimal knapsack cover, then the cover inequality written for C is redundant, since
it is the sum of the inequalities

131
132 5 Cuts for Structured Mixed-Integer Sets

∑0 x j ≤ |C0 | − 1 and x j ≤ 1 for j ∈ C \C0 ,


j∈C

where C0 is a minimal cover contained in C.

5.1.1 Separation Problem For Cover Inequalities

For a given point x̃ ∈ [0, 1]n , we need to find a violated at this point inequality if
such one exists. Let us rewrite (5.2) in the following form:

∑ (1 − x j ) ≥ 1.
j∈C

Hence, it is clear that in order to solve the separation problem, we need to answer
the following question: is there a subset C ⊆ N such that

∑ aj > b and ∑ (1 − x̃ j ) < 1?


j∈C j∈C

This question can be rewritten also as follows:


( )
min
C⊆N
∑ (1 − x̃ j ) : ∑ a j > b < 1?
j∈C j∈C

Let us introduce an n-vector z of binary variables to represent any cover C, where


z j = 1 if j ∈ C, and z j = 0 if j ∈ N \C. Now to answer the above question, we have
to solve the following optimization problem:
n
∑ (1 − x̃ j )z j → min,
j=1
n
(5.3)
∑ a j z j > b,
j=1

z ∈ {0, 1}n .

Summarizing the arguments presented above, we formulate the following result.


Theorem 5.1. Let ξ ∗ an z∗ be the the optimal objective value and an optimal solu-
tion to (5.3). If ξ ∗ ≥ 1, then the point x̃ satisfies all the cover inequalities. If ξ ∗ < 1,
then for C∗ = { j ∈ N : z∗j = 1} the inequality ∑ j∈C∗ x j ≤ |C∗ | − 1 is violated at x̃ by
1 − ξ ∗.
We can solve (5.3), which is a 0,1-knapsack problem, of sufficiently large size
relatively quickly using the recurrence formula (1.30) (for this, (5.3) must first be
represented in the form (1.28) by substituting y j for 1 − z j ). But separation proce-
dures must be running very quickly, because they are called by the cutting plane
5.2 Lifting Inequalities 133

algorithms very often. In practice, (5.3) does not need to be solved to optimality,
instead, we can solve it approximately, but very quickly. Of course, solving (5.3)
approximately, we risk not finding a cover inequality violated at x̃, even if such one
exists.
Next we describe a simple efficient algorithm that solves (5.3) approximately.
1−x̃ j
1. List the ratios aj in non-decreasing order:
1 − x̃π(1) 1 − x̃π(2) 1 − x̃π(n)
≤ ≤ ... ≤ .
aπ(1) aπ(2) aπ(n)
2. Find a minimal index k such that ∑ki=1 aπ(i) > b.
3. Return C = {π(1), . . . , π(k)}.
The above algorithm is also known as the LP-heuristic because when all coeffi-
cients a j are integers, it constructs a solution that can be obtained by rounding up
the components of an optimal solution to the following LP:
( )
n n
min ∑ (1 − x̃ j ) z j : ∑ a j z j ≥ b + 1, z ∈ [0, 1]n .
j=1 j=1

5.2 Lifting Inequalities

The technique of lifting inequalities allows us to strengthen an inequality that is


valid for some subset of a given discrete set, and obtain an inequality that is valid
for the whole set. Its essence is as follows.
def
Theorem 5.2. Let X ⊆ {0, 1}n , X δ = {x ∈ X : xn = δ } for δ ∈ {0, 1} and the in-
equality
n−1
∑ a jx j ≤ b (5.4)
j=1

is valid for X δ .
(δ = 0, lifting up) If X 1 = 0,
/ then xn = 0 for all x ∈ X. If X 1 6= 0,
/ then the
inequality
n−1
∑ a j x j + αn xn ≤ b (5.5)
j=1

is valid for X when αn ≤ b − ζ 1 , where


( )
n−1
1 1
ζ = max ∑ a jx j : x∈X .
j=1

Moreover, if αn = b − ζ 1 and (5.4) is a facet defining inequality for conv(X 0 ), then


(5.5) defines a facet for conv(X).
134 5 Cuts for Structured Mixed-Integer Sets

(δ = 1, lifting down) If X 0 = 0,
/ then xn = 1 for all x ∈ X. If X 0 6= 0,
/ then the
inequality
n−1
∑ a j x j + γn xn ≤ b + γn (5.6)
j=1

is valid for X when γn ≥ ζ 0 − b, where


( )
n−1
ζ 0 = max ∑ a jx j : x ∈ X0 .
j=1

Moreover, if γn = ζ 0 − b and (5.4) is a facet defining inequality for conv(X 1 ), then


(5.6) defines a facet for conv(X).
Proof. We consider only the lifting up case when δ = 0. The lifting down case
when δ = 1 is considered in a similar way.
Let x̄ ∈ X. If x̄n = 0, then
n−1 n−1
∑ a j x̄ j + αn x̄n = ∑ a j x̄ j ≤ b.
j=1 j=1

If x̄n = 1, then, by definition of ζ 1 , we have


n−1 n−1
∑ a j x̄ j + αn x̄n = ∑ a j x̄ j + αn ≤ ζ 1 + αn ≤ b.
j=1 j=1

Suppose now that X 1 6= 0, / the dimension of the set X 0 is d, αn = b − ζ 1 and (5.4)


is a facet defining inequality for conv(X 0 ). Then X 0 contains d affine-independent
points x1 , . . . , xd for which (5.4) holds as an equality. Let ζ 1 = ∑n−1 d+1
j=1 a j x j for
x d+1 1 d+1
∈ X . Since xn = 1, then x d+1 does not lie in the affine subspace generated by
the points x1 , . . . , xd . Hence, we have d + 1 affine-independent point x1 , . . . , xd , xd+1
satisfying (5.5) as an equality. Therefore, by Proposition 1.4, (5.5) defines a facet
for conv(X). t
u

The lifting technique described in Theorem 5.2 can be applied successively: the
inequality obtained as a result of lifting one variable can be lifted further by another
variable. Next, we apply this technique to strengthen the cover inequalities.

5.2.1 Lifted Cover Inequalities

Let C ⊆ N = {1, . . . , n} be a knapsack cover for the set K defined by (5.1). We want
to strengthen the cover inequality ∑ j∈C x j ≤ |C| − 1, which is valid for K. We do this
using the lifting technique described in Theorem 5.2.
5.2 Lifting Inequalities 135

Consider a partition (C1 ,C2 ) of the set C (C1 ∪ C2 = C and C1 ∩ C2 = 0)


/ with
C1 6= 0.
/ The inequality
∑ x j ≤ |C1 | − 1 (5.7)
j∈C1

is valid for the set


K0 = {x ∈ K : x j = 1 for j ∈ C2 }.
Lifting (5.7) up by the variables x j for j ∈ N \ C, and down by the variables x j for
j ∈ C2 , we can obtain a lifted cover inequality (LCI)

∑ xj + ∑ α j x j ≤ |C1 | − 1 + ∑ αj (5.8)
j∈C1 j∈N\C1 j∈C2

that is valid for K. For j ∈ N \C1 , the coefficients α j depend on the order in which
the variables x j are lifted. Let j1 , . . . , jk be some ordering of the elements of the
set N \ C1 , γ = |C1 | − 1, β = b − ∑ j∈C2 a j , and χ C2 ∈ {0, 1}N be the characteristic
vector of C2 (χiC2 = 1 if i ∈ C2 , and χiC2 = 0 if i ∈ N \C2 ). The coefficients α ji can
be calculated in series as follows.
Suppose that we have already obtained the inequality
r−1
∑ x j + ∑ α ji x ji ≤ γ
j∈C1 i=1

that is valid for the set Kr−1 , where


def
Ks = {x ∈ K : x j = 1, j ∈ C2 \ { j1 , . . . , js }}.

We calculate the coefficient α jr for the next variable x jr to guarantee that the result-
ing inequality must be valid for the set Kr . By virtue of Theorem 5.2, it is necessary
to solve the following 0,1-knapsack problem
r−1
∑ x j + ∑ α ji x ji → max,
j∈C1 i=1
r−1 C2 (5.9)
∑ a j x j + ∑ a ji x ji ≤ β + (−1)(1−χ jr ) a jr ,
j∈C1 i=1

x j ∈ {0, 1}, j ∈ C1 ∪ { j1 , . . . , jr−1 }.

Letting ξr denote the optimal objective value in (5.9), we set

α jr = γ − ξr , if jr ∈ N \C;
α jr = ξr − γ, γ = ξr , β := β + a jr , if jr ∈ C2 .
136 5 Cuts for Structured Mixed-Integer Sets

If (5.9) is solved using the recurrence formula (1.31), the above lifting procedure
runs in polynomial time. Let us also note again that different orderings of the set
N \C1 result in different lifted cover inequalities.

Example 5.1 Consider the knapsack set


n o
K 1 = x ∈ {0, 1}6 : 5x1 + 6x2 + 4x3 + 6x4 + 3x5 + 8x6 ≤ 16

and the point x̃ = 0, 23 , 43 , 1, 1, 0 . We need to find a lifted cover inequality that




separates x̃ from K 1 .

Solution. First, let us write down (5.3) applied to our knapsack set K 1 :
1 1
z1 + z2 + z3 + 0z4 + 0z5 + z6 → min,
3 4
5z1 + 6z2 + 4z3 + 6z4 + 3z5 + 8z6 > 16,
z1 , z2 , z3 , z4 , z5 , z6 ∈ {0, 1}.

To find a knapsack cover, we apply the LP-heuristic:


1−x̃ j
1. Sorting the numbers 51 , 18
1 1
, 16 , 0, 0, 81 , which are the ratios a j , in non-decreasing
order, we get the permutation π = (4, 5, 2, 3, 6, 1).
2. Now we decide that k = 4 and C = {4, 5, 2, 3}.
Since ∑ j∈C (1 − x̄ j ) = 0 + 0 + 31 + 14 = 12
7
< 1, we have found the cover inequality
violated at x̃: x2 + x3 + x4 + x5 ≤ 3.
Next let us lift this cover inequality for the partition C1 = C, C2 = 0,/ and the
lifting sequence j1 = 1, j2 = 6. We set β = b = 16, γ = |C1 | − 1 = 3. Note that when
C2 = 0,/ the values of β and γ remain constant during the execution of the lifting
algorithm.
To calculate α1 , we solve the following 0,1-knapsack problem:

x2 + x3 + x4 + x5 → max,
6x2 + 4x3 + 6x4 + 3x5 ≤ 16 − 5 = 11,
x2 , x3 , x4 , x5 ∈ {0, 1}.

An optimal solution to this problem is given by x2 = x3 = 1, x4 = x5 = 0, and the


optimal objective value is ξ1 = 2. Hence, α1 = γ − ξ1 = 3 − 2 = 1.
Similarly, to find α6 , we solve the next 0,1-knapsack problem:

x1 + x2 + x3 + x4 + x5 → max,
5x1 + 6x2 + 4x3 + 6x4 + 3x5 ≤ 16 − 8 = 9,
x1 , x2 , x3 , x4 , x5 ∈ {0, 1}.

Its optimal solution is determined by x1 = x5 = 1, x2 = x3 = x4 = 0, and the optimal


objective value is ξ2 = 2. Hence, α6 = γ − ξ2 = 3 − 2 = 1.
5.2 Lifting Inequalities 137

Now we can write the lifted cover inequality

x1 + x2 + x3 + x4 + x5 + x6 ≤ 3

that separates x̃ from K 1 . t


u

Example 5.2 We need to find a lifted cover inequality that separates the point

x̃ = (0, 0.4, 0.5, 0.5, 0.7, 1.0)

from the knapsack set


n o
K 2 = x ∈ {0, 1}6 : 13x1 + 7x2 + 6x3 + 5x4 + 3x5 + 10x6 ≤ 22 .

Solution. Let us first try to find a maximally violated cover inequality by solving
the following 0,1-knapsack problem:

z1 + 0.6z2 + 0.5z3 + 0.5z4 + 0.3z5 + 0z6 → min,


13z1 + 7z2 + 6z3 + 5z4 + 3z5 + 10z6 > 22,
z1 , z2 , z3 , z4 , z5 , z6 ∈ {0, 1}.

Its optimal solution is z∗ = (1, 0, 0, 0, 0, 1), and the optimal objective value is ξ ∗ = 1.
Therefore, by Theorem 5.1, the point x̃ satisfies all cover inequalities valid for K 2 . In
particular, x̃ satisfies the cover inequality x1 +x2 ≤ 1 written for the cover C = {1, 6}
determined by z∗ . Moreover, one can easily verify that, regardless of the choice of
the partition (C1 ,C2 ) and regardless of the ordering of the set N \C1 , all coefficients
α j calculated by the lifting procedure are zeros. As a result, we have the inequality
x1 + x6 ≤ 1, which is not violated at x̃.
Nevertheless, there is still a lifted cover inequality that separates x̃ from K 2 . We
will find such an inequality later in the continuation of this example. t
u

Example 5.2 shows that if we are going to solve the separation problem for the
class of lifted cover inequalities, the choice of the most violated cover inequality
as the starting one is not always justified. This example motivates the use of the
following heuristic procedure for finding an initial knapsack cover C.
def
For δ ∈ {0, 1}, we define N δ = { j ∈ N : x̃ j = δ }, and then compute b̄ = b −
∑ j∈N 1 a j . Next, sorting the values x̃ j for j ∈ N 2 = N \ (N 0 ∪ N 1 ) in non-increasing
order,
x̃ j1 ≥ x̃ j2 ≥ · · · ≥ x̃ jk ,
we find an index r such that
r−1 r
∑ a ji ≤ b̄ and ∑ a ji > b̄.
i=1 i=1
138 5 Cuts for Structured Mixed-Integer Sets

Then we set C1 = { j1 , . . . , jr }, C2 = N 1 and C = C1 ∪ C2 . To lift the inequality


∑ j∈C1 x j ≤ |C1 | − 1, let us order the elements of the set N \C1 in such a way that the
elements j ∈ N 2 are written first and they are listed in non-increasing order of values
x̃ j , then the elements from N 1 must follow, and the elements from N 0 are written at
the very end.

Example 5.3 (continuation of Example 5.2) We need to apply the above proce-
dure for separating the point x̃ from the set K 2 .
Solution. First, we write the partition of N = {1, 2, 3, 4, 5, 6}: N 0 = {1}, N 1 = {6}
and N 2 = N \ (N 0 ∪ N 1 ) = {2, 3, 4, 5}. Next we sort the components x̃ j for j ∈ N 2 :

x̃5 = 0.7 > x̃3 = 0.5 = x̃4 = 0.5 > x̃2 = 0.3.

Since b̄ = b−a6 = 22−10 = 12, a5 +a3 = 3+6 < 12 and a5 +a3 +a4 = 3+6+5 >
12, then r = 3 and C1 = {3, 4, 5}, C2 = N 1 = {6}, C = C1 ∪C2 = {3, 4, 5, 6}. Listing
the elements of N \C1 in the order of (2, 6, 1), we will lift the inequality

x3 + x4 + x5 ≤ 2.

Note that this inequality is not valid for K 2 . It is valid only for the set {x ∈ K 2 : x6 =
1}. Initially, we set β = b − a6 = 22 − 10 = 12, γ = 2.
To compute α2 , we solve the following 0,1-knapsack problem:

x3 + x4 + x5 → max,
6x3 + 5x4 + 3x5 ≤ 12 − 7 = 5,
x3 , x4 , x5 ∈ {0, 1}.

An optimal solution to this problem is given by x3 = x5 = 0, x4 = 1, and the optimal


objective value is ξ1 = 1. Therefore, α2 = γ − ξ1 = 2 − 1 = 1.
To compute α6 , we solve the next 0,1-knapsack problem:

ξ2 = x2 + x3 + x4 + x5 → max,
7x2 + 6x3 + 5x4 + 3x5 ≤ 12 + 10 = 22,
x2 , x3 , x4 , x5 ∈ {0, 1}.

Its optimal solution is given by x2 = x3 = x4 = x5 = 1, and the optimal objective


value is ξ2 = 4. Therefore, α2 = ξ2 − γ = 4 − 2 = 2. We also set β := β + a6 =
12 + 10 = 22, γ = ξ2 = 4.
Now we compute the last coefficient α1 solving the following 0,1-knapsack prob-
lem:

ξ3 = x2 + x3 + x4 + x5 + 2x6 → max,
7x2 + 6x3 + 5x4 + 3x5 + 10x6 ≤ 22 − 13 = 9,
x2 , x3 , x4 , x5 , x6 ∈ {0, 1}.
5.2 Lifting Inequalities 139

An optimal solution to this problem is determined by x2 = x3 = x6 = 0, x4 = x5 = 1,


and the optimal objective value is ξ3 = 2. Therefore, α1 = γ − ξ3 = 4 − 2 = 2.
Now all three coefficients are calculated, and we can write down the resulting
lifted cover inequality

2x1 + x2 + x3 + x4 + x5 + 2x6 ≤ 4,

which separates x̃ from K 2 since it is violated at x̃ by 0.1. t


u

5.2.2 Lifting Feasible Set Inequalities

The lifting technique presented in Theorem 5.2 can be used to strengthen any in-
equalities involving only binary variables. Let us again consider the knapsack set K
defined by (5.1). If a set S ⊂ N is not a knapsack cover (it is called a feasible set),
then ∑ j∈S a j ≤ b, and, for any vector w ∈ ZS++ , the inequality

∑ w jx j ≤ ∑ w j
j∈S j∈S

is valid for K. If we lift this trivial inequality, sometimes we can get a much stronger
inequality. Let us demonstrate this on an example.
Example 5.4 Consider the knapsack set
n o
K 3 = x ∈ {0, 1}6 : 3x1 + 4x2 + 6x3 + 7x4 + 9x5 + 18x6 ≤ 21 .

Choosing S = {1, 2, 3, 4} and w1 = w2 = w3 = w4 = 1, we write the inequality

x1 + x2 + x3 + x4 ≤ 4,

which, clearly, is valid for K 3 . We need to strengthen this inequality lifting the vari-
ables in the following order: j1 = 5, j2 = 6.
Solution. We have γ = 4 and β = 21. To compute α5 , we solve the following
0,1-knapsack problem:

x1 + x2 + x3 + x4 → max,
3x1 + 4x2 + 6x3 + 7x4 ≤ 21 − 9 = 12,
x1 , x2 , x3 , x4 ∈ {0, 1}.

Its optimal solution is given by x1 = x2 = 1, x3 = x4 = 0, and the optimal objective


value is ξ1 = 2. Therefore, α5 = γ − ξ1 = 4 − 2 = 2.
Next, we calculate α6 solving the problem
140 5 Cuts for Structured Mixed-Integer Sets

x1 + x2 + x3 + x4 + 2x5 → max,
3x1 + 4x2 + 6x3 + 7x4 + 9x5 ≤ 21 − 18 = 3,
x1 , x2 , x3 , x4 , x5 ∈ {0, 1}.

An optimal solution to this problem is determined by x1 = 1, x2 = x3 = x4 = x5 = 0,


and the optimal objective value is ξ2 = 1. Hence, α6 = γ − ξ2 = 4 − 1 = 3 and the
resulting inequality
x1 + x2 + x3 + x4 + 2x5 + 3x6 ≤ 4
is valid for K 3 . t
u

5.3 Mixed Knapsack Sets

Here, on a simple example, we demonstrate a typical application of the mixed in-


teger rounding (see Sect. 4.3) to generate cuts for simple mixed integer sets. Then
we discuss an alternative way of lifting inequalities, when the values of missing
coefficients do not depend on the order of their calculation.
Consider the set
( )
n
X BK+1 = (x, y) ∈ {0, 1}n × R : ∑ a jx j − y ≤ b ,
j=1

where a j ≥ 0 for j = 1, . . . , n. A subset C ⊆ {1, . . . , n} is a (mixed knapsack) cover


for X BK+1 , if
def
1) λ = λ (C) = ∑ j∈C a j − b > 0 and
2) ak > λ for k ∈ arg max j∈C a j .
Theorem 5.3. The following inequality

∑ min{a j , λ }x j − y ≤ ∑ min{a j , λ } − λ (5.10)


j∈C j∈C

is valid for X BK+1 .


Proof. As all a j are non-negative, then X BK+1 is a subset of of the set
( )
X̃ BK+1 = (x, y) ∈ {0, 1}n × R : ∑ a jx j − y ≤ b ,
j∈C

and, if (5.10) is valid for X̃ BK+1 , then it is valid also for X BK+1 .
After the substitution x̄ j for 1 − x j , the inequality

∑ a jx j − y ≤ b
j∈C
5.3 Mixed Knapsack Sets 141

takes the form


y + ∑ a j x̄ j ≥ λ .
j∈C

Dividing this inequality by ak , we obtain the inequality

aj 1 λ
∑ − ak x̄ j − ak y ≤ − ak , (5.11)
j∈C

to which we apply Theorem 4.3. Taking into account that ak > λ and ak ≥ a j for all
j ∈ C, we calculate

λ ak − λ
f =− − (−1) = ,
ak ak
aj ak − a j
f j = − − (−1) = ,
ak ak
fj − f λ −aj aj
= = 1− .
1− f λ λ
Applying (4.7) to (5.11), we obtain the inequality
1
− ∑ min{1, a j /λ }x̄ j − y ≤ −1,
j∈C λ

which, after multiplication by λ and the inverse substitution 1−x j for x̄ j , transforms
into (5.10). t
u

Example 5.5 We need to separate the point (x̃, ỹ) = 0, 1, 0, 1, 1, 34 , 21 , 0, 0 from the


mixed integer set

X = {(x, y) ∈ {0, 1}6 × R3+ :


5x1 + x2 + 3x3 + 2x4 + x5 + 8x6 − 2y1 − y2 + 3y3 ≤ 9}.

Solution. We should not be confused by the fact that there are more than one real
variable here. Since y3 ≥ 0, the set X is contained in the set

X 0 = {(x, y) ∈ {0, 1}6 × R2+ : 5x1 + x2 + 3x3 + 2x4 + x5 + 8x6 − 2y1 − y2 ≤ 9},

and any inequality valid for X 0 will be also valid for X. Further, after the substitution
s for 2y1 + y2 , the set X 0 transforms into the set

X̄ = {(x, s) ∈ {0, 1}6 × R+ : 5x1 + x2 + 3x3 + 2x4 + x5 + 8x6 − s ≤ 9}.

Let us start with the cover C = {2, 4, 5, 6} that is produced by the LP heuristic
applying to x̃ and the knapsack set obtained from X̄ by dropping the variable s. Since
λ = 1 + 2 + 1 + 8 − 9 = 3, by Theorem 5.3, the inequality
142 5 Cuts for Structured Mixed-Integer Sets

x2 + 2x4 + x5 + 3x6 − s ≤ 1 + 2 + 1 + 3 − 3 = 4

is valid for X̄. Therefore, the inequality

x2 + 2x4 + x5 + 3x6 − 2y1 − y2 ≤ 4

is valid for X, but at (x̃, ỹ) it is violated by 14 . t


u

5.3.1 Sequence Independent Lifting

We can strengthen (5.10) by calculating the coefficients of the variables x j for j 6∈ C


using a lifting technique, which differs from that discussed in Sect. 5.2. Define the
lifting function
(
φC (u) = min y + ∑ min{a j , λ }(1 − x j ) − λ :
j∈C
)
∑ a j x j − y ≤ b − u, y ∈ R, x j ∈ {0, 1} for j ∈ C .
j∈C

def
Let Cλ = { j ∈ C : a j > λ } and let r = |Cλ | > 0. List the elements of Cλ in order of
j
their weights, aπ1 ≥ aπ2 ≥ . . . aπr > λ , and then set A j = ∑i=1 aπi for i = 1, . . . , r. It
is easy to see that

λ ( j − 1)
 if A j−1 ≤ u ≤ A j − λ ,
φC (u) = λ ( j − 1) + u − (A j − λ ) if A j−1 − λ ≤ u ≤ A j ,

λ (r − 1) + u − (Ar − λ ) if u ≥ Ar − λ ,

and, therefore, φC is a superadditive function on R+ :

φC (u) + φC (v) ≤ φC (u + v) for all u, v ∈ R.

The following theorem establishes the role of superadditivity in the sequence in-
dependent lifting, i.e, when the order of lifting variables does not affect the lifting
coefficients.

Theorem 5.4. The following inequality

∑ min{a j , λ }x j + ∑ φC (a j )x j − y ≤ ∑ min{a j , λ } − λ (5.12)


j∈C j∈{1,...,n}\C j∈C

is valid for X BK+1 .


5.4 Simple Flow Structures 143

Proof. It is necessary to prove that any point (x∗ , y∗ ) ∈ X BK+1 satisfies (5.12). Let
Q = { j 6∈ C : x∗j = 1}. As φC is superadditive, we have

∑ min{a j , λ }(1 − x∗j ) − ∑ φC (a j )x∗j + y∗ − λ


j∈C j∈{1,...,n}\C

=y + ∑ min{a j , λ }(1 − x∗j ) − λ − ∑ φC (a j )
j∈C j∈Q
(
≥ min y + ∑ min{a j , λ }(1 − x j ) − λ :
j∈C
)
∑ a j x j − y ≤ b − ∑ a j , y ∈ R, x j ∈ {0, 1} for j ∈ C − ∑ φC (a j )
j∈C j∈Q j∈Q
!
= φC ∑ aj − ∑ φC (a j ) ≥ 0. t
u
j∈Q j∈Q

5.4 Simple Flow Structures

Let us consider a mixed-integer set

X = {(x, y) ∈ Rn+ × {0, 1}n : ∑ x j − ∑ x j ≤ b, x j ≤ u j y j for j = 1, . . . , n},


j∈N1 j∈N2

where b ∈ R, u ∈ Rn+ , N1 ∪ N2 = {1, . . . , n} and N1 ∩ N2 = 0.


/ Note, that IP (1.23),
which is a formulation of the FCNF problem, for each of |V | balance equations,
includes two such sets:
E(v)
Xv+ = {(x, y) ∈ R+ × {0, 1}E(v) : ∑ xe − ∑ xe ≤ dv ,
e∈E(V,v) j∈E(v,V )

xe ≤ ue ye , e ∈ E(v)},
E(v)
Xv− = {(x, y) ∈ R+ × {0, 1}E(v) : ∑ xe − ∑ xe ≤ −dv ,
e∈E(v,V ) j∈E(V,v)

xe ≤ ue ye , e ∈ E(v)}.

def
Here E(v) = E(v,V ) ∪ E(V, v).
A pair (C1 ,C2 ), where C1 ⊆ N1 and C2 ⊆ N2 , is called a generalized cover for the
set X if
∑ u j − ∑ u j = b + λ (C1 ,C2 ),
j∈C1 j∈C2

where λ (C1 ,C2 ) > 0 is the excess of the generalized cover.


Theorem 5.5. Let (C1 ,C2 ) be a generalized cover, λ = λ (C1 ,C2 ) and L2 ⊆ N2 \C2 .
Then the flow cover inequality
144 5 Cuts for Structured Mixed-Integer Sets

∑ xj − ∑ xj +
j∈C1 j∈N2 \(C2 ∪L2 )
(5.13)
∑ max{0, u j − λ } · (1 − y j ) − λ ∑ y j ≤ b + ∑ u j
j∈C1 j∈L2 j∈C2

is valid for X.

Proof. We have to prove that any point (x, y) ∈ X satisfies (5.13). Let C1+ = { j ∈
C1 : u j > λ } and T = { j ∈ N1 ∪ N2 : y j = 1}. Consider two cases.
1. |C1+ \ T | + |L2 ∩ T | = 0.

∑ x j + ∑ max{0, u j − λ } · (1 − y j )
j∈C1 j∈C1

= ∑ xj + ∑ (u j − λ )
j∈C1 ∩T j∈C1+ \T

= ∑ xj (since C1+ \ T = 0)
/
j∈C1 ∩T

≤ ∑ xj (since x j ≥ 0)
j∈N1

≤b+ ∑ xj (by definition of X)


j∈N2

=b+ ∑ xj + ∑ xj + ∑ xj
j∈C2 j∈L2 ∩T j∈N2 \(C2 ∪L2 )

≤b+ ∑ uj +0+ ∑ xj (since L2 ∩ T = 0)


/
j∈C2 j∈N2 \(C2 ∪L2 )

≤b+ ∑ uj +λ ∑ yj + ∑ xj .
j∈C2 j∈L2 j∈N2 \(C2 ∪L2 )

2. |C1+ \ T | + |L2 ∩ T | ≥ 1.

∑ x j + ∑ max{0, u j − λ } · (1 − y j )
j∈C1 j∈C1

= ∑ xj + ∑ (u j − λ )
j∈C1 ∩T j∈C1+ \T

≤ ∑ u j − |C1+ \ T | · λ (since x j ≤ u j )
j∈C1

≤ ∑ u j − λ + λ · |L2 ∩ T | (since − |C1+ \ T | ≤ −1 + |L2 ∩ T |)


j∈C1

=b+ ∑ uj +λ ∑ yj ≤ b+ ∑ uj +λ ∑ yj + ∑ xj . t
u
j∈C2 j∈L2 j∈C2 j∈L2 j∈N2 \(C2 ∪L2 )

Inequality (5.13) cuts off from conv(X) a number of vertices of the relaxation
polyhedron
5.4 Simple Flow Structures 145

{(x, y) ∈ Rn+ × [0, 1]n : ∑ x j − ∑ x j ≤ b, x j ≤ u j y j for j = 1, . . . , n}.


j∈N1 j∈N2

These vertices, (x̄, ȳ), are built as follows:


• x̄k = uk − λ , ȳk = (uk − λ )/uk for some k ∈ C1 with λ < uk , or x̄k = λ , ȳk = λ /uk
for some k ∈ L2 with λ < uk ;
• x̄ j = u j , ȳ j = 1 for j ∈ (C1 ∪C2 ) \ {k};
• x̄ j = 0, ȳ j = 0 ∨ 1 for j ∈ (N1 ∪ N2 ) \ (C1 ∪C2 ∪ {k}).

5.4.1 Separation for Flow Cover Inequalities

If we assume that L2 = N2 \C2 , x j = u j y j and u j > λ for all j ∈ C1 , then (5.13) takes
the form

∑ u j y j + ∑ (u j − λ ) (1 − y j ) ≤ b + ∑ u j + λ ∑ yj ,
j∈C1 j∈C1 j∈C2 j∈N2 \C2

or, after simplifications,

∑ u j − λ ∑ (1 − y j ) ≤ b + ∑ u j + λ ∑ yj .
j∈C1 j∈C1 j∈C2 j∈N2 \C2

Substituting λ for b − ∑ j∈C1 u j + ∑ j∈C2 u j and then dividing the result by λ , we have

∑ (1 − y j ) + ∑ yj ≥ 1
j∈C1 j∈N2 \C2

or
∑ (1 − y j ) − ∑ y j ≥ 1 − ∑ y j .
j∈C1 j∈C2 j∈N2

The last inequality is nothing else as a cover inequality for the knapsack set
( )
y ∈ {0, 1}n : ∑ u jy j − ∑ u jy j ≤ b ,
j∈N1 j∈N2

that was written for the cover C = C1 ∪C2 subject to the condition

∑ uj − ∑ uj = b+λ and λ > 0.


j∈C1 j∈C2

Based on the above reasoning, we can write down the following heuristic that
separates a given point (x̃, ỹ) from the set X by a flow cover inequality.
1. Solve the following 0,1-knapsack problem:
146 5 Cuts for Structured Mixed-Integer Sets

∑ (1 − ỹ j )z j − ∑ ỹ j z j → min,
j∈N1 j∈N2

∑ u j z j − ∑ u j z j > b,
j∈N1 j∈N2

z j ∈ {0, 1}, j ∈ N1 ∪ N2 .

Let z∗ be an optimal solution to this problem, and let C = { j ∈ N1 ∪ N2 : z∗j = 1}.


2. If, for C1 = N1 ∩C, C2 = N2 ∩C and L2 = { j ∈ N2 \C2 : λ ỹ j < x̃ j }, Ineq. (5.13)
is violated at (x̃, ỹ), then it is a required cut.
Example 5.6 We need to separate the point (x∗ , y∗ ) with

x∗ = (3, 0, 2, 1, 0, 0),
y∗ = (y∗2 , y∗3 , y∗4 , y∗5 , y∗6 ) = (0, 2/3, 1, 0, 0)

from the set

X = {(x, y) ∈ R6+ × {0, 1}{2,3,4,5,6} :


x1 + x2 + 2x3 − 3x4 − 2x5 − x6 ≤ 4,
x1 ≤ 3, x2 ≤ 3y2 , x3 ≤ 3y3 , x4 ≤ y4 , x5 ≤ 2y5 , x6 ≤ y6 }.

Solution. Introducing a new binary variable y1 and changing the variables

x¯1 = x1 , x¯2 = x2 , x¯3 = 2x3 , x¯4 = 3x4 , x¯5 = 2x5 , x̄6 = x6 ,

we map the set X onto the intersection of the set

X̄ ={(x̄, y) ∈ R6+ × {0, 1}6 :


x̄1 + x̄2 + x̄3 − x̄4 − x̄5 − x̄6 ≤ 4,
x̄1 ≤ 3y1 , x̄2 ≤ 3y2 , x̄3 ≤ 6y3 , x̄4 ≤ 3y4 , x̄5 ≤ 4y5 , x̄6 ≤ y6 }

and the hyperplane given by the equation y1 = 1. In addition, the point (x∗ , y∗ ) is
mapped into the point (x0 , y0 ) with

x0 = (3, 0, 4, 3, 0, 0), y0 = (1, 0, 2/3, 1, 0, 0).

Next, we try to separate (x0 , y0 ) from X̄. Therefore, we solve the following 0,1-
knapsack problem:
1
z3 − z4 → min,
3
3z1 + 3z2 + 6z3 − 3z4 − 4z5 − z6 > 4, (5.14)
z1 , z2 , z3 ,z4 , z5 , z6 ∈ {0, 1}.

We should not be confused by the fact that this problem is not quite similar to the
standard 0, 1-knapsack problem (1.28). But after the change of variables
5.5 Generalized Upper Bounds 147

z̄1 = 1 − z1 , z̄2 = 1 − z2 , z̄3 = 1 − z3 ,


z̄4 = z4 , z̄5 = z5 , z̄6 = z6 ,

(5.14) is rewritten as
1
1 − z̄2 + (1 − z̄3 ) − z̄4 → min,
3
3(1 − z̄1 ) + 3(1 − z̄2 ) + 6(1 − z̄3 ) − 3z̄4 − 4z̄5 − z̄6 ≥ 5,
z̄1 , z̄2 , z̄3 , z̄4 , z̄5 , z̄6 ∈ {0, 1},

or, after rearranging,


1
z̄2 + z̄3 + z̄4 → max,
3
3z̄1 + 3z̄2 + 6z̄3 + 3z̄4 + 4z̄5 + z̄6 ≤ 7,
z̄1 , z̄2 , z̄3 ,z̄4 , z̄5 , z̄6 ∈ {0, 1}.

The point z̄∗ = (0, 1, 0, 1, 0, 0) is an optimal solution to this program. Returning to


the original variables, we obtain an optimal solution z∗ = (1, 0, 1, 1, 0, 0) to (5.14).
The point z∗ defines the generalized cover C1 = {1, 3}, C2 = {4} with the excess
λ = 2. Since L2 = { j ∈ {5, 6} : λ y0j < x0j } = 0,
/ we get the inequality

x̄1 + x̄3 − x̄5 − x̄6 + 1 · (1 − y1 ) + 4 · (1 − y3 ) ≤ 4 + 3.

Returning to the original variables x j , and setting y1 = 1, we obtain the inequality

x1 + 2x3 − 2x5 − x6 − 4y3 ≤ 3,

which is valid for X, but is violated at (x∗ , y∗ ) by 34 . t


u

5.5 Generalized Upper Bounds

A generalized upper bound (GUB) is an inequality ∑ j∈C x j ≤ 1 that involves only


binary variables x j . In practice, such restrictions are very common. For example,
let us recall the set-packing problem (2.1) and the representation of discrete vari-
ables (1.2).
Let A be a 0,1-matrix of size m × n, and e be a vector of m ones. The convex hull
of the binary set {x ∈ {0, 1}n : Ax ≤ e} is called a packing polytope. The intersection
def
graph of the 0,1-matrix A, which is denoted by G(A), has N = {1, . . . , n} as the set
of vertices, and a pair of vertices, (i, j), is an edge in G(A) if the columns Ai and
A j have one in the same row. Using G(A), one can derive a number of classes of
inequalities that are valid for the packing polytope.
148 5 Cuts for Structured Mixed-Integer Sets

5.5.1 Clique Inequalities

Let C ⊆ N be the set of vertices of some clique in G(A) (any two vertices from C
are connected by an edge in G(A)). Then the clique inequality

∑ xj ≤ 1
j∈C

is valid for the packing polytope. Note that in this way we can obtain new inequali-
ties not present in the original system. For example, for the system

x1 + x2 ≤ 1, x2 + x3 ≤ 1, x3 + x1 ≤ 1,

the set {1, 2, 3} is a clique in G(A), and the click inequality x1 + x2 + x3 ≤ 1 does
not belong to the system.
The separation problem for the class of clique inequalities is formulated as
follows: given a point x̃ ∈ [0, 1]n , find in G(A) a clique C∗ of maximum weight
w = ∑ j∈C∗ x̃ j . If w > 1, the clique inequality ∑ j∈C∗ x j ≤ 1 is violated at x̃; otherwise,
x̃ satisfies all the clique inequalities.
It is known that this separation problem for the clique inequalities is NP-hard.
Surprisingly, there is a wider class of inequalities that includes all clique inequal-
ities and for which there is a polynomial separation algorithm. Unfortunately, this
polynomial algorithm is too time consuming to be used in practice. Therefore, in
practice, the separation problem for the clique inequalities is solved with the help of
several heuristic procedures. One of them was specially developed for the case when
x̃ is a solution (or only a part of solution) to the relaxation LP of a MIP containing a
system of binary inequalities, Ax ≤ e, among its constraints.
Suppose that one of the inequalities from the system Ax ≤ e is fulfilled as an
equality. Let this be an inequality i0 and let C = { j : ai0 , j = 1, x̃ j > 0}. First, we
sort the components x̃ j for j ∈ N \ C in non-increasing order. Let j1 , . . . , jk be a
required order. Then for i = 1, . . . , k, if C ∪ { ji } is a clique in G(A), add ji to C. If
we can thus add at least one index ji with x ji > 0, then as a result we get a clique
inequality ∑ j∈C x j ≤ 1 that is violated at x̃.

5.5.2 Odd Cycle Inequalities

A list ( j1 , . . . , jk , jk+1 = j1 ) of vertices is a cycle of length k in G(A) if any of


its two neighboring vertices are adjacent (connected by an edge). The vertex set,
C = { j1 , . . . , jk }, of an odd cycle (k is odd) in G(A) induces the odd cycle inequality

|C| − 1
∑ xj ≤ 2
, (5.15)
j∈C
5.5 Generalized Upper Bounds 149

which is valid for the packing polytope. In fact, (5.15) is a Chvátal-Gomory cut
since we can derive it by adding together the inequalities

x j1 + x jk ≤ 1 and x ji + x ji+1 ≤ 1, i = 1, . . . , k − 1,

dividing the result by 2, and rounding down the right-hand side.


For the class of odd hole inequalities, there exists a polynomial separation
procedure. We define a bipartite digraph H with the vertex set N ∪ N 0 , where
N 0 = {n + 1, . . . , 2n}, and, for each edge (i, j) from G(A), H has two arcs, (i, n + j)
and (n+i, j) of weight 1−2x̃i , and two arcs, ( j, n+i) and (n+ j, i) of weight 1−2x̃ j .
It is not hard to see that a shortest path in H from a vertex i to the vertex n + i (if
such a path exists) corresponds to the shortest odd cycle in G(A) passing through i.
We denote by C∗ the set of vertices of this cycle. If ∑ j∈C∗ (1 − 2x̃ j ) is less than 1,
∗ ∗
then ∑ j∈C∗ x̃ j > |C 2|−1 , and the odd cycle inequality ∑ j∈C∗ x j ≤ |C 2|−1 is violated at
x̃.
In conclusion, we note that, since H has no cycles of negative weight, then a
shortest path between any pair of its vertices can be found in polynomial time.

5.5.3 Conflict Graphs

A conflict graph represents logical dependencies between binary variables of a MIP.


Let us consider a solution set X = P(A, b; S) of a MIP involving binary variables.
The conflict graph GX for the set X contains two vertices, j0 and j1 , for each binary
variable x j . For α, β ∈ {0, 1}, the edge (iα , jβ ) belongs to the conflict graph if the
set {x ∈ X : xi = α, x j = β } is empty.
For a given MIP, the conflict graph subsumes the intersection graph, i.e., any edge
of the intersection graph is also an edge of the conflict graph. The conflict graphs
can be used for generating cuts in the same way as the intersection graphs.
If a vertex set C induces a clique in GX , then the inequality

∑ x j + ∑ (1 − x j ) ≤ 1
j1 ∈C j0 ∈C

is valid for X. For example, for the set X of the solutions to the following system

x1 + 2x2 − x3 ≤ 1,
3x1 + x2 − 2x3 ≤ 2,
x1 , x2 , x3 ∈ {0, 1} ,

the conflict graph, GX , has three edges (11 , 21 ), (21 , 30 ) and (11 , 30 ). Therefore, the
set of vertices, C = {11 , 21 , 30 }, is a clique in GX , and the inequality

x1 + x2 + (1 − x3 ) ≤ 1 or x1 + x2 − x3 ≤ 0
150 5 Cuts for Structured Mixed-Integer Sets

is valid for X.
Similarly, if C is the vertex set of an odd cycle in GX , then the inequality

|C| − 1
∑ x j + ∑ (1 − x j ) ≤ 2
j1 ∈C j0 ∈C

is valid for X.

5.6 Notes

The idea of obtaining strong inequalities exploiting the structure of an NP-hard


problem being solved has its roots in [44, 43].
Sect. 5.1. The cover inequalities, as is often the case, independently discovered sev-
eral authors [10, 72, 104, 138].
Sect. 5.2. The concept of lifting inequalities was already present in [61], later it was
extended in [103, 139]. The technique of sequential lifting was proposed in [104]
and has since become widespread and is now implemented in many commercial
MIP libraries (see [67]). The idea of lifting inequalities for feasible sets was pro-
posed in [136].
Sect. 5.3. The role of superadditivity in the sequence-independent lifting was inves-
tigated in [69, 140]. Theorem 5.4 was proved in [89].
Sect. 5.4. The flow cover inequalities were introduced in [108]. The lifting procedure
for these inequalities was elaborated in [68].
Sect. 5.5. The packing polytope was actively studied in the 1970s. For a discussion
of the results of these studies and bibliographic references, see, for example, [98].
The clique inequalities appeared in [52, 103]. The separation problem for the clique
inequalities is NP-hard (see Theorem 9.2.9 in [64]). Therefore, it is surprising that
there is a wider class of inequalities that includes all clique inequalities, and the
separation problem for which is polynomially solvable (see [64, 87]).
The inequalities for odd cycles were introduced in [103]. The separation algo-
rithm for these inequalities is based on the algorithm from Lemma 9.1.11 in [64].
The use of the conflict graphs is discussed in [8].
Sect. 5.7. The statements of Exercises 5.4 and 5.5 were taken, respectively, from
[108] and [9].

5.7 Exercises

5.1. Write down an inequality that cuts off just one given point a ∈ {0, 1}n from the
0,1-cube {0, 1}n .
5.2. Find a lifted cover inequality that separates a given point x̃ from a given set X:
5.7 Exercises 151
T
a) x̃ = 0, 0, 43 , 43 , 1 , X = {x ∈ {0, 1}5 : 8x1 + 7x2 + 6x3 + 6x4 + 5x5 ≤ 14};
T
b) x̃ = 0, 68 , 34 , 43 , 0 , X = {x ∈ {0, 1}5 : 12x1 + 8x2 + 6x3 + 6x4 + 7x5 ≤ 15};
T
c) x̃ = 0, 0, 21 , 61 , 1 , X = {x ∈ {0, 1}5 : 10x1 − 9x2 + 8x3 + 6x4 − 3x5 ≤ 2}.

5.3. For given set X and point (x̃, ỹ), find a flow cover inequality that separates (x̃, ỹ)
from X:

a) X = {(x, y) ∈ R3+ × {0, 1}3 : x1 + x2 + x3 = 7,


x1 ≤ 3y1 , x2 ≤ 5y2 , x3 ≤ 6y3 },
 T
2
(x̃, ỹ) = 2, 5, 0; , 1, 0 ;
3
b) X = {(x, y) ∈ R6+ × {0, 1}6 : 2x1 + x2 + x3 − x4 − 2x5 − x6 = 4,
3 5
x1 ≤ y1 , x2 ≤ 3y2 , x3 ≤ 6y3 , x4 ≤ 3y4 , x5 ≤ y5 , x6 ≤ y6 },
2 2
 T
3 2
(x̃, ỹ) = , 3, 0, 0, 1, 0; 1, 1, 0, 0, , 0 .
2 5

Here, in the representations of (x̃, ỹ), the x̃ and ỹ parts are separated by semicolons.
5.4. Consider the set
( )
n
n
X= (x, y) ∈ {0, 1} × Rn+ : ∑ y j ≤ b, y j ≤ a j x j for j = 1, . . . , n .
j=1

def
Let C ⊆ {1, . . . , n} and λ = ∑ j∈C a j − b > 0. Using Theorem 5.3, prove that the
inequality
∑ (y j + max{0, a j − λ }(1 − x j )) ≤ b
j∈C

is valid for X.
5.5. Consider the solution set X of the following system:

s+ ∑ x j − ∑ x j ≥ b,
j∈N1 j∈N2
(5.16)
0 ≤ x j ≤ y j, j = 1, . . . , n,
s ≥ 0, y ∈ Zn+ ,

where (N1 , N2 ) is a partition of the set N = {1, . . . , n}. Let f = b − bbc. Prove the
following statements:
a) conv(X) is described by the inequalities that determine the relaxation polyhedron
for (5.16), and the inequalities

s+ f ∑ y j + ∑ x j ≥ f dbe + ∑ (x j − (1 − f )y j ) , (5.17)
j∈L1 j∈R1 j∈L2
152 5 Cuts for Structured Mixed-Integer Sets

for all partitions (L1 , R1 ) of N1 , and all L2 ⊆ N2 ;


b) a point (s, x, y) ∈ R+ × Rn+ × Zn+ satisfies (5.17) if and only if the following non-
linear inequality holds:

s+ ∑ min{ f y j , x j } ≥ f dbe + ∑ max{0, x j − (1 − f )y j }.


j∈N1 j∈N2

5.6. Let us consider (1.7), which is the system of inequalities that describes the truth
sets of the CNF given by (1.6). Suppose that q ∈ Sk1 ∩ Sl0 . Prove that the following
inequality
∑ x j + ∑ (1 − x j ) ≥ 1
j∈(Sk1 ∪Sl1 )\{q} j∈(Sk0 ∪Sl0 )\{q}

is valid for the truth sets of this CNF.


5.7. Prove that the constraint matrix in (1.15), which is a formulation of the trans-
portation problem, is totally unimodular.
5.8. The parity polytope is the convex hull of the set of 0, 1-solutions to the compar-
ison
n
∑ xj ≡ 0 (mod 2).
j=1

Prove that this polytope coincides with the set of solutions to the following sys-
tem of linear inequalities:

∑ xj − ∑ x j ≤ |S| − 1, S ⊆ {1, . . . , n}, |S| is odd,


j∈S j∈{1,...,n}\S

0 ≤ x j ≤ 1, j = 1, . . . , n.

How to solve the separation problem for this system of inequalities?


5.9. Minimum distance of a linear code. A binary m × n matrix H (parity check
matrix) is given. Among the binary vectors x satisfying the comparison Hx ≡ 0
(mod 2), we need to find a nonzero vector of minimum weight (with a minimum
number of ones). Using Exercise 5.8, formulate this problem as an IP.
5.10. Consider the packing polytope P and let G denote the intersection graph of
the constraint matrix defining P (see Sect. 5.5 for the definitions and notations used
below). Show that
a) a click inequality is facet defining for P if and only if it is induced by the maximal
(by inclusion) click in G;
b) an odd cycle inequality is facet defining for P if and only if it is induced by a
cordless odd cycle (also known as odd hole), i.e., no two non-neighboring ver-
tices of this cycle are adjacent in G.
Chapter 6
Branch-And-Cut

Currently the main method of solving MIPs is the branch-and-cut method since it
is used in all (in all!) known competitive modern MIP solvers. Briefly, the branch-
and-cut method is a combination of the branch-and-bound and cutting-plane algo-
rithms. In this chapter we present a general schema of the branch-and-cut method,
and also discuss its most important components. In the last section of this chapter
we demonstrate an application of this method for solving MIPs with exponentially
many inequalities. Specifically, we consider a branch-and-cut algorithm for the trav-
eling salesman problem. This algorithm is considered as one of the most impressive
successful applications of the branch-and-cut method.

6.1 Branch-And-Bound

We will consider MIP (1.1) with two sided constraints. The basic structure of the
branch-and-bound method is the search tree. The root (or root node) of the search
tree corresponds to the original MIP. The search tree grows through a process called
branching that creates two or more descendants for one of the leaves of the current
search tree. Each of the MIPs in the child nodes is obtained from the parent MIP
by adding one or more new constraints that are usually upper or lower bounds for
integer variables. It should also be noted that in the process of branching, we should
not lose feasible solutions: the union of the feasible domains of the child MIPs must
be the feasible domain for their parent MIP.
But if the search tree only grew (via branching), then even for relatively small
MIPs the tree size could be huge. On the contrary, one of the main ideas of the
branch-and-bound method is to prevent an uncontrolled growth of the search tree
This is achieved by cutting off the ”hopeless” branches of the search tree. We eval-
uate the prospects of the nodes of the search tree by comparing their upper bounds
with the current lower bound. In the LP based branch-and-bound method, the up-
per bound at any node k is the optimal objective value, γ(k), of the relaxation LP
at this node. The lower bound (or record), R, is the largest value of the objective

153
154 6 Branch-And-Cut

function attained on the already found feasible solutions of the original MIP. The
best of these solutions is called a record solution. If γ(k) ≤ R, then node k and all its
descendants are cut off from the search tree.

branch-and-bound(c, b1 , b2 , A, d 1 , d 2 , S; xR , R);
{
Compute x0 ∈ arg max{cT x : b1 ≤ Ax ≤ b2 , d 1 ≤ x ≤ d 2 };
if (xS0 ∈ ZS ) { // change the record and record solution
R = cT x0 ; xR = x0 ; return;
}
initialize the list of active nodes with one node (x0 , d 1 , d 2 );
while (the list of active nodes is not empty) {
select a node, N = (x0 , d 1 , d 2 ), from the list of active nodes;
if (cT x0 ≤ R)
continue;
select a fractional component xi0 for i ∈ S;
compute x1 ∈ arg max{cT x : b1 ≤ Ax ≤ b2 , d 1 ≤ x ≤ d 2 (i, bxi0 c)};
if (cT x1 > R) {
if (xS1 ∈ ZS ) { // change the record and record solution
R = cT x1 ; xR = x1 ;
}
else
add the node (x1 , d 1 , d 2 (i, bxi0 c)) to the list of active nodes;
}
compute x2 ∈ arg max{cT x : b1 ≤ Ax ≤ b2 , d 1 (i, dxi0 e)) ≤ x ≤ d 2 };
if (cT x2 > R) {
if (xS2 ∈ ZS ) { // change the record and record solution
R = cT x2 ; xR = x2 ;
}
else
add the node (x2 , d 1 (i, dxi0 e), d 2 ) to the list of active nodes;
}
}
}

Listing 6.1. Branch-and-bound method for solving MIPs

A very general version of the LP-based branch-and-bound method for solving


MIPs is shown in Listing 6.1. The input of the branch-and-bound procedure consists
of the parameters describing a MIP, a constraint matrix A, vectors c, b1 , b2 , d 1 ,
d 2 and a set S of integer variables, as well as a feasible solution xR (initial record
6.1 Branch-And-Bound 155

solution) and R = cT xR (initial record). There are MIPs for which it is difficult to
find an initial feasible solution, in such cases R is set to −∞. If R = −∞ when the
branch-and-bound procedure terminates, then the MIP being solved does not have
feasible solutions; otherwise, xR is an optimal solution to this MIP. In the description
of the method we use the following notation
(
def d j , if j 6= i,
d(i, α) =
α, if j = i.

It follows from the description of the branch-and-bound procedure that each


node of the search tree stores not only the upper and lower bounds for the values of
variables but also an optimal solution to the relaxation LP. In practice, instead of an
optimal solution to the relaxation LP, it is better to store at each node a description
of an optimal basis, so that the dual simplex method later will be able to quickly
reoptimize the relaxation LPs for the child nodes.
The branch-and-bound procedure from Listing 6.1 allows for ambiguities in the
selection of a node from the list of active nodes and in the choice of a variable for
branching on it if there are several integer variables taking fractional values. There
exist a few competitive strategies to make the procedure unambiguous. A simple and
at the same time quite efficient method is to select a node with the maximum upper
bound. Unfortunately, there is no rule — both simple and efficient for most MIPs —
to select a variable for branching. In the literature, the most commonly mentioned
rule recommends choosing a variable with the most fractional value (which frac-
tional part is closest to 21 ). But computational experiments have proved that this rule
is no better than the one that recommends choosing a variable for branching ran-
domly. Let us postpone a detailed discussion of the branching rules until Sect. 6.3.
We should not be confused by the fact that the search tree is not mentioned in the
branch-and-bound procedure. In fact, the procedure ”builds” (although implicitly)
a search tree, and the leaves of this tree constitute the list of active nodes.
Let us demonstrate the work of the branch-and-bound method on a simple exam-
ple.
Example 6.1 We need to solve the following IP:

x1 + 2x2 → max,
1: −2x1 + 3x2 ≤ 4,
2: 2x1 + 2x2 ≤ 11,
3: 1 ≤ x1 ≤ 4,
4: 1 ≤ x2 ≤ 5,
x1 , x2 ∈ Z.

Solution. The search tree is shown in Fig. 6.1. The nodes of this tree are num-
bered from 0 (the root node corresponding to the original MIP) to 5. Each node is
represented as a rectangle that, for the relaxation LP at this node, stores the feasible
intervals of both variables, as well as an optimal solution and the optimal objective
156 6 Branch-And-Cut

value (upper bound). The relaxation LPs are solved by the dual simplex method
starting with an optimal basis for the parent node relaxation LP.

0 γ(0) = 8
x(0) = (5/2, 3)
1 ≤ x1 ≤ 4
1 ≤ x2 ≤ 5

1 γ(1) = 7 2 γ(2) = 8
x(1) = (2, 8/3) x(2) = (3, 5/2)
1 ≤ x1 ≤ 2 3 ≤ x1 ≤ 4
1 ≤ x2 ≤ 5 1 ≤ x2 ≤ 5

3 γ(3) = 7 4 has no
x(3) = (7/2, 2) solutions
3 ≤ x1 ≤ 4 3 ≤ x1 ≤ 4
1 ≤ x2 ≤ 2 3 ≤ x2 ≤ 5

5 γ(5) = 7
x(5) = (3, 2)
3 ≤ x1 ≤ 3
1 ≤ x2 ≤ 2

Fig. 6.1 Search tree for IP of Example 6.1

Initially, we set R = −∞. In this example, the objective function takes only integer
values for all feasible solutions, therefore, we take as the upper bound, γ(k), not the
value of cT x(k) but its integer part bcT x(k) c. Here x(k) denotes an optimal solution to
the relaxation LP at node k. The iterations performed by the algorithm follow below.
They are numbered by the pairs i. j, where i is the node index, and j is the iteration
index of the dual simplex method.
 
10
0.0. I = (3, 4), B−1 = , x = (4, 5)T , π = (1, 2)T .
01
 
1 0
0.1. s = 1, u = (−2, 3)T , λ = 32 , t = 2, I = (3, 1), B−1 = 2 1 ,
3 3
x = (4, 4)T , π = ( 73 , 32 )T .
6.1 Branch-And-Bound 157
" #
3
− 15
0.2. s = 2, u = ( 10 2 T
3 , 3) , λ= 7
10 , t = 1, I = (2, 1), B−1 = 10
1 1
,
5 5
7 1 T
x = x(0) = ( 25 , 3)T , π = ( 10 , 5 ) , γ(0) = 8.

Since the solution x(0) to the relaxation LP is not integral, we form the root
(node 0) of the search tree, and then we perform branching on the variable x1 taking
a fractional value.
 
3 1 T 7 −1 1 0
1.1. s = 3, u = ( 10 , − 5 ) , λ = 3 , t = 1, I = (3, 1), B = 2 1 ,
3 3
x(1) = (2, 83 )T , π = ( 73 , 23 )T , γ(1) = 7.
Since the solution x(1) is not integer, and γ(1) = 7 > −∞ = R, we add node 1 to
the search tree.
 
0 −1
2.1. s = −3, u = (− 10 , 5 ) , λ = 1, t = 2, I = (2, −3), B−1 = 1
3 1 T
,
2 1
x(2) = (3, 25 )T , π = (1, 1)T , γ(2) = 8

Since the solution x(2) at node 2 is also not integer and γ(2) = 8 > −∞ = R, we
add this node to the search tree.
Among the active nodes, which are the tree leaves, node 2 has the maximum
upper bound. So, we choose this node to branch on the variable x2 .
1 
1 T −1 2 −1
3.1. s = 4, u = ( 2 , 1) , λ = 1, t = 2, I = (2, 4), B = ,
0 1
x(3) = ( 72 , 2)T , π = ( 12 , 1)T , γ(3) = 7.

Since the solution x(3) at node 3 is not integer, and γ(3) = 7 > −∞ = R, we add
this node to the search tree.
4.1. s = −4, u = (−1/2, −1)T . Since all components of vector u are non-positive,
then the relaxation LP at this node has no feasible solutions. In this case, we do
not need to add node 4 to the search tree. Nevertheless, in Fig. 6.1 this node is
present there to make the tree more informative.
From two active nodes, 1 and 3, with the maximal upper bound 7, we select
node 3, which is farther from the root, and perform branching on the variable x1 .
 
10
5.1. s = 3, u = (1/2, −1)T , λ = 1, t = 1, I = (3, 4), B−1 = ,
01
x(5) = (3, 2)T , π = (1, 2)T , γ(5) = 7.
Since the solution x(5) is integer and γ(5) = 7 > −∞ = R, we change the record
and the record solution: R = 7, xR = (3, 2)T . In Fig. 6.1, node 5 is also presented for
informative purposes.
As the upper bounds for the nodes 1 and 3 are equal to the current record, then
node 1 and the right branch of node 3, which we have not created yet, must be cut
off.
158 6 Branch-And-Cut

Since there are no more unprocessed nodes in the search tree (the list of active
nodes is empty), the current record solution xR = (3, 2)T is optimal. t
u

6.2 Branch-And-Cut

The branch-and-cut method is a branch-and-bound method in which cuts are gen-


erated when solving the relaxation LPs at all (or only some) nodes of the search
tree. At first glance it may seem that these changes are insignificant. In practice, this
changed the whole philosophy of integer programming. A simplified block diagram
of the branch-and-cut method is shown in Fig. 6.2.
Two values that determine the behavior of the branch-and-cut method (as well as
the branch-and-bound method) are the lower bound (record) and the upper bounds
(optimal objective values for relaxation LPs) at the nodes of the search tree. In the
branch-and-bound method, when processing a node of the search tree, the main
goal is to solve its relaxation LP as quickly as possible. The branch-and-cut method
performs much more work at each node generating cuts in order to minimize its
upper bound. In this case, the duality gap (the difference between the upper and
lower bounds) at the node also decreases.
Unlike the ”pure” cutting plane algorithms, we now do not expect that adding
cuts will be sufficient to find an optimal solution. It is also worth noting that earlier,
as a rule, only one inequality was generated to cut off the solution of the relaxation
LP. Today such a strategy is considered bad, now cuts are added by groups of many
inequalities.
In practice, it is very important to determine when to stop generating new cuts and
proceed to branching. If many cuts are added at each node, it may take significantly
longer to reoptimize the relaxation LPs. A reasonable strategy is to watch how the
duality gap is decreasing. If there is no substantial progress for several cut generation
rounds, then it is time to stop. In this case, after each cut generation round, it is
reasonable to remove from the active (solved at the moment) LP those cuts that
are not satisfied ”almost” as equalities. Some of such cuts are permanently removed
from the system, and the remaining ones are moved to a special repository called the
cut pool. Before adding the processed node to the list of active nodes (to the search
tree), all cuts present in the constraint system of the active LP but not in the pool are
recorded there. Later, when this node is selected for processing, its relaxation LP is
restored by extracting all the necessary inequalities (cuts) from the pool.
It may seem that the pool is needed only to restrict the sizes of the node LPs.
But if cuts are added not only at the root node of the search tree, then we need the
pool for one more important reason. The problem is that not all generated cuts are
global inequalities, i.e., they are valid for all nodes of the search tree. This hap-
pens, in particular, because the cut generating procedures use the bounds imposed
on the variables (inequalities of the form x j ≤ (≥)d), which are different for differ-
ent nodes of the search tree. A local inequality is valid for a particular node and all
6.2 Branch-And-Cut 159


START


?
Preprocessing  Select active node 

?
- Solve LP

?
HHH  Is H H
 Has H No - node list HH No
HH solution?  HH empty? 
HH HH
Yes ? Yes
Generating
cuts
? 
STOP


?
HHH
Yes  New H 6
HH cuts? 
HH
No ? Yes
 H IsHH
 HH
No - node list HH No
HH γ > R? 
 H HH empty? 
HH HH
Yes ? 6
IsHH
 solution HH Yes - Change
H
HHfeasible?  record solution
H
No
?
Node
heuristic

H ?
 New HH

Yes - Change
HH record? H
  record solution
HH

No ?
Perform branching
- and add new nodes
to list of active nodes

Fig. 6.2 Block diagram of the branch-and-cut method


160 6 Branch-And-Cut

its descendants; for other nodes it may not be valid. Therefore, such an inequality
can not be present in the active constraint matrix permanently, and the pool is the
best place to store it.
Another way to reduce the duality gap at a node is to use node heuristics to
increase the lower bound (record). The idea of the node heuristics is simple. Each
time some tree node is processed, we can try to somehow ”round off” a solution to its
relaxation LP. Usually the rounding consists in performing some type of ”diving”,
when the values of a group of integer variables with ”almost integer” values are
fixed, the resulting LP is reoptimized, then another group of variables is fixed, and so
on until an integer solution is obtained, or it is proved that fixing variables resulted in
an inconsistent constraint system. When we are lucky to get a new feasible solution
better than the record one, then the lower bound is increased allowing us to eliminate
some active nodes of the search tree, and thereby speed up the solution process.
Example 6.2 We need to solve the following IP:

x1 + 3x2 + x3 + 2x4 → max,


4x1 + 7x2 + 3x3 + 5x4 ≤ 10,
(6.1)
5x1 + 4x2 + 6x3 + 2x4 ≤ 9,
x1 , x2 , x3 , x4 ∈ {0, 1}.

Solution. Let us agree to generate only the cover inequalities, which are always
global and, therefore, they are valid for all nodes of the search tree shown in Fig. 6.3.

0 γ(0) = 3 12
1 1 1 1 T
x(5) =

2, 2, 2, 2
 Q
x1 = 0  Q x1 = 1
 Q

 QQ
1 γ(1) = 3 2 γ(2) = 3
x(6) = (0, 0, 1, 1)T x(7) = (1, 0, 0, 1)T

Fig. 6.3 Search tree for IP (6.1)

0. First, we solve the relaxation LP for (6.1). Its optimal solution is the point
T
x(1)= 0, 1, 0, 35 , which violates the inequality

x2 + x4 ≤ 1

written for the knapsack cover C11 = {2, 4} of the first knapsack constraint. Adding
this inequality to the constrain system, after reoptimizing, we get the solution x(2) =
1 5
T
3 , 1, 9 , 0 , which violates the inequality
6.3 Branching 161

x1 + x2 ≤ 1

induced by the cover C21 = {1, 2} of the first knapsack inequality. Adding this in-
T
equality and reoptimizing, we get the third solution x(3) = 0, 1, 56 , 0 , which vio-
lates the inequality
x2 + x3 ≤ 1
written for the cover C12 = {2, 3} of the second knapsack constraint. Again, adding
T
this inequality and reoptimizing, we find the solution x(4) = 59 , 49 , 95 , 95 , which
violates the inequality
x1 + x3 ≤ 1
induced by the cover C22 = {1, 3} of the second knapsack constraint. Adding this
T
inequality and reoptimizing, we obtain the solution x(5) = 12 , 12 , 12 , 21 , which sat-
isfies all cover inequalities for both knapsack sets of our IP. So, we turn to branching
on the variable x1 .
1. Now we solve the relaxation LP for node 1 (x1 = 0). Its optimal solution,
x(6) = (0, 0, 1, 1)T , is integer and, therefore, it is declared as the first record solution,
xR = (0, 0, 1, 1)T , and the record is set to R = cT xR = 3.
2. An optimal solution x(7) = (1, 0, 0, 1)T to the relaxation LP at node 2 (x1 = 1)
is also integer. The objective value on this solution is equal to 3. Therefore, both
points, x(6) and x(7) , are optimal solutions to (6.1). t
u

6.3 Branching

The branching strategy used in a particular implementation of the branch-and-cut


method significantly affects the performance of this implementation, especially in
cases where MIP formulations are not strong enough, and the cuts are inefficient. A
branching strategy includes a rule for selecting the next active node and a branching
rule. In practice, there are many different rules for selecting a node for branching.
For example, let us mention only a few such rules:
• maxCOST: first choose a node with the maximum upper bound, which is the
optimal objective value of the relaxation LP;
• DFS: first choose a node of maximum depth (the depth of a node is the distance
from it to the root of the search tree);
• DFS/maxCOST: DFS before obtaining the first valid solution, then switching to
the maxCOST strategy;
• for the first k (say k = 255) nodes, use the maxCOST strategy, then if no feasible
solution was found, switch to the DFS/maxCOST strategy.
When a node, say node q, has been selected, we need to determine a variable on
which to branch. Let x̄q and z(q) be an optimal solution and the optimal objective
162 6 Branch-And-Cut

value of the relaxation LP at node q, and let z̃− +


i (q) and z̃i (q) be the optimal objective
values of the relaxation LPs for the left (with the restriction xi ≤ bx̄iq c) and right (with
the restriction xi ≥ dx̄iq e) descendants of node q. As a rule, in practice, we are limited
to computing some estimates λi− (q) and λi+ (q) for the decrements z(q) − z− i (q) and
z(q) − z+i (q) of the objective value at node q. To perform branching, we select a
variable for which the weighted estimate

λi (q) = (1 − µ) · min{λi− (q), λi+ (q)} + µ · max{λi− (q), λi+ (q)}

is maximum. Here µ is a number between zero and one.


Next, we discuss some of the most commonly used rules for calculating the es-
timates λi (q). Note that in practice, various combinations of those rules are also
applied. In addition, branching on variables is not the only branching method used
in the branch-and-cut algorithms. We will discuss some other branching methods in
Sects. 6.3.2, 7.2.2 and 7.3.3.

Branching on Most Fractional Variables

This is one of the simplest and most frequently used rules according to which a
variable with a fractional part closest to 0.5 is chosen. This corresponds to the cal-
culation of the estimates by the following rule:

λi (q) = 0.5 − |x̄i − bx̄i c − 0.5|.

In other words, this rule chooses for branching that variable, which is ”harder” to
round off. Unfortunately, numerical experiments have proved that in practice this
rule is not better than the rule that chooses a variable for branching randomly.

Pseudocost Branching

This complex rule remembers all successful branchings over all variables. The val-
ues
def q q
ζi− (q) = (z(q) − z−
i (q))/(x̄i − bx̄i c),
def q q
ζi+ (q) = (z(q) − z+
i (q))/(dx̄i e − x̄i )

define the average (per unit of change of the variable) decrement of the objective
function for the left and right descendants of node q, respectively. By this definition,
the value of ζi− (q) or ζi+ (q) is infinity if the corresponding relaxation LP does not
have a solution. In such a case, we can set either of these values to be equal to the
integrality gap at node q if the latter is finite, or to some predefined penalty. Further,
let ξi− (q) (resp., ξi+ (q)) denote the sum of ζi− (q0 ) (resp., ζi+ (q0 )) for all nodes q0
processed before the processing of node q starts, and for which the branching was
6.3 Branching 163

performed on the variable xi . Let ηi− (resp., ηi+ ) be the number of such nodes. For
any variable xi , the left and right pseudocosts are determined by the formulas
def def
Ψi− (q) = ξi− (q)/ηi− (q) , Ψi+ (q) = ξi+ (q)/ηi+ (q) .

Defining

λi− (q) = Ψi− (q) · (x̄iq − bx̄iq c),


λi+ (q) = Ψi+ (q) · (dx̄iq e − x̄iq ),

we determine the pseudocost branching rule.


In early stages of the branch-and-cut algorithm almost all pseudocosts are zeroes.
Therefore, applying pseudocost branching in the earliest most important stages of
the algorithm, in fact, we will choose a variable for branching randomly. In practice,
hybrid methods are used to select a variable for branching: at early stages some other
branching rule is applied, and the algorithm switches to the pseudocost branching
after it accumulates enough information.

Strong Branching

In its pure form, strong branching assumes the calculation of exact estimates

λi− (q) = z(q) − z−


i (q), λi+ (q) = z(q) − z+
i (q)

for all integer variables taking fractional values. Since the calculation of all the es-
timates takes too much time, the strong branching procedure can be modified as
follows. First, a relatively small subset C of integer variables with fractional values
(for example, 10% of all candidates with largest pseudocosts) is chosen. The next
simplification is that, when estimating the decrements z(q)−z− +
i (q) and z(q)−zi (q),
only some fixed number of iterations of the dual simplex method is accomplished.
This simplification is motivated by the fact that, for the dual simplex method, the av-
erage per iteration decrease of the objective value usually decreases with the number
of iterations performed. However, this observation is not valid for many problems
with built-in combinatorial structures, since, as a rule, the relaxation LPs of such
problems are strongly degenerate.

6.3.1 Priorities

Assigning priorities to the variables allow us to establish a partial order relation on


the set of integer variables: the higher the priority of a variable the more important it
is. When a variable is selecting for branching, the priorities play a dominant role: a
variable to branch on is usually selected from the variables with the highest priority.
164 6 Branch-And-Cut

As an example of a situation where setting priorities can significantly speed up the


solution process, we can mention the multiperiod planning problems (see Sects. 2.4,
2.6, 2.11 and 2.14). Here, the decisions made in the early periods have a greater
impact on future decisions than future decisions have on the past ones. Therefore,
the decision variables for early periods should receive higher priorities.
The priorities allow us to resolve the following difficulty. Very often, we know
that some variables automatically take integer values. As a simple example, we can
mention a situation where in an equation with integer coefficients all variables ex-
cept one are integer. It is obvious that this single non-integer variable can take only
integer values. Another example is the family of xi j variables in (2.4), which is the
formulation of the facility location problem. The submatrix of the constraint matrix,
composed of columns corresponding to the xi j variables, is totally unimodular (see
Exercise 1.12). Therefore, any basic feasible solution to the relaxation LP for (2.4)
has integer components xi j if all yi take integer values. If we declare the variables
xi j to be integers, then any of them can be chosen for branching, which is undesir-
able. On the other hand, by declaring variables xi j as integer-valued, we can benefit
by generating stronger cuts. A simple solution to this problem is to declare the xi j
variables as integer-valued and assign them the lowest priority.

6.3.2 Special Ordered Sets

Let us recall the representation (1.2) of the discrete variable (see Sect. 1.1.1). Sup-
pose that we have a MIP with such a constraint. If, in an optimal solution to the
relaxation LP, not all values λ̃i of the variables λi are integers, then using any stan-
dard branching rule, we choose a fractional component λ̃i∗ to divide the feasible
domain of λ variables,
( )
k
K= λ ∈ {0, 1}k : ∑ λi = 1 ,
i=1

into two subsets: K0 = {λ ∈ K : λi∗ = 0} of cardinality k − 1, and K1 = {λ ∈ K :


λi∗ = 1} of cardinality 1. Since the set K0 is usually much larger than the set K1 , the
search tree turns out to be unbalanced. The situation can be corrected if we use a
balanced branching, when the set K is divided into two subsets:

K̄0 = {λ ∈ K : λi = 0, i = 1, . . . , r},
K̄1 = {λ ∈ K : λi = 0, i = r + 1, . . . , k},

where
j k
r = arg min ∑ λ̃i − ∑ λ̃i .

1≤ j<k i=1 i= j+1

Such a way of branching is known as SOS1-branching.


6.4 Global Gomory Cuts 165

A special way of balanced branching is also used if a MIP contains SOS2-


constraints, which are used to represent piecewise linear approximations of non-
linear functions (see Sect. 1.1.3). Let λ̃i denote the values of the variables λi in a
SOS2-constraint given by the equation
k
∑ λi = 1
i=1

and the requirement that no more than two components λi take non-zero values, and
if there are two of such components, they must be consecutive. Let

i1 = min{i : λ̃i > 0}, i2 = max{i : λ̃i > 0}.

If i2 − i1 > 1, then after branching the feasible set


( )
k
k
K= λ ∈ [0, 1] : ∑ λi = 1
i=1

is divided into two subsets

K1 = {λ ∈ K : λi = 0, i = 1, . . . , r},
K2 = {λ ∈ K : λi = 0, i = r, . . . , k},

where r = b(i1 + i2 )/2c. This way of branching is called a SOS2-branching.

6.4 Global Gomory Cuts

Let us recall that global cuts are valid for all nodes of the search tree, and local
cuts are valid only for a particular node and all its descendants. Therefore, as a rule,
global cuts are more useful in practice. Recall that we declare a cut as being local
if in its derivation we used other local inequalities. Most often, such inequalities are
the lower and upper bounds for the values of integer variables. Changing a bound
(lower or upper) of a binary variable means fixing its value. This simple observa-
tion makes it possible to generate a global fractional Gomory cut each time when
no local inequalities, other than the bounds for binary variables, were used in the
derivation of the base inequality (for which mixed integer rounding is applied). Let
us demonstrate this with a simple example.
Example 6.3 Let us imagine that we are solving the following IP

4x1 + 2x2 + 5x3 → max,


3x1 + 2x2 + 2x3 ≤ 4, (6.2)
x1 , x2 , x3 ∈ {0, 1}
166 6 Branch-And-Cut

by the branch-and-cut method. We know an optimal solution, x∗ = 23 , 0, 1 , to the




root relaxation LP, and now we need to process the child node obtained from the
root node after fixing x1 to 1.
Solution. First, let us write down the relaxation LP for this node:

4x1 + 2x2 + 5x3 → max,


1: 3x1 + 2x2 + 2x3 ≤ 4,
2: 1 ≤ x1 ≤ 1,
3: 0 ≤ x2 ≤ 1,
4: 0 ≤ x3 ≤ 1.

Its optimal basic feasible solution, basic set and inverse basic matrix are the follow-
ing:  
0 −1 0
1 T
 
x̄ = 1, 0, , I = (1, −2, −3) and B−1 =  0 0 −1 .
2 1 3
2 2 1
As x3 is the only variable taking a fractional value, we will build the fractional
Gomory cut starting with the equation:
1 3 1
x3 + s1 + s2 + s3 = . (6.3)
2 2 2
Next, we compute the coefficients
1 f0 1 1
f0 = , = 1, f1 = , f2 = , f3 = 0
2 1 − f0 2 2
and write down the cut
1 1 1
s1 + s2 ≥ ,
2 2 2
or
s1 + s2 ≥ 1.
Substituting 4 − 3x1 − 2x2 − 2x3 for s1 , and −1 + x1 for s2 , after simplifications and
rearranging, we get the cut in the initial variables:

x1 + x2 + x3 ≤ 1.

This inequality, as it should, cuts off the point x̄, but is not global, since it also cuts
off the point (0, 1, 1)T , which is a feasible solution to (6.2). This happened because,
when derivating this cut, we used the local bound x1 ≥ 1.
Now, let us build a global cut. Since we have the equation x1 = 1, we can include
into the basic set any of two inequalities: x1 ≥ 1, which is local, or x1 ≤ 1, which is
global. This time we include into the basic set the global inequality x1 ≤ 1. So, we
have the basic set I¯ = (1, 2, −3) and the inverse basic matrix
6.5 Preprocessing 167
 
0 1 0
B̄−1 =  0 0 −1 .
1 3
2 −2 1

Now we will build the cut starting with the equation:


1 3 1
x3 + s1 − s2 + s3 = . (6.4)
2 2 2
Before continuing, note that we could write down (6.4) directly by the matrix B−1
(without writing B̄−1 ): (6.4) is obtained from (6.3) by changing the sign of the co-
efficient of the variable s2 , which corresponds to the inequality x1 ≤ 1.
Let us continue building the cut. We calculate the coefficients
1 f0 1 3 1
f0 = , = 1, f1 = , f2 = − − (−2) = , f3 = 0
2 1 − f0 2 2 2
and write down the cut
1 1 1
s1 + s2 ≥ ,
2 2 2
or
s1 + s2 ≥ 1.
Substituting 4 − 3x1 − 2x2 − 2x3 for s1 , and 1 − x1 for s2 , after simplifications and
rearranging, we get the cut in the initial variables:

2x1 + x2 + x3 ≤ 2.

In the derivation of this cut, we did not use local inequalities, and therefore this cut
is global. t
u

6.5 Preprocessing

We have not yet discussed one position in the block diagram of the branch-and-cut
method shown in Fig. 6.2. It is about preprocessing, or automatic reformulation. All
modern commercial MIP solvers begin solving any MIP with an attempt to sim-
plify it (narrow the feasible intervals for variables or even fix their values, classify
variables and constraints, strengthen inequalities, scale the constraint matrix, etc.).
The following simple statements give an idea what actions are being performed in
the preprocessing step.

Proposition 6.1. Given the inequalities ∑nj=1 a j x j ≤ b and l ≤ x ≤ u, where a, l, u ∈


Rn . Let
νi = ∑ a jl j + ∑ a ju j.
j: j6=i, a j >0 j: j6=i, a j <0
168 6 Branch-And-Cut

1) (Bounds on variables) If ai > 0, then


 
b − νi
xi ≤ min ui , ,
ai

and if ai < 0, then  


b − νi
xi ≥ max li , .
ai
For an integer variable xi , we have

dli e ≤ xi ≤ bui c.

2) (Redundancy) The inequality ∑nj=1 a j x j ≤ b is redundant if

∑ a ju j + ∑ a j l j ≤ b.
j: a j >0 j: a j <0

3) (Infeasibility) The system ∑nj=1 a j x j ≤ b and l ≤ x ≤ u is infeasible if

∑ a jl j + ∑ a j u j > b.
j: a j >0 j: a j <0

Proposition 6.2. (Fixing variables) Consider the LP

max{cT x : Ax ≤ b, l ≤ x ≤ u}.

If A j ≥ 0 and c j ≤ 0, then x j = l j . Conversely, if A j ≤ 0 and c j ≥ 0, then x j = u j .

Example 6.4 We need to apply Propositions 6.1 and 6.2 to strengthen the following
formulation:
3x1 + 2x2 − x3 → max,
1 : 4x1 − 3x2 + 2x3 ≤ 13,
2 : 7x1 + 3x2 − 4x3 ≥ 8,
3: x1 + 2x2 − x3 ≥ 4,
x1 ∈ Z+ ,
x2 ∈ {0, 1},
x3 ≥ 1.
Solution. In view of Proposition 6.2, we fix the value of x3 to 1. Further, from
Ineqs. 1 and 2, we have

x1 ≤ b(13 + 3x2 − 2x3 )/4c ≤ b(13 + 3 · 1 − 2 · 1)/4c = 3,


x1 ≥ d(8 − 3x2 + 4x3 )/7e ≥ d(8 − 3 · 1 + 4 · 1)/7e = 2.

From Ineq. 3, we obtain the lower bound:

x2 ≥ d(4 − x1 + x3 )/2e ≥ d(4 − 3 − 1)/2e = 1.


6.5 Preprocessing 169

Hence, x2 = 1, and from Ineq. 3 we establish the bound:

x1 ≥ 4 − 2x2 + x3 = 4 − 2 · 1 + 1 = 3.

So, we have fixed the values of all variables and found the solution to this exam-
ple, x∗ = (3, 1, 1), already at the preprocessing stage. Of course, this is rather an
exception than a rule. t
u

Another simple way to strengthen the formulation of a given MIP is to change


the constraint coefficients. Let us start with a specific example. The inequality
n
∑ a jx j ≥ β ,
j=1

in which all the variables x j are binary and the coefficients a j and β are positive, is
equivalent to the inequality
n
∑ min{a j , β }x j ≥ β .
j=1

Geometrically, decreasing some coefficients a j , we rotate the hyperplane H(a, β ) in


such a way that it touches more feasible solutions.
Proposition 6.3. Let X be the solution set of the system
n
αy + ∑ a j x j ≤ β , (6.5)
j=1
y ∈ {0, 1} and l j ≤ x j ≤ u j , j = 1, . . . , n, (6.6)

and let
U= ∑ a ju j + ∑ a jl j .
j:a j >0 j:a j <0

If α < 0, then for ᾱ = max{α, β −U} the inequality


n
ᾱy + ∑ a j x j ≤ β (6.7)
j=1

is valid for X and is not weaker than (6.5).


If α > 0, then for α̂ = min{α,U − β + α} the inequality
n
α̂y + ∑ a j x j ≤ β − α + α̂ (6.8)
j=1

is valid for X and is not weaker than (6.5).


Proof. First we consider the case when α < 0. Inequalities (6.5) and (6.7) are
equivalent if y = 0 or ᾱ = α. Therefore, suppose that y = 1 and ᾱ > α. Then
170 6 Branch-And-Cut
n
−α > −ᾱ = −β +U ≥ −β + ∑ a j x j
j=1

and, consequently,
n
ᾱ · 1 + ∑ a j x j ≤ β .
j=1

Now consider the case when α > 0. Let us rewrite (6.5) in the form
n
−α(1 − y) + ∑ a j x j ≤ β − α. (6.9)
j=1

From the already proved part of this proposition for the case α < 0, it follows that,
for
−α̂ = max{−α, β − α −U},
the inequality
n
−α̂(1 − y) + ∑ a j x j ≤ β − α
j=1

is valid for X and is not weaker than (6.9). t


u

Example 6.5 We need to reduce the coefficients of the binary variables in the in-
equality
92x1 − 5x2 + 72x3 − 10x4 + 2x5 ≤ 100 (6.10)
under the following conditions:

x1 , x2 , x3 ∈ {0, 1}, 0 ≤ x4 ≤ 5, 0 ≤ x5 ≤ 2.

Solution. Let us apply Proposition 6.3 sequentially for the variables x1 , x2 and x3 .
Since α = a1 = 92 > 0, U = −5 · 0 + 72 − 10 · 0 + 2 · 2 = 76, then

α̂ = min{92, 76 − 100 + 92} = 68,

and we obtain the inequality

68x1 − 5x2 + 72x3 − 10x4 + 2x5 ≤ 100 − 92 + 68 = 76.

As α = a2 = −5 < 0, U = 68 + 72 + 4 = 144, then

ᾱ = max{−5, 76 − 144} = −5,

and therefore we cannot change the coefficient of x2 .


Since α = a3 = 72 > 0, U = 68 + 4 = 72, then

α̂ = min{72, 72 − 76 + 72} = 68,


6.5 Preprocessing 171

and we obtain the final inequality

68x1 − 5x2 + 68x3 − 10x4 + 2x5 ≤ 76 − 72 + 68 = 72. t


u

After solving the relaxation LP for some node of the search tree, we have addi-
tional information that can be used to strengthen the formulation at this node.
Proposition 6.4. Let γ be the optimal objective value of the relaxation LP for some
node being processed, and R be the record value at that time. Let x∗j and c̄ j 6= 0 be
the value and the reduced cost of an integer variable x j , l j ≤ x j ≤ u j . Further, let
δ = b(γ − R)/c̄ j c. Then, for any optimal solution of the relaxation LP, the following
holds:
if x∗j = l j , then x j ≤ l j + δ , and if x∗j = u j , then x j ≥ u j − δ .

6.5.1 Disaggregation of Inequalities

There exist methods that allows us to replace (aggregate) a system of linear inequal-
ities with a single linear inequality so that the integer solutions of the system and this
single inequality are the same. But in practice it is better to disaggregate inequali-
ties; usually, this strengthens existing formulations. In Example 1.1 we strengthened
an IP formulation of a binary set by replacing an inequality with a family of inequal-
ities that imply the original inequality. The next proposition presents a simple but
rather general disaggregation technique.

Proposition 6.5. Let us consider the solution set X to the inequality

∑ f j x j + ∑ ai yi ≤ b, (6.11)
j∈B i∈I

where all variables x j ( j ∈ B) are binary, all variables yi (i ∈ I) are integer (or
binary), all coefficients f j ( j ∈ B) are positive and ∑ j∈B f j ≤ 1, b and all coefficients
ai (i ∈ I) are integers. Then the inequalities

x j + ∑ ai yi ≤ b, j ∈ B, (6.12)
i∈I

are also valid for X, and (6.11) is a consequence of (6.12).

Proof. The inequalities

f j x j + ∑ ai yi ≤ b, j ∈ B,
i∈I

are valid for X since all f j and x j are non negative. As x j are binaries, and b −
∑i∈I ai yi takes only integer values, all inequalities from (6.12) are also valid for X.
172 6 Branch-And-Cut

Multiplying the j-th inequality in (6.12) by f j / ∑k∈B fk , and then summing together
all |B| resulting inequalities, we get the inequality

1
∑ j∈B f j
∑ f j x j + ∑ ai yi ≤ b,
j∈B i∈I

which is not weaker than (6.11).


Thus, we have proven that (6.12) is not weaker (usually much stronger) formu-
lation for X than (6.11). t
u

6.5.2 Probing

Solving a MIP with a substantial share of binary variables, we can also apply the
technique of probing the values of binary variable. At each iteration of the probing
procedure, the value of one binary variable, xi , is fixed to α ∈ {0, 1}, the basic
preprocessing techniques are applied, and then we explore the consequences.
1. If infeasibility is detected, then, for any feasible solution, the variable xi cannot
take the value of α, and, therefore, its value must be set to 1 − α. For demonstration,
consider a simple example:

2x1 − 2x2 + x3 ≤ 0, x1 + x2 ≤ 1, x1 , x2 ∈ {0, 1}, x3 ≥ 1.

Setting x1 = 1, from the second inequality we obtain that x2 = 0, and from the
first one we derive the upper bound x3 ≤ −2, which contradicts to the lower bound
x3 ≥ 1. Therefore, we can set x1 = 0.
2. If one of the bounds of a constraint l ≤ aT x ≤ u is changed, then we can
¯ ū be new bounds established as a result of prepro-
strengthen this constraint. Let l,
cessing. Then the following inequalities hold

l + (l¯ − l)(1 − xi ) ≤ aT x ≤ u − (u − ū)(1 − xi ) if α = 0,


l + (l¯ − l)xi ≤ aT x ≤ u − (u − ū)xi if α = 1.

In the example

x1 − x2 ≤ 0, x1 − x3 ≤ 0, x2 + x3 ≥ 1,

after setting x1 = 1, we obtain the inequalities x2 ≥ 1, x3 ≥ 1, and a new lower bound


for the third inequality: x2 + x3 ≥ 2. Therefore, we can write down the inequality

x2 + x3 ≥ 1 + (2 − 1)x1 or − x1 + x2 + x3 ≥ 1.

We can also consider the bounds on variables as inequalities. Consider the system
6.5 Preprocessing 173

5x1 + 3x2 + 2x3 ≤ 10, x1 ∈ {0, 1}, x2 ≤ 3, x3 ≤ 5, x2 , x3 ∈ Z+ .

Fixing x1 to 1, from the first inequality we get the upper bounds x2 ≤ 1 and x3 ≤ 2.
This allows us to derive the following inequalities:

x2 ≤ 3 − (3 − 1)x1 and x3 ≤ 5 − (5 − 2)x1 ,

or
2x1 + x2 ≤ 3 and 3x1 + x3 ≤ 5.
Despite the apparent simplicity, the probing procedure is a very powerful tool
for enhancing the formulations of MIPs with binary variables. Besides, the probing
techniques subsume some of the preprocessing techniques that we discussed earlier.

Disaggregation of Inequalities

Proposition 6.5 demonstrates a way of strengthening an inequality by replacing it


with a family of inequalities that imply the original constraint. In some cases, we
can automate the disaggregation process by probing binary variables.
Consider the inequality

x1 + x2 + · · · + xn ≤ k · y,

where all variables are binary, and 1 ≤ k ≤ n. Fixing y to 0, we get the new upper
bounds xi ≤ 0, i = 1, . . . , n. Consequently, the following inequalities are valid:

xi ≤ y, i = 1, . . . , n.

Changing Coefficients of Binary Variables

Probing binary variables, we can achieve much more than Proposition 6.3 allows.
Example 6.6 We need to strengthen the first inequality in the following system

4x1 + 5x2 + 7x3 + 2x4 ≤ 9,


x1 + x2 ≤ 1,
x1 , x2 , x3 , x4 ∈ {0, 1}.

Solution. First we note that we cannot strengthen the first inequality in the way
specified in Proposition 6.3. Therefore, let us proceed to probing the variables.
Setting x1 = 1, from the second inequality we conclude that x2 = 0, and from
the first inequality it follows that x3 = 0. Then the maximum value of the left-hand
side of the first inequality is u = 4 + 2 = 6 < 9. Therefore, if x1 = 1, this inequality
becomes redundant, and it can be strengthened as follows:
174 6 Branch-And-Cut

4x1 + 5x2 + 7x3 + 2x4 ≤ 9 − (9 − 6)x1

or
7x1 + 5x2 + 7x3 + 2x4 ≤ 9. (*)
Setting x2 = 1 and arguing in a similar way as above, we can further strengthen
(*) to the inequality
7x1 + 7x2 + 7x3 + 2x4 ≤ 9. t
u

6.6 Traveling Salesman Problem

Often, a formulation of a MIP contains too many inequalities, and all of them cannot
be stored in the computer memory. In such cases, some of the inequalities are ex-
cluded from the formulation, the truncated problem is solved by the branch-and-cut
method, and the excluded inequalities are considered as cuts. But such cuts, which
constitute a part of a MIP formulation, must be represented by an exact separation
procedure. Otherwise, we could get an infeasible solution to our MIP. Let us demon-
strate what has been said with a famous example of the minimum Hamiltonian cycle
problem.
Given a (undirected) graph G = (V, E), each edge e ∈ E of which is assigned a
cost ce , we need to find a Hamiltonian cycle (a simple cycle that covers all vertices)
with the minimum total cost of edges. We note that the minimum Hamiltonian cycle
problem on complete graphs is also called the traveling salesman problem (TSP)
because of the following interpretation. There are n cities and the distance ci j is
known between any pair of cities i and j. A traveling salesman wants to find the
shortest ring route, which visits each of the n cities exactly once. As a subproblem,
the TSP appears in practical applications in the following context. A multifunctional
device, processing a unit of some product, performs over it n operations in any order.
The readjustment time of the device after performing operation i for operation j is
ti j . It is necessary to find the order of performing operations for which the total time
spent on readjustments is minimum.
Introducing binary variables xe , e ∈ E, with xe = 1 if edge e is included in the
Hamiltonian cycle, and xe = 0 otherwise, we can formulate the minimum Hamilto-
nian cycle problem as follows:

∑ ce xe → min, (6.13a)
e∈E

∑ xe = 2, v ∈ V, (6.13b)
e∈E(v,V )

∑ xe ≥ 2, 0/ 6= S ⊂ V, (6.13c)
e∈E(S,V \S)

xe ∈ {0, 1}, e ∈ E. (6.13d)


6.6 Traveling Salesman Problem 175

Here we use the notation E(S, T ) for the subset of edges from E with one end vertex
in S, and the other in T .
Equations (6.13b) require that each vertex be incident to exactly two selected
edges. The subtour elimination inequalities (6.13c) are needed to exclude ”short
cycles” (see below the solution of Example 6.7). System (6.13c) contains too many
inequalities. Even for relatively small graphs (say with 50 vertices) we can not store
in the memory of a modern computer any conceivable description of all subtour
elimination inequalities. But we can treat the subtour elimination inequalities as
cuts and add them to the active node LPs as needed. To do this, we only need to
solve the following separation problem:
given a point x̃ ∈ [0, 1]E that satisfies (6.13b), it is needed to verify whether all
inequalities in (6.13c) are valid, and if there exist violated inequalities, find
one (or a few) of them.
This separation problem can be formulated as the minimum cut problem in which
we need to find a proper subset S̃ of the vertex set V (0/ 6= S̃ ⊂ V ) such that the value
q = ∑e∈E(S̃,V \S̃) x̃e is minimum. If q < 2, then the inequality ∑e∈E(S̃,V \S̃) xe ≥ 2 is
violated at x̃; otherwise x̃ satisfies all inequalities from (6.13c).
To find a minimum cut, there are effective deterministic and probabilistic algo-
rithms. We can use one of them to solve the separation problem for the subtour
elimination inequalities. In addition, we can find several minimum cuts (violated in-
equalities) at once by constructing the Gomory-Hu tree. It is said that a cut (S,V \ S)
separates two vertices s and t if exactly one of these vertices belongs to S; such a
cut is also called an s,t-cut. The Gomory-Hu tree, TGH = (V, Ẽ), is defined on the
vertex set V of the graph G, but the edges e ∈ Ẽ need not be edges in G. Each edge
e ∈ Ẽ is assigned a number fe . For given two vertices s,t ∈ V , we can find a mini-
mum s,t-cut as follows. First, on a single path connecting s and t in TGH , we need
to find an edge e with the minimal value fe . Removing this edge e from the tree
TGH , we get two subtrees. Let S and V \ S be the vertex sets of these subtrees, then
the partition (S,V \ S) is a minimum s,t-cut. In spite of the fact that n-vertex graphs
have n(n − 1)/2 different s,t-cuts (different pairs s,t), we can build the Gomory-Hu
tree by a procedure that solves only n − 1 minimum s,t-cut problems.
Example 6.7 Consider an example of the minimum Hamiltonian cycle problem de-
fined on the graph depicted in Fig. 6.4, where the numbers near the edges are their
costs. We need to solve this example by the branch-and-cut method, when a) α = 4,
β = 10; b) α = β = 0.

Solution. In both cases, we begin by solving the LP


176 6 Branch-And-Cut

1j 4j
0
@
@1 1
@j
5j
α
1 2 1
@
1 1@

3j
β @ j
6

Fig. 6.4 An example of the Hamiltonian cycle problem

x1,2 + x2,3 + x3,1 + x4,5 + x5,6 + x6,4 + αx2,5 + β x3,6 → min,


x1,2 + x3,1 + x1,4 = 2,
x1,2 + x2,3 + x2,5 = 2,
x2,3 + x3,1 + x3,6 = 2,
(6.14)
x4,5 + x6,4 + x1,4 = 2,
x4,5 + x5,6 + x2,5 = 2,
x5,6 + x6,4 + x3,6 = 2,
0 ≤ x1,2 , x2,3 , x3,1 , x4,5 , x5,6 , x6,4 , x1,4 , x2,5 , x3,6 ≤ 1.

Here the variable xi, j corresponds to the variable xe for the edge e = (i, j).
Case a). Use your favorite LP solver to verify that, for α = 4 and β = 10, an
optimal solution to (6.14) is the point x(1) with the coordinates:
(1) (1) (1) (1) (1) (1)
x1,2 = x2,3 = x3,1 = x4,5 = x5,6 = x6,4 = 1,
(1) (1) (1)
x1,4 = x2,5 = x3,6 = 0.

Note that two short cycles correspond to the point x(1) :

1→2→3→1 and 4 → 5 → 6 → 4.

The point x(1) violates the subtour elimination inequality for S = {1, 2, 3}:

x1,4 + x2,5 + x3,6 ≥ 2.

Adding this inequality to the constraints of LP (6.14), after the reoptimization, we


obtain the solution x(2) with the coordinates:
(2) (2) (2) (2) (2) (2)
x3,1 = x2,3 = x2,5 = x5,6 = x6,4 = x1,4 = 1,
(6.15)
(2) (2) (2)
x1,2 = x3,6 = x4,5 = 0.

The integer point x(2) determines a minimum Hamiltonian cycle

1 → 3 → 2 → 5 → 6 → 4 → 1,
6.6 Traveling Salesman Problem 177

which cost equals 8.


Case b). Now we solve (6.14) with α = β = 0. Its solution is the point x(3) with
the following coordinates:

(3) (3) (3) (3) (3) (3) 1


x1,2 = x2,3 = x3,1 = x4,5 = x5,6 = x6,4 = ,
2
(3) (3) (3)
x1,4 = x2,5 = x3,6 = 1.

For such a small example, it is not difficult to verify (even without solving the sep-
aration problem) that the point x(3) satisfies all the subtour elimination inequalities.
This suggests that Formulation (6.13), containing so many inequalities, is not ideal.
Many other classes of inequalities are known for the minimum Hamiltonian cycle
problem. But, unlike the subtour elimination inequalities, all other classes of in-
equalities are usually not part of the problem formulation.
It is easy to see that of the six edges

(1, 2), (2, 3), (3, 1), (1, 4), (2, 5), (3, 6)

no more than four can be on a Hamiltonian cycle. Therefore, the next inequality
must hold
x1,2 + x2,3 + x3,1 + x1,4 + x2,5 + x3,6 ≤ 4, (6.16)
which is violated at x(3) . Adding this inequality to the constraints of (6.16), after
reoptimization, we again get the solution x(2) given by (6.15), but now the cost of
x(2) is 4. t
u

Inequality (6.16) belongs to a large class of inequalities called comb-inequalities.


Suppose that a family of vertex subsets H ⊆ V and Ti ⊆ V for i = 1, . . . , k, satisfies
the following conditions :

|H ∩ Ti | ≥ 1, |Ti \ H| ≥ 1, i = 1, . . . , k,
(6.17)
Ti ∩ T j = 0,
/ i = 1, . . . , k − 1, j = i + 1, . . . , k,

where k ≥ 3 is an odd integer. The configuration C = (H, T1 , . . . , Tk ) is called a comb


in graph G with the handle H and the teeth Ti (for illustration see Fig. 6.5). Denote
by x(S) the sum ∑e∈E(S,S) xe . The inequality

q q H q q
q qq q
4j 5j 6j 7j
T1 T2 T3
q q q q
1j 2j 3j

Fig. 6.5 Graphical representation of comb-inequalities


178 6 Branch-And-Cut

k k  
k
x(H) + ∑ x(Ti ) ≤ |H| + ∑ (|Ti | − 2) + (6.18)
i=1 i=1 2

is called a comb-inequality.
Let us show that (6.18) is a Chvátal-Gomory cut. Using (6.13b) and the following
equivalent representation for the subtour elimination inequalities

x(S) ≤ |S| − 1, 0/ 6= S ⊂ V,

we can write down the following chain of inequalities:


k k
2x(H) + 2 ∑ x(Ti ) ≤ x(H,V ) + ∑ (x(Ti ) + x(Ti ∩ H) + x(Ti \ H))
i=1 i=1
k
≤ 2 |H| + ∑ ((|Ti | − 1) + (|Ti ∩ H| − 1) + (|Ti \ H| − 1))
i=1
k
= 2 |H| + 2 ∑ (|Ti | − 2) + k.
i=1

Dividing both sides of the resulting inequality by 2 and rounding the right-hand side,
we obtain (6.18).
Even for small n, the total number of comb-inequalities is huge, there are much
more of them than there are the subtour elimination inequalities. To use the comb-
inequalities in computational algorithms, we need an efficient separation proce-
dure for these inequalities. In the general case, the separation problem for comb-
inequalities is not solved. But there are several heuristic separation procedures.
These procedures may not find an inequality violated at a given point, even if such
inequalities exist.
An efficient exact separation procedure is known only for a subclass of comb-
inequalities, when

|H ∩ Ti | = 1, |Ti \ H| = 1, i = 1, . . . , k.

Such comb-inequalities are also called flower inequalities because these inequalities
are sufficient to describe the 2-matching polyhedron, which is the convex hull of
points x ∈ {0, 1}E satisfying (6.13b).
From a practical point of view, the main difference between the cuts that are in
the problem formulation, and the usual cuts, is that the exact separation procedure
is necessary for the former, and heuristic separation procedures can be used for the
latter. There are many examples where, even when there are theoretically efficient
separation procedures, in practice, preference is given to faster heuristics.
6.8 Exercises 179

6.7 Notes

Sect. 6.1. The LP-based branch-and-cut method for integer programming was pro-
posed by Land and Doig in [83].
Sect. 6.2. Articles [62, 106] were among the first to describe the use of cuts in the
branch-and-bound method.
Sect. 6.3. Now standard branching on an integer variable with a fractional value
appeared in [41], pseudocost branching was proposed in [23], strong branching was
introduced in CPLEX 7.5 (see also [6]), and GUB/SOS-branching was proposed in
[19].
Sect. 6.4. The idea to generate global cuts avoiding local bounds for binary variables
was specified in [14].
Sect. 6.5. Many preprocessing methods are considered as folklore, since it is very
difficult to trace their origins. Various aspects of preprocessing are discussed in
[4, 30, 70, 74, 119, 129, 132].
Sect. 6.6. Danzig, Falkerson and Johnson [44] were the first to use cuts for solving
a traveling salesman problem with 49 cities. Later, their method was significantly
expanded and improved by many researchers. An overview of these results is pro-
vided in [96]. An implementation of the branch-and-cut method for solving very big
traveling salesman problems is discussed in [7].
The minimum cut problem can be efficiently solved by both deterministic [97]
and probabilistic [79] algorithms. An efficient implementation of the procedure for
constructing the Gomory-Hu tree was proposed in [71].
The comb inequalities were introduced in [36] for a particular case of comb struc-
tures, where each tooth contains exactly one vertex from the handle, and the general
comb inequalities appeared in [65]. The separation procedure for flower inequalities
was developed in [105]. A number of heuristic separation procedures for comb-like
inequalities where described in [107].
Sect. 6.8. The statement of Exercise 6.11 was taken from [92] (see also [110]).

6.8 Exercises

6.1. Consider the following IP

max{−xn+1 : 2x1 + 2x2 + · · · + 2xn + xn+1 = n, x ∈ {0, 1}n+1 }.

Prove that for odd n, the branch-and-bound method from Listing 6.1 processes an
exponential (in n) number of nodes.
6.2. How many branchings can the branch-and-bound method perform in the worst
case when solving an IP with one integer variable?
180 6 Branch-And-Cut

6.3. Solve again Example 6.1 by the branch-and-bound method, but now first apply
preprocessing.
6.4. Solve the following IPs by the branch-and-bound method:

a) 4x1 + 5x2 + x3 → max, b) 10x1 + 14x2 + 21x3 → min,


3x1 + 2x2 ≤ 10, 2x1 + 2x2 + 7x3 ≥ 14,
x1 + 4x2 ≤ 11, 8x1 + 11x2 + 9x3 ≥ 12,
3x1 + 3x2 + x3 ≤ 13, 9x1 + 6x2 + 3x3 ≥ 10,
0 ≤ x1 ≤ 4, 0 ≤ x1 ≤ 2,
0 ≤ x2 ≤ 3, 0 ≤ x2 ≤ 2,
0 ≤ x3 ≤ 5, 0 ≤ x3 ≤ 3,
x1 , x2 , x3 ∈ Z; x1 , x2 , x3 ∈ Z.

6.5. Using the result of Exercise 3.4, solve the following 0,1-knapsack problem by
the branch-and-bound method:
16x1 + 6x2 + 14x3 + 19x4 → max,
6x1 + 3x2 + 7x3 + 9x4 ≤ 13,
x1 , x2 , x3 , x4 ∈ {0, 1}.

6.6. Solve the next IP by the branch-and-cut method that at each node generates
only fractional Gomory cuts:

3x1 − x2 → max,
3x1 − 2x2 ≤ 3,
−5x1 − 4x2 ≤ −10,
2x1 + x2 ≤ 5,
0 ≤ x1 ≤ 2,
0 ≤ x2 ≤ 3,
x1 , x2 ∈ Z.

6.7. Apply probing to strengthen the formulation

5x1 − 8x2 − 12x3 + 3x4 − 3x5 → max,


x1 + 3x2 − 4x3 + 2x4 + 5x5 ≤ 0,
3x1 + 7x2 − 2x3 + 2x4 + 3x5 ≥ 4,
−2x1 + 2x3 + x4 − x5 ≥ 2,
x1 , x2 , x3 , x4 , x5 , x6 ∈ {0, 1}.

6.8. Prove that in the system

−106 y + 999995x1 + 999995x2 + x3 + x4 − x5 − x6 − x7 − x8 ≤ −3,


y ≥ 1,
(y, x) ∈ Z × {0, 1}8

its first inequality can be replaced with the following much simpler inequality
6.8 Exercises 181

x1 + x2 − y ≤ 0.

Generalize your proof to propose a useful preprocessing technique.


6.9. Chvátal-Gomory strengthening of inequalities. Given an integer set X = {x ∈
Zn+ : ∑nj=1 a j x j ≥ b} with a j > 0 for j = 1, . . . , n. Let q > 0 be a scalar such that

dqa j e × b/dqbe ≤ a j , j = 1, . . . , n,

and al least one of the above inequalities is strict. Prove that the inequality
n
∑ dqa j ex j ≥ dqbe
j=1

is valid for X, and it is stronger than ∑nj=1 a j x j ≥ b.


6.10. Prove Proposition 6.4.
6.11. Prove that in Formulation (6.13) of the minimum Hamiltonian cycle problem,
System (6.13c) of the subtour elimination inequalities can be replaced with the fol-
lowing more compact system

uv − uw + (n − 1)x(v,w) ≤ n − 2, (v, w) ∈ E, v, w 6= s, (6.19)

where s is an arbitrary vertex from V , and uv (v ∈ V \ {s}) are continuous variables.


Give a meaningful interpretation for the variables uv . Which of two systems, (6.13c)
or (6.19), is stronger?
6.12. Acyclic subgraph. Given a digraph G = (V, E), each arc e ∈ E of which is
assigned a weight we ; we need to find an acyclic subgraph (having no directed
cycles) G0 = (V, E 0 ) (E 0 ⊆ E) of the maximum weight ∑e∈E 0 we . Let CG denote the
family of all directed cycles in G (more precisely, every element in CG is the set
of edges of some directed cycle in G). Introducing binary variables xe for all e ∈ E
(xe = 1 only if e ∈ E 0 ), we can write down the following formulation:

∑ we xe → max, (6.20a)
e∈E

∑ xe ≤ |C| − 1, C ∈ CG , (6.20b)
e∈C
xe ∈ {0, 1}, e ∈ E. (6.20c)

How can we solve the separation problem for Ineqs. (6.20b)?


Chapter 7
Branch-And-Price

If formulated briefly, then the branch-and-price method is a combination of the


branch-and-bound method with the column generation approach, which is used for
solving large LPs. In this chapter, we first consider the column generation algorithm
applied to the one-dimensional cutting stock problem. Then we discuss the general
scheme of the branch-and-price method and consider two specific application of this
method for solving 1) the generalized assignment problem, and 2) the problem of
designing a reliable telecommunication network.

7.1 Column Generation Algorithms

The column generation method is used for solving LPs with a large (usually expo-
nentially large) number of columns (variables). Technically, this method is similar
to the cutting plane method. A standard problem for demonstrating this method is
the one-dimensional cutting stock problem.

7.1.1 One-Dimensional Cutting Stock Problem

Materials such as paper, textiles, cellophane and metal foil are produced in rolls of
great length, from which short stocks are then cut out. For example, from a roll with
a length of 1000 cm, we can cut out 20 stocks of 30 cm in length, and 11 stocks with
a length of 36 cm, with 4 cm going to waste. When it is required to cut out many
different types of stocks in different quantities, it is not always easy to find the most
economical way (with the minimum amount of waste) to do this.
The problem of finding the most economical method of cutting is known as the
cutting stock problem. In the simplest form, it is formulated as follows. From rolls of
length L, we need to cut out pieces of length l1 , . . . , lm , respectively, in the quantities
q1 , . . . , qm . Our goal is to use the minimum number of rolls.

183
184 7 Branch-And-Price

A vector a = (a1 , . . . , am )T with non-negative integer components is called a pat-


tern if ∑m i=1 ai li ≤ L, i.e., from a roll of length L one can simultaneously cut out a1
stocks of length l1 , a2 stocks of length l2 , and so on am stocks of length lm . Let
a j = (a1j , . . . , amj )T , j = 1, . . . , n, be the set of all possible patterns, and let x j be a
variable which value is the number of rolls to be cut along the pattern a j . Then the
(one-dimensional) cutting stock problem is formulated as the next IP:
n
∑ x j → min,
j=1
n
(7.1)
∑ aij x j ≥ qi , i = 1, . . . , m,
j=1

x j ∈ Z+ , j = 1, . . . , n.

Remark. This base model can be easily modified for the case when it is necessary
to cut rolls of different lengths. When cutting expensive materials, such as silk, a
more appropriate criterion is to minimize the cost of leftovers ∑nj=1 c j x j , where c j
is the waste cost for the pattern a j .

7.1.2 Column Generation Approach

It should be noted that the number of variables, n, in (7.1) may be astronomically


large. Therefore, even the relaxation LP for (7.1) cannot be solved in the usual way.
But such LPs with exponentially many variables (columns) can be solved using the
column-generation technique which, in its essence, consists in the following. At
the beginning, a relatively small family of patterns is selected — without loss of
generality we may assume that these are the first k patterns, a1 , . . . , ak , from our list
of all patterns — so that the following truncated LP
k
∑ x j → min,
j=1
k (7.2)
∑ aij x j ≥ qi , i = 1, . . . , m,
j=1

x j ≥ 0, j = 1, . . . , k,

has a feasible solution. Let x∗ and y∗ be optimal primal and dual basic solutions to
this LP. The solution x∗ can be extended to the solution of the full LP (relaxation LP
for (7.1)) if we set to zero the values of all variables x j for j = k + 1, . . . , n. Clearly,
this extended solution is optimal to (7.1) if y∗ is its optimal dual solution. By the
complementary slackness condition (see item c) of Theorem 3.2), the latter is valid
if all reduced costs are non-negative:
7.1 Column Generation Algorithms 185
m
c̄ j = 1 − ∑ aij y∗i ≥ 0, j = 1, . . . , n. (7.3)
i=1

To verify (7.3), it is enough to solve the following pricing problem:


m
∑ y∗i zi → max,
i=1
m
(7.4)
∑ li zi ≤ L,
i=1
zi ∈ Z+ , i = 1, . . . , m.

which is an integer knapsack problem, and it can be solved by dynamic program-


ming using the recurrence formula (1.29).
Let z∗ be an optimal solution to (7.4). If ∑m ∗ ∗
i=0 yi zi ≤ 1, then all inequalities in
∗ ∗
(7.3) are satisfied at z . Otherwise, since z is a pattern, we add to (7.2) a new
column ak+1 = z∗ with a variable xk+1 , increment k by 1, and solve this extended
truncated LP. We continue adding (generating) new columns until (7.3) is satisfied.
A commonly accepted approach for solving the cutting-stock IP (7.1) is to solve
its relaxation LP and then round up the components of its optimal solution. For
problems with a small number of stocks that need to be cut out in large quantities,
this approach usually gives a near optimum solution. Alternatively, after solving the
relaxation LP, we can continue solving the problem by the branch-and-cut algorithm
without generating columns.

7.1.3 Finding a Good Initial Solution

When applying a column generation algorithm, it is highly desirable to start with a


such truncated LP which objective value is close to the optimal objective value of
the full LP. Here we present a heuristic that builds a reasonably good set of patterns.
This heuristic is based on the assumption that long stocks are cut out first, and short
stocks are cut out from leftovers.
Initialization. List stocks in decreasing order of their lengths:

lπ(1) > lπ(2) > · · · > lπ(m) .

Set b = q, I = (π(1), . . . , π(m)), k = 0.


General step. While I 6= 0,
/ do the following.
Set k := k + 1, W = L, xk = ∞. For all i 6∈ I, set aki = 0.
For i = 1, . . .j, |I|, k
W
set akI[i] = lI[i] , W := W − akI[i] · lI[i] ;
186 7 Branch-And-Price
 
bI[i]
if xk akI[i] > bI[i] , set xk = akI[i]
.
ak
Set b := b − xk and remove from I all stocks s such that bs ≤ 0,
preserving the order of the elements that are left.

7.1.4 Cutting Stock Example

To illustrate how the column generation algorithm works, let us apply it to solve
an instance of the cutting-stock problem with the following numeric parameters:
L = 100, l1 = 45, l2 = 36, l3 = 31, l4 = 14, q1 = 97, q2 = 610, q3 = 395, q4 = 211.
First, let us apply the heuristic from Sect. 7.1.3 to find an initial set of patterns.
Initialization. Set b = (97, 610, 395, 211), I = (1, 2, 3, 4), k = 0.
Step 1. Set W = 100, x1 = ∞. Compute in sequence
   
100 97
a11 = = 2, W = 100 − 2 · 45 = 10, x1 = = 49;
45 2
     
10 10 10
a12 = = 0; a13 = = 0; a14 = = 0.
36 31 14

Set b = (97, 610, 395, 211) − 49(2, 0, 0, 0) = (−1, 610, 395, 211), I = (2, 3, 4).
Step 2. Set W = 100, x2 = ∞, a21 = 0. Compute in sequence
   
2 100 610
a2 = = 2; W = 100 − 2 · 36 = 28, x2 = = 305;
36 2
 
28
a23 = = 0;
31
   
2 28 211
a4 = = 2, W = 28 − 2 · 14 = 0, x2 = = 106.
14 2

Set b = (−1, 610, 395, 211) − 106(0, 2, 0, 2) = (−1, 398, 395, −1), I = (2, 3).
Step 3. Set W = 100, x3 = ∞, a31 = 0, a34 = 0. Compute in sequence
   
100 398
a32 = = 2; W = 100 − 2 · 36 = 28, x3 = = 199;
36 2
 
28
a33 = = 0.
31

Set b = (−1, 398, 395, −1) − 199(0, 2, 0, 0) = (−1, 0, 395, −1), I = (3).
Step 4. Set W = 100, x4 = ∞, a41 = 0, a42 = 0, a44 = 0. Compute
7.1 Column Generation Algorithms 187
   
100 395
a43 = = 3; W = 100 − 3 · 31 = 7, x4 = = 132.
31 3

Set b = (−1, 0, 395, −1) − 132(0, 0, 3, 0) = (−1, 0, −1, −1), I = 0.


/
Having an initial set of patterns, we can continue solving our cutting stock exam-
ple by the column generation algoritm. First, we write down the truncated LP based
on the patterns a j , j = 1, . . . , 4:

x1 + x2 + x3 + x4 → min,
2x1 ≥ 97,
2x2 + 2x3 ≥ 610,
(7.5)
3x4 ≥ 395,
2x2 ≥ 211,
x1 , x2 , x3 , x4 ≥ 0.

Its optimal primal and dual solutions are, respectively, the following vectors:

2 T
   T
1 1 1 1 1 1 1 1
x = 48 , 105 , 199 , 131 , y = , , ,0 .
2 2 2 3 2 2 3

Next we write down the pricing problem:


1 1 1
z1 + z2 + z3 + 0z4 → max,
2 2 3
45z1 + 36z2 + 31z3 + 14z4 ≤ 100,
z1 , z2 , z3 , z4 ∈ Z+ .

The vector z1 = (0, 1, 2, 0)T is its optimal solution. Since (y1 )T z1 = 1/2 + 2/3 =
7/6 > 1, then the column a5 = z1 is added to (7.5), and, as a result, we get the
following LP:
x1 + x2 + x3 + x4 + x5 → min,
2x1 ≥ 97,
2x2 + 2x3 + x5 ≥ 610,
3x4 + 2x5 ≥ 395,
2x2 ≥ 211,
x1 , x2 , x3 , x4 , x5 ≥ 0.
After reoptimizing, we obtain the following optimal primal and dual solutions to the
above LP:

1 T
   T
1 1 3 1 1 1
x2 = 48 , 105 , 100 , 0, 197 , y2 = , , ,0 .
2 2 4 2 2 2 4

Now we solve the next pricing problem:


188 7 Branch-And-Price

1 1 1
z1 + z2 + z3 + 0z4 → max,
2 2 4
45z1 + 36z2 + 31z3 + 14z4 ≤ 100,
z1 , z2 , z3 , z4 ∈ Z+ .

The point z2 = (1, 1, 0, 0) is its optimal solution. Since (y2 )T z2 = 1, then x2 deter-
mines an optimal solution to the full relaxation LP.
Rounding up the solution x2 , we get the following approximate solution of our
example: cut out 49 rolls according to the pattern (2, 0, 0, 0), 106 rolls according to
the pattern (0, 2, 0, 2), 101 according to the pattern (0, 2, 0, 0), and 198 rolls accord-
ing to the pattern (0, 1, 2, 0). In this case, 454 rolls are used in total.
Since any cutting plan must use at least
 
 2 2 2 2 2
 1
x1 + x2 + x3 + x4 + x5 = 452 = 453
4

rolls, then if there is a more economical way of cutting, it will save only one roll.
As a rule, a cutting plan obtained by rounding up a solution to the relaxation LP is
very ”close” to the optimal ones. Therefore, in practice, we can almost always limit
ourselves to the search for such approximate cutting plans. t
u

7.2 Dancig-Wolfe Reformulation

Danzig-Wolfe decomposition was originally developed (in 1960) for solving large
structured LPs. Much later in the 1980-th this method began to be used to reformu-
late MIPs in order to strengthen them.
Consider the IP in the following form:
K
∑ (ck )T xk → max,
k=1
K (7.6)
∑ Ak xk ≤ b,
k=1
k k
x ∈X , k = 1, . . . , K,

where b ∈ Rm , and, for k = 1, . . . , K, ck ∈ Rnk , Ak is a real m × nk -matrix, X k ⊂ Znk


is a finite set.
We can write
( )
Xk = xk = ∑ λak a : ∑ λak = 1, λak ∈ {0, 1} for a ∈ X k .
a∈X k a∈X k

Substituting the expressions for xk into (7.6), we have


7.2 Dancig-Wolfe Reformulation 189

K
∑ ∑ (aT ck )λak → max,
k=1 a∈X k
K
∑ ∑ (Ak a)λak ≤ b, (7.7)
k=1 a∈X k

∑ λak = 1, k = 1, . . . , K,
a∈X k
λak ∈ Z+ , a ∈ X k , k = 1, . . . , K.

A natural question arises: what gives us the transition from Formulation (7.6),
which is compact, to Formulation (7.7) with a huge number of variables (columns)?
One of the advantages is that, in cases where each of the sets X k is a set of integer
points of a polyhedron, (7.7) is usually stronger than (7.6), since only points xk from
conv(X k ) are feasible to the relaxation LP for (7.7). For example, if

X k = {x ∈ Z2 : 3x1 + 2x2 ≤ 4, 0 ≤ x1 , x2 ≤ 1},

then the point x̄ = 1, 21 , which satisfies all inequalities defining X k , does not belong


to the convex hull of all integer points, (0, 0), (1, 0) and (0, 1), from X k .
Let zLPM denote the optimal objective value of the relaxation LP for (7.7). It is
not difficult to see that the following equality holds
K
def
zLPM = zCUT = max ∑ (ck )T xk ,
k=1
K
∑ Ak xk ≤ b,
k=1
x ∈ conv(X k ),
k
k = 1, . . . , K.

This means that the branch-and-bound method applied to (7.7) is equivalent (from
the point of view of the accuracy of upper bounds) to the the branch-and-cut method
applied to (7.6) assuming that this method uses exact separation procedures for the
sets conv(X k ).

7.2.1 Master and Pricing Problems

The number of variables in (7.7) can be astronomically large. Therefore, its relax-
ation LP can only be solved by a column generation algorithm. First, relatively small
subsets Sk ⊆ X k are chosen and the master problem is solved:
190 7 Branch-And-Price

K
∑ ∑ (aT ck )λak → max,
k=1 a∈Sk
K
∑ ∑ (Ak a)λak ≤ b, (7.8)
k=1 a∈Sk

∑ λak = 1, k = 1, . . . , K,
a∈Sk
λak ≥ 0, a ∈ Sk , k = 1, . . . , K.

Let (y, v) ∈ Rm K
+ × R be an optimal dual solution to (7.8). The nonzero compo-
nents of an optimal primal solution to (7.8) are nonzero components of an optimal
solution to the relaxation LP for (7.7) if the inequalities

yT Ak a + vk ≥ aT ck , a ∈ X k , k = 1, . . . , K, (7.9)

hold. To verify this condition, for each k = 1, . . . , K, we need to solve the following
pricing problem
((ck )T − yT Ak )xk → max,
(7.10)
xk ∈ X k .
If for some k the optimal objective value in (7.10) is greater than vk , then an optimal
solution to (7.10), x̄k , is added to the set Sk . Next we solve the extended master LP.
We continue to act in this way until an optimal solution to the relaxation LP for (7.7)
is found.

7.2.2 Branching

Applying the column generation technique makes it difficult (or even impossible)
to use the conventional branching on integer variables in the branch-and-bound
method. For example, if at a particular node of the search tree we set some vari-
able λak to zero, then for an optimal solution to the relaxation LP at this node the
reduced cost of λak can be positive, and this makes it possible that the column corre-
sponding to λak can be an optimal solution to the pricing problem. To prevent this, an
additional constraint must be added to the pricing problem. As a result, even at the
search tree nodes of low height, initially a relatively easy to solve pricing problem
can turn into a difficult IP.
Everything is greatly simplified for problems involving only binary variables.
The point x̃k = ∑a∈Sk λak a belongs to {0, 1}nk if and only if all λak are integer-valued.
Therefore, if there are variables λak taking fractional values, then x̃k has fractional
components x̃kj , one of which can be chosen for branching on it.
When processing a branch xkj = α (α ∈ {0, 1}), all elements a with a j = 1 − α
must be removed from the set Sk . In addition, we need to exclude the generation
7.3 Generalized Assignment Problem 191

of such elements (columns) when solving the corresponding pricing problem. The
latter can be done by setting xkj = α in the pricing problem. If α = 1, then we also
need to exclude from the other sets Sk̄ (k̄ 6= k) all elements a with a j = 1. In the
pricing problems it is necessary to set xk̄j = 0 for all k̄ 6= k. Note that the addition
of such simple constraints usually does not destroy the structure of each pricing
problem.
This combination of the branch-and-bound and column generation methods is
known as the branch-and-price method.

7.3 Generalized Assignment Problem

The generalized assignment problem can be considered as the following parallel


machine scheduling problem. Each of m independent tasks must be processed with-
out interruptions by one of K parallel machines; it takes pki units of time and costs cki
if task i is processed on machine k (i = 1, . . . , m, k = 1, . . . , K). The load (total run-
ning time) of machine k must not exceed its capacity lk . A schedule is represented
by an m × K-matrix X = {xik } with xik = 1 if task i is assigned (for processing) to
machine k. The generalized assignment problem (GAP) is to find such a schedule
that admits all the above requirements and has the minimum total assignment cost.
It is formulated as follows:
K m
∑ ∑ cki xik → min, (7.11a)
k=1 i=1
K
∑ xik = 1, i = 1, . . . , m, (7.11b)
k=1
m
∑ pki xik ≤ lk , k = 1, . . . , K, (7.11c)
i=1
xik ∈ {0, 1}, i = 1, . . . , m, k = 1, . . . , K. (7.11d)

Objective (7.11a) is to minimize the total assignment cost. Equations (7.11b) re-
quire that each task be assigned to exactly one machine. Inequalities (7.11c) impose
the capacity restrictions.
Setting
m
X k = {z ∈ {0, 1}m : ∑ pki zi ≤ lk },
i=1

we can reformulate (7.11) in the following way:


192 7 Branch-And-Price
!
K m
∑ ∑ ∑ cki ai λak → min, (7.12a)
k=1 a∈X k i=1
K
∑ ∑ aλak = e, (7.12b)
k=1 a∈X k

∑ λak ≤ 1, k = 1, . . . , K, (7.12c)
a∈X k
λak ∈ {0, 1}, a ∈ X k , k = 1, . . . , K. (7.12d)

Here e denotes the m-vector of all ones. Observe that we write down (7.12c) as
inequalities (instead of equations in accordance with (7.7)) since X k contains the
point 0 ∈ Rm .

7.3.1 Master Problem

For k = 1, . . . , K, let Sk ⊆ X k . To simplify the construction of an initial master prob-


lem, as well as the processing of infeasible subproblems, we introduce slack vari-
ables si (i = 1, . . . , m) to formulate the master problem in the following way:
!
K m m
−∑ ∑ ∑ cki ai λak − M ∑ si → max, (7.13a)
k=1 a∈Sk i=1 i=1
K
∑ ∑ aλak + s = e, (7.13b)
k=1 a∈Sk

∑ λak ≤ 1, k = 1, . . . , K, (7.13c)
a∈Sk
λak ≥ 0, a ∈ Sk , k = 1, . . . , K, (7.13d)
si ≥ 0, i = 1, . . . , m, (7.13e)

where M is a sufficiently big number, for example,


m
M ≥ ∑ max cki .
i=1 1≤k≤K

Introducing slack variables also allows us to start with the simplest master problem
when all Sk = 0.
/
7.3 Generalized Assignment Problem 193

7.3.2 Pricing Problem

Let (ỹ, ṽ) ∈ Rm × RK+ be an optimal dual solution to (7.13). Here the dual variable yi
corresponds to the i-th equation in (7.13b), and the dual variable vk corresponds to
the k-th inequality in (7.13c). For k = 1, . . . , K, the pricing problem is the following
0,1-knapsack problem:
m
∑ (−cki − ỹi )zi → max,
i=1
m
(7.14)
∑ pki zi ≤ lk ,
i=1
zi ∈ {0, 1}, i = 1, . . . , m.
This pricing problem can be solved by the recurrence formula (1.30).
Let z∗ ∈ {0, 1}m be an optimal solution to (7.14). If the inequality
m
∑ (−cki − ỹi )z∗i > ṽk
i=1

holds, then z∗ is added to the set Sk , and thereby the column(z∗ , ek ) ∈ Rm × RK


must be added to the master problem. The objective coefficient for this column is
− ∑m k
i=1 ci ai .

7.3.3 Branching

Branching on the variables λaj is not efficient due to two reasons. First, such branch-
ing results in a non-balanced search tree, since the branch λaj = 0 excludes only one
column while the branch λaj = 1 excludes all columns that have 1 at least in one
row i such that ai = 1. Second, such branching violates the structure of the pricing
problem (see Sect. 7.3.2).
As we have already noted, if in an optimal solution of the master LP, λ̃ , one of
the values λ̃ak is non-integer, then the vector

x̃k = ∑ λ̃ak a
a∈Sk

has non-integer components as well. Let δq be the minimum of the values


!2
 2
δk = 1− ∑ λ̃ak + ∑ λ̃ak , k = 1, . . . , K.
a∈Sk a∈Sk

Among the variables xiq , we choose for branching a variable xrq whose current value
x̃rq is closest to 0.5.
194 7 Branch-And-Price

When implementing this branch-and-price algorithm, at each node of the search


tree, it is necessary to store a list of variables xik together with their fixed values.
These data allow us to correctly modify the 0,1-knapsack pricing problems. More
precisely, when solving at some node the pricing problem for machine k, if a variable
xik is fixed to α, the variable zi must be set to α. In addition, for all k̄ 6= k and for all
variables xik̄ fixed to 1, we also need to set zi = 0.

7.3.4 Example

After we have specified all the elements of the branch-and-price method, let us apply
it for solving a small example.

Example 7.1 We need to solve an instance of GAP in which three (m = 3) tasks


must be performed on two machines (K = 2) with the following parameters:
   
1 2 3 2  
k k 4
[ci ] = 2 2 , [pi ] = 2 2 , l =
    .
4
2 1 2 3

Solution. We start with S1 = 0,


/ S2 = 0/ and M = 6. The search tree processed by
the branch-and price method is depicted in Fig. 7.1. Below are presented the steps
performed by the method to solve this example: step i. j describes iteration j of the
column generation algorithm when processing node i of the search tree.
0.1. Solve the initial master LP:

0 γ(0) = −5
(x11 , x21 , x31 )T = ( 21 , 21 , 12 )
(x12 , x22 , x32 )T = ( 21 , 21 , 12 )
!! aa
x11 = 0 !! aa x11 = 1
! aa
!! a
1 γ(1) = −6 2 γ(2) = −7
(x11 , x21 , x31 )T = (0, 0, 1) node eliminated
since its upper bound
(x12 , x22 , x32 )T = (1, 1, 0) is less than the record

Fig. 7.1 Search tree for Example 7.1


7.3 Generalized Assignment Problem 195

−6s1 − 6s2 − 6s3 → max,


s1 = 1,
s2 = 1,
s3 = 1,
0 ≤ 1,
0 ≤ 1,
s1 , s2 , s3 ≥ 0.

Its optimal dual solution is given by y = (−6, −6, −6) and v1 = v2 = 0.


Now we solve two pricing problems. For machine 1, we solve the following 0,1-
knapsack problem:

ξ = 5z1 + 4z2 + 4z3 → max,


3z1 + 2z2 + 2z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.

The point z∗ = (0, 1, 1)T and ξ ∗ = 8 are its optimal solution and objective value.
Since ξ ∗ = 8 > 0 = v1 , we add z∗ to the set S1 : S1 = {(0, 1, 1)T }.
For machine 2, we solve the next 0,1-knapsack problem:

ξ = 4z1 + 4z2 + 5z3 → max,


2z1 + 2z2 + 3z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.

Its optimal solution and objective value are z∗ = (1, 1, 0)T and ξ ∗ = 8. Since ξ ∗ =
8 > 0 = v2 , we add z∗ to the set S2 : S2 = {(1, 1, 0)T }.
0.2. Solve the next extended master LP:
−4λ11 − 4λ12 − 6s1 − 6s2 − 6s3 → max,
λ12 + s1 = 1,
λ1 + λ12 +
1 s2 = 1,
λ11 + s3 = 1,
λ11 ≤ 1,
λ12 ≤ 1,
λ11 , λ12 , s1 , s2 , s3 ≥ 0.

Its optimal dual solution is given by y = (−6, 2, −6) and v1 = v2 = 0.


Now let us solve the pricing problems. For machine 1, we solve the following
0,1-knapsack problem:

ξ = 5z1 − 4z2 + 4z3 → max,


3z1 + 2z2 + 2z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.
196 7 Branch-And-Price

Its optimal solution and objective value are z∗ = (1, 0, 0)T and ξ ∗ = 5. Since ξ ∗ =
5 > 0 = v1 , we add z∗ to S1 : S1 = {(0, 1, 1)T , (1, 0, 0)T }.
For machine 2, we solve the next 0,1-knapsack problem:

ξ = 4z1 − 4z2 + 5z3 → max,


2z1 + 2z2 + 3z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.

Its optimal solution and objective value are z∗ = (0, 0, 1)T and ξ ∗ = 5. Since ξ ∗ =
5 > 0 = v2 , we add z∗ to S2 : S2 = {(1, 1, 0)T , (0, 0, 1)T }.
0.3. We need to solve one more master LP:
−4λ11 − λ21 − 4λ12 − λ22 − 6s1 − 6s2 − 6s3 → max,
λ21 + λ12 + s1 = 1,
1
λ1 + λ12 + s2 = 1,
λ11 + λ22 + s3 = 1,
1
λ1 + λ2 1 ≤ 1,
λ12 + λ22 ≤ 1,
1 1 2 2
λ1 , λ2 , λ1 , λ2 , s1 , s2 , s3 ≥ 0.

Its optimal dual solution is given by y = (−1, −3, −1) and v1 = v2 = 0.


Now we solve the pricing problems. For machine 1, we solve the following 0,1-
knapsack problem:

ξ = 0z1 + z2 − z3 → max,
3z1 + 2z2 + 2z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.

Its optimal solution and objective value are z∗ = (0, 1, 0)T and ξ ∗ = 1. Since ξ ∗ =
1 > 0 = v1 , we add z∗ to S1 : S1 = {(0, 1, 1)T , (1, 0, 0)T , (0, 1, 0)T }.
For machine 2, we solve the next 0,1-knapsack problem:

ξ = −z1 + z2 + 0z3 → max,


2z1 + 2z2 + 3z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.

Its optimal solution and objective value are z∗ = (0, 1, 0)T and ξ ∗ = 1. Since ξ ∗ =
1 > 0 = v2 , we add z∗ to S2 : S2 = {(1, 1, 0)T , (0, 0, 1)T , (0, 1, 0)T }.
0.4. Solve the next master LP:
7.3 Generalized Assignment Problem 197

−4λ11 − λ21 − 2λ31 − 4λ12 − λ22 − 2λ32 − 6s1 − 6s2 − 6s3 → max,
λ21 + λ12 + s1 = 1,
1
λ1 + λ3 + λ12 +
1 λ32 + s2 = 1,
λ11 + λ22 + s3 = 1,
λ11 + λ21 + λ31 ≤ 1,
λ12 + λ22 + λ32 ≤ 1,
λ11 , λ21 , λ31 , λ12 , λ22 , λ32 , s1 , s2 , s3 ≥ 0.

The non-zero components of its optimal primal and dual solutions are the following:
1
λ11 = λ21 = λ12 = λ22 = ,
2 (7.15)
y = (−2, −3, −2), v1 = v2 = 1.

Next we solve the pricing problems. For machine 1, we solve the following 0,1-
knapsack problem:

ξ = z1 + z2 + 0z3 → max,
3z1 + 2z2 + 2z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.

Its optimal solution and optimal objective value are z∗ = (1, 0, 0)T and ξ ∗ = 1. Since
ξ ∗ = 1 = v1 , we cannot extend the set S1 .
For machine 2, we solve the next 0,1-knapsack problem:

ξ = 0z1 + z2 + z3 → max,
2z1 + 2z2 + 3z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.

Its optimal solution and optimal objective value are z∗ = (0, 1, 0)T and ξ ∗ = 1. Since
ξ ∗ = 1 = v2 , we cannot extend the set S2 .
We were not able to extend both sets S1 and S2 . This means that the current
solution given by (7.15) is optimal for the root node LP. Knowing the values of the
variables λ jk , we calculate the values of the variables xik :

 1     1
x1 0 1 2
x1  = 1 · 1 + 1 · 0 =  1
2,
2
2 2

x31 1 0 1
2
 2     1
x1 1 0 2
x2  = 1 · 1 + 1 · 0 =  1
2.
2
2 2

x32 0 1 1
2
198 7 Branch-And-Price

Now we select the variable x11 , with a non-integer value of 12 , to branch on it at


the root node.
1.1. Since x11 = 0 at node 1, then all vectors a with a1 = 1 must be excluded from
the set S1 that is inherited from the parent node 0. As a result, we have

S1 = {(0, 1, 1)T , (0, 1, 0)T }.

In addition, for this node and all its descendants, when setting the pricing problems
for machine 1, it will always be necessary to set z1 = 0. The set S2 remains the same
as that of the parent node1 : S2 = {(1, 1, 0)T , (0, 0, 1)T , (0, 1, 0)T }.
Now we solve the following master LP:

−4λ11 − 2λ21 − 4λ12 − λ22 − 2λ32 − 6s1 − 6s2 − 6s3 → max,


λ12 + s1 = 1,
λ1 + λ2 + λ12 +
1 1 λ32 + s2 = 1,
λ11 + λ22 + s3 = 1,
λ11 + λ21 ≤ 1,
λ12 + λ22 + λ32 ≤ 1,
λ11 , λ21 , λ12 , λ22 , λ32 , s1 , s2 , s3 ≥ 0.

Its optimal dual solution is given by y = (−6, − 21 , − 72 ) and v1 = 0, v2 = 25 .


Next we solve the pricing problems. For machine 1, we solve the following 0,1-
knapsack problem:
3 3
ξ = 5z1 − z2 + z3 → max,
2 2
3z1 + 2z2 + 2z3 ≤ 4,
z1 = 0, z2 , z3 ∈ {0, 1}.

Its optimal solution and objective value are z∗ = (0, 0, 1)T and ξ ∗ = 32 . Since ξ ∗ =
3 ∗ 1 1 T T T
2 > 0 = v1 , we add z to S : S = {(0, 1, 1) , (0, 1, 0) , (0, 0, 1) }.
For machine 2, we solve the next 0,1-knapsack problem:
3 5
ξ = 4z1 − z2 + z3 → max,
2 2
2z1 + 2z2 + 3z3 ≤ 4,
z1 , z2 , z3 ∈ {0,1}.

Its optimal solution and objective value are z∗ = (1, 0, 0)T and ξ ∗ = 4. Since ξ ∗ =
4 > 25 = v2 , we add z∗ to S2 : S2 = {(1, 1, 0)T , (0, 0, 1)T , (0, 1, 0)T , (1, 0, 0)T }.
1.2. Now we need to solve the next master LP:

1 Since we only have two machines, task 1 must be processed by machine 2. Therefore, we could
remove from the set S2 all vectors a with a1 = 0. These are the second and third vectors. But we
do not do this, because we do not want to use any specifics of this concrete example.
7.3 Generalized Assignment Problem 199

−4λ11 − 2λ21 − 2λ31 − 4λ12 − λ22 − 2λ32 − 2λ42 − 6s1 − 6s2 − 6s3 → max,
λ12 + λ42 + s1 = 1,
1
λ1 + λ2 +1 2
λ1 + 2
λ3 + s2 = 1,
λ11 + λ31 + λ22 + s3 = 1,
1 1
λ1 + λ2 + λ3 1 ≤ 1,
λ12 + λ22 + λ32 + λ42 ≤ 1,
λ11 , λ21 , λ31 , λ12 , λ22 , λ32 , λ42 , s1 , s2 , s3 ≥ 0.

The non-zero components of its optimal primal and dual solutions are the following:

λ31 = λ12 = 1,
(7.16)
y = (−3, −2, −2), v1 = 0, v2 = 1.

Next we solve the pricing problems. For machine 1, we solve the following 0,1-
knapsack problem:

ξ = 2z1 + 0z2 + 0z3 → max,


3z1 + 2z2 + 2z3 ≤ 4,
z1 = 0, z2 , z3 ∈ {0, 1}.

Its optimal solution and objective value are z∗ = (0, 0, 0)T and ξ ∗ = 0. Since ξ ∗ =
0 = v1 , we cannot extend the set S1 .
For machine 2, we solve the next 0,1-knapsack problem:

ξ = z1 + 0z2 + z3 → max,
2z1 + 2z2 + 3z3 ≤ 4,
z1 , z2 , z3 ∈ {0, 1}.

Its optimal solution and objective value are z∗ = (0, 0, 1)T and ξ ∗ = 1. Since ξ ∗ =
1 = v2 , we cannot extend the set S2 .
We were unable to extend either of the sets S1 and S2 . This means that the current
solution given by (7.16) is optimal for node 1. Since all λ jk are integers, we can
compute a feasible solution to the original problem:
 1    2  
x1 0 x1 1
x1  = 0 , x2  = 1 .
2 2
x31 1 x32 0

According to this solution, task 3 is processed on machine 1, and tasks 1 and 2 are
processed on machine 2. This is our first solution, and therefore it is remembered
as a record solution in the form of the vector π R = (2, 2, 1)T (πiR = k if task i is
assigned to machine k). The optimal objective value, which is equal to −6, of the
solved relaxation LP is our new record: R = −6. Note that the cost of the record
assignment, π R , is −R = 6.

2.1. Since x11 = 1 at node 2, which means that task 1 is assigned to machine 1,
we must exclude all vectors a with a1 = 1 from the set S2 that is inherited from the
parent node 0. As a result, we have

S2 = {(0, 0, 1)T , (0, 1, 0)T }.

In addition, for this node and all its descendants, when setting the pricing problems
for machine 1, it will always be necessary to set z1 = 1, and for all other machines
(in this example only for machine 2), we need to set z1 = 0. The set S1 remains the
same as that at the parent node: S1 = {(0, 1, 1)T , (1, 0, 0)T , (0, 1, 0)T }.
Now we solve the next master LP:
−4λ11 − λ21 − 2λ31 − λ12 − 2λ22 − 6s1 − 6s2 − 6s3 → max,
λ21 + s1 = 1,
λ11 + λ31 + λ22 + s2 = 1,
λ11 + λ12 + s3 = 1,
λ11 + λ21 + λ31 ≤ 1,
λ12 + λ22 ≤ 1,
λ11 , λ21 , λ31 , λ12 , λ22 , s1 , s2 , s3 ≥ 0.

The non-zero components of its optimal primal and dual solutions are the following:
λ11 = λ21 = λ12 = λ22 = s1 = 1/2,
(7.17)
y = (−6, −5, −4), v1 = 5, v2 = 3.

Next we solve the pricing problems. For machine 1, we solve the following 0,1-
knapsack problem:

ξ = 5z1 + 3z2 + 2z3 → max,


3z1 + 2z2 + 2z3 ≤ 4,
z1 = 1, z2 , z3 ∈ {0, 1}.

Its optimal solution and objective value are z∗ = (1, 0, 0)T and ξ ∗ = 5. Since ξ ∗ =
5 = v1 , we cannot extend the set S1 .
For machine 2, we solve the next 0,1-knapsack problem:

ξ = 4z1 + 3z2 + 3z3 → max,


2z1 + 2z2 + 3z3 ≤ 4,
z1 = 0, z2 , z3 ∈ {0, 1}.

Its optimal solution and objective value are z∗ = (0, 0, 1)T and ξ ∗ = 3. Since ξ ∗ =
3 = v2 , we cannot extend the set S2 .
We were unable to extend either of the sets S1 and S2 . This means that the current
solution given by (7.17) is optimal for node 2. But since the upper bound at this
node is −7, which is less than the current record of −6, node 2 must be eliminated
from the search tree.
Since there are no more unprocessed nodes in the search tree, the current record
assignment π R = (2, 2, 1)T is optimal. ⊓⊔

After solving this example, someone might have doubts about the efficiency of
the branch-and-price algorithms (at least of the one that we just used to solve our
example): so many calculations to solve an almost trivial example. In fact, this is
not so. But even this simple example should convince you that the implementation
of almost any branch-and-price algorithm is not a trivial exercise.

7.4 Symmetry Issues

As we noted in Sect. 7.2, one of the reasons why Formulation (7.7) with a huge
number of variables may be preferable to the compact formulation (7.6) is that the
first one provides more accurate upper bounds. We also noted that, we will get the
same bounds if we solve (7.6) by the branch-and-cut method using exact separation
procedures for the sets conv(X k ). Another reason why we can prefer (7.7) is the
presence of symmetric structures, when, for some k, the objects, X k , Ak and ck , are
the same. In such cases, there are many symmetric (essentially identical) solutions:
by interchanging the values of the corresponding components of the vectors xk1
and xk2 for two identical structures k1 and k2 , we get a feasible solution with the
same objective value. As a rule, the branch-and-cut algorithms are very inefficient
in solving problems with symmetric structures. The reason here is that, by adding
a cut valid for the set X k1 or by changing a bound for some variable xkj1 , we do
not exclude the later appearance in the search tree of a node with the symmetric LP
solution obtained from the previously cut off solution by just exchanging the vectors
xk1 and xk2 . In some cases, we can cut off some of the symmetric solutions by adding
new constraints. But it is not always possible to completely overcome these symmetry
issues. They can be resolved more effectively by developing a specialized branching
scheme. Often, the transition to Formulation (7.7) and the use
of a special branching scheme allows us to completely eliminate the symmetry. Next
we demonstrate this with the example of the generalized assignment problem.
Let us modify the statement of the problem given in Sect. 7.3. Suppose now that
we have K groups of machines, and group k contains nk identical machines. Now
the parameters pki and cki characterize all machines of group k; they are, respectively,
the processing time and cost of performing task i on any machine from group k. For
a new statement, the master problem (7.13) changes only slightly: we only need to
replace (7.13c) with the following inequalities:

∑_{a∈Sk} λak ≤ nk , k = 1, . . . , K. (7.18)

At the root node of the search tree, the pricing problem (7.14) for each group of
machines remains unchanged. At other nodes, additional constraints will be added
to their pricing problem.
To eliminate symmetry, we no longer distinguish between machines of the same group.
For this reason, branching on the variables xik is impossible. On the other hand, branching
on variables λak is inefficient. We can still develop an efficient branching scheme
based on the following observation.
Proposition 7.1. Let λ̃ be a solution to the master LP at some node of the search
tree. If not all values λ̃ak (a ∈ Sk ) are integers, then there exist two tasks, r and s,
such that

∑_{a∈Sk : ar =as} λ̃ak < 1. (7.19)

So, if for some k not all λ̃ak are integers, we seek a pair of tasks, r and s, such that
(7.19) is satisfied. Branching is performed by dividing the set of feasible solutions
into two subsets, the first of which includes all assignments in which the tasks r and
s are performed on the same machine, and the second includes the assignments in
which the tasks r and s are performed on different machines. In the first case, when
solving the pricing problem for group k, the tasks r and s are combined into one
task of processing time pkr + pks and cost ckr + cks . At the same time, it is necessary to
exclude from the node master LP all variables λak for those k and a such that ar ≠ as .
In the second case, we need to add the inequality zr + zs ≤ 1 to the pricing problem
for group k, and exclude from the master LP all variables λak for those k and a such
that ar = as = 1. Of course, the addition of new inequalities destroys the structure of the
pricing problem, and now it is not a 0,1-knapsack problem and, therefore, it must be
solved as an ordinary IP. But, despite this, computational experiments have proved
the efficiency of such branching.
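A search for such a branching pair can be sketched as follows (Python; we assume the master LP solution is given by the lists of columns a ∈ Sk and their values λ̃ak ; all names are hypothetical):

    def find_branching_pair(columns, lam, n_tasks, eps=1e-6):
        # Look for a group k and a pair of tasks (r, s) satisfying (7.19):
        # the total lambda over the columns a in S^k with a_r = a_s is below 1.
        # Branching then splits on "r and s together" versus "r and s apart".
        for k in range(len(columns)):
            for r in range(n_tasks):
                for s in range(r + 1, n_tasks):
                    total = sum(l for a, l in zip(columns[k], lam[k])
                                if a[r] == a[s])
                    if total < 1.0 - eps:
                        return k, r, s
        return None  # no such pair: by Proposition 7.1 all lambdas are integral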

7.5 Designing Telecommunication Networks

In this section we show that applications of the branch-and-price method are not
limited to the framework of the decomposition approach of Dantzig and Wolfe. In a
number of cases, with a natural choice of variables, their number can be exponen-
tially large (recall the cutting stock problem from Sect. 7.1.1). In addition, there may
be restrictions that are difficult to compactly formulate by means of linear inequali-
ties. Sometimes, it is better to take into account such complex restrictions in pricing
problems, which can be solved in some other way (say, by dynamic or constraint
programming).
The input data in the problem of designing a reliable telecommunication network
are specified by two graphs: the channel graph G = (V, E) and the demand graph
H = (V̄ , D). The set V consists of logical network nodes (offices, routers, etc.);
V̄ is a subset of V (offices are in V̄ , but routers are not). The edges e ∈ E of the
channel graph represent the set of physical communication lines (channels) that

can potentially be installed. Different types of communication lines (representing


various technologies, for example, optical fiber, telephone lines, Ethernet, etc.) are
represented by parallel edges. The capacity of channel e ∈ E is ue , and the cost of
establishing this channel is ce . For each edge q = (v, w) ∈ D of the demand graph, we
know the demand, dq , for communication between nodes v and w: the total capacity
of communication lines between nodes v and w must be at least dq . The values of dq
are determined statistically on the basis of the forecast for the development of the
information services market.
It is common to call a network reliable if it is designed in such a way that it
continues to function even after some of its components fail. Let Gs = (Vs , Es ) be
the subgraph obtained from the graph G by deleting all its components damaged in
the emergency state s, s = 1, . . . , S (as a rule, each emergency state corresponds to the
failure of one node or channel; if a node is damaged, then all channels incident to it
also are not able to function). It is required that, for each demand edge q = (v, w) ∈ D
and for each emergency state s, the survived subnetwork still be able to route at least
ρsq dq information units between nodes v and w, where 0 < ρsq ≤ 1.
In the case of congestion of communication channels, some connections, which
are the paths carrying information between pairs of nodes, can be very long. Due
to various reasons (for example, to reduce delays of establishing connections, or to
reduce the load of computers), it is necessary to limit the length (number of edges)
of the communication paths. Let lq be the maximum path length for a demand q ∈ D.
Now we can formulate the problem of designing a reliable telecommunication
network as follows: we need to find in G a subgraph G′ = (V, E ′ ) with the minimum
total cost of its edges, c(G′ ) def= ∑_{e∈E ′} ce , provided that the capacities of edges in
G′ are sufficient for routing (taking into account the restriction on the path lengths)
the required amounts of information in the normal and all emergency states.
Let us agree to consider the normal (failure-free) state of the network as state 0.
We set ρ0,q = 1 for all q ∈ D. For each state s = 0, . . . , S and each demand q =
(v, w) ∈ D, let P(s, q) denote the set of feasible v, w-paths. For the normal state
(s = 0), a feasible v, w-path is any path from v to w of length at most lq in the graph
G0 = (V0 = V, E0 = E). For any emergency state s = 1, . . . , S, a feasible v, w-path is
any path from v to w in Gs .
For each state s = 0, . . . , S, each demand q = (v, w) ∈ D, and each path P ∈
P(s, q), we introduce a flow variable fPsq that represents the amount of informa-
tion circulating between nodes v and w along path P in state s. For each edge e ∈ E
we introduce a binary variable xe with xe = 1 only if channel e is included in the
communication network (e ∈ E ′ ). In these variables, the telecommunication network
design problem is formulated as follows:

∑_{e∈E} ce xe → min, (7.20a)

∑_{q∈D} ∑_{P∈P(s,q): e∈E(P)} fPsq ≤ ue xe , e ∈ Es , s = 0, . . . , S, (7.20b)

∑_{P∈P(s,q)} fPsq ≥ ρsq dq , q ∈ D, s = 0, . . . , S, (7.20c)

fPsq ≥ 0, P ∈ P(s, q), q ∈ D, s = 0, . . . , S, (7.20d)


xe ∈ {0, 1}, e ∈ E. (7.20e)

Here E(P) denotes the set of edges on a path P.


Objective (7.20a) is to minimize the network cost. Inequalities (7.20b) impose the
capacity restrictions for the communication channels: if channel e ∈ E is included in
the network G0 (xe = 1), then the total (along all paths) information flow, transmitted
through this channel in any of the states, must not exceed the channel capacity;
if channel e is not included in the network (xe = 0), the information can not be
transmitted via this channel, i.e., fPsq = 0 for all paths P passing through edge e (e ∈
E(P)). Inequalities (7.20c) guarantee that for any demand in any state the network
will be able to transmit the required amount of information.

7.5.1 Master Problem

Since the number of variables in (7.20) is huge, we can solve it only by the branch-
and-price method. The master problem is written very simply:

∑_{e∈E} ce xe + M · ∑_{s=0}^{S} ∑_{q∈D} zsq → min, (7.21a)

∑_{q∈D} ∑_{P∈P̂(s,q): e∈E(P)} fPsq ≤ ue xe , e ∈ Es , s = 0, . . . , S, (7.21b)

∑_{P∈P̂(s,q)} fPsq + zsq ≥ ρsq dq , q ∈ D, s = 0, . . . , S, (7.21c)

fPsq ≥ 0, P ∈ P̂(s, q), zsq ≥ 0, q ∈ D, s = 0, . . . , S, (7.21d)

xe ∈ {0, 1}, e ∈ E. (7.21e)

Here P̂(s, q) is a subset of the set P(s, q). We have also introduced slack variables,
zsq , so that the master MIP always has a solution. In particular, this allows us to start
the branch-and-price algorithm with the empty sets P̂(s, q). As it is common in LP,
M denotes a sufficiently large number.
Before proceeding to the discussion of the pricing problem, let us note that, in our
branch-and-price algorithm, we can apply the standard branching on integer variables,
as all integer variables, xe , are always present in the active master problem.

7.5.2 Pricing Problem

Formulating (7.20), we hid the restriction on the length of the communication paths
— which is difficult to formulate by a system of inequalities — in the definition of
the sets P(s, q), thereby moving the accounting of this requirement into the pricing
problem.
Let us denote the dual variables for the master relaxation LP as follows:
• αse ≤ 0 is associated with the inequality in (7.21b) written for s ∈ {0, . . . , S} and
e ∈ E;
• βsq ≥ 0 corresponds to the inequality in (7.21c) written for s ∈ {0, . . . , S} and
q ∈ D.
Let the pair of α̃ ∈ R{0,...,S} × RE and β̃ ∈ R{0,...,S} × RD constitute an optimal
dual solution to the relaxation LP for (7.21). For given s ∈ {0, . . . , S}, q ∈ D and
P ∈ P(s, q), the reduced cost of the variable fPsq is

− ∑_{e∈E(P)} α̃se − β̃sq .

Therefore, the pricing problem is formulated as follows:


min{ − ∑_{e∈E(P)} α̃se : P ∈ P(s, q) } (7.22)

This is the shortest path problem between nodes v and w in Gs , when the weight of
every edge e ∈ Es is −α̃se ≥ 0. If the weight of a shortest path, P̂, is less than β̃sq ,
then P̂ is added to the set P̂(s, q).
For emergency states (s = 1, . . . , S), (7.22) is the shortest path problem in the
undirected graph. For the normal state (s = 0), (7.22) is the problem of finding
a shortest path of limited length. For those who do not know how to solve this
problem, let us say that some shortest path algorithms are easily adapted to search
for shortest paths of limited length. In particular, if the path length must not exceed
k, the famous Bellman-Ford algorithm needs to execute only k iterations (unless, of
course, the algorithm stops earlier).
The shortest path algorithms are studied almost in every manual on graph or
network flow theory, and therefore are not discussed here.
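For illustration, here is a sketch of the Bellman-Ford algorithm truncated after k rounds (Python; we assume the graph is given as a list of arcs (u, v, weight) with nonnegative weights, an undirected edge appearing in both directions):

    def shortest_paths_at_most_k_edges(n, arcs, source, k):
        # After round i, d[v] is the length of a shortest source-v path
        # using at most i edges; we simply stop after k rounds.
        INF = float("inf")
        d = [INF] * n
        d[source] = 0.0
        for _ in range(k):
            new_d = d[:]  # relax using the labels of the previous round only
            for u, v, w in arcs:
                if d[u] + w < new_d[v]:
                    new_d[v] = d[u] + w
            if new_d == d:  # no label changed: the algorithm stops earlier
                break
            d = new_d
        return d  # d[v] = length of a shortest source-v path with <= k edges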

7.6 Notes

Sect. 7.1. The works [54, 55], devoted to solving the one-dimensional cutting stock
problem, can be considered the first application of the column generation technique
for solving IPs.
Sect. 7.2. The work of Dantzig and Wolfe [45] is fundamental for using decomposi-
tion in LP. Johnson [76] was one of the first to realize both the potential and the im-
plementation complexity of the branch-and-price algorithms. The ideas and method-
ology of the branch-and-price method are discussed in [18], where an overview of
some of its applications is also given. A particular success of the branch-and-price


method is associated with the routing and scheduling problems [46].
Sect. 7.3. The generalized assignment problem has become standard for demonstrat-
ing the branch-and-price method since the work [118].
Sect. 7.5. Minoux [93] was the first to study the problem of reliability (survivability)
of generalized multicommodity flow networks. The review [3] is devoted to the
design of reliable telecommunication networks.
Sect. 7.7. The idea of Exercise 7.6 to formulate a vehicle routing problem as a set
partitioning problem originally appeared in [16].

7.7 Exercises

7.1. Solve the instance of the cutting stock problem in which, from rolls of length 128,
it is necessary to cut 12 stocks of length 66, 34 stocks of length 25, 9 of length 85,
20 of length 16, and 7 of length 45.
7.2. Write down a compact formulation for the one-dimensional cutting stock prob-
lem from Sect. 7.1.1.
Hint. Suppose that the desired number of stocks can be cut out from no more than
k rolls. For example, as k, we can take the required number of rolls for a cutting plan,
built by the heuristic algorithm from Sect. 7.1.3. Use the following variables:
• xi j : number of stocks of type i to be cut out from roll j, i = 1, . . . , m, j = 1, . . . , k;
• y j = 1 if roll j is cut out, and y j = 0 otherwise, j = 1, . . . , k.

7.3. Use the branch-and-price method to solve the generalized assignment problem
with the following parameters: m = 3, K = 2,
   
[cki ] is the 3 × 2 matrix with rows (1, 2), (2, 2), (2, 1); [pki ] is the 3 × 2 matrix with
rows (2, 1), (1, 1), (1, 2); and l = (2, 2)T .

7.4. Clustering problem. Given a graph G = (V, E) in which each edge e ∈ E is
assigned a cost ce and each vertex v ∈ V a weight wv , we need to partition the set of
vertices V into K clusters V1 , . . . ,VK (some of them may be empty) so that the sum
of the vertex weights in each of the clusters does not exceed a given limit W , and
the sum of the costs of the edges between the clusters is minimum.
Formulate this clustering problem as an IP so that the branch-and-price method
can be used to solve that IP, write down the master and pricing problems, elaborate
a branching rule.
7.5. If you have mastered Exercise 7.4, you can test yourself by numerically solving
an example of the clustering problem on the complete graph of three vertices with
three possible clusters, when the weights of all vertices and the cost of all edges are
equal to one, and the sum of node weights in any cluster is at most two.

7.6. Formulate the single depot vehicle routing problem (VRP) from Sect. 2.16 as a
set partitioning problem defined on a hypergraph whose vertices correspond to the
customers and whose hyperedges represent all feasible routes. Elaborate a branch-and-
price algorithm that solves the IP formulation of this set partitioning problem: write
down the master and pricing problems, specify a branching rule.
7.7. Let us recall the problem of designing a reliable telecommunication network
from Sect. 7.5. Suppose now that the designed network will not reroute information
flows in case of failure of some of its elements. In the design of such networks,
to increase reliability, a diversification strategy is used, which requires that, for each
demand edge q = (v, w) ∈ D, no more than δq dq (0 < δq ≤ 1) of the information units
circulating between nodes v and w pass through any single channel or node. The
implementation of this diversification strategy guarantees that the failure of one
channel or one node will leave at least (1 − δq ) · 100 % of the total amount of
information between nodes v and w unaffected.
Write down a compact IP as a model of the problem of designing diversified
telecommunication networks.
7.8. Elaborate a branch-and-price algorithm for the facility location problem formu-
lated in Sect. 2.2.
7.9. Reformulate the problem of detailed placement from Sect. 2.8 as an IP so that
the branch-and-price method can be used to solve that IP, write down the master and
pricing problems, show how to solve the pricing problem.
7.10. In Sect. 7.2 for IP (7.7) we have established the equivalence of two bounds,
zLPM and zCUT . Prove that the Lagrangian relaxation provides the same bound:

zLPM = zCUT = zLD def= min_{u∈R_+^m} L(u),

where

L(u) = ∑_{k=1}^{K} max{(ck − uT Ak )T xk : xk ∈ X k } + uT b.

7.11. Steiner tree problem. Given a graph G = (V, E) and a set T ⊆ V of terminals.
A Steiner tree is a minimal (by inclusion) subgraph of G having a path between
any pair of terminals. Let each edge e ∈ E be assigned a cost ce ≥ 0. The Steiner
tree problem is to find in G a Steiner tree of minimum cost, where the cost of a tree
is the sum of the costs of its edges.
Let P denote the set of all paths in G connecting any pair of terminals, and
let V (P) and E(P) denote the sets of, respectively, vertices and edges on a path P.
Introducing two families of binary variables
• xe = 1 if e ∈ E is an edge of the Steiner tree, and xe = 0 otherwise,
• yP = 1 if P ∈ P is a path in the Steiner tree, and yP = 0 otherwise,
we formulate the Steiner tree problem as the following IP:

∑_{e∈E} ce xe → min,
∑_{P∈P: t∈V (P)} yP ≥ 1, t ∈ T,
yP ≤ xe , e ∈ E, P ∈ P, e ∈ E(P), (7.23)
yP ∈ {0, 1}, P ∈ P,
xe ∈ {0, 1}, e ∈ E.

Elaborate a branch-and-price algorithm for solving (7.23), specify the pricing


problem.
7.12. Steiner tree packing problem. Given a graph G = (V, E) and a family of ter-
minal sets Ti ⊆ V , i = 1, . . . , k. Let each edge e ∈ E be assigned a cost ce ≥ 0. The
Steiner tree packing problem is to find a family of k edge-disjoint Steiner trees so
that each terminal set Ti is connected by exactly one tree, and the total cost of all
edges included in the trees is minimum. It is worth noting that this packing problem
is a model for the routing problem in VLSI circuit design, where each Steiner tree
represents an electrical circuit connecting a set of terminals.
Proceeding from Formulation (7.23) for the Steiner tree problem, write down an
IP formulation for the Steiner tree packing problem. Elaborate a branch-and-price
algorithm for solving your formulation.
Chapter 8
Optimization With Uncertain Parameters

The formulations of many optimization problems include uncertain parameters.


There are several approaches to solving such problems. In stochastic programming
it is assumed that all uncertain parameters are random variables with known proba-
bility distributions. Robust optimization is used when it is required that the solution
be acceptable for all possible values of uncertain parameters. The latter is very im-
portant in those situations where small changes in the input of the problem can
significantly affect its solution.
Usually the solution of an optimization problem with uncertain parameters is
reduced to the solution of its deterministic equivalent, which, as a rule, is an opti-
mization problem that is much larger than the original problem. In this chapter we
will study only such models of stochastic programming and robust optimization, the
deterministic versions of which are MIPs.

8.1 Two-Stage Stochastic Programming Problems

Stochastic programming models include two types of variables: expected and adap-
tive. Expected variables represent those decisions that are taken here-and-now: they
do not depend on the future implementation of the random parameters. Decisions
described by adaptive variables are accepted after the values of the random param-
eters become known.
For example, consider the two-stage stochastic programming problem that is for-
mulated as follows:
cT x + E(h(ω)T y(ω)) → max,
A(ω) x + G(ω) y(ω) ≤ b(ω),
(8.1)
x ∈ X,
y(ω) ∈ R_+^{n_y} .


In (8.1) a decision for use in the current (first) period is represented by the vector x ∈
X of the expected variables, where X ⊆ R^{nx} is some set (for example, X = R_+^{nx} , X =
Z_+^{nx} or X = P(Ā, b̄; S)). A decision x ∈ X must be made in the current period before an
elementary event ω from the probability space (Ω , A , P) occurs in the next period.
A decision y(ω) is made in this next period after observing ω. Therefore, the vector
y of adaptive variables is a vector-function of ω. The system A(ω)x + G(ω)y(ω) ≤
b(ω) of stochastic constraints connects the expected and adaptive variables. The
objective function in (8.1) is the sum of two terms: deterministic cT x, estimating
the quality of the solution x, and the expected value E(h(ω)T y(ω)) of the random
variable h(ω)T y(ω), which estimates the quality of the solution y(ω).
Problem (8.1) can be reformulated as follows:

max{ f (x) : x ∈ X}, (8.2)

where f (x) = E( f (x, ω)), and the random variable f (x, ω) (see Exercise 8.3) is
determined by the rule:
f (x, ω) def= cT x + max h(ω)T y(ω),
G(ω)y(ω) ≤ b(ω) − A(ω)x, (8.3)
y(ω) ∈ R_+^{n_y} .

If the sample space Ω is infinite, then computing f (x) can be a very difficult
problem. One approach is to approximate an infinite probability space with a finite
space. Discussion of how this is done is beyond the scope of this book. In what
follows we assume that Ω = {ω1 , . . . , ωK } is a finite set and the event (scenario)
ωk occurs with probability pk (k = 1, . . . , K). For k = 1, . . . , K, we introduce the
following notations: hk = h(ωk ), wk = pk hk , Ak = A(ωk ), Gk = G(ωk ), bk = b(ωk ),
yk = y(ωk ), nk = ny . The deterministic equivalent of the stochastic problem (8.1) is
written as follows:
cT x + ∑_{k=1}^{K} wTk yk → max,
Ak x + Gk yk ≤ bk , k = 1, . . . , K, (8.4)
x ∈ X,
yk ∈ R_+^{nk} , k = 1, . . . , K.
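The block-angular structure of (8.4) is easy to make explicit in code; below is a minimal sketch (Python with numpy; the data layout is our assumption, not the book's) that assembles the objective and the stacked constraint blocks of (8.4):

    import numpy as np

    def deterministic_equivalent(c, scenarios):
        # scenarios is a list of tuples (p_k, h_k, A_k, G_k, b_k); the k-th
        # row block of the returned system reads  A_k x + G_k y_k <= b_k.
        ny = [G.shape[1] for _, _, _, G, _ in scenarios]
        # objective: c for x, then w_k = p_k h_k for each y_k
        obj = np.concatenate([c] + [p * h for p, h, _, _, _ in scenarios])
        rows, rhs = [], []
        for k, (_, _, A, G, b) in enumerate(scenarios):
            blocks = [A] + [np.zeros((A.shape[0], n)) for n in ny]
            blocks[k + 1] = G  # y_k occurs only in its own block
            rows.append(np.hstack(blocks))
            rhs.append(b)
        return obj, np.vstack(rows), np.concatenate(rhs)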

Having solved (8.4), we get a decision x for use in the current period. This de-
cision, x, must be adequate to everything that can happen in the next period. If we
knew what scenario ωk would happen in the next period, we would solve the prob-
lem
cT x + hTk yk → max,
Ak x + Gk yk ≤ bk ,
x ∈ X,
yk ∈ R_+^{nk}

that takes into account the constraints only for this scenario. But since we do not
know which scenario will be realized in the future, in (8.4) we require that the
constraints Ak x + Gk yk ≤ bk be valid for all scenarios k = 1, . . . , K.

8.2 Benders’ Reformulation

If the number of scenarios K is very large, then (8.4) can be a very difficult opti-
mization problem. Obviously, with a huge number of scenarios, the effect of each
of them on the solution x to be made in the current period is different. The Benders
decomposition approach allows us to reformulate the problem in such a way that
information about the scenarios will be provided through cuts.
Let us first assume that the vector of expected variables x is fixed. We can find
the values of the remaining variables yk by solving K LPs:
zk (x) def= max{wTk yk : Gk yk ≤ bk − Ak x, yk ≥ 0}, k = 1, . . . , K. (8.5)

Now let us write down the dual for each of these LPs:

zk (x) = min{uTk (bk − Ak x) : GTk uk ≥ wk , uk ≥ 0}, k = 1, . . . , K.


We denote by Qk the polyhedron {uk ∈ R_+^{mk} : GTk uk ≥ wk } of the k-th dual LP. Here
mk denotes the number of inequalities in the system Ak x + Gk yk ≤ bk . In view of
Theorem 1.1, if the polyhedron Qk is not empty, it can be represented in the form
Qk = conv(Vk ) + cone(Ck ), where Vk is the set of vertices of Qk , and Ck is the set of
directing vectors for the extreme rays of the cone {uk ∈ R_+^{mk} : GTk uk ≥ 0}.

Theorem 8.1. If at least one polyhedron Qk is empty, then (8.4) does not have a
solution. If all polyhedra Qk (k = 1, . . . , K) are nonempty, then (8.4) is equivalent
to the following problem:

η → max, (8.6a)
cT x + ∑_{k=1}^{K} uTk (bk − Ak x) ≥ η, (u1 , . . . , uK ) ∈ V1 × · · · ×VK , (8.6b)
uTk (bk − Ak x) ≥ 0, uk ∈ Ck , k = 1, . . . , K, (8.6c)
x ∈ X, η ∈ R. (8.6d)

Proof. If Qk = ∅ for some k, then by Theorem 3.1 (of duality), for all x, LP (8.5),
written for this given k, does not have a solution (its objective function is unbounded
or there are no feasible solutions). Therefore, in this case, (8.4) does not have a
solution as well.
Suppose now that all sets Qk are nonempty. Then (8.4) can be formulated as
follows:

cT x + ∑_{k=1}^{K} min_{uk∈Qk} uTk (bk − Ak x) → max,
uTk (bk − Ak x) ≥ 0, uk ∈ Ck , k = 1, . . . , K, (8.7)
x ∈ X.
Introducing a new variable
η = ∑_{k=1}^{K} min_{uk∈Qk} uTk (bk − Ak x),

we can rewrite (8.7) in the form (8.6). ⊓⊔

Problem (8.6) is Benders’ reformulation of Problem (8.4). Despite the fact that
the number of inequalities in (8.6b) and (8.6c) can be huge, we can still solve (8.6)
by the branch-and-cut method that generates these inequalities, also known as Ben-
ders’ cuts, in a separation procedure. Given an input vector x̄ ∈ R_+^{nx} and a number
η̄, this procedure can work as follows.

Separation procedure. For k = 1, . . . , K, solve the LP:

min{(bk − Ak x̄)T uk : GTk uk ≥ wk , uk ∈ R_+^{mk} }. (8.8)

If for some k this LP is unbounded, and ūk is an extreme-ray certificate of this
unboundedness, i.e., of the infeasibility of (8.5) (see Sect. 3.4), then return the cut

ūTk (bk − Ak x) ≥ 0.

If (8.8) has an optimal solution ūk for each k, and if ∑_{k=1}^{K} ūTk (bk − Ak x̄) <
η̄ − cT x̄, then return the cut

cT x + ∑_{k=1}^{K} ūTk (bk − Ak x) ≥ η.

Otherwise, (x̄, η̄) satisfies all inequalities in both families, (8.6b) and (8.6c).
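Schematically, the procedure can be coded as follows (a Python sketch; solve_lp is a hypothetical LP oracle that returns either an optimal dual vector of (8.8) or, when (8.8) is unbounded, that is, when the primal subproblem (8.5) is infeasible for x̄, an extreme-ray certificate):

    import numpy as np

    def benders_separation(xbar, etabar, c, scenarios, solve_lp, tol=1e-9):
        # One call to the separation procedure for (8.6);
        # scenarios is a list of tuples (A_k, G_k, b_k, w_k);
        # solve_lp(obj, lhs, rhs) minimizes obj @ u s.t. lhs @ u >= rhs, u >= 0,
        # returning ('optimal', u) or ('unbounded', ray).
        duals, total = [], 0.0
        for A, G, b, w in scenarios:
            status, u = solve_lp(obj=b - A @ xbar, lhs=G.T, rhs=w)
            if status == 'unbounded':
                # feasibility cut u^T (b_k - A_k x) >= 0, violated by xbar
                return ('feasibility-cut', u)
            duals.append(u)
            total += u @ (b - A @ xbar)
        if c @ xbar + total < etabar - tol:
            # optimality cut c^T x + sum_k u_k^T (b_k - A_k x) >= eta
            return ('optimality-cut', duals)
        return None  # (xbar, etabar) satisfies (8.6b) and (8.6c)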
What do we gain by the transition from Formulation (8.4), which is relatively compact,
to Formulation (8.6) with a huge number of constraints? The obvious plus of the
transition to Benders’ reformulation is that we have essentially reduced the number of
continuous variables: more precisely, we excluded the vectors yk for k = 1, . . . , K,
and added only one new variable η. Another advantage is not so obvious. As a
rule, the fewer continuous variables we have, the stronger cuts (for example, the
fractional Gomory cuts) we can generate.

Example 8.1 We need to solve the following MIP, first carrying out Benders’
reformulation:

x1 + 3x2 + 2y1 − 2y2 + 2y3 → max,
2x1 + 4x2 + 2y1 + y2 + y3 ≤ 10,
−x1 + x2 + y1 − y2 + 2y3 ≤ −2, (8.9)
x1 , x2 ∈ Z+ , x1 ≤ 3,
y1 , y2 , y3 ≥ 0.

Solution. Here K = 1, c = (1, 3)T , w1 = (2, −2, 2)T , b1 = (10, −2)T , A1 is the
2 × 2 matrix with rows (2, 4) and (−1, 1), and G1 is the 2 × 3 matrix with rows
(2, 1, 1) and (1, −1, 2).

Let us define X = {x ∈ Z_+^2 : x1 ≤ 3}. The system of inequalities

2u1 + u2 ≥ 2,
u1 − u2 ≥ −2,     (GT1 u ≥ w1 )
u1 + 2u2 ≥ 2

defines the polyhedron Q1 depicted in Fig. 8.1.

[Fig. 8.1: the polyhedron Q1 in the (u1 , u2 )-plane]

This polyhedron has three vertices

u1 = (0, 2)T , u2 = (2/3, 2/3)T , u3 = (2, 0)T ,

and two extreme rays with direction vectors

û1 = (1, 0)T , û2 = (1, 1)T .

So, we can reformulate (8.9) as follows:

η → max,
x1 + 3x2 + 2 (−2 + x1 − x2 ) ≥ η,
x1 + 3x2 + (2/3)(10 − 2x1 − 4x2 ) + (2/3)(−2 + x1 − x2 ) ≥ η,
x1 + 3x2 + 2 (10 − 2x1 − 4x2 ) ≥ η,
10 − 2x1 − 4x2 ≥ 0,
(10 − 2x1 − 4x2 ) + (−2 + x1 − x2 ) ≥ 0,
x1 , x2 ∈ Z+ , x1 ≤ 3,

or after simplifications and rearranging



η → max,
η − 3x1 − x2 ≤ −4,
3η − x1 + x2 ≤ 16,
η + 3x1 + 5x2 ≤ 20,
(8.10)
x1 + 2x2 ≤ 5,
x1 + 5x2 ≤ 8,
x1 ≤ 3,
x1 , x2 ∈ Z+ .

MIP (8.10) can be solved quite easily. From the inequality x1 + 5x2 ≤ 8, by inte-
grality of x2 , we have

x2 ≤ b(8 − x1 )/5c ≤ b8/5c = 1.

Taking into account the inequalities 0 ≤ x1 ≤ 3 and 0 ≤ x2 ≤ 1, from the first three
inequalities we calculate an upper bound for η:

η ≤ −4 + 3x1 + x2 ≤ 6,
η ≤ (16 + x1 − x2 )/3 ≤ 19/3,
η ≤ 20 − 3x1 − 5x2 ≤ 20.

As η ≤ 6, the feasible solution x1∗ = 3, x2∗ = 1, η ∗ = 6 is optimal for (8.10). Substituting
x1 = 3 and x2 = 1 into (8.9), we obtain the following LP

2y1 − 2y2 + 2y3 → max,


2y1 + y2 + y3 ≤ 0,
y1 − y2 + 2y3 ≤ 0,
y1 , y2 , y3 ≥ 0,

which has the unique solution y∗1 = y∗2 = y∗3 = 0. ⊓⊔
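Since (8.10) has only a handful of feasible integer points, the hand computation above can be double-checked by enumeration; a small Python sketch under the bounds derived above:

    best = None
    for x1 in range(4):          # 0 <= x1 <= 3
        for x2 in range(2):      # x1 + 5*x2 <= 8 implies x2 <= 1
            if x1 + 2 * x2 > 5 or x1 + 5 * x2 > 8:
                continue         # the two cuts generated from the extreme rays
            # eta is bounded by the three cuts generated from the vertices of Q1
            eta = min(-4 + 3 * x1 + x2,
                      (16 + x1 - x2) / 3,
                      20 - 3 * x1 - 5 * x2)
            if best is None or eta > best[0]:
                best = (eta, x1, x2)
    print(best)  # (6, 3, 1): eta* = 6 is attained at x* = (3, 1)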

8.3 Risks

Maximization of the expected profit implies that the decision making process is re-
peated a sufficiently large number of times under the same conditions. Only then do
asymptotic statements, such as the law of large numbers, guarantee the convergence
(in probability) of random variables to their expected values. In other situa-
tions, we cannot ignore the risk of obtaining a profit, which is significantly lower
than the expected value. The identification of suitable risk measures is the subject
of active research. We are not going to investigate the problem of risk modeling in

its entirety, but we will only discuss one concept of risk that is convenient for use in
optimization models.
Here we will try to expand the two-stage model of stochastic programming (8.1),
adding to it a system of inequalities that limits the risk of the decision x. It is more
convenient to introduce the concept of risk in terms of the loss function g(x, ω) that
depends on the solution x and is a random variable defined on some probability
space (Ω , A , P) (ω ∈ Ω ).
Historically, the first and perhaps most famous notion of risk was introduced by
H. Markowitz, Nobel Prize winner in Economics in 1990. He defined the risk as the
variation of the random loss value:

var(g(x, ω)) = E((g(x, ω) − E(g(x, ω)))2 ).

Conceptually, this measure of risk has several drawbacks. The most important of
them is that this measure is symmetric: it equally penalizes for receiving both
smaller and larger losses than the expected value. From the point of view of the use
in MIP, a drawback is that using this risk measure means introducing a quadratic
(nonlinear) constraint in optimization models.
Another not less well-known risk measure, called the Value-at-Risk, was devel-
oped by the financial engineers of J. P. Morgan. Let
G(x, η) def= P{ω ∈ Ω : g(x, ω) ≤ η}

be the distribution function of the random variable g(x, ω). For a given probability
0 < α < 1, the risk of making a decision x is
VaRα (x) def= min{η : G(x, η) ≥ α},

which is the maximum loss that occurs with probability at least α.


For a finite probability space, when Ω = {ω1 , . . . , ωK } and when event ωk occurs
with probability pk (k = 1, . . . , K), we compute VaRα (x) as follows:
1) list the values gk (x) def= g(x, ωk ) in non-decreasing order:

gπ(1) (x) ≤ gπ(2) (x) ≤ · · · ≤ gπ(K) (x);


2) find the minimum index j such that ∑_{i=1}^{j} pπ(i) ≥ α, and set VaRα (x) = gπ( j) (x).
As an example, consider a finite probability space

Ω = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},

where the first five events, 0, . . . , 4, occur with probability 1/20, the next two events,
5 and 6, with probability 1/8, and the remaining events, 7, 8, 9, with probability 1/6.
Let α = 0.9 and g(x, ω) = x − ω for x ∈ Z+ . As ωk = k − 1, for x = 4 we have
gk (4) = g(4, k − 1) = 5 − k, k = 1, . . . , 10. Next we need to sort the values gk (4):

i          1     2     3     4     5     6     7     8     9     10
π(i)       10    9     8     7     6     5     4     3     2     1
gπ(i) (4)  −5    −4    −3    −2    −1    0     1     2     3     4
pπ(i)      1/6   1/6   1/6   1/8   1/8   1/20  1/20  1/20  1/20  1/20

Since

∑_{i=1}^{8} pπ(i) = 3 · (1/6) + 2 · (1/8) + 3 · (1/20) = 0.9,

then j = 8 and VaR0.9 (4) = gπ(8) (4) = g3 (4) = 2.


The risk measure VaR is widely used in the financial industry, and its calcula-
tion is one of the standard attributes of most financial analysis programs. Despite
its popularity, the VaR measure is also not without drawbacks. One of these draw-
backs is that this measure does not in any way estimate the amount of losses ex-
ceeding VaRα (x). Another disadvantage of the VaR measure is that the function
VaRα (x) is not superadditive (see Exercise 4.4). In financial terminology superaddi-
tivity expresses the fact that diversification of investments reduces the risk. A further
disadvantage of the VaR measure is that the function VaRα (x) is difficult to use
in optimization because it is not convex. These and other shortcomings of the VaR
measure motivated the appearance of a number of its modifications.
The measure CVaRα (x) (Conditional-Value-at-Risk) is defined as the expected
(average) loss, provided that the losses are not less than VaRα (x). Formally, the
value of CVaRα (x) is defined as the expectation of the random variable g(x, ω) with
the so-called α-tail distribution

Gα (x, η) def= { 0, if η < VaRα (x); (G(x, η) − α)/(1 − α), if η ≥ VaRα (x) }.

By definition, CVaRα (x) = ∫_{−∞}^{∞} η dGα (x, η). This definition is difficult to use in
calculations. The following statement removes this difficulty.
Theorem 8.2. The following equalities are valid

CVaRα (x) = min gα (x, η) = gα (x, VaRα (x)),


η∈R

where
1
Z
def
gα (x, η) = η + max{g(x, ω) − η, 0}P(dω).
1−α Ω

As before, let us restrict ourselves to the scenario approach. Therefore, we as-


sume again that Ω = {ω1 , . . . , ωK } is a finite set and event ωk occurs with probabil-
ity pk , k = 1, . . . , K. In this case
gα (x, η) = η + (1/(1 − α)) ∑_{k=1}^{K} pk max{gk (x) − η, 0}, (8.11)

where gk (x) def= g(x, ωk ).
Continuing the example in which we calculated VaR0.9 (4) = 2, we compute

CVaR0.9 (4) = g0.9 (4, VaR0.9 (4)) = g0.9 (4, 2)
            = 2 + (1/(1 − 0.9)) · (1/20) · (1 + 2) = 2 + 3/2 = 3.5.
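These calculations are easy to automate. The following sketch (Python; names are our assumptions) computes VaRα and CVaRα of a finite loss distribution by steps 1)-2) above and Formula (8.11), and reproduces the numbers of our example:

    def var_cvar(losses, probs, alpha, tol=1e-12):
        # step 1: sort the losses in non-decreasing order
        order = sorted(range(len(losses)), key=lambda i: losses[i])
        # step 2: the smallest index whose cumulative probability reaches alpha
        cum = 0.0
        for i in order:
            cum += probs[i]
            if cum >= alpha - tol:  # tolerance guards against rounding errors
                var = losses[i]
                break
        # Formula (8.11) evaluated at eta = VaR_alpha(x)
        cvar = var + sum(p * max(g - var, 0.0)
                         for g, p in zip(losses, probs)) / (1.0 - alpha)
        return var, cvar

    losses = [4 - w for w in range(10)]         # g(4, omega) = 4 - omega
    probs = [1/20] * 5 + [1/8] * 2 + [1/6] * 3  # p(omega), omega = 0, ..., 9
    print(var_cvar(losses, probs, 0.9))         # (2, 3.5) up to rounding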

8.3.1 Extended Two-Stage Model

Again, consider the two-stage stochastic program (8.1). But now we want to maxi-
mize the expected profit by limiting the risk: CVaRα (x) ≤ r, where r is the maximum
allowable risk level. Introducing new variables zk to represent max{gk (x) − η, 0} in
Formula (8.11), we can extend the deterministic equivalent (8.4) as follows:
cT x + ∑_{k=1}^{K} wTk yk → max,
Ak x + Gk yk ≤ bk , k = 1, . . . , K,
η + (1/(1 − α)) ∑_{k=1}^{K} pk zk ≤ r, (8.12)
gk (x) − η − zk ≤ 0, k = 1, . . . , K,
η ∈ R, x ∈ X,
zk ≥ 0, yk ∈ R_+^{nk} , k = 1, . . . , K.

Program (8.12) is a MIP if the functions gk (x) are linear and X is a mixed-integer
set P(Ā, b̄; S). Of particular interest is also the case when g(x, ω) = − f (x, ω), where
f (x, ω) is defined in (8.3) and is a nonlinear function. Then (8.12) can be rewritten
as follows:
cT x + ∑_{k=1}^{K} wTk yk → max,
Ak x + Gk yk ≤ bk , k = 1, . . . , K,
η + (1/(1 − α)) ∑_{k=1}^{K} pk zk ≤ r, (8.13)
cT x + hTk yk + η + zk ≥ 0, k = 1, . . . , K,
η ∈ R, x ∈ X,
zk ≥ 0, yk ∈ R_+^{nk} , k = 1, . . . , K.
Let us verify that both programs, (8.12) and (8.13), are equivalent when g(x, ω) =
− f (x, ω). By definition

gk (x) = −cT x − max{hTk yk : Gk yk ≤ bk − Ak x},



and since
cT x + ∑_{k=1}^{K} wTk yk = ∑_{k=1}^{K} pk (cT x + hTk yk ),

then in an optimal solution (x∗ ; y∗1 , . . . , y∗K ; η ∗ ; z∗1 , . . . , z∗K ) to (8.13) each point y∗k is
an optimal solution to the LP

max{hTk yk : Gk yk ≤ bk − Ak x∗ },

and, therefore, gk (x∗ ) = −cT x∗ − hTk y∗k .

8.3.2 Credit Risk

Credit risk is the risk caused by the fact that the obligors do not fully fulfill their
obligations, or by a decrease in the market price of assets due to the fall of credit
ratings. For example, a portfolio of bonds from emerging markets (Brazil, India,
Russia, etc.) can most likely generate revenue, but at the same time there is a small
probability of large losses. For such investments, the distribution functions of returns
(future incomes) are asymmetric and, consequently, symmetric risk measures are not
entirely appropriate here. But the VaR measure (as well as its derivative CVaR) was
invented to assess the risks in such situations.
Consider a problem of optimizing a portfolio of n potential investments (such as
shares). We need to determine the share x j of each investment j in the portfolio.
Then the portfolio is presented by the vector x = (x1 , . . . , xn )T . The set X of feasible
solutions (portfolios) is described by the system
∑_{j=1}^{n} x j = 1,
l j ≤ x j ≤ u j , j = 1, . . . , n,

where l j and u j are the minimum and maximum possible shares of investment j.


Assume that K scenarios are possible after the completion of some planning hori-
zon. Let pk be the probability of occurrence of scenario k, and let µ k be the return
vector for this scenario, where µ kj is the return (per one enclosed dollar) of invest-
ment j. Then µ = ∑_{k=1}^{K} pk µ k is the vector of expected returns. The loss of a portfolio
x, if scenario k occurs, is determined by the formula:

gk (x) = (q − µ k )T x,

where q is the return vector, provided that the credit rating of each investment does
not change. We determine the risk of the portfolio x to be CVaRα (x) and limit this
risk to a given value r. Note that with this setting, the "security level" of our portfolio
x is determined by choosing two parameters, α and r.

Under the above assumptions, the problem of maximizing the expected return of
the portfolio at a limited risk is written as follows:

µ T x → max, (8.14a)
η + (1/(1 − α)) ∑_{k=1}^{K} pk zk ≤ r, (8.14b)
(q − µ k )T x − η − zk ≤ 0, k = 1, . . . , K, (8.14c)
∑_{j=1}^{n} x j = 1, (8.14d)
l j ≤ x j ≤ u j , j = 1, . . . , n, (8.14e)
zk ≥ 0, k = 1, . . . , K, (8.14f)
η ∈ R. (8.14g)

Note that (8.14) is an LP. However, it turns into a MIP after taking into account
a number of additional logical conditions that are standard in portfolio optimization.
One of these conditions is the requirement to diversify the investments. Suppose that
the set N = {1, . . . , n} of all investments is divided into subsets (groups) N1 , . . . , Nm ,
say, according to a sectoral or territorial principle. It is required that no more than
ni different investments from group Ni be present in the portfolio, and also that the
portfolio contain investments from at least s groups.
We introduce two families of binary variables:
• y j = 1 if investment j is present in the portfolio, and y j = 0 otherwise ( j =
1, . . . , n);
• δi = 1 if at least one investment from group i is present in the portfolio, and δi = 0
otherwise (i = 1, . . . , m).
To take into account the above requirements, we need to replace (8.14e) with the
following system:

l j y j ≤ x j ≤ u j y j , j = 1, . . . , n,
∑_{j∈Ni} y j ≤ ni , i = 1, . . . , m,
y j ≤ δi , j ∈ Ni , i = 1, . . . , m,
∑_{i=1}^{m} δi ≥ s,
y j ∈ {0, 1}, j = 1, . . . , n,
δi ∈ {0, 1}, i = 1, . . . , m.
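A simple checker for these diversification conditions (a Python sketch with hypothetical names), useful, for example, for validating solutions returned by a solver:

    def diversification_ok(x, groups, n_max, s, l, u, tol=1e-9):
        # groups[i] is the index set N_i; n_max[i] bounds the number of
        # investments taken from group i; at least s groups must be represented
        used = {j for j, xj in enumerate(x) if xj > tol}
        if any(not (l[j] - tol <= x[j] <= u[j] + tol) for j in used):
            return False                 # a share bound of (8.14e) is violated
        represented = 0
        for Ni, ni in zip(groups, n_max):
            cnt = len(used.intersection(Ni))
            if cnt > ni:
                return False             # too many investments from one group
            represented += 1 if cnt else 0
        return represented >= s          # at least s groups are present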

8.4 Multistage Stochastic Programming Problems

Although multistage models of stochastic programming have been studied for sev-
eral decades, until recently they could not be used in practice to solve problems of
realistic sizes. Only with the advent of sufficiently powerful computers did stochas-
tic programming models begin to be used in practice, and stochastic programming
itself began to develop at a rapid pace.
Multistage problems of stochastic programming are applied when the planning
horizon includes more than one period (stage). Let T denote the number of periods,
and let ω t ∈ Ω t denote the events that can occur in period t, t = 1, . . . , T . At the
beginning of the planning horizon (at stage 0), an expected decision x is taken when
the event ω 1 has not yet occurred. A decision y(ω 1 , . . . , ω t ) is made at stage t, when
the events ω 1 , . . . , ω t have already occurred, and the event ω t+1 has not yet happened.
The decision y(ω 1 , . . . , ω t ) depends on the decision y(ω 1 , . . . , ω t−1 ) made at the
previous stage. The multistage model is written as follows:
cT x + ∑_{t=1}^{T} h(ω 1 , . . . , ω t )T y(ω 1 , . . . , ω t ) → max,
A(ω 1 )x + G(ω 1 )y(ω 1 ) ≤ b(ω 1 ),
A(ω 1 , ω 2 )y(ω 1 ) + G(ω 1 , ω 2 )y(ω 1 , ω 2 ) ≤ b(ω 1 , ω 2 ),
A(ω 1 , ω 2 , ω 3 )y(ω 1 , ω 2 ) + G(ω 1 , ω 2 , ω 3 )y(ω 1 , ω 2 , ω 3 ) ≤ b(ω 1 , ω 2 , ω 3 ),
. . . . . . . . .                                                        (8.15)
A(ω 1 , . . . , ω T )y(ω 1 , . . . , ω T −1 ) + G(ω 1 , . . . , ω T )y(ω 1 , . . . , ω T ) ≤ b(ω 1 , . . . , ω T ),
x ∈ X,
y(ω 1 , . . . , ω t ) ∈ Yt , t = 1, . . . , T.

Suppose that, for t = 1, . . . , T , the sample space Ω t is finite, and a sequence of


the events
(ω1 , . . . , ωt ) ∈ Ω 1 × · · · × Ω t
occurs with probability p(ω1 , . . . , ωt ). Obviously, the following equality must hold

∑_{(ω1 ,...,ωt )∈Ω 1 ×···×Ω t} p(ω1 , . . . , ωt ) = 1.

Since many of these probabilities may be zeros, to formulate the deterministic equiv-
alent of the stochastic problem (8.15), it is convenient to introduce the concept of
the scenario tree.
The nodes of the scenario tree are numbered from 0 to n. Node 0 is the root of the
tree. The nodes that are at a distance of t from the root belong to stage t. We denote
by t(i) the stage to which node i belongs. We assume that the edges are oriented

in the direction from the root to the leaves, and the directed edges are called arcs.
Note that any node j, except for the root, is entered by only one arc (i, j), and then
node i is called the parent of node j and is denoted by parent( j). Each arc (i, j) of
the tree is associated with an event ω(i, j) from Ω t(i) . The problem input data are
distributed among the tree nodes as follows. The set X and the vector c0 = c are
assigned to node 0. Each of the remaining nodes, j ∈ {1, . . . n}, is associated with
the following parameters:
p j def= p(ω(0, i1 ), ω(i1 , i2 ), . . . , ω(it( j)−1 , j)),
c j def= h(ω(0, i1 ), ω(i1 , i2 ), . . . , ω(it( j)−1 , j)) × p j ,
b j def= b(ω(0, i1 ), ω(i1 , i2 ), . . . , ω(it( j)−1 , j)),
A j def= A(ω(0, i1 ), ω(i1 , i2 ), . . . , ω(it( j)−1 , j)),
G j def= G(ω(0, i1 ), ω(i1 , i2 ), . . . , ω(it( j)−1 , j)),

where the sequence (0, i1 , . . . , it( j)−1 , j) specifies the unique path in the tree leading
from the root (node 0) to node j. Note that, by definition of p j , the following equa-
tions hold:

∑_{j: t( j)=τ} p j = 1, τ = 1, . . . , T,

∑_{j: parent( j)=i} p j = pi for all i ≥ 0 such that t(i) < T .

Denoting by x j the vector of variables describing a decision taken at node j


(x0 = x), we write down the deterministic equivalent of (8.15) as follows:
∑_{j=0}^{n} cTj x j → max,

A j x parent( j) + G j x j ≤ b j , j = 1, . . . , n, (8.16)
x0 ∈ X,
x j ∈ Yt( j) , j = 1, . . . , n.

Solving (8.16), in particular, we find a decision x = x0 to be taken at the beginning


of the planning horizon. Decisions x j at other nodes of the scenario tree are related
to later periods and, therefore, are not implemented in practice. When the first period
is over, today’s future will become reality and we will already know which of the
first stage scenarios was realized (we will know the value of the event ω 1 ). To find a
decision to be taken in period 2 for the realized scenario, we will need to build a
new scenario tree and solve a new instance of (8.16) whose planning horizon
starts at the beginning of period 2 of the original planning horizon. Therefore, it is
said that (8.16) is applied dynamically.
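The bookkeeping behind (8.16) reduces to storing, for every node, its parent and the conditional probability of the arc entering it; a minimal sketch (Python; the node layout and names are our assumptions):

    from typing import Optional

    class Node:
        def __init__(self, parent: Optional[int], prob: float):
            self.parent = parent  # parent(j); None for the root
            self.prob = prob      # conditional probability of the entering arc
            # the data A_j, G_j, b_j, c_j of (8.16) would be stored here as well

    def path_probability(nodes, j):
        # p_j is the product of the conditional probabilities along the
        # unique path from the root to node j
        p = 1.0
        while nodes[j].parent is not None:
            p *= nodes[j].prob
            j = nodes[j].parent
        return p

    # a two-stage tree: two stage-1 nodes (prob 1/2 each), each with two children
    nodes = [Node(None, 1.0), Node(0, 0.5), Node(0, 0.5),
             Node(1, 2/3), Node(1, 1/3), Node(2, 2/3), Node(2, 1/3)]
    print([path_probability(nodes, j) for j in range(1, 7)])
    # [0.5, 0.5, 1/3, 1/6, 1/3, 1/6] (up to rounding)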

It should be noted that (8.16) is a MIP only if X and all Yt are polyhedral mixed
integer sets, i.e., X = P(Ā0 , b̄0 , S0 ) and Yt = P(Āt , b̄t , St ) for t = 1, . . . , T . In the
next two sections we consider two concrete examples of the multistage stochastic
programming problem.

8.5 Synthetic Options

When forming an investment portfolio, one of the most important goals is to prevent
the portfolio yield from falling below some critical level. This can be done by including
in the portfolio derivative financial assets, such as options. In situations where
derivative assets are not available, we can achieve the desired result by forming a
portfolio based on the "synthetic option" strategy.
The input parameters for the portfolio optimization problem are the following:
• n: number of assets;
• T : number of periods in the planning horizon, period t begins at time t − 1 and
ends at time t;
• z0 : amount of cash at the beginning of the planning horizon;
• xi0 : amount of money invested in asset i at the beginning of the planning horizon;
• R: interest on capital (1 + rate of interest) in terms of one period;
• rit = ri (ω 1 , . . . , ω t ): random return (per one enclosed dollar) of asset i in period t;
• ρit : transaction cost when buying and selling asset i in period t; it is assumed that
all transactions are made at the very beginning of each period;
• qi : maximum share of asset i in the portfolio.
Expected variables:
• xi1b : amount of money spent in period 1 on buying asset i;
• xi1s : amount of money received in period 1 from selling asset i.
Adaptive variables:
• xit = xi (ω 1 , . . . , ω t ): amount of money invested in asset i in period t, t = 1, . . . , T ;
• zt = z(ω 1 , . . . , ω t ): amount of cash at the end of period t, t = 1, . . . , T ;
• xitb = xib (ω 1 , . . . , ω t−1 ): amount of money spent in period t on buying asset i,
i = 1, . . . , n, t = 2, . . . , T ;
• xits = xis (ω 1 , . . . , ω t−1 ): amount of money received in period t from selling asset i,
i = 1, . . . , n, t = 2, . . . , T ;
• ξ = ξ (ω 1 , . . . , ω T ): random component of the portfolio value at the end of the
planning horizon (at the end of period T );
• w: risk-free (attained if the worst scenario occurs) component of the portfolio
value at the end of the planning horizon.
In the selected variables, the portfolio optimization problem is formulated as
follows:

λ w + (1 − λ )E(ξ ) → max, (8.17a)
zt−1 + ∑_{i=1}^{n} (1 − ρit )xits − ∑_{i=1}^{n} (1 + ρit )xitb = (1/R) zt , t = 1, . . . , T, (8.17b)
xi,t−1 + xitb − xits = (1/rit ) xit , i = 1, . . . , n, t = 1, . . . , T, (8.17c)
xit − qi (zt + ∑_{j=1}^{n} x jt ) ≤ 0, i = 1, . . . , n, t = 1, . . . , T, (8.17d)
zT + ∑_{i=1}^{n} (1 − ρiT )xiT = w + ξ , (8.17e)
xitb , xits , xit ≥ 0, i = 1, . . . , n, t = 1, . . . , T, (8.17f)
zt ≥ 0, t = 1, . . . , T. (8.17g)

Objective (8.17a) is to maximize the weighted (0 ≤ λ ≤ 1) combination of two


components of the portfolio value: risk-free, w, and expected (over all possible out-
comes), E(ξ ). Equations (8.17b) are the cash balance relations: in each period t, the
amount of cash at the end of the previous period, zt−1 , plus the amount of money
obtained from selling assets, ∑ni=1 (1 − ρit )xits , minus the amount of money spent on
purchasing assets, ∑ni=1 (1 + ρit )xitb , equals the amount of cash at the end of this pe-
riod, zt , divided by the interest on capital, R. Similarly, Eqs. (8.17c) establish the
balance for each asset i in each period t: the amount of money invested in the asset
at the end of the previous period, xi,t−1 , plus the amount of money spent on buying
the asset, xitb , minus the amount of money received from selling the asset, xits , equals
the amount of money invested in the asset at the end of this period, xit , divided by the
return of the asset in this period, rit . Inequalities (8.17d) limit the shares of all assets
in the portfolio. Finally, Eq. (8.17e) isolates the risk-free component of the portfolio
value at the end of the planning horizon.

Example 8.2 An investor wants to invest an amount of z0 in one risky asset. The
planning horizon consists of T = 2 periods. As for the general model, let R denote
the interest on capital in one period. In period 1 the return of this asset is r1+ or r1−
with equal probability, and in period 2 the return is r2+ with probability 2/3, and r2−
with probability 1/3. The cost of the transaction when buying and selling one asset
unit is constant and equal to ρ. It is necessary to write a deterministic equivalent
for (8.17) applied to this investment problem.

Solution. By assumption, at the beginning of the planning horizon, the investor
has an amount of z0 and no investment in this asset, x0 = 0. The scenario tree for this
example is shown in Fig. 8.2. It has 7 nodes; the number near each non-root node
is the probability that the scenario corresponding to this node occurs. In stage 1, the
scenarios 1 and 2 happen with probability 1/2. In stage 2 (by the end of the planning
horizon), the scenarios 3 and 5 occur with probability 1/3, and the scenarios 4 and 6
occur with probability 1/6.

[Fig. 8.2 Scenario tree for Example 8.2: node 0 is the root; nodes 1 and 2 are its
children; nodes 3 and 4 are the children of node 1, and nodes 5 and 6 are the
children of node 2]

The decisions made at node i (i = 0, 1, 2) are represented by the variables:
• xib : amount of money spent on buying the asset at node i;
• xis : amount of money received from selling the asset at node i.
Saying here that some decision is made at a node, we mean that this decision is
made when the scenario associated with this node occurs.
At nodes i = 1, . . . , 6, we also need the following auxiliary variables:
• xi : amount of money invested at node i;
• zi : amount of cash at node i.
Note that only two variables, x0b and x0s , are expected, and the other ones are
adaptive. In these variables, the deterministic equivalent of (8.17) applied to our
instance is written as follows:
 
λ · w + (1 − λ ) ((1/3)ξ3 + (1/6)ξ4 + (1/3)ξ5 + (1/6)ξ6 ) → max,

z0 + (1 − ρ)x0s − (1 + ρ)x0b = (1/R) z1 ,          node 1
x0 + x0b − x0s = (1/r1+ ) x1 ,
z0 + (1 − ρ)x0s − (1 + ρ)x0b = (1/R) z2 ,          node 2
x0 + x0b − x0s = (1/r1− ) x2 ,
z1 + (1 − ρ)x1s − (1 + ρ)x1b = (1/R) z3 ,          node 3
x1 + x1b − x1s = (1/r2+ ) x3 ,
z1 + (1 − ρ)x1s − (1 + ρ)x1b = (1/R) z4 ,          node 4
x1 + x1b − x1s = (1/r2− ) x4 ,
z2 + (1 − ρ)x2s − (1 + ρ)x2b = (1/R) z5 ,          node 5
x2 + x2b − x2s = (1/r2+ ) x5 ,
z2 + (1 − ρ)x2s − (1 + ρ)x2b = (1/R) z6 ,          node 6
x2 + x2b − x2s = (1/r2− ) x6 ,
z3 + (1 − ρ)x3 = w + ξ3 ,                          isolating
z4 + (1 − ρ)x4 = w + ξ4 ,                          the risk-free part
z5 + (1 − ρ)x5 = w + ξ5 ,                          of the portfolio
z6 + (1 − ρ)x6 = w + ξ6 ,                          value
x1 , z1 , x2 , z2 , x3 , z3 , x4 , z4 , x5 , z5 , x6 , z6 ≥ 0,
x0b , x0s , x1b , x1s , x2b , x2s ≥ 0,
ξ3 , ξ4 , ξ5 , ξ6 ≥ 0. ⊓⊔

8.6 Yield Management

Yield management is an approach to revenue maximization for service firms that


exhibit the following characteristics:
1. Relatively fixed capacity. Service firms with substantial investment in facilities
(e.g., hotels and airlines) are capacity-constrained (once all the seats on a flight
are sold, further demand can be met only by booking passengers on a later flight).
2. Ability to segment its market into different customer classes. Developing various
price-sensitive classes of service gives firms more flexibility in different seasons
of the year.
3. Perishable inventory. Revenue from an unsold seat on a plane or from an unsold
room in a hotel is lost forever.
4. Reservation systems are adopted by service firms to sell capacity in advance of
use. However, managers are faced with uncertainty of whether to accept an early
reservation at a discount price or to wait in hope to sell later seats or rooms at a
higher price.
5. Fluctuating demand. To sell more seats or rooms and increase revenue, in periods
of slow demand managers can lower prices, while in periods of high demand
prices are higher.
Now let us consider a concrete problem. An airline starts selling tickets for a
flight to a particular destination D days before the departure. The time horizon of
D days is divided into T periods of unequal length (for example, a time horizon of

D = 60 days can be divided into T = 4 periods of lengths 30, 20, 7, and 3 days). The
airline can use one of K available planes; the seats in all planes are divided into the
same number, I, of classes. Plane k (k = 1, . . . , K) costs fk to hire and has qki
seats of class i (i = 1, . . . , I). For example, plane k may have qk,1 = 30 first class
seats, qk,2 = 40 business class seats, and qk,3 = 60 economy class seats. In plane k,
up to rkil and rkih seats of class i can be transformed into seats of the lower, i − 1, and
higher, i + 1, classes, i = 1, . . . , I. It is assumed that rk,1l = 0 and rk,Ih = 0.
For administrative simplicity, in each period t (t = 1, . . . , T ) only O price options
can be used; let ctoi denote the price of a seat of class i (i = 1, . . . , I) in period t
if option o is used.
Demand is uncertain but it is affected by ticket prices. Let us assume that S sce-
narios are possible in each period. The probability of scenario s (1 ≤ s ≤ S) in pe-
riod t is pts , ∑_{s=1}^{S} pts = 1. The results of demand forecasting are at our disposal:
if scenario s occurs in period t, and price option o is used in this period, then the
demand for seats of class i will be dtsoi .
We have to choose a plane to hire, and to decide, for each of T periods, which
price option to use, how many seats to sell in each class (depending on demand).
Our goal is to maximize the expected yield.
Let us also note that period t starts at time t − 1 and ends at time t. Therefore, it is
assumed that the decision which option to use in period t is made at time t − 1. The
other decision how many seats of each class to sell depends on the demand in this
period; therefore, this decision is assumed to be made at time t (the end of period t).
To write a deterministic model for this stochastic problem, we need to describe a
scenario tree. In this application the scenario tree has n + 1 = ∑_{t=0}^{T} |Vt | nodes, where
Vt denotes the set of nodes at level t, t = 0, 1, . . . , T . Let us also assume that the root
of the scenario tree is indexed by 0; then V0 = {0} and V = ∪_{t=0}^{T} Vt .
Each node j ∈ Vt (t = 1, . . . , T ) corresponds to one of the histories, h( j) =
(s1 , s2 , . . . , st ), that may happen after t periods, where sτ is the index of a scenario for
period τ. By definition, the history of the root node is empty, h(0) = (). The parent of
node j, denoted by parent( j), is the node in Vt−1 whose history is (s1 , s2 , . . . , st−1 ),
i.e., h( j) = (h(parent( j)), st ). Note that the root node 0 is the parent of all nodes in
V1 (of level 1). In what follows, if we say that something is done at node j, we mean
that it is done when the history h( j) is realized.
For j ∈ V \ {0}, the likelihood of history h(j) is p̄_j = ∏_{τ=1}^t p_{τ,s_τ}; by definition,
p̄_0 = 1. If price option o is used at node j, the demand for seats of class i is
d̄_{joi} = d_{t,s_t,o,i}, and their price is c_{toi}. Let us define c̄_{joi} = p̄_j c_{toi}.
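
To make the tree handling concrete, here is a small sketch (illustrative Python, not
part of the book; T, S and the probabilities p_{ts} below are made-up data) that
enumerates all histories h(j) and computes the path probabilities p̄_j:

    # Enumerate scenario-tree nodes as histories (s_1, ..., s_t) and compute
    # p_bar[h] = prod over tau of p_{tau, s_tau}; the root has empty history.
    from itertools import product

    T, S = 2, 3                             # two periods, three scenarios each
    p = [[0.5, 0.3, 0.2],                   # p[t-1][s-1] = p_{t,s}
         [0.6, 0.1, 0.3]]

    p_bar = {(): 1.0}                       # root: h(0) = (), p_bar_0 = 1
    for t in range(1, T + 1):
        for hist in product(range(1, S + 1), repeat=t):
            parent = hist[:-1]              # h(parent(j)) = (s_1, ..., s_{t-1})
            p_bar[hist] = p_bar[parent] * p[t - 1][hist[-1] - 1]

    # The tree has 1 + S + S^2 = 13 nodes; the leaf probabilities sum to 1.
    assert len(p_bar) == 13
    assert abs(sum(q for h, q in p_bar.items() if len(h) == T) - 1.0) < 1e-9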
Now we introduce the variables. The first family of binary variables decides
which plane to use. For each plane k we define
• v_k = 1 if plane k is used, and v_k = 0 otherwise.
Having hired a plane, we need to decide how to transform the seats in that plane.
So, we define two families of integer variables:
• w^l_i: number of seats of class i to be transformed into seats of class i − 1, i =
2, . . . , I;
• w^h_i: number of seats of class i to be transformed into seats of class i + 1, i =
1, . . . , I − 1.
With each node j ∈ V \VT of the scenario tree we associate the following decision
variables:
• y_{jo} = 1 if price option o is used at node j, and y_{jo} = 0 otherwise.
We also introduce two families of auxiliary variables. For each node j ∈ V \ {0}
we use the following variables:
• x_{joi}: number of seats of class i (i = 1, . . . , I) for sale at node j using price option o
(o = 1, . . . , O);
• z_{ji}: total number of seats of class i (i = 1, . . . , I) for sale at node j.
Using these variables, we write down the following deterministic model:

−∑_{k=1}^K f_k v_k + ∑_{j∈V\{0}} ∑_{o=1}^O ∑_{i=1}^I c̄_{joi} x_{joi} → max,   (8.18a)
∑_{k=1}^K v_k = 1,   (8.18b)
u_i = ∑_{k=1}^K q_{ki} v_k, i = 1, . . . , I,   (8.18c)
w^l_i ≤ ∑_{k=1}^K r^l_{ki} v_k, i = 1, . . . , I,   (8.18d)
w^h_i ≤ ∑_{k=1}^K r^h_{ki} v_k, i = 1, . . . , I,   (8.18e)
∑_{o=1}^O y_{jo} = 1, j ∈ V \ V_T,   (8.18f)
x_{joi} ≤ d̄_{joi} y_{parent(j),o}, j ∈ V \ {0}, i = 1, . . . , I, o = 1, . . . , O,   (8.18g)
z_{ji} = z_{parent(j),i} + ∑_{o=1}^O x_{joi}, i = 1, . . . , I, j ∈ V \ {0},   (8.18h)
z_{j1} + w^h_1 ≤ u_1 + w^l_2, j ∈ V_T,   (8.18i)
z_{ji} + w^l_i + w^h_i ≤ u_i + w^h_{i−1} + w^l_{i+1}, i = 2, . . . , I − 1, j ∈ V_T,   (8.18j)
z_{jI} + w^l_I ≤ u_I + w^h_{I−1}, j ∈ V_T,   (8.18k)
x_{joi} ∈ Z_+, j ∈ V \ {0}, o = 1, . . . , O, i = 1, . . . , I,   (8.18l)
y_{jo} ∈ {0, 1}, j ∈ V \ V_T, o = 1, . . . , O,   (8.18m)
z_{ji} ∈ Z_+, j ∈ V, i = 1, . . . , I,   (8.18n)
z_{0i} = 0, i = 1, . . . , I,   (8.18o)
v_k ∈ Z_+, k = 1, . . . , K,   (8.18p)
w^l_i, w^h_i ∈ Z_+, i = 1, . . . , I.   (8.18q)
228 8 Optimization With Uncertain Parameters

Objective (8.18a) is to maximize the profit from selling seats minus the ex-
penses for hiring a plane. Equation (8.18b) prescribes to hire just one plane, and
Eqs. (8.18c) determine the capacities of all seat classes in the hired plane. Inequal-
ities (8.18d) and (8.18e) restrict the number of seats in any class that can be trans-
formed into seats of the lower and higher classes. Equations (8.18f) prescribe to
choose exactly one price option at each non-leaf node. The variable upper bounds
(8.18g) guarantee that, in any period and for any price option, the number of sold
seats of each class does not exceed the demand for these seats. Equations (8.18h)
calculate the total number of seats in each class that are sold in any of T periods.
Inequalities (8.18i)–(8.18k) imply that, for each class, the number of sold seats plus
the number of seats transformed into seats of the adjacent classes does not exceed
the total number of seats of this class plus the number of seats of the adjacent classes
transformed into seats of this class.
When (8.18) is solved, we know which option, o_1, to use and how many seats
of each class to offer for sale in period 1. When period 1 is over, we will know the
actual number of seats, s_{1i}, of each class i sold in this period. To determine a price option
and the number of seats of each class for sale in period 2, we will solve a new
planning problem for the time horizon that extends from period 2 to the end of the
planning horizon. Writing (8.18) for this new problem, we need to modify (8.18c)
in order to take into account the seats sold in period 1:
u_i = ∑_{k=1}^K q_{ki} v_k − s_{1i}, i = 1, . . . , I.

Similarly, we will determine a price option and the number of seats of each class to
offer for sale in any subsequent period t once period t − 1 is over.

8.7 Robust MIPs

Considering stochastic programming applications, we assumed that we are given
a set of scenarios with prescribed probabilities. In fact, the problem of generating
scenarios is far from trivial. Moreover, in many (or even most) practical applications
it is impossible to produce a reasonable set of scenarios because of the lack of
statistically reliable data. We can say that a robust MIP is a MIP in which some of its
parameters (coefficients) are random variables with given ranges of values but with
unknown probability distributions. The robust approach to solving an optimization
problem first imposes some restrictions on how the values of the uncertain
parameters may vary and then seeks a solution that optimizes the objective in the worst case,
i.e., when the uncertain parameters take their worst possible values.
A rather general robust MIP is formulated as follows:
c^T x → min,
Ax ≤ b for all A ∈ 𝒜,   (8.19)
l ≤ x ≤ u,
x_j ∈ Z, j ∈ S,

where b ∈ R^m, c, l, u ∈ R^n, 𝒜 is a set of real m × n matrices, x is an n-vector of
variables, and S ⊆ {1, . . . , n} is the index set of the integer variables.
You probably already noticed that in the above general model, the vectors b, c, l
and u are deterministic. This does not limit the generality of (8.19). For example,
if b contains uncertain components, then we can "move" these uncertainties
into the constraint matrix by introducing an additional variable x_{n+1} with a fixed
value:

Ax − bx_{n+1} ≤ 0, x_{n+1} = 1.

Similarly, if c contains uncertain components, then we can also "move" these uncertainties
into the constraint matrix by introducing an additional continuous variable
z to represent the objective function:

z → min, c^T x − z ≤ 0.
It is natural to assume that there is a certain level of conservatism when changing
the uncertain parameters in (8.19). We can quantitatively represent this level of
conservatism by defining

𝒜 = {A : ‖A − Ā‖ ≤ ε},

where the m × n matrix Ā is some standard (nominal) value for A, and the number ε ≥ 0
quantifies the level of conservatism. This natural representation of conservatism
has proved to be very effective for robust LPs (when S = ∅), since it allows
us to reduce a robust LP to a conic linear program¹, which can be solved efficiently
(in polynomial time). From a computational point of view, such a representation of
conservatism is not entirely appropriate for robust MIPs (S ≠ ∅) because we are
not yet able to solve conic linear programs with integer variables efficiently.

8.7.1 Row-Wise Uncertainties

It is difficult to imagine a situation where all coefficients of the constraint matrix
of a non-trivial robust MIP are interdependent. It is more natural to assume that
interdependent uncertain parameters occur within a single row of the matrix. The
robust MIP with row-wise uncertainties is the following optimization problem:

1 A conic linear program is a problem of minimizing a linear function over the intersection of an
affine subspace and a convex cone. In particular, an LP in standard form is a conic linear program
whose convex cone is polyhedral.
c^T x → min,
sup_{a∈𝒜_i} a^T x ≤ b_i, i = 1, . . . , m,   (8.20)
l ≤ x ≤ u,
x_j ∈ Z, j ∈ S,

where, for i = 1, . . . , m, b_i ∈ R and 𝒜_i is a non-empty subset of R^n, l, u ∈ R^n, x is an
n-vector of variables, and S ⊆ {1, . . . , n} is the index set of the integer variables. Note
that this problem can be rewritten in Form (8.19) if we define

𝒜 = {A : A_i ∈ 𝒜_i, i = 1, . . . , m},

where A_i denotes the i-th row of A.

In general, we can solve (8.20) by the branch-and-cut algorithm if the sets

X_i = {x ∈ R^n : a^T x ≤ b_i for all a ∈ 𝒜_i}

are represented by efficient separation procedures. More precisely, to separate a
given point x̃ ∈ R^n from X_i, we need to solve the following optimization problem:

max{x̃^T a : a ∈ 𝒜_i}.   (8.21)

To simplify further arguments, let us assume that this problem has a solution denoted
by a(x̃). If a(x̃)T x̃ > bi , then the inequality a(x̃)T x ≤ bi is valid for Xi but not for x̃;
otherwise, x̃ belongs to Xi .
For example, let us consider the case of ellipsoidal uncertainties when, for i =
1, . . . , m, 𝒜_i = {a ∈ R^n : a = a^i + P_i u, ‖u‖ ≤ 1} with a^i ∈ R^n and P_i a symmetric and
positive definite n × n matrix. Then (8.21) is rewritten as follows:

max{x̃^T a^i + (P_i x̃)^T u : ‖u‖ ≤ 1}.

The point u* = (1/‖P_i x̃‖) P_i x̃ is the unique optimal solution to this problem (prove this!).
Therefore, for a(x̃) = a^i + (1/‖P_i x̃‖) P_i² x̃, if a(x̃)^T x̃ = (a^i)^T x̃ + ‖P_i x̃‖ > b_i, then the inequality
a(x̃)^T x ≤ b_i is valid for X_i but not for x̃; otherwise, x̃ belongs to X_i.
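
This separation procedure is easy to implement; the sketch below (Python with
NumPy; the one-constraint instance at the bottom is made up for illustration) either
returns the violated inequality a(x̃)^T x ≤ b_i or reports that x̃ ∈ X_i:

    # Separation for one ellipsoidal row-set A_i = {a^i + P_i u : ||u|| <= 1}.
    import numpy as np

    def separate_ellipsoidal(a_i, P_i, b_i, x_tilde, tol=1e-9):
        """Return (a, b_i) with a^T x <= b_i violated by x_tilde, else None."""
        Px = P_i @ x_tilde
        norm = np.linalg.norm(Px)          # P_i is positive definite, so
        lhs = a_i @ x_tilde + norm         # norm = 0 only when x_tilde = 0;
        if lhs <= b_i + tol:               # lhs = max over a in A_i of a^T x~
            return None                    # x_tilde belongs to X_i
        a = a_i + (P_i @ Px) / norm        # a(x~) = a^i + P_i^2 x~ / ||P_i x~||
        return a, b_i

    a_i = np.array([1.0, 1.0]); P_i = np.eye(2); b_i = 2.0
    print(separate_ellipsoidal(a_i, P_i, b_i, np.array([2.0, 1.0])))
    # the cut with a(x~) = (1.894..., 1.447...) cuts off x~ = (2, 1)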

8.7.2 Polyhedral Uncertainties

A widely used solution approach applied to many robust optimization problems
reduces a robust program to its deterministic equivalent, which is also called the
robust counterpart. The computational difficulty of a robust program depends on
how efficiently its robust counterpart can be solved. We now begin studying those
robust MIPs whose robust counterparts are MIPs.
Problem (8.20) is a robust MIP with polyhedral uncertainties if all sets 𝒜_i ⊆ R^n
are non-empty polyhedra.
Theorem 8.3. If, for i = 1, . . . , m, 𝒜_i = {a ∈ R^n : H_i a ≤ g_i}, where H_i is a real
m_i × n matrix and g_i ∈ R^{m_i}, then (8.20) is equivalent to the following MIP:

c^T x → min,
g_i^T z^i ≤ b_i, i = 1, . . . , m,
H_i^T z^i = x, i = 1, . . . , m,   (8.22)
z^i ≥ 0, i = 1, . . . , m,
l ≤ x ≤ u,
x_j ∈ Z, j ∈ S,

whose variables are x = (x_1, . . . , x_n)^T and z^i = (z^i_1, . . . , z^i_{m_i})^T for i = 1, . . . , m.

Proof. For i = 1, . . . , m, let us consider the computation of

γ_i = sup_{a∈𝒜_i} a^T x

as an LP with a as the vector of variables, and then write down the dual of this LP:

γ_i = max{x^T a : H_i a ≤ g_i} = min{g_i^T z^i : H_i^T z^i = x, z^i ≥ 0}.

Substituting these expressions into (8.20), we obtain (8.22). ⊓⊔
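
To see the dualization at work on a tiny instance (an illustrative example, not from
the book), take a single row with 𝒜_1 = {a ∈ R² : 0 ≤ a_1 ≤ 2, 0 ≤ a_2 ≤ 1}, i.e., H_1
has the rows (1, 0), (0, 1), (−1, 0), (0, −1) and g_1 = (2, 1, 0, 0)^T. The dual of
max{x^T a : H_1 a ≤ g_1} is min{2z_1 + z_2 : z_1 − z_3 = x_1, z_2 − z_4 = x_2, z ≥ 0},
whose optimal value for x ≥ 0 equals 2x_1 + x_2. Accordingly, in (8.22) the robust
constraint sup_{a∈𝒜_1} a^T x ≤ b_1 is replaced by the linear constraints
2z_1 + z_2 ≤ b_1, z_1 − z_3 = x_1, z_2 − z_4 = x_2, z^1 ≥ 0.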

8.7.3 Combinatorial Uncertainties

In this section we consider a model in which the uncertain parameters of the con-
straint matrix take values from given intervals, and the level of conservatism is ex-
pressed by the combinatorial requirement that the number of uncertain parameters
with values different from the standard ones is limited in each row of the constraint
matrix.
Let us consider (8.20) when, for i = 1, . . . , m,

𝒜_i = {(a_{i1}, . . . , a_{in})^T : a_{ij} ∈ [ā_{ij} − α_{ij}, ā_{ij} + α_{ij}] for j = 1, . . . , n,
       ∑_{j=1}^n |a_{ij} − ā_{ij}| / α_{ij} ≤ q_i},   (8.23)

where q_i (i = 1, . . . , m) are parameters that control the degree of conservatism. Here
we also have a robust problem with polyhedral uncertainties because each set 𝒜_i
(i = 1, . . . , m) is the polyhedron that is the projection (onto the space of the variables
a_{i1}, . . . , a_{in}) of the polyhedron given by the following system of inequalities:

ā_{ij} − α_{ij} δ_{ij} ≤ a_{ij} ≤ ā_{ij} + α_{ij} δ_{ij}, j = 1, . . . , n,
∑_{j=1}^n δ_{ij} ≤ q_i,
0 ≤ δ_{ij} ≤ 1, j = 1, . . . , n.

We cannot apply Theorem 8.3 directly here, since the polyhedra 𝒜_i (i = 1, . . . , m)
are not represented in the required way. Nevertheless, we can formulate a similar
theorem whose proof differs only in details from the proof of Theorem 8.3.

Theorem 8.4. Robust MIP (8.20) with the uncertainties given by (8.23) is equivalent
to the following MIP:

∑_{j=1}^n c_j x_j → min,   (8.24a)
∑_{j=1}^n ā_{ij} x_j + q_i v_i + ∑_{j=1}^n w_{ij} ≤ b_i, i = 1, . . . , m,   (8.24b)
v_i + w_{ij} ≥ α_{ij} y_j, j = 1, . . . , n, i = 1, . . . , m,   (8.24c)
−y_j ≤ x_j ≤ y_j, j = 1, . . . , n,   (8.24d)
l_j ≤ x_j ≤ u_j, j = 1, . . . , n,   (8.24e)
x_j ∈ Z, j ∈ S,   (8.24f)
v_i ≥ 0, i = 1, . . . , m,   (8.24g)
w_{ij} ≥ 0, i = 1, . . . , m, j = 1, . . . , n.   (8.24h)

Proof. For i = 1, . . . , m and a fixed x, we define

γ_i(x) = max ∑_{j=1}^n x_j a_{ij}
subject to
ā_{ij} − α_{ij} δ_{ij} ≤ a_{ij} ≤ ā_{ij} + α_{ij} δ_{ij}, j = 1, . . . , n,
∑_{j=1}^n δ_{ij} ≤ q_i,
0 ≤ δ_{ij} ≤ 1, j = 1, . . . , n.

It is easy to see that this LP has an optimal solution (a_{i1}, . . . , a_{in}; δ_{i1}, . . . , δ_{in}) such
that

a_{ij} − ā_{ij} = α_{ij} δ_{ij} if x_j ≥ 0, and a_{ij} − ā_{ij} = −α_{ij} δ_{ij} if x_j < 0.

Therefore, a_{ij} x_j = ā_{ij} x_j + α_{ij} δ_{ij} |x_j|, and by Theorem 3.1 (of duality) we have

γ_i(x) − ∑_{j=1}^n ā_{ij} x_j
  = max{∑_{j=1}^n α_{ij} |x_j| δ_{ij} : ∑_{j=1}^n δ_{ij} ≤ q_i, 0 ≤ δ_{ij} ≤ 1 for j = 1, . . . , n}   (8.25)
  = min{q_i v_i + ∑_{j=1}^n w_{ij} : v_i + w_{ij} ≥ α_{ij} |x_j|, v_i ≥ 0, w_{ij} ≥ 0 for j = 1, . . . , n}.

Now, to get (8.24), it remains to replace sup_{a∈𝒜_i} a^T x in (8.20) with the above
expression for γ_i(x). Note that the variables y_j are introduced in (8.24) to represent
the absolute values |x_j|. Therefore, for all non-negative variables x_j, it is better to
substitute x_j for y_j. ⊓⊔

If the value of the parameter q_i is a non-negative integer, then the first LP in (8.25)
always has an integer optimal solution and, therefore, the definition of 𝒜_i means that
no more than q_i of the uncertain elements a_{ij} (those with α_{ij} > 0) in row i may take
values other than ā_{ij}.
Example 8.3 We need to solve the following robust IP

3x_1 + 2x_2 + 2x_3 → max,
x_1 + x_2 + x_3 ≤ 3,   (8.26)
x_1 + x_2 ≤ 2,
x_1, x_2, x_3 ∈ Z_+,

when, in each inequality, at most one non-zero coefficient can vary by not more than
one.
Solution. In this example, q_1 = q_2 = 1, α_{11} = α_{12} = α_{13} = 1, α_{21} = α_{22} = 1,
and α_{23} = 0. Now, let us write down (8.24) applied to our instance:

−3x1 − 2x2 − 2x3 → min,


x1 + x2 + x3 + v1 + w11 + w12 + w13 ≤ 3,
x1 + x2 + v2 + w21 + w22 ≤ 2,
v1 + w11 ≥ x1 , v1 + w12 ≥ x2 , v1 + w13 ≥ x3 ,
(8.27)
v2 + w21 ≥ x1 , v2 + w22 ≥ x2 ,
x1 , x2 , x3 ∈ Z+ ,
v1 , v2 ≥ 0,
w11 , w12 , w13 , w21 , w22 ≥ 0.

Note that here we have excluded the variables y j since all the variables x j are non-
negative and, consequently, for any optimal solution, y j = x j for j = 1, 2, 3.
It is easy to verify that an optimal solution to (8.27) has the following components:

x_1 = 1, x_2 = 0, x_3 = 1, v_1 = v_2 = 1, w_{11} = w_{12} = w_{13} = w_{21} = w_{22} = 0.

Consequently, the point x* = (1, 0, 1)^T is a solution to our robust program. It is worth
noting that x* would not be an optimal solution to (8.26) if this program were not
robust. ⊓⊔
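
The reader can verify this with any MIP solver; the following sketch states (8.27)
using PuLP (an assumption made only for illustration; the book's own MIPCL, or any
other solver, would serve equally well). Since minimizing −3x_1 − 2x_2 − 2x_3 is the
same as maximizing 3x_1 + 2x_2 + 2x_3, the objective is stated in maximization form:

    from pulp import LpProblem, LpVariable, LpMaximize, lpSum, value

    m = LpProblem("robust_counterpart_8_27", LpMaximize)
    x = [LpVariable(f"x{j}", lowBound=0, cat="Integer") for j in (1, 2, 3)]
    v = [LpVariable(f"v{i}", lowBound=0) for i in (1, 2)]
    w1 = [LpVariable(f"w1{j}", lowBound=0) for j in (1, 2, 3)]
    w2 = [LpVariable(f"w2{j}", lowBound=0) for j in (1, 2)]

    m += 3 * x[0] + 2 * x[1] + 2 * x[2]              # objective
    m += lpSum(x) + v[0] + lpSum(w1) <= 3            # first robust row
    m += x[0] + x[1] + v[1] + lpSum(w2) <= 2         # second robust row
    for j in range(3):
        m += v[0] + w1[j] >= x[j]                    # alpha_{1j} = 1
    for j in range(2):
        m += v[1] + w2[j] >= x[j]                    # alpha_{2j} = 1
    m.solve()
    print([int(value(xj)) for xj in x])              # prints [1, 0, 1]

Solving it confirms the optimal value 5 attained at x* = (1, 0, 1)^T, in contrast with
the non-robust optimum x = (2, 0, 1)^T of (8.26), which has value 8.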

8.7.4 Robust Single-Product Lot-Sizing Problem

Let us consider again a single-product version of the lot-sizing problem studied in
Sect. 1.7.1. A firm is producing some product. The planning horizon consists of T
periods, where period t starts at time t − 1 and ends at time t, t = 1, . . . , T . Inventory
of the product in the warehouse before the start of the planning horizon is s0 . For
each period t = 1, . . . , T , we know
• u^p_t: production capacity (in product units);
• u^s_t: storage capacity (in product units);
• ft : fixed production cost;
• ct : unit production cost;
• ht : unit storage cost;
• pt : unit shortage cost (penalty for not supplying a unit of product to satisfy the
demands in the first t periods).
In period t = 1, . . . , T, the demand for the product, d_t, is an uncertain parameter that
takes values from the interval [d̄_t − α_t, d̄_t + α_t]. To rule out large deviations in the
cumulative demands over the periods, we impose the following restrictions on the
values of the uncertain parameters:

∑_{τ=1}^t |d_τ − d̄_τ| / α_τ ≤ q_t, t = 1, . . . , T.

It is required that the budgets of uncertainty, q_t, are increasing in t, reflecting the
fact that uncertainty increases with the number of periods passed. It is also assumed
that q_t − q_{t−1} ≤ 1 for all t = 2, . . . , T, which means that the budget of uncertainty
never grows faster than the number of involved uncertain parameters.
It is necessary to determine how many product units to produce in each period
in order to minimize the maximum (over all possible demands) total production,
storage and shortage cost during the planning horizon.
For t = 1, . . . , T , we introduce the following variables:
• xt : amount of product produced in period t;
• st : amount of product stored in the warehouse at the end of period t;
• yt = 1 if the product is produced in period t, and yt = 0 otherwise.
It is important to notice that the decision on the production level, x_t, in period t is
taken at the very beginning of period t (at time t − 1), and the demand in this period
will be known only at the end of the period (at time t).
The dynamics of the stock during the planning horizon is described by the following
balance equations:
s_{t−1} + x_t = d_t + s_t, t = 1, . . . , T.   (8.28)
Let us note that here s0 is not a variable but a constant. Each balance equation in
(8.28) relates two neighboring periods: the amount of product, st−1 , in the ware-
house at the end of period t − 1 plus the amount, xt , produced in period t equals the
demand, dt , in period t plus the amount, st , stored in the warehouse at the end of
period t.
Using (8.28) we can get explicit expressions for the stock variables:

s_t = s_0 + ∑_{τ=1}^t (x_τ − d_τ), t = 1, . . . , T.   (8.29)

It is important to notice that, from the point of view of robust optimization, the bal-
ance relations written in Form (8.29) are preferable to those written in Form (8.28).
This is because in (8.29) the expression for st contains all uncertain parameters,
d1 , . . . , dt , affecting the value of st .
Having determined

𝒜_t = {(d_1, . . . , d_t) : ∑_{τ=1}^t |d_τ − d̄_τ| / α_τ ≤ q_t,
       d̄_τ − α_τ ≤ d_τ ≤ d̄_τ + α_τ for τ = 1, . . . , t}

for t = 1, . . . , T, we can write the following robust problem:

∑_{t=1}^T (f_t y_t + c_t x_t + z_t) → min,   (8.30a)
s_0 + ∑_{τ=1}^t (x_τ − d_τ) ≤ (1/h_t) z_t, (d_1, . . . , d_t) ∈ 𝒜_t, t = 1, . . . , T,   (8.30b)
−s_0 − ∑_{τ=1}^t (x_τ − d_τ) ≤ (1/p_t) z_t, (d_1, . . . , d_t) ∈ 𝒜_t, t = 1, . . . , T,   (8.30c)
s_0 + ∑_{τ=1}^t (x_τ − d_τ) ≤ u^s_t, (d_1, . . . , d_t) ∈ 𝒜_t, t = 1, . . . , T,   (8.30d)
0 ≤ x_t ≤ u^p_t y_t, t = 1, . . . , T,   (8.30e)
y_t ∈ {0, 1}, t = 1, . . . , T.   (8.30f)

Objective (8.30a) is to minimize the total expenses over all T periods. Here,
in view of (8.29), (8.30b) and (8.30c), each variable z_t represents the value of
max{h_t s_t, −p_t s_t}, which is either the cost of storing s_t product units in period t
if s_t ≥ 0, or, if s_t < 0, the penalty for not supplying −s_t product units in the
first t periods. Inequalities (8.30d) express the storage capacity restrictions. Inequalities
(8.30e) impose the production capacity restrictions and the implications:
y_t = 0 ⇒ x_t = 0.

Theorem 8.5. The robust single-product lot-sizing problem (8.30) is equivalent to
the following MIP:

∑_{t=1}^T (f_t y_t + c_t x_t + z_t) → min,   (8.31a)
s_0 + ∑_{τ=1}^t (x_τ − d̄_τ) + q_t v_t + ∑_{τ=1}^t w_{tτ} ≤ (1/h_t) z_t, t = 1, . . . , T,   (8.31b)
−s_0 − ∑_{τ=1}^t (x_τ − d̄_τ) + q_t v_t + ∑_{τ=1}^t w_{tτ} ≤ (1/p_t) z_t, t = 1, . . . , T,   (8.31c)
s_0 + ∑_{τ=1}^t (x_τ − d̄_τ) + q_t v_t + ∑_{τ=1}^t w_{tτ} ≤ u^s_t, t = 1, . . . , T,   (8.31d)
v_t + w_{tτ} ≥ α_τ, τ = 1, . . . , t, t = 1, . . . , T,   (8.31e)
0 ≤ x_t ≤ u^p_t y_t, t = 1, . . . , T,   (8.31f)
z_t, v_t ≥ 0, t = 1, . . . , T,   (8.31g)
w_{tτ} ≥ 0, τ = 1, . . . , t, t = 1, . . . , T,   (8.31h)
y_t ∈ {0, 1}, t = 1, . . . , T.   (8.31i)

Proof. We cannot apply Theorem 8.4 directly because the uncertain parameters d_t
are not coefficients of the constraint matrix; their linear combinations enter the
inequalities (8.30b), (8.30c) and (8.30d) as constant terms. Nevertheless, we can
replace in (8.30b) and (8.30d) each occurrence of any uncertain parameter d_τ with
−d_τ γ_τ^−, where γ_τ^− is a new variable fixed at −1. Similarly, we replace in (8.30c)
each occurrence of any uncertain parameter d_τ with d_τ γ_τ^+, where γ_τ^+ is a new
variable fixed at 1. Then we apply Theorem 8.4 to this modified version of (8.30) to
obtain an equivalent MIP that is transformed into (8.31) after substituting −1 for each
variable γ_τ^− and 1 for each variable γ_τ^+. ⊓⊔
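
As a small illustration of Theorem 8.5, the robust counterpart (8.31) can be stated
directly; the sketch below uses PuLP and made-up data (both are assumptions for
illustration only). Constraints (8.31b) and (8.31c) are multiplied through by h_t > 0
and p_t > 0, respectively:

    from pulp import LpProblem, LpVariable, LpMinimize, lpSum, value

    T = 3; s0 = 0
    f = [10, 10, 10]; c = [2, 2, 2]; h = [1, 1, 1]; p = [4, 4, 4]
    up = [50, 50, 50]; us = [40, 40, 40]         # capacities u^p_t, u^s_t
    d_bar = [20, 25, 15]; alpha = [5, 5, 5]      # nominal demands, deviations
    q = [1, 1.5, 2]                              # budgets of uncertainty

    m = LpProblem("robust_lot_sizing_8_31", LpMinimize)
    x = [LpVariable(f"x{t}", lowBound=0) for t in range(T)]
    y = [LpVariable(f"y{t}", cat="Binary") for t in range(T)]
    z = [LpVariable(f"z{t}", lowBound=0) for t in range(T)]
    v = [LpVariable(f"v{t}", lowBound=0) for t in range(T)]
    w = [[LpVariable(f"w{t}_{u}", lowBound=0) for u in range(t + 1)]
         for t in range(T)]

    m += lpSum(f[t] * y[t] + c[t] * x[t] + z[t] for t in range(T))  # (8.31a)
    for t in range(T):
        cum = s0 + lpSum(x[u] - d_bar[u] for u in range(t + 1))
        rob = q[t] * v[t] + lpSum(w[t])          # protection term
        m += h[t] * (cum + rob) <= z[t]          # (8.31b) times h_t
        m += p[t] * (-cum + rob) <= z[t]         # (8.31c) times p_t
        m += cum + rob <= us[t]                  # (8.31d)
        for u in range(t + 1):
            m += v[t] + w[t][u] >= alpha[u]      # (8.31e)
        m += x[t] <= up[t] * y[t]                # (8.31f)
    m.solve()
    print([value(xt) for xt in x], value(m.objective))

Note that setting all q_t = 0 makes the protection terms vanish, and the model reduces
to the nominal (deterministic) lot-sizing problem.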

8.8 Notes

Stochastic programming is presented most comprehensively in [28, 113]. A
simple and accessible introduction to the subject is given in [77].
Sect. 8.1. The deterministic equivalent (8.4) of the two-stage model (8.1) is also
known from linear stochastic programming (see [28, 77, 113]).
Sect. 8.2. Benders’ decomposition applied to MIPs was described in [22].
Sect. 8.3. Theorem 8.2 was proved in [115]. The problem of risk measuring in
stochastic programming models with integer variables is discussed in [124]. The
application from Sect. 8.3.2 is explored in more detail in [5].
Sect. 8.4. A more detailed introduction to multi-stage models of stochastic integer
programming is given in [116]. Synthetic options are considered in [146]. The
model of yield management in the airline industry was adapted from [137].
Sect. 8.7. Theorem 8.4 was proved in [26]. The single-product lot-sizing robust
problem was studied in [27]. The book [24] and the survey [25] are devoted to
robust optimization and its applications.
Sect. 8.9. See [28] if you have problems with Exercise 8.2.

8.9 Exercises

8.1. You want to invest $50 000. Today the XYZ shares are sold at $20 per share. A
$700 European option gives the right (but not the obligation) to buy, in six months,
100 XYZ shares at $15 per share. In addition, six-month risk-free bonds with
a face value of $100 are now sold at $90. You decided not to buy more than 20
options.
Six months later, three equally likely scenarios for the XYZ share price are pos-
sible: 1) the price will not change; 2) the price will rise to $40; 3) the price will drop
to $12.
Formulate and solve three MIPs in which you want to form a portfolio in order
to maximize:
a) expected income;
b) expected income provided that the income must not be less than $2000 for any
of three scenarios;
c) risk-free income that is determined as the income in the worst of three possible
scenarios.
Compare optimal solutions to your three models.
8.2. A newspaper seller decides how many newspapers to buy at a price of α to
sell them at α + β , provided that the demand, u, is a random variable with a dis-
tribution function G. Seller’s goal is to maximize his profit. Solve this stochastic
programming problem.
8.3. Prove that, for a fixed x, the function f (x, ω) defined by (8.3) is in fact a random
variable.
8.4. Specify how to calculate VaRα and CVaRα for a discrete probability space
when all events are equally likely.
8.5. Write explicitly (as in Example 8.1) and then solve Benders' reformulation for
the following MIP:
2x_1 + 3x_2 + 6y_1 + 4y_2 → max,
x_1 + 2x_2 + 4y_1 + 3y_2 ≤ 8,
−x_1 + 3x_2 − 2y_1 + 4y_2 ≤ 10,
x_1, x_2 ∈ Z_+, x_1 ≤ 4,
y_1, y_2 ≥ 0.

8.6. Write Benders’ reformulation for (1.20), which is a MIP formulation of the
single-product lot-sizing problem.
8.7. The robust optimization problem with scenario-type uncertainties is formulated
as follows:

min {max_{1≤k≤K} c_k^T x : x ∈ X},   (8.32)

where X ⊆ R^n, and c_k ∈ R^n is the objective vector for scenario k, k = 1, . . . , K.
Explain why Problem (8.32) can be NP-hard even for those sets X for which the
problem

max{c^T x : x ∈ X}

is polynomially solvable.
8.8. An LP with probability constraints is written as follows:

c^T x → max,
Ax ≤ b,   (8.33)
P{Hx ≥ ξ(ω)} ≥ α,

where x is a vector of n variables, c ∈ R^n, b ∈ R^m, α ∈ (0, 1), an m × n matrix A
and an s × n matrix H are deterministic parameters, ω is an elementary event from
a probability space (Ω, 𝒜, P), and ξ : Ω → R^s is a random vector. Any feasible
solution x̄ to (8.33) is a solution to the system Ax ≤ b, and the probability that x̄
satisfies the system of random inequalities Hx ≥ ξ(ω) is at least α.
Prove that for a finite probability space, when Ω = {ω1 , . . . , ωK } is a finite set
and event (scenario) ωk occurs with probability pk (k = 1, . . . , K), (8.33) can be
formulated as a MIP.
8.9. Consider a variation of the problem of designing a telecommunication network
from Sect. 2.7 when d ∈ R^E_+ is a random demand vector that takes the value d^k ∈ R^E_+
with probability p_k, k = 1, . . . , K. We also assume that Ineqs. (2.9c) and (2.9d) must
be satisfied with probability α ∈ (0, 1).
Formulate this variant of the telecommunication network design problem as a
MIP.
8.10. Reformulate the short-term financial management problem from Sect. 2.11 in
such a way that it can be used for medium and long-term planning. Write down a
stochastic programming model and its deterministic equivalent.
References

1. Abara, J.: Applying integer linear programming to the fleet assignment problem. Interfaces
19, 20–28 (1989)
2. van den Akker, M., van Hoesel, C.P.M., Savelsbergh, M.W.P.: A polyhedral approach to single-
machine scheduling problems. Math. Program. 85, 541–572 (1999)
3. Alevras, D., Grötschel, M., Wessäly, R.: Capacity and survivability models for telecommu-
nication networks. Tech. Rep. Technical Report SC 97-22, Konrad-Zuse-Zentrum für Infor-
mationstechnik, Berlin (1997)
4. Andersen, E., Andersen, K.: Presolving in linear programming. Math. Program. 71, 221–245
(1995)
5. Anderson, F., Mausser, H., Rosen, D., Uryasev, S.: Credit risk optimization with conditional
value-at-risk criterion. Math. Program. 89, 273–291 (2001)
6. Applegate, D., Bixby, R., Chvátal, V., Cook, W.: Finding cuts in the TSP. Tech. Rep. DI-
MACS Technical Report 95-05, Rutgers University, New Brunswick, NJ (1995)
7. Applegate, D., Bixby, R., Chvátal, V., Cook, W.: Implementing the Dantzig-Fulkerson-
Johnson algorithm for large traveling salesman problems. Math. Program. 97, 91–153 (2003)
8. Atamtürk, A., Nemhauser, G.L., Savelsbergh, M.W.P.: Conflict graphs in solving integer pro-
gramming problems. European Journal of Oper. Res. 121, 40–45 (2000)
9. Atamtürk, A., Rajan, D.: On splittable and unsplittable flow capacitated network design arc-
set polyhedra. Math. Program. 92, 315–333 (2002)
10. Balas, E.: Facets of the knapsack polytope. Math. Program. 8, 146–164 (1975)
11. Balas, E.: Disjunctive programming. Annals of Discrete Mathematics 5, 3–51 (1979)
12. Balas, E., Bockmayr, A., Pisaruk, N., Wolsey, L.: On unions and dominants of polytopes.
Math. Program. 99, 223–239 (2004)
13. Balas, E., Ceria, S., Cornuéjols, G.: A lift-and-project cutting plane algorithm for mixed 0-1
programs. Math. Program. 58, 295–324 (1993)
14. Balas, E., Ceria, S., Cornuéjols, G., Natraj, N.: Gomory cuts revisited. Oper. Res. Lett. 19,
1–9 (1996)
15. Balinski, M.L.: On finding integer solutions to linear programs. Tech. rep., Mathematica,
Princeton, N.J. (1964)
16. Balinski, M.L., Quandt, R.: On an integer program for a delivery problem. Oper. Res. 12,
300–304 (1964)
17. Barany, I., Van Roy, T., Wolsey, L.A.: Uncapacitated lot-sizing: the convex hull of solutions.
Math. Program. Study 22, 32–43 (1984)
18. Barnhart, C., Johnson, E.L., Nemhauser, G.L., Savelsbergh, M.W.P., Vance, P.: Branch-and-
price: column generation for solving huge integer programs. Oper. Res. 46, 316–329 (1998)
19. Beale, E.M.L., Tomlin, J.A.: Special facilities in a general mathematical programming sys-
tem for nonconvex problems using ordered sets of variables. In: Proceedings of the Fifth
Annual Conference on Operational Research, pp. 447–454. J. Lawrence (ed.), Tavistock Pub-
lications (1970)
20. Beasley, J.E.: An exact two-dimensional non-guillotine cutting tree search procedure. Oper.
Res. 33, 49–64 (1985)
21. Belvaux, G., Boissin, N., Sutter, A., Wolsey, L.A.: Optimal placement of add/drop multiplex-
ers: static and dynamic models. European Journal of Oper. Res. 108, 26–35 (1998)
22. Benders, J.F.: Partitioning procedures for solving mixed-variables programming problems.
Numerische Mathematik 4, 238–252 (1962)
23. Benichou, M., Gauthier, J., Girodet, P., Hentges, G., Ribiere, G., Vincent, O.: Experiments
in mixed-integer programming. Math. Program. 1, 76–94 (1971)
24. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press
(2009)
25. Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization.
SIAM Rev. 53(3), 464–501 (2011)
26. Bertsimas, D., Sim, M.: The price of robustness. Oper. Res. 52, 35–53 (2004)
27. Bertsimas, D., Thiele, A.: A robust optimization approach to supply chain management. In:
D. Bienstock, G. Nemhauser (eds.) Integer Programming and Combinatorial Optimization.
IPCO 2004. Lecture Notes in Computer Science, vol. 3064, pp. 86–100. Springer, Berlin,
Heidelberg (2004)
28. Birge, J.R., Louveaux, F.V.: Introduction to stochastic programming. Springer Verlag, New
York (2011)
29. Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge University Press (2004)
30. Brearley, A.L., Mitra, G., Williams, H.P.: Analysis of mathematical programming problems
prior to applying the simplex algorithm. Math. Program. 8, 54–83 (1975)
31. Burdet, C.A., Johnson, E.L.: A subadditive approach to solve linear integer programs. Ann.
Discrete Math. 1, 117–144 (1977)
32. Caprara, A., Fischetti, M.: {0, 1/2}-Chvátal-Gomory cuts. Math. Program. 74, 221–236 (1996)
33. Charnes, A., Cooper, W.W., Rhodes, E.: Measuring the efficiency of decision-making units.
European Journal of Operational Research 2, 429–444 (1978)
34. Chernikov, S.N.: Linear Inequalities (in Russian). Nauka, Moscow (1968)
35. Chvátal, V.: Edmonds polytopes and a hierarchy of combinatorial problems. Discrete Math.
4, 305–337 (1973)
36. Chvátal, V.: Edmonds polytopes and a weakly hamiltonian graphs. Math. Program. 5, 29–40
(1973)
37. Chvátal, V.: Linear Programming. Freeman, New York (1983)
38. Cordeau, J.F., Laporte, G., Savelsbergh, M.W., Vigo, D.: Chapter 6 vehicle routing. In:
C. Barnhart, G. Laporte (eds.) Transportation, Handbooks in Operations Research and Man-
agement Science, vol. 14, pp. 367–428. Elsevier (2007)
39. Cornuéjols, G.: Valid inequalities for mixed integer linear programs. Math. Program. 112,
3–44 (2008)
40. Cornuéjols, G., Fisher, M.L., Nemhauser, G.L.: Location of bank accounts to optimize float:
An analytic study of exact and approximate algorithms. Management Science 23, 789–810
(1977)
41. Dakin, R.: Valid inequalities for mixed integer linear programs. Computer Journal 8, 250–
255 (1965)
42. Dantzig, G.: Linear programming and extensions. Princeton University Press, Princeton
(1963)
43. Dantzig, G., Fulkerson, D., Johnson, S.: On a linear programming combinatorial approach to
the traveling salesman problem. Oper. Res. 7, 58–66 (1959)
44. Dantzig, G.B., Fulkerson, D., Johnson, S.: Solution of a large-scale traveling salesman prob-
lem. Oper. Res. 2, 393–410 (1954)
45. Dantzig, G., Wolfe, P.: Decomposition principle for linear programs. Oper. Res. 8, 101–111
(1960)
46. Desrosiers, J., Dumas, Y., Solomon, M., Soumis, F.: Time constrained routing and schedul-
ing. Handbooks in operations research and management science 8, 35–139 (1995)
47. Dyer, M., Wolsey, L.: Formulating the single-machine sequencing problem with release dates
as a mixed integer program. Discrete Appl. Math. 26, 255–270 (1990)
48. Edmonds, J.: Paths, trees and flowers. Canadian Journal of Mathematics 17, 449–467 (1965)
49. Edmonds, J., Giles, R.: A min-max relations for submodular functions on graphs. Ann.
Discrete Math. 1, 185–204 (1977)
50. Eisenbrand, F.: On the membership problem for the elementary closure of a polyhedron.
Combinatorica 19, 297–300 (1999)
51. Forrest, J.J., Goldfarb, D.: Steepest-edge simplex algorithms for linear programming. Math.
Program. 57, 341–374 (1992)
52. Fulkerson, D.R.: Blocking and anti-blocking pairs of polyhedra. Math. Program. 1, 160–194
(1971)
53. Gavish, B., Graves, S.: The traveling salesman problem and related problems. Tech. rep.,
Graduate School of Management, University of Rochester, New York (1979). Working Paper
54. Gilmore, P.C., Gomory, R.E.: A linear programming approach to cutting stock problem.
Oper. Res. 9, 849–859 (1961)
55. Gilmore, P.C., Gomory, R.E.: A linear programming approach to cutting stock problem: Part
ii. Oper. Res. 11, 863–888 (1963)
56. Gilmore, P.C., Gomory, R.E.: The theory and computation of knapsack functions. Oper. Res.
14, 1045–1077 (1966)
57. Goldfarb, D., Reid, J.K.: A practical steepest-edge simplex algorithm. Math. Program. 12,
361–371 (1977)
58. Gomory, R.E.: Outline of an algorithm for integer solutions to linear programs. Bull. Amer.
Soc. 64, 275–278 (1958)
59. Gomory, R.E.: An algorithm for the mixed integer problem. Tech. Rep. Technical Report
RM-2597, The RAND Cooperation (1960)
60. Gomory, R.E.: Solving linear programming problems in integers. In: Proceedings of Sym-
posia in Applied Mathematics, vol. 10 (1960)
61. Gomory, R.E.: Some polyhedra related to corner problems. Linear Algebra and its Applica-
tions 2, 451–588 (1969)
62. Grötschel, M., Jünger, M., Reinelt, G.: A cutting plane algorithm for the linear ordering
problem. Oper. Res. 32, 1195–1220 (1984)
63. Grötschel, M., Lovász, L., Schrijver, A.: The ellipsoid method and its consequences in com-
binatorial optimization. Combinatorica 1, 169–197 (1981)
64. Grötschel, M., Lovász, L., Schrijver, A.: Geometric algorithms and combinatorial optimiza-
tion. Springer, Berlin (1988)
65. Grötschel, M., Padberg, M.: On the symmetric travelling salesman problem II: lifting theo-
rems and facets. Math. Program. 16, 281–302 (1979)
66. Grunbaum, B.: Convex polytopes. Wiley, New York (1967)
67. Gu, Z., Nemhauser, G.L., Savelsbergh, M.W.P.: Lifted cover inequalities for 0-1 integer pro-
grams: computation. INFORMS J. Comput. 10, 427–437 (1998)
68. Gu, Z., Nemhauser, G.L., Savelsbergh, M.W.P.: Lifted flow cover inequalities for mixed 0-1
programs. Math. Program. A 85, 436–467 (1999)
69. Gu, Z., Nemhauser, G.L., Savelsbergh, M.W.P.: Sequence independent lifting in mixed inte-
ger programming. J. Combinat. Optim. 4, 109–129 (2000)
70. Guignard, M., Spielberg, K.: Logical reduction methods in zero-one programming. Oper.
Res. 29, 49–74 (1981)
71. Gusfield, D.: Very simple method for all pairs network flow analysis. SIAM J. Comput. 19,
143–155 (1990)
72. Hammer, P.L., Johnson, E.L., Peled, U.N.: Facets of regular 0-1 polytopes. Math. Program.
8, 179–206 (1975)
73. Heller, I., Tompkins, C.B.: An extension of a theorem of Dantzig's. In: Linear inequalities and
related systems, ed. H.W. Kuhn and A.W. Tucker, pp. 247–252. Princeton University Press,
Princeton, N.J. (1956)
74. Hoffman, K., Padberg, M.: Improving representations of zero-one linear programs for
branch-and-cut. ORSA Journal of Computing 3, 121–134 (1991)
75. Ibarra, O.H., Kim, C.E.: Fast approximations algorithms for the knapsack and sum of subset
problems. Journal of the ACM 22, 463–468 (1975)
76. Johnson, E.L.: Modeling and strong linear programs for mixed integer programming. In:
Wallace S.W. (eds) Algorithms and Model Formulations in Mathematical Programming.
NATO ASI Series (Series F: Computer and Systems Sciences), vol. 51, pp. 1–41. Springer,
Berlin, Heidelberg (1989)
77. Kall, P., Wallace, S.: Stochastic programming. Wiley (1994)
78. Kallenberg, L.C.M.: Linear programming and finite markovian control problems. Tech. Rep.
148, Mathematisch Centrum, Math. Centre Tract, Amsterdam (1983)
79. Karger, D.R.: Minimum cuts in near-linear time. Journal of the ACM 47, 46–76 (2000)
80. Kaufmann, A., Henry-Labordére, A.: Méthodes et modeles de la recherche operationnelle.
Dunon, Paris-Bruxelles-Montreal (1974)
81. Khachian, L.G.: Complexity of linear programming problems (in Russian). Moscow (1987)
82. Kondili, E., Pantelides, C.C., Sargent, R.W.H.: A general algorithm for short-term scheduling
of batch operations – I. MILP formulation. Computers chem. Engng. 17, 211–227 (1993)
83. Land, A.H., Doig, A.G.: An automatic method for solving discrete programming problems.
Econometrica 28, 497–520 (1960)
84. Laporte, G., Nobert, Y.: A branch and bound algorithm for the capacitated vehicle routing
problem. OR Spektrum 5, 77–85 (1983)
85. Letchford, A.N.: On disjunctive cuts for combinatorial optimization. Journal of Combinato-
rial Optimization 5, 299–315 (2001)
86. Letchford, A.N.: Totally tight Chvátal-Gomory cuts. Oper. Res. Lett. 30, 71–73 (2002)
87. Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0-1 optimization. SIAM
J. Optim. 1, 166–190 (1991)
88. Marchand, H., Martin, A., Weismantel, R., Wolsey, L.: Cutting planes in integer and mixed
integer programming. Discrete Appl. Math. 123, 397–446 (2002)
89. Marchand, H., Wolsey, L.A.: Aggregation and mixed integer rounding to solve MIPs. Oper.
Res. 49, 363–371 (2001)
90. Markowitz, H.: Portfolio Selection: Efficient Diversification of Investments. Wiley, New
York (1959)
91. Miller, A.J., Wolsey, L.A.: Tight formulations for some simple mixed integer programs and
convex objective integer programs. Math. Program. 98, 73–88 (2003)
92. Miller, C.E., Tucker, A.W., Zemlin, R.A.: Integer programming formulations and the travel-
ing salesman problem. J. Assoc. Comput. Mach. 7, 326–329 (1960)
93. Minoux, M.: Optimum synthesis of a network with non-simultaneous multicommodity flow
requirements. In: P. Hansen (ed.) Studies on Graphs and Discrete Programming, pp. 269–
277. North-Holland Publishing Company (1981)
94. Minoux, M.: Programmation Mathémattique. Bordas et C.N.E.T.-E.N.S.T., Paris (1989)
95. Murty, R.G., Yu, F.T.: Linear complementarity, linear and nonlinear programming (internet
edition). http://ioe.engin.umich.edu/people/fac/books/murty/linear complementarity web-
book (1993)
96. Naddef, D.: Polyhedral theory and branch-and-cut algorithms for the symmetric TSP. In:
G. Gutin, A. Punnen (eds.) The traveling salesman problem and its variations, pp. 29–116.
Kluwer Academic Publishers (2002)
97. Nagamochi, H., Ibaraki, T.: Computing edge connectivity in multigraphs and capacitated
graphs. SIAM J. Disc. Math. 5, 54–66 (1992)
98. Nemhauser, G.L., Wolsey, L.A.: Integer and Combinatorial Optimization. Wiley (1988)
99. Nemhauser, G.L., Wolsey, L.A.: A recursive procedure to generate all cuts for 0–1 mixed
integer programs. Math. Program. 46, 379–390 (1990)
100. Nisan, N.: Bidding and allocation in combinatorial auctions. In: Proceedings ACM Confer-
ence on Electronic Commerce (EC-00), pp. 1–12. Minneapolis, MN (2000)
101. Orgler, Y.: An unequal period model for cash management decisions. Management Science
16, B77–B92 (1969)
102. Padberg, M.: Linear Optimization and Extensions. Springer-Verlag, Berlin, Heidelberg
(1995)
103. Padberg, M.W.: On the facial structure of set packing polyhedra. Math. Program. 5, 199–215
(1973)
104. Padberg, M.W.: A note on zero-one programming. Oper. Res. 23, 833–837 (1975)
105. Padberg, M.W., Rao, M.R.: Odd minimum cut-sets and b-matchings. Math. Oper. Res. 7,
67–80 (1982)
106. Padberg, M.W., Rinaldi, G.: Optimization of a 532 city symmetric traveling salesman prob-
lem by branch and cut. Oper. Res. Lett. 6, 1–7 (1987)
107. Padberg, M.W., Rinaldi, G.: Facet identification for the symmetric traveling salesman poly-
tope. Math. Program. 47, 219–257 (1990)
108. Padberg, M.W., Van Roy, T.J., Wolsey, L.A.: Valid linear inequalities for fixed charge prob-
lems. Oper. Res. 33, 842–861 (1985)
109. Papadimitriou, C.H.: Computational complexity. Addison-Wesley Publishing Company
(1994)
110. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity.
Prentice-Hall, Englewood Cliffs, NJ (1982)
111. Pinedo, M.L.: Handbook of Scheduling: Algorithms, Models, and Performance Analysis.
Springer (2012)
112. Preciado-Walters, F., Rardin, R., Langer, M., Thai, V.: A coupled column generation, mixed
integer approach to optimal planning of intensity modulated radiation therapy for cancer.
Math. Program. 101, 319–338 (2004)
113. Prékopa, A.: Stochastic programming. Kluwer Academic Publishers, Dordrecht (1995)
114. Queyranne, M.A., Schulz, A.S.: Polyhedral approaches to machine scheduling. Tech. Rep.
Technical report 408/1994, Institut für Mathematik, Technische Universität Berlin, Berlin
(1994)
115. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–41
(2000)
116. Römisch, W., Schultz, R.: Multistage stochastic integer programs: an introduction. In: M.
Grötschel, S.O. Krumke, J. Rambau (eds.) Online Optimization of Large Scale Systems, pp.
581–600. Springer, Berlin, Heidelberg (2001)
117. Saigal, R.: Linear programming: a modern integrated analysis. Kluwer Academic Publishers,
Boston/Dordrecht/London (1995)
118. Savelsbergh, M.W.P.: A branch and price algorithm for the generalized assignment problem.
Tech. Rep. COC-93-02, Computational Optimization Center, Georgia Institute of Technol-
ogy, Atlanta (1993)
119. Savelsbergh, M.W.P.: Preprocessing and probing techniques for mixed integer programming
problems. ORSA Journal on Computing 6, 445–454 (1994)
120. Scholl, A.: Balancing and sequencing of assembly lines. Physica-Verlag, Berlin, Heidelberg
(1999)
121. Schrijver, A.: On cutting planes. Ann. Discrete Math. 9, 291–296 (1980)
122. Schrijver, A.: Theory of linear and integer programming. Wiley (1986)
123. Schrijver, A.: Combinatorial optimization. Springer Verlag (2004)
124. Schultz, R.: Stochastic programming with integer variables. Math. Program. 97, 285–309
(2003)
125. Shahookar, K., Mazumder, P.: VLSI cell placement techniques. ACM Computing Surveys
23, 143–220 (1991)
126. Sheble, G.B., Fahd, G.N.: Unit commitment literature synopsis. IEEE Transactions on Power
Systems 9, 128–135 (1994)
127. Sherali, H., Adams, W.: A hierarchy of relaxations between the continuous and convex hull
representations for zero-one programming problems. SIAM J. Discr. Math. 3, 311–430
(1990)
128. Hartmann, S., Briskorn, D.: A survey of variants and extensions of the resource-constrained
project scheduling problem. European Journal of Oper. Res. 207, 1–14 (2010)
129. Suhl, U.H., Szymanski, R.: Supernode processing of mixed-integer models. Computational
Optimization and Applications 3, 317–331 (1994)
130. Sutanthavibul, S., Shragowitz, E., Rosen, J.B.: An analytical approach to floorplan design
and optimization. IEEE Transactions on Computer-Aided Design 10, 761–769 (1991)
131. Sutter, A., Vanderbeck, F., Wolsey, L.A.: Optimal placement of add/drop multiplexers:
heuristic and exact algorithms. Oper. Res. 46, 719–728 (1998)
132. Tomlin, J.A.: On scaling linear programming problems. Math. Program. 4, 144–166 (1975)
133. Van Vyve, M., Wolsey, L.A.: Approximate extended formulations. Math. Program. 105,
501–522 (2006)
134. Vanderbei, R.J.: Linear Programming: Foundations and Extensions. Kluwer Academic Pub-
lishers (2001)
135. Wagner, H.M., Whitin, T.M.: Dynamic version of the economic lot size model. Management
Science 5, 89–96 (1958)
136. Weismantel, R.: On the 0/1 knapsack polytope. Math. Program. 77, 49–68 (1997)
137. Williams, H.P.: Model Building in Mathematical Programming, 5th Edition. Wiley (2013)
138. Wolsey, L.A.: Faces for a linear inequality in 0-1 variables. Math. Program. 8, 165–178
(1975)
139. Wolsey, L.A.: Facets and strong valid inequalities for integer programs. Oper. Res. 24, 367–
372 (1976)
140. Wolsey, L.A.: Valid inequalities and superadditivity for 0/1 integer programs. Math. Oper.
Res. 2, 66–77 (1977)
141. Wolsey, L.A.: Strong formulations for mixed integer programs: a survey. Math. Program. 45,
173–191 (1989)
142. Wolsey, L.A.: Integer Programming. Wiley (1998)
143. Wolsey, L.A.: Solving multi-item lot-sizing problems with an mip solver using classification
and reformulation. Management Science 48, 1587–1602 (2002)
144. Wolsey, L.A.: Strong formulations for mixed integer programs: valid inequalities and ex-
tended formulations. Math. Program. 97, 423–447 (2003)
145. Yannakakis, M.: Expressing combinatorial optimization problems by linear programs. J.
Comp. Syst. Sci. 43, 441–466 (1991)
146. Zhao, Y., Ziemba, W.T.: The russell-yasuda model: A stochastic programming model using
an endogenously determined worst case risk measure for dynamic asset allocation. Math.
Program. 89, 293–309 (2001)
Index

acyclic subgraph problem, 181
adjacent vertices, 13
aggregation of equations, 36
arbitrage, 92
assembly line balancing problem, 71
ATM allocation problem, 70
balanced airplane loading problem, 72
Benders'
  cuts, 212
  reformulation, 212
bimatrix game, 34
binary classification problem, 34
boolean
  formula, 6
  variable, 6
bound
  upper, 161
    generalized, see GUB
  variable
    lower, 3
    upper, 3, 21, 22
branch-and-bound, 153
  lower bound, 153
  method, 153
  record, 153
    solution, 154
  search tree, 153
  upper bound, 153
branch-and-cut, 158
  method, 158
    block diagram, 158
branch-and-price, 183
  master problem, see master problem
  method, 183, 191
  pricing problem, see pricing problem
branching, 153
  balanced, 164
  by pseudocosts, 163
  node selection rule
    maximum cost, 161
    maximum depth, 161
  on most fractional variable, 162
  SOS1, 164
  SOS2, 165
  strong, 163
certificate
  of infeasibility, 82, 87
  of unboundedness, 77, 88
Chvátal-Gomory procedure, 99
Chvátal rank
  of a set, 99, 128
  of an inequality, 99, 128
clearing problem, 71
clique, 148
clustering problem, 206
column generation algorithm, 189
combinatorial auctions, 39
complementary slackness condition, 34, 88
conflict graph, 149
control of fuel consumption, 70
convex
  cone, 12
  function, 5
  hull, 12
  set, 12
cover, 131
  excess of, 131
  generalized, 143
    excess of, 143
  inequality, 131
    lifted, see LCI
  minimal, 131
  mixed, 140
credit risk, 218
crew scheduling problem, 38
cut, 97
  Chvátal-Gomory, 98
  fractional Gomory, 107
  global, 158, 165
  local, 158, 165
  pool, 158
  totally tight, 128
cutting plane algorithm, 84, 97, 101, 108
cutting stock problem, 183
Dantzig-Wolfe reformulation, 188
DEA, 96
designing a reliable telecommunication network, 202
detailed placement problem, 48, 207
deterministic equivalent, 210
disaggregation
  of inequalities, 16
  of variables, 19
disjunctive principle, 104
duality
  gap, 158
  theorem, 88
dynamic programming
  direct step of, 27
  recurrence formula, 27, 28
  reverse step of, 27–29
facility location problem, 39
Farkas' lemma, 87
fixed charge network flow problem, 20
fleet assignment problem, 49
floor planning, 8
flow cover inequality, 143
formulation
  alternative, 22
  extended, 17
    approximate, 20
  ideal, 15
function
  convex, 5
  piecewise-linear, 3
  superadditive, 127, 142
generalized
  assignment problem, 191
  cover, 143
  upper bound, see GUB
GUB, 3, 147
half-space, 13
Hamiltonian cycle, 174
hyperedge, 37
hypergraph, 37
hyperplane, 13
  facet defining, 13
  supporting, 13
hypothesis testing, 95
inequality
  clique, 148, 152
  consequence of, 16
  cover, 131
    lifted, see LCI
  facet defining, 17
  global, 158
  local, 158
  non-redundant, 17
  odd cycle, 148, 152
  redundant, 17
  strong, 17
  valid, 15
integer program, see IP
IP, 1
KKT
  conditions, 10
  point, 10
knapsack
  cover, see cover
  problem, 26
    0,1, 26, 135, 193
    integer, 26, 185
    multidimensional, see m-KP
    quadratic, 33
  set, 131
Lagrangian relaxation, 207
LCI, 135
lifting, 133
  down, 133
  function, 142
    sequence independent, 142
  up, 133
line balancing problem, 43
linear complementarity problem, 9
linear program, see LP
lot-sizing problem
  multiproduct, 42
  single-product, 18, 35
    robust, 234
    with backlogging, 71
LP, 1
  degenerate, 75, 93
  dual, 75
  in canonical form, 73, 89
  in standard form, 90
  relaxation, 30, 97
  with probability constraints, 238
LP heuristic, 133
management
  financial, 53
  of portfolio, 40
Markov decision process, 95
master problem, 189, 204
matching, 99
  polytope, 100
  problem, 37
matrix
  basic, 74
  incidence, 37
  totally unimodular, 36, 152
method
  branch-and-bound, 153
  branch-and-cut, 158
  branch-and-price, 183, 191
  cutting plane, 84, 97, 101, 108
minimum cut problem, 175
minimum distance of a linear code, 152
minimum Hamiltonian cycle problem, 174
  subtour elimination inequalities, 175
MIP, 1
  robust, 228
    with combinatorial uncertainties, 231
    with ellipsoidal uncertainties, 230
    with polyhedral uncertainties, 230
    with row-wise uncertainties, 229
mixed integer program, see MIP
mixed integer rounding, 105, 140
m-KP, 62
multidimensional
  bin packing problem, 62
  knapsack problem, see m-KP
  packing problem, 61
  strip packing problem, 62
Nash equilibrium, 34
nearest substring problem, 72
node heuristic, 160
NP-hard problem, 26
optimizing hybrid cars, 52
overdetermined system of linear equations, 94
piecewise-linear approximation, 3
pivot
  column, 76
  element, 76
  operation, 76
  row, 76
placement of logic elements, 47
planning treatment of cancerous tumors, 54
polyhedron, 12
  degenerate, 75
  edge of, 13
  face of, 13
  facet of, 13
  integer, 16, 36
  of full-dimension, 13
  relaxation, 14
  vertex of, 13
polytope, 12
  matching, 100
  packing, 147, 152
  parity, 152
portfolio optimization, 40
  index fund, 40
  Markowitz Model, 118
  synthetic options, 222
potentials, 75
preprocessing, 167
pricing, 79
  operation, 79
  problem, 185, 190, 193, 205
  rule
    first negative, 79
    maximum increase, 80
    most negative, 79
    steepest edge, 80
probabilistic classifier, 95
probing, 172
project scheduling, 56
projection, 114
pseudocost, 163
  branching, 163
quadratic programming, 2, 9
  problem, 2, 9, 119
relaxation LP, 25, 153
risk measure
  CVaRα, 216
  var, 215
  VaRα, 215
robust optimization, 209
rounding principle, 98
  generalized, 127
scenario tree, 220
scheduling
  batch operations, 58
  m machines, 70
  problem, 22
    continuous time formulation, 23
    time-index formulation, 24
  sport tournament, 71
separation, 85
  for convex quadratic constraints, 129
  for cover inequalities, 132
  for flow cover inequalities, 145
  for norm cones, 130
  problem, 118
  rule
    first violated, 85
    maximum decrease, 85
    most violated, 85
    steepest edge, 85
set
  basic, 74
    dual feasible, 75
    feasible, 74
    of columns, 89
    of rows, 89
    primal feasible, see feasible
  covering problem, 37
  packing problem, 37, 39
  partitioning problem, 2, 37–39, 207
  polyhedral, 17
  special ordered
    of type 1, see SOS1
    of type 2, see SOS2
shadow price, 75
simplex method
  cycling, 93
    lexicographic rule for preventing, 93
  dual, 81
  primal, 77
solution
  basic, 74
    degenerate, 74
  dual, 75
  dual feasible, 75
  feasible, 74
  optimal, 75
  primal feasible, see feasible
SOS1, 3
SOS2, 4, 165
stationary point, see KKT point
Steiner tree, 207
  packing problem, 208
  problem, 207
stochastic programming, 209
  multistage problem, 220
  two-stage problem, 209
strong branching, 163
telecommunication network design problem, 46, 207
transportation problem, 11
traveling salesman problem, 174
unit commitment problem, 45
variable
  adaptive, 209
  binary, 1
  boolean, 6
  discrete, 2, 164
  dual, 75
  expected, 209
  priority, 163
vehicle routing problem, 66
  classical, 68
vertex
  of hypergraph, 37
  of polyhedron, 74
    degenerate, 74
Weyl's theorem, 12
yield management, 225