0% found this document useful (0 votes)

15 views20 pages

Review of Serial and Parallel Min-Cut/Max-Flow Algorithms For Computer Vision

This document evaluates and compares the performance of several serial and parallel minimum cut/maximum flow algorithms on computer vision problems. It finds that specialized grid-based algorithms like GridCut generally perform best on grid graphs, while pseudoflow algorithms achieve the best overall performance. Among parallel algorithms, a bottom-up merging approach works best, but no single method dominates. The most memory efficient serial algorithm is Boykov-Kolmogorov.

Uploaded by

Ashish Rp

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views20 pages

Review of Serial and Parallel Min-Cut/Max-Flow Algorithms For Computer Vision

Uploaded by

Ashish Rp

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 20

1

Review of Serial and Parallel Min-Cut/Max-Flow

Algorithms for Computer Vision
Patrick M. Jensen, Niels Jeppesen, Anders B. Dahl, and Vedrana A. Dahl

Abstract—Minimum cut/maximum flow (min-cut/max-flow) algorithms solve a variety of problems in computer vision and thus significant
effort has been put into developing fast min-cut/max-flow algorithms. As a result, it is difficult to choose an ideal algorithm for a given
problem. Furthermore, parallel algorithms have not been thoroughly compared. In this paper, we evaluate the state-of-the-art serial and
parallel min-cut/max-flow algorithms on the largest set of computer vision problems yet. We focus on generic algorithms, i.e., for
unstructured graphs, but also compare with the specialized GridCut implementation. When applicable, GridCut performs best. Otherwise,
the two pseudoflow algorithms, Hochbaum pseudoflow and excesses incremental breadth first search, achieves the overall best
arXiv:2202.00418v2 [cs.CV] 20 Apr 2022

performance. The most memory efficient implementation tested is the Boykov-Kolmogorov algorithm. Amongst generic parallel
algorithms, we find the bottom-up merging approach by Liu and Sun to be best, but no method is dominant. Of the generic parallel
methods, only the parallel preflow push-relabel algorithm is able to efficiently scale with many processors across problem sizes, and no
generic parallel method consistently outperforms serial algorithms. Finally, we provide and evaluate strategies for algorithm selection to
obtain good expected performance. We make our dataset and implementations publicly available for further research.

Index Terms—Algorithms, computer vision, graph algorithms, graph-theoretic methods, parallel algorithms, performance evaluation of
algorithms and systems

1 I NTRODUCTION

M IN - CUT / MAX - FLOW algorithms are ubiquitous in computer

vision, since a large variety of computer vision problems can
be formulated as min-cut/max-flow problems. Example applications
abstract things [24, 48, 57, 74, 97, 100], e.g., candidate positions
for mesh vertices.
For energy functions which are submodular, meaning that all
include image segmentation [11, 18, 49, 50, 57, 84], stereo pairwise energy terms satisfy the condition
matching [14, 64], surface reconstruction [71], surface fitting [24,
Eij (0, 0) + Eij (1, 1) ≤ Eij (0, 1) + Eij (1, 0), (2)
60, 70, 74, 97, 101], graph matching [48], and texture restoration
[88]. In recent years, min-cut/max-flow algorithms have also found the minimization can be solved directly as a min-cut/max-flow
use in conjunction with deep learning methods — for example, problem [29, 65]. Submodular energies favor neighbors that have
to quickly generate training labels [61] or in combination with the same label, i.e., (xi , xj ) having the labels (0, 0) or (1, 1)
convolutional neural networks (CNNs) [42, 80, 93]. rather than (0, 1) or (1, 0). Therefore, submodularity imposes a
Greig et al. [40] were the first to use min-cut/max-flow local smoothness of the solution, which is useful in many computer
algorithms to solve maximum a posterior Markov random field vision problems. However, some important vision problems are
(MRF) problems in computer vision. Later, Boykov and Jolly [9] not submodular. In such cases, one can use either a submodular
showed how this could be generalized and Boykov and Kolmogorov approximation or an approach based on quadratic pseudo-Boolean
[11] proposed a fast min-cut/max-flow algorithm for computer optimization (QPBO) as described in [6, 44, 63, 88].
vision problems. Min-cut/max-flow algorithms in computer vision Due to the wide applicability of min-cut/max-flow in computer
are used to solve a large family of energy minimization problems, vision, several fast generic min-cut/max-flow algorithms have been
and the most commonly used energy function is of the form developed [11, 36, 45]. In addition, more specialized algorithms
X X have been created that exploit the grid structure of images to reduce
E(x) = Ei (xi ) + Eij (xi , xj ), (1) memory usage and run time [22, 51, 52, 86, 96]. Furthermore,
i∈P (i,j)∈N methods for dynamic problems [36, 62, 102], where a series of
where P is a set of indices for the binary variables xi ∈ {0, 1}, similar min-cut/max-flow problems are solved in succession, have
and N is a set of index pairs. A unary term Ei : {0, 1} → R is been proposed. For such problems, the result of the first solution
associated with variable xi , and a pairwise term Eij : {0, 1}2 → R can be reused to speed up computations of subsequent solutions.
is associated with the pair of variables xi , xj . As the inputs to the Finally, some papers [43, 89, 91, 103] have explored methods
energy terms are binary, the terms are often represented as lookup that also allow for distributed computation of min-cut/max-flow
tables. In a typical application, such as binary segmentation with problems across several computational nodes. This approach is
MRFs [10], P represents pixels in an image and xi represents the primarily suited for graphs too large to fit in physical memory.
assignment of pixel i. However, variables can also describe more In this paper, we focus on generic min-cut/max-flow algorithms,
which do not make assumptions about the graph structure (e.g.,
• P. M. Jensen, N. Jeppesen, A. B. Dahl, and V. A. Dahl are with the requiring a grid structure). However, for comparison, we include
Department of Applied Mathematics and Computer Science, Technical the GridCut algorithm [52] in our evaluation on grid-based graphs.
University of Denmark, Kongens Lyngby, Denmark. Furthermore, we consider only static problems where a solution is
E-mail: {patmjen, niejep, abda, vand}@dtu.dk
calculated once, without access to a previous solution (as opposed
2

to dynamic problems). Finally, for parallel algorithms, we do not preflow push-relabel (PPR) [35], and the GridCut [51] algorithms.
consider whether the algorithm works well in a distributed setting, Moreover, to reduce the influence of implementation details, we
but focus on the shared memory case where the complete graph evaluate different versions (including our own) of the Excesses
can be loaded into the memory of one machine. Incremental Breadth First Search (EIBFS) [36] and the Boykov-
The goal is that our experimental results can help researchers Kolmogorov (BK) [11] algorithm. We chose these for an extended
understand the strengths and weaknesses of the current state-of- evaluation, as EIBFS is the most recent min-cut/max-flow algorithm
the-art min-cut/max-flow algorithms and help practitioners when and BK is still widely used in the computer vision community.
choosing a min-cut/max-flow algorithm to use for a given problem. For the parallel algorithms, we provide the first comprehensive
comparison of all major approaches. This includes our own
1.1 Related Work implementation of the bottom-up merging algorithm by Liu and
Sun [75], our own version of the dual decomposition algorithm
Serial Algorithms Several papers [17, 27, 36, 95] provide
by Strandmark and Kahl [91], the reference implementation of the
comparisons of different serial min-cut/max-flow algorithms on a
region discharge algorithm by Shekhovtsov and Hlaváč [89], an
variety of standard benchmark problems. However, many of these
implementation of the parallel preflow push-relabel algorithm by
benchmark problems are small w.r.t. the scale of min-cut/max-flow
Baunstark et al. [5], and the parallel implementation of GridCut
problems that can be solved today — especially when it comes
(P-GridCut) [51]. In our comparison, we evaluate not just the run
to grid graphs. Also, graphs in which nodes are not based on an
time — including both the initialization time and the time for the
image grid are severely underrepresented. Furthermore, [17, 27,
min-cut/max-flow computations — but also the memory use of the
95] do not include all current state-of-the-art algorithms, while
implementations. Memory usage has not received much attention in
other papers do not include initialization times for the min-cut
the literature, despite it often being a limiting factor when working
computation. As shown by Verma and Batra [95], it is important
with large problems. Finally, we show that the current parallel
for practical use to include the initialization time, as algorithm
algorithm implementations have unpredictable performance and
implementations may spend as much time on initialization as on the
unfortunately often perform worse than serial algorithms.
min-cut computation. Additionally, existing papers only compare
All tested C++ implementations (except GridCut [51]), in-
reference implementations (i.e., the implementation released by the
cluding our new implementations of several algorithms, are
authors) of algorithms — the exception being that an optimized
available at https://github.com/patmjen/maxflow algorithms and
version of the BK algorithm is sometimes included, e.g., in [36].
are archived at DOI:10.5281/zenodo.4903945 [54]. We also provide
However, as implementation details — i.e., choices that are left
Python wrapper packages for several of the algorithms (includ-
unspecified by the algorithm description — can significantly impact
ing BK and HPF), which can be found at https://github.com/
performance [95], a systematic investigation of their effect is also
skielex/shrdr. All of our benchmark problems are available at
important. Finally, existing comparisons focus on determining the
DOI:10.11583/DTU.17091101.
overall best algorithm, even though, as we show in this work, the
best algorithm depends on the features of the given graph.
Parallel Algorithms To our knowledge, parallel min- 2 M IN -C UT /M AX -F LOW A LGORITHMS FOR C OM -
cut/max-flow algorithms have not been systematically compared. PUTER V ISION
Papers introducing parallel algorithms only compare with serial
To illustrate the use of min-cut/max-flow, we will sketch how a
algorithms [75, 91, 103] or a single parallel algorithm [5]. The
vision problem, image segmentation, can be solved using min-
most comprehensive comparison so far was made by Shekhovtsov
cut/max-flow. We start by introducing our notation and defining
and Hlaváč [89] who included a generic and grid-based parallel
the min-cut/max-flow problem.
algorithm. However, no paper compares with the approach by Liu
We define a directed graph G = (V, E) by a set of nodes, V ,
and Sun [75], as no public implementation is available, even though
and a set of directed arcs, E . We let n and m refer to the number
it is expected to be the fastest [89, 91]. Additionally, all papers use
of nodes and arcs, respectively. Each arc (i, j) ∈ E is assigned
the same set of computer vision problems used to benchmark serial
a non-negative capacity cij . For min-cut/max-flow problems, we
algorithms. This is not ideal, as the set lacks larger problems which
define two special terminal nodes, s and t, which are referred to
we expect to benefit the most from parallelization [56]. Therefore,
as the source and sink, respectively. The source has only outgoing
how big the performance benefits of parallelization are, and when
arcs, while the sink has only incoming arcs. Arcs to and from the
to expect them, is still to be determined.
terminal nodes are known as terminal arcs.
A feasible flow in the graph G is an assignment of non-
1.2 Contributions negative numbers (flows), fij , to each arc (i, j) ∈ E . A fea-
We evaluate current state-of-the-art generic serial and parallel min- sible flow must satisfy the following two types of constraints:
cut/max-flow algorithms on the largest set of computer vision capacity
P P fij ≤ cij , and conservation constraints,
constraints,
problems so far. We compare the algorithms on a wide range of i|(i,j)∈E fij = k|(j,k)∈E fjk for all nodes j ∈ V \ {s, t}.
graph problems including commonly used benchmarks problems, Capacity constraints ensure that the flow along an arc does not
as well as many new problem instances from recent papers — some exceed its capacity. Conservation constraints ensure that the flow
of which are significantly larger than previous problems and expose going into a node equals the flow coming out. See Fig. 1(a) for an
weaknesses in the algorithms not seen with previous datasets. Since example of the graph and a feasible flow. The value of the flow is
the performance of the algorithms varies between problems, we the total flow out of the source or, equivalently, into the sink, and
also provide concrete strategies on algorithm selection and evaluate the maximum flow problem refers to finding a feasible flow that
the expected performance of these. maximizes the flow value.
For the serial algorithms, we evaluate the reference imple- An s-t cut is a partition of the nodes into two disjoint sets S
mentations of the Hochbaum pseudoflow (HPF) [45, 46], the and T such that s ∈ S and t ∈ T . The sets S and T are referred
3

2 15 3 9 4 8/15 8/9
20 8 8/20 8/8
1 7 2 5
7 2
2/12 2/10 2/12 10/10
10 10 2/20 11
8/10 2/20
s 3/6 t s 5/6 t
8/10 5/15 10/10 7/15
12 6/10 8/10
6/6 5/6
3/5 5
6
3/3 4
9
3/3 3/4 (a) (b) (c) (d) (e)
7 9 8 3/9

(a) Graph and flow (b) Min-cut/max-flow Fig. 2: Some possibilities for associating graph nodes with en-
15 9 15 9 tities used for segmentation. Graph nodes (gray dots) associated
8
20
7 2
8 20
7 2
with (a) image pixels, (b) superpixels (c) positions in the image (d)
10
10 18 2 2 8 10 10/20
10/12 10/10 mesh faces (e) mesh vertices.
2
s t s 6 t
3 3
10 5 10/15
8 2 4 6 10/10 10
6 6 s A
5 5
3 4 3 4 B
9 9
A C
(c) Residual graph (d) Augmenting paths
2 1 1
1
+1 9/15 +1 8/9 3 1 11/15 +2 9/9 +1 A
10/20 8/8 1 10/20 8/8 0
1 1
−2
B
7 2 1/7 2
1/12 9/10 12 10/10
t B C
10/10 1 1/20
1 10/10 1
0
20
14
s 6/6
0
t s 6/6 t (a) (b) (c) (d)
10/10 7/15 10/10 7/15
0 +6 10 0 10/10
6 6/6
1
3/3 0
5
0
1
3/3 0
5
0
−4 Fig. 3: Some typical segmentation models. Terminal arcs are
0 4 0 4
+3 9 +3 9 shown only for the first example. Arcs drawn in purple have infinite
(e) Preflow push-relabel (f) Pseudoflow capacity. (a) A classical MRF segmentation with 4-connected grid
graph. (b) A multi-column graph used for segmenting layered
Fig. 1: Graph basics and serial algorithms. (a) An example of
structures. (c) Two-object segmentation with inclusion constraint.
the graph and a feasible (non-maximal) flow. The flow and capacity
(d) Three-object segmentation with mutual exclusion using QPBO.
for each arc is written as fij /cij , and (to reduce clutter) zero-values
of the flow are omitted. The flow is 8, which is not maximal, so
no s-t cut is evident. (b) The min-cut/max-flow with a value of 18,
which all min-cut/max-flow algorithms will eventually arrive at. The energy formulation (1) is convenient when min-cut/max-
(c) Residual graph for the flow from (a). (d) An intermediate flow flow algorithms are used to optimize MRFs. Here, each unary
while running the AP algorithm. In the first iteration, 10 units are energy term is a likelihood energy (negative log likelihood) of a
pushed along the path highlighted in orange and red, saturating two pixel being labeled 0 or 1. Likelihood terms are typically computed
terminal arcs (red). In the next iteration, flow is pushed along the directly from image data. The pairwise terms are defined for pairs
residual path highlighted in blue. (e) A preflow at an intermediate of pixels, so-called neighbors, and for 2D images, the neighborhood
stage of a PPR algorithm. Nodes with excess are shown in red, and structure is usually given by a 4 or 8-connectivity.
a label in green is attached to every node. (f) A pseudoflow at an The typical pairwise energy terms used in (1) are
intermediate stage of the HPF algorithm. Nodes with surplus/deficit
are shown in red/blue, a label is attached to every node, and arcs Eij (0, 0)=Eij (1, 1)=0 and Eij (0, 1)=Eij (1, 0)=βij . (3)
of the tree structure are highlighted in green. These terms penalize neighboring pixels having different labels
by a fixed amount, βij , thus encouraging smoothness of the
segmentation. In this case, the construction of the s-t graph which
to as the source and sink set, respectively. The capacity of the cut exactly represents the energy function is straightforward: The node
is the sum of capacities of the arcs going from S to T . And the set is V = P ∪ {s, t}. For terminal arc capacities, csi and cit ,
minimum cut problem refers to finding a cut that minimizes the we use the unary terms Ei (0) and Ei (1), respectively. Meanwhile,
cut capacity. Often, this partition of the nodes is all that is needed pairwise energy terms correspond to non-terminal arc capacities,
for computer vision applications. Therefore, some algorithms only such that cij = cji = βij . Fig. 3(a) shows this construction for
compute the minimum cut, and an additional step would be needed a 4-connected grid graph. The binary segmentation of the image
to extract the flow value for every arc. corresponds directly to the binary labeling given by the minimum
Finally, the max-flow min-cut theorem states that the value of cut. Put in another way, the sets S and T give the optimal labeling
the maximum flow is exactly the capacity of the minimum cut. of the nodes, and because we have a 1-to-1 mapping between non-
Fig. 1(b) shows min-cut/max-flow on a small graph. This can be terminal nodes and pixels, the node labeling is the segmentation.
shown by formulating both problems as linear programs, which However, there are many more advanced ways to formulate image
reveals that max-flow is the strong dual of the min-cut. segmentation using binary energy optimization and s-t graphs, and
ways to formulate other computer vision problems as well [65].
An example closely related to image segmentation is surface
2.1 Image Segmentation
fitting, where [74] uses arcs of infinite capacity (i.e., infinite
When formulating segmentation as a min-cut/max-flow problem, pairwise energy terms) to impose a structure to the optimal solution.
one modeling choice involves deciding which structures to repre- In Fig. 3(b), downward-pointing arcs ensure that if a pixel is in a
sent as graph nodes. Often, nodes of the graph represent individual source set, the column of pixels below it is also in the source set —
image pixels, but various other entities may be associated with so the optimal solution has to be a layer. The slanted arcs impose
graph nodes, some of which are illustrated in Fig. 2. the smoothness of this layer.
4

It is also possible to formulate multi-label/multi-object segmen- maintain the capacity constraints, the flow that is pushed equals the
tation problems that can be solved with a single s-t cut [22, 50, minimum residual capacity along the path. Conservation constrains
57, 74], or by iteratively changing and computing the cut [8, 49]. are maintained as the algorithm only updates complete paths from
For the single-cut Ishikawa method [50], it is common to duplicate s to t. The algorithm terminates when no augmenting paths can be
the graph for each label, i.e., having a sub-graph per label. For found. Fig. 1(d) shows an intermediate stage of an AP algorithm.
example, in Fig. 3, each pixel is represented by two nodes: one for The primary difference between various AP algorithms lies
object A and one for object B , so a pixel may be segmented as in how the augmenting paths are found. For computer vision
belonging to A, B , both, or neither. The submodular interaction applications, the most popular AP algorithm is the Boykov-
between the objects may be achieved by adding arcs between the Kolmogorov (BK) algorithm [11], which works by building search
sub-graphs. Fig. 3(c) shows submodular interaction, where arcs trees from both the source and sink nodes to find augmenting paths
with infinite capacity ensure that if a pixel belongs to object A, this and uses a heuristic that favors shorter augmenting paths. The BK
pixel and all its neighbors also belong to object B . This is known algorithm performs well on many computer vision problems, but
as inclusion or containment with a minimum margin of one. its theoretical run time bound is worse than other algorithms [95].
In the examples covered so far, the arcs between the graph In terms of performance, the BK algorithm has been surpassed
nodes correspond to submodular energy terms, which means the by the Incremetal Breadth First Search (IBFS) algorithm by
energy is lower when the nodes belong to the same set (S or T ). Goldberg et al. [37]. The main difference between the two
Mutual exclusion, in the general case, requires non-submodular algorithms is that IBFS maintains the source and sink search trees
energies which are not directly translatable to arcs in the graphs as breadth-first search trees, which results in both better theoretical
shown so far. An alternative is to use QPBO [63], as illustrated run time and better practical performance [36, 37].
in Fig. 3(d), which can handle any energy function of the form in
(1) — submodular or not. When using QPBO, we construct two 3.2 Preflow Push-Relabel
sub-graphs for each object: one representing the object and another The second family of algorithms are the preflow push-relabel
representing its complement. The exclusion of two objects, say A (PPR) algorithms, which were introduced by Goldberg and Tarjan
and B , is then achieved by adding inclusion arcs from A to B [35]. These algorithms use a so-called preflow, which satisfies
and from B to A. However, there is no guarantee that the min- capacity constraints but allows nodes to have more incoming
cut/max-flow solution yields a complete segmentation of the object than outgoing flow, thus violating conservation constraints. The
as the object and its complement may disagree on the labeling of difference between the incoming and outgoing flows for a node, i,
some nodes leaving them “unlabeled”. The number of unlabeled is denoted as its excess, ei ≥ 0.
nodes depends on the non-submodularity of the system. Extensions The PPR algorithms work by repeatedly pushing flow along the
to QPBO, such as QPBO-P and QPBO-I [88], may be used to individual arcs. To determine which arcs admit flow, the algorithms
iteratively assign labels to the nodes that QPBO failed to label. maintain an integer labeling (so-called height), di , for every node.
The labeling provides a lower bound on the distance from the node
to the sink and has a no steep drop property, meaning d(i)−d(j) ≤
3 S ERIAL M IN -C UT /M AX -F LOW A LGORITHMS 1 for any residual arc (i, j).
All min-cut/max-flow algorithms find the solution by iteratively An algorithm from the PPR family starts by saturating the
updating a flow that satisfies the capacity constraints. Such a flow source arcs and raising the source to d(s) = n. The algorithm then
induces a residual graph with the set of residual arcs, R, given by works by repeatedly selecting a node with excess (after selection
R = {(i, j) ∈ V ×V | (i, j) ∈ E, fij < cij or called a selected node) and applying one of two actions [19, 35]:
(4) push or relabel. If there is an arc in the residual graph leading from
(j, i) ∈ E, fji > 0}. the selected node to a lower-labeled node, push is performed. This
Each of the residual arcs has a residual capacity given by c0ij = pushes excess along the arc, until all excess is pushed or the arc
cij − fij if (i, j) ∈ E or c0ij = fji if (j, i) ∈ E . In other is saturated. If no residual arc leads to a lower node, the relabel
words, residual arcs tell us how much flow on the original arc operation is used to lift the selected node (increase its label) by
we can increase or decrease, see Fig. 1(c). If the graph contains one. Fig. 1(e) shows an intermediate step of a PPR algorithm.
bidirectional arcs, both conditions from (4) may be met, and the When there are no nodes with excess left, the preflow is the
residual capacity then equals the sum of two contributions. maximum flow. It is possible to terminate the algorithm earlier,
Serial min-cut/max-flow algorithms can be divided into three when no nodes with excess have a label di < n. At this point, the
families: augmenting paths, preflow push-relabel, and pseodoflow minimum s-t cut can be extracted by inspecting the node labels.
algorithms. In this section, we provide an overview of how If di ≥ n, then i ∈ S , otherwise i ∈ T . Extracting the maximum
algorithms from each family work. flow requires an extra step of pushing all excess back to the source.
However, this work generally only represents a small part of the run
time [95] and, for computer vision applications, we are typically
3.1 Augmenting Paths only interested in the minimum cut anyway.
The augmenting paths (AP) family of min-cut/max-flow algorithms The difference between various PPR algorithms lies in the
is the oldest of the three families and was introduced with the order in which the push and relabel operations are performed. Early
Ford-Fulkerson algorithm [28]. An algorithm from the AP family variants used simple heuristics, such as always pushing flow from
always maintains a feasible flow. It works by repeatedly finding so- the node with the highest label or using a first-in-first-out queue to
called augmenting paths, which are paths from s to t in the residual keep track of nodes with positive excess [17]. More recent versions
graph. When an augmenting path is found, a flow is pushed along [3, 33, 34] use sophisticated heuristics and a mix of local and
the path. Pushing flow means increasing flow for each forward global operations to obtain significant performance improvements
arc along the path, and decreasing flow for each reverse arc. To over early PPR algorithms.
5

Unlike other serial algorithms, the algorithms from the PPR the node it points to, a pointer to the next outgoing arc for the node
family operate locally on nodes and arcs. This, as we shall discuss it points from, a pointer to its reverse arc, and a residual capacity.
later, has resulted in a whole family of parallel PPR algorithms. For algorithms implemented with computer vision applications in
mind (e.g., BK, IBFS, and EIBFS), the terminal arcs are stored
3.3 Pseudoflow as a single combined terminal capacity for each Node, instead
of using the Arc structures. Other implementations simply keep
The most recent family of min-cut/max-flow algorithms is the
track of the source and sink nodes and use Arc structures for all
pseudoflow family, which was introduced with the Hochbaum
arcs. The HPF implementation uses a bidirectional Arc structure
pseudoflow (HPF) algorithm [45, 46]. These algorithms use a so-
with a capacity, a flow, and a direction. It is also common to store
called pseudoflow, which satisfies capacity constraints but not the
auxiliary values such as excesses, labels, or more.
conservation constraints, as it has no constraints on the difference
As a result of these differences, the memory footprint varies
between incoming and outgoing flow. As with preflow, we refer to
between implementations, as shown in Table 1. The footprint
the difference between incoming and outgoing flow for a node as
also depends heavily on the data types used to store the data, in
its excess, ei . A positive excess is referred to as a surplus and a
particular references to nodes and arcs, as we discuss in the next
negative excess as a deficit.
subsection. For storing arc capacities, integers are common because
During operation, HPF algorithms maintain two auxiliary
they are computationally efficient and may use as little as 1 byte.
structures: the forest of trees and a node labeling function. Only
However, some graph constructions involve large capacities to
one node in every tree, the root, is allowed to have an excess. The
model hard constraints, and here some care must be taken to avoid
algorithm works by repeatedly pushing the flow along the paths
overflow issues. With floats, this can be modeled using infinite
connecting the trees, and growing the trees.
capacity. However, floats are less efficient and some algorithms are
A generic algorithm from the HPF family is initialized by
not guaranteed to terminate with floats due to numerical errors.
saturating all terminal arcs. At this point, each graph node is a
As the size of the data structures influences how much the CPU
singleton tree in the forest. The algorithm then selects a tree with
can store in its caches, which has a large effect on performance,
surplus and containing at least one node with the label less than
it is generally beneficial to keep the data structures small. Note
n (the number of nodes in the graph). In this tree, i denotes the
that some compilers do not pack data structures densely by default,
node with the lowest label. If there are no residual arcs from i to
which may significantly increase the size of the Arc and Node
a node with a lower label, the label of i is incremented. If there
data structures.
is a residual arc (i, j) that leads to a node j with a lower label, a
merge is performed. This operation involves pushing surplus along 3.4.2 Indices vs. Pointers
the path from the root of the tree containing i, over i, over j , and
One way to reduce the size of the Arc and Node data structures
to the root of the tree containing j . If the arc capacities along this
on 64-bit system architectures is to use indices instead of pointers
path allow it, the entire surplus will be pushed and the trees will
to reference nodes and arcs. As long as the indices can be stored
be merged with j as the root. If the flow along the path saturates
using unsigned 32-bit integers, we can halve the size arc and node
an arc (i0 , j 0 ), a surplus will be collected in i0 , and a new tree
references by using unsigned 32-bit integers instead of pointers
rooted in i0 will be created. In contrast to the AP algorithms, the
(which are 64-bit). This approach can significantly reduce the
only restrictions on how much flow to push are the individual arc
size of the Arc and Node data structures, as the majority of the
capacities, not the path capacity.
structures consist of references to other arcs and nodes [52]. As the
The algorithm terminates when no selection can be made,
performance of min-cut/max-flow algorithms is mainly limited by
at which point nodes labeled with n constitute the source set.
memory speed, smaller data structures can often lead to improved
Additional processing is needed to recover the maximum feasible
performance. The downside of indices is that extra computations
flow. Fig. 1(f) shows an intermediate step of the HPF algorithm.
may be needed for every look-up, although this depends on the
There are two main algorithms in this family: HPF and
exact assembly instructions the compiler chooses to use.
Excesses Incremental Breadth First Search (EIBFS) [36]. The
Some grid-based algorithms [52] use 32-bit indices to reduce
main differences are the order in which they scan through nodes
the size of their data structure. The generic algorithms we have
when looking for an arc connecting two trees in the forest, and how
investigated in this work all use pointers to store references between
they push flow along the paths. Both have sophisticated heuristics
nodes and arcs. Some implementations avoid the extra memory
for these choices, which makes use of many of the same ideas
requirement by compiling with 32-bit pointers. However, 32-bit
developed for PPR algorithms.
pointers limit the size of the graph much more than 32-bit indices.
The reason is that the 32-bit pointers only have 4 GiB of address
3.4 Implementation Details space, and the Node and Arc structures they point to take up many
As stressed by [95], the implementation details can significantly bytes. For example, the smallest Arc structure we have tested, c.f .
affect the measured performance of a given min-cut/max-flow Table 1, uses 24 bytes, meaning that an implementation based on
algorithm. In this section, we will highlight the trends of modern 32-bit indices could handle graphs with 24 times more arcs than an
implementations and how they differ. implementation based on 32-bit pointers.

3.4.1 Data Structures and Data Types 3.4.3 Arc Packing

The implementations considered in this paper all use a variant of The order in which the arcs are stored may significantly affect
the adjacency list structure [21] to represent the underlying graph. performance. Arc packing is used to reduce CPU cache misses by
The most common setup mimics the BK algorithm: there is a list of storing the arcs in the same order that the algorithm will access
nodes and a list of directed (half-)arcs. Each Node structure stores them. For example, min-cut/max-flow algorithms often iterate over
a pointer to its first outgoing half-arc. Each Arc stores a pointer to outgoing arcs from a node, making it beneficial to store outgoing
6

arcs from the same node adjacent in memory. However, as arcs may Since parallel PPR algorithms parallelize over every node, they
be added to the graph in any order, packing the arcs usually incurs can achieve good speed-ups and scale well to modern multi-core
an overhead from maintaining the correct ordering or reordering processors [5], or even GPUs [96]. However, these algorithms
all arcs as an extra step before computing the min-cut/max-flow. have not achieved dominance outside of large grid graphs for min-
Similar to arc packing, node packing may improve performance. cut/max-flow problems [103]. Since GPU hardware has advanced
However, this is not done in practice as opposed to arc packing. considerably in recent years, it is unclear whether GPU method
Of the serial reference implementations that we examined, should remain restricted to grid graphs, but this question is not
only HI-PR [19], IBFS, and EIBFS implement arc packing. These within the scope of this paper.
all implement it as an extra step, where arcs are reordered after
building the graph but before the min-cut/max-flow computations 4.2 Adaptive Bottom-Up Merging
start. None of the examined implementations use node packing.
The adaptive bottom-up merging approach introduced by Liu and
Sun [75] uses block-based parallelism and has two phases, which
3.4.4 Arc Merging
are summarized in Fig. 4. In phase one, the graph is partitioned
In practice, it is not uncommon that multiple arcs between the into a number of disjoint sets (blocks), and arcs between blocks
same pair of nodes are added to the graph. Merging these arcs have their capacities set to 0 — effectively removing them from
into a single arc with a capacity equal to the sum of capacities of the graph. For each pair of blocks connected by arcs, we store a
the merged arcs may reduce the graph size significantly. As this list of the connecting arcs (with capacities now set to 0) along
decreases both the memory footprint of the graph and the number of with their original capacities. Disregarding s and t, the nodes in
arcs to be processed, it can provide substantial performance benefits each block now belong to disjoint sub-graphs and we can compute
[52, 89]. However, as redundant arcs can usually be avoided by the min-cut/max-flow solution for each sub-graph in parallel. The
careful graph construction and they should have approximately min-cut/max-flow computations are done with the BK algorithm —
the same performance impact on all algorithms, we have not although one could in theory use any min-cut/max-flow algorithm.
investigated the effects of this further.
Split graph First merge Second merge Last merge

4 PARALLEL M IN -C UT /M AX -F LOW
Like serial algorithms, parallel algorithms for min-cut/max-flow
problems can be split into families based on shared characteristics.
A key characteristic is whether the algorithms parallelize over
individual graph nodes (node-based parallelism) or split the graph (a) (b) (c) (d)
into sub-graphs that are then processed in parallel (block-based
parallelism). Other important algorithmic traits include whether Fig. 4: Illustration of the adaptive bottom-up merging ap-
the algorithm is distributed, which we do not consider in this proach for parallel min-cut/max-flow. Terminal nodes and arcs
paper, and the guarantees in terms of convergence, optimality, and are not shown. Note that the underlying graph does not have to
completeness provided by the algorithm. be a grid graph. Phase one: (a) The graph is split into blocks
We should note that since many (but not all) min-cut/max-flow and the min-cut/max-flow is computed for each block in parallel.
problems in computer vision are defined on grid graphs, several Phase two: (b) The topmost blocks are locked, merged, and the
algorithms [51, 52, 86, 96] have exploited this structure to create min-cut/max-flow recomputed. (c) As the topmost block is locked,
very efficient parallel implementations. However, many important the next thread works on the bottom-most blocks (in parallel). (d)
computer vision problems are not defined on grid graphs, so in this Last two blocks are merged and min-cut/max-flow recomputed to
paper we focus on generic min-cut/max-flow algorithms. achieve the globally optimal solution.
The category of node-based parallel algorithms is generally
dominated by parallel versions of PPR algorithms. In the block- In phase two, we merge the blocks to obtain the complete
based category, we have identified three main approaches: adaptive globally optimal min-cut/max-flow. To merge two blocks, we
bottom-up merging, dual decomposition, and region discharge, restore the arc capacities for the connecting arcs and then recompute
which we investigate. In the following sections, we give an overview the min-cut/max-flow for the combined graph. This step makes
of each approach and briefly discuss its merits and limitations. use of the fact that the BK algorithm can efficiently recompute the
min-cut/max-flow when small changes are made to the residual
graph for a min-cut/max-flow solution [62].
4.1 Parallel Preflow Push-Relabel For merges in phase two to be performed in parallel, the method
PPR algorithms have been the target of most parallelization efforts marks the blocks being merged as locked. The computational
[2, 4, 5, 22, 32, 47, 96], since both push and relabel are local threads then scan the list of block pairs, which were originally
operations, which makes them well suited for parallelization. Be- connected by arcs, until they find a pair of unlocked blocks. The
cause the operations are local, the algorithms generally parallelize thread then locks both blocks, performs the merge, and unlocks the
over each node — performing pushes and relabels concurrently. To new combined block. To avoid two threads trying to lock the same
avoid data races during these operations, PPR algorithms use either block, a global lock prevents more than one thread from scanning
locking [2] or atomic operations [47]. As new excesses are created, the list of block pairs at a time.
the corresponding nodes are added to a queue from which threads As the degree of parallelism decreases towards the end of phase
can poll them. In [5], a different approach is applied, where pushes two — since there are few blocks left to merge — performance
are performed in parallel, but excesses and labels are updated later increases when computationally expensive merges are performed
in a separate step, rather than immediately after the push. early in phase two. To estimate the cost of merging two blocks, [75]
7

uses a heuristic based on the potential for new augmenting paths to introduced a new version with a simple strategy that guarantees
be formed by merging two blocks. This heuristic determines the convergence: if the duplicated nodes in two blocks do not belong to
merging order of the blocks. the same set, S or T , after a fixed number of iterations, the blocks
By using block-based, rather than node-based parallelism, are merged and the algorithm continues. This trivially guarantees
adaptive bottom-up merging avoids much of the synchronization convergence since, in the worst case, all blocks will be merged, at
overhead that the parallel PPR algorithms suffer from. However, its which point the global solution will be computed serially. However,
performance depends on the majority of the work being performed performance significantly drops when merging is needed for the
in phase one and in the beginning of phase two, where the degree algorithm to converge, as merging only happens after a fixed
of parallelism is high. number of iterations and all blocks may (in the worst case) have to
be merged for convergence.
4.3 Dual Decomposition
4.4 Region Discharge
The dual decomposition (DD) approach was introduced by
Strandmark and Kahl [91] and later refined by Yu et al. [103]. The region discharge (RD) approach was introduced by Delong
The approach was originally designed to allow for distributed and Boykov [22] and later generalized by Shekhovtsov and Hlaváč
computing, such that it is never necessary to keep the full graph in [89]. The idea builds on the vertex discharge operation introduced
memory. Their algorithm works as follows: first, the nodes of the for PPR in [35]. Similarly to DD by Strandmark and Kahl, RD
graph are divided into a set of overlapping blocks (see Fig. 5(a)). was designed to allow for distributed computing. The method first
The graph is then split into disjoint blocks, where the nodes in the partitions the graph into a set of blocks (called regions in [89]
overlapping regions are duplicated in each block (see Fig. 5(b)). It following the terminology of [22]). Each block R has an associated
is important that the blocks overlap such that if node i is connected boundary defined as the set of nodes
to node j in block bj and node k in block bk , then i is also in both B R = {v ∈ V | v ∈
/ R, (u, v) ∈ E, u ∈ R, v 6= s, t}. (5)
blocks bj and bk .
Capacities for arcs going from a boundary node to a block node
Split graph Solve blocks Overlap disagrees Overlap agrees are set to zero. This means that flow can be pushed out of the block
into the boundary, but not vice versa. Furthermore, each node is
allowed to have an excess.
Split graph Sync. borders Re-solve blocks

(a) (b) (c) (d)

Fig. 5: Illustration of the dual decomposition approach. Termi-

nal nodes and arcs are not shown. Note that the underlying graph
does not have to be a grid graph. (a) Graph nodes are divided into (a) (b) (c)
a set of overlapping blocks. (b) The graph is split into disjoint sub-
Fig. 6: Illustration of the region discharge approach. Terminal
graphs and nodes in overlapping regions are duplicated into each
nodes and arcs are not shown. Note that the underlying graph does
blocks. (c) The min-cut/max-flow for each block is computed in
not have to be a grid graph. (a) Graph nodes are divided into a set
parallel which gives an assignment to source set (black) or sink set
of blocks and the region discharge operation is run on each block,
(white). The source/sink capacities are then adjusted for disagreeing
which pushes flow to the sink or boundary. (b) Flow is synchronized
duplicated nodes. (d) The min-cut/max-flow is recomputed and
between boundaries. (c) Region discharge is run again. The process
capacities are adjusted until all duplicated nodes agree.
repeats until no flow crosses the block boundaries.
Once the graph has been partitioned into overlapping blocks,
the algorithm proceeds iteratively. First, the min-cut/max-flow for The method then performs the region discharge operation,
each disjoint block is computed in parallel using the BK algorithm. which aims to push as much excess flow to the sink and/or the
Next, for each duplicated node, it is checked if all duplicates boundary nodes as possible (the source, s, is assumed to have
of that node are in the same s-t partitioned set, S or T . In that infinite excess). This has been done with a PPR [22, 89] or an AP
case, we say that the node duplicates agree on their assignment. algorithm (specifically BK) [89]. When using a PPR algorithm, the
If all duplicated nodes agree on their assignment, the computed discharge of a block is done by performing only push and relabel
solution is globally optimal and the algorithm terminates. If not, operations between nodes in the same block.
the terminal arc capacities for the disagreeing duplicated nodes When using the BK algorithm, a distance labeling is maintained
are updated according to a supergradient1 ascent scheme and the for the boundary nodes which gives an estimate of how many
min-cut/max-flow is recomputed. This process of updating terminal boundaries must be crossed to reach the sink. Initially, in each
capacities and recomputing the min-cut/max-flow is repeated until block, flow is pushed exclusively to the sink. Then, flow is pushed
all duplicated nodes agree on their assignment. to the boundary nodes with distance labels less than 1, then less than
A limitation of the original dual decomposition approach is 2, etc., until no more flow can be pushed. The BK implementation
that convergence is not guaranteed. Furthermore, [103] and [89] used by Shekhovtsov and Hlaváč has been slightly modified to
have demonstrated that the risk of nonconvergence increases as the allow excess in the boundary nodes and for flow to be pushed from
graph is split into more blocks. To overcome this, Yu et al. [103] the boundary nodes out of the block (but not back).
The discharge operation is performed on all blocks in parallel.
1. Analogous to subgradients for convex functions [7]. Afterward, flow along boundary arcs is synchronized between
8

neighboring blocks. This may create additional excesses in some (2) The 4 super resolution [30, 88], 4 texture restoration [88],
blocks, since boundary nodes overlap with another block. The 2 deconvolution [88], 78 decision tree field (DTF) [82], and
discharge and synchronization process is repeated until no new 3 automatic labelling environment (ALE) [25, 67, 68, 69]
excesses are created, at which point the algorithm terminates. It is datasets from Verma’s and Batra’s survey [95].
proved in [89] that this process terminates in at most 2n2 iterations (3) New problems that use anisotropic MRFs [38] to segment
of discharge and synchronization when using PPR and 2n2B + 1 blood vessels in large voxel volumes from [87]. We include
when using AP, where nB is the total number of boundary nodes. 3 problems where the segmentation is applied directly to the
The guarantee of convergence, without having to merge blocks, image data and 3 to the output of a trained V-Net [79].
is beneficial, as it means that the algorithm can maintain a (4) New problems that use MRFs to clean 3D U-Net [20]
high degree of parallelism while computing the min-cut/max-flow segmentations of prostate images from [90]. We contribute 4
solution. However, because flow must be synchronized between benchmark problems.
blocks, the practical performance of the method still depends (5) New problems on mesh segmentation based on [76]. We
on well-chosen blocks and may be limited by synchronization contribute 8 benchmark problems. The original paper uses
overhead. For details on the heuristics used in the algorithm, which α-expansion and αβ -swaps [9, 11] to handle the multi-class
are also important for its practical performance, see [89]. segmentation problem. For our benchmarks, we instead use
QPBO to obtain the segmentation with a single min-cut, which
5 P ERFORMANCE C OMPARISON may lead to different results compared with the referenced
method.
We now compare the performance of the algorithms discussed
(6) New problems using the recent Deep LOGISMOS [42] to
in the previous sections. For all experiments, the source code
segment prostate images from [90]. We contribute 8 problems.
was compiled with the GCC C++ compiler version 9.2.0 with
(7) New problems performing multi-object image segmentation
-O3 optimizations on a 64-bit Linux-based operating system with
via surface fitting from two recent papers [53, 57]. We
kernel release 3.10. Experiments were run on a dual socket NUMA
contribute 9 problems using [53] and 8 using [57].
(Non-Uniform Memory Access) system with two Intel Xeon Gold
(8) New problems performing graph matching from the recent
6226R processors with 16 cores each and HTT (Hyper-Threading
paper [48]. The original matching problems can be found at
Technology) disabled, for a total of 32 parallel CPU threads. The
https://vislearn.github.io/libmpopt/iccv2021. For each match-
system has 756 GB of RAM, and for all experiments all data
ing several QPBO sub-problems are solved. We contribute the
were kept in memory. All resources were provided by the DTU
QPBO subproblems (300 per matching problem) for each of
Computing Center [23].
the 316 matching problems.
For all parallel benchmarks, we prefer local CPU core and
memory allocation. This means that for all parallel benchmarks with In total, our benchmark includes 495 problems covering a
up to 16 threads, all cores are allocated on the same CPU/NUMA variety of different computer vision applications. Note that some
node. If the data fits in the local memory of the active node, we datasets consist of many small sub-problems that must be run in se-
use this memory exclusively. If the data cannot fit in the local quence. Here, we report the accumulated times. All the benchmark
memory of one node, memory of both NUMA nodes is used. problems are available at: DOI:10.11583/DTU.17091101 [55].
For benchmarks with more than 16 threads, both CPUs and their For the parallel algorithm benchmarks, we only include a subset
memory pools are used. of all datasets. This is because parallelization is mainly of interest
Run time was measured as the minimum time over three for large problems with long solve times. For the block-based
runs and no other processes (apart from the OS) were running algorithms, we split the graph into blocks in one of the following
during the benchmarks. We split our measured run time into two ways: For graphs based on an underlying image grid, we define
distinct phases: build time and solve time. Build time refers to blocks by recursively splitting the image grid along its longest axis.
the construction of the graph and any additional data structures For the surface-based segmentation methods [53, 57], we define
used by an algorithm. If the algorithm performs arc packing or blocks such that nodes associated with a surface are in their own
similar steps, this is included in the build time. To ensure that block. For mesh segmentation, we compute the geodesic distance
the build time is a fair representation of the time used by a given between face centers and then use agglomerative clustering to
algorithm, we precompute a list of nodes and arcs and load these divide the nodes associated with each face into blocks. For bottom-
lists fully into memory before starting the timer. Solve time refers up merging, we use 64 blocks for the following dataset: the grid
to the time required to compute the min-cut/max-flow. For the graphs, the mesh segmentation, and the cells, foam, and simcells.
pseudoflow, PPR, and region discharge algorithms (c.f . Table 1), For the NT32 tomo data we use two blocks per object. For 4Dpipe
that only compute a minimum cut, we do not include the time to we use a block per 2D slice. For P-GridCut we use the same blocks
extract the full feasible maximum flow solution. The reason for this as for bottom-up merging. For dual decomposition and region
is that for most computer vision applications the minimum cut is discharge, we use one and two blocks per thread, respectively.
of principal interest. Furthermore, converting to a maximum flow
solution usually only adds a small overhead [95]. 5.2 Tested Implementations
All tested implementations (except GridCut [51]) are available at
5.1 Datasets https://github.com/patmjen/maxflow algorithms and are archived
We test the algorithms on the following benchmark datasets: at DOI:10.5281/zenodo.4903945 [54]. Beware that the implemen-
(1) The commonly used University of Waterloo [98] benchmarks tations are published under different licenses — some open and
problems. Specifically, we use 6 stereo [14, 64], 36 3D voxel some restrictive. See the links above for more information.
segmentation [9, 12, 10], 2 multi-view reconstruction [72, 13], In the following, typewriter font refers to a specific
and 1 surface fitting [71] problems. implementation of a given algorithm. We use this for BK and
9

EIBFS, where we test more that one implementation of each index.html3 . The original implementation can only handle grid
algorithm, e.g., BK refers to the algorithm, BK is the reference graphs with rectangular blocks, while our implementation can
implementation, and MBK is one of our implementations. handle arbitrary graphs and arbitrary blocks at the cost of some
BK [11] We test the reference implementation (BK) of additional overhead during graph construction. Our implementation
the Boykov-Kolmogorov algorithm from http://pub.ist.ac.at/∼vnk/ uses MBK as the base solver. Note that our implementation does not
software.html. Furthermore, we test our own implementation of implement the merging strategy proposed by [103] and, therefore,
BK (MBK), which contains several optimizations. Most notably, is not guaranteed to converge. We only include results for cases
our version uses indices instead of pointers to reduce the memory where the algorithm does converge.
footprint of the Node and Arc data structures. Finally, we test a GridCut [51, 52] We test both the serial and parallel versions
second version (MBK-R), which reorders arcs so that all outgoing of the highly optimized commercial GridCut implementation
arcs from a node are adjacent in memory. This increases cache from https://gridcut.com. The primary goal is to show how much
efficiency, but uses more memory (see Table 1) and requires an performance can be gained by using an implementation optimized
extra initialization step. The memory overhead from reordering for grid graphs. GridCut is only tested on problems with graph
could be reduced by ordering the arcs in-place; however, this may structures that are supported by the reference implementation, i.e.,
negatively impact performance. Therefore, we opt for the same 4- and 8-connected neighbor grids in 2D, and 6- and 26-connected
sorting strategy as EIBFS, where arcs are copied during reordering. (serial only) neighbor grids in 3D.
EIBFS [36] We test a slightly modified version [49] (EIBFS)
Table 1 lists the tested implementations along with their type
of the excesses incremental breadth first search algorithm originally
and memory footprint. Their memory footprint can be calculated
implemented by [36] available from https://github.com/sydbarrett/
based on the number of nodes and arcs in the graph and will be
AlphaPathMoves. This version uses slightly larger data structures
discussed further in Section 7.
to support non-integer arc capacities and larger graphs, compared
to the implementation tested in [36]. Although these changes
may slightly decrease performance, we think it is reasonable to TABLE 1: Summary of the tested implementations including
use the modified version, as several of the other algorithms have their memory footprint. The table shows the bytes required as a
made similar sacrifices in terms of performance. Additionally, function of the number of nodes, n, number of terminal arcs, mT ,
we test our own modified version of EIBFS (EIBFS-I), which and number of neighbor arcs, mN . We assume the common case
replaces pointers with indices to reduce the memory footprint. of 32-bit capacities and 32-bit indices, which is also what we use
Finally, since both EIBFS and EIBFS-I perform arc reordering for all of our experiments. Since HPF stores undirected arcs, we
during initialization, we also test a version without arc reordering give all sizes as undirected arcs, i.e., for implementations using
(EIBFS-I-NR) to better compare with other algorithms. directed arcs the size per arc reported here is doubled. Note that
HPF [45] We test the reference implementation of Hochbaum the numbers depend on, but are not the same as, the Node and
pseudoflow (HPF) from https://riot.ieor.berkeley.edu/Applications/ Arc structure sizes, as the footprint reported includes all stored
Pseudoflow/maxflow.html. This implementation has four different data (connectivity, capacity, and any auxiliary data).
configurations that we test:
Serial algorithms Algorithm type Memory footprint
1) Highest label with FIFO buckets (HPF-H-F).
2) Highest label with LIFO buckets (HPF-H-L). HI-PRa [19] Preflow push-relabel 40n + 40mT + 40mN
HPFb [45] Pseudoflow 104n + 48mT + 48mN
3) Lowest label with FIFO buckets (HPF-L-F). EIBFS [36] Pseudoflow
4) Lowest label with LIFO buckets (HPF-L-L). EIBFSc Pseudoflow 72n + 72mN
HI-PR [19] We test the implementation of the preflow EIBFS-I∗i Pseudoflow 29n + 50mN
EIBFS-I-NR∗i Pseudoflow 49n + 24mN
push-relabel algorithm from https://cmp.felk.cvut.cz/∼shekhovt/d BK [11] Augmenting path
maxflow/index.html2 . BKd Augmenting path 48n + 64mN
P-ARD [89] We test the implementation of parallel augment- MBK∗i Augmenting path 23n + 24mN
ing paths region discharge (P-ARD) from https://cmp.felk.cvut.cz/ MBK-R∗i Augmenting path 23n + 48mN
∼shekhovt/d maxflow/index.html. P-ARD is an example of the Parallel algorithms
region discharge approach. It uses BK as the base solver. Note that, P-PPRei [5] Parallel PPR 48n + 68mT + 68mN
as the implementation is designed for distributed computing, it Liu-Sun∗i [75] Ada. bot.-up merging† 25n + 24mN
makes use of disk storage during initialization, which increases the Strandmark-Kahl∗i [91] Dual decomposition† 29n + 24mN
build time. P-ARDa [89] Region discharge† 40n + 32mN
† Uses BK (augmenting path)
Liu-Sun [75] Since no public reference implementation is
∗ Implemented or updated by us:
available, we test our own implementation of the adaptive bottom-
https://github.com/patmjen/maxflow algorithms
up merging approach based on the paper by Liu and Sun [75]. Our i Assuming 32-bit indices
implementation uses MBK as the base solver. a https://cmp.felk.cvut.cz/∼shekhovt/d maxflow/index.html
b https://riot.ieor.berkeley.edu/Applications/Pseudoflow/maxflow.html
P-PPR [5] We test the implementation of a recent parallel
c https://github.com/sydbarrett/AlphaPathMoves
preflow push-relabel algorithm from https://github.com/niklasb/ d http://pub.ist.ac.at/∼vnk/software.html
pbbs-maxflow. e https://github.com/niklasb/pbbs-maxflow
Strandmark-Kahl [91] We test our own implementation of
the Strandmark-Kahl dual decomposition algorithm based on the
implementation at https://cmp.felk.cvut.cz/∼shekhovt/d maxflow/

2. Orignally from http://www.avglab.com/andrew/soft.html, but the link is no 3. Originally from https://www1.maths.lth.se/matematiklth/personal/petter/

longer available. cppmaxflow.php but the link is no longer available.
10

1.00 5.3.1 Algorithm Variants

Relative performance
0.75 The different variants of each algorithm are compared in Fig. 8,
0.50 which shows the relative performance of each implementation
0.25 compared to a chosen “reference” implementation. For the BK
algorithm, the BK implementation is used for reference, for the
0.00
BK EIBFS HPF HI-PR
EIBFS algorithm, the EIBFS implementation is used as a reference,
(a) Relative total times. and for HPF the HPF-H-F configuration is used as reference,
1.00 since it is the one recommended by the authors. As we are now
Relative performance

0.75 measuring relative to a specific implementation, rather than the

0.50
fastest implementation as in Fig. 7, it is possible to obtain a relative
performance score of more than one.
0.25
For the BK algorithm, both MBK and MBK-R overall perform
0.00 similarly or slightly better than BK, when measured on total
BK EIBFS HPF HI-PR
(b) Relative solve times. time. Looking at solve time, MBK-R shows a large speed-up over
the other variants. This clearly reflects the effect of arc packing
Fig. 7: Relative performance for the serial algorithms. For each (reordering the arcs), in that it typically decreases solve at the cost
dataset, the solve and total times for each algorithm were compared of increased build time. From Table 2, we see that BK is generally
to those of the fastest algorithm for that dataset and a relative time best for smaller problems where the smaller memory footprint of
was computed. This shows how often an algorithm was fastest the index-based variants is less of an advantage. However, the very
and, if it was not fastest, how much slower than the fastest it small difference in absolute time for these small problems will in
was. We oversample speed-ups from each problem family (c.f . many cases render the choice of algorithm irrelevant.
Table 2) so all groups have the same number of entries. This is to For the EIBFS variants, the index-based version (EIBFS-I)
avoid bias due to some problem groups having more entries than consistently outperforms the reference implementation with a me-
others. Finally, we overlay a random sample of the (oversampled) dian improvement of more than 20%. Meanwhile, EIBFS-I-NR
speed-ups as jittered points. performs worse than EIBFS on almost all datasets w.r.t. to solve
time, but better w.r.t. total time for the majority of the problems.
In some cases, it also outperforms EIBFS-I, again showing that
5.3 Serial Algorithms while arc packing generally significantly reduces the solve time,
the additional overhead is not always worth it.
The primary experimental results for the serial algorithms are listed For the HPF algorithm, HPF-H-L consistently performs the
in Table 2 and Fig. 7. Table 2 shows a representative subset of best, while HPF-L-F andHPF-L-L perform worse than the
the results, grouped by problem family, while Fig. 7 shows the reference HPF-H-F for the majority of datasets. However, for
distribution of the solve time and the total time for each algorithm some datasets HPF-L-F and HPF-L-L show large speed-ups
on each dataset relative to the fastest algorithm on the dataset. over the other variants. Table 2 reveals that the HPF-L variants
Thus, for a given dataset, a relative performance score of 0.5 means seem to be better for graph matching and ALE datasets.
that the algorithm used double the amount of time as the fastest
algorithm on that dataset. The distribution of these scores indicates
how well the different algorithms perform relative to each other. 5.4 Parallel Algorithms
From Fig. 7(b), we see that EIBFS and HPF outperform the Our benchmark results for the parallel algorithms are shown in
other two algorithms on the majority of the datasets in terms of Table 3, where we compare the build and solve time for each
solve time and total time, as the algorithms have most of their algorithm on each dataset. The table includes the number of
relative times close to 1. Looking at the median, EIBFS has a CPU threads used by each algorithm for the listed solve times.
slightly better relative solve time than HPF, while HPF is faster w.r.t. Furthermore, it includes the solve time of the best serial algorithm
total time. Furthermore, HPF has the best worst-case performance for each dataset for comparison. We focus on the solve time, as
for both solve and total time. However, despite its overall good that is what reveals how successfully the algorithms distribute
performance, HPF performs significantly worse on the oriented the work as more threads are added. Additionally, a lack of
MRF and U-Net cleaning datasets. The performance of BK varies optimization leads to very long build times for some of the parallel
significantly depending on the benchmark problem. Although it has implementations, especially P-PPR and P-ARD. Finally, some
a median relative total time of just over 0.5, its relative performance datasets are omitted for P-PPR due to run-time errors and for
is considerably more inconsistent than that of the three other Strandmark-Kahl due to excessive run time.
algorithms. It performs particularly poorly on the 4Dpipe datasets, From Table 3, it is clear that no algorithm is dominant, except P-
using over 6 hours on 4Dpipe small, which both HPF and EIBFS GridCut for 6-connected grid graphs. Every algorithm has datasets
completed in less than 30 seconds. For 4Dpipe large, BK was where it is the fastest and a serial algorithm often gives the best or
not able to find the solution within 45 hours. HI-PR generally close to the best performance. The parallel algorithms show their
has the worst performance but does have the fastest solve time strength for the large datasets with more than 1 M nodes where
for a few datasets. However, measured on total time, it almost significant performance improvements are found. Curiously, only
never manages a relative score of more than 0.5. It is worth noting P-ARD shows a significant speed-up for smaller problems.
that the distribution of relative times for all algorithms exhibits a The parallel benchmarks are also summarized in Fig. 9. All
bimodality. This indicates that all algorithms have datasets where algorithms have median speed-ups less than one. Liu-Sun generally
they are poorly suited compared to the others. We further investigate performs best, giving a speed-up for almost half of the dataset and
this in Section 6. having the largest maximum speed-up. P-PPR and P-ARD still
11

8.00 BK 16.00 HPF-H-F

2.00
Total time Total time
4.00 8.00
Solve time Solve time
Speed-up

Speed-up

Speed-up
1.00 4.00
2.00
2.00
0.50 EIBFS
1.00 1.00
Total time
Solve time 0.50
0.50 0.25
0.25
MBK MBK-R EIBFS-I EIBFS-I-NR HPF-H-L HPF-L-F HPF-L-L
(a) BK (b) EIBFS (c) HPF

Fig. 8: Performance comparison of serial algorithm variants. The solve time and total time is compared against the times for the
chosen reference algorithm for each dataset. The violin plots show a Gaussian kernel density estimate of the data and the horizontal bars
indicate — from top to bottom — the maximum, median, and minimum. The values were re-sampled as described in Fig. 7.

6 A LGORITHM S ELECTION
4 As the previous section shows, the performance of the individual
Speed-up

min-cut/max-flow algorithms depends on the problem to be solved,

2 i.e., the structure of the graph. Choosing the wrong algorithm may
significantly increase the run time. In this section, we investigate
0 strategies for selecting a min-cut/max-flow algorithm that maximize
-Kahl P-ARD
the expected performance given different levels of knowledge about
Liu-Sun P-PPR rk
Strandma the graph. To quantify the expected performance of a strategy,
we will use the relative performance (RP), which we compute
Fig. 9: Speed-up of the parallel algorithms relative to the best
as follows: 1. Use the strategy to select an algorithm for each
serial solve time for each dataset. The values were re-sampled as
dataset. 2. For each dataset, compute the relative performance of
described in Fig. 7.
the selected algorithm. For serial algorithms, this is the total time
of the selected algorithm divided by the total time of the fastest
10 20 algorithm for that dataset. For parallel algorithms, we use the solve
time. This score shows the expected performance of a given strategy
Speed-up

compared to choosing the fastest algorithm.

5 10
Scenario 1: No Graph Knowledge If one has no knowledge
of the graph to be solved, the best strategy is to choose the
overall best algorithm. Table 4 shows summary statistics for the
1 2 4 8 16 32 1 2 4 8 16 32 performance scores of each algorithm. To avoid bias in Fig. 8, we
Number of threads Number of threads oversample scores from each problem family so that they all have
(a) Liu-Sun (b) P-PPR the same number of samples.
For the serial algorithms, the best choice is by far GridCut if
4 it is applicable. It is almost always the fastest option and never
Speed-up

10 more than 36% slower than the best option. Otherwise, the best
2 option is HPF-H-L in which case the expected performance 64%
of the optimal. Another good option is EIBFS-I due to its high
0 0 mean and high minimum RP scores. All implementations, except
1 2 4 8 16 32 1 2 4 8 16 32 EIBFS, have a maximum RP of 1, meaning that they outperformed
Number of threads Number of threads all other implementations on at least one problem instance.
(c) Strandmark-Kahl (d) P-ARD For the parallel algorithms, GridCut again dominates when
applicable. Otherwise, the best parallel option is Liu-Sun which
Fig. 10: Speed-up of the the parallel algorithms compared to their is slightly better than P-PPR. Surprisingly, using the best serial
single-threaded performance. For each number of threads, the algorithm for a dataset is the overall best option, although we
distribution of the speed-ups over all datasets is shown. The values should note that comparing to the best serial algorithm gives some
were re-sampled as described in Fig. 7. advantage to the serial algorithms. If one compares to a single
serial algorithm, the parallel algorithms do give an improvement —
although the mean RP is only 1.8x higher in the best case.
provide good speed-ups for some datasets. Strandmark-Kahl comes Scenario 2: Known Problem Family If one knows from
off the worst, as it rarely beats the best serial algorithm. which problem family the graph to be solved comes, a good strategy
Finally, Fig. 10 shows the speed-up distribution of the parallel is to select the algorithm that performs well on that problem family.
algorithms compared to their single-threaded performance. Only This could, for example, be established beforehand by running a
P-PPR improves consistently as more threads are added. Liu-Sun set of benchmarks on example graphs.
and P-ARD only show consistent improvements when looking at Table 5 shows the best performing serial algorithm for each
the maximum speed-up, and for over half of the datasets they have problem family. Note that, as opposed to Table 2, we split graph
issues scaling beyond 12 threads. matching into sub-groups as papers use different energy functions
12

TABLE 2: Performance comparison of serial algorithms based on both their solve and total (build + solve) times. We show a
representative subset of the datasets, which have been grouped according to their problem family. For each problem family we only show
the fastest variant of each algorithm measured in total time. The fastest solve time for each dataset has been underlined and the fastest
total time has been marked with bold face. Datasets which contain many sub-problems are marked with (s).

Dataset Nodes Arcs Solve Total Solve Total Solve Total Solve Total Solve Total
3D segmentation: voxel-based MBK-R [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
adhead.n26c100 [9, 12, 10] 12 M 327 M 65.81 s 92.57 s 22.60 s 33.93 s 24.29 s 29.03 s 225.38 s 424.67 s 25.19 s 27.79 s
adhead.n6c100 [9, 12, 10] 12 M 75 M 23.88 s 28.03 s 13.23 s 15.85 s 14.13 s 15.87 s 59.65 s 102.84 s 6.98 s 7.31 s
babyface.n26c100 [9, 12, 10] 5M 131 M 82.29 s 92.87 s 30.13 s 34.74 s 54.47 s 56.71 s 183.60 s 228.09 s 53.21 s 54.11 s
babyface.n6c100 [9, 12, 10] 5M 30 M 7.78 s 9.44 s 5.56 s 6.61 s 11.56 s 12.24 s 57.28 s 69.66 s 2.88 s 3.00 s
bone.n26c100 [9, 12, 10] 7M 202 M 9.01 s 25.62 s 9.18 s 16.28 s 4.24 s 7.16 s 68.39 s 173.75 s 4.52 s 5.88 s
bone.n6c100 [9, 12, 10] 7M 46 M 4.09 s 6.65 s 2.74 s 4.35 s 2.30 s 3.36 s 23.66 s 46.71 s 0.91 s 1.12 s
bone subx.n6c100 [9, 12, 10] 3M 23 M 4.10 s 5.36 s 2.38 s 3.11 s 1.28 s 1.81 s 10.34 s 21.49 s 1.34 s 1.44 s
bone subx.n26c100 [9, 12, 10] 3M 101 M 7.70 s 15.78 s 4.74 s 8.23 s 2.14 s 3.61 s 25.51 s 75.15 s 3.69 s 4.45 s
liver.n26c100 [9, 12, 10] 4M 108 M 11.78 s 20.41 s 10.49 s 14.20 s 5.72 s 6.50 s 71.88 s 131.00 s 5.62 s 6.21 s
liver.n6c100 [9, 12, 10] 4M 25 M 10.08 s 11.40 s 5.82 s 6.57 s 5.70 s 6.24 s 30.49 s 42.71 s 3.87 s 3.99 s
3D segmentation: oriented MRF MBK [11] EIBFS-I-NR [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
vessel.orimrf.256 [11, 38, 87] 16 M 66 M 1.84 s 2.95 s 1.13 s 2.03 s 3.19 s 6.80 s 4.11 s 30.99 s 0.40 s 1.04 s
vessel.orimrf.512 [11, 38, 87] 134 M 536 M 12.44 s 21.40 s 7.95 s 15.39 s 25.29 s 55.32 s 32.16 s 321.75 s 2.43 s 7.73 s
vessel.orimrf.900 [11, 38, 87] 688 M 2B 75.23 s 121.82 s 48.13 s 88.09 s 147.22 s 300.79 s 177.38 s 1774.65 s 15.97 s 44.70 s
3D U-Net segmentation cleaning MBK [11] EIBFS-I-NR [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
clean.orimrf.256 [11, 38, 87] 16 M 66 M 0.97 s 2.09 s 0.69 s 1.61 s 3.21 s 6.89 s 3.93 s 30.83 s 0.13 s 0.77 s
clean.orimrf.512 [11, 38, 87] 134 M 536 M 7.87 s 17.03 s 5.51 s 13.51 s 27.10 s 58.22 s 31.40 s 320.87 s 0.91 s 6.27 s
clean.orimrf.900 [11, 38, 87] 688 M 2B 35.83 s 81.92 s 25.96 s 64.22 s 130.22 s 280.73 s 163.88 s 1755.87 s 3.90 s 31.43 s
unet mrfclean 2 [11] 8 M 32 M 0.47 s 1.01 s 0.29 s 0.74 s 3.55 s 5.36 s 9.80 s 22.82 s 62 ms 0.36 s
unet mrfclean 3 [11] 15 M 63 M 0.82 s 1.88 s 0.52 s 1.37 s 5.68 s 9.14 s 20.59 s 46.69 s 0.11 s 0.68 s
unet mrfclean 8 [11] 4 M 19 M 0.48 s 0.81 s 0.24 s 0.50 s 2.39 s 3.46 s 6.55 s 13.89 s 0.11 s 0.28 s
Surface fitting MBK [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
LB07-bunny-lrg [71] 49 M 300 M 15.40 s 21.17 s 6.38 s 15.25 s 21.87 s 32.13 s 610.24 s 820.64 s 2.36 s 3.75 s
3D segmentation: sparse layered graphs (SLG) MBK-R [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
4Dpipe small [57] 14 M 124 M 6.03 h 6.03 h 2.06 s 15.55 s 17.91 s 28.85 s 202.49 s 266.01 s - -
4Dpipe big [57] 143 M 1B - - 20.59 s 195.41 s 222.09 s 332.06 s 2611.65 s 3436.43 s - -
NT32 tomo3 .raw 3 [57] 7 M 49 M 15.42 s 18.69 s 24.22 s 27.19 s 15.87 s 18.33 s 176.11 s 200.29 s - -
NT32 tomo3 .raw 10 [57] 22 M 154 M 52.86 s 63.15 s 50.82 s 60.01 s 36.46 s 44.14 s 645.33 s 741.98 s - -
NT32 tomo3 .raw 30 [57] 67 M 462 M 145.23 s 176.37 s 194.79 s 221.90 s 179.82 s 202.73 s 2939.04 s 3260.63 s - -
NT32 tomo3 .raw 100 [57] 183 M 1 B 778.39 s 860.71 s 553.50 s 627.08 s 520.26 s 583.76 s 9732.34 s 2.95 h - -
3D segmentation: seperating surfaces MBK-R [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
cells.sd3 [53] 13 M 126 M 48.23 s 59.03 s 35.24 s 40.66 s 15.52 s 21.84 s 98.25 s 167.56 s - -
foam.subset.r160.h210 [53] 15 M 205 M 6.05 s 22.02 s 3.21 s 12.52 s 17.14 s 26.18 s 15.16 s 145.58 s - -
foam.subset.r60.h210 [53] 1 M 24 M 0.62 s 2.58 s 0.39 s 1.49 s 1.98 s 3.01 s 1.85 s 12.82 s - -
simcells.sd3 [53] 3 M 27 M 9.93 s 12.10 s 2.94 s 4.12 s 3.23 s 4.60 s 21.57 s 33.89 s - -
Deep LOGISMOS MBK [11] EIBFS-I [36] HPF-H-F [45] HI-PR [19] GridCut [52, 51]
deeplogismos.2 [42] 511 K 4M 0.15 s 0.25 s 28 ms 0.21 s 0.12 s 0.31 s 0.16 s 1.29 s - -
deeplogismos.3 [42] 707 K 5M 0.18 s 0.31 s 41 ms 0.30 s 0.18 s 0.45 s 0.24 s 1.90 s - -
deeplogismos.7 [42] 989 K 7M 0.34 s 0.54 s 0.26 s 0.66 s 0.29 s 0.69 s 0.36 s 2.86 s - -
Super resolution BK [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
super res-E1 [30, 88] 10 K 62 K 2 ms 2 ms 1 ms 2 ms 2 ms 3 ms 1 ms 7 ms - -
super res-E2 [30, 88] 10 K 103 K 4 ms 5 ms 2 ms 3 ms 2 ms 3 ms 2 ms 12 ms - -
super res-Paper1 [30, 88] 10 K 62 K 2 ms 3 ms 1 ms 2 ms 2 ms 3 ms 1 ms 7 ms - -
superres graph [30, 88] 43 K 742 K 62 ms 78 ms 10 ms 26 ms 7 ms 12 ms 19 ms 0.16 s - -
Texture MBK-R [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
texture-Cremer [88] 44 K 783 K 1.54 s 1.58 s 0.35 s 0.37 s 0.17 s 0.19 s 42 ms 0.19 s - -
texture-OLD-D103 [88] 43 K 742 K 0.60 s 0.65 s 0.19 s 0.21 s 73 ms 92 ms 41 ms 0.19 s - -
texture-Paper1 [88] 43 K 742 K 0.65 s 0.69 s 0.19 s 0.21 s 76 ms 95 ms 36 ms 0.17 s - -
texture-Temp [88] 14 K 239 K 0.22 s 0.23 s 30 ms 34 ms 9 ms 15 ms 6 ms 32 ms - -
Automatic labelling envrionment (ALE) MBK-R [11] EIBFS-I-NR [36] HPF-L-L [45] HI-PR [19] GridCut [52, 51]
graph 1 (s) [68, 69, 26, 67] 185 K 5M 16.80 s 18.52 s 0.35 s 0.79 s 1.00 s 1.60 s 1.58 s 10.60 s - -
graph 2 (s) [68, 69, 26, 67] 175 K 3M 7.38 s 10.47 s 0.83 s 1.64 s 2.25 s 3.55 s 2.91 s 20.87 s - -
graph 3 (s) [68, 69, 26, 67] 179 K 7M 27.68 s 35.55 s 2.69 s 4.51 s 4.63 s 6.96 s 6.49 s 43.73 s - -
Multi-view MBK-R [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
BL06-camel-lrg [13] 18 M 93 M 107.53 s 111.42 s 28.55 s 31.54 s 24.44 s 28.82 s 291.71 s 337.91 s - -
BL06-gargoyle-lrg [13] 17 M 86 M 238.08 s 241.65 s 33.76 s 36.57 s 26.51 s 30.61 s 208.27 s 251.10 s - -
13

TABLE 2: Continued

Dataset Nodes Arcs Solve Total Solve Total Solve Total Solve Total Solve Total
Deconvolution MBK-R [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
graph3x3 [88] 2K 47 K 9 ms 11 ms 3 ms 3 ms 1 ms 1 ms 1 ms 5 ms - -
graph5x5 [88] 2K 139 K 62 ms 67 ms 6 ms 9 ms 3 ms 4 ms 2 ms 15 ms - -
Stereo 1 BK [11] EIBFS-I-NR [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
BVZ-sawtooth (s) [14] 164 K 796 K 0.91 s 1.16 s 0.58 s 0.69 s 1.39 s 1.85 s 7.89 s 12.27 s - -
BVZ-tsukuba (s) [14] 110 K 513 K 0.49 s 0.58 s 0.35 s 0.41 s 0.66 s 0.84 s 4.69 s 6.64 s - -
BVZ-venus (s) [14] 166 K 795 K 1.72 s 2.03 s 1.30 s 1.44 s 1.94 s 2.46 s 15.00 s 20.11 s - -
Stereo 2 BK [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
KZ2-sawtooth (s) [64] 294 K 1M 2.59 s 3.40 s 1.14 s 2.02 s 3.30 s 4.66 s 23.79 s 36.55 s - -
KZ2-tsukuba (s) [64] 199 K 1M 1.41 s 1.84 s 0.71 s 1.12 s 1.92 s 2.55 s 20.95 s 27.14 s - -
KZ2-venus (s) [64] 301 K 2M 3.98 s 4.89 s 2.18 s 3.16 s 4.70 s 6.21 s 41.63 s 55.60 s - -
Decision tree field (DTF) MBK-R [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
printed graph1 [82] 20 K 1M 0.63 s 0.73 s 0.13 s 0.17 s 40 ms 51 ms 51 ms 0.25 s - -
printed graph16 [82] 11 K 683 K 0.24 s 0.29 s 44 ms 62 ms 16 ms 22 ms 25 ms 0.12 s - -
Graph matching: small BK [11] EIBFS-I-NR [36] HPF-L-L [45] HI-PR [19] GridCut [52, 51]
atlas1.dd (s) [58, 48] 1K 5K 37 ms 59 ms 34 ms 52 ms 21 ms 39 ms 23 ms 0.18 s - -
car1.dd (s) [25, 73, 48] 38 131 1 ms 1 ms 0 ms 1 ms 1 ms 1 ms 0 ms 3 ms - -
hassan1.dd (s) [1, 92, 48] 120 2K 17 ms 30 ms 5 ms 25 ms 2 ms 6 ms 4 ms 65 ms - -
matching1.dd (s) [66, 59, 48] 38 380 10 ms 12 ms 5 ms 8 ms 2 ms 4 ms 6 ms 14 ms - -
Graph matching: big MBK-R [11] EIBFS-I [36] HPF-H-L [45] HI-PR [19] GridCut [52, 51]
pair1.dd (s) [48] 1K 58 K 1.42 s 1.97 s 0.70 s 0.96 s 92 ms 0.13 s 0.82 s 1.68 s - -
Mesh segmentation MBK-R [11] EIBFS-I [36] HPF-H-F [45] HI-PR [19] GridCut [52, 51]
bunny.segment [76] 97 K 536 K 0.12 s 0.14 s 63 ms 75 ms 68 ms 91 ms 0.20 s 0.30 s - -
bunnybig.segment [76] 2M 13 M 1.01 s 1.59 s 0.62 s 1.23 s 1.43 s 2.12 s 4.99 s 9.91 s - -
candle.segment [76] 159 K 959 K 87 ms 0.13 s 49 ms 83 ms 0.11 s 0.15 s 0.29 s 0.53 s - -
candlebig.segment [76] 1M 5M 0.51 s 0.72 s 0.26 s 0.44 s 0.60 s 0.91 s 2.03 s 3.70 s - -
chair.segment [76] 305 K 1M 0.76 s 0.88 s 0.31 s 0.39 s 0.27 s 0.37 s 0.86 s 1.37 s - -
chairbig.segment [76] 3M 26 M 1.62 s 2.89 s 1.02 s 2.45 s 3.23 s 4.59 s 9.98 s 20.92 s - -
handbig.segment [76] 248 K 1M 0.15 s 0.19 s 71 ms 0.11 s 0.13 s 0.18 s 0.35 s 0.63 s - -
handsmall.segment [76] 15 K 69 K 4 ms 5 ms 2 ms 3 ms 4 ms 6 ms 10 ms 16 ms - -

for the matching. For all but four problem families, the best neighbor capacities, degrees, out degrees, and in degrees counts
algorithm achieves a mean relative performance of 95% or higher. only non-zero arcs. Note that these statistics can be computed
Furthermore, for most problem families, one algorithm is always efficiently during graph construction. We normalize all capacity
the best. This indicates that the problem family is a strong predictor statistics by the mean over all arc capacities. In total, our feature
of algorithm performance. The problem family where this strategy vector has 31 entries per graph. Fig. 11 shows a UMAP embedding
performs the worst is 3D segmentation with sparse layered graphs [78] of the feature vectors for all benchmark datasets. Similar
(SLG). Here, the mean RP is only 81%, which is likely due to the problem families cluster together, despite UMAP receiving no
large variation in graph size in this problem family. information on this. This suggests the feature vectors provide a
Table 6 shows the best performing parallel algorithm for each good description of the graphs.
problem family. For the 6-connected graphs, the parallel GridCut We train the decision tree using Scikit-learn [83] version 0.23.1.
algorithm is clearly superior, but otherwise, the different families We use Gini impurity as the split criterion and reduce the tree
appear to favor different algorithms. using minimal cost-complexity pruning [15]. The optimal amount
Scenario 3: Known Graph Finally, we consider a strategy of pruning is determined with 5-fold cross validation. We split
where the graph is known, but the problem family is not. Here, each problem family evenly into the folds (if it contains at least 5
our strategy is to train a simple decision tree to predict the best datasets). When fitting, each dataset is weighed by one over the
algorithm given a feature vector that describes the graph to be number of datasets in its problem family. When evaluating, we
solved. Although a single decision tree is not the strongest classifier, oversample the validation data, so that each problem family has the
it has the benefit of being easily interpretable. same number of entries. This indicates how well the decision tree
The first components of our feature vector consist of the number will perform with representative training data. We also perform an
of nodes, the number of terminal arcs, the number of neighbor additional evaluation where we hold out one problem family, fit on
arcs, and whether the graph is a grid graph. Then we include mean, the rest, and then evaluate on the held out family. This indicates
standard deviation, and standard deviation of non-zero values for how well the decision tree will perform for a problem family that it
a number of arc and node properties. For arc properties, we use: has not yet encountered. We use the mean RP as validation metric.
source, sink, terminal (source and sink combined), and neighbor We first train a decision tree for the serial algorithms; the
capacities. Finally, for node properties we use: sum of in-going result is shown in Fig. 12. It achieves a mean RP of 0.82 and 0.82
neighbor capacities, sum of out-going neighbor capacities, sum of in the two evaluations, respectively. This means that the tree is
14

TABLE 3: Performance of parallel algorithms based on build and solve times. We show a representative subset of the datasets grouped
according to their problem family. See Table 2 for the number of nodes and arcs. The algorithms were run with 1, 2, 4, 6, 8, 12, 16, 24,
and 32 threads. Only the best time is shown along with the thread count for that run. For comparison, the solve time for the fastest serial
algorithm is also included. All times are in seconds. The fastest solve time for each dataset has been marked with bold face

Liu-Sun [75] P-PPR [5] Strandmark-Kahl [91] P-ARD [89] P-GridCut [52, 51] Best serial
Dataset Build Best solve Build Best solve Build Best solve Build Best solve Build Best solve Algo. Solve
3D segmentation: voxel-based
adhead.n26c10 [9, 12, 10] 6.90 17.41 8T 42.71 14.37 12T 26.14 25.40 2T 78.20 35.17 4T - - - GridCut 13.21
adhead.n26c100 [9, 12, 10] 6.86 20.87 8T 41.79 8.41 32T 27.65 19.30 6T 75.37 42.91 4T - - - EIBFS 22.60
babyface.n26c100 [9, 12, 10] 2.89 72.26 32T 15.99 7.93 32T 9.15 40.86 4T 32.01 61.20 32T - - - EIBFS 30.13
bone.n26c100 [9, 12, 10] 4.36 3.68 32T 25.63 4.01 32T 16.38 5.04 8T 48.90 11.21 4T - - - HPF 3.48
bone subx.n26c100 [9, 12, 10] 2.30 4.04 16T 12.48 2.34 32T 8.03 4.43 8T 24.07 11.71 16T - - - HPF 2.14
liver.n26c10 [9, 12, 10] 2.37 10.79 6T 14.42 7.91 32T 0.89 3.49 1T 21.93 14.95 1T - - - GridCut 2.95
liver.n26c100 [9, 12, 10] 2.36 18.24 6T 13.94 5.45 32T 0.86 6.69 1T 27.07 25.74 24T - - - GridCut 5.62
liver.n6c100 [9, 12, 10] 0.53 7.62 6T 4.78 3.68 16T 0.51 7.24 1T 6.18 7.74 32T 0.11 2.70 6T GridCut 3.87
adhead.n6c100 [9, 12, 10] 1.59 11.17 8T 14.49 4.72 32T 2.52 7.17 4T 15.46 14.41 2T 0.31 3.83 4T GridCut 6.98
babyface.n6c10 [9, 12, 10] 0.66 2.70 32T 5.30 3.09 24T 0.73 3.55 1T 8.87 5.35 1T 0.12 0.88 32T GridCut 1.52
babyface.n6c100 [9, 12, 10] 0.66 5.34 32T 6.53 3.63 24T 0.98 5.24 4T 7.29 7.28 32T 0.12 1.66 16T GridCut 2.88
bone.n6c100 [9, 12, 10] 0.99 0.79 24T 8.30 2.66 32T 1.23 2.01 2T 11.18 2.08 4T 0.20 0.17 12T GridCut 0.91
3D segmentation: oriented MRF
vessel.orimrf.256 [11] 1.22 0.69 6T 14.67 1.69 32T 1.89 1.04 2T 16.24 0.82 32T 0.52 0.14 12T GridCut 0.40
vessel.orimrf.512 [11] 9.72 4.92 8T - - - 15.08 6.49 2T 100.95 5.54 16T 4.26 0.43 32T GridCut 2.43
vessel.orimrf.900 [11] 49.62 28.66 8T - - - 79.82 38.45 2T 599.09 24.02 32T 21.98 2.49 32T GridCut 15.97
3D U-Net segmentation cleaning
clean.orimrf.256 [11] 1.23 0.39 12T 16.50 2.48 24T 2.39 0.48 4T 15.96 0.84 8T 0.52 0.06 32T GridCut 0.13
clean.orimrf.512 [11] 9.73 2.45 16T - - - 18.26 3.81 4T 131.62 4.81 8T 4.37 0.27 32T GridCut 0.91
clean.orimrf.900 [11] 50.52 12.36 16T - - - 85.77 18.66 4T 578.09 20.17 16T 23.64 0.82 32T GridCut 3.90
unet mrfclean 3 [11] 1.15 0.27 32T - - - 2.15 0.45 4T 12.66 0.53 4T 0.46 0.04 32T GridCut 0.11
unet mrfclean 8 [11] 0.37 0.21 8T - - - 0.64 0.23 4T 4.01 0.28 2T 0.14 0.04 16T GridCut 0.11
Surface fitting
LB07-bunny-lrg [71] 6.14 1.86 16T 55.88 24.27 32T 7.73 4.14 4T 72.02 4.31 16T 1.24 0.32 24T GridCut 2.36
3D segmentation: sparse layered graphs (SLG)
4Dpipe small [57] 9.93 9.39 12T - - - - - - 47.53 421.60 24T - - - EIBFS 2.06
4Dpipe big [57] 122.43 86.18 16T - - - - - - 570.44 7485.11 4T - - - EIBFS 20.59
NT32 tomo3 .raw 10 [57] 4.99 18.58 12T 34.90 14.39 32T 22.02 85.49 1T 43.95 15.40 12T - - - HPF 36.46
NT32 tomo3 .raw 30 [57] 14.93 45.70 16T 111.70 59.81 32T 66.56 363.06 1T 132.41 36.13 32T - - - BK 145.23
NT32 tomo3 .raw 100 [57] 38.93 95.24 32T - - 1T 170.38 1189.78 1T 365.60 158.92 24T - - - HPF 498.94
3D segmentation: seperating surfaces
cells.sd3 [53] 4.18 10.33 16T 25.93 9.47 32T 23.90 76.40 1T 21.84 44.98 1T - - - HPF 15.52
foam.subset.r160.h210 [53] 5.91 8.59 32T 37.08 3.61 32T 52.11 17.24 1T 33.72 7.32 1T - - - EIBFS 3.21
simcells.sd3 [53] 0.69 2.28 16T 5.60 1.99 32T 1.49 2.82 32T 4.96 0.89 32T - - - EIBFS 2.94
Multi-view
BL06-camel-lrg [13] 3.99 57.41 8T - - - 1.87 75.56 1T 13.76 95.40 1T - - - HPF 24.44
BL06-gargoyle-lrg [13] 3.70 29.28 16T - - - 1.70 190.07 1T 12.20 102.31 2T - - - HPF 26.51
Mesh segmentation
bunnybig.segment [76] 0.30 0.37 12T 2.81 1.15 32T 1.12 1.89 1T 3.66 0.41 32T - - - EIBFS 0.62
chairbig.segment [76] 0.60 0.64 24T 6.10 1.76 32T 2.62 3.29 1T 6.89 0.57 32T - - - EIBFS 1.02
handbig.segment [76] 0.02 0.11 8T 0.25 0.25 16T 0.04 0.16 1T 0.40 0.11 32T - - - EIBFS 0.07

significantly better than naively choosing the overall best algorithm 7 D ISCUSSION
but not as good as knowing the best algorithm for a problem family. In this section, we discuss the most interesting findings from our
experiments.
Next, we train a decision tree for the parallel algorithms. We
include a category ‘Serial’, which means that choosing a serial
algorithm would be faster. For simplicity, we do not specify which 7.1 Serial Algorithms
serial algorithm to choose in this scenario. The result is shown in Our results clearly show that GridCut is superior to the other tested
Fig. 13. The decision tree achieves a mean RP of 0.56 and 0.57 in algorithms for min-cut/max-flow problems with fixed neighborhood
the two evaluations, respectively. Thus, the tree is slightly better grids. This is not surprising since GridCut has been designed and
than simply choosing the overall best algorithm. However, the best optimized specifically for this type of graph. However, as shown in
option is to choose the best algorithm for a given category. Table 2, the performance benefit of GridCut decreases significantly
15

TABLE 4: Summary of relative performance (RP) scores for TABLE 6: Relative performance (RP) scores for the best
each of the min-cut/max-flow algorithm variants. The best score parallel algorithm for each problem family. Since the parallel
(higher is better) in each column has been marked with bold face. GridCut implementation can only handle 6-connected graphs ‘3D
Results were oversampled as described in Fig. 7. We only include segmentation: voxel based’ has been split into two subgroups: 6-
results where the algorithm ran to completion. connected graphs and 26-connected graphs. If an algorithm did not
run to completion on a dataset we count the RP as 0.
Serial algorithms Mean RP ± Std. RP Min RP Max RP
EIBFS-I 0.59 ± 0.28 0.1309 1.00 Problem family Algorithm Mean RP
EIBFS-I-NR 0.56 ± 0.32 0.0535 1.00 3D segmentation: SLG [57] Liu-Sun 0.63
EIBFS 0.47 ± 0.23 0.1288 0.94 Multi-view [13] Serial 1.00
HI-PR 0.16 ± 0.17 0.0046 1.00 Surface fitting [71] P-GridCut 1.00
HPF-H-F 0.59 ± 0.33 0.0279 1.00 3D seg.: voxel-based [9, 12, 10] (26-conn.) Serial 0.86
HPF-H-L 0.64 ± 0.36 0.0393 1.00 3D seg.: voxel-based [9, 12, 10] (6-conn.) P-GridCut 1.00
HPF-L-F 0.49 ± 0.29 0.0313 1.00 Mesh segmentation [76] P-ARD 0.88
HPF-L-L 0.53 ± 0.31 0.0312 1.00 3D segmentation: sep. surfaces [53] P-PPR 0.74
MBK-R 0.27 ± 0.20 0.0006 1.00 3D MRF [11] P-GridCut 1.00
BK 0.27 ± 0.24 0.0005 1.00
MBK 0.28 ± 0.22 0.0005 1.00 Mean ± std. 0.89 ± 0.14
GridCut∗ 0.99 ± 0.03 0.6419 1.00
Parallel algorithms
Liu-Sun 0.48 ± 0.30 0.0667 1.00
P-PPR 0.46 ± 0.38 0.0133 1.00
Strandmark-Kahl 0.23 ± 0.16 0.0667 0.85

UMAP-2
P-ARD 0.35 ±0.32 0.0028 1.00
P-GridCut ∗ 1.00 ± 0.00 1.0000 1.00
Best serial 0.59 ± 0.33 0.1365 1.00
∗ Only grid graphs included (6- and 26-conn. for serial, 6-conn for parallel).

TABLE 5: Relative performance (RP) scores for the best serial UMAP-1

algorithm variant for each problem family. Almost all problem 3D seg. : SLG Mesh segmentation DTF
Multi-view 3D seg. : sep. surfaces Deconvolution
families have one dominant algorithm.
Stereo 1 3D MRF Graph match. : small
Stereo 2 Deep LOGISMOS Graph match. : big
Problem family Algorithm Mean RP Surface fitting ALE Super resolution
3D segmentation: SLG [57] HPF-H-L 0.81 3D seg. : voxel-based
Multi-view [13] HPF-H-L 1.00
Surface fitting [71] GridCut 1.00 Fig. 11: UMAP embedding [78] of the extracted graph features.
3D segmentation: voxel-based [9, 12, 10] GridCut 0.98
Mesh segmentation [76] EIBFS-I 0.95 Each point correspond to a benchmark dataset and is colored
3D segmentation: sep. surfaces [53] EIBFS-I 0.92 according to its problem family. When a benchmark consists of
3D MRF [11] GridCut 1.00 multiple sub-problem we use the mean feature vector. Notice that
Deep LOGISMOS [42] EIBFS-I-NR 0.96
Deconvolution [88] HPF-H-L 0.96
points from the same problem family tend to cluster together.
DTF [82] HPF-H-L 1.00
Super resolution [30, 88] EIBFS-I 0.87
Stereo 1 [14] EIBFS-I 0.99 Grid graph
Stereo 2 [64] EIBFS-I 1.00 No Yes
ALE [68, 69, 26] EIBFS-I-NR 1.00 Sink cap. std. ≤ 2.151 GridCut
Graph matching: small [58, 48] HPF-L-L 1.00 No Yes
Graph matching: small [25, 73, 48] EIBFS-I-NR 0.91
Sink num. nonzero ≤ 43630 Num. arcs ≤ 29289
Graph matching: small [1, 92, 48] HPF-L-F 1.00
Graph matching: small [94, 16, 48] HPF-L-L 1.00 No Yes No Yes
Graph matching: small [66, 59, 48] HPF-L-F 1.00 EIBFS-I-NR EIBFS-I HPF-L-L HPF-H-L
Graph matching: big [48] HPF-H-L 1.00
Mean ± std. 0.97 ± 0.05 Fig. 12: Decision tree trained to select the best serial algorithm.
Note that capacity statistics are normalized, c.f . Section 6.

when moving from 6-connected to 26-connected graphs. Actually,

both EIBFS and HPF manage to beat GridCut on a couple of the 32-bit pointer for most datasets, which may also provide slightly
26-connected problems. This indicates that the benefit of using improved performance. Finally, the hardware used in [36] may
GridCut significantly decreases for graphs with high connectivity, have different performance characteristics than ours. We observed
perhaps because it is an AP algorithm. that EIBFS actually performed better on an older system than
In general, the pseudoflow algorithms have the best overall on the one we used for our experiments. We speculate if this
performance. Measured on solve time, EIBFS performs the best, could be due to the lower cache and memory latency (estimated
which aligns with existing literature [36]. However, looking at using Intel Memory Latency Checker v3.9a) on the older system
the total time, HPF performs better overall, slightly contradicting compared to the one used for our benchmarks. In any case, this
previous benchmarks [36]. The reason for the difference between raises the question whether HPF implemented with arc packing
the results may be that [36] compared EIBFS with HPF-H-F, could outperform EIBFS on even more problems.
which we show to be inferior to HPF-H-L. Also, in [36] they use Our results also show that the performance of the different
16

Grid graph and low-degree graphs, a serial algorithm or GridCut performs

No Yes best. Otherwise, the choice comes down to graph size, with P-
In deg. nonzero std. ≤ 1.531 In deg. nonzero std. ≤ 1.59
ARD doing better for the smaller graphs, P-PPR being faster for
No Yes No Yes
Serial Mean degree ≤ 19.804 P-GridCut Serial
the medium-sized ones, and Liu-Sun performing the best for the
No Yes
largest graphs. However, as Table 6 shows, the best strategy is
No. nonzero nbor. arcs ≤ 28325491 Serial again to test on a number of graphs from the problem family at
No Yes hand.
P-ARD In cap. no. nonzero ≤ 45024258
No Yes
P-PPR Liu-Sun 8 C ONCLUSIONS AND P ERSPECTIVES
We now summarize our findings for the serial and parallel
Fig. 13: Decision tree trained to select the best parallel algorithm. algorithms tested in this work. We also provide perspectives on
‘Serial’ means a serial algorithm would be the best option. Note possible future developments of min-cut/max-flow methods, as
that capacity statistics are normalized, c.f . Section 6. well as how these may fit into the future of computer vision.

algorithm variants varies, and the choice of variant can significantly 8.1 Serial Algorithms
affect the run time. Optimizing for cache efficiency seems to be For the serial min-cut/max-flow algorithms, we have tested a
of particular importance, since optimizations such as arc packing total of 12 different variants across five of the fastest and most
and smaller data structures have large effects on the solve times for popular algorithms: PPR, BK, EIBFS, HPF, and GridCut. These
both BK and EIBFS. include representatives for the three families of min-cut/max-flow
As shown in Section 6, for non-grid problems, the best algorithms: augmenting paths, push-relabel, and pseudoflow.
algorithm most often comes down to a choice between EIBFS Our results clearly show that, for simple grid graphs, GridCut
or HPF. From Fig. 12 it seems that HPF is faster when the sink has the best performance. In most other cases, the two pseudoflow
(or, more likely, terminal) arc capacities vary a lot. As expected, algorithms, EIBFS and HPF, are significantly faster than the other
EIBFS-I-NR is preferred for small graphs, while the preferred algorithms and thus should be the first choice for anyone looking
HPF variant for small graphs appears to be HPF-L-L, which aligns for a fast serial min-cut/max-flow algorithm for static computer
with the results in Table 5. However, the best strategy is to test vision problems. For dynamic problems, we refer to [36].
several algorithms on a set of problems from the family at hand. Contrary to existing literature, we recommend the HPF algo-
rithm in the H-LIFO configuration as the default, since it has
7.2 Parallel Algorithms the best overall performance. However, the EIBFS algorithm
(EIBFS-I implementation) is a very close contender and can
P-GridCut provides the best performance of the parallel algorithms easily replace HPF with little impact on performance — and indeed
for 6-connected grid graph problems and scales well with many may perform better on some problem families. If memory usage is
threads. Of the other parallel algorithms, Liu-Sun is overall the best, of chief concern, the MBK and EIBFS-I-NR implementations are
closely followed by P-PPR, which aligns with previous results [75] both good options, as they use significantly less memory than the
and expectations [89, 91]. However, all the block-based algorithms reference EIBFS and HPF implementations.
only scale well for large graphs. For small to medium problems, Furthermore, we think significant performance improvements
they do not scale to many threads, but seem to peak at 8-12 may be gained from further improving the algorithm implementa-
threads, c.f . Fig. 10. This also means that choosing an optimal tions — especially with a focus on memory use and cache efficiency.
thread count may be difficult. Only P-PPR scaled consistently with In particular, faster and more memory efficient methods for arc (and
up to 32 threads. In addition, all parallel algorithms were often node) packing could result in significant benefits, since the extra
outperformed by a serial algorithm except on large graphs. In fact, initialization step incurs a large memory and run time overhead.
as Table 4 shows, selecting a good serial algorithm has better We would like to see a reimplementation of HPF with a half-arc
expected performance than selecting any of the parallel algorithms. data structure and arc packing.
For practical use, only the Liu-Sun, P-PPR, and P-ARD Finally, we found significant gains through automatic algorithm
algorithms seem to be relevant as is. However, the block-based selection. Based on our results, it seems likely that one could train
algorithms have the additional challenge of dividing the graph a robust classifier for selecting the appropriate algorithm based on
into blocks — the result of which significantly affects the run the min-cut/max-flow problem to be solved. By selecting the right
time of the algorithms. This was also shown in [89], where it algorithm for the job, run time could in many cases be significantly
was noticed that the multiview problems would scale better with reduced without the need for new algorithms or implementations.
more processors when partitioned on vertex numbers vs. the grid. In general, we find it unlikely that a single algorithm will ever be
While the graphs tested in this work have a natural way to be split, dominant for all types of graphs.
this may not always be the case. Meanwhile, even though this
problem is avoided with P-PPR, it does not perform as well as the
block-based algorithms overall, as shown in Fig. 9. 8.2 Parallel Algorithms
Finally, while all parallel algorithms had datasets where they We tested five different parallel algorithms for min-cut/max-flow
were best, selecting the best parallel algorithm is difficult (except problems: parallel PPR (P-PPR), adaptive bottom-up merging (Liu-
for 6-connected grid graphs). No algorithm showed dominant Sun), dual decomposition (Strandmark-Kahl), region discharge
performance — neither globally nor per problem family. Further- (P-ARD), and parallel GridCut (P-GridCut).
more, using the decision tree only gives a small improvement over If the graph is a simple grid, P-GridCut significantly outper-
selecting the best overall algorithm. Fig. 13 indicates that for grid forms all other algorithms. For other graphs, we found adaptive
17

bottom-up merging, as proposed by Liu and Sun [75], to be the Finally, as mentioned in [80], there is agreement that the
best overall parallel approach. However, each parallel algorithm performance of deep learning-based segmentation methods has
had an area in which it was the best, and it is difficult to predict started to plateau, and investigating how to integrate CNNs with
the best parallel algorithm for a graph (except for 6-connected grid ‘classical’ approaches should be pursued. Already, combinations
graphs). with active contours have shown promising results [41, 85, 99]
Of the parallel algorithms, only P-GridCut and P-PPR improved and a combination of CNNs and min-cut/max-flow methods could
consistently with more threads. All block-based algorithms failed lead to new advances. As deep learning involves repeated forward
to scale beyond 12 threads, except on large graphs. Furthermore, ex- and backward passes through a model, it is crucial that the min-
cept for P-GridCut, all parallel algorithms were often outperformed cut/max-flow algorithms are fast and efficient. While not the focus
by a serial algorithm, and consistent improvements over serial of this work, this is also an area where dynamic min-cut/max-flow
algorithms were obtained only for large graphs. These issues reveal algorithms can be of great importance, as they are effective at
a major deficiency in the state of current parallel min-cut/max-flow handling repeated solves of graphs where capacities do not change
algorithms and deserve further study. While providing good scaling drastically between successive solves.
on any type of graph may be unreachable as min-cut/max-flow is
P-complete and therefore hard to parallelize [39], computer vision
R EFERENCES
graphs often come with additional structure. Therefore, it seems
highly likely that further improvements in practical performance [1] Hassan Abu Alhaija et al. “Graphflow–6D large displace-
can be achieved. However, at this time, we only recommend using ment scene flow via graph matching”. In: German Confer-
a parallel algorithm for graphs with more than 5 M nodes or where ence on Pattern Recognition. Springer. 2015, pp. 285–296.
a serial algorithm uses at least 5 seconds. [2] Richard Anderson and Joao C. Setubal. “A parallel imple-
To improve the parallel min-cut/max-flow algorithms, one mentation of the push-relabel algorithm for the maximum
could try to replace BK, which is currently used in all the tested flow problem”. In: Journal of Parallel and Distributed
block-based parallel algorithms, with a pseudoflow algorithm. Computing 29.1 (1995), pp. 17–26.
However, this may not be trivial. In [56], results for a Liu-Sun [3] Chetan Arora et al. “An efficient graph cut algorithm for
implementation using EIBFS instead of BK showed a significant computer vision problems”. In: European Conference on
performance decrease compared to serial EIBFS. Still, given the Computer Vision. 2010, pp. 552–565.
superior performance of pseudoflow algorithms, this is an important [4] David A Bader and Vipin Sachdeva. A cache-aware parallel
area to investigate. Furthermore, parallelized graph construction implementation of the push-relabel network flow algorithm
is currently only available for P-GridCut. As the build time is and experimental evaluation of the gap relabeling heuristic.
a significant part of the total time, reducing build time will be Tech. rep. Georgia Institute of Technology, 2006.
important — especially as solve time decreases. [5] Niklas Baumstark, Guy Blelloch, and Julian Shun. “Effi-
Finally, choosing an optimal blocking strategy remains an open cient implementation of a synchronous parallel push-relabel
problem. Generally, when nodes correspond to spatial positions algorithm”. In: European Symposium on Algorithms. 2015,
(e.g., pixels or mesh vertices), we find that grouping based pp. 106–117.
on spatial distance works well. However, we recommend that [6] Endre Boros, Peter L Hammer, and Xiaorong Sun. Net-
practitioners experiment with different blocking strategies since it work flows and minimization of quadratic pseudo-Boolean
can significantly affect the performance. Furthermore, a general functions. Tech. rep. 17-1991, RUTCOR, 1991.
method that only considers the graph structure would be of high [7] Stephen Boyd and Lieven Vandenberghe. Convex optimiza-
interest, as this would also make the algorithms more accessible tion. Cambridge University Press, 2004.
to the average user. An alternative would be to focus on P-PPR [8] Y. Boykov, O. Veksler, and R. Zabih. “Fast approximate
algorithms that do not rely on blocking. Further improvements in energy minimization via graph cuts”. In: IEEE Transactions
these areas could also open the door to GPU-based implementations on Pattern Analysis and Machine Intelligence 23.11 (2001),
for solving general min-cut/max-flow problems. pp. 1222–1239.
[9] Yuri Y Boykov and M-P Jolly. “Interactive graph cuts for
optimal boundary & region segmentation of objects in ND
8.3 Min-Cut/Max-Flow in Modern Computer Vision images”. In: International Conference on Computer Vision.
It is no secret that the field of computer vision is currently Vol. 1. 2001, pp. 105–112.
dominated by deep learning. In this context, it is highly relevant to [10] Yuri Boykov and Gareth Funka-Lea. “Graph cuts and effi-
consider the future role of traditional computer vision tools, such cient ND image segmentation”. In: International Journal
as min-cut/max-flow algorithms. of Computer Vision 70 (2006), pp. 109–131.
For 3D images used in medical imaging and materials science [11] Yuri Boykov and Vladimir Kolmogorov. “An Experimental
research [77], it is common to have images where no relevant Comparison of Min-Cut/Max-Flow Algorithms for Energy
training data are available. Here, segmentation methods based on Minimization in Vision”. In: IEEE Transactions on Pattern
min-cut/max-flow continue to play an important role, as they work Analysis and Machine Intelligence 26.9 (2004), pp. 1124–
without training data and allow geometric prior knowledge to be 1137.
incorporated. Furthermore, while modern 3D images can already be [12] Yuri Boykov and Vladimir Kolmogorov. “Computing
very large (many GB per image), dynamic imaging (3D + time) with geodesics and minimal surfaces via graph cuts.” In: In-
high acquisition rates is now also possible [31, 81]. Computational ternational Conference on Computer Vision. Vol. 3. 2003,
efficiency is paramount to be able to process this ever increasing pp. 26–33.
amount of data, and for this, parallel min-cut/max-flow algorithms
could prove particularly useful.
18

[13] Yuri Boykov and Victor S Lempitsky. “From Photohulls [31] Francisco Garcı́a-Moreno et al. “Using X-ray tomoscopy
to Photoflux Optimization.” In: British Machine Vision to explore the dynamics of foaming metal”. In: Nature
Conference. Vol. 3. 2006, p. 27. communications 10.1 (2019), pp. 1–9.
[14] Yuri Boykov, Olga Veksler, and Ramin Zabih. “Markov [32] Andrew V Goldberg. “Processor-efficient implementation
random fields with efficient approximations”. In: IEEE of a maximum flow algorithm”. In: Information Processing
Conference on Computer Vision and Pattern Recognition. Letters 38.4 (1991), pp. 179–185.
1998, pp. 648–655. [33] Andrew V Goldberg. “The partial augment–relabel al-
[15] Leo Breiman et al. Classification and regression trees. gorithm for the maximum flow problem”. In: European
Routledge, 2017. Symposium on Algorithms. 2008, pp. 466–477.
[16] Tibério S Caetano et al. “Learning graph matching”. In: [34] Andrew V Goldberg. “Two-level push-relabel algorithm
IEEE Transactions on Pattern Analysis and Machine for the maximum flow problem”. In: International Confer-
Intelligence 31 (2009), pp. 1048–1058. ence on Algorithmic Applications in Management. 2009,
[17] Bala G Chandran and Dorit S Hochbaum. “A computational pp. 212–225.
study of the pseudoflow and push-relabel algorithms for [35] Andrew V Goldberg and Robert E Tarjan. “A new approach
the maximum flow problem”. In: Operations Research 57.2 to the maximum-flow problem”. In: Journal of the ACM
(2009), pp. 358–376. 35.4 (1988), pp. 921–940.
[18] Xinjian Chen and Lingjiao Pan. “A survey of graph [36] Andrew V Goldberg et al. “Faster and More Dynamic
cuts/graph search based medical image segmentation”. Maximum Flow by Incremental Breadth-First Search”. In:
In: IEEE Reviews in Biomedical Engineering (RBME) 11 European Symposium on Algorithms. 2015, pp. 619–630.
(2018), pp. 112–124. [37] Andrew V Goldberg et al. “Maximum Flows by Incre-
[19] Boris V Cherkassky and Andrew V Goldberg. “On imple- mental Breadth-First Search”. In: European Symposium on
menting the push—relabel method for the maximum flow Algorithms. 2011, pp. 457–468.
problem”. In: Algorithmica 19.4 (1997), pp. 390–410. [38] Vicente Grau, J Crawford Downs, and Claude F Bur-
[20] Özgün Çiçek et al. “3D U-Net: learning dense volumetric goyne. “Segmentation of trabeculated structures using
segmentation from sparse annotation”. In: International an anisotropic Markov random field: application to the
Conference on Medical Image Computing and Computer study of the optic nerve head in glaucoma”. In: IEEE
Assisted Intervention. 2016, pp. 424–432. Transactions on Medical Imaging 25 (2006), pp. 245–255.
[21] Thomas H Cormen et al. Introduction to algorithms. MIT [39] Raymond Greenlaw, H. James Hoover, and Walter L.
press, 2009. Ruzzo. Limits to parallel computation: P-completeness
[22] Andrew Delong and Yuri Boykov. “A scalable graph-cut theory. Oxford University Press on Demand, 1995.
algorithm for ND grids”. In: IEEE Conference on Computer [40] D. M. Greig, B. T. Porteous, and A. H. Seheult. “Exact
Vision and Pattern Recognition. 2008, pp. 1–8. Maximum A Posteriori Estimation for Binary Images”.
[23] DTU Computing Center. DTU Computing Center resources. In: Journal of the Royal Statistical Society. Series B
2021. DOI: 10.48714/DTU.HPC.0001. URL: https://doi.org/ (Methodological) 51.2 (1989), pp. 271–279.
10.48714/DTU.HPC.0001. [41] Lihong Guo et al. “Learned snakes for 3D image segmen-
[24] Jan Egger et al. “Nugget-cut: a segmentation scheme for tation”. In: Signal Processing 183 (2021), p. 108013.
spherically-and elliptically-shaped 3D objects”. In: Joint [42] Zhihui Guo et al. “Deep LOGISMOS: deep learning graph-
Pattern Recognition Symposium. 2010, pp. 373–382. based 3D segmentation of pancreatic tumors on CT scans”.
[25] M. Everingham et al. The PASCAL Visual Object Classes In: IEEE International Symposium on Biomedical Imaging.
Challenge 2007 (VOC2007) Results. http://www.pascal- 2018, pp. 1230–1233.
network.org/challenges/VOC/voc2007/workshop/index. [43] Felix Halim, Roland HC Yap, and Yongzheng Wu. “A
html. MapReduce-based maximum-flow algorithm for large
[26] M. Everingham et al. The PASCAL Visual Object Classes small-world network graphs”. In: International Conference
Challenge 2010 (VOC2010) Results. http://www.pascal- on Distributed Computing Systems. 2011, pp. 192–202.
network.org/challenges/VOC/voc2010/workshop/index. [44] Peter L Hammer, Pierre Hansen, and Bruno Simeone.
html. “Roof duality, complementation and persistency in quadratic
[27] B. Fishbain, Dorit S. Hochbaum, and Stefan Mueller. 0–1 optimization”. In: Mathematical Programming 28.2
“A competitive study of the pseudoflow algorithm for (1984), pp. 121–155.
the minimum s–t cut problem in vision applications”. [45] Dorit S. Hochbaum. “The pseudoflow algorithm: A new
In: Journal of Real-Time Image Processing 11.3 (2016), algorithm for the maximum-flow problem”. In: Operations
pp. 589–609. Research 56.4 (2008), pp. 992–1009.
[28] Lester Randolph Ford Jr and Delbert Ray Fulkerson. Flows [46] Dorit S. Hochbaum and James B. Orlin. “Simplifications
in networks. Princeton university press, 1962. and speedups of the pseudoflow algorithm”. In: Networks
[29] Daniel Freedman and Petros Drineas. “Energy minimiza- 61.1 (2013), pp. 40–57.
tion via graph cuts: Settling what is possible”. In: IEEE [47] Bo Hong and Zhengyu He. “An asynchronous multi-
Conference on Computer Vision and Pattern Recognition. threaded algorithm for the maximum network flow problem
2005, pp. 939–946. with nonblocking global relabeling heuristic”. In: IEEE
[30] William T Freeman, Egon C Pasztor, and Owen T Transactions on Parallel and Distributed Systems 22.6
Carmichael. “Learning low-level vision”. In: International (2010), pp. 1025–1033.
Journal of Computer Vision 40 (2000), pp. 25–47.
19

[48] Lisa Hutschenreiter et al. “Fusion Moves for Graph Match- In: International Conference on Computer Vision. Vol. 2.
ing”. In: International Conference on Computer Vision. 2001, pp. 508–515.
2021, pp. 6270–6279. [65] Vladimir Kolmogorov and Ramin Zabin. “What energy
[49] Hossam Isack et al. “Efficient optimization for functions can be minimized via graph cuts?” In: IEEE
hierarchically-structured interacting segments (HINTS)”. Transactions on Pattern Analysis and Machine Intelligence
In: IEEE Conference on Computer Vision and Pattern 26.2 (2004), pp. 147–159.
Recognition. 2017, pp. 1445–1453. [66] Nikos Komodakis and Nikos Paragios. “Beyond loose LP-
[50] Hiroshi Ishikawa. “Exact optimization for Markov random relaxations: Optimizing MRFs by repairing cycles”. In:
fields with convex priors”. In: IEEE Transactions on European Conference on Computer Vision. 2008, pp. 806–
Pattern Analysis and Machine Intelligence 25.10 (2003), 820.
pp. 1333–1336. [67] L’ubor Ladický and Philip HS Torr. The automatic labelling
[51] Ondřej Jamriška and Daniel Sýkora. GridCut. Version 1.3. environment. https://www.robots.ox.ac.uk/∼phst/ale.htm.
https://gridcut.com. Accessed 2020-06-12. 2015. Accessed 2021-11-24.
[52] Ondřej Jamriška, Daniel Sýkora, and Alexander Hornung. [68] L’ubor Ladický et al. “Associative hierarchical crfs for
“Cache-efficient Graph Cuts on Structured Grids”. In: IEEE object class image segmentation”. In: International Confer-
Conference on Computer Vision and Pattern Recognition. ence on Computer Vision. 2009, pp. 739–746.
2012, pp. 3673–3680. [69] L’ubor Ladický et al. “Graph cut based inference with
[53] Patrick M. Jensen, Anders B. Dahl, and Vedrana A. co-occurrence statistics”. In: European Conference on
Dahl. “Multi-Object Graph-Based Segmentation With Non- Computer Vision. 2010, pp. 239–253.
Overlapping Surfaces”. In: IEEE Conference on Computer [70] Kyungmoo Lee et al. “Multiresolution LOGISMOS graph
Vision and Pattern Recognition Workshops. 2020, pp. 976– search for automated choroidal layer segmentation of 3D
977. macular OCT scans”. In: Medical Imaging 2020: Image
[54] Patrick M. Jensen and Niels Jeppesen. Max-Flow/Min-Cut Processing. Vol. 11313. International Society for Optics
Algorithms. https : / / doi . org / 10 . 5281 / zenodo . 4903945. and Photonics. 2020, 113130B.
Accessed 2021-06-08. DOI: 10.5281/zenodo.4903945. [71] Victor Lempitsky and Yuri Boykov. “Global optimization
[55] Patrick M. Jensen et al. Min-Cut/Max-Flow Problem for shape fitting”. In: IEEE Conference on Computer Vision
Instances for Benchmarking. https://doi.org/10.11583/ and Pattern Recognition. 2007, pp. 1–8.
DTU.17091101. Accessed 2021-11-29. DOI: 10.11583/ [72] Victor Lempitsky, Yuri Boykov, and Denis Ivanov. “Ori-
DTU.17091101. ented visibility for multiview reconstruction”. In: European
[56] Niels Jeppesen et al. “Faster Multi-Object Segmentation Conference on Computer Vision. 2006, pp. 226–238.
Using Parallel Quadratic Pseudo-Boolean Optimization”. [73] Marius Leordeanu, Rahul Sukthankar, and Martial Hebert.
In: International Conference on Computer Vision. Oct. “Unsupervised learning for graph matching”. In: Interna-
2021, pp. 6260–6269. tional Journal of Computer Vision 96 (2012), pp. 28–45.
[57] Niels Jeppesen et al. “Sparse Layered Graphs for Multi- [74] Kang Li et al. “Optimal surface segmentation in volumetric
Object Segmentation”. In: IEEE Conference on Computer images-a graph-theoretic approach”. In: IEEE Transactions
Vision and Pattern Recognition. 2020, pp. 12777–12785. on Pattern Analysis and Machine Intelligence 28.1 (2005),
[58] Dagmar Kainmueller et al. “Active graph matching for pp. 119–134.
automatic joint segmentation and annotation of C. elegans”. [75] Jiangyu Liu and Jian Sun. “Parallel Graph-cuts by Adaptive
In: International Conference on Medical Image Computing Bottom-up Merging”. In: IEEE Conference on Computer
and Computer Assisted Intervention. 2014, pp. 81–88. Vision and Pattern Recognition. 2010, pp. 2181–2188.
[59] Jörg H Kappes et al. “A comparative study of modern infer- [76] Lei Liu et al. “Graph cut based mesh segmentation using
ence techniques for structured discrete energy minimization feature points and geodesic distance”. In: Proceedings of
problems”. In: International Journal of Computer Vision the International Conference on Cyberworlds (CW). 2015,
115 (2015), pp. 155–184. pp. 115–120.
[60] S Kashyap, H Zhang, and M Sonka. “Accurate Fully [77] Eric Maire and Philip John Withers. “Quantitative X-
Automated 4D Segmentation of Osteoarthritic Knee MRI”. ray tomography”. In: International materials reviews 59.1
In: Osteoarthritis and Cartilage 25 (2017), S227–S228. (2014), pp. 1–43.
[61] Anna Khoreva et al. “Simple does it: Weakly supervised [78] Leland McInnes, John Healy, and James Melville. “Umap:
instance and semantic segmentation”. In: IEEE Confer- Uniform manifold approximation and projection for dimen-
ence on Computer Vision and Pattern Recognition. 2017, sion reduction”. In: arXiv:1802.03426 (2018).
pp. 876–885. [79] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi.
[62] Pushmeet Kohli and Philip H.S. Torr. “Dynamic graph cuts “V-net: Fully convolutional neural networks for volumetric
for efficient inference in Markov random fields”. In: IEEE medical image segmentation”. In: International Conference
Transactions on Pattern Analysis and Machine Intelligence on 3D Vision. 2016, pp. 565–571.
29.12 (2007), pp. 2079–2088. [80] Shervin Minaee et al. “Image segmentation using deep
[63] Vladimir Kolmogorov and Carsten Rother. “Minimizing learning: A survey”. In: IEEE Transactions on Pattern
nonsubmodular functions with graph cuts-a review”. In: Analysis and Machine Intelligence (2021).
IEEE Transactions on Pattern Analysis and Machine [81] Rajmund Mokso et al. “GigaFRoST: the gigabit fast
Intelligence 29.7 (2007), pp. 1274–1279. readout system for tomography”. In: Journal of synchrotron
[64] Vladimir Kolmogorov and Ramin Zabih. “Computing radiation 24.6 (2017), pp. 1250–1259.
visual correspondence with occlusions using graph cuts”.
20

[82] Sebastian Nowozin et al. “Decision tree fields”. In: Inter- [100] Xiaodong Wu and Danny Z Chen. “Optimal net sur-
national Conference on Computer Vision. 2011, pp. 1668– face problems with applications”. In: International Collo-
1675. quium on Automata, Languages, and Programming. 2002,
[83] F. Pedregosa et al. “Scikit-learn: Machine Learning in pp. 1029–1042.
Python”. In: Journal of Machine Learning and Research [101] Yin Yin et al. “LOGISMOS—layered optimal graph image
12 (2011), pp. 2825–2830. segmentation of multiple objects and surfaces: cartilage
[84] Bo Peng, Lei Zhang, and David Zhang. “A survey of graph segmentation in the knee joint”. In: IEEE Transactions on
theoretical approaches to image segmentation”. In: Pattern Medical Imaging 29.12 (2010), pp. 2023–2037.
Recognition 46.3 (2013), pp. 1020–1038. [102] Miao Yu, Shuhan Shen, and Zhanyi Hu. “Dynamic Graph
[85] Sida Peng et al. “Deep snake for real-time instance Cuts in Parallel”. In: IEEE Transactions on Image Process-
segmentation”. In: IEEE Conference on Computer Vision ing 26.8 (2017).
and Pattern Recognition. 2020, pp. 8533–8542. [103] Miao Yu, Shuhan Shen, and Zhanyi Hu. “Dynamic Parallel
[86] Yi Peng et al. “JF-Cut: A parallel graph cut approach for and Distributed Graph Cuts”. In: IEEE Transactions on
large-scale image and video”. In: IEEE Transactions on Image Processing 25.12 (2015), pp. 5511–5525.
Image Processing 24.2 (2015), pp. 655–666.
[87] Marius Reichardt et al. “3D virtual Histopathology of
Cardiac Tissue from Covid-19 Patients based on Phase-
Contrast X-ray Tomography”. In: eLife. (2021). Patrick M. Jensen was born in Copenhagen,
[88] Carsten Rother et al. “Optimizing binary MRFs via ex- Denmark, in 1994. He received his B.Sc.Eng
degree in 2017 and M.Sc.Eng degree in 2019,
tended roof duality”. In: IEEE Conference on Computer
both in applied mathematics, at the Technical Uni-
Vision and Pattern Recognition. 2007, pp. 1–8. versity of Denmark (DTU), Kgs. Lyngby, Denmark.
[89] Alexander Shekhovtsov and Václav Hlaváč. “A distributed He is currently pursuing a Ph.D. in 3D image
mincut/maxflow algorithm combining path augmentation analysis at the Visual Computing group at the De-
partment of Applied Mathematics and Computer
and push-relabel”. In: International Journal of Computer Science, Technical University of Denmark. His
Vision 104.3 (2013), pp. 315–342. research interests lie in 3D image segmentation.
[90] Amber L Simpson et al. “A large annotated medical image
dataset for the development and evaluation of segmentation
algorithms”. In: arXiv:1902.09063 (2019).
[91] Petter Strandmark and Fredrik Kahl. “Parallel and Dis-
tributed Graph Cuts by Dual Decomposition”. In: IEEE Niels Jeppesen is an image analysis and ma-
chine learning specialist at FORCE Technology
Conference on Computer Vision and Pattern Recognition. with a Ph.D. degree in image analysis of 3D
2010, pp. 2085–2092. structures from the Department of Applied Mathe-
[92] Paul Swoboda et al. “A study of lagrangean decompositions matics and Computer Science, Technical Univer-
and dual ascent solvers for graph matching”. In: IEEE sity of Denmark (DTU), Kgs. Lyngby, Denmark.
His research interests lie in min-cut/max-flow
Conference on Computer Vision and Pattern Recognition. algorithms and quantitative analysis of structures
2017, pp. 1607–1616. in 3D images. He applies these methods for auto-
[93] Nima Tajbakhsh et al. “Embracing imperfect datasets: mated quality control of structures and materials,
in particular, in the wind turbine industry.
A review of deep learning solutions for medical image
segmentation”. In: Medical Image Analysis 63 (2020),
p. 101693.
[94] Lorenzo Torresani, Vladimir Kolmogorov, and Carsten
Anders Bjorholm Dahl is professor in 3D image
Rother. “Feature correspondence via graph matching: analysis, and a head of the Section for Visual
Models and global optimization”. In: European Conference Computing at the Department of Applied Mathe-
on Computer Vision. 2008, pp. 596–609. matics and Computer Science, Technical Univer-
sity of Denmark (DTU), Kgs. Lyngby, Denmark.
[95] Tanmay Verma and Dhruv Batra. “MaxFlow Revisited: An He is heading The Center for Quantification of
Empirical Comparison of Maxflow Algorithms for Dense Imaging Data from MAX IV, focusing on quan-
Vision Problems”. In: British Machine Vision Conference. titative analysis of 3D images. His research is
2012, pp. 1–12. focused on image segmentation and its applica-
tions.
[96] Vibhav Vineet and P J Narayanan. “CUDA cuts: Fast graph
cuts on the GPU”. In: IEEE Conference on Computer
Vision and Pattern Recognition Workshops. 2008, pp. 1–8.
[97] Yao Wang and Reinhard Beichel. “Graph-based segmen-
tation of lymph nodes in CT data”. In: International Vedrana Andersen Dahl is an associate profes-
Symposium on Visual Computing. 2010, pp. 312–321. sor at the Department of Applied Mathematics
[98] University of Waterloo. Max-flow problem instances in and Computer Science, Technical University of
Denmark (DTU), Kgs. Lyngby, Denmark. Her pri-
vision. https : / / vision . cs . uwaterloo . ca / data / maxflow. mary research interest is in the use of geometric
Accessed 2021-02-05. models for the analysis of volumetric data. This
[99] Udaranga Wickramasinghe et al. “Voxel2mesh: 3d mesh includes volumetric segmentation and methods
based on deformable meshes. She developed
model generation from volumetric data”. In: International analysis tools with applications in material sci-
Conference on Medical Image Computing and Computer ence, industrial inspection, and biomedicine.
Assisted Intervention. Springer. 2020, pp. 299–308.

Download full Foundations of Computer Science 4th Edition Behrouz Forouzan ebook all chapters
100% (5)
Download full Foundations of Computer Science 4th Edition Behrouz Forouzan ebook all chapters
66 pages
OpenSea Investment Memo (April 2018)
No ratings yet
OpenSea Investment Memo (April 2018)
3 pages
Practice Cisco NetRiders
No ratings yet
Practice Cisco NetRiders
8 pages
An Experimental Comparison of Min-CutMax-Flow Algorithms for Energy Minimization in Vision
No ratings yet
An Experimental Comparison of Min-CutMax-Flow Algorithms for Energy Minimization in Vision
34 pages
l4
No ratings yet
l4
11 pages
United States Patent: (21) Appl. N0.: 13/226,109 (Continued)
No ratings yet
United States Patent: (21) Appl. N0.: 13/226,109 (Continued)
17 pages
ICCV07 Tutorial Yuri
No ratings yet
ICCV07 Tutorial Yuri
36 pages
Ulti Region Segmentation Using Graph Cuts
No ratings yet
Ulti Region Segmentation Using Graph Cuts
8 pages
Msms 2
No ratings yet
Msms 2
19 pages
ECCV06 Tutorial PartI Yuri
No ratings yet
ECCV06 Tutorial PartI Yuri
64 pages
Image Segmentation Graph Cut
No ratings yet
Image Segmentation Graph Cut
35 pages
Interactive Image Segmentation: Mahesh Jagtap
No ratings yet
Interactive Image Segmentation: Mahesh Jagtap
43 pages
Hutten Loc Her
No ratings yet
Hutten Loc Her
9 pages
Graph Cut, Convex Relaxation and Continuous Max-Flow Problems) Tai Xue-Cheng, University of Bergen
No ratings yet
Graph Cut, Convex Relaxation and Continuous Max-Flow Problems) Tai Xue-Cheng, University of Bergen
115 pages
MAP Estimation Algorithms In: Computer Vision
No ratings yet
MAP Estimation Algorithms In: Computer Vision
134 pages
A Survey of Graph Theoretical Approaches To Image Segmentation
No ratings yet
A Survey of Graph Theoretical Approaches To Image Segmentation
45 pages
Computer Vision Graph Cuts: Exploring Graph Cuts in Computer Vision
From Everand
Computer Vision Graph Cuts: Exploring Graph Cuts in Computer Vision
Fouad Sabry
No ratings yet
Dynamic Programming and Graph Algorithms in Computer Vision
No ratings yet
Dynamic Programming and Graph Algorithms in Computer Vision
20 pages
Seminar Report
No ratings yet
Seminar Report
21 pages
Lecture12 - Graph-Based Segmentation
No ratings yet
Lecture12 - Graph-Based Segmentation
35 pages
Maxflow - Segmentation: 1 Image Restoration / Segmentation Project
No ratings yet
Maxflow - Segmentation: 1 Image Restoration / Segmentation Project
4 pages
Image Segmentation: Ross Whitaker SCI Institute, School of Computing University of Utah
No ratings yet
Image Segmentation: Ross Whitaker SCI Institute, School of Computing University of Utah
49 pages
What Energy Functions Can Be Minimized Via Graph Cuts?
No ratings yet
What Energy Functions Can Be Minimized Via Graph Cuts?
18 pages
Coding With OpenCV
No ratings yet
Coding With OpenCV
36 pages
Lec25-CS345
No ratings yet
Lec25-CS345
29 pages
Image Segmentation Algorithm by Piecewise Smooth Approximation
No ratings yet
Image Segmentation Algorithm by Piecewise Smooth Approximation
13 pages
Project Report On Image Segmentation
No ratings yet
Project Report On Image Segmentation
6 pages
TSIM - Graph Cuts
No ratings yet
TSIM - Graph Cuts
37 pages
Graph Cut Segment
No ratings yet
Graph Cut Segment
23 pages
CS131 Computer Vision: Foundations and Applications Practice Final (Solution) Stanford University December 11, 2017
No ratings yet
CS131 Computer Vision: Foundations and Applications Practice Final (Solution) Stanford University December 11, 2017
15 pages
T-pyramid of an Image-2
No ratings yet
T-pyramid of an Image-2
21 pages
Unit II - Chapter 5 - Segmentation
No ratings yet
Unit II - Chapter 5 - Segmentation
64 pages
A Review On Graph Based Segmentation
No ratings yet
A Review On Graph Based Segmentation
13 pages
L21 ImageSegmentation 2
No ratings yet
L21 ImageSegmentation 2
35 pages
CVT Assignment
No ratings yet
CVT Assignment
28 pages
An Iterative Dynamic Programming Approach To 2-D Phase Unwrapping
No ratings yet
An Iterative Dynamic Programming Approach To 2-D Phase Unwrapping
4 pages
International Journal of Computational Engineering Research (IJCER)
No ratings yet
International Journal of Computational Engineering Research (IJCER)
6 pages
Unit II
No ratings yet
Unit II
9 pages
Advanced Algorithms Course. Lecture Notes. Part 6: A Simplified Airline Scheduling Problem
No ratings yet
Advanced Algorithms Course. Lecture Notes. Part 6: A Simplified Airline Scheduling Problem
4 pages
Feature Description & Extraction: FAST (Features From Accelerated Segment Test)
No ratings yet
Feature Description & Extraction: FAST (Features From Accelerated Segment Test)
11 pages
Lesson 8-Image Segmentation - Traditional Approaches
No ratings yet
Lesson 8-Image Segmentation - Traditional Approaches
35 pages
Image_Approximation_from_Gray_Scale_Medial_Axes
No ratings yet
Image_Approximation_from_Gray_Scale_Medial_Axes
10 pages
What Energy Functions Can Be Minimized Using Graph Cuts?: Shai Bagon
No ratings yet
What Energy Functions Can Be Minimized Using Graph Cuts?: Shai Bagon
42 pages
Pavan K. Turaga, Anuj Srivastava (Eds.) - Riemannian Computing in Computer Vision-Springer International Publishing (2016)
No ratings yet
Pavan K. Turaga, Anuj Srivastava (Eds.) - Riemannian Computing in Computer Vision-Springer International Publishing (2016)
382 pages
Features 1 B
No ratings yet
Features 1 B
94 pages
Edge Based Segmentation
No ratings yet
Edge Based Segmentation
4 pages
A Comparative Study of Energy Minimization Methods For Markov Random Fields
No ratings yet
A Comparative Study of Energy Minimization Methods For Markov Random Fields
14 pages
Robotics Vision
No ratings yet
Robotics Vision
21 pages
Coursehero 34726625 PDF
No ratings yet
Coursehero 34726625 PDF
11 pages
Computer and Machine Vision 4e - Solution Manual
No ratings yet
Computer and Machine Vision 4e - Solution Manual
44 pages
Mivar NETs and logical inference with the linear complexity
From Everand
Mivar NETs and logical inference with the linear complexity
Varlamov, Oleg O.
No ratings yet
Lecture 4 Merged
No ratings yet
Lecture 4 Merged
11 pages
0eef690adf10b98bffd65d5415516412_lec18
No ratings yet
0eef690adf10b98bffd65d5415516412_lec18
8 pages
Project Report
No ratings yet
Project Report
62 pages
Iterated Graph Cuts For Image Segmentation
No ratings yet
Iterated Graph Cuts For Image Segmentation
10 pages
ETE-DIP Solution
No ratings yet
ETE-DIP Solution
15 pages
A Matching Algorithm For Content-Based Image Retrieval: Base, Query by Sketch, Matching, Similarity
No ratings yet
A Matching Algorithm For Content-Based Image Retrieval: Base, Query by Sketch, Matching, Similarity
6 pages
Two Types of Image Segmentation Exist:: Semantic Segmentation. Objects Shown in An Image Are Grouped Based On
No ratings yet
Two Types of Image Segmentation Exist:: Semantic Segmentation. Objects Shown in An Image Are Grouped Based On
25 pages
A Modified 2D Chain Code Algorithm For Object Segmentation and Contour Tracing
No ratings yet
A Modified 2D Chain Code Algorithm For Object Segmentation and Contour Tracing
10 pages
Be Summer 2022
No ratings yet
Be Summer 2022
2 pages
(Computing Supplement 11) Dr. K. Daniilidis (Auth.), Prof. Dr. W. Kropatsch, Prof. Dr. R. Klette, Prof. Dr. F. Solina, Prof. Dr. R. Albrecht (Eds.) - Theoretical Foundations of Computer Vision-Springe
No ratings yet
(Computing Supplement 11) Dr. K. Daniilidis (Auth.), Prof. Dr. W. Kropatsch, Prof. Dr. R. Klette, Prof. Dr. F. Solina, Prof. Dr. R. Albrecht (Eds.) - Theoretical Foundations of Computer Vision-Springe
259 pages
Novel Representations, Methods, and Algorithms in Computer Vision
No ratings yet
Novel Representations, Methods, and Algorithms in Computer Vision
1 page
Mesh Generation: Advances and Applications in Computer Vision Mesh Generation
From Everand
Mesh Generation: Advances and Applications in Computer Vision Mesh Generation
Fouad Sabry
No ratings yet
All Amms Are Cfmms. All Defi Markets Have Invariants
No ratings yet
All Amms Are Cfmms. All Defi Markets Have Invariants
24 pages
HR 1988
No ratings yet
HR 1988
164 pages
Account Abstraction (AA)
No ratings yet
Account Abstraction (AA)
14 pages
SSRN Id3705500
No ratings yet
SSRN Id3705500
85 pages
Ico Truebit Vitalik
No ratings yet
Ico Truebit Vitalik
21 pages
SuperRare Investment Memo (July 2020)
50% (2)
SuperRare Investment Memo (July 2020)
2 pages
Cis Reviewer
No ratings yet
Cis Reviewer
8 pages
DMs Guild Cover Image Guide - WORD
No ratings yet
DMs Guild Cover Image Guide - WORD
2 pages
TOPIC 2 - Collection Development Policy
No ratings yet
TOPIC 2 - Collection Development Policy
12 pages
2.1 - LP Modelling
No ratings yet
2.1 - LP Modelling
18 pages
Object Constraint Language For Modelling Computer Games
No ratings yet
Object Constraint Language For Modelling Computer Games
8 pages
Master Subnetting: 300+ Solved Subnetting QA Exercises
No ratings yet
Master Subnetting: 300+ Solved Subnetting QA Exercises
38 pages
Mendeley Quick Start Guide: UCL Library Services
No ratings yet
Mendeley Quick Start Guide: UCL Library Services
7 pages
Type Codes: Training Made by SEW-EURODRIVE
No ratings yet
Type Codes: Training Made by SEW-EURODRIVE
100 pages
A1-Registration Form (On-Campus) : Terms & Conditions
No ratings yet
A1-Registration Form (On-Campus) : Terms & Conditions
1 page
Digital Forensics Project - Proposal - Edited2
No ratings yet
Digital Forensics Project - Proposal - Edited2
11 pages
Areospace KETEMA
No ratings yet
Areospace KETEMA
2 pages
CS614 MIDTERM SOLVED MCQS by JUNAID
No ratings yet
CS614 MIDTERM SOLVED MCQS by JUNAID
39 pages
REAPER - Wikipedia
No ratings yet
REAPER - Wikipedia
18 pages
Case Paper Inventory Management
No ratings yet
Case Paper Inventory Management
28 pages
Red Team Course Content v1
No ratings yet
Red Team Course Content v1
14 pages
q1 Module 2 Emptech
No ratings yet
q1 Module 2 Emptech
8 pages
Working With Microsoft PowerPoint Objects
No ratings yet
Working With Microsoft PowerPoint Objects
4 pages
SRE Architect-NomiSo SG
No ratings yet
SRE Architect-NomiSo SG
4 pages
Master Alarm Service Manual 550731
No ratings yet
Master Alarm Service Manual 550731
24 pages
CSE707 - 449 DCS Resources
No ratings yet
CSE707 - 449 DCS Resources
8 pages
SystemVerilog-module5-Randomization
No ratings yet
SystemVerilog-module5-Randomization
43 pages
Dheeraj Yadav_Resume
No ratings yet
Dheeraj Yadav_Resume
2 pages
Valmet-IQ-Fiber-Orientation-Control-with-Slice-Actuators-Operator-Manual_DC115298_01
No ratings yet
Valmet-IQ-Fiber-Orientation-Control-with-Slice-Actuators-Operator-Manual_DC115298_01
22 pages
Dissertation Sur Le Theatre Comique Et Tragique
100% (3)
Dissertation Sur Le Theatre Comique Et Tragique
5 pages
58 02 Monitor Smoothness
No ratings yet
58 02 Monitor Smoothness
1 page
2. BCME Mech Part (123 Units)
No ratings yet
2. BCME Mech Part (123 Units)
84 pages
POINTERS in C Programming Language
No ratings yet
POINTERS in C Programming Language
31 pages
FTTX
No ratings yet
FTTX
44 pages

Review of Serial and Parallel Min-Cut/Max-Flow Algorithms For Computer Vision

Uploaded by

Review of Serial and Parallel Min-Cut/Max-Flow Algorithms For Computer Vision

Uploaded by

1

Review of Serial and Parallel Min-Cut/Max-Flow

M IN - CUT / MAX - FLOW algorithms are ubiquitous in computer

3.4.1 Data Structures and Data Types 3.4.3 Arc Packing

(a) (b) (c) (d)

Fig. 5: Illustration of the dual decomposition approach. Termi-

2. Orignally from http://www.avglab.com/andrew/soft.html, but the link is no 3. Originally from https://www1.maths.lth.se/matematiklth/personal/petter/

1.00 5.3.1 Algorithm Variants

0.75 measuring relative to a specific implementation, rather than the

8.00 BK 16.00 HPF-H-F

min-cut/max-flow algorithms depends on the problem to be solved,

compared to choosing the fastest algorithm.

when moving from 6-connected to 26-connected graphs. Actually,

Grid graph and low-degree graphs, a serial algorithm or GridCut performs

You might also like