
SIAM J. NUMER. ANAL.

Vol. 15, No. 1, February 1978


Downloaded 12/27/12 to 150.135.135.70. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON THE APPLICATION OF THE MINIMUM DEGREE
ALGORITHM TO FINITE ELEMENT SYSTEMS*

ALAN GEORGE AND DAVID R. McINTYRE†

Abstract. We describe an efficient implementation of the so-called minimum degree algorithm, which experience has shown to produce efficient orderings for sparse positive definite systems. Our
algorithm is a modification of the original, tailored to finite element problems, and is shown to induce a
partitioning in a natural way. The partitioning is then refined so as to significantly reduce the number of
nonnull off-diagonal blocks. This refinement is important in practical terms because it reduces storage
overhead in our linear equation solver, which utilizes the ordering and partitioning produced by our
algorithm. Finally, we provide some numerical experiments comparing our ordering/solver package to
a more conventional band-oriented package.
1. Introduction. In this paper we consider the problem of directly solving the
linear equations
(1.1) Ax = b,
where A is a sparse N by N positive definite matrix arising in certain finite element
applications. We solve (1.1) using Cholesky's method or symmetric Gaussian elimination, by first factoring A into the product LL^T, where L is lower triangular, and then solving the triangular systems Ly = b and L^T x = y.
It is well known that when a sparse matrix is factored using Cholesky’s
method, the matrix normally suffers fill; that is, the triangular factor L will
typically have nonzeros in some of the positions which are zero in A. Thus, with
the usual assumption that exact numerical cancellation does not occur, L + L^T is
usually fuller than A.
For any N by N permutation matrix P, the matrix PAP^T remains sparse and
positive definite, so Cholesky's method still applies. Thus, we could instead solve
(1.2) (PAP^T)(Px) = Pb.
In general, the permuted matrix PAP^T fills in differently, and a judicious choice of
P can often drastically reduce fill. If zeros are exploited, this can in turn imply a
reduction in storage requirements and/or arithmetic requirements for the linear
equation solver. The permutation P may also be chosen to ease data management,
simplifying coding etc. In general, these varying desiderata conflict, and various
compromises must be made.
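The effect of the permutation on fill can be made concrete with a small sketch. This example is ours, not the paper's: it counts fill edges via the elimination-graph model (made precise in §2) for an "arrow" matrix, where one node is coupled to all others. Eliminating that node first creates dense fill; eliminating it last creates none.

```python
# Illustrative sketch (not from the paper): counting fill for two orderings
# of a symmetric "arrow" matrix, using the elimination-graph model.
def fill_count(n, edges, order):
    """Return the number of fill edges created by eliminating nodes in `order`."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    fill = 0
    eliminated = set()
    for v in order:
        nbrs = [u for u in adj[v] if u not in eliminated]
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in adj[a]:        # new edge => one fill entry in L
                    adj[a].add(b); adj[b].add(a)
                    fill += 1
        eliminated.add(v)
    return fill

# Arrow matrix: node 0 coupled to all others, no other couplings.
n = 6
edges = [(0, k) for k in range(1, n)]
print(fill_count(n, edges, list(range(n))))        # hub first: 10 fill edges
print(fill_count(n, edges, list(range(n))[::-1]))  # hub last: 0 fill edges
```

The second ordering is an extreme case of the savings a good permutation P can deliver.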
A heuristic algorithm which has been found to be very effective in finding
efficient orderings (in the low-fill and low-arithmetic sense) is the so-called
minimum degree algorithm [10]. The ordering algorithm we propose in this paper
is a modification of this algorithm, changed in a number of ways to improve its
performance for finite element matrix problems, which we now characterize.
Let M be any mesh formed by subdividing a planar region R with boundary
∂R by a number of lines, all of which terminate on a line or on ∂R. The mesh so
formed consists of a set of regions enclosed by lines, which we call elements. The
* Received by the editors May 7, 1976.
† Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1.
mesh M has a node at each vertex (a point of intersection of lines and/or ∂R), and
may also have nodes on the lines, on ∂R, and in the interiors of the elements. An
example of such a finite element mesh is given in Figure 1.1.

FIG. 1.1. An example of a finite element mesh.

Now let M have N nodes, labeled from 1 to N, and associate a variable x_i with
the ith node.
DEFINITION 1.1 [4]. A finite element system of equations associated with the
finite element mesh M is any N by N symmetric positive definite system Ax = b
having the property that A_ij ≠ 0 if and only if x_i and x_j are associated with nodes of
the same element.
In one respect this definition is more general than required to characterize
matrix problems arising in actual finite element applications in two dimensions.
Usually M is restricted to consist of triangles or quadrilaterals, with adjacent
elements having a common side. However, just as in [4], we intend to associate
such meshes with matrices which arise when Gaussian elimination is applied to A,
and these matrices require meshes having a less restricted topology in order that
the correspondence be correct in the sense of Definition 1.1.
In a second respect, the above definition is not quite general enough to cover
many matrix problems which arise in finite element applications because more
than one variable is often associated with each node. However, the extension of
our ideas to this situation is immediate, so to simplify the presentation we assume
only one variable is associated with each node. (The code which produced the
numerical results of §5 works for this more general case with no changes.) Thus,
we make no distinction between nodes and variables in this paper.
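Definition 1.1 can be sketched directly in code. The function and the two-triangle example below are ours, for illustration: given the node sets of the elements, it produces the off-diagonal nonzero positions of A, each element contributing a clique.

```python
# Sketch of Definition 1.1: A[i][j] != 0 iff nodes i and j belong to a common
# element. The element node lists are invented for illustration.
from itertools import combinations

def structure_from_elements(n_nodes, elements):
    """Return the set of off-diagonal nonzero positions (i, j), i < j, of A."""
    nz = set()
    for elem in elements:           # every element's node set forms a clique
        for i, j in combinations(sorted(elem), 2):
            nz.add((i, j))
    return nz

# Two triangles sharing the side {1, 2}:
elements = [{0, 1, 2}, {1, 2, 3}]
print(sorted(structure_from_elements(4, elements)))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

Note that nodes 0 and 3 never share an element, so A_03 = 0, exactly as the definition requires.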
An outline of the paper is as follows. In §2 we review two closely related
models for symmetric Gaussian elimination, and describe the basic minimum
degree algorithm. In §3 we show how the special structure of finite element matrix
problems can be exploited in the implementation of the minimum degree
algorithm. We also show how the algorithm induces a natural partitioning of the
matrix. In §4 we describe a refinement of the partitioning produced by our version
of the minimum degree algorithm which normally leads to a considerable reduction in the number of nonnull off-diagonal blocks of the partitioning. This
refinement is important since it reduces storage overhead when the matrix is
processed as a block matrix, with only the nonnull blocks being stored. Section 5
contains a brief description of the method of computer implementation of the
ideas of §3 and §4, along with some numerical results. Section 6 contains our
concluding remarks.
2. Models for the analysis of sparse symmetric elimination. Following
George [4] and Rose [10], we now review for completeness some basics of the
elimination process for sparse positive definite matrices. We begin with some
basic graph theory notions and define some quantities we need in subsequent
sections. Much of the notation and the correspondence between symmetric
Gaussian elimination and graph theory is due to the work of Parter [9] and
Rose [10].
An undirected graph G = (X, E) consists of a finite nonempty set X of nodes
together with a set E of edges, which are unordered pairs of distinct nodes of X. A
graph G' = (X', E') is a subgraph of G = (X, E) if X' ⊆ X and E' ⊆ E. The nodes x
and y of G are adjacent (connected) if {x, y} ∈ E. For Y ⊆ X, the adjacent set of Y,
denoted by Adj(Y), is

Adj(Y) = {x ∈ X\Y | ∃ y ∈ Y with {x, y} ∈ E}.

When Y is a single node y, we write Adj(y) rather than Adj({y}). The degree of a
node x, denoted by deg(x), is the number |Adj(x)|, where |S| denotes the
cardinality of the finite set S. The incidence set of Y, Y ⊆ X, is denoted by Inc(Y)
and defined by

Inc(Y) = {{x, y} | y ∈ Y and x ∈ Adj(Y)}.
For a graph G = (X, E) with |X| = N, an ordering (numbering, labeling) of G is
a bijective mapping α: {1, 2, …, N} → X. We denote the labeled graph and node
set by G^α and X^α respectively.
A path in G is a sequence of distinct edges {x₀, x₁}, {x₁, x₂}, …, {x_{l−1}, x_l},
where all nodes except possibly x₀ and x_l are distinct. If x₀ = x_l, then the path is a
cycle. The distance between two nodes is the length of the shortest path joining
them. A graph G is connected if every pair of nodes is connected by at least one
path. If G is disconnected, it consists of two or more maximal connected
components.
We now establish a correspondence between graphs and matrices. Let A be
an N by N symmetric matrix. The labeled undirected graph corresponding to A is
denoted by G^A = (X^A, E^A), and is one for which X^A is labeled as the rows of A,
and {x_i, x_j} ∈ E^A ⇔ A_ij ≠ 0, i ≠ j. The unlabeled graph corresponding to A is
simply G^A with its labels removed. Obviously, for any N by N permutation matrix
P ≠ I, the unlabeled graphs of A and PAP^T are identical but the associated
labelings differ. Thus, finding a good ordering of A can be viewed as finding a good
labeling for the graph associated with A.
A symmetric matrix A is reducible if there exists a permutation matrix P such
that PAP^T is block diagonal, with more than one diagonal block. This implies G^A
is disconnected. In terms of solving linear equations, this means that solving
Ax = b can be reduced to that of solving two or more smaller problems. Thus, in
this paper we assume A is irreducible, which means that the graph associated with
A is connected.
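The matrix–graph correspondence, and the equivalence of irreducibility with connectedness, can be sketched in a few lines (this code and its examples are ours, not from the paper):

```python
# Sketch: the graph G^A of a symmetric matrix A, and a connectivity check
# (A is irreducible iff G^A is connected).
def graph_of(A):
    """Adjacency structure of the undirected graph of a symmetric matrix."""
    n = len(A)
    return {i: {j for j in range(n) if j != i and A[i][j] != 0} for i in range(n)}

def is_connected(adj):
    """Depth-first search from an arbitrary node; connected iff all reached."""
    if not adj:
        return True
    seen, stack = set(), [next(iter(adj))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(adj[v] - seen)
    return len(seen) == len(adj)

A = [[4, 1, 0],
     [1, 4, 1],
     [0, 1, 4]]                    # tridiagonal: a connected chain
print(is_connected(graph_of(A)))   # True
B = [[2, 0],
     [0, 2]]                       # block diagonal: reducible
print(is_connected(graph_of(B)))   # False
```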
Following Rose [10], we now make the connection between symmetric
Gaussian elimination applied to A, and the corresponding graph transformations
on G^A. Symmetric Gaussian elimination applied to A can be described by the
following equations.

(2.1)
    A = A₀ = H₀ = [ d₁   v₁^T ] = [ √d₁        0       ] [ 1   0  ] [ √d₁   v₁^T/√d₁ ] = L₁ A₁ L₁^T,
                  [ v₁   H̄₁  ]   [ v₁/√d₁   I_{N−1}  ] [ 0   H₁ ] [ 0     I_{N−1}   ]

where H₁ = H̄₁ − v₁ v₁^T / d₁. Applying the same step to H₁ yields A₁ = L₂ A₂ L₂^T,
and so on, until A_{N−1} = I_N.
It is easy to verify that A = LL^T, where

(2.2)    L = L₁ L₂ ⋯ L_{N−1}.
Consider now the labeled graph G^A, with the labeling denoted by the
mapping α. The deficiency Def(x) is the set of all pairs of distinct nodes in Adj(x)
which are not themselves adjacent. Thus,

Def(x) = {{y, z} | y, z ∈ Adj(x), y ≠ z, y ∉ Adj(z)}.

For a graph G = (X, E) and a subset C ⊆ X, the section graph G(C) is the subgraph
G(C) = (C, E(C)), where

E(C) = {{x, y} | {x, y} ∈ E, x ∈ C, y ∈ C}.

Given a vertex y of a graph G, define the y-elimination graph G_y by

G_y = (X\{y}, (E\Inc(y)) ∪ Def(y)).

The sequence of elimination graphs G₁, G₂, …, G_{N−1} is then defined by G₁ = G_{x₁}
and G_i = (G_{i−1})_{x_i}, i = 2, 3, …, N − 1.
The elimination graph G_i, 0 < i < N, is simply the graph associated with the
matrix H_i; i.e., G_i = G^{H_i}. We define G₀ = G^A, and note that G_{N−1} consists of a
single node. The recipe for obtaining G_i from G_{i−1}, which is to delete x_i and its
incident edges, and to then add edges so that Adj(x_i) becomes a clique, is due to Parter
[9]. A clique is a set of nodes all of which are adjacent to each other.
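Parter's recipe translates directly into code. This sketch (names and example ours) forms the y-elimination graph by deleting y and its incident edges, then adding Def(y) so that Adj(y) becomes a clique:

```python
# Parter's recipe as code: form the y-elimination graph G_y from an
# adjacency structure (dict: node -> set of neighbors).
from itertools import combinations

def eliminate(adj, y):
    """Return the y-elimination graph of adj."""
    nbrs = adj[y]
    new = {v: set(s) for v, s in adj.items() if v != y}
    for v in nbrs:
        new[v].discard(y)                  # delete edges incident to y
    for a, b in combinations(nbrs, 2):     # add Def(y): Adj(y) becomes a clique
        new[a].add(b); new[b].add(a)
    return new

# Star with center 0: eliminating 0 makes {1, 2, 3} a clique.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(eliminate(adj, 0))   # {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
```

The star example shows the worst case for fill: eliminating the center couples all of its neighbors.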
This graph model has many advantages for describing and analyzing sparse
matrix computations. However, except for rather small examples, it is not easy to
visualize; although G₀ may sometimes be planar, the G_i rapidly become nonplanar
with increasing i, and become difficult to draw and interpret. For our class of
matrix problems, which are associated with planar mesh problems, we can define a
sequence of finite element meshes M = M₀, M₁, …, M_N such that G_i can easily
be constructed from M_i, i = 0, 1, …, N − 1.
Formally, a mesh M = (X, S) is an ordered pair of sets, with X a (possibly
empty) set of nodes and S a set of mesh lines, where each mesh line either joins
two nodes, is incident to only one node (forming a loop), or else forms a nodeless
loop. Let M₀ = (X₀, S₀) be the original finite element mesh M, with nodes of the
mesh forming the set X₀, and the lines joining the nodes comprising the set S₀. A
boundary mesh line is a member of S_i shared by only one element.
Starting with M₀, the mesh M_i = (X_i, S_i), i = 1, 2, …, N, is obtained from
M_{i−1} = (X_{i−1}, S_{i−1}) by
(a) deleting node x_i and its nonboundary incident mesh lines,
(b) repeatedly deleting mesh lines incident to a node having degree equal to
one.
Here the degree of a node y in a mesh is the number of times mesh lines are
incident to y. When x_i is eliminated from M_{i−1} and x_i has incident boundary lines,
these boundary lines are simply "fused" to form a new line of S_i, as depicted in the
transformations M₅ → M₆ and M₁₂ → M₁₃ in Fig. 2.1. The application of step (b) is
illustrated in the transformations M₆ → M₇ and M₁₁ → M₁₂.
We now describe how to obtain G_i from M_i. Since the node sets are identical,
we need only describe how to construct E_i. Since the sequence M_i is generated by
removing nodes and/or mesh lines, the meshes of the sequence are all planar,
having faces (elements) with nodes on their periphery and/or in their interior as
shown in Fig. 2.1. Also recall that by Definition 1.1, G₀ is a graph such that each
set of nodes C₀ ⊆ X₀ associated with an element (interior to and/or on the
periphery of a face of M₀) forms a clique. This construction is illustrated in Fig.
2.2.
The construction of G_i from M_i is essentially the same. The graph G_i is one
having the same node set X_i as M_i, and which has an edge set E_i such that each set
of nodes C_i ⊆ X_i associated with the same mesh element forms a clique.
Using this mesh model, every numbering of M = M₀ determines a sequence
of meshes M_i, i = 1, 2, …, N, which precisely reflects the structure of the part of
the matrix remaining to be factored (H_i). The mesh M_N consists of a single
element, devoid of nodes, whose boundary is ∂R. Thus, symmetric Gaussian
elimination on finite element matrices can be viewed in terms of transforming finite
element meshes, by a sequence of node and line removals, to a single element.
As mentioned before, our mesh model has the advantage that the meshes
are planar and therefore easy to visualize and interpret. In addition, people
involved in the application of the finite element method are accustomed to
thinking in terms of elements, super-elements, substructures etc. [1], [13]. The
graph model, on the other hand, even if initially planar, rapidly becomes nonplanar
as the elimination proceeds, and is less easy to visualize. It has the
advantage that there is some well established notation and terminology for the
description and manipulation of graphs. Our attitude is that the connection
between the models is so close that statements about one can immediately be
recast in terms of the other. We will therefore use both models in this paper,
choosing the one which appears to transmit the information in the most lucid
manner.
We now describe the minimum degree algorithm using the graph theory
model. Let G₀ = (X₀, E₀) be an unlabeled graph. The minimum degree algorithm
labels X (determines the mapping α) according to the following pseudo-ALGOL
program:

for i := 1, 2, …, N do
  1) Find y ∈ X_{i−1} such that deg(y) ≤ deg(x) for all x ∈ X_{i−1}.
  2) Set α^{−1}(y) = i, and if i < N form G_i from G_{i−1}.
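The basic algorithm can be sketched directly (this implementation and its tie-breaking rule are ours; the text breaks ties arbitrarily): repeatedly pick a node of minimum degree, number it, and form the next elimination graph.

```python
# Sketch of the basic minimum degree algorithm: pick a minimum-degree node,
# number it, and form the next elimination graph (Adj(y) becomes a clique).
from itertools import combinations

def minimum_degree_order(adj):
    """Return an elimination ordering of the graph adj (node -> neighbor set)."""
    adj = {v: set(s) for v, s in adj.items()}   # work on a copy
    order = []
    while adj:
        y = min(adj, key=lambda v: (len(adj[v]), v))   # step 1: minimum degree
        order.append(y)                                # step 2: number y
        nbrs = adj.pop(y)
        for v in nbrs:
            adj[v].discard(y)
        for a, b in combinations(nbrs, 2):             # fill: Adj(y) -> clique
            adj[a].add(b); adj[b].add(a)
    return order

# 3-node chain 0-1-2: a degree-1 endpoint is always numbered before the middle.
print(minimum_degree_order({0: {1}, 1: {0, 2}, 2: {1}}))   # [0, 1, 2]
```

This quadratic-time sketch conveys the idea; the implementation described in §5 avoids forming each G_i explicitly by exploiting the clique structure of finite element graphs.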

FIG. 2.1. Meshes M₁ through M₁₆.
FIG. 2.2. A mesh M₀ and corresponding graph G₀.


Various strategies for breaking ties have been proposed, but in our application we
found they made little difference to the quality of the ordering produced. Thus, we
break ties arbitrarily.
Notice that the ordering algorithm requires knowledge about the current
state of the factorization; that is, the choice of the ith variable is a function of the
structure of the partially factored matrix. Thus, one could argue that the ordering
algorithm (i.e., the pivot selection) should be imbedded in the actual numerical
factorization code. (Of course, this is more or less essential when the pivot
selection is partially determined by numerical stability considerations.)
However, in our situation we have the option of isolating the ordering and the
factorization in separate modules, and we prefer to do so for the following
reasons:
1. If the ordering is done a priori, the factorization code can utilize a static
data structure, since the positions where fill will occur can be determined during
the ordering. If the ordering is imbedded in the factorization, the data structure
must be adaptive to accommodate fill as it occurs. By isolating the ordering and
factorization, data structures can be tailored specifically for each function.
2. In some engineering design applications, many problems with the same
matrix A, or with coefficient matrices having the same structure, must be solved.
In these situations, it makes sense to find an efficient ordering and set up the
appropriate data structures only once.
3. The minimum degree algorithm. From the graph model developed in §2,
we see that symmetric Gaussian elimination can be viewed as transforming G^A by
a sequence of graph transformation rules to one having a single node and no
edges. Using the mesh model, we see that the algorithm can also be viewed as
transforming the original mesh according to some precise node and line removal
rules so that the final mesh M_N consists of a single nodeless element. We also
established a correspondence between the two models, so that G_i can be constructed
from M_i, 0 ≤ i < N. In this section we show that for our finite element
matrix problems, the local behavior of the minimum degree algorithm can be
precisely characterized. In addition, we show that the algorithm induces a natural
partitioning of the node set X, or equivalently, of the reordered matrix A. In the
next section we show how this ordering can be refined so that the triangular factor
L of A can be efficiently stored.
We begin with some definitions. For a graph G = (X, E), let C ⊆ X and let
G(C) = (C, E(C)) be the corresponding section graph of G determined by C (see
§2). The node x is an interior node of C if x ∈ C and Adj(x) ⊆ C. If x ∈ C but
Adj(x) ⊄ C, then x is a boundary node of C.
As we have described before, nodes associated with a mesh element correspond
to a clique in the corresponding graph. Generally, interior nodes of a clique
in G_i correspond to interior nodes of an element in M_i. However, there are
exceptions. The nodes associated with a single element and which lie on ∂R are
interior nodes of the corresponding graph clique. Also, when all the nodes form a
clique, in the element model they could be interior nodes and/or boundary nodes.
Thus although the element model is very helpful in visualizing the changing
structure of the matrix during the decomposition, its meaning in terms of the
corresponding graph must be interpreted carefully near ∂R and for the last clique
of nodes.
In what follows, the sequence G_i, i = 0, 1, …, N − 1, will refer to the
elimination graph sequence generated by the minimum degree algorithm.
LEMMA 3.1. Let C be a clique in G = (X, E), let x be an interior node of C, and let
y be a boundary node of C. Then deg(x) < deg(y).
The proof is trivial and is omitted.
LEMMA 3.2. Let C be a clique in G_i = (X_i, E_i), 0 ≤ i < N, and let x be an
interior node of C. Then if x is eliminated, the degree of all nodes in C is reduced by
one, and the degrees of all other nodes in G_i remain the same.
Proof. Since Adj(x) ⊆ C, the elimination of x cannot affect nodes y ∉ C. Thus
deg(y), y ∈ X_{i+1}\C, is the same as it was in G_i. On the other hand, since x ∈ C and
C is a clique, the elimination of x causes no fill. Thus, E_{i+1} is obtained from E_i by
deleting Inc(x) from E_i, thereby reducing the degree of all nodes in C\{x} by
one.
THEOREM 3.3. Let C be a clique in G_i = (X_i, E_i), 0 ≤ i < N, and let Q_i be the set
of interior nodes of C. Then if the minimum degree algorithm chooses x ∈ Q_i at
the ith step, it numbers the remaining |Q_i| − 1 nodes of Q_i next.
Proof. Let deg(x) = d, and note that deg(y) ≥ d for y ∈ Q_i, with deg(y) > d for
y ∈ C\Q_i. By Lemma 3.2, after x is eliminated, the degree of all nodes remaining
in C will be reduced by one, and nodes not in C will have their degrees unchanged.
Thus, if Q_{i+1} = Q_i\{x} ≠ ∅, then the node of minimum degree in G_{i+1} is in Q_{i+1}.
Repeatedly using Lemma 3.2, we conclude that the minimum degree algorithm
will choose nodes in Q_i until it is exhausted.
COROLLARY 3.4. In terms of the mesh model, Lemma 3.2 and Theorem 3.3
imply that nodes in the interior of an element, or those on the boundary of an element
which forms part of ∂R and are not in any other element, will all be eliminated before
other nodes associated with that element.
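Lemma 3.2 can be checked numerically with a short sketch (the graph below, with clique C = {0, 1, 2, 3} and interior node 0, is invented for illustration; the function name is ours):

```python
# Numerical check of Lemma 3.2: eliminating an interior node of a clique C
# lowers the degree of the remaining nodes of C by one and leaves all other
# degrees unchanged, with no fill.
from itertools import combinations

def degrees_after_eliminating(adj, y):
    """Eliminate y (Parter's recipe) and return the resulting degrees."""
    new = {v: set(s) for v, s in adj.items() if v != y}
    for v in adj[y]:
        new[v].discard(y)
    for a, b in combinations(adj[y], 2):   # no fill occurs: Adj(y) is a clique
        new[a].add(b); new[b].add(a)
    return {v: len(s) for v, s in new.items()}

# Clique C = {0, 1, 2, 3}; node 0 is interior (Adj(0) ⊆ C); node 4 hangs off 3.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
print(degrees_after_eliminating(adj, 0))
# {1: 2, 2: 2, 3: 3, 4: 1}: nodes 1, 2 drop 3 -> 2, node 3 drops 4 -> 3, node 4 unchanged
```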
THEOREM 3.5. Let C₁ and C₂ be two cliques in G_i, 0 ≤ i < N, with
K_i = C₁ ∩ C₂ ≠ ∅, C₁ ⊄ C₂, and C₂ ⊄ C₁. Let

K̄_i = {y | y ∈ K_i and Adj(y) ⊆ C₁ ∪ C₂}.

Then if the minimum degree algorithm chooses x ∈ K̄_i at the ith step, it numbers
the remaining nodes of K̄_{i+1} = K̄_i\{x} next, in arbitrary order.
Proof. First, note that C₁ and C₂ cannot have any interior nodes, since
otherwise the minimum degree algorithm would not choose a node from K̄_i. Thus,
for all y ∈ K̄_i, deg(y) = d = |C₁ ∪ C₂| − 1. Also, note that if y ∈ K_i\K̄_i, then
deg(y) > d, because it is connected to all other nodes in C₁ ∪ C₂, and at least one
other node. (Otherwise, it would be in K̄_i.) Finally, if y ∈ X_i\K_i, then deg(y) ≥ d,
because otherwise the minimum degree algorithm would not choose x. We now
want to show that after x is eliminated, yielding G_{i+1} = (X_{i+1}, E_{i+1}),
deg(y) = d − 1 if y ∈ K̄_{i+1}, and deg(y) ≥ d for y ∈ X_{i+1}\K̄_{i+1}.
First, if K̄_{i+1} = ∅, there is nothing to prove, so suppose K̄_{i+1} ≠ ∅. Now the
elimination of x renders (C₁ ∪ C₂)\{x} a clique, but since Adj(x) ⊆ C₁ ∪ C₂, nodes
y ∈ X_i\(C₁ ∪ C₂) are not affected by the elimination of x, so deg(y) ≥ d as before.
Suppose y ∈ C₁\C₂, and let its degree in G_i be p ≥ d. Then after elimination it is
p + |C₂\C₁| − 1 ≥ d, since |C₂\C₁| ≥ 1. Similarly, after elimination of x, deg(y) ≥ d
for y ∈ C₂\C₁. Now consider y ∈ K̄_{i+1} = K̄_i\{x}. Before elimination of x, y is
connected to all nodes in (C₁ ∪ C₂)\{y}, since C₁ and C₂ are cliques. Since the
elimination of x only involves nodes in C₁ ∪ C₂, y cannot be connected to any new
variables. Moreover, after elimination of x, y is no longer connected to x, so
deg(y) is reduced by one for y ∈ K̄_{i+1}. Thus, in G_{i+1}, deg(y) = d − 1 for y ∈ K̄_{i+1}
and deg(y) ≥ d for y ∈ X_{i+1}\K̄_{i+1}.
The minimum degree algorithm will now choose a node in K̄_{i+1}, and by
Theorem 3.3, it will continue to choose nodes from K̄_{i+1} until it is exhausted.
,
COROLLARY 3.6. Let Ck, k 1, 2,..., r, be cliques in Gi, {0, 1,..., N 1},
Ki rk= Ck : ( and G ["){Xi\{[.Jk#l Ck}} # l= 1, 2,... r. Let

Then if the minimum degree algorithm chooses x Ki at the i-th step, then it numbers
the remaining nodes of K/I K \{x} next, in arbitrary order.
The proof is similar to that of Theorem 3.5 and is omitted.
Theorems 3.3 and 3.5, and Corollary 3.6 have important practical implications.
It is easy to recognize when their hypotheses apply, and they allow us to
immediately number sets of nodes by doing only one minimum degree search.
Thus, after each minimum degree search, a set of r ≥ 1 nodes will be
numbered, inducing a partitioning of X. We will denote this partitioning by
{Y₁, Y₂, Y₃, …, Y_p}, where ∪_{i=1}^p Y_i = X and Y_i ∩ Y_j = ∅ for i ≠ j. Figures
3.1a, b contain an L-shaped triangular mesh M₀ together with a few of the meshes
M_i, 0 ≤ i ≤ N, generated by the minimum degree algorithm. Figure 3.2 shows the
matrix structure of L + L^T corresponding to the ordering, with the column
partitioning indicated by the vertical lines.
FIG. 3.1a. A selection of meshes from the sequence M_k, k = 0, 1, …, N (the original mesh M₀ and meshes M₁₀, M₁₈, M₂₅, M₃₀, M₃₇).
FIG. 3.1b. A selection of meshes from the sequence M_k, k = 0, 1, …, N (meshes M₄₇, M₅₀, M₅₇, M₆₀).
FIG. 3.2. Structure of L + L^T corresponding to the ordering produced for the mesh of Fig. 3.1. The
partitioning is indicated by vertical lines. The character * represents nonzeros of L which correspond
to nonzeros in A, while the remaining marked positions represent fill.

4. A refinement of the minimum degree ordering. In the last section we saw
how the minimum degree algorithm naturally induces a partitioning of the matrix
A (and therefore also of L). Consider each "block column" of L, determined by
the members of the partitioning. Within each such block column the rows are
either empty or full, and we can partition the rows according to contiguous sets of
nonnull rows, as indicated in Fig. 3.2.
Basically, the storage scheme we use in our linear equation solver is one
which stores the dense submatrices determined by this row-within-(block) column
partitioning. Now each nonnull block incurs a certain amount of storage overhead,
so we would like the number of these blocks to be as small as possible. The
purpose of this section is to describe a way of reordering the nodes within each
partition member so as to reduce the number of nonnull blocks in this
row-within-column partitioning.
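The storage idea can be sketched as follows (the function and example are ours, not the paper's solver): within one block column of L, the nonnull rows are grouped into contiguous runs, each run stored as one dense block carrying some fixed overhead.

```python
# Sketch of row-within-(block)-column storage: group the nonnull row indices
# of one block column into contiguous runs; each run is one dense block.
def row_blocks(nonnull_rows):
    """Group sorted row indices into (first_row, length) runs."""
    blocks = []
    for r in sorted(nonnull_rows):
        if blocks and r == blocks[-1][0] + blocks[-1][1]:
            blocks[-1] = (blocks[-1][0], blocks[-1][1] + 1)   # extend current run
        else:
            blocks.append((r, 1))                             # start a new run
    return blocks

# Rows 4, 5, 6 and 9, 10 are nonnull in this block column: two dense blocks.
print(row_blocks({4, 5, 6, 9, 10}))   # [(4, 3), (9, 2)]
```

A reordering that makes connected nodes consecutive turns scattered nonnull rows into few long runs, which is exactly the reduction in block count this section pursues.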
Our reordering scheme is most easily motivated using the mesh model of
elimination we introduced in §2. First, note that most of the partition members
(except for some of the initial ones) will correspond to nodes lying on a side of an
element in some mesh M_k, 0 ≤ k ≤ N. It is clear that the order in which these
nodes are numbered is irrelevant as far as fill or operation counts are concerned,
and their relative order is not specified by the minimum degree algorithm. Thus,
we are free to choose the ordering within the partitions to reduce the number of
individual blocks of L we must store.
How do we achieve this? Consider the schematic drawing in Fig. 4.1,
indicating a subsequence of meshes taken from the sequence M_k, k = 0, 1, …, N.

FIG. 4.1. A subsequence of meshes from the sequence M_k, k = 0, 1, …, N.


From what we have established about the behavior of the minimum degree
algorithm through Theorems 3.3 and 3.5, it is clear that the mesh line segments
(element sides) indicated in Fig. 4.1 correspond to dense diagonal blocks
of L. Now consider the block column of L corresponding to one such segment.
Obviously, its block is connected to part of a neighboring block, but in general the
connection will not be reflected as a dense off-diagonal block of L unless the
nodes of the neighboring segment which are connected to it are numbered
consecutively.
This motivates our reordering algorithm for each partition member Y_i. We reorder the
nodes of each Y_i in a way which corresponds to numbering nodes on an element
side consecutively, beginning at one end. Figure 4.2 shows the structure of L + L^T
corresponding to the reordering obtained by applying our reordering scheme to
the problem which produced the Figures 3.1 and 3.2. The reduction in the number
of off-diagonal blocks is apparent, but it is not particularly impressive because the
problem is quite small. For larger problems, however, the reduction is usually very
substantial.
Now that we have established what we want done, how do we achieve it?
Obviously, if |Y_i| ≤ 2, there is nothing to do. For |Y_i| > 2, the nodes in Y_i will
typically all lie on an element side in some mesh M_k, 0 ≤ k < N, such as indicated in
Fig. 4.3. (For Y_p, the last partition, the situation is somewhat more complicated,
since often three element sides are involved, as in M₆₀, Fig. 3.1b. We consider this
problem later.)

FIG. 4.3. An example of a typical Y_i, 1 ≤ i ≤ p.

Let G = (X, E) be the unlabeled graph corresponding to A, and let G̃ =
(X̃, Ẽ) be the subgraph of G obtained by interpreting the mesh M_k as a graph. Let
G_{Y_i} be the section graph G̃(Y_i) (see §2). Now the element model makes it
abundantly clear that in general G_{Y_i} consists of a single node, or a simple chain,
usually the latter. The graph G_{Y_p} is a special case, typically consisting of three
chains connected by virtue of a small shared clique or two chains connected by a
cycle.
Our reordering algorithm is straightforward, and although we do not know
how it can be improved, we have no proof that it is optimal. It essentially
involves the generation of two rooted spanning trees of G_{Y_i}, the first of which is
generated in such a way that the distance from any node x to the root r in the
tree is the same as the distance from x to r in G_{Y_i}. This can easily be done by
generating the tree in a breadth-first manner, rather than in a depth-first manner
[8].
Our reordering algorithm consists of two general phases, which we now
informally describe. Here r₁, r₂, …, r_{|Y_i|} are the consecutive integers assigned to
the members of Y_i by the minimum degree algorithm.
Phase 1. Choose any node x in G_{Y_i}, and generate a breadth-first spanning
tree T₁ for G_{Y_i}, rooted at x. Any node y at the last level of the tree is chosen as a
starting node for Phase 2.
Phase 2. In this phase, 𝒮 is a stack which is originally empty, and is only
utilized if G_{Y_i} is not a chain.
1) Label the node y provided by Phase 1 as r₁.
2) For each i = 2, 3, …, |Y_i| do the following:
a) If x_{i−1}, the last labeled node, has only one unlabeled node y adjacent
to it, then label it r_i.
b) If Adj(x_{i−1}) has more than one unlabeled node, of those not already in
𝒮, label one r_i and place the remainder on the stack 𝒮. If all unlabeled nodes in
Adj(x_{i−1}) are also in 𝒮, choose one of those and label it r_i.
c) If the members of Adj(x_{i−1}) are all numbered, pop the stack until an
unlabeled node y is popped, and label it r_i.
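The two phases can be sketched in code (all names are ours, and we assume the partition's section graph is connected, as it is for the chains described above):

```python
# Sketch of the two-phase relabeling of one partition member: Phase 1 finds a
# peripheral starting node by breadth-first search; Phase 2 walks the
# (near-)chain, stacking branch nodes as in steps a)-c).
def phase1(adj, x):
    """Breadth-first search from x; return a node at the last level."""
    seen, frontier, last = {x}, [x], [x]
    while frontier:
        nxt = [u for v in frontier for u in adj[v] if u not in seen]
        nxt = list(dict.fromkeys(nxt))       # dedupe, preserving order
        seen.update(nxt)
        if nxt:
            last = nxt
        frontier = nxt
    return last[-1]

def phase2(adj, start):
    """Label nodes consecutively along the chain, using a stack at branches."""
    order, labeled, stack = [start], {start}, []
    while len(order) < len(adj):
        cand = [u for u in adj[order[-1]] if u not in labeled]
        fresh = [u for u in cand if u not in stack]
        if fresh:                            # step a) or b): advance, stack the rest
            nxt = fresh[0]
            stack.extend(fresh[1:])
        elif cand:                           # step b): all candidates already stacked
            nxt = cand[0]
        else:                                # step c): pop to an unlabeled node
            while stack and stack[-1] in labeled:
                stack.pop()
            nxt = stack.pop()
        order.append(nxt)
        labeled.add(nxt)
    return order

# A simple chain a-b-c-d: Phase 2 numbers it end to end.
adj = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}
start = phase1(adj, 'b')     # a peripheral node of the chain
print(phase2(adj, start))
```

On a pure chain the stack is never used, and the result is exactly the consecutive end-to-end numbering that makes the off-diagonal connections dense.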
Figure 4.4 illustrates the reordering algorithm. Phase 1 generates the tree T_1
rooted at x, and chooses the node y as the starting node for Phase 2. Step a) of
Phase 2 is executed until node g is labeled. At the next step node h is placed in S
and c is labeled. At the next step, the unlabeled nodes of Adj (c) are {h, x}, but
since h is already in the stack, node x is labeled, and then step a) of Phase 2
operates until node a is labeled. Since Adj (a) is all labeled, node h is obtained
from S and labeled, followed by nodes i, j and k via steps a) and c) of Phase 2.

[Figure: the tree T_1 rooted at x, the mesh nodes of G_{Y_l}, and the starting node y chosen for Phase 2.]

FIG. 4.4. Relabeling of G_{Y_l}.

5. Remarks on implementation and some numerical experiments. We saw in
§ 2 how cliques naturally arise during symmetric Gaussian elimination. In matrix
problems associated with the use of the finite element method, cliques of size
larger than one exist in G_A, and persist for some time during the elimination,
typically growing in size by merging with other cliques before finally disappearing
through elimination. Moreover, Theorem 3.5 operates for a considerable propor-
tion of the total node numberings.
These observations make it natural to represent the elimination graph
sequence through its clique structure, since elimination of variables typically leads
to the merging of two or more cliques into a new clique. Our approach, then, is as
follows. The graph G_i = (X_i, E_i) is represented by the set of cliques 𝒞_i = {C_l}, along
with a clique membership list for each node. An example appears in Fig. 5.1.
Now our actual implementation does not represent the entire sequence of
graphs G_i, i = 0, 1, 2, ..., N - 1, during its execution. Only those graphs which
would be obtained after each Y_l is determined are actually created. That is, we
repeatedly apply Theorems 3.3 and 3.5.¹ The general step of our algorithm,
described below, is executed p times, once for each r in {1, 2, ..., p}.
¹ These results are also used in the subroutine SORDER contained in the Yale Sparse Matrix
Package [11].
General Step r, r = 1, 2, ..., p.
1) Find an unnumbered node x of minimum degree. If all nodes are
numbered, stop.
2) Let D_r = {l : x ∈ C_l^(r-1)}, and determine the set Y_r of nodes in
∪_{l ∈ D_r} C_l^(r-1) which are connected only to nodes in ∪_{l ∈ D_r} C_l^(r-1).
[Figure: the graph G_0 of the example.]

    Node    Clique membership        Clique set 𝒞_0
    1       1                        C_1: {1, 7, 6}
    2       2, 5                     C_2: {2, 3, 4}
    3       2, 6                     C_3: {5, 8, 9, 10}
    4       2, 4                     C_4: {4, 10}
    5       3, 5, 7                  C_5: {2, 5}
    6       1, 6, 7                  C_6: {3, 6}
    7       1, 8                     C_7: {5, 6}
    8       3, 8                     C_8: {7, 8}
    9       3
    10      3, 4

FIG. 5.1. The graph G_0 represented by its clique set 𝒞_0 and clique membership list.
3) Set 𝒞_r = (𝒞_{r-1} \ {C_l^(r-1) : l ∈ D_r}) ∪ {(∪_{l ∈ D_r} C_l^(r-1)) \ Y_r}.
4) Update the degrees of the nodes in the new clique (∪_{l ∈ D_r} C_l^(r-1)) \ Y_r,
and update their clique membership lists.
5) Increment r and go to step 1).
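As an illustration only, the general step can be sketched in Python. The names and simplifications here are ours, not the paper's: in particular, degrees are recomputed from the clique structure each round rather than updated incrementally as in step 4).

```python
def clique_min_degree(elements):
    """Number all nodes by repeated application of the general step.
    `elements` is the initial clique set (the finite elements); returns
    the elimination order, with each Y_r numbered consecutively."""
    cliques = [set(c) for c in elements]
    nodes = set().union(*cliques)
    order = []
    while nodes:
        # degree of v = |union of the cliques containing v| - 1
        def degree(v):
            return len(set().union(*(c for c in cliques if v in c))) - 1
        x = min(sorted(nodes), key=degree)                  # step 1)
        touching = [c for c in cliques if x in c]           # the set D_r
        merged = set().union(*touching)
        # step 2): Y_r = nodes adjacent only to nodes inside `merged`
        y = {v for v in merged
             if all(c <= merged for c in cliques if v in c)}
        # step 3): drop the touched cliques, add the new clique merged \ Y_r
        cliques = [c - y for c in cliques if x not in c]
        cliques = [c for c in cliques if c]
        if merged - y:
            cliques.append(merged - y)
        order.extend(sorted(y))                             # number Y_r
        nodes -= y
    return order
```

Applied to the clique set of Fig. 5.1, the first minimum degree node is node 1 (degree 2, from C_1 alone), so the ordering begins with node 1.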
Our code consists of two phases: the first is simply the minimum degree
algorithm, modified to exploit what we know about the behavior of the algorithm,
as described by Theorems 3.3 and 3.5. The second phase performs the reordering
of each partition member as described in § 4. Although this splitting into two
phases is not necessary (since each member could be reordered as it is generated), it was
done to keep the code modular, and to ease maintenance and subsequent possible
enhancements.
Our code accepts as initial input a collection of node sets corresponding to the
elements (cliques) of the finite element mesh. This mesh changes as the algorithm
proceeds, so its representation must be such that merging cliques (elements) is
reasonably efficient and convenient. The data structure we used to represent the
graphs is depicted in Fig. 5.2. At any stage of the algorithm, the nodes of each
clique, along with some storage management information, are stored in consecutive
locations in a storage pool (POOL). A pointer array HDR of length at most NCLQS
(the initial number of elements) is used to point to the locations of the elements in
POOL. Finally, a rectangular array C is used to store the clique membership lists;
row i of C contains pointers into HDR corresponding to cliques which have node i
as a member.
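A simplified, list-based sketch of this layout follows. Python lists stand in for the Fortran arrays; the names POOL and HDR follow the text, but the class and its methods are our own invention.

```python
class CliquePool:
    """POOL holds each clique as a size word followed by its node list;
    HDR[k] is the offset of clique k in POOL; `member` plays the role of
    the array C, mapping each node to the cliques containing it."""
    def __init__(self, elements):
        self.pool = []
        self.hdr = []
        self.member = {}
        for nodes in elements:
            self.store(list(nodes))

    def store(self, nodes):
        k = len(self.hdr)
        self.hdr.append(len(self.pool))
        self.pool.append(len(nodes))      # size word precedes the node list
        self.pool.extend(nodes)
        for v in nodes:
            self.member.setdefault(v, []).append(k)
        return k

    def clique(self, k):
        off = self.hdr[k]
        return self.pool[off + 1: off + 1 + self.pool[off]]
```

For the two-element mesh `CliquePool([[1, 2, 3], [3, 4]])`, POOL becomes `[3, 1, 2, 3, 2, 3, 4]`, HDR is `[0, 4]`, and node 3 belongs to cliques 0 and 1.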

[Figure: a small triangular mesh together with the corresponding contents of the POOL, HDR and C arrays.]

FIG. 5.2. Example showing the basic data structure for storing cliques of a finite element mesh.

Step 3) of the algorithm above obviously implies an updating operation on the
arrays C, HDR and POOL to reflect the new clique structure of the graph, which
has seen some of its cliques coalesce into a single new one, along with the removal
of some nodes. In general, the node sets corresponding to the cliques to be
merged will be scattered throughout POOL, and none of them may occupy
enough space for the new clique to overwrite them. To avoid
excessive shuffling of data, we simply allocate space for the new clique from the
last-used position in POOL, and mark the space occupied by the coalesced cliques
as free. When space for a new element can no longer be found in POOL, a storage
compaction is performed. See [8, pp. 435-451] for a description of these standard
storage management techniques.
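The compaction sweep can be sketched as follows, assuming (our convention, not necessarily the authors') that a freed record is marked by negating its size word:

```python
def compact(pool, hdr):
    """Slide live clique records (size word + node list) to the front of
    the pool; records whose size word is negative were freed when their
    cliques were merged.  HDR entries of live records are updated."""
    new_pool, old_to_new = [], {}
    i = 0
    while i < len(pool):
        n = pool[i]
        if n >= 0:                       # live record: copy it forward
            old_to_new[i] = len(new_pool)
            new_pool.extend(pool[i:i + n + 1])
            i += n + 1
        else:                            # freed record of -n nodes: skip it
            i += -n + 1
    for k, off in enumerate(hdr):
        if off in old_to_new:
            hdr[k] = old_to_new[off]
    return new_pool
```

A single left-to-right pass suffices because records only move toward lower addresses, matching the sweep techniques described in [8].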
Our first objective is to study the behavior of our ordering algorithm. We ran
our code on N by N finite element matrix problems arising from n by n
right-triangular meshes of the form shown in Fig. 5.3. We ran our code for
n = 5(5)35 to study the behavior of various quantities as a function of N = (n + 1)^2.

FIG. 5.3. A 5 by 5 right-triangular mesh yielding N = 36.

The results of our runs are summarized in Tables 5.1-5.3. The "overhead"
column in Table 5.1 refers to the number of pointers etc. used by our data
structure for L. In our implementation on an IBM 360/75, we used a 32 bit word
for both pointers and data. On many machines with a larger wordlength, it would
make sense to pack two or perhaps more pointers per word. Thus, in other
implementations the overhead for our data structure, compared to the storage
required for the actual components of L, would be much less than appears in Table
5.1.
The overhead and primary column entries in Table 5.1 do not quite add up to
the corresponding entry in the total column because we included various other
auxiliary vectors and space for the right side b in the total storage count.
The following observations are apparent from the data in Tables 5.1-5.3.
1) The overhead storage appears to grow linearly with N, and the total
storage requirement for all data associated with solving the matrix problem grows
as N log N. This has two important practical implications. First, it implies that
(overhead storage)/(total storage) → 0 as N → ∞, in contrast to most sparse matrix
solvers, for which this ratio is some constant α, usually with α > 1. The second
implication is perhaps even more important. It is well known that for this problem,
the use of band matrix methods (i.e., a banded ordering) implies that total storage
requirements grow as N^{3/2}. Indeed, the best ordering known to the authors (the
so-called diagonal dissection ordering [3]) would imply a storage requirement of
O(N log N).

TABLE 5.1
Storage statistics for the ordering produced by the minimum degree algorithm followed by the
improvement described in § 4.

    n     N    Overhead   Primary    Total   Overhead/Total   Overhead/N   Total/(N log N)
    5     36      316        185       537        .59            8.78           4.16
    10   121     1174       1039      2334        .50            9.70           4.02
    15   256     2457       2889      5612        .44            9.60           3.95
    20   441     4195       5959     10595        .40            9.51           3.95
    25   676     6501      10092     17269        .38            9.62           3.92
    30   961     9153      17190     27304        .34            9.52           4.14
    35  1296    12425      24252     37973        .33            9.59           4.09

2) The entries in Table 5.2 suggest rather strongly that the execution time of
our ordering code for this problem grows no faster than N log N. Similar experi-
ments with other mesh problems demonstrate the same behavior.
3) Table 5.3 contains some interesting statistics about the partitioning
induced by the repeated applications of Theorems 3.3 and 3.5 in our ordering
algorithm. It appears that p is approaching a "limit" near N/2, and that the
number of off-diagonal blocks in each "block column" is approaching about 4.
Again, similar experiments with other mesh problems indicate that this behavior
is not unique to our test problem.
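As a consistency check (our computation, using only entries transcribed from the tables below), the derived columns of Tables 5.1 and 5.3 can be reproduced; natural logarithms evidently match the Total/(N log N) column, and the final rows exhibit the limits of about 4 blocks per block column and p near N/2 noted above.

```python
import math

# (N, Total) pairs from Table 5.1
t51 = [(36, 537), (121, 2334), (256, 5612), (441, 10595),
       (676, 17269), (961, 27304), (1296, 37973)]
# (N, no. of off-diagonal blocks, p) triples from Table 5.3
t53 = [(36, 59, 28), (121, 242, 82), (256, 510, 156), (441, 871, 256),
       (676, 1362, 384), (961, 1912, 531), (1296, 2608, 710)]

# Total/(N log N), using natural logarithms
ratios51 = [round(total / (n * math.log(n)), 2) for n, total in t51]
print(ratios51)        # [4.16, 4.02, 3.95, 3.95, 3.92, 4.14, 4.09]

# off-diagonal blocks per block column, and p/N
blocks_per_col = [round(b / p, 1) for n, b, p in t53]
p_over_n = [round(p / n, 3) for n, b, p in t53]
print(blocks_per_col)  # [2.1, 3.0, 3.3, 3.4, 3.5, 3.6, 3.7]
print(p_over_n)
```

The recomputed values agree with the tables to within rounding (the first p/N entry, .777, appears to be truncated rather than rounded in the table).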
We now turn to a comparison of our ordering algorithm with an alternative.
For comparison, we used the recently developed ordering algorithm due to Gibbs
et al. [5], along with a solver which exploits the variation in the bandwidth of the

TABLE 5.2
Execution time in seconds on an IBM 360/75 for the ordering algorithm described in § 3 and § 4.

    n    N   Phase 1  Phase 2  Total    Total/     Fact.   Fact.  Back-solve  Back-solve  Total soln.  Soln. time/  Ordering plus
             time     time     time   (N log N)    mult.   time     mult.       time         time       (N sqrt N)*  solution time
    5    36    .35      .20      .55    .0043        578    .04       370        .03           .07         .324           .62
    10  121   1.15      .50     1.65    .0028       5739    .22      2078        .09           .31         .233          1.96
    15  256   2.37     1.08     3.45    .0024      21919    .61      5798        .20           .81         .198          4.26
    20  441   4.23     1.84     6.07    .0023      56501   1.30     11918        .34          1.64         .177          7.71
    25  676   6.57     2.84     9.41    .0021     107474   2.36     20184        .54          2.90         .165         12.31
    30  961   9.69     4.01    13.70    .0021     242548   4.36     34380        .77          5.13         .172         18.83
    35 1296  13.23     5.39    18.62    .0020     360937   6.14     48504       1.06          7.20         .154         25.82

* Scaled by 10^-3.
TABLE 5.3
Statistics on the partitioning as a function of N for the ordering produced by the algorithm
described in § 3 and § 4.

    n     N    No. of off-        p    (No. of off-diagonal    p/N
               diagonal blocks        blocks)/p
    5     36        59            28          2.1             .777
    10   121       242            82          3.0             .678
    15   256       510           156          3.3             .609
    20   441       871           256          3.4             .580
    25   676      1362           384          3.5             .568
    30   961      1912           531          3.6             .553
    35  1296      2608           710          3.7             .548

matrix, as suggested by Jennings [7]. In our tables, we denote results for this
ordering/solver combination by BAND, as opposed to the results of our ordering
algorithm/linear equation solver package, which we denote by BMD (block-
minimum-degree).
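Jennings' scheme itself is not reproduced in the paper; a common variant of it (our sketch, with names of our own choosing) stores each row from its first nonzero entry through the diagonal, with a pointer array locating the diagonal entries in the packed vector:

```python
def envelope_pack(a):
    """Pack the lower envelope of a symmetric matrix (list of rows): row i
    is kept from its first nonzero column up to the diagonal.  Returns the
    packed values and diag, with diag[i] the position of a[i][i]."""
    vals, diag = [], []
    for i, row in enumerate(a):
        first = next((j for j in range(i + 1) if row[j] != 0.0), i)
        vals.extend(row[first:i + 1])
        diag.append(len(vals) - 1)
    return vals, diag

def envelope_get(vals, diag, i, j):
    """Fetch a[i][j] for i >= j; entries left of the envelope are zero."""
    row_len = diag[i] + 1 if i == 0 else diag[i] - diag[i - 1]
    first = i - row_len + 1
    return vals[diag[i] - (i - j)] if j >= first else 0.0
```

The overhead is one pointer per row, which is consistent with the roughly N words of overhead reported for BAND in Table 5.4.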
From Table 5.4 the total storage for the solution of the test problem, using the
band ordering, appears to grow as O(N sqrt(N)), as expected for these meshes. The
storage overhead is only about N. However, in contrast, the total storage used for the
solver which uses the BMD ordering appears to grow only as O(N log N), despite
the larger overhead. Extrapolating the results of the tables suggests that the
storage for the BMD ordering will be less than band storage for N > 2000, with
the saving reaching 50% by the time N is around 15,000.

TABLE 5.4
Storage statistics for BAND program.

    n     N    Overhead   Primary    Total   Overhead/Total   Overhead/N   Total/(N sqrt N)
    5     36       39        191       267        .15            1.08          1.236
    10   121      124       1056      1302        .10            1.02           .978
    15   256      259       3096      3612        .07            1.01           .882
    20   441      444       6811      7697        .06            1.01           .831
    25   676      679      12701     14057        .05            1.00           .800
    30   961      964      21266     23192        .04            1.00           .778
    35  1296     1299      33006     35602        .04            1.00           .763
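Treating the last-row constants of Tables 5.1 and 5.4 as if they were asymptotic (they are in fact still drifting, so this is only a rough indication, and the calculation is ours rather than the authors'), the storage cross-over can be located numerically:

```python
import math

BMD_C = 4.09     # Total/(N log N), last row of Table 5.1
BAND_C = 0.763   # Total/(N * sqrt(N)), last row of Table 5.4

def band_minus_bmd(n):
    """Band storage minus BMD storage under the two fitted models."""
    return BAND_C * n * math.sqrt(n) - BMD_C * n * math.log(n)

# Bisection: band storage is smaller at N = 100, larger at N = 100000.
lo, hi = 100.0, 100000.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if band_minus_bmd(mid) > 0.0:
        hi = mid
    else:
        lo = mid
crossover = round(hi)
print(crossover)
```

With these last-row constants the cross-over lands somewhat below the N > 2000 figure quoted above, which was extrapolated from the trend of the ratios rather than from the final rows alone.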

It should be noted that the BMD ordering algorithm is implemented in about 30N
storage (i.e., linear in N). This is important because for N >= 1000 the ordering can
be done in the space used later for the factorization.
The entries in Table 5.5 suggest that the band ordering time is O(N^p) for
p = 1.05, and the solution time is O(N^2). A look at the operation counts for the
factorization for the BMD and band orderings in Tables 5.2 and 5.5 confirms
that the apparent differences in factorization times are indeed due to differences in
operation counts and not to program complexity.


Least squares approximations to the total execution times were found for the
BMD and band algorithms, using as basis functions the orders suggested in Tables
5.2 and 5.5. The results suggest that the BMD algorithm will execute faster than
the band algorithm for N >= 20,000. For N near 60,000 the results imply that the
BMD algorithm is twice as fast. Thus, our ordering/solution package is unlikely to
be attractive as a one-shot scheme.
However, in many situations involving mildly nonlinear and/or time depen-
dent problems, many matrix problems having the same structure, or even the
same coefficient matrix, must be solved. In these situations it makes sense to
ignore ordering time and compare the methods with respect to factorization time
or solution time. If we do this, we see from Table 5.6 that the cross-over point for
factorization time is at about N = 1,500, and for solution time the cross-over point is
about N = 2,200.

TABLE 5.5
Execution time in seconds on an IBM 360/75 for BAND program.

    n    N   Ordering  Ordering time/   Fact.   Fact.  Back-solve  Back-solve  Total soln.  Soln. time/  Ordering plus
             time      N^1.05*          mult.   time     mult.       time         time        N^2*        solution time
    5    36    .03       .697             610    .02       382        .01           .03        .0231           .06
    10  121    .11       .715            5445    .09      2112        .03           .12        .082            .23
    15  256    .23       .681           21880    .29      6192        .06           .35        .053            .58
    20  441    .41       .686           61040    .75     13622        .13           .88        .045           1.29
    25  676    .66       .705          137800   1.48     25402        .22          1.70        .037           2.36
    30  961    .93       .686          270785   2.73     42532        .38          3.11        .034           4.04
    35 1296   1.31       .706          482370   4.64     66012        .54          5.18        .031           6.49

* Scaled by 10^-3.

TABLE 5.6
Ratio of BMD/BAND for various quantities.

    n     N    Total time   Total storage   Fact. time   Soln. time
    5     36     10.33          2.01           2.00         3.00
    10   121      8.52          1.79           2.44         3.00
    15   256      7.34          1.55           2.10         3.33
    20   441      5.98          1.38           1.73         2.62
    25   676      5.22          1.23           1.59         2.45
    30   961      4.66          1.18           1.60         2.03
    35  1296      3.98          1.07           1.32         1.96

Concluding remarks. In terms of execution time, our numerical experiments
suggest that our ordering algorithm/solution package is attractive for "one-off"
problems only if N is extremely large. However, in terms of storage requirements,
and if only factorization and solution time is considered, our scheme looks
attractive compared to band oriented schemes if N is larger than a few thousand.


Our experiments suggest that for our class of finite element problems, the
ordering code executes in O(N log N) time, and the ordering produced for this
problem yields storage and operation counts of O(N log N) and O(N^{3/2}) respec-
tively. For the square mesh problem, these counts are known to be optimal in the
order of magnitude sense [4]. It is interesting to observe that the partitioning
produced by the minimum degree algorithm prescribes dissecting sets similar in
flavor to those for dissection orderings [3], [4]. This leads us to speculate whether
the minimum degree algorithm generates asymptotically optimal orderings for
general finite element matrix problems. Further research in this area seems
appropriate.

REFERENCES
[1] P. O. ARALDSEN, The application of the superelement method in analysis and design of ship
structures and machinery components, National Symposium on Computerized Structural
Analysis and Design, George Washington University, Washington, D.C., 1972.
[2] C. BERGE, The Theory of Graphs and its Applications, John Wiley, New York, 1962.
[3] GARRETT BIRKHOFF AND ALAN GEORGE, Elimination by nested dissection, Complexity of
Sequential and Parallel Numerical Algorithms, J. F. Traub, ed., Academic Press, New York, 1973, pp.
221-269.
[4] ALAN GEORGE, Nested dissection of a regular finite element mesh, this Journal, 10 (1973), pp.
345-363.
[5] N. E. GIBBS, W. G. POOLE AND P. K. STOCKMEYER, An algorithm for reducing the bandwidth
and profile of a sparse matrix, this Journal, 13 (1976), pp. 235-251.
[6] M. J. L. HUSSEY, R. W. THATCHER AND M. J. M. BERNAL, Construction and use of finite
elements, J. Inst. Math. Appl., 6 (1970), pp. 262-283.
[7] A. JENNINGS, A compact storage scheme for the solution of symmetric linear simultaneous
equations, Computer J., 9 (1966), pp. 281-285.
[8] D. E. KNUTH, The Art of Computer Programming, vol. I: Fundamental Algorithms, Addison-
Wesley, Reading, MA, 1968.
[9] S. V. PARTER, The use of linear graphs in Gauss elimination, SIAM Rev., 3 (1961), pp. 364-369.
[10] D. J. ROSE, A graph-theoretic study of the numerical solution of sparse positive definite systems of
linear equations, Graph Theory and Computing, R. C. Read, ed., Academic Press, New York,
1972.
[11] ANDREW H. SHERMAN, Yale sparse matrix package user's guide, Lawrence Livermore
Laboratory Rept. UCID-30114, Livermore, CA, 1975.
[12] B. SPEELPENNING, The generalized element method, unpublished manuscript.
[13] K. L. STEWART AND J. BATY, Dissection of structures, J. Struct. Div., ASCE, Proc. paper No.
4665 (1966), pp. 75-88.
[14] JAMES H. WILKINSON, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, England,
1965.
[15] O. C. ZIENKIEWICZ, The Finite Element Method in Engineering Science, McGraw-Hill, London,
1970.
