Updating The QR Factorization and The Least Squares Problem (2008)
ISSN 1749-9097
Updating the QR Factorization and the Least
Squares Problem
Sven Hammarling∗ Craig Lucas†
November 12, 2008
Abstract
In this paper we treat the problem of updating the QR factorization, with applications to the least squares problem. Algorithms are presented that compute the factorization Ã = Q̃R̃ where Ã is the matrix A = QR after it has had a number of rows or columns added or deleted. This is achieved by updating the factors Q and R, and we show this can be much faster than computing the factorization of Ã from scratch. We consider algorithms that exploit the Level 3 BLAS where possible and place no restriction on the dimensions of A or the number of rows and columns added or deleted. For some of our algorithms we present Fortran 77 LAPACK-style code and show the backward error of our updated factors is comparable to the error bounds of the QR factorization of Ã.
1 Introduction
1.1 The QR Factorization
For A ∈ Rm×n the QR factorization is given by
A = QR, (1.1)
1.1.1 Computing the QR Factorization
We compute the QR factorization of A ∈ Rm×n , m ≥ n and of full rank, by
applying an orthogonal transformation matrix QT so that
QT A = R,
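\[
G(i,j) =
\begin{bmatrix}
I &    &   &   &   \\
  & c  &   & s &   \\
  &    & I &   &   \\
  & -s &   & c &   \\
  &    &   &   & I
\end{bmatrix}
\qquad (c \text{ and } s \text{ lie in rows and columns } i \text{ and } j)
\]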
so only the ith and jth elements are affected. We can compute c and s by the
following algorithm.
  if abs(b) ≥ abs(a)
    t = −a/b
    s = 1/√(1 + t²)
    c = st
  else
    t = −b/a
    c = 1/√(1 + t²)
    s = ct
  end
end
Here the computation of c and s has been rearranged to avoid possible overflow.
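The computation above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `givens` mirrors the pseudocode, and the guard for b = 0 is an assumption added here so that the division is always defined.

```python
import math

def givens(a, b):
    """Return (c, s) with c*c + s*s == 1 such that s*a + c*b == 0."""
    if b == 0.0:
        # Nothing to eliminate (guard assumed for this sketch).
        return 1.0, 0.0
    if abs(b) >= abs(a):
        t = -a / b
        s = 1.0 / math.sqrt(1.0 + t * t)
        c = s * t
    else:
        t = -b / a
        c = 1.0 / math.sqrt(1.0 + t * t)
        s = c * t
    return c, s

c, s = givens(3.0, 4.0)
d = c * 3.0 - s * 4.0      # first component after applying the rotation
zero = s * 3.0 + c * 4.0   # second component, which the rotation eliminates
```

Dividing by the entry of larger magnitude keeps |t| ≤ 1, so 1 + t² cannot overflow even when a or b is very large.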
Now to transform A to an upper trapezoidal matrix we require a Givens
matrix for each subdiagonal element of A, and apply each one in a suitable order
such as
which is a QR factorization.
The matrix Q need not be formed explicitly. It is possible to encode c and s
in a single scalar (see [9, Sec. 5.1.11]), which can then be stored in the eliminated
aij .
The primary use of Givens matrices is to eliminate particular elements in a matrix. A more efficient approach for a QR factorization is to use Householder matrices which introduce zeros in all the subdiagonal elements of a column simultaneously.
Householder matrices, H ∈ R^{n×n}, are of the form
\[
H = I - \tau v v^T, \qquad \tau = \frac{2}{v^T v},
\]
where the Householder vector, v ∈ R^n, is nonzero. It is easy to see that H is symmetric and orthogonal. If y and z are distinct vectors such that ‖y‖₂ = ‖z‖₂ then there exists an H such that
Hy = z.
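where x ∈ R^{n−1}, and α and β are scalars. By setting
\[
v = \begin{bmatrix} \alpha \\ x \end{bmatrix} \pm \left\| \begin{bmatrix} \alpha \\ x \end{bmatrix} \right\|_2 e_1. \tag{1.2}
\]
We then have
\[
H \begin{bmatrix} \alpha \\ x \end{bmatrix} = \mp \left\| \begin{bmatrix} \alpha \\ x \end{bmatrix} \right\|_2 e_1.
\]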
If we choose the sign in (1.2) to be negative then β is positive. However, if [ α x^T ] is close to a positive multiple of e1, then this can give large cancellation error. So we use the formula [14]
\[
v_1 = \alpha - \| [\, \alpha \;\; x^T \,] \|_2
    = \frac{\alpha^2 - \| [\, \alpha \;\; x^T \,] \|_2^2}{\alpha + \| [\, \alpha \;\; x^T \,] \|_2}
    = \frac{-\| x \|_2^2}{\alpha + \| [\, \alpha \;\; x^T \,] \|_2}
\]
to avoid this in the case when α > 0. This is adopted in the following algorithm.
Algorithm 1.2 For α ∈ R and x ∈ R^{n−1} this function returns a vector v ∈ R^{n−1} and a scalar τ such that ṽ = [1; v] is a Householder vector, scaled so ṽ(1) = 1, and H = I − τ [1; v] [ 1 v^T ] is orthogonal, with H [α; x] = [β; 0], where β ∈ R.
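A function meeting this specification can be sketched in Python as below. It is an illustration built from formula (1.2) and the cancellation-free expression for v1 given above, not a transcription of the authors' algorithm; the helper `apply_householder` and the convention τ = 0 when x = 0 are assumptions of this sketch.

```python
import math

def householder(alpha, x):
    """Return (v, tau) so that H = I - tau*[1; v]*[1, v^T] maps [alpha; x] to [beta; 0]."""
    sigma = sum(xi * xi for xi in x)
    if sigma == 0.0:
        # x is already zero: take H = I (assumed convention for this sketch).
        return [0.0] * len(x), 0.0
    norm = math.sqrt(alpha * alpha + sigma)
    if alpha <= 0.0:
        v1 = alpha - norm
    else:
        # Rearranged form that avoids cancellation when alpha > 0.
        v1 = -sigma / (alpha + norm)
    tau = 2.0 * v1 * v1 / (sigma + v1 * v1)
    v = [xi / v1 for xi in x]   # scale so the leading entry of [1; v] is 1
    return v, tau

def apply_householder(v, tau, y):
    """Compute H*y for H = I - tau*[1; v]*[1, v^T] without forming H."""
    vt = [1.0] + v
    w = sum(a * b for a, b in zip(vt, y))
    return [yi - tau * w * vi for yi, vi in zip(y, vt)]

v, tau = householder(3.0, [4.0])
result = apply_householder(v, tau, [3.0, 4.0])
```

Applying H through the inner product w costs a number of operations proportional to the vector length, which is why H is never formed explicitly.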
Thus if we apply n Householder matrices, Hj, to introduce zeros in the subdiagonal columns one by one, we have the QR factorization
H_n . . . H_1 A = Q^T A = R,
and the Hj are such that their vectors vj are of the form
vj(1: j − 1) = 0,
vj(j) = 1,
vj(j + 1: m) : as v in Algorithm 1.2.
The algorithm requires 2n²(m − n/3) flops. The essential part of the Householder vectors can be stored in the subdiagonal, and we refer to Q being in factored form.
If Q is to be formed explicitly we can do so with the backward accumulation method by computing
(H_1 . . . (H_{n−2}(H_{n−1}H_n))).
This exploits the fact that the leading (j − 1)-by-(j − 1) part of Hj is the identity.
1.2 The Least Squares Problem
The linear system,
Ax = b,
where A ∈ R^{m×n}, x ∈ R^n and b ∈ R^m, is overdetermined if m ≥ n. We can solve the least squares problem
\[
\min_x \| Ax - b \|_2,
\]
with A having full rank. We then have with the QR factorization A = QR and with d = Q^T b,
g(ti) ≈ bi , i = 1: m.
The value bi has been observed at time ti. We wish to find the function g that approximates the value bi. In least squares fitting we restrict ourselves to functions of the form
g(t) = x1 g1(t) + x2 g2(t) + · · · + xn gn(t),
where the functions gi(t) we call basis functions, and the coefficients xi are to be determined. We find the coefficients by solving the least squares problem with
\[
A = \begin{bmatrix}
g_1(t_1) & g_2(t_1) & \cdots & g_n(t_1) \\
g_1(t_2) & g_2(t_2) & \cdots & g_n(t_2) \\
\vdots   & \vdots   &        & \vdots   \\
g_1(t_m) & g_2(t_m) & \cdots & g_n(t_m)
\end{bmatrix},
\qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
\]
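As a small illustration of how this matrix is assembled, the sketch below builds A for a quadratic fit. The basis functions 1, t, t² and the sample times are hypothetical choices made for this example only.

```python
def design_matrix(basis, times):
    """Row i holds g_1(t_i), ..., g_n(t_i), one row per observation time."""
    return [[g(t) for g in basis] for t in times]

# Hypothetical basis g_1(t) = 1, g_2(t) = t, g_3(t) = t^2.
basis = [lambda t: 1.0, lambda t: t, lambda t: t * t]
times = [0.0, 1.0, 2.0, 3.0]
A = design_matrix(basis, times)
```

Each observation contributes one row and each basis function one column, which is why adding or deleting observations changes rows of A while adding or deleting variables changes columns.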
Now, it may be required to update the least squares solution in the case where
one or more observations (rows of A) are added or deleted. For instance we could
have a sliding window where for each new observation recorded the oldest one is
deleted. The observations for a particular time period may be found to be faulty,
thus a block of rows of A would need to be deleted. Also, variables (columns of
A) may be added or omitted to compare the different solutions. Updating after
rows and columns have been deleted is also known as downdating.
To solve these updated least squares problems efficiently we have the problem of updating the QR factorization efficiently, that is we wish to find Ã = Q̃R̃, where Ã is the updated A, without recomputing the factorization from scratch. We assume that Ã has full rank. We also need to compute d̃ such that
‖Ãx − b̃‖ = ‖R̃x − d̃‖,
where b̃ is the updated b corresponding to Ã and d̃ = Q̃^T b̃.
2 Updating Algorithms
In this section we will examine all the cases where observations and variables are added to or deleted from the least squares problem. We derive algorithms for updating the solution of the least squares problem by updating the QR factorization of Ã, in the case m ≥ n. For completeness we have also included discussion and algorithms for updating the QR factorization only when m < n. In all cases we give algorithms for computing Q̃ should it be required. We will assume that A and Ã have full rank.
Where possible we derive blocked algorithms to exploit the Level 3 BLAS and
existing Level 3 LAPACK routines. We include LAPACK style Fortran 77 code
for updating the QR factorization in the cases of adding and deleting blocks of
columns.
For clarity the sines and cosines for Givens matrices and the Householder
vectors are stored in separate vectors and matrices, but could be stored in the
elements they eliminate. Wherever possible new data overwrites original data.
All unnecessary computations have been avoided, unless otherwise stated.
We give floating point operation counts for our algorithms and compare them to the counts for the Householder QR factorization of Ã.
Some of the material is based on material in [1] and [9].
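the kth row, a_k^T. We can write
\[
\widetilde{A} = \begin{bmatrix} A(1:k-1,\, 1:n) \\ A(k+1:m,\, 1:n) \end{bmatrix}
\]
and we interpret A(1: 0, 1: n) and A(m + 1: m, 1: n) as empty rows. We define a permutation matrix P such that
\[
PA = \begin{bmatrix} a_k^T \\ A(1:k-1,\, 1:n) \\ A(k+1:m,\, 1:n) \end{bmatrix}
   = \begin{bmatrix} a_k^T \\ \widetilde{A} \end{bmatrix} = PQR,
\]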
and if q^T is the first row of PQ then we can zero q(2: m) with m − 1 Givens matrices, G(i, j) ∈ R^{m×m}, so that
G(1, 2)^T . . . G(m − 1, m)^T q = αe1, |α| = 1, (2.1)
since the Givens matrices are orthogonal. And we also have
\[
G(1,2)^T \cdots G(m-1,m)^T R = \begin{bmatrix} v^T \\ \widetilde{R} \end{bmatrix},
\]
which is upper Hessenberg, so R̃ is upper trapezoidal.
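So we have finally
\[
PA = \begin{bmatrix} a_k^T \\ \widetilde{A} \end{bmatrix}
   = \bigl(PQ\,G(m-1,m) \cdots G(1,2)\bigr)\bigl(G(1,2)^T \cdots G(m-1,m)^T R\bigr)
   = \begin{bmatrix} \alpha & 0 \\ 0 & \widetilde{Q} \end{bmatrix}
     \begin{bmatrix} v^T \\ \widetilde{R} \end{bmatrix},
\]
and
Ã = Q̃R̃.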
Note that the zero column below α is forced by orthogonality. Also note the choice of a sequence of Givens matrices over one Householder matrix. If we were to use a Householder matrix then the transformed R would be full, as H is full, and not upper Hessenberg. We update b by computing
\[
G(1,2)^T \cdots G(m-1,m)^T Q^T P b = \begin{bmatrix} \nu \\ \tilde{d} \end{bmatrix}.
\]
This gives the following algorithm.
q^T = Q(k, 1: m)
if k ≠ 1
  % Permute b
  b(2: k) = b(1: k − 1)
end
d = Q^T b
for j = m − 1: −1: 1
  [c(j), s(j)] = givens(q(j), q(j + 1))
  % Update q
  q(j) = c(j)q(j) − s(j)q(j + 1)
  % Update R if there is a nonzero row
  if j ≤ n
    R(j: j + 1, j: n) = [c(j) s(j); −s(j) c(j)]^T R(j: j + 1, j: n)
  end
  % Update d
  d(j: j + 1) = [c(j) s(j); −s(j) c(j)]^T d(j: j + 1)
end
R̃ = R(2: m, 1: n)
d̃ = d(2: m)
% Compute the residual
resid = ‖d̃(n + 1: m − 1)‖₂
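The update of R in the loop above can be sketched in Python as follows. This is an illustrative translation using 0-based indices and nested lists; the function name `delete_row`, the b = 0 guard in `givens`, and the small test matrices are assumptions of this sketch, and the update of b and d is omitted for brevity.

```python
import math

def givens(a, b):
    """Return (c, s) with s*a + c*b == 0 (b == 0 guard assumed for this sketch)."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) >= abs(a):
        t = -a / b
        s = 1.0 / math.sqrt(1.0 + t * t)
        return s * t, s
    t = -b / a
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, c * t

def delete_row(Q, R, k):
    """Given A = Q*R (Q m-by-m, R m-by-n), return R-tilde for A without row k (0-based)."""
    m, n = len(R), len(R[0])
    q = list(Q[k])                 # the row of Q that must be rotated to alpha*e1
    R = [row[:] for row in R]      # work on a copy
    for j in range(m - 2, -1, -1):
        c, s = givens(q[j], q[j + 1])
        q[j] = c * q[j] - s * q[j + 1]
        q[j + 1] = 0.0
        # Rows j and j+1 of R are zero to the left of column j; if j >= n both are zero.
        for col in range(j, n):
            r1, r2 = R[j][col], R[j + 1][col]
            R[j][col] = c * r1 - s * r2
            R[j + 1][col] = s * r1 + c * r2
    return R[1:]                   # drop the first row, which holds v^T

# Small hypothetical example: Q is a plane rotation, R is upper trapezoidal.
Q = [[0.6, 0.8, 0.0], [-0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
R = [[5.0, 1.0], [0.0, 2.0], [0.0, 0.0]]
R_new = delete_row(Q, R, 2)
```

One way to check such an update is to compare R̃^T R̃ with Ã^T Ã, since the two agree whenever Ã = Q̃R̃ with Q̃ orthogonal.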
Computing R̃ requires 3n² flops, versus 2n²(m − n/3) for the Householder QR factorization of Ã. If Q̃ is required, it can be computed with the following algorithm.
Algorithm 2.2 Given vectors c and s from Algorithm 2.1 this algorithm forms an orthogonal matrix Q̃ ∈ R^{(m−1)×(m−1)} such that Ã = Q̃R̃, where Ã is the matrix A = QR with the kth row deleted.
if k ≠ 1
  % Permute Q
  Q(2: k, 1: m) = Q(1: k − 1, 1: m)
end
for j = m − 1: −1: 2
  Q(2: m, j: j + 1) = Q(2: m, j: j + 1) [c(j) s(j); −s(j) c(j)]
end
% Do not need to update 1st column of Q
Q(2: m, 2) = s(1)Q(2: m, 1) + c(1)Q(2: m, 2)
Q̃ = Q(2: m, 2: m)
W = Q(k: k + p − 1, 1: m)
if k ≠ 1
  % Permute b
  b(p + 1: k + p − 1) = b(1: k − 1)
end
d = Q^T b
for i = 1: p
  for j = m − 1: −1: i
    [C(i, j), S(i, j)] = givens(W(i, j), W(i, j + 1))
    % Update W
    W(i, j) = W(i, j)C(i, j) − W(i, j + 1)S(i, j)
    W(i + 1: p, j: j + 1) = W(i + 1: p, j: j + 1) [C(i, j) S(i, j); −S(i, j) C(i, j)]
    % Update R if there is a nonzero row
    if j ≤ n + i − 1
      R(j: j + 1, j − i + 1: n) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T R(j: j + 1, j − i + 1: n)
    end
    % Update d
    d(j: j + 1) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T d(j: j + 1)
  end
end
R̃ = R(p + 1: m, 1: n)
d̃ = d(p + 1: m)
% Compute the residual
resid = ‖d̃(n + 1: m − p)‖₂
2.2 Alternative Methods for Deleting Rows
A^T A = R^T Q^T Q R = R^T R,
Thus if we find R̃, such that R̃^T R̃ = Ã^T Ã, then we have computed R̃ for Ã being A with the kth row deleted. This can be achieved with hyperbolic transformations.
We define W ∈ Rm×m as pseudo-orthogonal with respect to the signature
matrix
J = diag(±1) ∈ Rm×m
if
W T JW = J.
If we transform a matrix with W we say that this is a hyperbolic transformation.
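Now from (2.2) we have
\[
\widetilde{A}^T \widetilde{A} = A^T A - a_k a_k^T
 = R^T R - a_k a_k^T
 = [\, R^T \;\; a_k \,]
   \begin{bmatrix} I_n & 0 \\ 0 & -1 \end{bmatrix}
   \begin{bmatrix} R \\ a_k^T \end{bmatrix},
\]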
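is upper trapezoidal. It follows that
\[
\widetilde{A}^T \widetilde{A}
 = [\, R^T \;\; a_k \,]\, W^T J W \begin{bmatrix} R \\ a_k^T \end{bmatrix}
 = [\, \widetilde{R}^T \;\; 0 \,]\, J \begin{bmatrix} \widetilde{R} \\ 0 \end{bmatrix}
 = \widetilde{R}^T \widetilde{R},
\]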
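\[
W(i, n+1) =
\begin{bmatrix}
I &    &   &    \\
  & c  &   & -s \\
  &    & I &    \\
  & -s &   & c
\end{bmatrix}
\qquad (c \text{ and } s \text{ lie in rows and columns } i \text{ and } n+1)
\]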
Algorithm 2.5 This algorithm generates scalars c and s such that [c −s; −s c] [x1; x2] = [y; 0], where x1, x2 and y are scalars and c² − s² = 1, if a solution exists.
if x2 = 0
  s = 0
  c = 1
else
  if |x2| < |x1|
    t = x2/x1
    c = 1/√(1 − t²)
    s = ct
  else
    no solution exists
  end
end
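Algorithm 2.5 translates almost line for line into Python. The sketch below is illustrative; the function name `hyperbolic` and the choice to raise `ValueError` when no solution exists are assumptions made for this example.

```python
import math

def hyperbolic(x1, x2):
    """Return (c, s) with c*c - s*s == 1 and -s*x1 + c*x2 == 0, if they exist."""
    if x2 == 0.0:
        return 1.0, 0.0
    if abs(x2) < abs(x1):
        t = x2 / x1
        c = 1.0 / math.sqrt(1.0 - t * t)
        s = c * t
        return c, s
    # Signalling failure with an exception is a choice made for this sketch.
    raise ValueError("no hyperbolic rotation exists when |x2| >= |x1|")

c, s = hyperbolic(5.0, 3.0)
y = c * 5.0 - s * 3.0       # surviving component
zero = -s * 5.0 + c * 3.0   # eliminated component
```

Unlike a Givens rotation, c grows without bound as |x2| approaches |x1|, which gives some intuition for why hyperbolic transformations need more care numerically.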
with
\[
c = \frac{x_1}{\sqrt{x_1^2 + x_2^2}}, \qquad s = \frac{-x_2}{\sqrt{x_1^2 + x_2^2}}.
\]
Now suppose we know x̃1 and want to recreate the vector x, then rearranging (2.4) we have
with
\[
c = \frac{\sqrt{\tilde{x}_1^2 - x_2^2}}{\tilde{x}_1}, \qquad s = \frac{-x_2}{\tilde{x}_1}.
\]
Thus we can recreate the steps that would have updated R̃ had we added a_k^T to Ã, instead of deleting it from A. At the ith step, for i = 1: n + 1, with
x1 = R(i, i),
e i),
x̃1 = R(i,
(i−1)
x2 = ak (i),
we compute, for j = i: n + 1:
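can be written
\[
a_k^T = q^T \begin{bmatrix} R_1 \\ 0 \end{bmatrix}
      = [\, q_1^T \;\; q_2^T \,] \begin{bmatrix} R_1 \\ 0 \end{bmatrix},
\]
where q1 ∈ R^n. We compute q1 by solving
R1^T q1 = a_k,
and update R by
\[
G(1,2)^T \cdots G(n,n+1)^T R = \begin{bmatrix} v^T \\ \widetilde{R} \end{bmatrix}.
\]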
This algorithm is implemented in LINPACK’s xCHDD.
2.2.4 Stability Issues
Stewart [20] shows that hyperbolic transformations are not backward stable. However, Chambers' and Saunders' algorithms are relationally stable [3], [19], that is if W represents the product of all the transformations then
\[
W^T R = \begin{bmatrix} v^T \\ \widetilde{R} \end{bmatrix} + E,
\]
where
‖E‖ ≤ c_n u ‖R‖,
and c_n is a constant that depends on n.
Saunders' algorithm can fail for certain data, see [2].
and then
\[
\begin{bmatrix} Q^T & 0 \\ 0 & 1 \end{bmatrix} P \widetilde{A} = \begin{bmatrix} R \\ u^T \end{bmatrix}. \tag{2.5}
\]
For example, with m = 8 and n = 6 the right-hand side of (2.5) looks like:
+ + + + + +
0 + + + + +
0 0 + + + +
0 0 0 + + +
0 0 0 0 + +
,
0 0 0 0 0 +
0 0 0 0 0 0
0 0 0 0 0 0
⊖ ⊖ ⊖ ⊖ ⊖ ⊖
so we have
\[
\widetilde{A} = P^T \begin{bmatrix} Q & 0 \\ 0 & 1 \end{bmatrix} G(1, m+1) \cdots G(n, m+1)\, \widetilde{R} = \widetilde{Q}\widetilde{R}.
\]
where µ is the element inserted into b corresponding to u^T. This gives the following algorithm.
d = QT b
for j = 1: n
[c(j), s(j)] = givens(R(j, j), u(j))
R(j, j) = c(j)R(j, j) − s(j)u(j)
% Update jth row of R and u
t1 = R(j, j + 1: n)
t2 = u(j + 1: n)
R(j, j + 1: n) = c(j)t1 − s(j)t2
u(j + 1: n) = s(j)t1 + c(j)t2
% Update jth row of d and µ
t1 = d(j)
t2 = µ
d(j) = c(j)t1 − s(j)t2
µ = s(j)t1 + c(j)t2
end
R̃ = [R; 0]
d̃ = [d; µ]
% Compute the residual
resid = ‖d̃(n + 1: m + 1)‖₂
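The R part of this update can be sketched in Python as below. It is an illustration rather than the authors' code: the name `add_row` is hypothetical, only the leading n rows of R are passed in, and the update of d and µ is left out.

```python
import math

def givens(a, b):
    """Return (c, s) with s*a + c*b == 0 (b == 0 guard assumed for this sketch)."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) >= abs(a):
        t = -a / b
        s = 1.0 / math.sqrt(1.0 + t * t)
        return s * t, s
    t = -b / a
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, c * t

def add_row(R, u):
    """Update the n-by-n upper triangular R after the row u^T is appended to A."""
    n = len(u)
    R = [row[:] for row in R]   # work on copies
    u = list(u)
    for j in range(n):
        c, s = givens(R[j][j], u[j])
        # Rotate row j of R against u; column j of u becomes zero.
        for col in range(j, n):
            t1, t2 = R[j][col], u[col]
            R[j][col] = c * t1 - s * t2
            u[col] = s * t1 + c * t2
    return R

R_new = add_row([[3.0, 1.0], [0.0, 2.0]], [4.0, 1.0])
```

Since only row j of R and the vector u are touched at step j, each step costs a number of operations proportional to n − j, consistent with a count of order n² overall.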
Computing R̃ requires 3n² flops, versus 2n²(m − n/3) for the Householder QR factorization of Ã. If Q̃ is required, it can be computed with the following algorithm.
Algorithm 2.7 Given vectors c and s from Algorithm 2.6 this algorithm forms an orthogonal matrix Q̃ ∈ R^{(m+1)×(m+1)} such that Ã = Q̃R̃, where Ã is the matrix A = QR with a row added in the kth position.
Set Q̃ = [Q 0; 0 1]
if k ≠ m + 1
  % Permute Q
  Q = [Q(1: k − 1, 1: n); Q(m + 1, 1: n); Q(k: m, 1: n)]
end
for j = 1: n
  t1 = Q̃(1: m + 1, j)
  t2 = Q̃(1: m + 1, m + 1)
  Q̃(1: m + 1, j) = c(j)t1 − s(j)t2
  Q̃(1: m + 1, m + 1) = s(j)t1 + c(j)t2
end
2.3.2 Adding a Block of Rows
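To add a block of p observations to our least squares problem we add a block of p rows, U ∈ R^{p×n}, in the kth to (k + p − 1)st positions, k = 1: m + 1, of A = QR ∈ R^{m×n}, m ≥ n. We can then write
\[
\widetilde{A} = \begin{bmatrix} A(1:k-1,\, 1:n) \\ U \\ A(k:m,\, 1:n) \end{bmatrix}
\]
and
\[
\begin{bmatrix} Q^T & 0 \\ 0 & I_p \end{bmatrix} P \widetilde{A} = \begin{bmatrix} R \\ U \end{bmatrix}. \tag{2.6}
\]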
For example, with m = 8, n = 6 and p = 3 the right-hand side of Equation (2.6)
looks like:
+ + + + + +
0 + + + + +
0 0 + + + +
0 0 0 + + +
0 0 0 0 + +
0 0 0 0 0 + ,
0 0 0 0 0 0
0 0 0 0 0 0
⊖ ⊖ ⊖ ⊖ ⊖ ⊖
⊖ ⊖ ⊖ ⊖ ⊖ ⊖
⊖ ⊖ ⊖ ⊖ ⊖ ⊖
The Householder matrix, Hj ∈ R^{(m+p)×(m+p)}, will zero the jth column of U. Its associated Householder vector, vj ∈ R^{(m+p)}, is such that
vj(1: j − 1) = 0,
vj(j) = 1,
vj(j + 1: m) = 0,
vj(m + 1: m + p) = x/(r_jj − ‖[ r_jj x^T ]‖₂), where x = U(1: p, j). (2.7)
d = Q^T b
for j = 1: n
  [V(1: p, j), τ(j)] = householder(R(j, j), U(1: p, j))
  % Remember old jth row of R
  Rj = R(j, j + 1: n)
  % Update jth row of R
  R(j, j: n) = (1 − τ(j))R(j, j: n) − τ(j)V(1: p, j)^T U(1: p, j: n)
  % Update trailing part of U
  if j < n
    U(1: p, j + 1: n) = U(1: p, j + 1: n) − τ(j)V(1: p, j)Rj
                        − τ(j)V(1: p, j)(V(1: p, j)^T U(1: p, j + 1: n))
  end
  % Remember old jth element of d
  dj = d(j)
  % Update jth element of d
  d(j) = (1 − τ(j))d(j) − τ(j)V(1: p, j)^T e(1: p)
  % Update e
  e(1: p) = e(1: p) − τ(j)V(1: p, j)dj
            − τ(j)V(1: p, j)(V(1: p, j)^T e(1: p))
end
R̃ = [R; 0]
d̃ = [d; e]
% Compute the residual
resid = ‖d̃(n + 1: m + p)‖₂
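The unblocked update of R and U can be sketched in Python as follows. This is an illustration only: the names `householder` and `add_rows` follow the pseudocode but the implementation is an assumption, the right-hand side (d and e) is omitted, and the two update formulas are folded into one inner product w per column.

```python
import math

def householder(alpha, x):
    """Return (v, tau) so that I - tau*[1; v]*[1, v^T] maps [alpha; x] to [beta; 0]."""
    sigma = sum(xi * xi for xi in x)
    if sigma == 0.0:
        return [0.0] * len(x), 0.0   # assumed convention: H = I when x = 0
    norm = math.sqrt(alpha * alpha + sigma)
    v1 = alpha - norm if alpha <= 0.0 else -sigma / (alpha + norm)
    tau = 2.0 * v1 * v1 / (sigma + v1 * v1)
    return [xi / v1 for xi in x], tau

def add_rows(R, U):
    """Update the n-by-n upper triangular R after the p rows of U are appended to A."""
    n, p = len(R[0]), len(U)
    R = [row[:] for row in R]   # work on copies
    U = [row[:] for row in U]
    for j in range(n):
        v, tau = householder(R[j][j], [U[i][j] for i in range(p)])
        for col in range(j, n):
            # w = [1, v^T] * [R(j, col); U(:, col)]
            w = R[j][col] + sum(v[i] * U[i][col] for i in range(p))
            R[j][col] -= tau * w
            for i in range(p):
                U[i][col] -= tau * w * v[i]
    return R

R_new = add_rows([[3.0, 1.0], [0.0, 2.0]], [[4.0, 1.0], [0.0, 2.0]])
```

Only row j of R and the p rows of U enter each reflection, because the Householder vector is zero in positions j + 1 to m.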
Computing R̃ requires 2n²p flops, versus 2n²(m + p − n/3) for the Householder QR factorization of Ã. If Q̃ is required, it can be computed with the following algorithm.
Algorithm 2.9 Given the matrix V and vector τ from Algorithm 2.8 this algorithm forms an orthogonal matrix Q̃ ∈ R^{(m+p)×(m+p)} such that Ã = Q̃R̃, where Ã is the matrix A = QR with a block of rows inserted in the kth to (k + p − 1)st positions.
Set Q̃ = [Q 0; 0 I]
if k ≠ m + 1
  % Permute Q̃
  Q̃ = [Q̃(1: k − 1, 1: m + p); Q̃(m + 1: m + p, 1: m + p); Q̃(k: m, 1: m + p)]
end
for j = 1: n
  % Remember jth column of Q̃
  Q̃k = Q̃(1: m + p, j)
  % Update jth column
  Q̃(1: m + p, j) = Q̃(1: m + p, j)(1 − τ(j))
                   − Q̃(1: m + p, m + 1: m + p)τ(j)V(1: p, j)
  % Update columns m + 1: m + p of Q̃
  Q̃(1: m + p, m + 1: m + p) = Q̃(1: m + p, m + 1: m + p)
                               − τ(j)Q̃k V(1: p, j)^T
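where V_2 ∈ R^{p×nb} hold the essential part of the Householder vectors for the current block column. Then
\[
[\, I - V T V^T \,]^T
\begin{bmatrix} R_{23} \\ R_{33} \\ 0 \\ U_{13} \end{bmatrix}
=
\left( I_{m+p-r} -
\begin{bmatrix} I_{n_b} \\ 0 \\ 0 \\ V_2 \end{bmatrix}
T^T \, [\, I_{n_b} \;\; 0 \;\; 0 \;\; V_2^T \,] \right)
\begin{bmatrix} R_{23} \\ R_{33} \\ 0 \\ U_{13} \end{bmatrix}
=
\begin{bmatrix}
(I_{n_b} - T^T) R_{23} - T^T V_2^T U_{13} \\
R_{33} \\
0 \\
-V_2 T^T R_{23} + (I - V_2 T^T V_2^T) U_{13}
\end{bmatrix},
\]
This approach leads to a blocked algorithm, where at the kth stage we factorize [ R_22^T 0 U_12^T ]^T, where R_22 ∈ R^{nb×nb} and U_12 ∈ R^{p×nb}, then update R_23 ∈ R^{nb×(n−k·nb)} and U_13 ∈ R^{p×(n−k·nb)} as above. And to update Q^T b = d we compute
\[
\begin{bmatrix} d \\ e \end{bmatrix}
=
\begin{bmatrix}
d(1:(k-1)n_b) \\
(I_{n_b} - T^T)\, d((k-1)n_b+1 : k n_b) - T^T V_2^T e \\
d(k n_b + 1 : m) \\
-V_2 T^T d((k-1)n_b+1 : k n_b) + (I - V_2 T^T V_2^T)\, e
\end{bmatrix}.
\]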
d = Q^T b
for k = 1: nb: n
  % Check for the last column block
  jb = min(nb, n − k + 1)
  Factorize current block with Algorithm 2.8 where
  V is V(1: p, k: k + jb − 1)
  % If we are not in last block column build T
  % and update trailing matrix
  if k + jb ≤ n
    for j = k: k + jb − 1
      % Build T
      if j = k
        T(1, 1) = τ(j)
      else
        T(1: j − k, j − k + 1) = −τ(j)T(1: j − k, 1: j − k)
                                 ∗ V(1: p, k: j − 1)^T V(1: p, j)
        T(j − k + 1, j − k + 1) = τ(j)
      end
    end
    % Compute products we use more than once
    TV = T^T V(1: p, k: k + jb − 1)^T
    Te = TV e
    TU = TV U(1: p, k + jb: n)
    % Remember old d and e
    dk = d(k: k + jb − 1)
    ek = e
    % Update d and e
    d(k: k + jb − 1) = dk − T^T dk − Te
    e = −V(1: p, k: k + jb − 1)T^T dk + ek
        − V(1: p, k: k + jb − 1)Te
    % Remember old trailing parts of R and U
    Rk = R(k: k + jb − 1, k + jb: n)
    Uk = U(1: p, k + jb: n)
    % Update trailing parts of R and U
    R(k: k + jb − 1, k + jb: n) = Rk − T^T Rk − TU
    U(1: p, k + jb: n) = −V(1: p, k: k + jb − 1)T^T Rk + Uk
                         − V(1: p, k: k + jb − 1)TU
  end
end
R̃ = [R; 0]
d̃ = [d; e]
% Compute the residual
resid = ‖d̃(n + 1: m + p)‖₂
2.3.3 Updating the QR Factorization for any m and n
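In the case where m < n after m steps of Algorithm 2.8 we have
\[
\widetilde{A} = P^T \begin{bmatrix} Q & 0 \\ 0 & I_p \end{bmatrix} H_1 \cdots H_n
\begin{bmatrix} R_{11} & R_{22} \\ 0 & V \end{bmatrix},
\]
where R_11 is upper triangular and V is the transformed U(1: p, m + 1: n). Thus if we compute the QR factorization V = Q_V R_V, we then have
\[
\widetilde{A} = P^T \begin{bmatrix} Q & 0 \\ 0 & I_p \end{bmatrix} H_1 \cdots H_n
\begin{bmatrix} I_m & 0 \\ 0 & Q_V^T \end{bmatrix} \widetilde{R} = \widetilde{Q}\widetilde{R}.
\]
This gives us the following algorithms to update the QR factorization for any m and n.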
lim = min(m, n)
for j = 1: lim
  [V(1: p, j), τ(j)] = householder(R(j, j), U(1: p, j))
  % Remember old jth row of R
  Rk = R(j, j + 1: n)
  % Update jth row of R
  R(j, j: n) = (1 − τ(j))R(j, j: n) − τ(j)V(1: p, j)^T U(1: p, j: n)
  % Update trailing part of U
  if j < n
    U(1: p, j + 1: n) = U(1: p, j + 1: n) − τ(j)V(1: p, j)Rk
                        − τ(j)V(1: p, j)(V(1: p, j)^T U(1: p, j + 1: n))
  end
end
R̃ = [R; 0]
if m < n
  Perform the QR factorization U(: , m + 1: n) = Q_U R_U
  R̃(m + 1: m + p, m + 1: n) = R_U
end
Algorithm 2.12 Given the matrices V and Q_U and vector τ from Algorithm 2.11 this algorithm forms an orthogonal matrix Q̃ ∈ R^{(m+p)×(m+p)} such that Ã = Q̃R̃, where Ã is the matrix A = QR with a block of rows inserted in the kth to (k + p − 1)st positions.
Set Q̃ = [Q 0; 0 I]
if k ≠ m + 1
  % Permute Q̃
  Q̃ = [Q̃(1: k − 1, 1: m + p); Q̃(m + 1: m + p, 1: m + p); Q̃(k: m, 1: m + p)]
end
lim = min(m, n)
for j = 1: lim
  % Remember jth column of Q̃
  Q̃k = Q̃(1: m + p, j)
  % Update jth column
  Q̃(1: m + p, j) = Q̃(1: m + p, j)(1 − τ(j))
                   − Q̃(1: m + p, m + 1: m + p)τ(j)V(1: p, j)
  % Update columns m + 1: m + p of Q̃
  Q̃(1: m + p, m + 1: m + p) = Q̃(1: m + p, m + 1: m + p)
                               − τ(j)Q̃k V(1: p, j)^T
                               − τ(j)(Q̃(1: m + p, m + 1: m + p)V(1: p, j))V(1: p, j)^T
end
if m < n
  Q̃(1: m + p, m + 1: m + p) = Q̃(1: m + p, m + 1: m + p)Q_U
end
For example, with m = 8, n = 6 and k = 3 the right-hand side of Equation (2.10)
looks like:
+ + + + +
0 + + + +
0 0 + + +
0 0 ⊖ + +
,
0 0 0 ⊖ +
0 0 0 0 ⊖
0 0 0 0 0
0 0 0 0 0
with the nonzero elements to remain represented with a + and the elements to
be eliminated are shown with a ⊖.
Thus we can define n − k Givens matrices, G(i, j) ∈ R^{m×m}, to eliminate the subdiagonal elements of Q^TÃ to give
(G(n, n + 1)^T . . . G(k, k + 1)^T Q^T)Ã = Q̃^TÃ = R̃,
d̃ = Q^T b
set R(1: m, k: n − 1) = R(1: m, k + 1: n)
for j = k: n − 1
  [c(j), s(j)] = givens(R(j, j), R(j + 1, j))
  % Update R
  R(j, j) = c(j)R(j, j) − s(j)R(j + 1, j)
  R(j: j + 1, j + 1: n − 1) = [c(j) s(j); −s(j) c(j)]^T R(j: j + 1, j + 1: n − 1)
  % Update d̃
  d̃(j: j + 1) = [c(j) s(j); −s(j) c(j)]^T d̃(j: j + 1)
end
R̃ = upper triangular part of R(1: m, 1: n − 1)
% Compute the residual
resid = ‖d̃(n + 1: m)‖₂
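The column-deletion update of R can be sketched in Python as below. The sketch uses 0-based indices and nested lists; the name `delete_column` and the loop bound that stops at the last available row are assumptions made for this illustration, and the update of d̃ is omitted.

```python
import math

def givens(a, b):
    """Return (c, s) with s*a + c*b == 0 (b == 0 guard assumed for this sketch)."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) >= abs(a):
        t = -a / b
        s = 1.0 / math.sqrt(1.0 + t * t)
        return s * t, s
    t = -b / a
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, c * t

def delete_column(R, k):
    """Return R-tilde after column k (0-based) of the m-by-n upper trapezoidal R is removed."""
    m, n = len(R), len(R[0])
    R = [row[:k] + row[k + 1:] for row in R]   # shift later columns left
    # Columns k onwards are now upper Hessenberg; remove one subdiagonal per step.
    for j in range(k, min(n - 1, m - 1)):
        c, s = givens(R[j][j], R[j + 1][j])
        for col in range(j, n - 1):
            r1, r2 = R[j][col], R[j + 1][col]
            R[j][col] = c * r1 - s * r2
            R[j + 1][col] = s * r1 + c * r2
    return R

R_new = delete_column([[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 6.0]], 0)
```

Deleting the first column is the most expensive case because every remaining column then carries a subdiagonal entry that must be rotated away.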
Algorithm 2.14 Given vectors c and s from Algorithm 2.13 this algorithm forms an orthogonal matrix Q̃ ∈ R^{m×m} such that Ã = Q̃R̃, where Ã is the matrix A = QR with the kth column deleted.
for j = k: n − 1
  Q(1: m, j: j + 1) = Q(1: m, j: j + 1) [c(j) s(j); −s(j) c(j)]
end
Q̃ = Q
Ã = A(1: m, 1: k − 1), R̃ = R(1: m, 1: k − 1), Q̃ = Q, and d̃ = Q^T b,
Ã = [ A(1: m, 1: k − 1) A(1: m, k + p: n) ]
then
Q^TÃ = [ R(1: m, 1: k − 1) R(1: m, k + p: n) ] . (2.11)
For example, with m = 10, n = 8, k = 3 and p = 3 the right-hand side of Equation (2.11) looks like:
+ + + + +
0 + + + +
0 0 + + +
0 0 ⊖ + +
0 0 ⊖ ⊖ +
,
0 0 ⊖ ⊖ ⊖
0 0 0 ⊖ ⊖
0 0 0 0 ⊖
0 0 0 0 0
0 0 0 0 0
with the nonzero elements to remain represented with a + and the elements to
be eliminated are shown with a ⊖.
Thus we can define n − p − k + 1 Householder matrices, Hj ∈ R^{m×m}, with associated Householder vectors, vj ∈ R^{(p+1)} such that
vj(1: j − 1) = 0,
vj(j) = 1,
vj(j + 1: j + p) = x/((Q^TÃ)_jj − ‖[ (Q^TÃ)_jj x^T ]‖₂), where x = Q^TÃ(j + 1: j + p, j),
vj(j + p + 1: m) = 0.
and can be used to eliminate the subdiagonal of Q^TÃ to give
(H_{n−p} . . . H_k Q^T)Ã = Q̃^TÃ = R̃,
Algorithm 2.15 Given A = QR ∈ R^{m×n}, with m ≥ n, this algorithm computes Q̃^TÃ = R̃ ∈ R^{m×(n−p)} where R̃ is upper trapezoidal, Q̃ is orthogonal and Ã is A with the kth to (k + p − 1)st columns deleted, 1 ≤ k ≤ n − p, 1 ≤ p < n, and d̃ such that ‖Ãx − b‖₂ = ‖R̃x − d̃‖₂. The residual, ‖d̃(n + 1: m)‖₂, is also computed.
d̃ = Q^T b
set R(1: m, k: n − p) = R(1: m, k + p: n)
for j = k: n − p
  [V(1: p, j), τ(j)] = householder(R(j, j), R(j + 1: j + p, j))
  % Update R
  R(j, j) = R(j, j) − τ(j)R(j, j) − τ(j)V(1: p, j)^T R(j + 1: j + p, j)
  if j < n − p
    R(j: j + p, j + 1: n − p) = R(j: j + p, j + 1: n − p)
        − τ(j) [1; V(1: p, j)] ([ 1 V(1: p, j)^T ] R(j: j + p, j + 1: n − p))
  end
  % Update d̃
  d̃(j: j + p) = d̃(j: j + p)
      − τ(j) [1; V(1: p, j)] ([ 1 V(1: p, j)^T ] d̃(j: j + p))
end
R̃ = upper triangular part of R(1: m, 1: n − p)
% Compute the residual
resid = ‖d̃(n + 1: m)‖₂
Algorithm 2.16 Given the matrix V and vector τ from Algorithm 2.15 this algorithm forms an orthogonal matrix Q̃ ∈ R^{m×m} such that Ã = Q̃R̃, where Ã is the matrix A = QR with the kth to (k + p − 1)st columns deleted.
for j = k: n − p
  Q(1: m, j: j + p) = Q(1: m, j: j + p)
      − τ(j) (Q(1: m, j: j + p) [1; V(1: p, j)]) [ 1 V(1: p, j)^T ]
end
Q̃ = Q
In the case when k = n − p + 1 then
Ã = A(1: m, 1: k − 1), R̃ = R(1: m, 1: k − 1), Q̃ = Q, and d̃ = Q^T b,
• Determine the last index of the Householder vectors, which cannot exceed
m.
This gives the following algorithms to update the QR factorization for any m and
n.
Algorithm 2.18 Given the matrix V and vector τ from Algorithm 2.17 this algorithm forms an orthogonal matrix Q̃ ∈ R^{m×m} such that Ã = Q̃R̃, where Ã is the matrix A = QR with the kth to (k + p − 1)st columns deleted.
lim = min(m − 1, n − p)
for j = k: lim
  last = min(j + p, m)
  Q(1: m, j: last) = Q(1: m, j: last)
      − τ(j) (Q(1: m, j: last) [1; V(1: last − j, j)]) [ 1 V(1: last − j, j)^T ]
end
Q̃ = Q
Q^TÃ = [ R_11 R_12 ] ,
Ã = [ A(1: m, 1: k − 1) u A(1: m, k: n) ]
then
Q^TÃ = [ R(1: m, 1: k − 1) v R(1: m, k: n) ] , (2.12)
where v = Q^T u. For example, with m = 8, n = 6 and k = 4 the right-hand side of Equation (2.12) looks like:
+ + + + + + +
0 + + + + + +
0 0 + + + + +
0 0 0 + + + +
,
0 0 0 ⊖ ⊕ + +
0 0 0 ⊖ 0 ⊕ +
0 0 0 ⊖ 0 0 ⊕
0 0 0 ⊖ 0 0 0
(G(k, k + 1)^T . . . G(m − 1, m)^T Q^T)Ã = Q̃^TÃ = R̃,
G(k, k + 1)^T . . . G(m − 1, m)^T Q^T b = d̃.
u = Q^T u
d̃ = Q^T b
for i = m: −1: k + 1
  [c(i), s(i)] = givens(u(i − 1), u(i))
  u(i − 1) = c(i)u(i − 1) − s(i)u(i)
  % Update R if there is a nonzero row
  if i ≤ n + 1
    R(i − 1: i, i − 1: n) = [c(i) s(i); −s(i) c(i)]^T R(i − 1: i, i − 1: n)
  end
  % Update d̃
  d̃(i − 1: i) = [c(i) s(i); −s(i) c(i)]^T d̃(i − 1: i)
end
if k = 1
  R̃ = upper triangular part of [ u R ]
else if k = n + 1
  R̃ = upper triangular part of [ R u ]
else
  R̃ = upper triangular part of [ R(1: m, 1: k − 1) u R(1: m, k: n) ]
end
% Compute the residual
resid = ‖d̃(n + 1: m)‖₂
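The column-insertion update of R can be sketched in Python as follows. It assumes the caller has already formed v = Q^T u; the name `add_column`, the 0-based position k, and the omission of the d̃ update are assumptions of this illustration.

```python
import math

def givens(a, b):
    """Return (c, s) with s*a + c*b == 0 (b == 0 guard assumed for this sketch)."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) >= abs(a):
        t = -a / b
        s = 1.0 / math.sqrt(1.0 + t * t)
        return s * t, s
    t = -b / a
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, c * t

def add_column(R, v, k):
    """Insert v = Q^T u as column k (0-based) of the m-by-n R and restore triangular form."""
    m, n = len(R), len(R[0])
    R = [row[:] for row in R]   # work on copies
    v = list(v)
    # Eliminate v(m-1), ..., v(k+1) from the bottom up with rotations of adjacent rows.
    for i in range(m - 1, k, -1):
        c, s = givens(v[i - 1], v[i])
        v[i - 1] = c * v[i - 1] - s * v[i]
        v[i] = 0.0
        # Rows i-1 and i of R are zero to the left of column i-1.
        for col in range(i - 1, n):
            r1, r2 = R[i - 1][col], R[i][col]
            R[i - 1][col] = c * r1 - s * r2
            R[i][col] = s * r1 + c * r2
    return [R[r][:k] + [v[r]] + R[r][k:] for r in range(m)]

R_new = add_column([[2.0, 1.0], [0.0, 3.0], [0.0, 0.0]], [1.0, 2.0, 2.0], 0)
```

Working from the bottom row upwards means each rotation fills in only one new subdiagonal entry of R, which then sits on the diagonal once the new column has been inserted in front of it.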
Algorithm 2.20 Given the vectors c and s from Algorithm 2.19 this algorithm forms an orthogonal matrix Q̃ ∈ R^{m×m} such that Ã = Q̃R̃, where Ã is the matrix A = QR with a column inserted in the kth position.
for i = m: −1: k + 1
  Q(1: m, i − 1: i) = Q(1: m, i − 1: i) [c(i) s(i); −s(i) c(i)]
end
Q̃ = Q
Ã = [ A(1: m, 1: k − 1) U A(1: m, k: n) ]
then
Q^TÃ = [ R(1: m, 1: k − 1) V R(1: m, k: n) ] ,
where V = Q^T U. For example, with m = 12, n = 6, k = 3 and p = 3 the right-hand side of Equation (2.12) looks like:
+ + + + + + + + +
0 + + + + + + + +
0 0 + + + + + + +
0 0 ⊖ + + ⊕ + + +
0 0 ⊖ ⊖ + ⊕ ⊕ + +
0 0 ⊖ ⊖ ⊖ ⊕ ⊕ ⊕ +
,
0 0 ⊖ ⊖ ⊖ 0 ⊕ ⊕ ⊕
0 0 ⊖ ⊖ ⊖ 0 0 ⊕ ⊕
0 0 ⊖ ⊖ ⊖ 0 0 0 0
0 0 ⊖ ⊖ ⊖ 0 0 0 0
0 0 ⊖ ⊖ ⊖ 0 0 0 0
0 0 ⊖ ⊖ ⊖ 0 0 0 0
U = Q^T U
d̃ = Q^T b
for j = 1: p
  for i = m: −1: k + j
    [C(i, j), S(i, j)] = givens(U(i − 1, j), U(i, j))
    % Update U
    U(i − 1, j) = C(i, j)U(i − 1, j) − S(i, j)U(i, j)
    if j < p
      U(i − 1: i, j + 1: p) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T U(i − 1: i, j + 1: p)
    end
    % Update R if there is a nonzero row
    if i ≤ n + j
      R(i − 1: i, i − j: n) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T R(i − 1: i, i − j: n)
    end
    % Update d̃
    d̃(i − 1: i) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T d̃(i − 1: i)
  end
end
if k = 1
  R̃ = upper triangular part of [ U R ]
else if k = n + 1
  R̃ = upper triangular part of [ R U ]
else
  R̃ = upper triangular part of [ R(1: m, 1: k − 1) U R(1: m, k: n) ]
end
% Compute the residual
resid = ‖d̃(n + 1: m)‖₂
Computing R̃ requires 6(mp(n + p − m/2) − p²(n/2 − k/2 − p/3) + kp(k/2 − n)) flops, versus 2(n + p)²(m − (n + p)/3) for the Householder QR factorization of Ã. If Q̃ is required, it can be computed with the following algorithm.
Algorithm 2.22 Given matrices C and S from Algorithm 2.21 this algorithm forms an orthogonal matrix Q̃ ∈ R^{m×m} such that Ã = Q̃R̃, where Ã is the matrix A = QR with a block of columns inserted in the kth to (k + p − 1)st positions.
for j = 1: p
  for i = m: −1: k + j
    Q(1: m, i − 1: i) = Q(1: m, i − 1: i) [C(i, j) S(i, j); −S(i, j) C(i, j)]
  end
end
Q̃ = Q
We can improve on this algorithm by including a Level 3 BLAS part by using a blocked QR factorization of part of Ã before we finish the elimination process with Givens matrices. That is, for our example:
+ + + + + + + + +
0 + + + + + + + +
0 0 + + + + + + +
0 0 ⊖ + + ⊕ + + +
0 0 ⊖ ⊖ + ⊕ ⊕ + +
0 0 ⊖ ⊖ ⊖ ⊕ ⊕ ⊕ +
0 0 ⊖ ⊖ ⊖ 0 ⊕ ⊕ ⊕ ,
0 0 ⊙ ⊖ ⊖ 0 0 ⊕ ⊕
0 0 ⊙ ⊙ ⊖ 0 0 0 ⊕
0 0 ⊙ ⊙ ⊙ 0 0 0 0
0 0 ⊙ ⊙ ⊙ 0 0 0 0
0 0 ⊙ ⊙ ⊙ 0 0 0 0
we eliminate the elements shown with a ⊙ with a QR factorization of the bottom
6 by 3 block of V and the remainder of the elements can be eliminated with
Givens matrices and are shown with a ⊖. The zero elements that can be filled
in are shown with a ⊕ and the nonzero elements to remain represented with a +
as before.
For the case of k ≠ 1, n + 1 and m > n + 1, we have
\[
Q^T \widetilde{A} = \begin{bmatrix} R_{11} & V_{12} & R_{12} \\ 0 & V_{22} & R_{23} \\ 0 & V_{32} & 0 \end{bmatrix}
\]
where R_11 ∈ R^{(k−1)×(k−1)} and R_23 ∈ R^{(n−k+1)×(n−k+1)} are upper triangular, then if V_32 has the QR factorization V_32 = Q_V R_V ∈ R^{(m−n)×p} we have
\[
\begin{bmatrix} I_n & 0 \\ 0 & Q_V^T \end{bmatrix} Q^T \widetilde{A}
= \begin{bmatrix} R_{11} & V_{12} & R_{12} \\ 0 & V_{22} & R_{23} \\ 0 & R_V & 0 \end{bmatrix}.
\]
We then eliminate the upper triangular part of R_V and the lower triangular part of V_22 with Givens matrices which makes R_23 full and the bottom right block upper trapezoidal. So we have finally
Algorithm 2.23 Given A = QR ∈ R^{m×n}, with m ≥ n, this algorithm computes Q̃^TÃ = R̃ ∈ R^{m×(n+p)} where R̃ is upper trapezoidal, Q̃ is orthogonal and Ã is A with a block of columns, U ∈ R^{m×p}, inserted in the kth to (k + p − 1)st position, 1 ≤ k ≤ n + 1, p ≥ 1, and d̃ such that ‖Ãx − b‖₂ = ‖R̃x − d̃‖₂. The residual, ‖d̃(n + 1: m)‖₂, is also computed. The algorithm incorporates a Level 3 QR factorization.
U = Q^T U
d̃ = Q^T b
if m > n + 1
  % Factorize rows n + 1 to m of U if there are more than 1,
  % with a Level 3 QR algorithm
  U(n + 1: m, 1: p) = Q_U R_U
  % Update d̃
  d̃(n + 1: m) = Q_U^T d̃(n + 1: m)
end
if k ≤ n
  % Zero out the rest with Givens
  for j = 1: p
    % First iteration updates one column
    upfirst = n
    for i = n + j: −1: j + 1
      [C(i, j), S(i, j)] = givens(U(i − 1, j), U(i, j))
      % Update U
      U(i − 1, j) = C(i, j)U(i − 1, j) − S(i, j)U(i, j)
      if j < p
        U(i − 1: i, j + 1: p) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T U(i − 1: i, j + 1: p)
      end
      % Update R
      R(i − 1: i, upfirst: n) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T R(i − 1: i, upfirst: n)
      % Update one more column next i step
      upfirst = upfirst − 1
      % Update d̃
      d̃(i − 1: i) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T d̃(i − 1: i)
    end
  end
end
if k = 1
  R̃ = upper triangular part of [ U R ]
else if k = n + 1
  R̃ = upper triangular part of [ R U ]
else
  R̃ = upper triangular part of [ R(1: m, 1: k − 1) U R(1: m, k: n) ]
end
% Compute the residual
resid = ‖d̃(n + 1: m)‖₂
Algorithm 2.24 Given matrices Q_U, C and S and the vector τ from Algorithm 2.23 this algorithm forms an orthogonal matrix Q̃ ∈ R^{m×m} such that Ã = Q̃R̃, where Ã is the matrix A = QR with a block of columns inserted in the kth to (k + p − 1)st positions.
if m > n + 1
  Q(1: m, 1: m − n) = Q(1: m, 1: m − n)Q_U
end
if k ≤ n
  for j = 1: p
    for i = n + j: −1: j + 1
      Q(1: m, i − 1: i) = Q(1: m, i − 1: i) [C(i, j) S(i, j); −S(i, j) C(i, j)]
    end
  end
end
Q̃ = Q
• We introduce jstop which is the last index in the outer for loop. There may be a situation where there are not elements to eliminate over the full width of U. For example Q^TÃ, for m = 5, n = 6, k = 3 and p = 3, looks like:
+ + + + + + + + +
0 + + + + + + + +
0 0 + + + + + + + ,
0 0 ⊖ + + ⊕ + + +
0 0 ⊖ ⊖ + ⊕ ⊕ + +
• The first column to be updated for jth step may no longer be n, so upfirst is set accordingly.
U = Q^T U
if m > n + 1
  % Factorize rows n + 1 to m of U if there are more than 1,
  % with a Level 3 QR algorithm
  U(n + 1: m, 1: p) = Q_U R_U
end
if k ≤ n
  % Zero out the rest with Givens, stop at the last column of
  % U or the last row if that is reached first
  jstop = min(p, m − k − 2)
  for j = 1: jstop
    % Start at first row to be eliminated in current column
    istart = min(n + j, m)
    % Index of first nonzero column in update of R
    upfirst = max(istart − j − 1, 1)
    for i = istart: −1: j + 1
      [C(i, j), S(i, j)] = givens(U(i − 1, j), U(i, j))
      % Update U
      U(i − 1, j) = C(i, j)U(i − 1, j) − S(i, j)U(i, j)
      if j < p
        % Update U
        U(i − 1: i, j + 1: p) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T U(i − 1: i, j + 1: p)
      end
      % Update R
      R(i − 1: i, upfirst: n) = [C(i, j) S(i, j); −S(i, j) C(i, j)]^T R(i − 1: i, upfirst: n)
      % Update one more column next i step
      upfirst = upfirst − 1
    end
  end
end
if k = 1
  R̃ = upper triangular part of [ U R ]
else if k = n + 1
  R̃ = upper triangular part of [ R U ]
else
  R̃ = upper triangular part of [ R(1: m, 1: k − 1) U R(1: m, k: n) ]
end
Algorithm 2.26 Given matrices Q_U, C and S and the vector τ from Algorithm 2.25 this algorithm forms an orthogonal matrix Q̃ ∈ R^{m×m} such that Ã = Q̃R̃, where Ã is the matrix A = QR with a block of columns inserted in the kth to (k + p − 1)st positions.
if m > n + 1
  Q(1: m, n + 1: m) = Q(1: m, n + 1: m)Q_U
end
if k ≤ n
  jstop = min(p, m − k − 2)
  for j = 1: jstop
    istart = min(n + j, m)
    for i = istart: −1: j + 1
      Q(1: m, i − 1: i) = Q(1: m, i − 1: i) [C(i, j) S(i, j); −S(i, j) C(i, j)]
    end
  end
end
Q̃ = Q
See Appendix 6.5 for Fortran codes addcols.f and addcolsq.f for updating
R and Q respectively.
3 Error Analysis
It is well known that orthogonal transformations are stable. We have the following columnwise results [10], where
\[
\tilde{\gamma}_k = \frac{cku}{1 - cku},
\]
and u is the unit roundoff and c is a small integer constant.
B = G_r . . . G_1 A = Q^T A ∈ R^{n×n}
where G_i is a Givens matrix, then the computed matrix B̂ satisfies
Q^T A − B̂ = ∆B, ‖∆b_j‖₂ ≤ γ̃_r ‖a_j‖₂, j = 1: n. (3.1)
B = H_r . . . H_1 A = Q^T A ∈ R^{n×n}
where H_i is a Householder matrix, then the computed matrix B̂ satisfies
Q^T A − B̂ = ∆B, ‖∆b_j‖₂ ≤ γ̃_{nr} ‖a_j‖₂, j = 1: n. (3.2)
This result implies that Householder transformations are less accurate by a factor of n, but this is not observed in practice. We then have
Theorem 3.1 (Householder QR Factorization) If
R = Q^T A
where Q is a product of Householder matrices, then the computed factor R̂ satisfies
Q^T A − R̂ = ∆R, ‖∆r_j‖₂ ≤ γ̃_{mn} ‖a_j‖₂, j = 1: n.
We now give results for computing the factor R̃ by our algorithms.
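and from (3.1) we have for the computed quantities $\hat{\tilde{R}}$ and $\hat{v}$
$$\hat{\tilde{R}} = \tilde{R} + \Delta R, \qquad \|\Delta r_j\|_2 \le \tilde{\gamma}_{mp} \left\| \begin{bmatrix} v^T(j) \\ \hat{r}(1:n, j) \end{bmatrix} \right\|_2, \quad j = 1:n.$$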
$$H_{n-p} \cdots H_k \begin{bmatrix} \hat{R}(:, 1:k-1) & \hat{R}(:, k+p:n) \end{bmatrix} = \tilde{R},$$
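and from (3.2) we have
$$\hat{\tilde{R}} = \tilde{R} + \begin{bmatrix} 0 \\ \Delta R \end{bmatrix}, \qquad \Delta R \in \mathbb{R}^{(m-k+1) \times n},$$
$$\|\Delta r_j\|_2 = 0, \quad j = 1:k-1,$$
$$\|\Delta r_j\|_2 \le \tilde{\gamma}_{(n-k-p+1)(n-k+1)} \|\hat{r}(k:n, j)\|_2, \quad j = k:n-p.$$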
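$$\|\Delta H_j\|_2 = 0, \quad j \le k-1 \ \text{or} \ j \ge k+p,$$
$$\|\Delta H_j\|_2 \le \tilde{\gamma}_{(n-k)p} \|\hat{V}(n+1:m, j)\|_2, \quad j = k:k+p-1,$$
$$\|\Delta G_j\|_2 = 0, \quad j = 1:k-1,$$
$$\|\Delta G_j\|_2 \le \tilde{\gamma}_{(n-k)n} \left\| \begin{bmatrix} 0 \\ (Q^T \hat{V})(k:m, j) \end{bmatrix} + \Delta \hat{r}_j \right\|_2, \quad j = k:n+p.$$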
Given these results we expect the normwise backward error
$$\frac{\|\tilde{A} - \tilde{Q}\tilde{R}\|_2}{\|\tilde{A}\|_2},$$
when $\tilde{Q}$ and $\tilde{R}$ are computed with our algorithms, to be close to that with $\tilde{Q}$ and $\tilde{R}$ computed directly from $\tilde{A}$. We consider some examples in the next section.
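As a rough illustration of how such a normwise backward error can be evaluated, here is a small Python sketch. It uses the Frobenius norm as a stand-in for the 2-norm, purely because it needs no singular value computation; this substitution and all function names are assumptions of the sketch and not part of the paper's test code.

```python
import math


def matmul(X, Y):
    """Plain triple-loop matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def fro_norm(X):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in X for x in row))


def backward_error(A, Q, R):
    """Return ||A - Q*R||_F / ||A||_F for the given factors."""
    QR = matmul(Q, R)
    diff = [[a - b for a, b in zip(row_a, row_qr)]
            for row_a, row_qr in zip(A, QR)]
    return fro_norm(diff) / fro_norm(A)


if __name__ == "__main__":
    A = [[3.0, 0.0], [4.0, 5.0]]
    Q = [[0.6, -0.8], [0.8, 0.6]]
    R = [[5.0, 4.0], [0.0, 3.0]]
    print(backward_error(A, Q, R))  # a value of the order of the unit roundoff or zero
```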
4 Numerical Experiments
the QR factorization of a matrix. The input matrix, in this case $\tilde{A}$, is overwritten with $\tilde{R}$, and $\tilde{Q}$ is returned in factored form in the same way as our codes do.
The tests were performed on a 1400MHz AMD Athlon running Red Hat Linux
version 6.2 with kernel 2.2.22. The unit roundoff u ≈ 1.1e-16.
We tested our code with
m = {1000, 2000, 3000, 4000, 5000}
and n = 0.3m, and the number of columns added or deleted was p = 100. We
generated our test matrices by populating an array with random double precision
numbers generated with the LAPACK auxiliary routine DLARAN. A = QR was
computed with DGEQRF, and $\tilde{A}$ was formed appropriately.
We timed our codes acting on $Q^T\tilde{A}$, the starting point for computing $\tilde{R}$, and in the case of adding columns we included in our timings the computation of $Q^T U$, which we formed with the BLAS routine DGEMM. We also timed DGEQRF acting on only the part of $Q^T\tilde{A}$ that needs to be updated, the nonzero part from row and column k onwards; $\tilde{R}$ can then be constructed from this computation and the original R. Finally, we timed DGEQRF acting on $\tilde{A}$. We aim to show that our codes are faster than these alternatives. In all cases the average of three timings is given.
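The timing protocol described above (an average over three runs, then a ratio between two codes) can be sketched in Python as follows. The function names are hypothetical, and `time.perf_counter` stands in for whatever timer the original Fortran driver used, which the text does not specify.

```python
import time


def average_time(func, repeats=3):
    """Call func() `repeats` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / repeats


def speedup(t_reference, t_update):
    """Ratio of the reference time to the updating time."""
    return t_reference / t_update


if __name__ == "__main__":
    # Stand-in workload; in the paper the timed codes are DGEQRF, DELCOLS and ADDCOLS.
    t = average_time(lambda: sum(i * i for i in range(100000)))
    print(t)
```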
To test our code DELCOLS we first chose k = 1, the position of the first column deleted, where the maximum amount of work is required to update the factorization. We have
$$\tilde{A} = A(1:m, p+1:n), \qquad Q^T\tilde{A} = R(1:m, p+1:n),$$
and timed:
• DGEQRF on $\tilde{A}$.
• DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$, which computes the nonzero entries of $\tilde{R}(k:m, p+1:n)$.
• DELCOLS on $Q^T\tilde{A}$.
The results are given in Figure 1. Our code is clearly much faster than recomputing the factorization from scratch with DGEQRF, and for m = 5000 there is a speedup of 20. Our code is also faster than using DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$, where there is a maximum speedup of over 3.
[Figure 1: Times (secs) against number of rows, m, of matrix, for DGEQRF on $\tilde{A}$, DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$ and DELCOLS on $Q^T\tilde{A}$.]
We then tested for k = n/2, where much less work is required to perform the updating. We have
$$\tilde{A} = [\, A(1:m, 1:k-1) \;\; A(1:m, k+p:n) \,], \qquad Q^T\tilde{A} = [\, R(1:m, 1:k-1) \;\; R(1:m, k+p:n) \,],$$
and timed:
• DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$, which computes the nonzero entries of $\tilde{R}(k:m, k:n-p)$.
• DELCOLS on $Q^T\tilde{A}$.
The results are given in Figure 2. The timings for DGEQRF on $\tilde{A}$ would, of course, be the same as for k = 1, giving a maximum speedup of over 100 in this case. We achieve a speedup of approximately 3 over using DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$.
[Figure 2: Times (secs) against number of rows, m, of matrix, for DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$ and DELCOLS on $Q^T\tilde{A}$.]
We then considered the effect of varying p with DELCOLS for fixed m = 3000, n = 1000 and k = 1. As we delete more columns from A there are fewer columns to update, but more work is required for each one. We chose a range of values of p and timed:
• DGEQRF on $\tilde{A}$.
• DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$, which computes the nonzero entries of $\tilde{R}(k:m, k:n-p)$.
• DELCOLS on $Q^T\tilde{A}$.
The results are given in Figure 3. The timings for DELCOLS are relatively level and peak at p = 300, whereas the timings for the other codes decrease with p. The speedup of our code decreases with p, and from p = 300 there is little difference between our code and DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$.
[Figure 3: Times (secs) against number of columns, p, deleted from matrix (horizontal axis from 100 to 800), for DGEQRF on $\tilde{A}$, DGEQRF on $(Q^T\tilde{A})(k:n, k:n-p)$ and DELCOLS on $Q^T\tilde{A}$.]
To test ADDCOLS we generated random matrices $A \in \mathbb{R}^{m \times n}$ and $U \in \mathbb{R}^{m \times p}$, and again used
m = {1000, 2000, 3000, 4000, 5000},
n = 0.3m and p = 100. We first set k = 1, where maximum updating is required. We have
$$\tilde{A} = [\, U \;\; A \,], \qquad Q^T\tilde{A} = [\, Q^T U \;\; R \,],$$
and timed:
• DGEQRF on $\tilde{A}$.
• ADDCOLS on $Q^T\tilde{A}$, including the computation of $Q^T U$.
The results are given in Figure 4. Here our code achieves a speedup of over 3 for m = 5000 over the complete factorization of $\tilde{A}$.
We then tested for k = n/2, where less work is required to do the updating, and timed:
• DGEQRF on $\tilde{A}$, as above.
• DGEQRF on $(Q^T\tilde{A})(k:m, k:n+p)$, which computes $\tilde{R}(k:m, k:n+p)$, including the computation of $Q^T U$, for which we again use DGEMM.
• ADDCOLS on $Q^T\tilde{A}$, including the computation of $Q^T U$.
[Figure 4: Times (secs) against number of rows, m, of matrix, for DGEQRF on $\tilde{A}$ and ADDCOLS on $Q^T\tilde{A}$.]
[Figure 5: Times (secs) against number of rows, m, of matrix; legend entries: DGEQRF on $\tilde{A}$, DGEQRF on $(Q^T\tilde{A})(k:m, k:n+p)$ and DELCOLS on $Q^T\tilde{A}$.]
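• Next, for
$$\tilde{A} = [\, A_1 \;\; A_2 \,],$$
we form
$$(Q^{(0)})^T \tilde{A} = [\, R_1 \;\; R_2 \,],$$
and call DELCOLS and DELCOLSQ to update the QR factorization of $\tilde{A}$, forming
$$\tilde{A} = \tilde{Q}\tilde{R}, \qquad \tilde{R} = [\, \tilde{R}_1 \;\; \tilde{R}_2 \,],$$
where $\tilde{R}_1 \in \mathbb{R}^{m \times (k-1)}$, $\tilde{R}_2 \in \mathbb{R}^{m \times (n-k-p+1)}$.
• We now compute the QR factorization of $A^{(0)}$ by updating $\tilde{Q}$ and $\tilde{R}$. We call ADDCOLS on
$$[\, \tilde{R}_1 \;\; \tilde{Q}^T U \;\; \tilde{R}_2 \,]$$
to form $R^{(1)}$ and then call ADDCOLSQ to form $Q^{(1)}$, so we have, in exact arithmetic,
$$A^{(0)} = Q^{(1)} R^{(1)}.$$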
We then repeat this rep times and measure the normwise backward error
m = 500
n = {400, 500, 600}
p = {50, 100, 150}
k = {1, 51, . . . , n − p + 1}.
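The set of test cases implied by these parameter lists can be enumerated with a few lines of Python. The step of 50 between successive values of k is an assumption read off from the pattern 1, 51, ..., n − p + 1, and the function name is illustrative only.

```python
def parameter_grid(m=500, ns=(400, 500, 600), ps=(50, 100, 150), step=50):
    """Return all (m, n, p, k) combinations with k = 1, 1 + step, ..., n - p + 1."""
    grid = []
    for n in ns:
        for p in ps:
            for k in range(1, n - p + 2, step):
                grid.append((m, n, p, k))
    return grid


if __name__ == "__main__":
    cases = parameter_grid()
    print(len(cases), cases[0], cases[-1])
```

Note that n = 600 gives m < n, consistent with the algorithms placing no restriction on the dimensions.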
The results are given in Table 1 and Table 2. The error increases with the number of repeats, as expected. However, it is not affected significantly by the value of $\|U\|_F$.
The smallest value of the error in every case was approximately of order 10u. The worst case was still only of order 2 ∗ rep ∗ u.
Table 1: Normwise backward error.
rep                              5           50          500
Smallest error over all tests    1.146e-15   1.212e-15   1.223e-15
Largest error over all tests     5.031e-15   2.399e-14   1.252e-13

Table 2: Normwise backward error for $\|U\|_F$ of order 1e+9.
rep                              5           50          500
Smallest error over all tests    8.298e-16   9.309e-16   9.576e-16
Largest error over all tests     4.381e-15   2.055e-14   1.014e-13
5 Conclusions
The speed tests show that our updating algorithms are faster than computing the QR factorization from scratch, and faster than using DGEQRF to update only columns k onward, the only columns needing updating.
Furthermore, the normwise backward error tests show that the errors are within the bound for computing the Householder QR factorization of $\tilde{A}$. Thus, within the parameters of our experiments, the increase in speed is not to the detriment of accuracy.
We propose that the double precision Fortran 77 codes delcols.f, delcolsq.f, addcols.f and addcolsq.f, and their single precision and complex equivalents, be included in LAPACK.
6 Software Available
Here we list some software that is available to update the QR factorization and
least squares problem. An 'x' in a routine name indicates more than one routine for
different precisions or for real or complex data.
6.1 LINPACK
LINPACK [7] has three routines that update the least squares problem and the
QR factorization.
• xCHUD updates the least squares problem when a row has been added in the
(m + 1)st position.
• xCHDD updates the least squares problem when a row has been deleted from the mth position, an implementation of Saunders' algorithm.
• xCHEX updates the least squares problem when the rows of A have been permuted.
In all cases the transformation matrices are represented by vectors of sines and cosines, and $\tilde{Q}$ is not constructed.
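The kind of update performed when a row is appended can be sketched in a few lines of Python. This is only an illustration of updating R with Givens rotations; it is not the LINPACK code or its calling interface, and the function name `add_row` is hypothetical.

```python
import math


def add_row(R, x):
    """Return the n x n upper triangular factor of the matrix [R; x^T].

    R is an n x n upper triangular list of rows and x has length n. One
    Givens rotation per column combines row j of R with the new row so
    that entry j of the new row becomes zero. Inputs are not modified.
    """
    n = len(R)
    R = [row[:] for row in R]
    x = list(x)
    for j in range(n):
        r = math.hypot(R[j][j], x[j])
        if r == 0.0:
            continue  # nothing to eliminate in this column
        c, s = R[j][j] / r, x[j] / r
        for col in range(j, n):
            a, b = R[j][col], x[col]
            R[j][col] = c * a + s * b
            x[col] = -s * a + c * b
    return R
```

The updated factor satisfies $\tilde{R}^T\tilde{R} = R^T R + xx^T$, which gives a simple way to check the result without forming $\tilde{Q}$.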
6.2 MATLAB
MATLAB [11] supplies three routines for updating the QR factorization only.
• qrdelete updates when one row or column is deleted from any position.
$$\tilde{A} = A + uv^T, \qquad u \in \mathbb{R}^m, \quad v \in \mathbb{R}^n.$$
In all cases both $\tilde{Q}$ and $\tilde{R}$ are returned.
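$$\alpha uv^T + R_1 = Q\tilde{R}_1,$$
where
$$R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \qquad \tilde{R} = \begin{bmatrix} \tilde{R}_1 \\ 0 \end{bmatrix}, \qquad R_1, \tilde{R}_1 \in \mathbb{R}^{n \times n},$$
and
$$\tilde{Q} = QQ.$$
Q is represented by vectors of sines and cosines.
where $R$, $\tilde{R}$ and $\tilde{Q}$ are as above.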
6.4 Reichel and Gragg’s Algorithms
Reichel and Gragg [16] provide several Fortran 77 implementations of the algorithms discussed in [6] for updating the QR factorization, returning both $\tilde{Q}$ and $\tilde{R}$. In all cases only m ≥ n is handled. The routines use BLAS-like routines for matrix and vector operations written for optimal performance on the test machine used in [16]. No error results are given for the Fortran routines, although some results are given for the Algol implementations in [6].
• DDELR updates after one row is deleted; this algorithm varies from ours and
uses a Gram-Schmidt re-orthogonalization process.
• DINSR updates when one row is added, and is similar to our algorithm.
• DDELC updates when one column is deleted, and is similar to our algorithm.
• DINSC updates after one column is added; this algorithm varies from ours
and again uses a Gram-Schmidt re-orthogonalization process.
In these Fortran files the dimension n refers to the number of columns in $\tilde{A}$, and not A.
A delcols.f
      SUBROUTINE DELCOLS( M, N, A, LDA, K, P, TAU, WORK, INFO )
*
* Craig Lucas, University of Manchester
* March, 2004
*
* .. Scalar Arguments ..
      INTEGER            INFO, K, LDA, M, N, P
* ..
* .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), TAU( * ), WORK( * )
* ..
*
* Purpose
* =======
*
* Given a real m by (n+p) matrix, B, and the QR factorization
* B = Q_B * R_B, DELCOLS computes the QR factorization
* C = Q * R where C is the matrix B with p columns deleted
* from the kth column onwards.
*
* The input to this routine is Q_B' * C
*
* Arguments
* =========
*
* M (input) INTEGER
* The number of rows of the matrix C. M >= 0.
*
* N (input) INTEGER
* The number of columns of the matrix C. N >= 0.
*
* A (input/output) DOUBLE PRECISION array, dimension (LDA,N)
* On entry, the matrix Q_B' * C. The elements in columns
* 1:K-1 are not referenced.
*
* On exit, the elements on and above the diagonal contain
* the n by n upper triangular part of the matrix R. The
* elements below the diagonal in columns k:n, together with
* TAU represent the orthogonal matrix Q as a product of
* elementary reflectors (see Further Details).
*
* LDA (input) INTEGER
* The leading dimension of the array A. LDA >= max(1,M).
*
* K (input) INTEGER
* The position of the first column deleted from B.
* 0 < K <= N+P.
*
* P (input) INTEGER
* The number of columns deleted from B. P > 0.
*
* TAU (output) DOUBLE PRECISION array, dimension(N-K+1)
* The scalar factors of the elementary reflectors
* (see Further Details).
*
* WORK (workspace) DOUBLE PRECISION array, dimension (N)
* Work space.
*
* INFO (output) INTEGER
* = 0: successful exit
* < 0: if INFO = -I, the I-th argument had an illegal value.
*
* Further Details
* ===============
*
* The matrix Q is represented as a product of Q_B and elementary
* reflectors
*
* Q = Q_B * H(k) * H(k+1) *...* H(last), last = min( m-1, n ).
*
* Each H(j) has the form
*
* H(j) = I - tau*v*v'
*
* where tau is a real scalar, and v is a real vector with
* v(1:j-1) = 0, v(j) = 1, v(j+1:j+lenh-1), lenh = min( p+1, m-j+1 ),
* stored on exit in A(j+1:j+lenh-1,j) and v(j+lenh:m) = 0, tau is
* stored in TAU(j-k+1).
*
* The matrix Q can be formed with DELCOLSQ
*
* =====================================================================
*
* .. Parameters ..
      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
* ..
* .. Local Scalars ..
      DOUBLE PRECISION   AJJ
      INTEGER            J, LAST, LENH
* ..
* .. External Subroutines ..
      EXTERNAL           DLARF, DLARFG, XERBLA
* ..
* .. Intrinsic Functions ..
      INTRINSIC          MAX, MIN
* ..
*
* Test the input parameters.
*
      INFO = 0
      IF( M.LT.0 ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( LDA.LT.MAX( 1, M ) ) THEN
         INFO = -4
      ELSE IF( K.GT.N+P .OR. K.LE.0 ) THEN
         INFO = -5
      ELSE IF( P.LE.0 ) THEN
         INFO = -6
      END IF
      IF( INFO.NE.0 ) THEN
         CALL XERBLA( 'DELCOLS', -INFO )
         RETURN
      END IF
*
      LAST = MIN( M-1, N )
*
      DO 10 J = K, LAST
*
* Generate elementary reflector H(J) to annihilate the nonzero
* entries below A(J,J)
*
         LENH = MIN( P+1, M-J+1 )
         CALL DLARFG( LENH, A( J, J ), A( J+1, J ), 1, TAU( J-K+1 ) )
*
         IF( J.LT.N ) THEN
*
* Apply H(J) to trailing matrix from left
*
            AJJ = A( J, J )
            A( J, J ) = ONE
            CALL DLARF( 'L', LENH, N-J, A( J, J ), 1, TAU( J-K+1 ),
     $                  A( J, J+1 ), LDA, WORK )
            A( J, J ) = AJJ
*
         END IF
*
   10 CONTINUE
*
      RETURN
*
* End of DELCOLS
*
      END
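For readers who prefer to experiment without LAPACK, the loop in DELCOLS can be mirrored by the following Python sketch. It is a simplified illustration under stated assumptions: the matrix is a list of rows, k is 1-based as in the Fortran code, the helper `house` imitates the DLARFG convention (v(1) = 1) as far as this sketch needs it, and the reflector vectors are discarded rather than stored below the diagonal.

```python
import math


def house(x):
    """Return (beta, tau, v) with v[0] == 1 and (I - tau*v*v^T) x = beta*e1."""
    alpha = x[0]
    xnorm = math.sqrt(sum(t * t for t in x[1:]))
    if xnorm == 0.0:
        return alpha, 0.0, [1.0] + [0.0] * (len(x) - 1)
    beta = -math.copysign(math.hypot(alpha, xnorm), alpha)
    tau = (beta - alpha) / beta
    v = [1.0] + [t / (alpha - beta) for t in x[1:]]
    return beta, tau, v


def delcols(A, k, p):
    """Reduce A = Q_B^T * C to upper trapezoidal form, in place.

    C is the original matrix with p columns deleted starting at column k
    (1-based), so column j of A has at most p nonzeros below the diagonal.
    """
    m, n = len(A), len(A[0])
    for j in range(k - 1, min(m - 1, n)):
        lenh = min(p + 1, m - j)
        beta, tau, v = house([A[j + i][j] for i in range(lenh)])
        A[j][j] = beta
        for i in range(1, lenh):
            A[j + i][j] = 0.0
        # Apply H(j) = I - tau*v*v^T to the trailing columns from the left.
        for col in range(j + 1, n):
            dot = sum(v[i] * A[j + i][col] for i in range(lenh))
            for i in range(lenh):
                A[j + i][col] -= tau * v[i] * dot
    return A
```

Each reflector touches only lenh = min(p + 1, m − j) rows, which is where the saving over a full Householder QR factorization comes from.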
B delcolsq.f
      SUBROUTINE DELCOLSQ( M, N, A, LDA, Q, LDQ, K, P, TAU, WORK, INFO )
*
* Craig Lucas, University of Manchester
* March, 2004
*
* .. Scalar Arguments ..
      INTEGER            INFO, K, LDA, LDQ, M, N, P
* ..
* .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), Q( LDQ, * ), TAU( * ), WORK( * )
* ..
*
* Purpose
* =======
*
* DELCOLSQ generates an m by m real matrix Q with orthogonal columns,
* which is defined as the product of Q_B and elementary reflectors
*
* Q = Q_B * H(k) * H(k+1) *...* H(last), last = min( m-1, n ) .
*
* where the H(j) are as returned by DELCOLS, such that C = Q * R and
* C is the matrix B = Q_B * R_B, with p columns deleted from the
* kth column onwards.
*
* Arguments
* =========
*
* M (input) INTEGER
* The number of rows of the matrix A. M >= 0.
*
* N (input) INTEGER
* The number of columns of the matrix A. N >= 0.
*
* A (input) DOUBLE PRECISION array, dimension (LDA,N)
* On entry, the elements below the diagonal in columns k:n
* must contain the vector which defines the elementary
* reflector H(J) as returned by DELCOLS.
*
* LDA (input) INTEGER
* The leading dimension of the array A. LDA >= max(1,M).
*
* Q (input/output) DOUBLE PRECISION array, dimension (LDQ,M)
* On entry, the matrix Q_B.
* On exit, the matrix Q.
*
* LDQ (input) INTEGER
* The leading dimension of the array Q. LDQ >= M.
*
* K (input) INTEGER
* The position of the first column deleted from B.
* 0 < K <= N+P.
*
* P (input) INTEGER
* The number of columns deleted from B. P > 0.
*
* TAU (input) DOUBLE PRECISION array, dimension(N-K+1)
* TAU(J-K+1) must contain the scalar factor of the elementary
* reflector H(J), as returned by DELCOLS.
*
* WORK (workspace) DOUBLE PRECISION array, dimension (M)
* Work space.
*
* INFO (output) INTEGER
* = 0: successful exit
* < 0: if INFO = -I, the I-th argument had an illegal value.
*
* =====================================================================
*
* .. Parameters ..
      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
* ..
* .. Local Scalars ..
      DOUBLE PRECISION   AJJ
      INTEGER            J, LAST, LENH
* ..
* .. External Subroutines ..
      EXTERNAL           DLARF, XERBLA
* ..
* .. Intrinsic Functions ..
      INTRINSIC          MAX, MIN
* ..
*
* Test the input parameters. INFO = -I refers to the I-th argument,
* so LDQ, K and P are arguments 6, 7 and 8.
*
      INFO = 0
      IF( M.LT.0 ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( LDA.LT.MAX( 1, M ) ) THEN
         INFO = -4
      ELSE IF( LDQ.LT.MAX( 1, M ) ) THEN
         INFO = -6
      ELSE IF( K.GT.N+P .OR. K.LE.0 ) THEN
         INFO = -7
      ELSE IF( P.LE.0 ) THEN
         INFO = -8
      END IF
      IF( INFO.NE.0 ) THEN
         CALL XERBLA( 'DELCOLSQ', -INFO )
         RETURN
      END IF
*
      LAST = MIN( M-1, N )
*
      DO 10 J = K, LAST
*
         LENH = MIN( P+1, M-J+1 )
*
* Apply H(J) from right
*
         AJJ = A( J, J )
         A( J, J ) = ONE
*
         CALL DLARF( 'R', M, LENH, A( J, J ), 1, TAU( J-K+1 ),
     $               Q( 1, J ), LDQ, WORK )
*
         A( J, J ) = AJJ
*
   10 CONTINUE
*
      RETURN
*
* End of DELCOLSQ
*
      END
C addcols.f
      SUBROUTINE ADDCOLS( M, N, A, LDA, K, P, TAU, WORK, LWORK, INFO )
*
* Craig Lucas, University of Manchester
* March, 2004
*
* .. Scalar Arguments ..
      INTEGER            INFO, K, LDA, LWORK, M, N, P
* ..
* .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), TAU( * ), WORK( * )
* ..
*
* Purpose
* =======
*
* Given a real m by (n-p) matrix, B, and the QR factorization
* B = Q_B * R_B, ADDCOLS computes the QR factorization
* C = Q * R where C is the matrix B with p columns added
* in the kth column onwards.
*
* The input to this routine is Q_B' * C
*
* Arguments
* =========
*
* M (input) INTEGER
* The number of rows of the matrix C. M >= 0.
*
* N (input) INTEGER
* The number of columns of the matrix C. N >= 0.
*
* A (input/output) DOUBLE PRECISION array, dimension (LDA,N)
* On entry, the matrix Q_B' * C. The elements in columns
* 1:K-1 are not referenced.
*
* On exit, the elements on and above the diagonal contain
* the n by n upper triangular part of the matrix R. The
* elements below the diagonal in columns K:N, together with
* TAU represent the orthogonal matrix Q as a product of
* elementary reflectors and Givens rotations.
* (see Further Details).
*
* LDA (input) INTEGER
* The leading dimension of the array A. LDA >= max(1,M).
*
* K (input) INTEGER
* The position of the first column added to B.
* 0 < K <= N-P+1.
*
* P (input) INTEGER
* The number of columns added to B. P > 0.
*
* TAU (output) DOUBLE PRECISION array, dimension(P)
* The scalar factors of the elementary reflectors
* (see Further Details).
*
* WORK (workspace) DOUBLE PRECISION array, dimension ( LWORK )
* Work space.
*
* LWORK (input) INTEGER
* The dimension of the array WORK. LWORK >= P.
* For optimal performance LWORK >= P*NB, where NB is the
* optimal block size.
*
* INFO (output) INTEGER
* = 0: successful exit
* < 0: if INFO = -I, the I-th argument had an illegal value.
*
* Further Details
* ===============
*
* The matrix Q is represented as a product of Q_B, elementary
* reflectors and Givens rotations
*
* Q = Q_B * H(k) * H(k+1) *...* H(k+p-1) * G(k+p-1,k+p) *...
* *G(k,k+1) * G(k+p,k+p+1) *...* G(k+2p-2,k+2p-1)
*
* Each H(j) has the form
*
* H(j) = I - tau*v*v'
*
* where tau is a real scalar, and v is a real vector with
* v(1:n-p-j+1) = 0, v(j) = 1, and v(j+1:m) stored on exit in
* A(j+1:m,j), tau is stored in TAU(j).
*
* Each G(i,j) has the form
*
* i-1 i
* [ I ]
* [ c -s ] i-1
* G(i,j) = [ s c ] i
* [ I ]
*
* and zero A(i,j), where c and s are encoded in scalar and
* stored in A(i,j) and
*
* IF A(i,j) = 1, c = 0, s = 1
* ELSE IF | A(i,j) | < 1, s = A(i,j), c = sqrt(1-s**2)
* ELSE c = 1 / A(i,j), s = sqrt(1-c**2)
*
* The matrix Q can be formed with ADDCOLSQ
*
* =====================================================================
*
* .. Local Scalars ..
      DOUBLE PRECISION   C, S
      INTEGER            I, INC, ISTART, J, JSTOP, UPLEN
* ..
* .. External Subroutines ..
      EXTERNAL           DGEQRF, DLASR, DROT, DROTG, XERBLA
* ..
* .. Intrinsic Functions ..
      INTRINSIC          MAX, MIN
* ..
*
* Test the input parameters.
*
      INFO = 0
      IF( M.LT.0 ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( LDA.LT.MAX( 1, M ) ) THEN
         INFO = -4
      ELSE IF( K.GT.N-P+1 .OR. K.LE.0 ) THEN
         INFO = -5
      ELSE IF( P.LE.0 ) THEN
         INFO = -6
      END IF
      IF( INFO.NE.0 ) THEN
         CALL XERBLA( 'ADDCOLS', -INFO )
         RETURN
      END IF
*
* Do a QR factorization on rows below N-P, if there is more than one
*
      IF( M.GT.N-P+1 ) THEN
*
* Level 3 QR factorization
*
         CALL DGEQRF( M-N+P, P, A( N-P+1, K ), LDA, TAU, WORK, LWORK,
     $                INFO )
*
      END IF
*
* If K not equal to number of columns in B and not <= M-1 then
* there is some elimination by Givens to do
*
      IF( K+P-1.NE.N .AND. K.LE.M-1 ) THEN
*
* Zero out the rest with Givens
* Allow for M < N
*
         JSTOP = MIN( P+K-1, M-1 )
         DO 20 J = K, JSTOP
*
* Allow for M < N
*
            ISTART = MIN( N-P+J-K+1, M )
            UPLEN = N - K - P - ISTART + J + 1
*
            INC = ISTART - J
*
            DO 10 I = ISTART, J + 1, -1
*
* Recall DROTG updates A( I-1, J ) and
* stores C and S encoded as scalar in A( I, J )
*
               CALL DROTG( A( I-1, J ), A( I, J ), C, S )
               WORK( INC ) = C
               WORK( N+INC ) = S
*
* Update nonzero rows of R
* Do the next two lines this way round because
* A( I-1, N-UPLEN+1 ) gets updated
*
               A( I, N-UPLEN ) = -S*A( I-1, N-UPLEN )
               A( I-1, N-UPLEN ) = C*A( I-1, N-UPLEN )
*
               CALL DROT( UPLEN, A( I-1, N-UPLEN+1 ), LDA,
     $                    A( I, N-UPLEN+1 ), LDA, C, S )
*
               UPLEN = UPLEN + 1
               INC = INC - 1
*
   10       CONTINUE
*
* Update inserted columns in one go
* Max number of rotations is N-1, we've allowed N
*
            IF( J.LT.P+K-1 ) THEN
*
D addcolsq.f
      SUBROUTINE ADDCOLSQ( M, N, A, LDA, Q, LDQ, K, P, TAU, WORK, INFO)
*
* Craig Lucas, University of Manchester
* March, 2004
*
* .. Scalar Arguments ..
      INTEGER            INFO, K, LDA, LDQ, M, N, P
* ..
* .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), Q( LDQ, * ), TAU( * ), WORK( * )
* ..
*
* Purpose
* =======
*
* ADDCOLSQ generates an m by m real matrix Q with orthogonal columns,
* which is defined as the product of Q_B, elementary reflectors and
* Givens rotations
*
* Q = Q_B * H(k) * H(k+1) *...* H(k+p-1) * G(k+p-1,k+p) *...
* *G(k,k+1) * G(k+p,k+p+1) *...* G(k+2p-2,k+2p-1)
*
* where the H(j) and G(i,j) are as returned by ADDCOLS, such that
* C = Q * R and C is the matrix B = Q_B * R_B, with p columns added
* from the kth column onwards.
*
* Arguments
* =========
*
* M (input) INTEGER
* The number of rows of the matrix A. M >= 0.
*
* N (input) INTEGER
* The number of columns of the matrix A. N >= 0.
*
* A (input) DOUBLE PRECISION array, dimension (LDA,N)
* On entry, the elements below the diagonal in columns
* K:K+P-1 (if M > N-P+1) must contain the vector which defines
* the elementary reflector H(J). The elements above these
* vectors and below the diagonal store the scalars such that
* the Givens rotations can be constructed, as returned by
* ADDCOLS.
*
* LDA (input) INTEGER
* The leading dimension of the array A. LDA >= max(1,M).
*
* Q (input/output) DOUBLE PRECISION array, dimension (LDQ,M)
* On entry, the matrix Q_B.
* On exit, the matrix Q.
*
* LDQ (input) INTEGER
* The leading dimension of the array Q. LDQ >= M.
*
* K (input) INTEGER
* The position of the first column added to B.
* 0 < K <= N-P+1.
*
* P (input) INTEGER
* The number of columns added to B. P > 0.
*
* TAU (input) DOUBLE PRECISION array, dimension(P)
* The scalar factors of the elementary reflectors, as returned
* by ADDCOLS.
*
* WORK (workspace) DOUBLE PRECISION array, dimension (2*N)
* Work space.
*
* INFO (output) INTEGER
* = 0: successful exit
* < 0: if INFO = -I, the I-th argument had an illegal value
*
* =====================================================================
*
* .. Parameters ..
      DOUBLE PRECISION   ONE, ZERO
      PARAMETER          ( ONE = 1.0D+0, ZERO = 0.0D+0 )
* ..
* .. Local Scalars ..
      DOUBLE PRECISION   DTEMP
      INTEGER            COL, I, INC, ISTART, J, JSTOP
* ..
* .. External Subroutines ..
      EXTERNAL           DLARF, DLASR, XERBLA
* ..
* .. Intrinsic Functions ..
      INTRINSIC          ABS, MAX, MIN, SQRT
*
* Test the input parameters. INFO = -I refers to the I-th argument,
* so LDQ, K and P are arguments 6, 7 and 8.
*
      INFO = 0
      IF( M.LT.0 ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( LDA.LT.MAX( 1, M ) ) THEN
         INFO = -4
      ELSE IF( LDQ.LT.MAX( 1, M ) ) THEN
         INFO = -6
      ELSE IF( K.GT.N-P+1 .OR. K.LE.0 ) THEN
         INFO = -7
      ELSE IF( P.LE.0 ) THEN
         INFO = -8
      END IF
      IF( INFO.NE.0 ) THEN
         CALL XERBLA( 'ADDCLQ', -INFO )
         RETURN
      END IF
*
* We did a QR factorization on rows below N-P+1
*
      IF( M.GT.N-P+1 ) THEN
*
         COL = N - P + 1
         DO 10 J = K, K + P - 1
*
* If N+P > M-N we have only factored the first M-N columns.
* Test before touching A( COL, J ) so that no element below
* row M is referenced.
*
            IF( M-COL+1.LE.0 )
     $         GO TO 10
*
            DTEMP = A( COL, J )
            A( COL, J ) = ONE
            CALL DLARF( 'R', M, M-COL+1, A( COL, J ), 1, TAU( J-K+1 ),
     $                  Q( 1, COL ), LDQ, WORK )
*
            A( COL, J ) = DTEMP
            COL = COL + 1
*
   10    CONTINUE
      END IF
*
* If K not equal to number of columns in B then there was
* some elimination by Givens
*
      IF( K+P-1.LT.N .AND. K.LE.M-1 ) THEN
*
* Allow for M < N, i.e DO P wide unless hit the bottom first
*
         JSTOP = MIN( P+K-1, M-1 )
         DO 30 J = K, JSTOP
*
            ISTART = MIN( N-P+J-K+1, M )
            INC = ISTART - J
*
* Compute vectors of C and S for rotations
*
            DO 20 I = ISTART, J + 1, -1
*
               IF( A( I, J ).EQ.ONE ) THEN
                  WORK( INC ) = ZERO
                  WORK( N+INC ) = ONE
               ELSE IF( ABS( A( I, J ) ).LT.ONE ) THEN
                  WORK( N+INC ) = A( I, J )
                  WORK( INC ) = SQRT( ( 1-A( I, J )**2 ) )
               ELSE
                  WORK( INC ) = ONE / A( I, J )
                  WORK( N+INC ) = SQRT( ( 1-WORK( INC )**2 ) )
               END IF
               INC = INC - 1
   20       CONTINUE
*
* Apply rotations to the Jth column from the right
*
            CALL DLASR( 'R', 'V', 'b', M, ISTART-I+1, WORK( 1 ),
     $                  WORK( N+1 ), Q( 1, I ), LDQ )
*
   30    CONTINUE
*
      END IF
      RETURN
*
* End of ADDCOLSQ
*
      END
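The encoding of c and s in a single scalar, which ADDCOLS obtains from DROTG and ADDCOLSQ decodes in its inner loop, can be checked with a short Python sketch. The function `rotg` follows the reference BLAS DROTG conventions as understood here, which is an assumption of this sketch; `decode` mirrors the three-way test in the listing above. Both names are illustrative.

```python
import math


def rotg(a, b):
    """Return (r, z, c, s) with c*a + s*b = r and -s*a + c*b = 0.

    z is the single scalar from which c and s can be recovered.
    """
    roe = a if abs(a) > abs(b) else b
    scale = abs(a) + abs(b)
    if scale == 0.0:
        return 0.0, 0.0, 1.0, 0.0
    r = scale * math.sqrt((a / scale) ** 2 + (b / scale) ** 2)
    r = math.copysign(r, roe)
    c, s = a / r, b / r
    z = 1.0
    if abs(a) > abs(b):
        z = s
    elif c != 0.0:
        z = 1.0 / c
    return r, z, c, s


def decode(z):
    """Recover (c, s) from the stored scalar z."""
    if z == 1.0:
        return 0.0, 1.0
    if abs(z) < 1.0:
        return math.sqrt(1.0 - z * z), z
    c = 1.0 / z
    return c, math.sqrt(1.0 - c * c)


if __name__ == "__main__":
    for a, b in [(3.0, 4.0), (4.0, 3.0), (0.0, 2.0)]:
        r, z, c, s = rotg(a, b)
        print((a, b), (c, s), decode(z))
```

The scheme works because the sign convention for r makes whichever of c and s is reconstructed by the square root nonnegative.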
References
[1] Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, PA, USA, 1996.
[2] Å. Björck, L. Eldén, and H. Park. Accurate downdating of least squares
solutions. SIAM J. Matrix Anal. Appl., 15:549–568, 1994.
[3] A. W. Bojanczyk, R. P. Brent, P. Van Dooren, and F. R. De Hoog. A note
on downdating the Cholesky factorization. SIAM J. Sci. Stat. Comput., 8
(3):210–221, 1987.
[4] Adam Bojanczyk, Nicholas J. Higham, and Harikrishna Patel. Solving the
indefinite least squares problem by hyperbolic QR factorization. SIAM J.
Matrix Anal. Appl., 24(4):914–931, 2003.
[5] J. M. Chambers. Regression updating. Journal of the American Statistical
Association, 66(336):744–748, 1971.
[6] J. W. Daniel, W. B. Gragg, L. Kaufman, and G. W. Stewart. Reorthogonalization and stable algorithms for updating the Gram–Schmidt QR factorization. Mathematics of Computation, 30(136):772–795, 1976.
[7] J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart. LINPACK
Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia,
PA, USA, 1979.
[8] L. Eldén and H. Park. Block downdating of least squares solutions. SIAM
J. Matrix Anal. Appl., 15:1018–1034, 1994.
[9] G. H. Golub and C. F. Van Loan. Matrix Computations. Third edition, The
Johns Hopkins University Press, Baltimore, MD, USA, 1996.
[10] Nicholas J. Higham. Accuracy and Stability of Numerical Algorithms. Second
edition, Society for Industrial and Applied Mathematics, Philadelphia, PA,
USA, 2002.
[11] MATLAB, version 6.5. The Mathworks Inc, Natick, MA, USA.
[12] NAG Fortran Library Manual, Mark 20. Numerical Algorithms Group, Oxford, UK.
[13] S. J. Olszanskyj, J. M. Lebak, and A. W. Bojanczyk. Rank-k modification methods for recursive least squares problems. Numerical Algorithms, 7:325–354, 1994.
[14] B. N. Parlett. Analysis of algorithms for reflections in bisectors. SIAM
Review, 13:197–208, 1971.
[15] C. M. Rader and A. O. Steinhardt. Hyperbolic Householder transforms.
SIAM J. Matrix Anal. Appl., 9(2):269–290, 1988.