PROXSCAL
Notation
The following notation is used throughout this chapter, unless stated otherwise.
The following symbols are used for the dimensions of the vectors and matrices:
n Number of objects
m Number of sources
p Number of dimensions
s Number of independent variables
h maximum(s, p)
l Length of transformation vector
r Degree of spline
t Number of interior knots for spline
D̂_k   n × n matrix with transformed proximities for source k
V_k    n × n matrix with elements {v_ijk}, where v_ijk = −w_ijk for i ≠ j, and v_iik = Σ_{l≠i} w_ilk
Introduction
The following loss function is minimized by PROXSCAL,
$$\sigma^2 \equiv \frac{1}{m} \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} \left( \hat{d}_{ijk} - d_{ij}(\mathbf{X}_k) \right)^2, \qquad (1.1)$$
which is the weighted mean squared error between the transformed proximities
and the distances of the n objects within the m sources. The transformation function for the
proximities provides nonnegative values for the transformed proximities, as described in the
section on transformation functions below.
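As an illustration of (1.1), the following Python sketch evaluates the loss for given weights, transformed proximities, and configurations; the array names (weights, dhat, configs) are illustrative only and not part of PROXSCAL.

    import numpy as np

    def raw_stress(weights, dhat, configs):
        """Loss (1.1): weighted mean squared error between transformed
        proximities and configuration distances, averaged over the m sources.
        weights, dhat: (m, n, n) symmetric arrays; configs: (m, n, p)."""
        m, n, _ = dhat.shape
        iu = np.triu_indices(n, 1)                      # pairs with i < j
        total = 0.0
        for k in range(m):
            diff = configs[k][:, None, :] - configs[k][None, :, :]
            d = np.sqrt((diff ** 2).sum(axis=-1))       # d_ij(X_k)
            total += np.sum(weights[k][iu] * (dhat[k][iu] - d[iu]) ** 2)
        return total / m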
Preliminaries
At the start of the procedure, several preliminary computations are performed to
handle missing weights or proximities, and initialize the raw proximities.
Missings
On input, missing values may occur for both weights and proximities. If a weight
is missing, it is set equal to zero. If a proximity is missing, the corresponding
weight is set equal to zero.
Proximities
Only the upper or lower triangular part (without the diagonal) of the proximity
matrix is needed. In case both triangles are given, the weighted mean of both
triangles is used. Next, the raw proximities are transformed such that similarities
become dissimilarities by multiplying with −1 , taking into account the
conditionality, and setting the smallest dissimilarity equal to zero.
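A minimal Python sketch of this preprocessing for a single proximity matrix (matrix-conditional case); the function name and the similarities flag are illustrative assumptions, and the shift to a zero minimum is applied only when similarities are converted.

    import numpy as np

    def preprocess_proximities(prox, weights, similarities=False):
        """Fold a square proximity matrix to its i < j pairs, using the
        weighted mean when both triangles are given, and turn similarities
        into dissimilarities with the smallest value set to zero."""
        n = prox.shape[0]
        iu = np.triu_indices(n, 1)
        w_up, w_lo = weights[iu], weights.T[iu]
        p_up, p_lo = prox[iu], prox.T[iu]
        denom = np.where(w_up + w_lo > 0, w_up + w_lo, 1.0)
        p = (w_up * p_up + w_lo * p_lo) / denom   # weighted mean of both triangles
        if similarities:
            p = -p                                # multiply with -1
            p = p - p.min()                       # smallest dissimilarity becomes zero
        return p                                  # vector of n(n-1)/2 dissimilarities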
Transformations
For ordinal transformations, the nonmissing proximities are replaced by their
ascending rank numbers, also taking into account the conditionality. For spline
transformations, the spline basis S is computed.
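A sketch of the ordinal replacement for one source, assuming scipy's rankdata (average ranks for ties) as a stand-in for the ranking used by PROXSCAL; the treatment of ties is not covered here.

    import numpy as np
    from scipy.stats import rankdata

    def ordinal_ranks(prox_vec, weight_vec):
        """Replace the nonmissing proximities (weight > 0) of one source by
        their ascending rank numbers; call once per source in the
        matrix-conditional case, once overall in the unconditional case."""
        ranks = prox_vec.astype(float).copy()
        active = weight_vec > 0
        ranks[active] = rankdata(prox_vec[active])   # average ranks for ties
        return ranks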
Normalization
The proximities are normalized such that the weighted sum of the squared proximities equals
the sum of the weights, again taking the conditionality into account.
Simplex Start
Torgerson Start
The proximities are aggregated over sources, squared, double centered and
multiplied by −0.5, after which an eigenvalue decomposition is used to
determine the coordinate values, thus

$$-0.5\,\mathbf{J}\mathbf{D}^{*}\mathbf{J} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{\mathrm{T}},$$

where $\mathbf{J}$ is the centering matrix and the first $p$ columns of $\mathbf{Q}\boldsymbol{\Lambda}^{1/2}$ provide the initial coordinate values.
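A compact sketch of this classical scaling step, assuming the aggregated dissimilarity matrix is already available:

    import numpy as np

    def torgerson_start(delta, p):
        """Classical scaling: double-center the squared aggregated
        dissimilarities, multiply by -0.5 and take the p leading
        eigenvectors scaled by the square roots of their eigenvalues."""
        n = delta.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
        B = -0.5 * J @ (delta ** 2) @ J
        evals, evecs = np.linalg.eigh(B)             # ascending eigenvalues
        order = np.argsort(evals)[::-1][:p]
        lam = np.clip(evals[order], 0.0, None)       # guard tiny negatives
        return evecs[:, order] * np.sqrt(lam)        # n x p start configuration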
User-Provided Start
The coordinate values provided by the user are used.
Update for the Common Space
The common space Z is updated by solving the system $\mathbf{H}\mathbf{z} = \mathbf{t}$, where

$$\mathbf{z} \equiv \mathrm{vec}(\mathbf{Z}),$$

$$\mathbf{H} \equiv \frac{1}{m}\sum_{k=1}^{m}\left(\mathbf{A}_k\mathbf{A}_k^{\mathrm{T}} \otimes \mathbf{V}_k\right), \qquad (3.2)$$

$$\mathbf{t} \equiv \mathrm{vec}\!\left(\frac{1}{m}\sum_{k=1}^{m}\mathbf{B}(\mathbf{X}_k)\mathbf{X}_k\mathbf{A}_k^{\mathrm{T}}\right),$$

for which a solution is found as

$$\mathbf{z} = \mathbf{H}^{-}\mathbf{t}. \qquad (3.3)$$
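The following sketch solves (3.2)-(3.3) with numpy. Since B(X_k) is not defined in this fragment, the code assumes the usual SMACOF majorization matrix for it; that definition is an assumption here, not a statement of PROXSCAL's exact formula.

    import numpy as np

    def b_matrix(W, dhat, X):
        """Assumed SMACOF-style B(X_k): off-diagonal -w_ij * dhat_ij / d_ij(X),
        zero where d_ij(X) = 0, diagonal minus the off-diagonal row sums."""
        diff = X[:, None, :] - X[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))
        with np.errstate(divide="ignore", invalid="ignore"):
            B = np.where(d > 0, -W * dhat / d, 0.0)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        return B

    def v_matrix(W):
        """V_k with elements -w_ijk off-diagonal and row sums on the diagonal."""
        V = -W.copy()
        np.fill_diagonal(V, 0.0)
        np.fill_diagonal(V, -V.sum(axis=1))
        return V

    def common_space_update(W, dhat, X, A):
        """One update of Z via (3.2)-(3.3); W, dhat, X, A are lists over the
        m sources, with A[k] the p x p space weights of source k."""
        m = len(W)
        n, p = X[0].shape
        H = sum(np.kron(A[k] @ A[k].T, v_matrix(W[k])) for k in range(m)) / m
        t = sum(b_matrix(W[k], dhat[k], X[k]) @ X[k] @ A[k].T for k in range(m)) / m
        z = np.linalg.pinv(H) @ t.reshape(-1, order="F")   # vec stacks columns
        return z.reshape(n, p, order="F")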
Several special cases exist for which (3.3) can be simplified. First, the weight
matrices W_k may all be equal, or even all equal to one. In these cases H will
simplify, as will the pseudo-inverse of H. Another simplification is concerned
with the different models, reflected in restrictions for the space weights. Equation
(3.3) provides a solution for Z in the case that A_k is of full rank. This model is the
generalized Euclidean model, also known as IDIOSCAL (Carroll and Chang,
1972). The weighted Euclidean model, or INDSCAL, restricts A_k to be diagonal,
which does simplify H, but not the pseudo-inverse. The identity model requires
A_k = I for all k, and does simplify both H and its pseudo-inverse, since the Kronecker
product ⊗ vanishes.
To avoid computing the pseudo-inverse of a large matrix, PROXSCAL uses
three technical simplifications when appropriate. First, the pseudo-inverse can be
replaced by a proper inverse by adding the nullspace, taking the proper inverse and
then subtracting the nullspace again as
$$\mathbf{H}^{-} = \left(\mathbf{H} + \mathbf{N}\right)^{-1} - \mathbf{N}.$$
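This identity holds when N is the orthogonal projector onto the null space of the symmetric matrix H; the snippet below only verifies that identity numerically and does not show how PROXSCAL constructs N.

    import numpy as np

    rng = np.random.default_rng(0)

    # build a symmetric positive semidefinite H with a nontrivial null space
    M = rng.standard_normal((6, 4))
    H = M @ M.T                                   # 6 x 6, rank 4

    # orthogonal projector N onto the null space of H
    evals, evecs = np.linalg.eigh(H)
    null = evecs[:, evals < 1e-10]
    N = null @ null.T

    # (H + N)^{-1} - N reproduces the Moore-Penrose pseudo-inverse of H
    H_minus = np.linalg.inv(H + N) - N
    print(np.allclose(H_minus, np.linalg.pinv(H)))   # True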
The second simplification is a dimensionwise approach, in which the common space is
updated one dimension at a time. For dimension a, this update uses

$$\mathbf{V}_a = \frac{1}{m}\sum_{k=1}^{m} \mathbf{V}_k\, \mathbf{e}_a^{\mathrm{T}}\mathbf{A}_k\mathbf{A}_k^{\mathrm{T}}\mathbf{e}_a,$$

where e_a is the a-th column of the identity matrix, and

$$\mathbf{z}_a = \frac{1}{m}\sum_{k=1}^{m}\left(\mathbf{B}(\mathbf{X}_k)\mathbf{X}_k\mathbf{A}_k^{\mathrm{T}} - \mathbf{V}_k\mathbf{P}_a\mathbf{A}_k\mathbf{A}_k^{\mathrm{T}}\right)\mathbf{e}_a,$$

with P_a an n × p matrix equal to Z, but with the a-th column containing zeros.
Still, the proper inverse of an n × n matrix is required. The final simplification is
concerned with a majorization function in which the largest eigenvalue of V
allows for an easy update (Heiser, 1987; Groenen, Heiser, and Meulman, 1999).
Instead of the largest eigenvalue itself, an upper bound is used for this scalar
(Wolkowicz and Styan, 1980).
An update for the space weights A k ( k = 1,..., m ) for the generalized Euclidean
model is given by
$$\mathbf{A}_k = \left(\mathbf{Z}^{\mathrm{T}}\mathbf{V}_k\mathbf{Z}\right)^{-1}\left(\mathbf{Z}^{\mathrm{T}}\mathbf{B}(\mathbf{X}_k)\mathbf{X}_k\right). \qquad (3.4)$$
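A one-line sketch of (3.4), with np.linalg.solve used in place of the explicit inverse; V_k and B(X_k) are assumed to be precomputed (B(X_k) as in the earlier sketch).

    import numpy as np

    def space_weights_update(Z, Vk, Bk, Xk):
        """Update (3.4) for the space weights of source k in the generalized
        Euclidean model: A_k = (Z' V_k Z)^{-1} (Z' B(X_k) X_k)."""
        return np.linalg.solve(Z.T @ Vk @ Z, Z.T @ Bk @ Xk)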
The space weights for the identity model need no update, since A k = I for all k.
Simplifications can be obtained if all weights W_k are equal to one, and for the
reduced rank model, which can be done in r dimensions, as explained in Heiser
and Stoop (1986).
Restrictions
Fixed coordinates
If some of the coordinates of Z are fixed by the user, then only the free
coordinates of Z need to be updated. The dimensionwise approach is taken one
step further, which results in an update for object i on dimension a as
$$z_{ia}^{+} = \frac{1}{\mathbf{e}_i^{\mathrm{T}}\mathbf{V}_a\mathbf{e}_i}\,\mathbf{e}_i^{\mathrm{T}}\!\left(\frac{1}{m}\sum_{k=1}^{m}\mathbf{B}(\mathbf{X}_k)\mathbf{X}_k\mathbf{A}_k^{\mathrm{T}}\mathbf{e}_a - \frac{1}{m}\sum_{j\neq a}^{p}\sum_{k=1}^{m}\mathbf{e}_j^{\mathrm{T}}\mathbf{A}_k\mathbf{A}_k^{\mathrm{T}}\mathbf{e}_a\,\mathbf{V}_k\mathbf{z}_j\right) - \frac{1}{\mathbf{e}_i^{\mathrm{T}}\mathbf{V}_a\mathbf{e}_i}\,\mathbf{e}_i^{\mathrm{T}}\mathbf{V}_a\tilde{\mathbf{z}}_{ia},$$

where the a-th column of Z is divided into $\mathbf{z}_a = \tilde{\mathbf{z}}_{ia} + z_{ia}\mathbf{e}_i$, with e_i the i-th column of
the identity matrix, and

$$\mathbf{V}_a = \frac{1}{m}\sum_{k=1}^{m}\mathbf{e}_a^{\mathrm{T}}\mathbf{A}_k\mathbf{A}_k^{\mathrm{T}}\mathbf{e}_a\,\mathbf{V}_k.$$
This update procedure will only locally minimize (3.1); repeatedly cycling
through all free coordinates until convergence is reached provides global
optimization. After all free coordinates have been updated, Z is centered on the
origin. On output, the configuration is adapted so as to coincide with the initial fixed
coordinates.
Independent variables
Independent variables Q are used to express the coordinates of the common space
Z as a weighted sum of these independent variables as
$$\mathbf{Z} = \mathbf{Q}\mathbf{B} = \sum_{j=1}^{h} \mathbf{q}_j \mathbf{b}_j^{\mathrm{T}}.$$
The regression weights $\mathbf{b}_j$ are updated by cycling through the independent variables
$j = 1, \ldots, h$ as follows (a code sketch of these steps follows the list):

1. $\mathbf{U}_j \equiv \sum_{l \neq j} \mathbf{q}_l \mathbf{b}_l^{\mathrm{T}}$, the part of the common space that does not involve variable $j$;

2. $\mathbf{T}_j = \mathbf{C} - \frac{1}{m}\sum_{k=1}^{m}\mathbf{V}_k\mathbf{U}_j\mathbf{A}_k\mathbf{A}_k^{\mathrm{T}}$, where $\mathbf{C} = \frac{1}{m}\sum_{k=1}^{m}\mathbf{B}(\mathbf{X}_k)\mathbf{X}_k\mathbf{A}_k^{\mathrm{T}}$;

3. update $\mathbf{b}_j$ as $\mathbf{b}_j = \left(\frac{1}{m}\sum_{k=1}^{m}\mathbf{q}_j^{\mathrm{T}}\mathbf{V}_k\mathbf{q}_j\,\mathbf{A}_k\mathbf{A}_k^{\mathrm{T}}\right)^{-1}\mathbf{T}_j^{\mathrm{T}}\mathbf{q}_j$;

4. optionally, compute optimally transformed variables by regressing
$\tilde{\mathbf{q}}_j = \frac{1}{k_1}\mathbf{T}_j\mathbf{b}_j + \left(\mathbf{I} - \frac{1}{k_1}\mathbf{V}_j\right)\mathbf{q}_j$, where $\mathbf{V}_j = \frac{1}{m}\sum_{k=1}^{m}\mathbf{b}_j^{\mathrm{T}}\mathbf{A}_k\mathbf{A}_k^{\mathrm{T}}\mathbf{b}_j\,\mathbf{V}_k$ and
$k_1$ is greater than or equal to the largest eigenvalue of $\mathbf{V}_j$, on the original
variable $\mathbf{q}_j$. Missing elements in the original variable are replaced with the
corresponding values from $\tilde{\mathbf{q}}_j$.

Finally, set $\mathbf{Z} = \mathbf{Q}\mathbf{B} = \sum_{j=1}^{h}\mathbf{q}_j\mathbf{b}_j^{\mathrm{T}}$.
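The sketch below implements steps 1-3 of this cycle (the optional step 4 is omitted); it assumes precomputed V_k matrices and the matrix C defined in step 2, and all function and variable names are illustrative.

    import numpy as np

    def update_regression_weights(Q, Bw, A, V, C):
        """One cycle over the independent variables j = 1..h, updating the
        regression weights b_j (steps 1-3).  Q is n x h, Bw is h x p with
        rows b_j', A is a list of p x p space weights, V a list of V_k
        matrices, and C = (1/m) sum_k B(X_k) X_k A_k'."""
        m = len(V)
        n, h = Q.shape
        for j in range(h):
            # step 1: common space without the contribution of variable j
            U = Q @ Bw - np.outer(Q[:, j], Bw[j])
            # step 2: T_j = C - (1/m) sum_k V_k U_j A_k A_k'
            T = C - sum(V[k] @ U @ A[k] @ A[k].T for k in range(m)) / m
            # step 3: b_j = ((1/m) sum_k q_j' V_k q_j A_k A_k')^{-1} T_j' q_j
            S = sum((Q[:, j] @ V[k] @ Q[:, j]) * (A[k] @ A[k].T)
                    for k in range(m)) / m
            Bw[j] = np.linalg.solve(S, T.T @ Q[:, j])
        return Bw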
Independent variables restrictions were introduced for the MDS model in Bentler
and Weeks (1978), Bloxom (1978), de Leeuw and Heiser (1980) and Meulman
and Heiser (1984). If there are more dimensions (p) than independent variables (s),
p−s dummy variables are created and treated as completely free in the analysis. The
transformations for the independent variables from Step 4 are identical to the
transformations of the proximities, except that the nonnegativity constraint does
not apply. After transformation, the variables q are centered on the origin,
normalized on n, and the reverse normalization is applied to the regression weights
b.
Transformation Functions
All transformation functions in PROXSCAL result in nonnegative values for the
transformed proximities. After the transformation, the transformed proximities are
normalized and the common space is optimally dilated accordingly. The following
transformations are available.
Ratio
$\hat{\mathbf{D}} = \mathbf{D}$. No transformation is necessary, since the scale of $\hat{\mathbf{D}}$ is adjusted in the
normalization step.
Interval
Ordinal
Spline
$\mathrm{vec}(\hat{\mathbf{D}}) = \mathbf{S}\mathbf{b}$. PROXSCAL uses monotone spline transformations (Ramsay,
1988). In this case, the spline transformation gives a smooth nondecreasing
piecewise polynomial transformation. It is computed as a weighted regression of
$\mathbf{D}$ on the spline basis $\mathbf{S}$. Regression weights $\mathbf{b}$ are restricted to be nonnegative
and computed using nonnegative alternating least squares (Groenen, van Os and
Meulman, 2000).
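A sketch of this weighted nonnegative regression, taking the spline basis S as given (its construction from the degree r and the t interior knots is not shown) and using scipy.optimize.nnls as a simple stand-in for the nonnegative alternating least squares algorithm cited above.

    import numpy as np
    from scipy.optimize import nnls

    def spline_transform(d, w, S):
        """Weighted regression of the proximities d on the spline basis S with
        nonnegative regression weights b, so that vec(D_hat) = S b is a smooth
        nondecreasing transformation.  d and w are vectors over the proximities,
        S has one row per proximity."""
        sw = np.sqrt(w)
        b, _ = nnls(S * sw[:, None], d * sw)   # nonnegative least squares
        return S @ b, b                        # transformed proximities, weights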
Normalization
After transformation, the transformed proximities are normalized such that the
sum of squares of the weighted transformed proximities is equal to mn(n−1)/2 in
the unconditional case and to n(n−1)/2 in the matrix-conditional case.
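A small sketch of this rescaling; target is mn(n−1)/2 when called once on all sources (unconditional) or n(n−1)/2 when called per source (matrix-conditional).

    import numpy as np

    def normalize_dhat(dhat, w, target):
        """Rescale the transformed proximities so that the weighted sum of
        squares equals the requested target value."""
        scale = np.sqrt(target / np.sum(w * dhat ** 2))
        return dhat * scale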
Step 4: Termination
After evaluation of the loss function, the old function value and new function
values are used to decide whether iterations should continue. If the new function
value is smaller than or equal to the minimum Stress value MINSTRESS, provided
by the user, iterations are terminated. Also, if the difference in consecutive Stress
values is smaller than or equal to the convergence criterion DIFFSTRESS,
provided by the user, iterations are terminated. Finally, iterations are terminated if
the current number of iterations exceeds the maximum number of iterations
MAXITER, also provided by the user. In all other cases, iterations continue.
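A sketch of this termination rule; the parameter names mirror MINSTRESS, DIFFSTRESS and MAXITER.

    def keep_iterating(stress_old, stress_new, iteration,
                       minstress, diffstress, maxiter):
        """Return False as soon as one of the termination conditions holds."""
        if stress_new <= minstress:                  # Stress small enough
            return False
        if stress_old - stress_new <= diffstress:    # too little improvement
            return False
        if iteration > maxiter:                      # iteration count exceeds MAXITER
            return False
        return True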
Remaining Issues
Acceleration
For the identity model without further restrictions, the common space can be
updated with acceleration as $\mathbf{Z}^{\mathrm{new}} = 2\mathbf{Z}^{\mathrm{update}} - \mathbf{Z}^{\mathrm{old}}$, also referred to as the
relaxed update.
Lowering dimensionality
For a restart in p-1 dimensions, the p-1 most important dimensions need to be
identified. For the identity model, the first p-1 principal axes are used. For the
weighted Euclidean model, the p-1 most important space weights are used, and for
the generalized Euclidean and reduced rank models, the p-1 largest singular values
of the space weights determine the remaining dimensions.
Stress measures
The following statistics are used for the computation of the Stress measures:
$$\eta^2(\hat{\mathbf{D}}) = \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,\hat{d}_{ijk}^{\,2}$$

$$\eta^4(\hat{\mathbf{D}}) = \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,\hat{d}_{ijk}^{\,4}$$

$$\eta^2(\mathbf{X}) = \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,d_{ij}^{2}(\mathbf{X}_k)$$

$$\eta^4(\mathbf{X}) = \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,d_{ij}^{4}(\mathbf{X}_k)$$

$$\rho(\mathbf{X}) = \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,\hat{d}_{ijk}\,d_{ij}(\mathbf{X}_k)$$

$$\rho^2(\mathbf{X}) = \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,\hat{d}_{ijk}^{\,2}\,d_{ij}^{2}(\mathbf{X}_k)$$

$$\kappa^2(\mathbf{X}) = \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\left(d_{ij}(\mathbf{X}_k) - \bar{d}(\mathbf{X})\right)^2$$

$$\sigma^2 = \frac{\eta^2(\hat{\mathbf{D}}) + \eta^2(\alpha\mathbf{X}) - 2\rho(\alpha\mathbf{X})}{\eta^2(\hat{\mathbf{D}})}, \quad \text{with } \alpha = \frac{\rho(\mathbf{X})}{\eta^2(\mathbf{X})}.$$
Note that at a local minimum of X , α is equal to one. The other Fit and Stress
measures provided by PROXSCAL are given by:
Stress-I:
$$\sqrt{\frac{\eta^2(\hat{\mathbf{D}}) + \eta^2(\alpha\mathbf{X}) - 2\rho(\alpha\mathbf{X})}{\eta^2(\alpha\mathbf{X})}}, \quad \text{with } \alpha = \frac{\eta^2(\hat{\mathbf{D}})}{\rho(\mathbf{X})}.$$

Stress-II:
$$\sqrt{\frac{\eta^2(\hat{\mathbf{D}}) + \eta^2(\alpha\mathbf{X}) - 2\rho(\alpha\mathbf{X})}{\kappa^2(\alpha\mathbf{X})}}, \quad \text{with } \alpha = \frac{\eta^2(\hat{\mathbf{D}})}{\rho(\mathbf{X})}.$$

S-Stress:
$$\eta^4(\hat{\mathbf{D}}) + \eta^4(\alpha\mathbf{X}) - 2\rho^2(\alpha\mathbf{X}), \quad \text{with } \alpha^2 = \frac{\rho^2(\mathbf{X})}{\eta^4(\mathbf{X})}.$$

Dispersion Accounted For (DAF): $1 - \sigma^2$.
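A sketch computing the σ² measure above, Stress-I and DAF from flattened vectors of weights, transformed proximities and distances over all pairs i < j of all sources; Stress-II and S-Stress follow the same pattern and are omitted.

    import numpy as np

    def stress_measures(w, dhat, d):
        """Fit measures built from the statistics listed above."""
        eta2_dhat = np.sum(w * dhat ** 2)
        eta2_x = np.sum(w * d ** 2)
        rho = np.sum(w * dhat * d)
        alpha = rho / eta2_x                       # dilation for sigma^2
        sigma2 = (eta2_dhat + alpha ** 2 * eta2_x - 2 * alpha * rho) / eta2_dhat
        a1 = eta2_dhat / rho                       # dilation for Stress-I
        stress1 = np.sqrt((eta2_dhat + a1 ** 2 * eta2_x - 2 * a1 * rho)
                          / (a1 ** 2 * eta2_x))
        daf = 1.0 - sigma2                         # Dispersion Accounted For
        return sigma2, stress1, daf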
Transformations on output
On output, whenever fixed coordinates or independent variables do not apply, the
models are not unique. In these cases transformations of the common space and
the space weights are in order.
For the identity model, the common space Z is rotated to principal axes. For the
weighted Euclidean model, the common space is normalized as

$$\mathbf{Z} = \sqrt{n}\,\mathbf{Z}\left(\operatorname{diag}\,\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\right)^{-1/2},$$

so that the sum of squares of each dimension of the common space equals n, and the
reverse transformation is applied to the space weights.
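A sketch of these two output transformations; the compensating adjustment of the space weights is not shown.

    import numpy as np

    def rotate_to_principal_axes(Z):
        """Identity model: rotate the common space to its principal axes."""
        Zc = Z - Z.mean(axis=0)                    # remove the mean before rotating
        _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
        return Zc @ Vt.T

    def normalize_weighted_euclidean(Z):
        """Weighted Euclidean model: Z = sqrt(n) Z (diag Z'Z)^(-1/2), so that
        each dimension of the common space has sum of squares n."""
        n = Z.shape[0]
        return np.sqrt(n) * Z / np.sqrt(np.diag(Z.T @ Z))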
References
Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972).
Statistical inference under order restrictions. New York: Wiley.
Gower, J. C. (1966). Some distance properties of latent root and vector methods
used in multivariate analysis. Biometrika, 53, 325-338.
Stoop, I., Heiser, W. J. and De Leeuw, J. (1981). How to use SMACOF-IA. Leiden,
The Netherlands: Department of Data Theory, Leiden University.