17 Random Projections and Orthogonal Matching Pursuit
Again we will consider high-dimensional data P, but now we will examine the uses and effects of randomness.
We will use it to simplify P (put it in a lower-dimensional space) and to recover data after random noise has
interfered with it.
The first approach will be through random projections, and we will discuss the Johnson-Lindenstrauss
Lemma, and the very simple algorithm implied by it.
Then, to discuss recovery, we need to model the problem a bit more carefully, so we will define the
compressed sensing problem. Then we will discuss the simplest way to recover data: orthogonal matching
pursuit. Although this technique does not have the best possible bounds, it is an extremely general approach
that can be used in many areas.
This is stricter than the requirements for PCA, since we want all distances between pairs of points preserved,
whereas PCA only asks for an average error to be small; that allowed some points to have large
error as long as most did not.
The idea to create µ is very simple: choose one at random!
To create µ, we choose k random unit vectors u_1, u_2, ..., u_k, then project onto the subspace spanned by
these vectors. Finally we need to re-normalize by sqrt(d/k) so the expected (squared) norm is preserved.
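To make this concrete, here is a minimal numpy sketch of this construction (the function name and the small sanity check at the end are my own illustration, not part of these notes): draw k random Gaussian directions, normalize them to unit vectors, take inner products, and rescale by sqrt(d/k).

    import numpy as np

    def jl_project(P, k, seed=0):
        """Project the rows of P (an n x d point set) onto k random directions."""
        rng = np.random.default_rng(seed)
        n, d = P.shape
        U = rng.normal(size=(d, k))          # k random Gaussian directions
        U /= np.linalg.norm(U, axis=0)       # normalize each column to a unit vector
        return np.sqrt(d / k) * (P @ U)      # rescale so squared norms are preserved in expectation

    # Quick sanity check: one pairwise distance before and after projecting.
    P = np.random.default_rng(1).normal(size=(50, 10000))
    Q = jl_project(P, k=200)
    print(np.linalg.norm(P[0] - P[1]), np.linalg.norm(Q[0] - Q[1]))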
A classic theorem [1], known as the Johnson-Lindenstrauss Lemma, shows that if k = O((1/ε²) log(n/δ))
in Algorithm 17.1.1, then for all p, p′ ∈ P equation (17.1) is satisfied with probability at least 1 − δ. The
proof can almost be seen as a Chernoff-Hoeffding bound plus a union bound, see L3.2. For each distance,
each random projection (after appropriate normalization) gives an unbiased estimate; this requires the 1/ε²
term to make the deviation from the unbiased estimate small. Then we take the union bound over all
(n choose 2) = O(n²) pairwise distances (this yields the log n term).
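In slightly more detail, the union-bound step looks like this (constants suppressed; this expansion is my own, following the sketch above): if each single pair violates equation (17.1) with probability at most δ′ = δ / (n choose 2), then

    \Pr\big[\text{some pair } p, p' \in P \text{ violates (17.1)}\big] \;\le\; \binom{n}{2}\,\delta' \;=\; \delta,

and driving one pair's failure probability down to δ′ needs k = O((1/ε²) log(1/δ′)) = O((1/ε²) log(n/δ)).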
Interpretation of bounds. It is pretty amazing that this bound does not depend on d. Moreover, it is
essentially tight; that is, there are known point sets for which Ω(1/ε²) dimensions are required to satisfy
equation (17.1).
Although the log n component can be quite reasonable, the 1/ε² part can be quite onerous. For instance,
if we want the error to be within 1%, we may need k to be about 10,000 times log n. It is not often that d is
large enough that setting k = 10,000 is useful.
However, we can sometimes get about 10% error (recall this is the worst case error) when k = 200 or so.
Also, the log n term may not be required if the data actually lies in a lower-dimensional space naturally
(or is very clustered); this component is really a worst-case analysis.
A rule of thumb: use JL when d > 100,000 and the desired k > 500.
In conclusion, this may be useful when k = 200 is acceptable, not too much precision is needed, and PCA is
too slow. Otherwise, SVD or its approximations may be a better technique.
Extensions/Advantages. One can also combine this with PCA ideas [2] to get similar bounds and performance
as in L16.
Another advantage of this technique is that µ is defined independently of P , so if we don’t know P ahead
of time, we can still create µ and then use it in several different cases. But if we know something of P , then
again typically PCA is better.
Typically, the random Gaussian vector u_i can also be replaced with a random vector in {−1, 0, +1}^d.
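As a hedged illustration of this variant (the specific probabilities and the sqrt(3/k) scaling below follow the common "database-friendly" construction and are my assumption, not something stated in these notes):

    import numpy as np

    def sparse_jl_project(P, k, seed=0):
        """Random projection using entries from {-1, 0, +1} instead of Gaussians."""
        rng = np.random.default_rng(seed)
        n, d = P.shape
        # Each entry is +1 or -1 with probability 1/6 each, and 0 with probability 2/3.
        R = rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
        # The sqrt(3/k) factor preserves squared norms in expectation for this distribution.
        return np.sqrt(3.0 / k) * (P @ R)

Because most entries of R are zero, the projection can be computed with sparse matrix arithmetic, which is one reason this variant is popular.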
For example, consider a sparse signal S that has only m non-zeros among its d coordinates, such as
S^T = [0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 0 1 0 0]
for d = 32 and m = 8. (Perhaps in practice the non-zeros could be larger, and the “zeros” may be small,
such as < 0.05.)
Now the goal is to make only N = K · m log(d/m) (random) measurements of S and recover it exactly
(or with high probability). In some settings K is 4, and not more than 20 in general. For this to work, each
measurement needs to involve (nearly) the entire signal; otherwise we could miss a non-zero and simply
never witness it.
Let a measurement x_i be a random vector in {−1, 0, +1}^d. Example:
x_i^T = [-1 0 1 0 1 1 -1 1 0 -1 0 0 1 -1 -1 1 0 1 0 1 -1 -1 -1 0 1 0 0 -1 0 1 0 0]
y_i = ⟨S, x_i⟩ = 0+0+0+0+1+0+0+0+0+0+0+0+0+0+0+1+0+0+0+1−1+0−1+0+0+0+0+0+0+1+0+0 = 2.
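A small numpy sketch of taking such measurements (the specific values of d, m, and N below are chosen only for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, N = 32, 8, 48                       # signal length, number of non-zeros, number of measurements

    # A sparse 0/1 signal S with m non-zeros, like the example above.
    S = np.zeros(d)
    S[rng.choice(d, size=m, replace=False)] = 1

    # Each measurement x_i is a random vector in {-1, 0, +1}^d, and we record y_i = <S, x_i>.
    X = rng.choice([-1, 0, 1], size=(N, d))
    y = X @ S                                 # all N measurements at once
    print(y[:5])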
Examples:
• single pixel camera: Instead of 10 Gigapixels (about 25MB), directly sense the 5MB jpg. This is hard,
but we can get kind of close. Take N measurements where each y_i is the sum of all 10 Gigapixels
under a random mask x_i. Each pixel is either taken in “as is” (a +1), is blocked (a 0), or is subtracted
(a −1).
Such cameras have been built; they work OK, though not as well as a regular camera :).
• Hubble Telescope: A high-resolution camera in space (less atmospheric interference). But communication
to/from space is expensive. So with a fixed (but initially random) mask matrix X that is already
known on Earth, it can send compressed signals down.
• MRI on kids: They squirm a lot. So only a few angles/voxels need to be sensed, and this technique gets the
best images available for kids. Not as high resolution as a full MRI, but with much, much less time.
• Noisy Data: Data is often noisy and appears to have more attributes than are actually there. This helps find the true
structure. See more next lecture.
Orthogonal matching pursuit greedily builds a sparse guess of S from the measurements y and the matrix X, one coordinate at a time:
• First, find the measurement column X_j (not the row x_i used to measure).
This represents the single index of S that explains the most about y.
• Second, find the scalar γ = arg min_γ ‖y − X_j γ‖ that represents our guess of entry s_j in S. If S is always 0 or 1, then we may enforce that γ = 1.
• Finally, we calculate the residual r = y − X_j γ. This is what remains to be explained by other elements
of S.
• Then we repeat for t rounds (see the sketch after this list). We stop when the residual is small enough (nothing left to explain) or γ
is small enough (the additional explanation is not that useful).
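Here is a minimal numpy sketch of these steps (the scoring rule |⟨X_j, r⟩| / ‖X_j‖ for “most explanatory” and the stopping thresholds are defaults I am assuming, not prescribed above):

    import numpy as np

    def omp(X, y, t=20, tol=1e-8):
        """Greedily build a sparse guess of S from measurements y = X S."""
        N, d = X.shape
        r = y.astype(float).copy()            # residual: what is still unexplained
        s_hat = np.zeros(d)
        col_norm = np.linalg.norm(X, axis=0)
        for _ in range(t):
            # Pick the column that explains the residual best.
            j = int(np.argmax(np.abs(X.T @ r) / col_norm))
            Xj = X[:, j]
            # gamma = argmin_gamma || r - X_j gamma ||, a one-dimensional least squares.
            gamma = (Xj @ r) / (Xj @ Xj)
            s_hat[j] += gamma
            r = r - gamma * Xj                # update the residual
            if np.linalg.norm(r) < tol or abs(gamma) < tol:
                break
        return s_hat

On a 0/1 signal like the examples in this lecture, one could additionally round the entries of s_hat, or enforce γ = 1 as suggested above.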
Remarks:
• This greedy approach adds one explanation at a time and, as we will see next lecture, this will bias towards sparse solutions.
• We can re-solve for the optimal least squares fit over all chosen columns to get a better estimate each round, but this is more work.
• This converges if in each step we require ‖r_i‖ < ‖r_{i−1}‖. A Frank-Wolfe analysis can show that it is
within ε of optimal after t = 1/ε steps, although it may not reach a global optimum.
• The term “orthogonal” comes from the fact that each X_{j_i} chosen in the ith step is linearly independent of [X_{j_1} . . . X_{j_{i−1}}];
it adds an orthogonal explanation of y.
• Roughly, the analysis of why about m log(d/m) measurements suffice goes through the Coupon Collector's problem, since we
need to hit each of the m non-zero entries. And since X is random and N is large enough, each
⟨X_j, X_{j′}⟩ (for j ≠ j′) should be small (the columns are close to orthogonal); see the quick check below.
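A quick empirical check of that last remark (numpy, with N and d chosen only for illustration): the diagonal entries of the column Gram matrix are much larger than the off-diagonal ones.

    import numpy as np

    rng = np.random.default_rng(0)
    N, d = 200, 50
    X = rng.choice([-1, 0, 1], size=(N, d))

    G = X.T @ X                               # G[j, j'] = <X_j, X_j'>
    diag = np.diag(G)
    off = G[~np.eye(d, dtype=bool)]
    print(diag.mean(), np.abs(off).mean())    # roughly 133 vs. roughly 8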
As a small worked example, let d = 10 and m = 3, with sparse signal
S = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
and 6 × 10 measurement matrix
X = [  0   1   1  −1  −1   0  −1   0  −1   0
      −1  −1   0   1  −1   0   0  −1   0   1
       1  −1   1  −1   0  −1   1   1   0   0
       1   0  −1   0   0   1  −1  −1   1   1
      −1   0   0   0   1   0   1   0   1  −1
       0   0  −1  −1  −1   0  −1   1  −1   0 ]
so for instance the first row x_1 = (0, 1, 1, −1, −1, 0, −1, 0, −1, 0) yields measurement ⟨S, x_1⟩ = 0 + 0 +
1 + 0 + 0 + 0 + 0 + 0 + (−1) + 0 = 0.
The observed measurement vector is
y = X S^T = [0, 0, 0, 1, 1, −2]^T.
Columns 7 and 9 have the most explanatory power towards y, based on X. We let j_1 = 9, so X_{j_1} = X_9 =
(−1, 0, 0, 1, 1, −1)^T = X(:,9). Then 1 = γ_1 = arg min_γ ‖y − X_9 γ‖. We can then set r = y − X_9 · γ_1 =
(1, 0, 0, 0, 0, −1).
Next, we observe that columns 3, 4, 5, 7, and 9 have the most explanatory power for the new r. We
choose 3 arbitrarily, letting j_2 = 3. Let X_{j_2} = X_3 = (1, 0, 1, −1, 0, −1)^T = X(:,3). Then we take
γ_2 = 1, enforcing γ ∈ {0, 1} since S is binary (the unconstrained minimizer of ‖r − X_3 γ‖ would be 1/2). Using γ_2 we update
r = r − X_3 · γ_2 = (0, 0, −1, 1, 0, 0). Note: this progress seemed sideways at best; it increased our non-zero
γ_i values, but did not decrease ‖r‖.
Finally, we observe that columns 1, 3, 6, 7, and 8 have the most explanatory power for the new r. We choose
6 arbitrarily, letting j_3 = 6. Note: we could have chosen 3, and then gone back and updated our choice of γ_2. Let
X_{j_3} = X_6 = (0, 0, −1, 1, 0, 0)^T = X(:,6). Then 1 = γ_3 = arg min_γ ‖r − X_6 γ‖. Then using γ_3 we
update r = r − X_6 γ_3 = (0, 0, 0, 0, 0, 0). So we have completely explained y using only 3 data elements.
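As a check, the whole walk-through can be replayed in a few lines of numpy (columns 9, 3, 6 become 0-based indices 8, 2, 5; everything else is copied from the example above):

    import numpy as np

    X = np.array([[ 0,  1,  1, -1, -1,  0, -1,  0, -1,  0],
                  [-1, -1,  0,  1, -1,  0,  0, -1,  0,  1],
                  [ 1, -1,  1, -1,  0, -1,  1,  1,  0,  0],
                  [ 1,  0, -1,  0,  0,  1, -1, -1,  1,  1],
                  [-1,  0,  0,  0,  1,  0,  1,  0,  1, -1],
                  [ 0,  0, -1, -1, -1,  0, -1,  1, -1,  0]])
    S = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0])

    y = X @ S
    print(y)                              # [ 0  0  0  1  1 -2]

    r = y.astype(float)
    for j in [8, 2, 5]:                   # the choices j1 = 9, j2 = 3, j3 = 6, with gamma = 1 each round
        r = r - X[:, j]
        print(j + 1, r)                   # residual after each round; ends at all zeros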
Remarks:
• This would not have worked so cleanly if we had made other arbitrary choices. Using OMP typically
needs something like N = 20 × m log d measurements (instead of 6). More measurements would
have made it much more likely that at each step we chose the correct variable j_i as most explanatory.
• This still will not always converge to the correct solution. It might get stuck without explaining
everything exactly. In that case, we can often accept that we still get a good enough explanation (although
slightly off) and leave it at that. With much larger d and m, getting a good guess of the m non-zero
bits might still be useful. There are other, more complex minimization techniques we can alternatively
consider in the next lecture.
Bibliography
[1] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space.
Contemporary Mathematics, 26:189–206, 1984.
[2] Tamás Sarlós. Improved approximation algorithms for large matrices via random projections. In 47th
Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 143–152, 2006.