The JL Transform and MinHash
The curse of dimensionality appears to be fundamental to the nearest neighbor problem
(and many other geometric problems), and is not an artifact of the specific solution of the k-d
tree. For further intuition, consider the nearest neighbor problem in one dimension, where
the point set P and query q are simply points on the line. The natural way to preprocess
P is to sort the points by value; searching for the nearest neighbor is just binary search to
find the interval in which q lies, and then computing q’s distance to the points immediately
to the left and right of q. What about two dimensions? It’s no longer clear what “sorting
the point set” means, but let’s assume that we figure that out. (The k-d tree offers one
approach.) Intuitively, there are now four directions that we have to check for a possible
nearest neighbor. In three dimensions, there are eight directions to check, and in general
the number of relevant directions scales exponentially with k. There is no known way
to overcome the curse of dimensionality in the nearest neighbor problem, without resorting
to approximation (as we do below): every known solution that uses a reasonable amount of
space uses time that scales exponentially in k or linearly in the number n of points.
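To make the one-dimensional case concrete, here is a minimal sketch of the sort-and-binary-search approach described above (the function and variable names are ours, not from the notes):

```python
import bisect

def nearest_neighbor_1d(sorted_points, q):
    """Return the point of sorted_points closest to q.
    Preprocessing: sorted_points must already be sorted."""
    i = bisect.bisect_left(sorted_points, q)  # the interval in which q lies
    # Only the points immediately to the left and right of q can be closest.
    candidates = sorted_points[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - q))

P = sorted([3.0, -1.5, 7.2, 4.4])
print(nearest_neighbor_1d(P, 5.0))  # 4.4
```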
2 Point of Lecture
Why is the curse of dimensionality a problem? The issue is that the natural representation of
data is often high-dimensional. Recall our motivating examples for representing data points
in real space. With documents and the bag-of-words model, the number of coordinates equals
the number of words in the dictionary — often in the tens of thousands. Images are often
represented as real vectors, with at least one dimension per pixel (recording pixel intensities)
— here again, the number of dimensions is typically in the tens of thousands. The same
holds for points representing the purchase or page view history of an Amazon customer, or
of the movies watched by a Netflix subscriber.
The friction between the large number of dimensions we want to use to represent data
and the small number of dimensions required for computational tractability motivates dimen-
sionality reduction. The goal is to re-represent points in high-dimensional space as points in
low-dimensional space, preserving interpoint distances as much as possible. We can think of
dimensionality reduction as a form of lossy compression, tailored to approximately preserve
distances. For an analogy, the count-min sketch (Lecture #2) is a form of lossy compression
tailored to the approximate preservation of frequency counts.
Dimensionality reduction enables the following high-level approach to the nearest neigh-
bor problem:
1. Represent the data and queries using a large number k of dimensions (tens of thousands, say).

2. Use dimensionality reduction to map the data and queries down to a small number d of dimensions (in the hundreds, say).

3. Answer nearest-neighbor queries using the low-dimensional representations of the data and queries.
Provided the dimensionality reduction subroutine approximately preserves all interpoint dis-
tances, the answer to the nearest-neighbor query in low dimensions is an approximately
correct answer to the original high-dimensional query. Even if the reduced number d of
dimensions is still too big to use k-d trees, reducing the dimension still speeds up algorithms
significantly. For example, on Mini-Project #2, you will use dimensionality reduction to
improve the running time of a brute-force search algorithm, where the running time has
linear dependence on the dimension.
The three-step paradigm above is relevant for any computation that only cares about
interpoint distances between the points, not just the nearest-neighbor problem. Distance-
based clustering is another example.
It should now be clear that we would love to have subroutines for dimensionality reduction
in our algorithmic toolbox. The rest of this lecture gives a few examples of such subroutines,
and a unified way to think about them.
Achieving error 50% doesn't sound too impressive, but it's easy to reduce it in a second step, via the "magic of independent trials." Repeating the experiment above $\ell$ times — choosing $\ell$ different hash functions $h_1, \ldots, h_\ell$ and labeling each object $x$ with $\ell$ bits $f_1(x), \ldots, f_\ell(x)$ — the properties become:
1. If $x = y$, then $f_i(x) = f_i(y)$ for all $i = 1, 2, \ldots, \ell$.

2. If $x \neq y$ and the $h_i$'s are good and independent hash functions, then $\Pr[f(x) = f(y)] \le 2^{-\ell}$.
For example, to achieve a user-specified error of $\delta > 0$, we only need to use $\lceil \log_2 \frac{1}{\delta} \rceil$ bits to represent each object. For all but the tiniest values of $\delta$, this representation is much smaller than the original $\log_2 U$-bit one.
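Here is a minimal sketch of this fingerprinting idea in code. The names (`fingerprint`, `num_bits`) are ours, and salted SHA-256 digests stand in for the $\ell$ "good and independent" hash functions from the discussion above:

```python
import hashlib

def fingerprint(obj: bytes, num_bits: int) -> tuple:
    """Label obj with num_bits hash bits f_1(obj), ..., f_ell(obj)."""
    bits = []
    for i in range(num_bits):
        # A different salt per index plays the role of a different hash function h_i.
        digest = hashlib.sha256(i.to_bytes(4, "big") + obj).digest()
        bits.append(digest[0] & 1)  # keep one bit of the digest
    return tuple(bits)

# Equal objects always receive equal fingerprints; distinct objects
# collide with probability roughly 2**(-num_bits).
assert fingerprint(b"spam", 20) == fingerprint(b"spam", 20)
print(fingerprint(b"spam", 8), fingerprint(b"eggs", 8))
```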
For measuring Euclidean distances between points in $\mathbb{R}^k$, what's the analog of a hash function? This section proposes ran-
dom projection as the answer. This idea results in a very neat primitive, the Johnson-
Lindenstrauss (JL) transform, which says that if all we care about are the Euclidean dis-
tances between points, then we can assume (conceptually and computationally) that the
number of dimensions is not overly huge (in the hundreds, at most).
Given a vector $r = (r_1, \ldots, r_k)$ (which we will soon choose at random), define
$$f_r(x) = \langle x, r \rangle = \sum_{j=1}^{k} r_j x_j. \tag{2}$$
Thus, $f_r(x)$ is a random linear combination of the components of $x$. This function will play a role analogous to the single-bit function defined in (1) in the previous section. The function in (1) compresses a $\log_2 U$-bit object description to a single bit; the random projection in (2) replaces a vector of $k$ real numbers with a single real number.
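As a concrete illustration, a single random projection might look like the following sketch (our own names; the Gaussian entries for $r$ anticipate the choice made later in this section):

```python
import numpy as np

rng = np.random.default_rng(0)

k = 10_000                  # original number of dimensions
r = rng.standard_normal(k)  # r_1, ..., r_k, here drawn i.i.d. Gaussian (see below)

def f_r(x):
    """The random projection (2): a random linear combination of x's components."""
    return np.dot(x, r)     # <x, r> = sum_j r_j * x_j

x = rng.standard_normal(k)
print(f_r(x))               # a single real number replacing k of them
```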
Figure 1 recalls the geometry of the inner product, as the projection of one vector onto
the line spanned by another, which should be familiar from your high school training.
If we want to use this idea to approximately preserve the Euclidean distances between
points, how should we pick the rj ’s? Inspired by our two-step approach in Section 3, we first
try to preserve distances only in a weak sense. We then use independent trials to reduce the
error.
Figure 1: The inner product $\langle x, r \rangle$ of two vectors is the projection of one onto the line spanned by the other.
Figure 2: The probability density function for the standard Gaussian distribution (with
mean 0 and variance 1).
It's not interesting that the mean of $X_1 + X_2$ is the sum of the means of $X_1$ and $X_2$ — linearity of expectation holds for any pair of random variables, even non-independent ones. Similarly, it's not interesting that the variance of $X_1 + X_2$ is the sum of the variances of $X_1$ and $X_2$ — this holds for any pair of independent random variables.4 What's remarkable is that the distribution of $X_1 + X_2$ is a Gaussian (with the only mean and variance that it could possibly have). Adding two distributions from a family generally gives a distribution outside that family. For example, the sum of two random variables that are uniform on $[0, 1]$ certainly isn't uniformly distributed on $[0, 2]$ — there's more mass in the middle than on the ends.
Here’s where the nice properties of Gaussians come in. Recall that the xj ’s and yj ’s are fixed
(i.e., constants), while the rj ’s are random. For each j = 1, 2, . . . , k, since rj is a Gaussian
with mean zero and variance 1, (xj − yj )rj is a Gaussian with mean zero and variance
(xj − yj )2 . (Multiplying a random variable by a scalar λ scales the standard deviation by λ
and hence the variance by λ2 .) Since Gaussians add, the right-hand side of (3) is a Gaussian
with mean 0 and variance
X k
(xj − yj )2 = kx − yk22 .
j=1
Whoa — this is an unexpected connection between the output of random projection and
the (square of the) quantity that we want to preserve. How can we exploit it? Recalling
the definition $\mathrm{Var}(X) = \mathbf{E}[(X - \mathbf{E}[X])^2]$ of variance as the expected squared deviation of a random variable from its mean, we see that for a random variable $X$ with mean 0, $\mathrm{Var}(X)$ is simply $\mathbf{E}[X^2]$. Taking $X$ to be the random variable in (3), we have
$$\mathbf{E}\left[ (f_r(x) - f_r(y))^2 \right] = \|x - y\|_2^2. \tag{4}$$
That is, the random variable $(f_r(x) - f_r(y))^2$ is an unbiased estimator of the squared Euclidean distance between $x$ and $y$.
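To sanity-check (4), here is a small simulation (a sketch with our own variable names) that averages many independent draws of $r$ to approximate the expectation:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 500
x, y = rng.standard_normal(k), rng.standard_normal(k)

trials = 20_000
r = rng.standard_normal((trials, k))   # one Gaussian vector r per trial
estimates = (r @ (x - y)) ** 2         # (f_r(x) - f_r(y))^2 for each trial

print(np.mean(estimates))              # approximately ||x - y||_2^2
print(np.sum((x - y) ** 2))            # the exact squared distance
```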
4
But this doesn’t generally hold for non-independent random variables, right?
4.4 Step 2: The Magic of Independent Trials
We’ve showed that random projection reduces the number of dimensions from k to just
one (replacing each x by fr (x)), while preserving squared distances in expectation. Two
issues are: we care about preserving distances, not squares of distances;5 and we want to
almost always preserve distances very closely (not just in expectation). We’ll solve both
these problems in one fell swoop, via the magic of independent trials.
Suppose instead of picking a single vector r, we pick d vectors r1 , . . . , rd . Each component
of each vector is drawn i.i.d. from a standard Gaussian. For a given pair x, y of points, we get
d independent unbiased estimates of $\|x - y\|_2^2$ (via (4)). Averaging independent unbiased
estimates yields an unbiased estimate with less error.6 Because our estimates in (4) are
(squares of) Gaussians, which are very well-understood distributions, one can figure out
exactly how large d needs to be to achieve a target approximation (for details, see [3]). The
bottom line is: for a set of $n$ points in $k$ dimensions, to preserve all $\binom{n}{2}$ interpoint Euclidean distances up to a $1 \pm \epsilon$ factor, one should set $d = \Theta(\epsilon^{-2} \log n)$.7
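As a back-of-the-envelope calculation (a sketch; the constant 2 is the one mentioned in footnote 7, and in practice d is often tuned empirically):

```python
import math

def jl_target_dimension(n, eps, constant=2.0):
    """d = Theta(eps^-2 * log n): enough dimensions to preserve all pairwise
    distances among n points up to a 1 +/- eps factor (constant per footnote 7)."""
    return math.ceil(constant * math.log(n) / eps ** 2)

print(jl_target_dimension(n=1_000_000, eps=0.1))
# 2764: the theoretical bound; in practice d in the low hundreds often suffices.
```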
The JL transform chooses a $d \times k$ matrix $A$ with i.i.d. standard Gaussian entries and maps each point $x \in \mathbb{R}^k$ to
$$f_A(x) = \frac{1}{\sqrt{d}}\, A x,$$
where the $1/\sqrt{d}$ scaling factor corresponds to the average over independent trials discussed in Section 4.4.
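Here is a minimal sketch of this map in code (our own names; numpy's Gaussian sampler supplies the i.i.d. entries of $A$):

```python
import numpy as np

def jl_transform(X, d, seed=0):
    """Map the rows of X (n points in R^k) to R^d via x -> (1/sqrt(d)) * A x,
    where A is a d x k matrix of i.i.d. standard Gaussian entries."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    A = rng.standard_normal((d, k))
    return X @ A.T / np.sqrt(d)

# Example: compare one interpoint distance before and after the reduction.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10_000))   # 100 points in 10,000 dimensions
Y = jl_transform(X, d=400)

orig = np.linalg.norm(X[0] - X[1])
reduced = np.linalg.norm(Y[0] - Y[1])
print(orig, reduced)                     # typically within a few percent of each other
```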
To see how this mapping $f_A$ corresponds to our derivation in Section 4.4, note that for every pair $x, y$ of points,
5
The fact that $X^2$ has expectation $\mu^2$ does not imply that $X$ has expectation $\mu$. For example, suppose $X$ is equally likely to be 0 or 2, and $\mu = \sqrt{2}$.
6
We’ll discuss this idea in more detail next week, but we’ve already reviewed the tools needed
Pd to make
this precise. Suppose X1 , . . . , Xd are independent and all have mean µ and variance σ 2 . Then i=1 Xi has
Pd
mean dµ and variance dσ 2 , and so the average d1 i=1 Xi has mean µ and variance σ 2 /d. Thus, averaging d
independent unbiased estimates yields yields an unbiased estimate and drops the variance by a factor of d.
7
The constant suppressed by the $\Theta$ is reasonable, no more than 2. In practice, it's worth checking whether you can get away with $d$ smaller than what is necessary for this theoretical guarantee. For typical applications, setting $d$ in the low hundreds should be good enough for acceptable results. See also Mini-Project #2.
$$\|f_A(x) - f_A(y)\|_2^2 \;=\; \left\| \frac{1}{\sqrt{d}}\, A (x - y) \right\|_2^2 \;=\; \frac{1}{d} \sum_{i=1}^{d} \left( a_i^T (x - y) \right)^2, \tag{5--7}$$
where $a_i^T$ denotes the $i$th row of $A$. Since each row $a_i^T$ is just a $k$-vector with entries chosen i.i.d. from a standard Gaussian, each term
$$\left( a_i^T (x - y) \right)^2 = \left( \sum_{j=1}^{k} a_{ij} (x_j - y_j) \right)^2$$
is precisely the unbiased estimator of $\|x - y\|_2^2$ described in (3) and (4). Thus (5)–(7) is
the average of d unbiased estimators. Provided d is sufficiently large, with probability close
to 1, all of the low-dimensional interpoint squared distances $\|f_A(x) - f_A(y)\|_2^2$ are very good approximations of the original squared distances $\|x - y\|_2^2$. This implies that, with
equally large probability, all interpoint Euclidean distances are approximately preserved by
the mapping fA down to d dimensions.
Thus, for any point set x1 , . . . , xn in k-dimensional space, and any computation that
cares only about interpoint Euclidean distances, there is little loss in doing the computation
on the $d$-dimensional $f_A(x_i)$'s rather than on the $k$-dimensional $x_i$'s.
The JL transform is not usually implemented exactly the way we described it. One
simplification is to use ±1 entries rather than Gaussian entries; this idea has been justified
both empirically and theoretically (see [1]). Another line of improvement is to add structure to the matrix so that the matrix-vector product $Ax$ can be computed particularly quickly, à la the Fast Fourier Transform — this is known as the "fast JL transform" [2]. Finally, since
the JL transform can often only bring the dimension down into the hundreds without overly
distorting interpoint distances, additional tricks are often needed. One of these is “locality
sensitive hashing (LSH),” touched on briefly in Section 6.
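The ±1 simplification can be sketched as follows (a minimal illustration of the idea using our own function name, not the precise construction analyzed in [1]):

```python
import numpy as np

def sign_jl_transform(X, d, seed=0):
    """Like the Gaussian JL transform, but A has i.i.d. +/-1 entries."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    A = rng.choice([-1.0, 1.0], size=(d, k))   # unit-variance +/-1 entries
    return X @ A.T / np.sqrt(d)
```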
We now turn from Euclidean distance to measuring the similarity of two sets $A$ and $B$, using the Jaccard similarity:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}.$$
Jaccard similarity is easily defined for multi-sets (see last lecture); here, to keep things simple,
we do not allow an element to appear in a set more than once.
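In code, Jaccard similarity of two Python sets is a one-line helper (our own name; it assumes at least one of the sets is nonempty):

```python
def jaccard(A: set, B: set) -> float:
    """J(A, B) = |A intersect B| / |A union B|."""
    return len(A & B) / len(A | B)

print(jaccard({"the", "cat", "sat"}, {"the", "cat", "ran"}))  # 2/4 = 0.5
```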
Random projection replaces $k$ real numbers with a single one. So an analog here would replace a set of elements with a single element. The plan is to implement a random such mapping that preserves Jaccard similarity in expectation, and then to use independent trials as in Section 4.4 to boost the accuracy.
5.2 MinHash
For sets, the analog of random projection is the MinHash subroutine:
choose a random permutation $\pi$ of the universe $U$ of possible elements (in practice, a random hash function stands in for $\pi$), and map each set $S \subseteq U$ to the single element $\min_{s \in S} \pi(s)$. The key property is that, for any two sets $A$ and $B$, the probability (over the choice of $\pi$) that they receive the same MinHash value is exactly $J(A, B)$: the element of $A \cup B$ with the smallest $\pi$-value lies in $A \cap B$ with probability $|A \cap B| / |A \cup B|$.
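Here is a minimal MinHash sketch along these lines (our own names; salted built-in hashing stands in for a random permutation of $U$, and independent trials are averaged as in Section 4.4):

```python
import random

def minhash(S, seed):
    """Map the set S to a single element: the one minimizing a random
    'permutation' of the universe (approximated here by a salted hash)."""
    salt = random.Random(seed).getrandbits(64)
    return min(S, key=lambda s: hash((salt, s)))

def estimate_jaccard(A, B, num_trials=200):
    """Fraction of trials in which A and B get the same MinHash value;
    under an ideal random permutation, each trial agrees with probability J(A, B)."""
    agree = sum(minhash(A, t) == minhash(B, t) for t in range(num_trials))
    return agree / num_trials

A = {"a", "b", "c", "d"}
B = {"b", "c", "d", "e"}
print(estimate_jaccard(A, B))   # close to J(A, B) = 3/5 = 0.6
```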
For the simpler problem of detecting exact duplicates among $n$ objects, there is a standard hashing-based solution:
1. Hash all n objects into b buckets using a good hash function. (b could be roughly n,
for example).
2. In each bucket, use brute-force search (i.e., compare all pairs) on the objects in that
bucket to identify and remove duplicate objects.9
Why is this a good solution? Duplicate objects hash to the same bucket, so all duplicate
objects are identified. With a good hash function and a sufficiently large number b of
buckets, different objects usually hash to different buckets. Thus, in a given bucket, we
expect a small number of distinct objects, so brute-force search in a bucket does not waste
much time comparing objects that are distinct and in the same bucket due to a hash function
collision.
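A minimal sketch of this two-step solution (our own names; Python's built-in hash and a dict of lists stand in for the "good hash function" and the b buckets):

```python
from collections import defaultdict

def deduplicate(objects, num_buckets):
    """Remove exact duplicates: hash into buckets, then brute-force within each bucket."""
    buckets = defaultdict(list)
    for obj in objects:                              # step 1: hash everything into b buckets
        buckets[hash(obj) % num_buckets].append(obj)
    unique = []
    for bucket in buckets.values():                  # step 2: compare pairs inside each bucket
        kept = []
        for obj in bucket:
            if all(obj != other for other in kept):  # duplicates share a bucket, so they meet here
                kept.append(obj)
        unique.extend(kept)
    return unique

print(deduplicate(["x", "y", "x", "z", "y"], num_buckets=8))  # ['x', 'y', 'z'] in some order
```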
Naively extending this idea to filter near-duplicates fails utterly. The problem is that
two objects x and x0 that are almost the same (but still distinct) are generally mapped to
unrelated buckets by a good hash function. To extend duplicate detection to near-duplicate
detection, we want a function h such that, if x and x0 are almost the same, then h is likely to
map x and x0 to the same bucket. This is the idea behind locality sensitive hashing (LSH).
(Additional optional material, not covered in lecture, to be added.)
References
[1] D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with bi-
nary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003.
[2] N. Ailon and B. Chazelle. Faster dimension reduction. Communications of the ACM,
53(2):97–104, 2010.
[3] S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Linden-
strauss. Random Structures and Algorithms, 22(1):60–65, 2003.
[5] N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215–245, 1995.
9
It’s often possible to be smarter here, for example if it’s possible to sort the objects.