
CSE 291: Unsupervised learning Spring 2008

Lecture 6 — Online and streaming algorithms for clustering

6.1 On-line k-clustering


To the extent that clustering takes place in the brain, it happens in an on-line manner: each data point
comes in, is processed, and then goes away never to return. To formalize this, imagine an endless stream of
data points x1 , x2 , . . . and a k-clustering algorithm that works according to the following paradigm:
repeat forever:
get a new data point x
update the current set of k centers
This algorithm cannot store all the data it sees, because the process goes on ad infinitum. More precisely, we
will allow it space proportional to k. And at any given moment in time, we want the algorithm’s k-clustering
to be close to the optimal clustering of all the data seen so far.
This is a tall order, but nonetheless we can manage it for the simplest of our cost functions, k-center.

6.2 The on-line k-center problem


The setting is, as usual, a metric space (X , ρ). Recall that for data set S ⊂ X and centers T ⊂ X we define
cost(T) = max_{x∈S} ρ(x, T). In the on-line setting, our algorithm must conform to the following template.
repeat forever:
get x ∈ X
update centers T ⊂ X , |T | = k
And for all times t, we would like it to be the case that the cost of T for the points seen so far is close to
the optimal cost achievable for those particular points. We will look at two schemes that fit this bill.
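In code, this objective is a one-liner. The following Python fragment is just an illustrative sketch (not part of any algorithm below); dist stands for the metric ρ.

    def kcenter_cost(points, centers, dist):
        # cost(T) = max over x in S of rho(x, T), where rho(x, T) is the
        # distance from x to the closest center in T
        return max(min(dist(x, t) for t in centers) for x in points)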

6.2.1 A doubling algorithm


This algorithm is due to Charikar, Chekuri, Feder, and Motwani (1997).
T ← {first k distinct data points}
R ← smallest interpoint distance in T
repeat forever:
    while |T| ≤ k:
        (A) get new point x
            if ρ(x, T) > 2R: T ← T ∪ {x}
    (B) T′ ← { }
        while there exists z ∈ T such that ρ(z, T′) > 2R:
            T′ ← T′ ∪ {z}
        T ← T′
    (C) R ← 2R
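For concreteness, here is one way this pseudocode might be realized in Python; it is an illustrative sketch rather than the authors' implementation. dist stands for the metric ρ, and the generator yields the current centers each time it reaches location (A).

    def doubling_kcenter(stream, k, dist):
        # On-line k-center via the doubling scheme (assumes k >= 2 and that the
        # stream contains at least k distinct points).
        stream = iter(stream)
        # T <- first k distinct data points; R <- smallest interpoint distance in T
        T = []
        while len(T) < k:
            x = next(stream)
            if all(dist(x, t) > 0 for t in T):
                T.append(x)
        R = min(dist(a, b) for i, a in enumerate(T) for b in T[i + 1:])
        while True:                                      # repeat forever
            while len(T) <= k:
                yield list(T)                            # location (A): cost(T) <= 2R
                try:
                    x = next(stream)                     # get new point x
                except StopIteration:
                    return
                if min(dist(x, t) for t in T) > 2 * R:   # x is far from every center
                    T.append(x)
            # location (B): thin T to a maximal subset of centers pairwise > 2R apart
            T_new = []
            for z in T:
                if not T_new or min(dist(z, t) for t in T_new) > 2 * R:
                    T_new.append(z)
            T = T_new
            R = 2 * R                                    # location (C): double the scale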


Here 2R is roughly the cost of the current clustering, as we will shortly make precise. The lemmas below
describe invariants that hold at lines (A), (B), and (C) in the code; each refers to the start of the line (that
is, before the line is executed).

Lemma 1. All data points seen so far are (i) within distance 2R of T at (B) and (ii) within distance 4R of
T at (C).

Proof. Use induction on the number of iterations through the main loop. Notice that (i) holds on the first
iteration, and that if (i) holds on the pth iteration then (ii) holds on the pth iteration and (i) holds on the
(p + 1)st iteration.

To prove an approximation guarantee, we also need to lower bound the cost of the optimal clustering in
terms of R.

Lemma 2. At (B), there are k + 1 centers at distance ≥ R from each other.


Proof. A similar induction. It is easiest to simultaneously establish that at (C), all centers are distance ≥ 2R
apart.

We now put these together to give a performance guarantee.

Theorem 3. Whenever the algorithm is at (A),

cost(T ) ≤ 8 · cost(optimal k centers for data seen so far).

Proof. At location (A), we have |T| ≤ k and cost(T) ≤ 2R (at the preceding (C) every point was within 4R of T by Lemma 1, R has since doubled, and each point arriving after that is either within 2R of T or added to it).

The last time the algorithm was at (B), the value of R was half of its current value, so by Lemma 2 there were k + 1 centers at distance ≥ R/2 from each other (in terms of the current R). Any k centers must serve two of these k + 1 points with the same center, so the optimal cost is ≥ R/4. The ratio is therefore at most 2R/(R/4) = 8.

Problem 1. A commonly-used scheme for on-line k-means works as follows.

initialize the k centers t1, . . . , tk in any way
create counters n1, . . . , nk, all initialized to zero
repeat forever:
    get data point x
    let ti be its closest center
    set ti ← (ni ti + x)/(ni + 1) and ni ← ni + 1
What can be said about this scheme?
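For reference, a minimal NumPy sketch of this scheme (the function and variable names are chosen only for illustration):

    import numpy as np

    def sequential_kmeans(stream, init_centers):
        # Each arriving point pulls its closest center toward it; counts[i] plays
        # the role of n_i, so centers[i] is the running mean of the points
        # assigned to it so far.
        centers = np.array(init_centers, dtype=float)    # t_1, ..., t_k (k x d)
        counts = np.zeros(len(centers))                  # n_1, ..., n_k
        for x in stream:
            x = np.asarray(x, dtype=float)
            i = int(np.argmin(np.linalg.norm(centers - x, axis=1)))  # closest center t_i
            centers[i] = (counts[i] * centers[i] + x) / (counts[i] + 1)
            counts[i] += 1
        return centers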

6.2.2 An algorithm based on the cover tree


The algorithm of the previous section works for a pre-specified value of k. Is there some way to handle all k
simultaneously? Indeed, there is a clean solution to this; it is made possible by a data structure called the
cover tree, which was nicely formalized by Beygelzimer, Kakade, and Langford (2003), although it had been
in the air for a while before then.
Assume for the moment that all interpoint distances are ≤ 1. Of course this is ridiculous, but we will
get rid of this assumption shortly. A cover tree on data points x1 , . . . , xn is a rooted infinite tree with the
following properties.
1. Each node of the tree is associated with one of the data points xi .

2. If a node is associated with xi , then one of its children must also be associated with xi .


3. All nodes at depth j are at distance at least 1/2^j from each other.

4. Each node at depth j + 1 is within distance 1/2^j of its parent (at depth j).

This is described as an infinite tree for simplicity of analysis, but it would not be stored as such. In practice,
there is no need to duplicate a node as its own child, and so the tree would take up O(n) space.
The figure below gives an example of a cover tree for a data set of five points. This is just the top few
levels of the tree, but the rest of it is simply a duplication of the bottom row. From the structure of the tree
we can conclude, for instance, that x1 , x2 , x5 are all at distance ≥ 1/2 from each other (since they are all at
depth 1), and that the distance between x2 and x3 is ≤ 1/4 (since x3 is at depth 3, and is a child of x2 ).

[Figure: a five-point data set x1, . . . , x5 (left) and the top levels of its cover tree (right): depth 0 contains x1; depth 1 contains x2, x1, x5; depth 2 contains x2, x1, x5, x4; depth 3 contains x3, x2, x1, x5, x4, with x3 a child of x2.]

What makes cover trees especially convenient is that they can be built on-line, one point at a time. To
insert a new point x: find the largest j such that x is within 1/2^j of some node p at depth j in the tree; and
make x a child of p. Do you see why this maintains the four defining properties?
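For concreteness, here is a rough Python sketch of this insertion rule, under the same assumption that all interpoint distances are at most 1. The representation (a list levels, where levels[j] maps each depth-j node to its parent) is a simplification chosen for the sketch, not the bookkeeping of an actual cover tree implementation.

    def insert(levels, x, dist):
        # levels[j] is a dict mapping each node at depth j to its parent at depth
        # j-1 (None for the root).  Points must be hashable (e.g. tuples), and all
        # interpoint distances are assumed to be <= 1.
        if not levels:
            levels.append({x: None})                     # x becomes the root at depth 0
            return
        if any(dist(x, p) == 0 for p in levels[-1]):     # exact duplicate: nothing to add
            return
        # Descend while some node at the next depth d lies within 1/2^d of x;
        # depth 0 always qualifies because all distances are <= 1.
        j, parent = 0, next(iter(levels[0]))
        while True:
            d = j + 1
            nodes = levels[d] if d < len(levels) else levels[-1]   # deeper levels repeat
            near = [p for p in nodes if dist(x, p) <= 0.5 ** d]
            if not near:
                break
            j, parent = d, near[0]
        # Make x a child of that node, i.e. place it at depth j + 1 ...
        while len(levels) <= j + 1:
            levels.append({p: p for p in levels[-1]})    # property 2: nodes repeat below
        levels[j + 1][x] = parent
        for d in range(j + 2, len(levels)):              # ... and at every deeper level
            levels[d][x] = x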
Once the tree is built, it is easy to obtain k-clusterings from it.
Lemma 4. For any k, consider the deepest level of the tree with ≤ k nodes, and let Tk be those nodes. Then:

cost(Tk ) ≤ 8 · cost(optimal k centers).

Proof. Fix any k, and suppose j is the deepest level with ≤ k nodes. By Property 4, all of Tk's children are within distance 1/2^j of it, its grandchildren are within distance 1/2^j + 1/2^{j+1} of it, and so on. Therefore,

cost(Tk) ≤ 1/2^j + 1/2^{j+1} + · · · = 1/2^{j−1}.

Meanwhile, level j + 1 has ≥ k + 1 nodes, and by Property 3, these are at distance ≥ 1/2^{j+1} from each other. Therefore the optimal k-clustering has cost at least 1/2^{j+2}.
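In the representation used in the insertion sketch above, reading off Tk is just a matter of taking the deepest explicit level with at most k nodes:

    def centers_for_k(levels, k):
        # Lemma 4: T_k is the deepest level of the cover tree with at most k nodes
        # (levels[j] as in the insertion sketch; level sizes only grow with depth).
        T_k = list(levels[0])
        for level in levels:
            if len(level) <= k:
                T_k = list(level)
        return T_k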

To get rid of the assumption that all interpoint distances are ≤ 1, simply allow the tree to have depths
−1, −2, −3 and so on. Initially, the root is at depth 0, but if a node arrives that is at distance ≥ 8 (say)
from all existing nodes, then it is made the new root and is put at depth −3.
Also, the creation of the data structure appears to take O(n^2) time and O(n) space. But if we are only
interested in values of k from 1 to K, then we need only keep the top few levels of the tree, those with ≤ K
nodes. This reduces the time and space requirements to O(nK) and O(K), respectively.


6.2.3 On-line clustering algorithms: epilogue


It is an open problem to develop a good on-line algorithm for k-means clustering. There are at least two
different ways in which a proposed scheme might be analyzed. The first supposes at each time t, the algorithm
sees a new data point xt , and then outputs a set of k centers Tt . The hope is that for some constant α ≥ 1,
for all t,
cost(Tt ) ≤ α · cost(best k centers for x1 , . . . , xt ).
This is the pattern we have followed in our k-center examples.
The second kind of analysis is the more usual setting of the on-line learning literature. It supposes that
at each time t, the algorithm announces a set of k centers Tt , then sees a new point xt and incurs a loss
equal to the cost of xt under Tt (that is, the squared distance from xt to the closest center in Tt ). The hope
is that at any time t, the total loss incurred up to time t,

∑_{t′ ≤ t} min_{z ∈ T_{t′}} ‖x_{t′} − z‖²,

is not too much more than the optimal cost achievable by the best k centers for x1 , . . . , xt .
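To make this second criterion concrete, the following sketch charges an algorithm its total loss; the announce/update interface is purely illustrative and not any standard API.

    import numpy as np

    def cumulative_loss(algorithm, stream):
        # Total on-line k-means loss: at each step the algorithm announces its
        # centers T_t, then sees x_t and pays min over z in T_t of ||x_t - z||^2.
        # announce() / update() are hypothetical method names for this sketch.
        total = 0.0
        for x in stream:
            T = np.asarray(algorithm.announce(), dtype=float)   # T_t, a k x d array
            x = np.asarray(x, dtype=float)
            total += float(np.min(np.sum((T - x) ** 2, axis=1)))
            algorithm.update(x)                                 # only now may it use x_t
        return total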

Problem 2. Develop an on-line algorithm for k-means clustering that has a performance guarantee under
either of the two criteria above.

6.3 Streaming algorithms for clustering


The streaming model of computation is inspired by massive data sets that are too large to fit in memory.
Unlike on-line learning, which continues forever, there is a finite job to be done. But it is a very big job,
and so only a small portion of the input can be held in memory at any one time. For an input of size n,
one typically seeks algorithms that use memory o(n) (say, √n) and that need to make just one or two passes
through the data.

On-line Streaming
Endless stream of data Stream of (known) length n
Fixed amount of memory Memory available is o(n)
Tested at every time step Tested only at the very end
Each point is seen only once More than one pass may be possible

There are some standard ways to convert regular algorithms (that assume the data fits in memory) into
streaming algorithms. These include divide-and-conquer and random sampling. We’ll now see k-medoid
algorithms based on each of these.

6.3.1 A streaming k-medoid algorithm based on divide-and-conquer


Recall the k-medoid problem: the setting is a metric space (X , ρ).

Input: Finite set S ⊂ X ; integer k.


Output: T ⊂ S with |T| = k.

Goal: Minimize cost(T) = ∑_{x∈S} ρ(x, T).
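Written out in Python (with dist standing in for ρ, and with the weighted variant that appears later in this section), the objective is simply:

    def kmedoid_cost(points, medoids, dist, weights=None):
        # cost(T) = sum over x in S of w(x) * rho(x, T); weights default to 1,
        # which recovers the unweighted objective above.
        if weights is None:
            weights = [1.0] * len(points)
        return sum(w * min(dist(x, t) for t in medoids)
                   for x, w in zip(points, weights))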


[Figure 6.1: the huge data stream S is split into batches S1, S2, . . . , Sl; an (a, b)-approximation on each batch Si produces medoids Ti; the weighted instance Sw = T1 ∪ · · · ∪ Tl (at most akl points) is then clustered by an (a′, b′)-approximation to obtain the final medoids T, which are compared against the optimal k medoids T∗.]

Figure 6.1. A streaming algorithm for the k-medoid problem, based on divide and conquer.

Earlier, we saw an LP-based algorithm that finds 2k centers whose cost is at most 4 times that of the best
k-medoid solution. This extends to the case when the problem is weighted, that is, when each data point x
has an associated weight w(x), and the cost function is
cost(T) = ∑_{x∈S} w(x) ρ(x, T).

We will call it a (2, 4)-approximation algorithm. It turns out that (a, b)-approximation algorithms are
available for many combinations (a, b).
A natural way to deal with a huge data stream S is to read as much of it as will fit into memory (call
this portion S1 ), solve this sub-instance, then read the next batch S2 , solve this sub-instance, and so on. At
the end, the partial solutions need to be combined somehow.

Divide S into groups S1, S2, . . . , Sl
for each i = 1, 2, . . . , l:
    run an (a, b)-approximation alg on Si to get ≤ ak medoids Ti = {ti1, ti2, . . .}
    suppose Si1 ∪ Si2 ∪ · · · are the induced clusters of Si
Sw ← T1 ∪ T2 ∪ · · · ∪ Tl, with weights w(tij) ← |Sij|
run an (a′, b′)-approximation algorithm on weighted Sw to get ≤ a′k centers T
return T
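A sketch of this scheme in Python might look as follows. Here approx(points, weights, k) stands for an arbitrary (a, b)-approximation subroutine that returns at most ak medoids chosen from points; that interface, like the function names, is an assumption made for the illustration.

    def streaming_divide_and_conquer(stream, k, approx, dist, batch_size):
        # Read the stream one batch S_i at a time, keep only each batch's weighted
        # medoids, and cluster the weighted instance S_w at the very end.
        medoids, weights = [], []                 # the weighted instance S_w

        def flush(batch):
            if not batch:
                return
            T_i = approx(batch, [1.0] * len(batch), k)   # <= a*k medoids for S_i
            counts = [0] * len(T_i)                      # |S_ij| for each t_ij
            for x in batch:
                j = min(range(len(T_i)), key=lambda j: dist(x, T_i[j]))
                counts[j] += 1
            medoids.extend(T_i)
            weights.extend(counts)

        batch = []
        for x in stream:
            batch.append(x)
            if len(batch) >= batch_size:          # one in-memory batch S_i is full
                flush(batch)
                batch = []
        flush(batch)                              # leftover points form the last batch
        return approx(medoids, weights, k)        # second round, on weighted S_w

The only state retained between batches is the list of weighted medoids, which is what keeps the memory footprint small.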

Figure 6.1 shows this pictorially. An interesting case to consider is when stream S has length n and each batch Si consists of √(nk) points. In this case the second clustering problem Sw is also of size roughly √(nk), and so the whole algorithm operates with just a single pass and O(√(nk)) memory.
Before analyzing the algorithm we introduce some notation. Let T ∗ = {t∗1 , . . . , t∗k } be the optimal k
medoids for data set S. Let t∗ (x) ∈ T ∗ be the medoid closest to point x. Likewise, let t(x) ∈ T be the point
in T closest to x, and ti (x) the point in Ti closest to x. Since we are dividing the data into subsets Si , we
will need to talk about the costs of clustering these subsets as well as the overall cost of clustering S. To


this end, define cost(S′, T′) to be the cost of medoids T′ for data S′:

cost(S′, T′) = ∑_{x∈S′} ρ(x, T′)          (unweighted instance)
cost(S′, T′) = ∑_{x∈S′} w(x) ρ(x, T′)     (weighted instance)

Theorem 5. This streaming algorithm is an (a′ , 2b + 2b′ (2b + 1))-approximation.


The a′ part is obvious; the rest will be shown over the course of the next few lemmas. The first step
is to bound the overall cost in terms of the costs of two clustering steps; this is a simple application of the
triangle inequality.
Lemma 6. cost(S, T) ≤ ∑_{i=1}^{l} cost(Si, Ti) + cost(Sw, T).
Proof. Recall that the second clustering problem Sw consists of all medoids tij from the first step, with
weights w(tij ) = |Sij |.
cost(S, T) = ∑_{i=1}^{l} ∑_{x∈Si} ρ(x, T)
           ≤ ∑_{i=1}^{l} ∑_{x∈Si} (ρ(x, ti(x)) + ρ(ti(x), T))
           = ∑_{i=1}^{l} cost(Si, Ti) + ∑_{i=1}^{l} ∑_j |Sij| ρ(tij, T)
           = ∑_{i=1}^{l} cost(Si, Ti) + cost(Sw, T).

The next lemma says that when clustering a data set S ′ , picking centers from S ′ is at most twice as bad
as picking centers from the entire underlying metric space X .
Lemma 7. For any S′ ⊂ X, we have

min_{T′ ⊂ S′, |T′| = k} cost(S′, T′) ≤ 2 · min_{T′ ⊂ X, |T′| = k} cost(S′, T′).

Proof. Let T ′ ⊂ X be the optimal solution chosen from X . For each induced cluster of S ′ , replace its center
t′ ∈ T ′ by the closest neighbor of t′ in S ′ . This at most doubles the cost, by the triangle inequality.
Our final goal is to upper-bound cost(S, T), and we will do so by bounding the two terms on the right-hand side of Lemma 6. Let's start with the first of them. We'd certainly hope that ∑_i cost(Si, Ti) is smaller than cost(S, T∗); after all, the former uses way more representatives (about akl of them) to approximate the same set S. We now give a coarse upper bound to this effect.
Lemma 8. ∑_{i=1}^{l} cost(Si, Ti) ≤ 2b · cost(S, T∗).

Proof. Each Ti is a b-approximation solution to the k-medoid problem for Si. Thus

∑_{i=1}^{l} cost(Si, Ti) ≤ b · ∑_{i=1}^{l} min_{T′ ⊂ Si} cost(Si, T′)
                         ≤ 2b · ∑_{i=1}^{l} min_{T′ ⊂ X} cost(Si, T′)
                         ≤ 2b · ∑_{i=1}^{l} cost(Si, T∗) = 2b · cost(S, T∗).


The second inequality is from Lemma 7.


Finally, we bound the second term on the right-hand side of Lemma 6.
Lemma 9. cost(Sw, T) ≤ 2b′ · (∑_{i=1}^{l} cost(Si, Ti) + cost(S, T∗)).

Proof. It is enough to upper-bound cost(Sw, T∗) and then invoke

cost(Sw, T) ≤ b′ · min_{T′ ⊂ Sw} cost(Sw, T′) ≤ 2b′ · min_{T′ ⊂ X} cost(Sw, T′) ≤ 2b′ · cost(Sw, T∗).

To do so, we need only the triangle inequality.

cost(Sw, T∗) = ∑_{i,j} |Sij| ρ(tij, T∗) ≤ ∑_{i,j} ∑_{x∈Sij} (ρ(x, tij) + ρ(x, t∗(x)))
             = ∑_i ∑_{x∈Si} (ρ(x, ti(x)) + ρ(x, t∗(x)))
             = ∑_i cost(Si, Ti) + cost(S, T∗).

The theorem follows immediately by putting together the last two lemmas with Lemma 6.
Problem 3. Notice that this streaming algorithm uses two k-medoid subroutines. Even if both are perfect,
that is, if both are (1, 1)-approximations, the overall bound on the approximation factor is 8. Can a better
factor be achieved?

6.3.2 A streaming k-medoid algorithm based on random sampling


In this approach, we randomly sample from stream S, taking as many points as will fit in memory. We then
solve the clustering problem on this subset S ′ ⊂ S. The resulting cluster centers should work well for all of
S, unless S contains a cluster that has few points (and is thus not represented in S ′ ) and is far away from
the rest of the data. A second round of clustering is used to deal with this contingency.
The algorithm below is due to Indyk (1999). Its input is a stream S of length n, and also a confidence
parameter δ.
let S ′ ⊂ S be a random subset of size s
run an (a, b)-approximation alg on S ′ to get ≤ ak medoids T ′
let S ′′ ⊂ S be the (8kn/s) log(k/δ) points furthest from T ′
run an (a, b)-approximation alg on S ′′ to get ≤ ak medoids T ′′
return T ′ ∪ T ′′
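The scheme is short enough to sketch directly in Python. For clarity the sketch takes the data as an in-memory list S (a faithful streaming implementation would instead make the two passes described above), and approx(points, k) again stands for an assumed (a, b)-approximation subroutine.

    import math
    import random

    def streaming_random_sampling(S, k, approx, dist, delta):
        # Two-round sampling: cluster a random sample S', then separately cluster
        # the points that the sample's medoids serve worst (S'').
        n = len(S)
        s = min(n, int(math.sqrt(8 * k * n * math.log(k / delta))))  # sample size
        S_prime = random.sample(S, s)
        T_prime = approx(S_prime, k)                                 # medoids for S'
        m = min(n, int((8 * k * n / s) * math.log(k / delta)))       # |S''|
        by_dist = sorted(S, key=lambda x: min(dist(x, t) for t in T_prime))
        S_second = by_dist[-m:]                                      # furthest from T'
        T_second = approx(S_second, k)                               # medoids for S''
        return T_prime + T_second                                    # T' U T''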
This algorithm returns at most 2ak medoids. It requires two passes through the data, and if s is set to √(8kn log(k/δ)), it uses memory O(√(8kn log(k/δ))).
Theorem 10. With probability ≥ 1 − 2δ, this is a (2a, (1 + 2b)(1 + 2/δ))-approximation.
As before, let T ∗ = {t∗1 , . . . , t∗k } denote the optimal medoids. Suppose these induce a clustering S1 , . . . , Sk
of S. In the initial random sample S ′ , the large clusters — those with a lot of points — will be well-
represented. We now formalize this. Let L be the large clusters:
L = {i : |Si | ≥ (8n/s) log(k/δ)}.
The sample S ′ contains points Si′ = Si ∩ S ′ from cluster Si .


Lemma 11. With probability ≥ 1 − δ, for all i ∈ L: |Si′| ≥ (1/2) · (s/n) · |Si|.
Proof. Use a multiplicative form of the Chernoff bound.
We’ll next show that the optimal centers T ∗ do a pretty good job on the random sample S ′ .
Lemma 12. With probability ≥ 1 − δ, cost(S′, T∗) ≤ (1/δ) · (s/n) · cost(S, T∗).
Proof. The expected value of cost(S ′ , T ∗ ) is exactly (s/n)cost(S, T ∗ ), whereupon the lemma follows by
Markov’s inequality.
Henceforth assume that the high-probability events of Lemmas 11 and 12 hold. Lemma 12 implies that
T ′ is also pretty good for S ′ .
Lemma 13. cost(S′, T′) ≤ 2b · (1/δ) · (s/n) · cost(S, T∗).
Proof. This is by now a familiar argument.

cost(S′, T′) ≤ b · min_{T ⊂ S′} cost(S′, T) ≤ 2b · min_{T ⊂ X} cost(S′, T) ≤ 2b · cost(S′, T∗)

and the rest follows from Lemma 12.


Since the large clusters are well-represented in S ′ , we would expect centers T ′ to do a good job with
them.
Lemma 14. cost(∪_{i∈L} Si, T′) ≤ cost(S, T∗) · (1 + 2(1 + 2b)/δ).


Proof. We’ll show that for each optimal medoid t∗i of a large cluster, there must be a point in T ′ relatively
close by. To show this, we use the triangle inequality ρ(t∗i , T ′ ) ≤ ρ(t∗i , y) + ρ(y, t′ (y)), where t′ (y) ∈ T ′ is the
medoid in T ′ closest to point y. This inequality holds for all y, so we can average it over all y ∈ Si′ :
ρ(t∗i, T′) ≤ (1/|Si′|) ∑_{y∈Si′} (ρ(y, t∗i) + ρ(y, t′(y))).

Now we can bound the cost of the large clusters with respect to medoids T ′ .
cost(∪_{i∈L} Si, T′) ≤ ∑_{i∈L} ∑_{x∈Si} (ρ(x, t∗i) + ρ(t∗i, T′))
                     ≤ cost(S, T∗) + ∑_{i∈L} |Si| ρ(t∗i, T′)
                     ≤ cost(S, T∗) + ∑_{i∈L} (|Si|/|Si′|) ∑_{y∈Si′} (ρ(y, t∗i) + ρ(y, t′(y)))
                     ≤ cost(S, T∗) + (2n/s) ∑_{y∈S′} (ρ(y, t∗(y)) + ρ(y, t′(y)))
                     = cost(S, T∗) + (2n/s) (cost(S′, T∗) + cost(S′, T′)),
where the last inequality is from Lemma 11. The rest follows from Lemmas 12 and 13.
Thus the large clusters are well-represented by T ′ . But this isn’t exactly what we need. Looking back at
the algorithm, we see that medoids T ′′ will take care of S ′′ , which means that we need medoids T ′ to take
care of S \ S ′′ .


Lemma 15. cost(S \ S′′, T′) ≤ cost(S, T∗) · (1 + 2(1 + 2b)/δ).




Proof. A large cluster is one with at least (8n/s) log(k/δ) points. Therefore, since there are at most k small
clusters, the total number of points in small clusters is at most (8kn/s) log(k/δ). This means that the large
clusters account for the vast majority of the data, at least n − (8kn/s) log(k/δ) points.
S \ S′′ is exactly the n − (8kn/s) log(k/δ) points of S closest to T′. Therefore cost(S \ S′′, T′) ≤ cost(∪_{i∈L} Si, T′).

To prove the main theorem, it just remains to bound the cost of S ′′ .

Lemma 16. cost(S ′′ , T ′′ ) ≤ 2b · cost(S, T ∗ ).

Proof. This is routine.

cost(S′′, T′′) ≤ b · min_{T ⊂ S′′} cost(S′′, T) ≤ 2b · min_{T ⊂ X} cost(S′′, T) ≤ 2b · cost(S′′, T∗) ≤ 2b · cost(S, T∗).

Combining Lemmas 15 and 16, cost(S, T′ ∪ T′′) ≤ cost(S \ S′′, T′) + cost(S′′, T′′) ≤ (1 + 2b)(1 + 2/δ) · cost(S, T∗), which is the bound of Theorem 10.

Problem 4. The approximation factor contains a highly unpleasant 1/δ term. Is there a way to reduce this
to, say, log(1/δ)?
