Lecture 27
Generalization: Moments
Suppose a stream has elements chosen
from a set A of N values
Let $m_i$ be the number of times value i occurs
in the stream
The kth moment is $\sum_{i \in A} (m_i)^k$
Example (T=0, T=10): i = 1, 2, 3, 4, 5 with counts $m_1, m_2, m_3, m_4, m_5$
With k = 1, the kth moment = $(m_1)^1 + (m_2)^1 + (m_3)^1 + (m_4)^1 + (m_5)^1$
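As a quick illustration of the definition, the sketch below computes the kth moment exactly by counting every value. The function name `kth_moment` and the sample stream are illustrative choices, not part of the lecture.

```python
from collections import Counter

def kth_moment(stream, k):
    """Exact kth moment: sum over distinct values i of (m_i)^k."""
    counts = Counter(stream)  # m_i for every value i seen in the stream
    return sum(m ** k for m in counts.values())

stream = list("aabbbaba")       # illustrative stream: m_a = 4, m_b = 4
zeroth = kth_moment(stream, 0)  # number of distinct values
first = kth_moment(stream, 1)   # length of the stream
second = kth_moment(stream, 2)  # sum of squared counts
```

Exact counting keeps one counter per distinct value, which is the cost an estimation method tries to avoid.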
AMS Method
AMS method works for all moments
Gives an unbiased estimate
We will just concentrate on the 2nd moment S
We pick and keep track of many variables X:
For each variable X we store X.el and X.val
X.el corresponds to the item i
X.val corresponds to the count of item i
Note this requires a count in main memory,
so number of Xs is limited
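A minimal sketch of how one variable X could be maintained in a single pass. It assumes the start position is handed in as a 0-based index (in the AMS method it would typically be chosen uniformly at random); the names `AMSVariable` and `track` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AMSVariable:
    el: object  # X.el: the item found at the chosen start position
    val: int    # X.val: occurrences of X.el from that position onwards

def track(stream, start):
    """Scan the stream once, maintaining one variable started at index `start`."""
    x = None
    for t, item in enumerate(stream):
        if t == start:
            x = AMSVariable(el=item, val=1)  # first sighting counts as 1
        elif x is not None and item == x.el:
            x.val += 1                       # later occurrence of the same item
    return x

x = track("aabbbaba", start=2)  # 0-based index 2 holds the first 'b'
```

Each variable costs one stored item and one counter, which is why the number of Xs is bounded by main memory.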
Our goal is to compute $S = \sum_i m_i^2$
Stream: a a b b b a b a
2nd moment is $S = \sum_i m_i^2$
$c_t$ … number of times the item at time t appears
from time t onwards ($c_1 = m_a$, $c_2 = m_a - 1$, $c_3 = m_b$)
$m_i$ … total count of item i in the stream
(we are assuming the stream has length n)
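With the estimate $f(X) = n(2c - 1)$, where c = X.val and the start time t is chosen uniformly at random:
$E[f(X)] = \frac{1}{n} \sum_{t=1}^{n} n (2 c_t - 1) = \frac{1}{n} \sum_i n \left(1 + 3 + 5 + \cdots + (2 m_i - 1)\right)$
So, since $1 + 3 + 5 + \cdots + (2 m_i - 1) = m_i^2$:
$E[f(X)] = \frac{1}{n} \sum_i n \, m_i^2 = \sum_i m_i^2 = S$
We have the second moment (in expectation)!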
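The expectation claim can be checked on the example stream. The sketch below assumes the usual AMS estimate for the 2nd moment, f(X) = n(2c - 1), and averages it over every possible start time instead of sampling; `suffix_counts` is a hypothetical helper name.

```python
def suffix_counts(stream):
    """c_t for each position t: occurrences of stream[t] from t onwards."""
    seen = {}
    counts = [0] * len(stream)
    for t in range(len(stream) - 1, -1, -1):  # scan right to left
        seen[stream[t]] = seen.get(stream[t], 0) + 1
        counts[t] = seen[stream[t]]
    return counts

stream = "aabbbaba"
n = len(stream)
c = suffix_counts(stream)
estimates = [n * (2 * ct - 1) for ct in c]  # f(X) for each possible start time
expected = sum(estimates) / n               # uniform average over start times
exact = sum(stream.count(i) ** 2 for i in set(stream))  # sum of m_i^2
```

Averaging over all n start times is the same as taking the expectation under a uniformly random start, so `expected` and `exact` agree on this stream.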
So: $2c - 1 = c^2 - (c-1)^2$ (where c = X.val)
For k=3: $c^3 - (c-1)^3 = 3c^2 - 3c + 1$
Generally: Estimate $= n \left(c^k - (c-1)^k\right)$
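The reason a difference of consecutive kth powers works is that the terms telescope: summing $c^k - (c-1)^k$ for c = 1..m leaves $m^k$. A small check of that identity, with hypothetical helper names `increment` and `telescoped`:

```python
def increment(c, k):
    """Per-occurrence term c^k - (c-1)^k used in the kth-moment estimate."""
    return c ** k - (c - 1) ** k

def telescoped(m, k):
    """Sum of the increments for c = 1..m; should equal m^k."""
    return sum(increment(c, k) for c in range(1, m + 1))

# (m, k, telescoped sum) for a few small counts and moment orders
checks = [(m, k, telescoped(m, k)) for m in range(1, 6) for k in (2, 3, 4)]
```

For k = 2 the increment reduces to 2c - 1, and for k = 3 to 3c^2 - 3c + 1.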
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Combining Samples
In practice:
Compute $f(X) = n(2c - 1)$ for
as many variables X as you can fit in memory
Average them in groups
Take median of averages
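The combining step can be sketched as below. The group size and the list of estimates are made-up illustrative numbers (values near 32 plus one outlier), and `median_of_averages` is a hypothetical name.

```python
from statistics import mean, median

def median_of_averages(estimates, group_size):
    """Average the estimates in consecutive groups, then take the median."""
    groups = [estimates[i:i + group_size]
              for i in range(0, len(estimates), group_size)]
    averages = [mean(g) for g in groups]
    return median(averages)

# Illustrative per-variable estimates, not real measurements
estimates = [30, 34, 28, 36, 32, 200]
combined = median_of_averages(estimates, group_size=2)
overall_mean = mean(estimates)
```

Averaging within a group reduces variance, while the median across groups keeps a single wild group from dominating the result, as a plain mean of all estimates would.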
Drawbacks:
Only approximate
Number of itemsets is way too big
...
Important property: Sum over all weights $\sum_{t \ge 0} (1-c)^t$ is
$1/[1 - (1 - c)] = 1/c$
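A numeric check of this property, assuming the weights form the geometric sequence (1 - c)^t for t = 0, 1, 2, ... (which is what the closed form 1/[1 - (1 - c)] implies). The value c = 0.1 and the number of terms are arbitrary illustrative choices.

```python
def weight_sum(c, terms):
    """Partial sum of the weights (1 - c)^t for t = 0 .. terms-1."""
    return sum((1 - c) ** t for t in range(terms))

c = 0.1                       # illustrative decay constant
partial = weight_sum(c, 1000)  # truncated sum of the weights
limit = 1 / c                  # closed form of the infinite sum
```

With c = 0.1 the tail beyond 1000 terms is negligible, so the partial sum is numerically indistinguishable from 1/c.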