Lecture1 Intro Streaming
Logistics
• Prerequisites:
Algorithms + Complexity
or
Probability + Computational Models with grade
Logistics
• Grade:
• 70% exam
• 30% HW assignments (5-6)
• 5 bonus points for participating in the Mentimeter quizzes during class
• Participate in at least 11 (out of 13) quizzes
This Course
• Part I: Streaming Algorithms
• Part II: Sublinear-Time Algorithms
• Part III: Distributed Algorithms
Streaming Algorithms
[Figure: Algorithm (workspace) processing data]
Goal: compute
… approximately, w.h.p.
Streaming Algorithms
• Useful when:
• Data really is a stream
• Many cases where it’s not
Sublinear-Time Algorithms
[Figure: Algorithm querying a few positions (marked ?) of the input $x \in \{0,1\}^n$]
Goal: compute
… approximately, w.h.p.
One Current Example….
Distributed Algorithms
[Figure: the data split into parts data1, data2, data3, data4, data5]
Goal: compute
… approximately, w.h.p.
Course Goals
• See some cool algorithms and lower bounds
• Get a “feel” for randomized algorithms and probability
Today: a Tasting Menu
• One sublinear-time algorithm
• One streaming algorithm
• One distributed algorithm
Testing List Sortedness in Sublinear Time
[Ergün, Kannan, Kumar, Rubinfeld, Viswanathan ‘00]
List Sortedness
• Input: a list $x$ of $n$ integers
• Output: is $x$ sorted?
For every $i < j$: $x_i \le x_j$
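[Figure: a universe of objects with regions labeled YES (Property $P$), NO (“far from $P$”), and ??? (“close to $P$”)]
• “Close to $P$”: need to change at most an $\epsilon$-fraction of the object to get an object with property $P$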
Property Testing (Formally)
Given $x$ and a property $P$, distinguish between:
• $x \in P$,
• $x$ is $\epsilon$-far from $P$:
for all $y \in P$ we have $d(x, y) > \epsilon n$, where $d$ = “edit distance”
Back to Sortedness
• “$\epsilon$-close to sorted”?
• Need to change at most $\epsilon n$ values to get a sorted list
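To make the distance notion concrete, here is a minimal sketch that computes how many values must change to sort a list. It assumes "sorted" means non-decreasing, in which case the answer is the list length minus the length of a longest non-decreasing subsequence. The function names and the parameter `eps` are illustrative, not from the lecture.

```python
from bisect import bisect_right

def longest_nondecreasing_subsequence(xs):
    """Length of a longest non-decreasing subsequence, in O(n log n) time."""
    tails = []  # tails[k] = smallest possible last value of such a subsequence of length k + 1
    for v in xs:
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(tails)

def distance_to_sorted(xs):
    """Minimum number of values to change so that xs becomes non-decreasing."""
    return len(xs) - longest_nondecreasing_subsequence(xs)

def is_eps_close_to_sorted(xs, eps):
    """True if changing at most eps * len(xs) values suffices (assumed reading of eps-close)."""
    return distance_to_sorted(xs) <= eps * len(xs)
```

Note that this computation reads the whole list, so it only defines the target; a sublinear-time tester cannot afford to run it.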
Naïve Attempt
• Sample $k$ uniformly random pairs of indices $i < j$ and verify $x_i \le x_j$
• How large should $k$ be?
• Bad example:
Value:  3  2  1  6  5  4  9  8  7 12 11 10 15 14 13 16
Index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
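The following sketch quantifies why this list is a bad case, under one assumed reading of the naive test: sample index pairs $i < j$ and check that the values are in order. It counts the out-of-order pairs and, assuming "sorted" means non-decreasing, the number of values that would have to change. Variable names are illustrative.

```python
from bisect import bisect_right
from itertools import combinations

values = [3, 2, 1, 6, 5, 4, 9, 8, 7, 12, 11, 10, 15, 14, 13, 16]
n = len(values)

# All index pairs i < j whose values are out of order.
violating = [(i, j) for i, j in combinations(range(n), 2) if values[i] > values[j]]
total_pairs = n * (n - 1) // 2

# Longest non-decreasing subsequence; the remaining values must be changed.
tails = []
for v in values:
    pos = bisect_right(tails, v)
    if pos == len(tails):
        tails.append(v)
    else:
        tails[pos] = v
changes_needed = n - len(tails)

print(f"violating pairs: {len(violating)} of {total_pairs}")  # violating pairs: 15 of 120
print(f"values to change: {changes_needed} of {n}")           # values to change: 10 of 16
```

Only pairs inside the same block of three are out of order. For longer lists of this shape the fraction of violating pairs shrinks roughly like $2/n$, while the fraction of values that must change stays constant.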
Correctness
• Need to show:
• If $x$ is sorted, we accept w.h.p.
• If $x$ is $\epsilon$-far from sorted, we reject w.h.p.
• Output:
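One commonly taught sortedness tester can be sketched as follows, assuming all values are distinct: pick a random index, binary-search for its value as if the list were sorted, and reject if the search does not end at that index. The number of trials, `ceil(2 / eps)`, and all names here are assumptions for illustration.

```python
import math
import random

def binary_search_lands_on(xs, i):
    """Binary-search for xs[i] as if xs were sorted; True iff the search ends at index i."""
    lo, hi = 0, len(xs) - 1
    target = xs[i]
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid == i
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False

def sortedness_tester(xs, eps, rng=None):
    """Accept (True) or reject (False) using O(log(n) / eps) queries to xs."""
    if not xs:
        return True
    rng = rng or random.Random()
    for _ in range(math.ceil(2 / eps)):  # assumed trial count
        i = rng.randrange(len(xs))
        if not binary_search_lands_on(xs, i):
            return False
    return True
```

A sorted list of distinct values is always accepted. The indices on which the search succeeds form an increasing subsequence, so if the list is $\epsilon$-far from sorted, each trial rejects with probability greater than $\epsilon$, and all $\lceil 2/\epsilon \rceil$ trials pass with probability below $e^{-2}$.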
Analysis of Flajolet-Martin
Space Complexity
The Hash Function
• Pairwise-independence: for every and ,
is pairwise-independent.
• Representing $h$?
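As an illustration, one standard pairwise-independent family is $h_{a,b}(x) = (ax + b) \bmod p$ for a prime $p$, with $a, b$ drawn uniformly from $\{0, \dots, p-1\}$. Whether the lecture uses this family is an assumption. The sketch shows that storing the two numbers $a$ and $b$ is enough to represent the function, and it checks pairwise independence by brute force for a small prime; all names are illustrative.

```python
import random

def make_hash(a, b, p):
    """Return h(x) = (a*x + b) mod p; the pair (a, b) fully represents h."""
    return lambda x: (a * x + b) % p

def random_hash(p, rng=None):
    """Draw h uniformly from the family by picking a and b uniformly from 0..p-1."""
    rng = rng or random.Random()
    return make_hash(rng.randrange(p), rng.randrange(p), p)

def joint_counts(x, y, p):
    """For fixed x != y, count how many (a, b) map (x, y) to each output pair (u, v)."""
    counts = {}
    for a in range(p):
        for b in range(p):
            h = make_hash(a, b, p)
            key = (h(x), h(y))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

For prime `p` and `x != y` (mod `p`), every output pair is hit by exactly one `(a, b)`, so the pair `(h(x), h(y))` is uniform over all $p^2$ possibilities.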
Improving the Accuracy
• Result must be of the form
• High variance
• How to improve?
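One common way to reduce the variance, sketched here as an assumption rather than as the lecture's answer, is to run several independent copies of the estimator and combine them, for example by taking the median. The sketch below assumes integer stream items smaller than the prime `P`, uses the hash family $(ax + b) \bmod P$, and estimates the number of distinct items as $2^z$, where $z$ is the largest number of trailing zero bits seen in a hash value. All names and the default of 9 copies are illustrative.

```python
import random
import statistics

P = (1 << 61) - 1  # a Mersenne prime, assumed larger than every stream item

def trailing_zeros(v):
    """Number of trailing zero bits of a positive integer v."""
    return (v & -v).bit_length() - 1

def fm_estimate(stream, seed):
    """One Flajolet-Martin style estimate; stores only a, b and z."""
    rng = random.Random(seed)
    a, b = rng.randrange(P), rng.randrange(P)
    z = 0
    for item in stream:
        hv = (a * item + b) % P
        if hv:  # skip hash value 0, which has no well-defined trailing-zero count
            z = max(z, trailing_zeros(hv))
    return 2 ** z

def fm_median(stream, copies=9, seed=0):
    """Median of several independent estimates (use an odd number of copies)."""
    stream = list(stream)
    return statistics.median(fm_estimate(stream, seed + c) for c in range(copies))
```

With this combiner the output is still a power of two, so the median limits the chance of a badly wrong estimate but does not make the result finer-grained; averaging over groups of copies before taking the median is a common refinement.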
Distributed Algorithm for All-Pairs Shortest Paths