Reservoir sampling is a family of randomized algorithms for randomly choosing k samples from a list of n items, where n is either very large or unknown. Typically n is large enough that the list does not fit into main memory, for example a list of search queries at Google or Facebook.
So we are given a big array (or stream) of numbers (to simplify), and we need to write an efficient function that randomly selects k of them, where 1 <= k <= n. Let the input array be stream[].
A simple solution is to create an array reservoir[] of maximum size k and repeatedly select a random item from stream[0..n-1]. If the selected item has not been selected before, put it in reservoir[]. To check whether an item was previously selected, we need to search for it in reservoir[], which is a linear scan of up to k entries. With at least k selections, the time complexity of this algorithm is O(k^2), and repeated picks of already-selected items add further work as k approaches n. This can be costly if k is big. It is also unsuitable when the input is a stream, because it needs random access to the items and must know n in advance.
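The simple solution above can be sketched as follows. This is a minimal Python sketch, assuming the whole input fits in an in-memory list; the function name `naive_select` and the `rng` parameter are illustrative and not from the original. It records chosen indices rather than values so that duplicate values in the input do not confuse the membership check.

```python
import random

def naive_select(stream, k, rng=random):
    """Pick k items from distinct positions of `stream` by repeated random draws."""
    n = len(stream)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    chosen = []  # indices picked so far (plays the role of reservoir[])
    while len(chosen) < k:
        idx = rng.randrange(n)  # needs n and random access: not stream-friendly
        # Linear search through the picks made so far; this membership
        # test is what makes the approach quadratic in k.
        if idx not in chosen:
            chosen.append(idx)
    return [stream[i] for i in chosen]

sample = naive_select(list(range(1, 13)), 5)
print(sample)
```

Note that the function calls `len(stream)` and indexes into the list, which is exactly what a one-pass stream cannot offer.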
The problem can be solved in O(n) time with a single pass over the input, which also suits input that arrives as a stream. The steps are as follows.
1) Create an array reservoir[0..k-1] and copy the first k items of stream[] into it.
2) Now consider the remaining items one by one, from the (k+1)th item to the nth item.
...a) Generate a random number from 0 to i, where i is the 0-based index of the current item in stream[]. Let the generated random number be j.
...b) If j is in the range 0 to k-1, replace reservoir[j] with stream[i].
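The steps above can be sketched for a true stream, where n is not known in advance. This is a minimal Python sketch, assuming the input is any iterable; the name `reservoir_sample` and the optional `rng` parameter are illustrative choices, not part of the original article.

```python
import random
from itertools import islice

def reservoir_sample(iterable, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    it = iter(iterable)
    # Step 1: fill the reservoir with the first k items.
    reservoir = list(islice(it, k))
    if len(reservoir) < k:
        raise ValueError("stream has fewer than k items")
    # Step 2: the item at 0-based index i is kept with probability k/(i+1).
    for i, item in enumerate(it, start=k):
        j = rng.randrange(i + 1)  # uniform integer in [0, i]
        if j < k:
            reservoir[j] = item
    return reservoir

# The generator below never exposes its length to the sampler.
sample = reservoir_sample((x * x for x in range(1, 1001)), 5)
print(sample)
```

Only the reservoir is held in memory, so the same function works whether the source is a list, a file iterator, or a generator.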
Following is the implementation of the above algorithm.
C++
// An efficient program to randomly select
// k items from a stream of items
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// A utility function to print an array
void printArray(const int stream[], int n)
{
    for (int i = 0; i < n; i++)
        cout << stream[i] << " ";
    cout << endl;
}

// A function to randomly select
// k items from stream[0..n-1].
void selectKItems(int stream[], int n, int k)
{
    int i; // index for elements in stream[]

    // reservoir is the output array. Initialize
    // it with first k elements from stream[].
    // A std::vector is used because variable-length
    // arrays are not part of standard C++.
    vector<int> reservoir(stream, stream + k);

    // Use a different seed value so that we don't get
    // same result each time we run this program
    srand(time(NULL));

    // Iterate from the (k+1)th element to nth element
    for (i = k; i < n; i++)
    {
        // Pick a random index from 0 to i.
        // Note: rand() % m is slightly biased, which
        // is acceptable for a demonstration.
        int j = rand() % (i + 1);

        // If the randomly picked index is smaller than k,
        // then replace the element present at the index
        // with new element from stream
        if (j < k)
            reservoir[j] = stream[i];
    }

    cout << "Following are k randomly selected items \n";
    printArray(reservoir.data(), k);
}

// Driver Code
int main()
{
    int stream[] = {1, 2, 3, 4, 5, 6,
                    7, 8, 9, 10, 11, 12};
    int n = sizeof(stream) / sizeof(stream[0]);
    int k = 5;
    selectKItems(stream, n, k);
    return 0;
}
// This code is contributed by rathbhupendra
C
// An efficient program to randomly select k items from a stream of items
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// A utility function to print an array
void printArray(int stream[], int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", stream[i]);
    printf("\n");
}

// A function to randomly select k items from stream[0..n-1].
void selectKItems(int stream[], int n, int k)
{
    int i; // index for elements in stream[]

    // reservoir[] is the output array. Initialize it with
    // first k elements from stream[]
    int reservoir[k];
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];

    // Use a different seed value so that we don't get
    // same result each time we run this program
    srand(time(NULL));

    // Iterate from the (k+1)th element to nth element
    for (; i < n; i++)
    {
        // Pick a random index from 0 to i.
        int j = rand() % (i + 1);

        // If the randomly picked index is smaller than k, then replace
        // the element present at the index with new element from stream
        if (j < k)
            reservoir[j] = stream[i];
    }

    printf("Following are k randomly selected items \n");
    printArray(reservoir, k);
}

// Driver program to test above function.
int main()
{
    int stream[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    int n = sizeof(stream) / sizeof(stream[0]);
    int k = 5;
    selectKItems(stream, n, k);
    return 0;
}
Java
// An efficient Java program to randomly
// select k items from a stream of items
import java.util.Arrays;
import java.util.Random;

public class ReservoirSampling {
    // A function to randomly select k items from
    // stream[0..n-1].
    static void selectKItems(int stream[], int n, int k)
    {
        int i; // index for elements in stream[]

        // reservoir[] is the output array. Initialize it
        // with first k elements from stream[]
        int reservoir[] = new int[k];
        for (i = 0; i < k; i++)
            reservoir[i] = stream[i];

        Random r = new Random();

        // Iterate from the (k+1)th element to nth element
        for (; i < n; i++) {
            // Pick a random index from 0 to i.
            int j = r.nextInt(i + 1);

            // If the randomly picked index is smaller than
            // k, then replace the element present at the
            // index with new element from stream
            if (j < k)
                reservoir[j] = stream[i];
        }

        System.out.println(
            "Following are k randomly selected items");
        System.out.println(Arrays.toString(reservoir));
    }

    // Driver Program to test above method
    public static void main(String[] args)
    {
        int stream[]
            = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
        int n = stream.length;
        int k = 5;
        selectKItems(stream, n, k);
    }
}
// This code is contributed by Sumit Ghosh
Python3
# An efficient Python3 program
# to randomly select k items
# from a stream of items
import random

# A utility function
# to print an array
def printArray(stream, n):
    for i in range(n):
        print(stream[i], end=" ")
    print()

# A function to randomly select
# k items from stream[0..n-1].
def selectKItems(stream, n, k):
    # reservoir[] is the output
    # array. Initialize it with
    # first k elements from stream[]
    reservoir = [0] * k
    for i in range(k):
        reservoir[i] = stream[i]

    # Iterate from the (k+1)th
    # element to nth element.
    # i must be set to k here: the for
    # loop above leaves i at k-1, and
    # starting there would overwrite a
    # slot with stream[k-1] a second time.
    i = k
    while i < n:
        # Pick a random index
        # from 0 to i.
        j = random.randrange(i + 1)

        # If the randomly picked
        # index is smaller than k,
        # then replace the element
        # present at the index
        # with new element from stream
        if j < k:
            reservoir[j] = stream[i]
        i += 1

    print("Following are k randomly selected items")
    printArray(reservoir, k)

# Driver Code
if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    n = len(stream)
    k = 5
    selectKItems(stream, n, k)

# This code is contributed by mits
C#
// An efficient C# program to randomly
// select k items from a stream of items
using System;
using System.Collections;

public class ReservoirSampling
{
    // A function to randomly select k
    // items from stream[0..n-1].
    static void selectKItems(int []stream,
                             int n, int k)
    {
        // index for elements in stream[]
        int i;

        // reservoir[] is the output array.
        // Initialize it with first k
        // elements from stream[]
        int[] reservoir = new int[k];
        for (i = 0; i < k; i++)
            reservoir[i] = stream[i];

        Random r = new Random();

        // Iterate from the (k+1)th
        // element to nth element
        for (; i < n; i++)
        {
            // Pick a random index from 0 to i.
            int j = r.Next(i + 1);

            // If the randomly picked index
            // is smaller than k, then replace
            // the element present at the index
            // with new element from stream
            if (j < k)
                reservoir[j] = stream[i];
        }

        Console.WriteLine("Following are k " +
                          "randomly selected items");
        for (i = 0; i < k; i++)
            Console.Write(reservoir[i] + " ");
    }

    // Driver code
    static void Main()
    {
        int []stream = {1, 2, 3, 4, 5, 6, 7,
                        8, 9, 10, 11, 12};
        int n = stream.Length;
        int k = 5;
        selectKItems(stream, n, k);
    }
}
// This code is contributed by mits
JavaScript
<script>
// An efficient program to randomly select
// k items from a stream of items

// A utility function to print an array
function printArray(stream, n)
{
    for (let i = 0; i < n; i++)
        document.write(stream[i] + " ");
    document.write('\n');
}

// A function to randomly select
// k items from stream[0..n-1].
function selectKItems(stream, n, k)
{
    // Index for elements in stream[]
    let i;

    // reservoir[] is the output array. Initialize
    // it with first k elements from stream[]
    let reservoir = [];
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];

    // Math.random() is seeded automatically, so
    // no explicit seeding step is needed here.

    // Iterate from the (k+1)th element
    // to nth element
    for (; i < n; i++)
    {
        // Pick a random index from 0 to i.
        // Scaling by (i + 1) and flooring avoids
        // the bias of a modulo reduction.
        let j = Math.floor(Math.random() * (i + 1));

        // If the randomly picked index is
        // smaller than k, then replace the
        // element present at the index
        // with new element from stream
        if (j < k)
            reservoir[j] = stream[i];
    }

    document.write("Following are k randomly " +
                   "selected items \n");
    printArray(reservoir, k);
}

// Driver Code
let stream = [ 1, 2, 3, 4, 5, 6,
               7, 8, 9, 10, 11, 12 ];
let n = stream.length;
let k = 5;
selectKItems(stream, n, k);

// This code is contributed by rohan07
</script>
PHP
<?php
// An efficient PHP program
// to randomly select k items
// from a stream of items

// A utility function
// to print an array
function printArray($stream, $n)
{
    for ($i = 0; $i < $n; $i++)
        echo $stream[$i]." ";
    echo "\n";
}

// A function to randomly select
// k items from stream[0..n-1].
function selectKItems($stream, $n, $k)
{
    // reservoir[] is the output
    // array. Initialize it with
    // first k elements from stream[]
    $reservoir = array_fill(0, $k, 0);
    for ($i = 0; $i < $k; $i++)
        $reservoir[$i] = $stream[$i];

    // Iterate from the (k+1)th
    // element to nth element
    for (; $i < $n; $i++)
    {
        // Pick a random index
        // from 0 to i. rand() includes
        // both bounds, so the upper
        // bound is $i, not $i + 1.
        $j = rand(0, $i);

        // If the randomly picked
        // index is smaller than k,
        // then replace the element
        // present at the index
        // with new element from stream
        if ($j < $k)
            $reservoir[$j] = $stream[$i];
    }

    echo "Following are k randomly ".
         "selected items\n";
    printArray($reservoir, $k);
}

// Driver Code
$stream = array(1, 2, 3, 4, 5, 6, 7,
                8, 9, 10, 11, 12);
$n = count($stream);
$k = 5;
selectKItems($stream, $n, $k);

// This code is contributed by mits
?>
Output:
Following are k randomly selected items
6 2 11 8 12
Note: The output will differ on every run because the items are selected at random.
Time Complexity: O(n)
Auxiliary Space: O(k)
How does this work?
To prove that this solution is correct, we must show that the probability that any item stream[i], where 0 <= i < n, is in the final reservoir[] is k/n. Let us divide the proof into two cases, since the first k items are treated differently.
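The k/n claim can also be checked empirically before working through the proof. The sketch below is a simulation under assumed parameters (n = 12, k = 5, 20000 trials, a fixed seed), none of which come from the original; the helper name `select_k` is likewise illustrative. Each item's observed inclusion frequency should land close to k/n, which is about 0.4167 here.

```python
import random
from collections import Counter

def select_k(stream, k, rng):
    """One run of the reservoir algorithm over an in-memory list."""
    reservoir = list(stream[:k])
    for i in range(k, len(stream)):
        j = rng.randrange(i + 1)  # uniform integer in [0, i]
        if j < k:
            reservoir[j] = stream[i]
    return reservoir

n, k, trials = 12, 5, 20000
rng = random.Random(12345)  # fixed seed so the run is repeatable
counts = Counter()
for _ in range(trials):
    counts.update(select_k(list(range(n)), k, rng))

# Fraction of trials in which each item ended up in the reservoir.
freqs = {item: counts[item] / trials for item in range(n)}
for item in range(n):
    print(item, round(freqs[item], 3))
```

A simulation like this only suggests uniformity; the argument below establishes it exactly.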
Case 1: For last n-k stream items, i.e., for stream[i] where k <= i < n
For every such stream item stream[i], we pick a random index from 0 to i, and if the picked index is one of the first k indexes, we replace the element at the picked index with stream[i].
To simplify the proof, let us first consider the last item. The probability that the last item is in the final reservoir = the probability that one of the first k indexes is picked for the last item = k/n (the probability of picking one of k indexes out of n).
Let us now consider the second-last item. The probability that the second-last item is in the final reservoir[] = [probability that one of the first k indexes is picked in the iteration for stream[n-2]] x [probability that the index picked in the iteration for stream[n-1] is not the same as the index picked for stream[n-2]] = [k/(n-1)] x [(n-1)/n] = k/n.
Similarly, we can consider the other items, from stream[n-1] down to stream[k], and generalize the proof.
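The generalization can be written out as a worked equation. For any index i with k <= i < n, stream[i] enters the reservoir with probability k/(i+1), and in each later iteration t (for i < t < n) the slot holding it is picked with probability 1/(t+1), so it survives that iteration with probability t/(t+1). The index symbol t is introduced here only for illustration.

```latex
\begin{aligned}
P\bigl(\text{stream}[i] \text{ is in the final reservoir}\bigr)
  &= \frac{k}{i+1} \cdot \prod_{t=i+1}^{n-1} \frac{t}{t+1} \\
  &= \frac{k}{i+1} \cdot \frac{i+1}{i+2} \cdot \frac{i+2}{i+3} \cdots \frac{n-1}{n} \\
  &= \frac{k}{n}.
\end{aligned}
```

Setting i = n-1 gives an empty product and recovers the k/n result for the last item, and i = n-2 recovers [k/(n-1)] x [(n-1)/n] for the second-last item.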
Case 2: For first k stream items, i.e., for stream[i] where 0 <= i < k
The first k items are initially copied to reservoir[] and may be replaced later, in the iterations for stream[k] to stream[n-1].
The probability that an item from stream[0..k-1] is in the final array = the probability that its slot is not picked when items stream[k], stream[k+1], ..., stream[n-1] are considered = [k/(k+1)] x [(k+1)/(k+2)] x [(k+2)/(k+3)] x ... x [(n-1)/n] = k/n.
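The telescoping products in both cases can be verified with exact rational arithmetic. This is a small sketch, assuming the example values n = 12 and k = 5 from the driver code above; the helper name `inclusion_probability` is hypothetical.

```python
from fractions import Fraction

def inclusion_probability(i, n, k):
    """Exact probability that stream[i] is in the final reservoir."""
    # Probability of being in the reservoir right after its own step:
    # 1 for the first k items, k/(i+1) for later items.
    p = Fraction(1) if i < k else Fraction(k, i + 1)
    # Each later item t overwrites one specific slot with probability
    # 1/(t+1), so the item survives that step with probability t/(t+1).
    for t in range(max(i + 1, k), n):
        p *= Fraction(t, t + 1)
    return p

n, k = 12, 5
probs = [inclusion_probability(i, n, k) for i in range(n)]
print(probs[0], probs[-1])  # both print 5/12
```

Using `Fraction` avoids floating-point rounding, so the equality with k/n is checked exactly rather than approximately.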
References:
http://en.wikipedia.org/wiki/Reservoir_sampling