Window Functions in R using dplyr

Last Updated : 10 Nov, 2022

Aggregation functions in R take a set of values and return a single value; sum and mean are common examples. Window functions are a variation on aggregation: they return as many outputs as there are inputs, so if n values are passed in, n values are returned. In this article, we will discuss the window functions that are available in R through base R and the dplyr package.
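The difference between the two kinds of function is easiest to see side by side. The sketch below uses a made-up scores vector (not part of the original article) and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical input values, used only for illustration
scores <- c(10, 20, 30, 40)

# Aggregation: four inputs collapse into a single value
total <- sum(scores)
print(total)            # 100

# Window function: four inputs give four outputs
running_total <- cumsum(scores)
print(running_total)    # 10 30 60 100

# Inside mutate(), a window function fills a column with one value per row
df <- tibble(score = scores) %>%
  mutate(running_total = cumsum(score),
         rank = min_rank(score))
print(df)
```

This is why window functions are typically used inside mutate(), while aggregation functions are typically used inside summarise().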

The functions we will cover in this article are:

row_number	To rank the values, breaking ties by order of appearance so that every rank is unique.
min_rank	To rank the values so that tied values all receive the smallest rank in their group.
percent_rank	To rescale the min_rank of each value to a number between 0 and 1.
cume_dist	To compute the proportion of all values that are less than or equal to the current value.
Lead	To return the next element in the sequence of values in the vector.
Lag	To return the previous element in the sequence of values in the vector.
Cum Sum Method	To compute the sum of the values encountered up to and including that index.
Cum Prod Method	To compute the product of the values encountered up to and including that index.
Cum Min Method	To calculate the minimum value encountered up to and including that index.
Cum Max Method	To calculate the maximum value encountered up to and including that index.
Cum Mean Method	To calculate the mean of the values encountered up to and including that index.
Cum Any Method	To check whether any of the elements encountered up to that index satisfy a condition.
Cum All Method	To check whether all of the elements encountered up to that index satisfy a condition.

Let’s see the syntax and code for each function.

Row_number

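The row_number method is equivalent to rank(vec, ties.method = "first"): the values are ranked in ascending order and ties are broken by order of appearance, so every rank is unique. Missing values are left as they are.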

Syntax: row_number(vec)

Arguments: vec- the vector of values that have to be ranked

R

library(dplyr)
#creating a data vector
companies <- c("Geekster","Geeksforgeeks","Wipro","TCS")
#printing the original vector
print(companies)
#computing the row number of the used vector
rn <- row_number(companies)
print(rn)

Output:

"Geekster" "Geeksforgeeks" "Wipro" "TCS"          
2 1 4 3

Explanation:

The row numbers of the supplied input vector are the positions the values would take after sorting in increasing order. For instance, the word “Geeksforgeeks”, the second element, is the smallest lexicographically, so its row number is 1. It is followed by “Geekster” with row number 2. TCS gets row number 3 since it is next in order, and Wipro gets row number 4.
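The handling of ties and missing values can be checked with a small sketch. The scores vector here is made up for illustration; the comparison with base R's rank() assumes the ties.method = "first" and na.last = "keep" options.

```r
library(dplyr)

# Hypothetical vector with a tie (50 appears twice) and a missing value
scores <- c(50, 20, 50, NA, 10)

# Ties are broken by order of appearance; NA stays NA
rn <- row_number(scores)
print(rn)    # 3 2 4 NA 1

# Base R equivalent: na.last = "keep" preserves the NA
base_rank <- rank(scores, ties.method = "first", na.last = "keep")
print(base_rank)
```

The two 50s receive ranks 3 and 4 rather than sharing a rank, which is the main difference from min_rank.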

Min_rank

The min_rank method also ranks the values in ascending order, but tied values all receive the same rank, namely the smallest rank in their group. The ranks that the tied values would otherwise have taken are skipped.

Syntax: min_rank(vec)

Arguments: vec- the vector of values that have to be ranked

R

#computing the rank of the used vector
companies <- c("Geekster","Geeksforgeeks",
               "Geekster","Wipro","TCS")
ranks <- min_rank(companies)
print(ranks)

Output:

2 1 2 5 4

Here we can see that both occurrences of Geekster are assigned the rank 2. Rank 3 is skipped, so the next value, TCS, gets rank 4.
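To make the tie handling concrete, the following sketch compares min_rank with row_number on a made-up marks vector (an assumption for illustration). It also shows desc(), a dplyr helper, for ranking from largest to smallest.

```r
library(dplyr)

# Hypothetical marks with a tie: 70 appears twice
marks <- c(70, 85, 70, 90)

# Tied values share the smallest rank; rank 2 is skipped
tied <- min_rank(marks)
print(tied)          # 1 3 1 4

# row_number gives every element a distinct rank
unique_ranks <- row_number(marks)
print(unique_ranks)  # 1 3 2 4

# Ranking in descending order with desc()
descending <- min_rank(desc(marks))
print(descending)    # 3 2 3 1
```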

Percent_rank

The percent_rank method rescales the min_rank of each value to a number between 0 and 1. It is computed as (min_rank - 1) / (n - 1), where n is the number of values.

Syntax: percent_rank(vec)

Arguments: vec- the vector of values that have to be ranked

R

#computing the percent rank of the used vector
pct_rank <- percent_rank(companies)
print(pct_rank)

Output:

0.25 0.00 0.25 1.00 0.75

The smallest value in ascending order gets a percent rank of 0.0 and the largest gets 1.0. The two tied Geekster entries share the value 0.25.
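The relationship between percent_rank and min_rank can be verified by hand. This is a minimal sketch that reuses the five-element companies vector from the example above; the names n, by_hand and builtin are introduced here for illustration.

```r
library(dplyr)

companies <- c("Geekster", "Geeksforgeeks", "Geekster", "Wipro", "TCS")
n <- length(companies)

# Rescale min_rank from the range 1..n to the range 0..1
by_hand <- (min_rank(companies) - 1) / (n - 1)
builtin <- percent_rank(companies)

print(by_hand)
print(all.equal(by_hand, builtin))   # TRUE
```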

Cume_dist

The cume_dist method in R is equivalent to a cumulative distribution function. For each element, it computes the proportion of all values that are less than or equal to that element.

Syntax: cume_dist(vec)

Arguments: vec- the vector of values that have to be ranked

R

#computing the cume_dist of the used vector
dist <- cume_dist(companies)
print(dist)

Output:

0.6 0.2 0.6 1.0 0.8
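The proportion can also be computed by hand to see where the numbers come from. The sketch below uses a made-up numeric vector with the same tie pattern as the companies vector; the names values, by_hand and builtin are assumptions for illustration.

```r
library(dplyr)

# Hypothetical values: 30 appears twice, like Geekster above
values <- c(30, 10, 30, 50, 40)

# For each element, the share of values less than or equal to it
by_hand <- sapply(values, function(v) mean(values <= v))
builtin <- cume_dist(values)

print(by_hand)   # 0.6 0.2 0.6 1.0 0.8
print(all.equal(by_hand, builtin))   # TRUE
```

Three of the five values are less than or equal to 30, which gives 3 / 5 = 0.6 for both tied entries.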

Lead

The lead window method in R returns, by default, the next element in the sequence of values in the vector. There is no lead value for the last element of the input, so NA is returned in that position.

Syntax: lead(vec)

Arguments: vec- the vector of values to be shifted

R

#creating a vector 
vec <- c(4,3,1,2,5)
print(vec)
 
next_vals <- lead(vec)
print(next_vals)

Output:

4 3 1 2 5
3 1 2 5 NA
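lead also accepts an offset n and a default fill value, both documented arguments of dplyr::lead. A short sketch, reusing the vec from the example above:

```r
library(dplyr)

vec <- c(4, 3, 1, 2, 5)

# Look two positions ahead instead of one
two_ahead <- lead(vec, n = 2)
print(two_ahead)   # 1 2 5 NA NA

# Fill the trailing gap with 0 instead of NA
filled <- lead(vec, default = 0)
print(filled)      # 3 1 2 5 0
```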

Lag

The lag window method in R returns, by default, the previous element in the sequence of values in the vector. There is no lag value for the first element of the input, since there is no element before it, so NA is returned in that position.

Syntax: lag(vec)

Arguments: vec- the vector of values to be shifted

R

prev_vals <- lag(vec)
print(prev_vals)

Output:

NA  4  3  1  2

Explanation :

The lag value is not applicable for the first element, so NA is returned. For the second element, 3, the lag value is the first element, 4.
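A common use of lag is computing the change from one row to the next. The sketch below uses a made-up sales table (the column names day, units, previous and change are assumptions). The function is written as dplyr::lag because base R's stats package also defines a lag function.

```r
library(dplyr)

# Hypothetical daily sales figures
sales <- tibble(day = 1:5, units = c(4, 3, 1, 2, 5))

sales <- sales %>%
  mutate(previous = dplyr::lag(units),   # value from the row before
         change = units - previous)      # difference from the previous day

print(sales$change)   # NA -1 -2 1 3
```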

Cum Sum Method

The cumsum() method is used to compute the sum of values encountered till that particular index. The cumsum value of the first element is equivalent to the value itself.

Syntax: cumsum(vec)

Arguments: vec- the vector of values

R

#creating a vector 
vec <- 1:5
cum_sum <- cumsum(vec)
print(cum_sum)

Output:

1  3  6 10 15

Explanation :

The cumulative sum at the first element is the value itself, 1. For the second element, 2, the cumulative sum is 1 + 2 = 3. For the third element, 3, cumsum = 1 + 2 + 3 = 6. The remaining cumulative sums are calculated in the same way.
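With dplyr, cumsum is often combined with group_by so that the running total restarts for each group. A minimal sketch, assuming a made-up orders table with customer and amount columns:

```r
library(dplyr)

# Hypothetical orders for two customers
orders <- tibble(
  customer = c("A", "A", "B", "A", "B"),
  amount   = c(10, 20, 5, 30, 15)
)

running <- orders %>%
  group_by(customer) %>%
  mutate(running_total = cumsum(amount)) %>%   # restarts per customer
  ungroup()

print(running$running_total)   # 10 30 5 60 20
```

The rows keep their original order; only the accumulation is done within each customer.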

Cum Prod Method

The cumprod() method is used to compute the product of values encountered till that particular index. The cumprod value of the first element is equivalent to the value itself.

Syntax: cumprod(vec)

Arguments: vec- the vector of values

R

cum_prod <- cumprod(vec)
print(cum_prod)

Output:

1 2 6 24 120

Explanation :

Here vec is still 1:5 from the previous example. The cumulative product at the first element is the value itself, 1. For the second element, 2, the cumulative product is 1 * 2 = 2. For the third element, 3, cumprod = 1 * 2 * 3 = 6. The remaining cumulative products are calculated in the same way.
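Because the input is 1:5, each cumulative product is also a factorial, which gives a quick way to check the result. This sketch uses only base R functions:

```r
vec <- 1:5

running_product <- cumprod(vec)
print(running_product)   # 1 2 6 24 120

# The k-th cumulative product of 1:5 is k!
print(all.equal(running_product, factorial(vec)))   # TRUE
```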

Cum Min Method

The cummin() method is used to calculate the minimum value encountered until that particular index value.

Syntax: cummin(vec)

Arguments: vec- the vector of values

R

#creating a vector 
vec <- c(3,2,1,5,3)
cum_min <- cummin(vec)
print(cum_min)

Output:

3 2 1 1 1

Explanation :

The minimum value encountered at the first element is the element itself, 3. At the second element, 2, the minimum becomes 2. At the third element, the minimum becomes 1. The fourth and fifth elements are greater than the current minimum, so the minimum stays at 1.

Cum Max Method

The cummax() method is used to calculate the maximum value encountered until that particular index value.

Syntax: cummax(vec)

Arguments: vec- the vector of values

R

cum_max <- cummax(vec)
print(cum_max)

Output:

3 3 3 5 5
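cummin and cummax can be used together to track the range seen so far. The sketch below puts the same values into a made-up prices table (the column names are assumptions for illustration).

```r
library(dplyr)

# Hypothetical price series using the same values as vec
prices <- tibble(day = 1:5, price = c(3, 2, 1, 5, 3))

prices <- prices %>%
  mutate(low = cummin(price),     # lowest price so far
         high = cummax(price),    # highest price so far
         range = high - low)      # spread between them

print(prices$range)   # 0 1 2 4 4
```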

Cum Mean Method

The cummean() method is used to calculate the mean value encountered until that particular index value.

Syntax: cummean(vec)

Arguments: vec- the vector of values

R

cum_mean <- cummean(vec)
print(cum_mean)

Output:

3.00 2.50 2.00 2.75 2.80
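The cumulative mean is the cumulative sum divided by the number of elements seen so far, which can be checked by hand. A minimal sketch with the same vec:

```r
library(dplyr)

vec <- c(3, 2, 1, 5, 3)

# Running sum divided by the running count 1, 2, 3, ...
by_hand <- cumsum(vec) / seq_along(vec)
print(by_hand)   # 3.00 2.50 2.00 2.75 2.80

print(all.equal(by_hand, cummean(vec)))   # TRUE
```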

Cum Any Method

The cumany method checks, at each index, whether any of the elements encountered so far satisfy a condition. Once the condition is TRUE for one element, the result stays TRUE for every later index.

Syntax: cumany(vec)

Arguments: vec- a logical vector, typically the result of a condition such as vec > 3

R

cum_any_3 <- cumany(vec>3)
print("Any vector values greater than 3")
print(cum_any_3)
 
cum_any_0 <- cumany(vec==0)
print("Any vector values equal to 0")
print(cum_any_0)

Output:

"Any vector values greater than 3"
FALSE FALSE FALSE  TRUE  TRUE
"Any vector values equal to 0"
FALSE FALSE FALSE FALSE FALSE
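Inside filter(), cumany keeps every row from the first match onward. A sketch with a made-up readings table (the names step and value are assumptions):

```r
library(dplyr)

# Hypothetical readings using the same values as vec
readings <- tibble(step = 1:5, value = c(3, 2, 1, 5, 3))

# Keep all rows starting at the first value greater than 3
after_spike <- readings %>%
  filter(cumany(value > 3))

print(after_spike$step)   # 4 5
```

The last row is kept even though its value, 3, does not satisfy the condition, because the condition was already met at step 4.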

Cum All Method

The cumall method checks, at each index, whether all of the elements encountered so far satisfy a condition. Once the condition is FALSE for one element, the result stays FALSE for every later index.

Syntax: cumall(vec)

Arguments: vec- a logical vector, typically the result of a condition such as vec > 3

R

#using cumall method
cum_all_3 <- cumall(vec>3)
print(cum_all_3)

Output:

FALSE FALSE FALSE FALSE FALSE
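The result above is all FALSE because the very first element, 3, is not greater than 3. A vector whose leading elements do satisfy the condition shows the behaviour more clearly. The sketch below uses a made-up readings table (an assumption for illustration).

```r
library(dplyr)

# Hypothetical readings: the first two values are below 4
readings <- tibble(step = 1:5, value = c(1, 2, 5, 2, 1))

flags <- cumall(readings$value < 4)
print(flags)   # TRUE TRUE FALSE FALSE FALSE

# Inside filter(), cumall keeps rows only until the first failure
before_spike <- readings %>%
  filter(cumall(value < 4))

print(before_spike$step)   # 1 2
```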

yashchuahan
