Window Functions in R using dplyr
Last Updated: 10 Nov, 2022
Aggregation functions in R take a set of values and return a single value. Examples include sum() and mean(). Window functions are a variation on aggregation functions: they return as many outputs as they receive inputs, so n input values produce n output values. In this article, we will discuss the window functions available in R, most of which are provided by the dplyr package.
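The difference between the two kinds of function can be seen in a small sketch. The vector x below is made up purely for illustration; sum() and cumsum() are both base R functions.

```r
# A small numeric vector used only for illustration
x <- c(4, 3, 1, 2, 5)

# Aggregation: five inputs collapse into one output
total <- sum(x)
print(total)

# Window function: five inputs give five outputs
running_total <- cumsum(x)
print(running_total)
```

Here total has length 1, while running_total has the same length as x.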
The functions we will cover in this article are:
Function | Description
row_number() | Ranks the values, breaking ties by position.
min_rank() | Ranks the values; tied values share the lowest rank and the ranks that follow are skipped.
percent_rank() | Rescales the minimum rank to a value between 0 and 1.
cume_dist() | Computes the proportion of all values less than or equal to the current value.
lead() | Returns the next element in the sequence of values in the vector.
lag() | Returns the previous element in the sequence of values in the vector.
cumsum() | Computes the sum of the values up to and including the current position.
cumprod() | Computes the product of the values up to and including the current position.
cummin() | Returns the minimum value encountered up to the current position.
cummax() | Returns the maximum value encountered up to the current position.
cummean() | Returns the mean of the values encountered up to the current position.
cumany() | Checks whether any element up to the current position satisfies a condition.
cumall() | Checks whether all elements up to the current position satisfy a condition.
Let’s see the syntax and code for each function.
Row_number
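The row_number() function ranks the values of a vector in ascending order and breaks ties by position, so every element receives a distinct rank. It is equivalent to rank(ties.method = "first"). Missing values are left as NA.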
Syntax: row_number(vec)
Arguments: vec- the vector of values that have to be ranked
R
library(dplyr)
# Character vector to rank
companies <- c("Geekster", "Geeksforgeeks", "Wipro", "TCS")
print(companies)
# Rank the values in ascending (alphabetical) order
rn <- row_number(companies)
print(rn)
Output:
"Geekster" "Geeksforgeeks" "Wipro" "TCS"
2 1 4 3
Explanation:
The row numbers of the input vector are computed after sorting the values in increasing order. The word "Geeksforgeeks", the second element, is the smallest lexicographically, so its row number is 1. It is followed by "Geekster" with row number 2. "TCS" gets row number 3 since it is next in order, and "Wipro" gets row number 4.
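Ties and missing values can be illustrated with a short sketch. The scores vector is hypothetical; the comparison uses base R's rank() with na.last = "keep" so that the NA stays in place.

```r
library(dplyr)

# Hypothetical data with a tie (10 appears twice) and a missing value
scores <- c(10, 30, 10, NA, 20)

# Ties are broken by position; NA stays NA
rn <- row_number(scores)
print(rn)

# Base R equivalent that also keeps the NA in place
base_rn <- rank(scores, ties.method = "first", na.last = "keep")
print(base_rn)
```

The two tied values receive ranks 1 and 2 in order of appearance, and the missing value is not ranked.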
Min_rank
The min_rank() function ranks values so that tied elements all receive the lowest rank of their group, and the ranks that follow are skipped. It is equivalent to rank(ties.method = "min").
Syntax: min_rank(vec)
Arguments: vec- the vector of values that have to be ranked
R
# "Geekster" appears twice, which creates a tie
companies <- c("Geekster", "Geeksforgeeks",
               "Geekster", "Wipro", "TCS")
ranks <- min_rank(companies)
print(ranks)
Output:
2 1 2 5 4
Here both occurrences of "Geekster" are tied, so both receive the rank 2. Rank 3 is skipped, and the next value, "TCS", gets rank 4.
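To make the tie handling concrete, the sketch below compares min_rank() with row_number() and with base R's rank(ties.method = "min") on the same vector. It is only an illustration of the behaviour described above.

```r
library(dplyr)

companies <- c("Geekster", "Geeksforgeeks", "Geekster", "Wipro", "TCS")

# Ties share the lowest rank of their group: 2 1 2 5 4
tie_min <- min_rank(companies)
print(tie_min)

# Ties are broken by position instead: 2 1 3 5 4
tie_first <- row_number(companies)
print(tie_first)

# Base R equivalent of min_rank()
base_min <- rank(companies, ties.method = "min")
print(base_min)
```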
Percent_rank
The percent_rank() function rescales the min_rank() values to the range 0 to 1. It is computed as (min_rank - 1) / (n - 1), where n is the number of non-missing values.
Syntax: percent_rank(vec)
Arguments: vec- the vector of values that have to be ranked
R
# Uses the companies vector from the min_rank example
pct_ranks <- percent_rank(companies)
print(pct_ranks)
Output:
0.25 0.00 0.25 1.00 0.75
The smallest value always gets 0 and the largest gets 1. The two "Geekster" entries are tied at (2 - 1) / (5 - 1) = 0.25.
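The relationship between percent_rank() and min_rank() can be checked by hand. This is a minimal sketch on a made-up numeric vector x with no missing values.

```r
library(dplyr)

# Hypothetical values with one tie (20 appears twice)
x <- c(20, 10, 20, 40, 30)

# Rescale the minimum ranks to the range 0..1 by hand
manual <- (min_rank(x) - 1) / (length(x) - 1)
print(manual)

# Built-in version
builtin <- percent_rank(x)
print(builtin)
```

Both calls print 0.25 0.00 0.25 1.00 0.75.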
Cume_dist
The cume_dist() function in R behaves like an empirical cumulative distribution function. For each element, it returns the proportion of all values that are less than or equal to that element.
Syntax: cume_dist(vec)
Arguments: vec- the vector of values that have to be ranked
R
# Uses the companies vector from the min_rank example
cum_dist <- cume_dist(companies)
print(cum_dist)
Output:
0.6 0.2 0.6 1.0 0.8
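The output can be reproduced by counting, for each element, the share of values that do not exceed it. The sketch below does this with sapply() on a hypothetical numeric vector x that has the same tie pattern as the companies vector.

```r
library(dplyr)

# Hypothetical values with one tie (20 appears twice)
x <- c(20, 10, 20, 40, 30)

# For each value, the proportion of values less than or equal to it
manual <- sapply(x, function(v) mean(x <= v))
print(manual)

# Built-in version
builtin <- cume_dist(x)
print(builtin)
```

For the tied value 20, three of the five values (10, 20, 20) are less than or equal to it, which gives 0.6.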
Lead
The lead() window function in R shifts a vector so that each position holds the next element in the sequence. By default it shifts by one position. The last element has no successor, so its lead value is NA.
Syntax: lead(vec)
Arguments: vec- the vector of values to be shifted
R
vec <- c(4, 3, 1, 2, 5)
print(vec)
# Each position receives the value that follows it
next_vals <- lead(vec)
print(next_vals)
Output:
4 3 1 2 5
3 1 2 5 NA
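lead() also accepts the optional arguments n (how many positions to shift) and default (the value used where no element exists). The following sketch shows both on the same vector.

```r
library(dplyr)

vec <- c(4, 3, 1, 2, 5)

# Look two positions ahead; the last two elements have no value
two_ahead <- lead(vec, n = 2)
print(two_ahead)

# Replace the trailing NA with 0
filled <- lead(vec, default = 0)
print(filled)
```

The first call prints 1 2 5 NA NA and the second prints 3 1 2 5 0.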
Lag
The lag() window function in R shifts a vector so that each position holds the previous element in the sequence. By default it shifts by one position. The first element has no element before it, so its lag value is NA. Note that dplyr's lag() masks stats::lag() once dplyr is loaded.
Syntax: lag(vec)
Arguments: vec- the vector of values to be shifted
R
# Uses vec from the lead example; each position receives the value before it
prev_vals <- lag(vec)
print(prev_vals)
Output:
NA 4 3 1 2
Explanation :
The first element has no predecessor, so its lag value is NA. For the second element, 3, the lag value is the first element, 4.
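A common use of lag() is computing the change from one row to the next inside mutate(). The sketch below assumes a hypothetical data frame named sales with month and units columns; those names are illustrative only.

```r
library(dplyr)

# Hypothetical monthly sales figures
sales <- data.frame(
  month = 1:5,
  units = c(4, 3, 1, 2, 5)
)

sales <- sales %>%
  arrange(month) %>%                # make sure rows are in time order
  mutate(
    previous = lag(units),          # units sold in the previous month
    change   = units - previous     # month-over-month difference
  )

print(sales)
```

The change column is NA for the first month, because there is no previous month to compare against.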
Cum Sum Method
The cumsum() function, which is part of base R, computes the running sum of the values up to and including each position. The cumsum value of the first element is the element itself.
Syntax: cumsum(vec)
Arguments: vec- the vector of values
R
vec <- 1:5
running_sum <- cumsum(vec)
print(running_sum)
Output:
1 3 6 10 15
Explanation :
The first element is 1, so its cumulative sum is 1. For the second element, 2, the cumulative sum is 1 + 2 = 3. For the third element, 3, cumsum = 1 + 2 + 3 = 6. The remaining cumulative sums are calculated the same way.
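Window functions are often applied per group in a data frame. As a sketch, assuming a hypothetical orders data frame with customer and amount columns, a running total per customer could be computed like this:

```r
library(dplyr)

# Hypothetical orders placed by two customers
orders <- data.frame(
  customer = c("a", "a", "b", "b", "b"),
  amount   = c(10, 5, 2, 8, 1)
)

running <- orders %>%
  group_by(customer) %>%                      # restart the sum for each customer
  mutate(running_total = cumsum(amount)) %>%
  ungroup()

print(running)
```

The running total restarts at the first row of customer "b" instead of continuing from customer "a".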
Cum Prod Method
The cumprod() method is used to compute the product of values encountered till that particular index. The cumprod value of the first element is equivalent to the value itself.
Syntax: cumprod(vec)
Arguments: vec- the vector of values
R
# Uses vec <- 1:5 from the cumsum example
running_prod <- cumprod(vec)
print(running_prod)
Output:
1 2 6 24 120
Explanation :
The cumulative product of the first element is the value itself, 1. For the second element, 2, the cumulative product is 1 * 2 = 2. For the third element, 3, cumprod = 1 * 2 * 3 = 6. The remaining cumulative products are calculated the same way.
Cum Min Method
The cummin() method is used to calculate the minimum value encountered until that particular index value.
Syntax: cummin(vec)
Arguments: vec- the vector of values
R
vec <- c(3, 2, 1, 5, 3)
cum_min <- cummin(vec)
print(cum_min)
Output:
3 2 1 1 1
Explanation :
The minimum up to the first element is the element itself, 3. At the second element, 2, the minimum becomes 2, and at the third element it becomes 1. The fourth and fifth elements are greater than 1, so the minimum stays at 1.
Cum Max Method
The cummax() method is used to calculate the maximum value encountered until that particular index value.
Syntax: cummax(vec)
Arguments: vec- the vector of values
R
# Uses vec <- c(3, 2, 1, 5, 3) from the cummin example
cum_max <- cummax(vec)
print(cum_max)
Output:
3 3 3 5 5
Cum Mean Method
The cummean() method is used to calculate the mean value encountered until that particular index value.
Syntax: cummean(vec)
Arguments: vec- the vector of values
R
# Uses vec <- c(3, 2, 1, 5, 3) from the cummin example
cum_mean <- cummean(vec)
print(cum_mean)
Output:
3.00 2.50 2.00 2.75 2.80
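The running mean is the running sum divided by the number of elements seen so far. This can be sketched with base R's cumsum() and seq_along() and compared against cummean():

```r
library(dplyr)

vec <- c(3, 2, 1, 5, 3)

# Running sum divided by the position gives the running mean
manual_mean <- cumsum(vec) / seq_along(vec)
print(manual_mean)

# Built-in version from dplyr
builtin_mean <- cummean(vec)
print(builtin_mean)
```

For example, the fourth value is (3 + 2 + 1 + 5) / 4 = 2.75.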
Cum Any Method
The cumany() function takes a logical vector and returns TRUE at every position from the first TRUE onward. In other words, each output value tells whether any element up to and including that position satisfies the condition.
Syntax: cumany(vec)
Arguments: vec- a logical vector, usually the result of a condition such as vec > 3
R
# Uses vec <- c(3, 2, 1, 5, 3) from the cummin example
cum_any_3 <- cumany(vec > 3)
print("Any vector values greater than 3")
print(cum_any_3)
cum_any_0 <- cumany(vec == 0)
print("Any vector values equal to 0")
print(cum_any_0)
Output:
"Any vector values greater than 3"
FALSE FALSE FALSE TRUE TRUE
"Any vector values equal to 0"
FALSE FALSE FALSE FALSE FALSE
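Because cumany() stays TRUE once the condition has been met, it is handy inside filter() for keeping all rows from the first match onward. The readings data frame and its column names below are hypothetical.

```r
library(dplyr)

# Hypothetical sequence of readings
readings <- data.frame(
  step  = 1:5,
  value = c(3, 2, 1, 5, 3)
)

# Keep every row from the first reading above 3 onward
from_first_spike <- readings %>%
  filter(cumany(value > 3))

print(from_first_spike)
```

Only steps 4 and 5 remain: step 4 is the first reading above 3, and step 5 is kept even though its own value is not above 3.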
Cum All Method
The cumall() function takes a logical vector and returns TRUE at every position until the first FALSE, and FALSE from there onward. In other words, each output value tells whether all elements up to and including that position satisfy the condition.
Syntax: cumall(vec)
Arguments: vec- a logical vector, usually the result of a condition such as vec > 3
R
# Uses vec <- c(3, 2, 1, 5, 3) from the cummin example
cum_all_3 <- cumall(vec > 3)
print(cum_all_3)
Output:
FALSE FALSE FALSE FALSE FALSE
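Every value in the output above is FALSE because the first element, 3, is not greater than 3, so the condition already fails at the first position. A sketch with a condition that holds at first shows the more typical pattern: used inside filter(), cumall() keeps rows only until the first failure. The readings data frame is hypothetical.

```r
library(dplyr)

# Hypothetical sequence of readings
readings <- data.frame(
  step  = 1:5,
  value = c(3, 2, 1, 5, 3)
)

# TRUE TRUE TRUE FALSE FALSE: the condition first fails at step 4
still_ok <- cumall(readings$value < 5)
print(still_ok)

# Keep rows only up to (not including) the first failure
before_first_spike <- readings %>%
  filter(cumall(value < 5))

print(before_first_spike)
```

Step 5 is dropped even though its value is below 5, because the condition has already failed at step 4.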