Count the frequency of a variable per column in R Dataframe
Last Updated: 30 May, 2021
A data frame may contain repeated or missing values, and each column may hold any number of repeated instances of the same value. Many statistical and analytical tasks depend on counting how often a particular value occurs within each column. In this article, we are going to see how to find the frequency of a value per column of a data frame in the R Programming Language.
Method 1: Using plyr package
The plyr package is used to split data apart, apply conditions and user-defined functions to the pieces, and combine the results, for example when creating, modifying, or deleting columns of a data frame. It can be installed using the following command, and is then loaded into the workspace with library():
install.packages("plyr")
The ldply() method of this package is used to apply a pre-defined function over each element of a list and then combine the results into a data frame. This method can be used to calculate the frequency of the variable belonging to integer, character, or factor type class.
Syntax: ldply(.data, .fun = NULL)
Arguments :
.data - The list (or data frame, which is a list of columns) to process
.fun - The function to be applied to each element
In this method, an anonymous function built around sum() is applied to each column of the data frame. Comparing a column with the specified value yields a logical vector, and sum() counts its TRUE entries, that is, the number of times the value occurs in that column. The output is a data frame whose first column (.id) holds the column names of the original data frame and whose second column (V1) holds the number of occurrences of the specified value in that column.
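As a minimal sketch of the counting step on its own, the snippet below uses a small hand-written vector `x` (an illustrative assumption, not data from this article) to show how a comparison followed by sum() produces a count, and why na.rm = TRUE matters when the column has missing values.

```r
# A small fixed column, used only for illustration
x <- c("a", "c", "a", "b", NA)

# The comparison yields a logical vector; a missing value stays NA
is_a <- x == "a"
print(is_a)

# sum() treats TRUE as 1 and FALSE as 0; na.rm = TRUE skips the NA
count_a <- sum(is_a, na.rm = TRUE)
print(count_a)
```

Without na.rm = TRUE, a single NA in the column makes the whole sum NA.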
Code:
R
library(plyr)
set.seed(1)
# creating a data frame
data_table <- data.frame(col1 = sample(letters[1:3], 8, replace = TRUE),
                         col2 = sample(letters[1:3], 8, replace = TRUE),
                         col3 = sample(letters[1:3], 8, replace = TRUE),
                         col4 = sample(letters[1:3], 8, replace = TRUE))
print("Original DataFrame")
print(data_table)
print("Count of value per column")
# count the number of "a" values in each column
ldply(data_table, function(col) sum(col == "a"))
Output:
[1] "Original DataFrame"
col1 col2 col3 col4
1 a b b a
2 c c b b
3 a c c a
4 b a a a
5 a a c b
6 c a a b
7 c b a b
8 b b a a
[1] "Count of value per column"
.id V1
1 col1 3
2 col2 3
3 col3 4
4 col4 4
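The same per-column count can also be obtained without plyr. The sketch below swaps in base R's colSums() on a logical matrix; it is an alternative technique, not the ldply() approach described above, and the data frame `df` is a small hand-written example assumed for illustration.

```r
# Small fixed data frame, assumed for illustration only
df <- data.frame(col1 = c("a", "b", "a"),
                 col2 = c("c", "a", "c"),
                 stringsAsFactors = FALSE)

# df == "a" gives a logical matrix; colSums() adds up the TRUE values per column
counts <- colSums(df == "a")
print(counts)
```

The result is a named numeric vector rather than a data frame, so it suits quick checks more than further data-frame processing.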
The method can also be used to count the occurrences of any of several values at once. The function tests whether each element of the column is contained in a vector of target values using the %in% operator, which returns TRUE or FALSE per element. The number of TRUE results within each column is then returned as the count.
val %in% vec
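A minimal sketch of the membership test on a single vector follows; `x` and `vec` are hand-written values assumed for illustration.

```r
# Illustrative column and target values (assumed, not from the article's data)
x <- c("a", "c", "a", "b", NA)
vec <- c("a", "b")

# %in% returns TRUE/FALSE for every element and never NA
matches <- x %in% vec
print(matches)

# Number of elements of x that equal any value in vec
count <- sum(matches)
print(count)
```

Because %in% maps a missing value to FALSE instead of NA, no na.rm argument is needed here.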
Code:
R
library(plyr)
set.seed(1)
# creating a data frame
data_table <- data.frame(col1 = sample(letters[1:3], 8, replace = TRUE),
                         col2 = sample(letters[1:3], 8, replace = TRUE),
                         col3 = sample(letters[1:3], 8, replace = TRUE),
                         col4 = sample(letters[1:3], 8, replace = TRUE))
print("Original DataFrame")
print(data_table)
# values whose combined frequency is counted
vec <- c("a", "b")
print("Count of value per column")
# count the entries of each column that match any value in vec
ldply(data_table, function(col) sum(col %in% vec))
Output:
[1] "Original DataFrame"
col1 col2 col3 col4
1 a b b a
2 c c b b
3 a c c a
4 b a a a
5 a a c b
6 c a a b
7 c b a b
8 b b a a
[1] "Count of value per column"
.id V1
1 col1 5
2 col2 6
3 col3 6
4 col4 8
Method 2: Using sapply() method
The sapply() method can be used to compute the frequency of every value within each column of the data frame. It applies a function over the elements of a vector or list (here, the columns of the data frame) and simplifies the results where possible.
sapply(df, FUN)
In this case, FUN is a user-defined function that relies on the set of levels present anywhere in the data frame, which is computed first. The unlist() method flattens the data frame into a single vector containing all of its cells. This is followed by unique(), which keeps only the distinct values contained in that vector.
unique(vec)
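As a small sketch of this step, the snippet below flattens a hand-written data frame `df` (assumed for illustration) and extracts its distinct values.

```r
# Small fixed data frame, assumed for illustration only
df <- data.frame(col1 = c("a", "b", "a"),
                 col2 = c("c", "a", "c"),
                 stringsAsFactors = FALSE)

# unlist() concatenates the columns into one named character vector
flat <- unlist(df)
print(flat)

# unique() keeps each distinct value once, in order of first appearance
lvls <- unique(flat)
print(lvls)
```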
Each column is then explicitly converted to a factor by the factor() method, with the vector returned by unique() supplied as its levels. Every column therefore shares the same set of levels, even if some of those values never occur in it.
factor(vec, levels = lvls)
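The effect of fixing the levels can be sketched as follows; `x` and `lvls` are hand-written values assumed for illustration.

```r
# Levels shared by all columns, and one column that lacks the value "c"
lvls <- c("a", "b", "c")
x <- c("a", "a", "b")

# The factor keeps "c" as a level even though it never occurs in x
f <- factor(x, levels = lvls)
print(levels(f))

# Tabulating the factor therefore reports a zero count for "c"
print(table(f))
```

This is what keeps the per-column counts aligned row by row when the results are combined.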
Finally, the table() method is applied. It takes one or more factors and builds a contingency table of the counts at each combination of factor levels. A contingency table is a tabulation of the counts for one or more variables. By default, missing values in the supplied factor are not counted. The output is an object of class table, which can be used for cross-tabulation and statistical analysis.
table(fac_vec, ...)
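A brief sketch of how table() treats missing values is shown below; the factor `f` is a hand-written example assumed for illustration, and useNA is a standard argument of table().

```r
# Factor with one missing value and one unused level
f <- factor(c("a", "b", "a", NA), levels = c("a", "b", "c"))

# Default behaviour: the NA is left out of the counts
counts <- table(f)
print(counts)

# useNA = "ifany" adds a separate <NA> entry when missing values exist
counts_na <- table(f, useNA = "ifany")
print(counts_na)
```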
Because every per-column table has the same length, sapply() simplifies the result to a matrix. Its row names are the unique values of the data frame and its column names are the column names of the original data frame, and each cell gives the number of occurrences of that row's value in the respective column.
Code:
R
set.seed(1)
# creating a data frame
data_table <- data.frame(col1 = sample(letters[1:3], 8, replace = TRUE),
                         col2 = sample(letters[1:3], 8, replace = TRUE),
                         col3 = sample(letters[1:3], 8, replace = TRUE),
                         col4 = sample(letters[1:3], 8, replace = TRUE))
print("Original DataFrame")
print(data_table)
# compute unique levels in data frame
lvls <- unique(unlist(data_table))
# tabulate the counts of every level per column
freq <- sapply(data_table,
               function(x) table(factor(x, levels = lvls,
                                        ordered = TRUE)))
print("Count of variables per column")
print(freq)
Output:
[1] "Original DataFrame"
col1 col2 col3 col4
1 a b b a
2 c c b b
3 a c c a
4 b a a a
5 a a c b
6 c a a b
7 c b a b
8 b b a a
[1] "Count of variables per column"
col1 col2 col3 col4
a 3 3 4 4
c 3 2 2 0
b 2 3 2 4
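Individual counts can be read from the result by row and column name. The sketch below rebuilds the frequency table for a small hand-written data frame `df` (assumed for illustration) and indexes into it.

```r
# Small fixed data frame, assumed for illustration only
df <- data.frame(col1 = c("a", "b", "a"),
                 col2 = c("c", "a", "c"),
                 stringsAsFactors = FALSE)

# Same pipeline as above: shared levels, then one table per column
lvls <- unique(unlist(df))
freq <- sapply(df, function(x) table(factor(x, levels = lvls)))

# The result is a matrix: values as rows, data frame columns as columns
print(is.matrix(freq))

# Count of "a" in col2, and all counts for col1
print(freq["a", "col2"])
print(freq[, "col1"])
```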