How to Find and Count Missing Values in R DataFrame
Last Updated :
21 Dec, 2023
In this article, we will be discussing how to find and count missing values in the R programming language.
Find and Count Missing Values in the R DataFrame
Missing values in data are generally represented by NA. In R, they can be detected with the is.na() function.
is.na() accepts the data as its argument and returns TRUE for each value that is missing and FALSE for each value that is not. To find the locations of the missing values, pass the result of is.na() to which().
To count the total number of missing values, pass the result of is.na() to sum().
Let's look into the syntax of methods that find the location and total count of missing values.
# finds the location of missing values
which(is.na(data))
# finds the count of missing values
sum(is.na(data))
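As a quick illustration of the syntax above, here is a minimal sketch on a plain numeric vector. The vector x and its values are made up for this example and are not part of the article's data.

```r
# hypothetical example vector with two missing values
x <- c(10, NA, 30, NA, 50)

# logical vector: TRUE where a value is missing
na_flags <- is.na(x)

# positions of the missing values
na_positions <- which(na_flags)

# number of missing values (TRUE counts as 1, FALSE as 0)
na_count <- sum(na_flags)

print(na_flags)
print(na_positions)
print(na_count)
```

sum() works here because R treats TRUE as 1 and FALSE as 0 when adding logical values.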
Find and Count the Missing Values From the Entire Data Frame
To find the locations of missing values and count them across an entire data frame, pass the data frame itself to is.na(). The program below does this for a small data frame.
R
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, NA),
wickets=c(17, 20, NA, 5))
# find location of missing values
print("Position of missing values ")
which(is.na(stats))
# count total missing values
print("Count of total missing values ")
sum(is.na(stats))
Output
[1] "Position of missing values "
[1] 8 11
[1] "Count of total missing values "
[1] 2
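This code creates a data frame, stats, that holds cricketers' data with a few missing values. which(is.na(stats)) returns the positions of the missing values and sum(is.na(stats)) returns their count. Because is.na() returns a logical matrix for a data frame, which() numbers the cells column by column: with 4 rows per column, position 8 is row 4 of the second column (runs) and position 11 is row 3 of the third column (wickets).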
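If row and column positions are easier to read than single cell numbers, which() also accepts arr.ind = TRUE. The sketch below reuses the same stats data frame; the helper names pos and missing_cells are introduced only for this illustration.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# arr.ind = TRUE returns a matrix with one row per missing cell
# and the columns "row" and "col"
pos <- which(is.na(stats), arr.ind = TRUE)
print(pos)

# translate the column indices into column names
missing_cells <- data.frame(row = pos[, "row"],
                            column = names(stats)[pos[, "col"]])
print(missing_cells)
```

Here the missing cells are row 4 of runs and row 3 of wickets, which matches positions 8 and 11 from the earlier output.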
Count the number of Missing Values with summary
R
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(NA, 200, 408, NA),
wickets=c(17, 20, NA, 8))
summary(stats)
Output:
    player               runs        wickets
 Length:4           Min.   :200   Min.   : 8.0
 Class :character   1st Qu.:252   1st Qu.:12.5
 Mode  :character   Median :304   Median :17.0
                    Mean   :304   Mean   :15.0
                    3rd Qu.:356   3rd Qu.:18.5
                    Max.   :408   Max.   :20.0
                    NA's   :2     NA's   :1
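At the bottom of each numeric column's summary, the NA's row shows the number of missing values present in that column. Note that summary() prints this row only for columns that contain missing values, and it does not print it for character columns such as player.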
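To see the limitation of summary() with character columns, consider the sketch below. It uses a modified version of the data frame in which the player column also has a missing value; this variant is an assumption made for illustration only.

```r
# hypothetical variant of the data frame: player also has a missing value
stats <- data.frame(player = c('A', NA, 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    stringsAsFactors = FALSE)

# for the character column, summary() prints only Length, Class and Mode,
# so the missing player is not reported there
print(summary(stats))

# counting with is.na() covers every column regardless of its type
na_counts <- colSums(is.na(stats))
print(na_counts)
```

For this reason, summary() is best treated as a quick overview, with is.na() used when an exact count for every column is needed.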
Count the number of Missing Values with colSums
R
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(NA, 200, 408, NA),
wickets=c(17, 20, NA, 8))
colSums(is.na(stats))
Output:
player runs wickets
0 2 1
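colSums() adds up the logical matrix returned by is.na() one column at a time, which gives the number of missing values per column. As a sketch of two closely related calls, colMeans() gives the share of missing values per column and rowSums() gives the number of missing values per row. The variable names na_share and na_per_row are introduced only for this example.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

# share of missing values in each column (between 0 and 1)
na_share <- colMeans(is.na(stats))
print(na_share)

# number of missing values in each row
na_per_row <- rowSums(is.na(stats))
print(na_per_row)
```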
Find and Count the Missing Values in One Column of a Data Frame
To find the locations of missing values and their count in one particular column of a data frame, pass dataframeName$columnName to is.na(). The program below finds and counts the missing values in specified columns of a data frame.
R
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(NA, 200, 408, NA),
wickets=c(17, 20, NA, 8))
print("Location of missing values in runs column")
which(is.na(stats$runs))
print("Count of missing values in wickets column")
sum(is.na(stats$wickets))
Output
[1] "Location of missing values in runs column"
[1] 1 4
[1] "Count of missing values in wickets column"
[1] 1
This code finds the locations of missing values in the runs column and counts the missing values in the wickets column. The output shows that the runs column has missing values at positions 1 and 4 (rows 1 and 4) and that the wickets column has 1 missing value.
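Since is.na() on a single column returns one logical value per row, the same result can be used to pull out the affected rows. The following is a minimal sketch with the same data frame; the name rows_missing_runs is made up for this example.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

# keep only the rows where runs is missing
rows_missing_runs <- stats[is.na(stats$runs), ]
print(rows_missing_runs)
```

This returns the rows for players A and D, the two rows reported by which(is.na(stats$runs)).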
Find and Count Missing Values in All Columns of a Data Frame
We can also find the missing values in the data frame column by column, which shows directly which column each missing value belongs to. The program below finds and counts the missing values column-wise.
R
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, NA),
wickets=c(17, 20, NA, 5))
# find location of missing values column wise
print("Position of missing values by column wise")
sapply(stats, function(x) which(is.na(x)))
# count the missing values by column wise
print("Count of missing values by column wise")
sapply(stats, function(x) sum(is.na(x)))
Output
[1] "Position of missing values by column wise"
$player
integer(0)
$runs
[1] 4
$wickets
[1] 3
[1] "Count of missing values by column wise"
player runs wickets
0 1 1
This code finds the positions and counts of missing values in every column of the data frame. To do this, it uses sapply() to apply which(is.na(x)) and sum(is.na(x)) to each column x. Because the columns have different numbers of missing values, the positions are returned as a list, while the counts are returned as a named vector.
From the output, we can see that:
- player column has no missing values.
- runs column has 1 missing value at 4th position.
- wickets column has 1 missing value at 3rd position.
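One thing to keep in mind with the column-wise approach is that sapply() simplifies its result when it can, so the positions come back as a list only when the columns have different numbers of missing values. As a sketch of an alternative with a predictable return type, lapply() always returns a list and vapply() checks that each count is a single integer. The names na_positions and na_counts are introduced only for this example.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# lapply() always returns a list, one element per column
na_positions <- lapply(stats, function(x) which(is.na(x)))
print(na_positions)

# vapply() requires every result to be a single integer
na_counts <- vapply(stats, function(x) sum(is.na(x)), integer(1))
print(na_counts)
```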