How to Find and Count Missing Values in R DataFrame
Last Updated: 21 Dec, 2023
In this article, we will be discussing how to find and count missing values in the R programming language.
Find and Count Missing Values in the R DataFrame
In R, missing values are generally represented by NA. The is.na() function detects them.
It accepts a data object (a vector, matrix, or data frame) and returns TRUE for every element that is missing and FALSE otherwise. To find the location of missing values, pass the result of is.na() to which().
To count the total number of missing values, pass the result of is.na() to sum().
Let's look at the syntax for finding the location and the total count of missing values.
# finds the location of missing values
which(is.na(data))
# finds the count of missing values
sum(is.na(data))
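As a minimal sketch of these calls on a plain vector (the vector x below is a made-up example, not data from this article), the three steps can be tried one at a time:

```r
# hypothetical example vector with two missing values
x <- c(5, NA, 7, NA, 9)

# logical vector: TRUE where a value is missing
na_flags <- is.na(x)

# positions of the missing values
na_positions <- which(na_flags)

# total number of missing values (each TRUE counts as 1)
na_count <- sum(na_flags)

# quick check for whether any value is missing at all
has_missing <- anyNA(x)

print(na_positions)
print(na_count)
print(has_missing)
```

sum() works here because R treats TRUE as 1 and FALSE as 0 when adding up a logical vector.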
Find and Count Missing Values in the Entire Data Frame
To find the location and count of missing values in the entire data frame, pass the data frame itself to is.na(). Let's look at a program that finds and counts the missing values in the entire data frame.
R
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))
# find location of missing values
print("Position of missing values ")
which(is.na(stats))
# count total missing values
print("Count of total missing values ")
sum(is.na(stats))
Output
[1] "Position of missing values "
[1]  8 11
[1] "Count of total missing values "
[1] 2
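This code creates a data frame, stats, that holds cricketers' data with a few missing values. which(is.na(stats)) returns the positions of the missing values and sum(is.na(stats)) returns their count. Because is.na() turns the data frame into a logical matrix with 4 rows and 3 columns, which() numbers the cells column by column: position 8 is row 4 of the second column (runs) and position 11 is row 3 of the third column (wickets).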
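The positions returned by which() on a data frame are single indices into the logical matrix, counted down the columns. If row and column numbers are easier to read, one option is the arr.ind argument of which(). The sketch below reuses the same stats data frame; the variable names na_positions and na_columns are only illustrative.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# arr.ind = TRUE returns one (row, col) pair per missing value
na_positions <- which(is.na(stats), arr.ind = TRUE)
print(na_positions)

# translate the column numbers into column names
na_columns <- names(stats)[na_positions[, "col"]]
print(na_columns)
```

Each row of na_positions describes one missing cell, so here the result has two rows: one for runs and one for wickets.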
Count the number of Missing Values with summary
R
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))
# summarise every column; numeric columns also report NA's
summary(stats)
Output:
    player               runs        wickets
 Length:4           Min.   :200   Min.   : 8.0
 Class :character   1st Qu.:252   1st Qu.:12.5
 Mode  :character   Median :304   Median :17.0
                    Mean   :304   Mean   :15.0
                    3rd Qu.:356   3rd Qu.:18.5
                    Max.   :408   Max.   :20.0
                    NA's   :2     NA's   :1
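For each numeric column, the last row of the summary (NA's) shows the number of missing values present in that column. The player column is a character column, so its summary lists only Length, Class and Mode.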
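One thing to keep in mind is that summary() on a character column reports only Length, Class and Mode, so missing values in such a column do not show up as an NA's row. The sketch below uses a hypothetical variant of the data frame, stats2, with a missing player name, and counts with colSums(is.na()) to catch it.

```r
# hypothetical variant of the data frame with a missing player name
stats2 <- data.frame(player = c('A', NA, 'C', 'D'),
                     runs = c(NA, 200, 408, NA))

# summary() shows an NA's row for the numeric column only
print(summary(stats2))

# is.na() flags missing values in every column, including character ones
na_per_column <- colSums(is.na(stats2))
print(na_per_column)
```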
Count the number of Missing Values with colSums
R
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))
# count missing values in each column
colSums(is.na(stats))
Output:
 player    runs wickets
      0       2       1
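colSums() adds up each column of the logical matrix returned by is.na(), which gives one count per column. The same matrix can be summarised in other directions too. The sketch below, with illustrative variable names, uses colMeans() for the share of missing values per column and rowSums() for the count per row.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

# logical matrix: TRUE marks a missing cell
na_matrix <- is.na(stats)

# proportion of missing values in each column
na_share <- colMeans(na_matrix)
print(na_share)

# number of missing values in each row
na_per_row <- rowSums(na_matrix)
print(na_per_row)

# rows that contain at least one missing value
incomplete_rows <- which(na_per_row > 0)
print(incomplete_rows)
```

For this data frame, half of the runs values and a quarter of the wickets values are missing, and rows 1, 3 and 4 each contain one missing value.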
Find and Count Missing Values in One Column of a Data Frame
To find the location and count of missing values in one particular column of a data frame, pass dataframeName$columnName to is.na(). Let's look at a program that finds and counts the missing values in a specified column of a data frame.
R
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))
# find location of missing values in the runs column
print("Location of missing values in runs column")
which(is.na(stats$runs))
# count missing values in the wickets column
print("Count of missing values in wickets column")
sum(is.na(stats$wickets))
Output
[1] "Location of missing values in runs column"
[1] 1 4
[1] "Count of missing values in wickets column"
[1] 1
This code finds the location of missing values in the runs column and counts the missing values in the wickets column. The first result shows that the runs column has missing values at positions 1 and 4 (rows 1 and 4).
The second result shows that the wickets column has 1 missing value.
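Since is.na() on a single column returns a logical vector with one entry per row, it can also be used to pull out the affected rows. The following is a small sketch on the same data frame; the names missing_runs, rows_missing_runs and rows_with_runs are only illustrative.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

# TRUE for rows where runs is missing
missing_runs <- is.na(stats$runs)

# rows whose runs value is missing
rows_missing_runs <- stats[missing_runs, ]
print(rows_missing_runs)

# rows whose runs value is present
rows_with_runs <- stats[!missing_runs, ]
print(rows_with_runs)
```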
Find and Count Missing Values in All Columns of a Data Frame
We can also find the missing values in the data frame column by column. This shows directly which column each missing value belongs to, instead of giving a single position within the whole data frame. Let's look at a sample program that finds and counts the missing values column-wise.
R
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))
# find location of missing values column wise
print("Position of missing values by column wise")
sapply(stats, function(x) which(is.na(x)))
# count the missing values by column wise
print("Count of missing values by column wise")
sapply(stats, function(x) sum(is.na(x)))
Output
[1] "Position of missing values by column wise"
$player
integer(0)
$runs
[1] 4
$wickets
[1] 3
[1] "Count of missing values by column wise"
 player    runs wickets
      0       1       1
This code finds the position and count of missing values in every column of the data frame. sapply() applies a function to each column: which(is.na(x)) gives the positions of the missing values in that column, and sum(is.na(x)) gives their count.
From the output, we can say that:
- player column has no missing values.
- runs column has 1 missing value at 4th position.
- wickets column has 1 missing value at 3rd position.
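Note that sapply() returns a list for the positions here only because the columns have different numbers of missing values. When every column yields a result of the same length, sapply() simplifies it to a vector or matrix. If a predictable return type matters, one possible alternative is lapply() for the positions and vapply() for the counts, sketched below with illustrative variable names.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# lapply() always returns a list: one vector of positions per column
na_positions <- lapply(stats, function(x) which(is.na(x)))
print(na_positions)

# vapply() checks that each result is a single integer
na_counts <- vapply(stats, function(x) sum(is.na(x)), integer(1))
print(na_counts)
```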