Visualizing Missing Data with Barplot in R
Last Updated: 07 Mar, 2022
In this article, we will discuss how to visualize missing data with a barplot in the R programming language.
Missing data are data points that were not recorded, i.e., never entered in the dataset. In R, missing values are represented as NA; NaN (not a number) is also treated as missing by is.na(), and in raw files a missing value often appears as an empty cell.
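As a quick check of how R treats these three representations, the following minimal sketch applies is.na() to NA, NaN and an empty string. The vector names are illustrative and are not part of the article's dataset. Note that an empty string is an ordinary value until it is explicitly recoded to NA.

```r
# How is.na() treats the common representations of missing data
numericValues <- c(1.5, NA, NaN)
numericFlags <- is.na(numericValues)   # NaN is flagged as missing too
numericFlags

textValues <- c("rob", "", NA)
is.na(textValues)                      # the empty string is NOT flagged

# Recode empty strings to NA so that they are counted as missing
textValues[which(textValues == "")] <- NA
missingAfterRecode <- sum(is.na(textValues))
missingAfterRecode
```

When reading a file, the same recoding can usually be done at import time through the na.strings argument of read.csv().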
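Dataset in use: a small dataframe with three columns (age, name and grade), created at the start of each example below.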
In a large dataset, a few missing values may not affect the overall information, whereas in a small dataset they can mean a substantial loss of information. Missing values are either removed or imputed, depending on the dataset. To decide how to deal with them, we will first see how to visualize the missing data points.
Let us first count the total number of missing values.
Example: Counting missing values
R
# Creating a sample dataframe using 3 vectors
age = c(12,34,NA,7,15,NA)
name = c('rob',NA,"arya","jon",NA,NA)
grade = c("A","A","D","B","C","B")
df <- data.frame(age,name,grade)
# count the total number of missing values
sum(is.na(df))
Output:
[1] 5
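A raw count is easier to interpret relative to the size of the dataset. The sketch below, using the same sample dataframe, expresses the count as a share of all cells; the variable names totalCells and missingShare are illustrative.

```r
# Same sample dataframe as above
age <- c(12, 34, NA, 7, 15, NA)
name <- c("rob", NA, "arya", "jon", NA, NA)
grade <- c("A", "A", "D", "B", "C", "B")
df <- data.frame(age, name, grade)

# total number of cells: 6 rows x 3 columns
totalCells <- nrow(df) * ncol(df)

# is.na(df) is a logical matrix, so its mean is the share of TRUE cells
missingShare <- mean(is.na(df))

# share of missing cells as a percentage
round(100 * missingShare, 1)
```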
We can also find out how many missing values are there in each attribute/column.
Example: Count missing values in each attribute/column
R
# Creating a sample dataframe using 3 vectors
age = c(12,34,NA,7,15,NA)
name = c('rob',NA,"arya","jon",NA,NA)
grade = c("A","A","D","B","C","B")
df <- data.frame(age,name,grade)
# count number of missing values in each
# attribute/column
sapply(df, function(x) sum(is.na(x)))
Output:
age name grade
2 3 0
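An equivalent way to get the per-column counts, shown here as a sketch, is colSums() on the logical matrix returned by is.na(). colMeans() gives the share of missing values per column in the same way. The names missingPerColumn and percentPerColumn are illustrative.

```r
# Same sample dataframe as above
age <- c(12, 34, NA, 7, 15, NA)
name <- c("rob", NA, "arya", "jon", NA, NA)
grade <- c("A", "A", "D", "B", "C", "B")
df <- data.frame(age, name, grade)

# number of missing values per column
missingPerColumn <- colSums(is.na(df))
missingPerColumn

# percentage of missing values per column
percentPerColumn <- round(100 * colMeans(is.na(df)), 1)
percentPerColumn
```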
Visualizing all missing values
Let's first visualize the frequencies of missing and non-missing values for the entire dataset using the barplot() function in R.
Syntax of barplot() (commonly used arguments):
barplot(height, names.arg = NULL, col = NULL, main = NULL, xlab = NULL, ylab = NULL, beside = FALSE, horiz = FALSE, ...)
Parameters:
- height : vector or matrix of values that give the bar heights
- names.arg : label for each bar (or group of bars)
- col : color(s) for the bars
- main : title of the barplot
- xlab : label for x-axis
- ylab : label for y-axis
- beside : FALSE (default) draws stacked bars for a matrix, TRUE draws grouped bars
- horiz : FALSE (default) draws vertical bars, TRUE draws horizontal bars
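The following minimal sketch shows these arguments on a small named vector. The counts 13 and 5 are the non-missing and missing cell counts of the sample dataframe used in this article; the names counts and barMidpoints are illustrative. barplot() also returns the midpoints of the bars it draws, which is convenient for placing value labels.

```r
# Named vector of bar heights
counts <- c("Non-missing" = 13, "Missing" = 5)

# barplot() invisibly returns the midpoints of the bars it drew
barMidpoints <- barplot(counts,
                        names.arg = names(counts),
                        col = c("#80dfff", "lightgreen"),
                        main = "Demo of barplot() arguments",
                        xlab = "Category", ylab = "Frequency",
                        ylim = c(0, 15))

# write each count just above its bar
text(x = barMidpoints, y = counts, labels = counts, pos = 3)
```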
Example: Visualizing all missing values
R
# Creating a sample dataframe using 3 vectors
age = c(12,34,NA,7,15,NA)
name = c('rob',NA,"arya","jon",NA,NA)
grade = c("A","A","D","B","C","B")
df <- data.frame(age,name,grade)
# converting a frequency table for missing
# values to dataframe
freqDf <- data.frame(table(is.na(df)))
# barplot for visualization
barplot(freqDf$Freq , main = "Total Missing values",
xlab = "Missing Data", ylab = "Frequency",
names.arg = c("FALSE","TRUE"),
col = c("#80dfff","lightgreen"))
# legend for barplot
legend("topright",
c("Non-Missing Values","Missing Values"),
fill = c("#80dfff","lightgreen"))
Output: (barplot with two bars: 13 non-missing values and 5 missing values)
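The example above hard-codes the bar labels as "FALSE" and "TRUE", which assumes that both values occur in the data. One possible way to make this more robust, sketched below, is to convert the flags to a factor with fixed levels, so that a category with zero occurrences still gets a bar. The names missingFlags, freqTable and completeDf are illustrative.

```r
# Same sample dataframe as above
age <- c(12, 34, NA, 7, 15, NA)
name <- c("rob", NA, "arya", "jon", NA, NA)
grade <- c("A", "A", "D", "B", "C", "B")
df <- data.frame(age, name, grade)

# fixed levels keep both categories even if one of them never occurs
missingFlags <- factor(is.na(df),
                       levels = c(FALSE, TRUE),
                       labels = c("Non-missing", "Missing"))
freqTable <- table(missingFlags)

# a one-dimensional table can be passed to barplot() directly
barplot(freqTable, main = "Total Missing values",
        xlab = "Missing Data", ylab = "Frequency",
        col = c("#80dfff", "lightgreen"))

# a dataframe without any NA still yields two categories
completeDf <- data.frame(x = 1:3)
completeTable <- table(factor(is.na(completeDf), levels = c(FALSE, TRUE)))
completeTable
```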

Visualizing missing data for one column
For this, we select the column we want to inspect and build the frequency table of missing and non-missing values from that column alone.
Example: Visualizing missing data for one column
R
# Creating a sample dataframe using 3 vectors
age = c(12,34,NA,7,15,NA)
name = c('rob',NA,"arya","jon",NA,NA)
grade = c("A","A","D","B","C","B")
df <- data.frame(age,name,grade)
# frequency table for missing data for 1 column,
# here age column is taken
freqDf2 <- data.frame(table(is.na(df$age)))
# barplot for 1 column/feature
barplot(freqDf2$Freq,
main = "Missing values in age column",xlab = "Missing Data",
ylab = "Frequency",names.arg = c("FALSE","TRUE"),
col = c("#ffb3b3","#99e6ff"))
# legend for barplot
legend("topright",
c("Non-Missing Values","Missing Values"),
fill = c("#ffb3b3","#99e6ff"))
Output: (barplot for the age column with two bars: 4 non-missing values and 2 missing values)
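To repeat this for other columns without copying the code, the counting step can be wrapped in a small helper. The function countMissing() below is a hypothetical helper written for this sketch, not a base R function.

```r
# Same sample dataframe as above
age <- c(12, 34, NA, 7, 15, NA)
name <- c("rob", NA, "arya", "jon", NA, NA)
grade <- c("A", "A", "D", "B", "C", "B")
df <- data.frame(age, name, grade)

# hypothetical helper: named counts of non-missing and missing values
countMissing <- function(x) {
  c("Non-missing" = sum(!is.na(x)), "Missing" = sum(is.na(x)))
}

# counts for the name column
nameCounts <- countMissing(df$name)
nameCounts

# barplot for the name column; bar labels come from the vector names
barplot(nameCounts, main = "Missing values in name column",
        xlab = "Missing Data", ylab = "Frequency",
        col = c("#ffb3b3", "#99e6ff"))
```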

Visualizing missing data for all columns
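Let's create a function that summarizes the dataframe as a matrix of counts, with one column per feature and two rows: TRUE for the number of missing values and FALSE for the number of non-missing values. We will then visualize this matrix using a barplot in R.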
Example: Visualizing missing data for all columns
R
age = c(12,34,NA,7,15,NA)
name = c('rob',NA,"arya","jon",NA,NA)
grade = c("A","A","D","B","C","B")
df <- data.frame(age,name,grade)
# function to build a matrix of missing (TRUE) and
# non-missing (FALSE) counts for every column
toBinaryMatrix <- function(df){
  m <- c()
  for(i in colnames(df)){
    # missing value count
    x <- sum(is.na(df[,i]))
    m <- append(m, x)
    # non-missing value count
    m <- append(m, nrow(df) - x)
  }
  # reshaping into two rows and adding row and column names
  a <- matrix(m, nrow = 2)
  rownames(a) <- c("TRUE","FALSE")
  colnames(a) <- colnames(df)
  return(a)
}
# function call
binMat = toBinaryMatrix(df)
binMat
Output:
      age name grade
TRUE    2    3     0
FALSE   4    3     6
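The same matrix can also be built without an explicit loop. The sketch below is an alternative to the function above, not a replacement for it; it relies on colSums() and uses the illustrative names missingCounts and binMatVectorized.

```r
# Same sample dataframe as above
age <- c(12, 34, NA, 7, 15, NA)
name <- c("rob", NA, "arya", "jon", NA, NA)
grade <- c("A", "A", "D", "B", "C", "B")
df <- data.frame(age, name, grade)

# missing counts per column, computed in one step
missingCounts <- colSums(is.na(df))

# first row: missing counts, second row: non-missing counts
binMatVectorized <- rbind(missingCounts, nrow(df) - missingCounts)
rownames(binMatVectorized) <- c("TRUE", "FALSE")
binMatVectorized
```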
Stacked barplot
The missing values can be represented in contrast with the values present using a stacked barplot.
Example: Stacked barplot
R
age = c(12,34,NA,7,15,NA)
name = c('rob',NA,"arya","jon",NA,NA)
grade = c("A","A","D","B","C","B")
df <- data.frame(age,name,grade)
# stacked barplot for missing data in all columns
# (binMat is the matrix returned by toBinaryMatrix(df) above)
barplot(binMat,
main = "Missing values in all features",
xlab = "Features", ylab = "Frequency",
col = c("#4dffd2","#ff9999"))
# legend for barplot
legend("bottomright",
c("Missing values","Non-Missing values"),
fill = c("#4dffd2","#ff9999"))
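Output: (stacked barplot with one bar per column; each bar splits the 6 rows into missing and non-missing counts)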
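When columns are compared by their share of missing values rather than by raw counts, the counts can be converted to proportions before plotting. The sketch below does this with prop.table(); it rebuilds the count matrix with colSums() so that the block runs on its own, and propMat is an illustrative name.

```r
# Same sample dataframe as above
age <- c(12, 34, NA, 7, 15, NA)
name <- c("rob", NA, "arya", "jon", NA, NA)
grade <- c("A", "A", "D", "B", "C", "B")
df <- data.frame(age, name, grade)

# count matrix: TRUE = missing, FALSE = non-missing
missingCounts <- colSums(is.na(df))
binMat <- rbind(missingCounts, nrow(df) - missingCounts)
rownames(binMat) <- c("TRUE", "FALSE")

# margin = 2 scales every column so that it sums to 1
propMat <- prop.table(binMat, margin = 2)

# stacked barplot of proportions
barplot(propMat,
        main = "Share of missing values in all features",
        xlab = "Features", ylab = "Proportion",
        col = c("#4dffd2", "#ff9999"))
legend("bottomright",
       c("Missing values", "Non-Missing values"),
       fill = c("#4dffd2", "#ff9999"))
```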

Grouped barplot
Another visualization that can be helpful is a grouped barplot.
Example: Grouped barplot
R
age = c(12,34,NA,7,15,NA)
name = c('rob',NA,"arya","jon",NA,NA)
grade = c("A","A","D","B","C","B")
df <- data.frame(age,name,grade)
# grouped barplot for missing data in all columns
barplot(binMat,
main = "Missing values in all features",xlab = "Frequency",
col = c("#ffff99","#33bbff"),beside=TRUE,
horiz = TRUE)
# legend for barplot
legend("right",c("Missing values","Non-Missing values"),
fill = c("#ffff99","#33bbff"))
Output: (horizontal grouped barplot with a pair of bars, missing and non-missing, for each column)
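With many columns, a grouped barplot is easier to read when the columns are ordered by their number of missing values. The sketch below shows one way to do this; it rebuilds the count matrix with colSums() so that the block runs on its own, and the names ordering and sortedMat are illustrative.

```r
# Same sample dataframe as above
age <- c(12, 34, NA, 7, 15, NA)
name <- c("rob", NA, "arya", "jon", NA, NA)
grade <- c("A", "A", "D", "B", "C", "B")
df <- data.frame(age, name, grade)

# count matrix: TRUE = missing, FALSE = non-missing
missingCounts <- colSums(is.na(df))
binMat <- rbind(missingCounts, nrow(df) - missingCounts)
rownames(binMat) <- c("TRUE", "FALSE")

# column positions ordered from most to fewest missing values
ordering <- order(binMat["TRUE", ], decreasing = TRUE)
sortedMat <- binMat[, ordering]

# vertical grouped barplot of the reordered columns
barplot(sortedMat,
        main = "Missing values in all features (sorted)",
        xlab = "Features", ylab = "Frequency",
        col = c("#ffff99", "#33bbff"), beside = TRUE)
legend("topleft", c("Missing values", "Non-Missing values"),
       fill = c("#ffff99", "#33bbff"))
```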