Remove rows with missing values using R
Last Updated: 15 Mar, 2024
In this article, we will explore various methods to remove rows containing missing values (NA) in the R Programming Language.
What are missing values?
Missing values are data points that are absent for a specific variable in a dataset. They can be represented in various ways, such as blank spaces, null values, or special symbols like "NA". Missing values can occur for several reasons, such as data entry errors or equipment malfunction. Dealing with missing data is a crucial step in data analysis. Two base R functions for removing rows with missing values are:
- na.omit()
- complete.cases()
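Before removing anything, it can help to see where the missing values are. The following is a minimal sketch using the base R functions is.na(), colSums(), and rowSums(). The data frame scores and its columns are hypothetical and are not part of the examples in this article.

```r
# Hypothetical example data (an assumption for illustration only)
scores <- data.frame(
  id   = 1:5,
  math = c(90, NA, 75, 60, NA),
  art  = c(NA, 80, 85, 70, 95)
)

# is.na() returns a logical matrix that is TRUE wherever a value is missing
na_mask <- is.na(scores)

# Number of NA values in each column
na_per_column <- colSums(na_mask)
print(na_per_column)   # id = 0, math = 2, art = 1

# Number of rows that contain at least one NA
rows_with_na <- sum(rowSums(na_mask) > 0)
print(rows_with_na)    # 3
```

Counting first shows how many rows a row-wise removal would discard, which may matter when the dataset is small.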
Remove rows with missing values using na.omit()
The na.omit() function removes NA values from a dataset row-wise. It checks each row and removes any row that contains one or more NA values. For example, applying na.omit() to a data frame returns a new data frame that contains only the rows with no NA in any column.
Syntax:
na.omit(dataframe)
Here, we create a data frame. Calling na.omit() on it removes every row that contains an NA value.
R
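df1 <- data.frame(
  A1 = c(NA, 10, NA, 7, 8, 11, 20),
  # mixing strings and numbers makes A2 a character column
  A2 = c("A", 9, 3, "B", "C", "D", "E"),
  A3 = c(1, 0, NA, 1, 1, NA, 3)
)
# printing the dataframe
print(df1)
print("After removing the NA values ")
# keep only the rows that have no NA in any column
result <- na.omit(df1)
print(result)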
Output:
A1 A2 A3
1 NA A 1
2 10 9 0
3 NA 3 NA
4 7 B 1
5 8 C 1
6 11 D NA
7 20 E 3
[1] "After removing the NA values "
A1 A2 A3
2 10 9 0
4 7 B 1
5 8 C 1
7 20 E 3
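Note that the result above keeps the original row numbers (2, 4, 5, 7). As a small sketch of what can be done with that, na.omit() records the rows it dropped in an attribute named "na.action", and the row names can be reset if consecutive numbering is preferred. The variable name dropped is chosen here for illustration.

```r
df1 <- data.frame(
  A1 = c(NA, 10, NA, 7, 8, 11, 20),
  A2 = c("A", 9, 3, "B", "C", "D", "E"),
  A3 = c(1, 0, NA, 1, 1, NA, 3)
)
result <- na.omit(df1)

# Indices of the rows that na.omit() removed, read from the "na.action" attribute
dropped <- as.integer(attr(result, "na.action"))
print(dropped)   # 1 3 6

# Optionally renumber the remaining rows starting from 1
rownames(result) <- NULL
print(result)
```

Resetting the row names is optional. Keeping the original numbers can make it easier to trace rows back to the source data.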
Here, we create a data frame from vectors. Calling na.omit() on it removes every row that contains an NA value.
R
vec1 <- c(1, 2, NA, 4, 8, 4)
vec2 <- c(6, 7, 8, 9, 2, 9)
vec3 <- c(34, NA, 67, 78, 23, 12)
df1 <- data.frame(vec1, vec2, vec3)
# printing the dataframe
print(df1)
print("After removing NA values: ")
df2 <- na.omit(df1)
print(df2)
Output:
vec1 vec2 vec3
1 1 6 34
2 2 7 NA
3 NA 8 67
4 4 9 78
5 8 2 23
6 4 9 12
[1] "After removing NA values: "
vec1 vec2 vec3
1 1 6 34
4 4 9 78
5 8 2 23
6 4 9 12
Remove rows with missing values using complete.cases()
The complete.cases() function returns a logical vector that indicates which rows of a data frame or matrix (or which elements of a vector) contain no missing values. Using that vector as a row index keeps only the complete rows. Because the filtering step is explicit, this approach is useful when you want control over how rows are selected.
For example, if you have a dataset and want to remove the rows that have missing values, you can index it with complete.cases().
Syntax:
complete.cases(dataframe)
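As a minimal sketch of how this works, complete.cases() does not remove anything by itself: it returns one TRUE or FALSE per row, and that vector is then used to subset the data. The variable names keep, keep_a1, and result_a1 are chosen here for illustration. The second half shows one possible variation, checking only column A1.

```r
df1 <- data.frame(
  A1 = c(NA, 10, NA, 7, 8, 11, 20),
  A2 = c("A", 9, 3, "B", "C", "D", "E"),
  A3 = c(1, 0, NA, 1, 1, NA, 3)
)

# One logical value per row: TRUE means the row has no NA
keep <- complete.cases(df1)
print(keep)   # FALSE TRUE FALSE TRUE TRUE FALSE TRUE

# Check only column A1, so a row with NA only in A3 is kept
keep_a1 <- complete.cases(df1[, "A1", drop = FALSE])
result_a1 <- df1[keep_a1, ]
print(result_a1)
```

Restricting the check to selected columns is a design choice: it retains more rows, but the result may still contain NA values in the columns that were not checked.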
R
df1 <- data.frame(
  A1 = c(NA, 10, NA, 7, 8, 11, 20),
  A2 = c("A", 9, 3, "B", "C", "D", "E"),
  A3 = c(1, 0, NA, 1, 1, NA, 3)
)
# printing the dataframe
print(df1)
print("After removing the NA values ")
result <- df1[complete.cases(df1), ]
print(result)
Output:
A1 A2 A3
1 NA A 1
2 10 9 0
3 NA 3 NA
4 7 B 1
5 8 C 1
6 11 D NA
7 20 E 3
[1] "After removing the NA values "
A1 A2 A3
2 10 9 0
4 7 B 1
5 8 C 1
7 20 E 3
Here, we create a data frame from vectors. Indexing it with complete.cases() then removes every row that contains an NA value.
R
vec1 <- c(1, 2, NA, 4, 8, 4)
vec2 <- c(6, 7, 8, 9, 2, 9)
vec3 <- c(34, NA, 67, 78, 23, 12)
# build the data frame from the three vectors
df1 <- data.frame(vec1, vec2, vec3)
# printing the dataframe
print(df1)
print("After removing the NA values ")
result <- df1[complete.cases(df1), ]
print(result)
Output:
vec1 vec2 vec3
1 1 6 34
2 2 7 NA
3 NA 8 67
4 4 9 78
5 8 2 23
6 4 9 12
[1] "After removing the NA values "
vec1 vec2 vec3
1 1 6 34
4 4 9 78
5 8 2 23
6 4 9 12
Conclusion
In conclusion, we learned two methods for removing rows with missing values: na.omit() and complete.cases(). R offers versatile tools for efficient data manipulation and analysis.