How to Remove Duplicate Rows in R DataFrame?

Last Updated : 15 Feb, 2022

In this article, we will discuss how to remove duplicate rows from a data frame in the R programming language.

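Dataset in use: a data frame with three columns (names, id, subjects) and six rows, in which the last two rows repeat the first two.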

Method 1: Using distinct()

This function is available in the dplyr package and returns the unique rows of a data frame. It can remove rows that are duplicated across all columns, or return the distinct values of one or more specified columns.

Syntax:

distinct(dataframe)

distinct(dataframe, column1, column2, ..., columnN)

Example: R program to remove duplicate rows using distinct() function

# load the package
library(dplyr)

# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2), 
                subjects=c("java","python","php",
                           "html","java","python"))


# remove rows that are duplicated across all columns
print(distinct(data))

# distinct values of the subjects column (only that column is returned)
print(distinct(data,subjects))

# distinct values of the names column (only that column is returned)
print(distinct(data,names))

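Output:

   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
  subjects
1     java
2   python
3      php
4     html
   names
1  manoj
2  bobby
3 sravan
4  deepu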
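When distinct() is given a column, it returns only that column. The following is a minimal sketch, assuming the same example data frame, of how dplyr's .keep_all argument can retain the other columns while deduplicating on one column; the first row of each group of duplicates is kept. The variable names subjects_only and full_rows are illustrative only.

```r
library(dplyr)

# same example data as in the article
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# deduplicating on one column returns only that column
subjects_only <- distinct(data, subjects)

# .keep_all = TRUE keeps every column, taking the first row
# found for each distinct value of subjects
full_rows <- distinct(data, subjects, .keep_all = TRUE)

print(ncol(subjects_only))
print(full_rows)
```

This form is usually what is wanted when the goal is to drop whole rows rather than to list the distinct values of a column.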

Method 2: Using duplicated()

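The duplicated() function returns a logical vector that is TRUE for every element (or row) that repeats an earlier one. To keep only the first occurrence of each value, negate the result with the ! operator and use it to index the rows of the dataframe.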

Syntax:

data[!duplicated(data$column_name), ]

where,

data is the input dataframe
column_name is the column in which duplicates are identified
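To make the role of the ! operator concrete, here is a small sketch on a plain character vector (the vector subjects and the name is_dup are illustrative, not part of the original example). It shows the logical mask that duplicated() produces and how negating it selects the first occurrences.

```r
subjects <- c("java", "python", "php", "html", "java", "python")

# TRUE marks a value that has already appeared earlier in the vector
is_dup <- duplicated(subjects)
print(is_dup)

# negating the mask selects the first occurrence of each value
print(subjects[!is_dup])
```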

Example: R program to remove duplicate rows using duplicated() function

# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2), 
                subjects=c("java","python","php",
                           "html","java","python"))


# keep the first row for each distinct value in the subjects column
print(data[!duplicated(data$subjects), ])

# keep the first row for each distinct value in the names column
print(data[!duplicated(data$names), ])

# keep the first row for each distinct value in the id column
print(data[!duplicated(data$id), ])

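Output:

   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html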
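The example above checks one column at a time. As a hedged sketch on the same data, duplicated() can also be applied to the whole data frame or to a subset of columns, and its fromLast argument keeps the last occurrence instead of the first. The names whole_rows, by_name_id and keep_last are illustrative only.

```r
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# compare entire rows by passing the data frame itself
whole_rows <- data[!duplicated(data), ]

# compare only the names and id columns
by_name_id <- data[!duplicated(data[, c("names", "id")]), ]

# fromLast = TRUE scans from the end, so the last occurrence is kept
keep_last <- data[!duplicated(data$names, fromLast = TRUE), ]

print(whole_rows)
print(by_name_id)
print(keep_last)
```

With fromLast = TRUE the surviving rows keep their original row names, so the result here consists of original rows 3 to 6 rather than rows 1 to 4.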

Method 3: Using unique()

The unique() function returns the unique rows of a dataframe.

Syntax:

unique(dataframe)

To get the unique values of a particular column:

Syntax:

unique(dataframe$column_name)

Example: R program to remove duplicate rows using unique() function

# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2), 
                subjects=c("java","python","php",
                           "html","java","python"))


# unique values of the subjects column (returned as a vector)
print(unique(data$subjects))

# unique values of the names column (returned as a vector)
print(unique(data$names))

# unique values of the id column (returned as a vector)
print(unique(data$id))

Output:

[1] "java"   "python" "php"    "html"  
[1] "manoj"  "bobby"  "sravan" "deepu"  
[1] 1 2 3 4
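The example above applies unique() to single columns, which yields vectors rather than rows. A minimal sketch of the first syntax, unique(dataframe), on the same data is given below; the names unique_rows and unique_subjects are illustrative only.

```r
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# on a whole data frame, unique() drops rows that repeat an earlier row
unique_rows <- unique(data)
print(unique_rows)

# on a single column, unique() returns a plain vector of values
unique_subjects <- unique(data$subjects)

print(is.data.frame(unique_rows))
print(is.data.frame(unique_subjects))
```

Use the data frame form when the goal is to remove duplicate rows, and the column form when only the list of distinct values is needed.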