Remove Duplicate Rows in R using dplyr
Last Updated: 21 Jul, 2021
In this article, we are going to remove duplicate rows in the R programming language using the dplyr package, and then look at two base R alternatives.
Method 1: distinct()
The distinct() function from dplyr removes duplicate rows from a data frame and returns only the unique rows.
Syntax:
distinct(dataframe)
We can also remove duplicate rows based on one or more columns of the data frame. In that case only the listed columns are returned, unless .keep_all = TRUE is also passed.
Syntax:
distinct(dataframe, column1, column2, ..., column_n)
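Dataset in use: a data frame named data1 with the columns id, name and address, created at the top of each example below. Rows 8 to 10 repeat rows 1, 4 and 2.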
Example 1: R program to remove duplicate rows from the dataframe
R
# load the package
library(dplyr)

# create a data frame with three columns
# named id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby',
           'gnanesh', 'rohith', 'pinkey',
           'dhanush', 'sravan', 'gnanesh',
           'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali',
              'vijayawada', 'vijayawada', 'guntur',
              'hyd', 'tenali', 'hyd')
)

# remove duplicate rows
print(distinct(data1))
Output: the seven unique rows (ids 1 to 7) are printed; the three repeated rows are dropped.
Example 2: Remove duplicate rows based on a single column
R
# load the package
library(dplyr)

# create a data frame with three columns
# named id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby',
           'gnanesh', 'rohith', 'pinkey',
           'dhanush', 'sravan', 'gnanesh',
           'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali',
              'vijayawada', 'vijayawada', 'guntur',
              'hyd', 'tenali', 'hyd')
)

# remove duplicate rows based on the name column
print(distinct(data1, name))
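Output: a one-column data frame holding the seven unique names; the id and address columns are not returned.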
Example 3: Remove duplicate rows based on multiple columns
R
# load the package
library(dplyr)

# create a data frame with three columns
# named id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby',
           'gnanesh', 'rohith', 'pinkey',
           'dhanush', 'sravan', 'gnanesh',
           'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali',
              'vijayawada', 'vijayawada', 'guntur',
              'hyd', 'tenali', 'hyd')
)

# remove duplicate rows based on
# the name and address columns
print(distinct(data1, address, name))
Output: seven rows containing only the address and name columns, one row per distinct combination.
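When distinct() is given column names, it returns only those columns. The following is a minimal sketch, assuming the same data1 data frame used in the examples, of keeping every column by passing dplyr's .keep_all = TRUE argument. The variable names unique_rows and unique_rows_piped are illustrative only.

```r
library(dplyr)

# same illustrative dataset as in the examples
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# keep the first row for each distinct name/address pair,
# but retain all columns (including id) in the result
unique_rows <- distinct(data1, name, address, .keep_all = TRUE)
print(unique_rows)

# the same call written with the pipe operator
unique_rows_piped <- data1 %>%
  distinct(name, address, .keep_all = TRUE)
```

With .keep_all = TRUE the first row of each group is the one that is kept, so the order of the rows in the input decides which values of the other columns survive.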

Method 2: Using duplicated() function
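The base R duplicated() function returns a logical vector that is TRUE for every element (or row) that repeats an earlier one. Negating it with ! gives TRUE for first occurrences only, so it can be used as a row index to keep the unique rows.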
Syntax:
dataframe[!duplicated(dataframe$column_name), ]
Here, dataframe is the input data frame and column_name is the column used to detect duplicates. The first row for each value in that column is kept and later repeats are removed.
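To see what the index expression is doing, here is a small sketch on a hypothetical character vector, names_vec, which is not part of the article's dataset. It prints the logical vector that duplicated() produces and then uses its negation to keep first occurrences.

```r
# hypothetical vector used only for illustration
names_vec <- c('sravan', 'ojaswi', 'bobby', 'sravan')

# TRUE marks an element that repeats an earlier one
dup <- duplicated(names_vec)
print(dup)
# [1] FALSE FALSE FALSE  TRUE

# negating the mask keeps only the first occurrences
first_occurrences <- names_vec[!dup]
print(first_occurrences)
# [1] "sravan" "ojaswi" "bobby"
```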
Example: R program to remove duplicate data based on a particular column
R
# duplicated() is part of base R,
# so no package needs to be loaded

# create a data frame with three columns
# named id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby',
           'gnanesh', 'rohith', 'pinkey',
           'dhanush', 'sravan', 'gnanesh',
           'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali',
              'vijayawada', 'vijayawada', 'guntur',
              'hyd', 'tenali', 'hyd')
)

# remove duplicate rows using duplicated()
# based on the name column
print(data1[!duplicated(data1$name), ])
print("=====================")

# remove duplicate rows using duplicated()
# based on the id column
print(data1[!duplicated(data1$id), ])
print("=====================")

# remove duplicate rows using duplicated()
# based on the address column
print(data1[!duplicated(data1$address), ])
print("=====================")
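Output: filtering on name or on id keeps rows 1 to 7; filtering on address keeps rows 1, 3, 4, 5 and 7, which are the first rows for each of the five addresses.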
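duplicated() also accepts a data frame, which makes it possible to check several columns at once. The sketch below assumes the same data1 data frame; the names keys, first_kept and last_kept are illustrative. It also uses the fromLast = TRUE argument of duplicated() to keep the last occurrence instead of the first.

```r
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# columns that together define a duplicate
keys <- data1[, c("name", "address")]

# keep the first row of each name/address pair
first_kept <- data1[!duplicated(keys), ]
print(first_kept)

# keep the last row of each name/address pair instead
last_kept <- data1[!duplicated(keys, fromLast = TRUE), ]
print(last_kept)
```

Both results contain seven rows. They differ only in which of the repeated rows survives, which matters when the columns outside the key differ between duplicates.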

Method 3: Using unique() function
The base R unique() function removes duplicate rows from a data frame by returning only the unique rows.
Syntax:
unique(dataframe)
To get the unique values of a single column, pass that column using the $ operator. The result is a vector of values, not a data frame.
Syntax:
unique(dataframe$column_name)
Where, dataframe is the input dataframe and column_name is the column in the dataframe.
Example 1: R program to remove duplicates using unique() function
R
# unique() is part of base R,
# so no package needs to be loaded

# create a data frame with three columns
# named id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby',
           'gnanesh', 'rohith', 'pinkey',
           'dhanush', 'sravan', 'gnanesh',
           'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali',
              'vijayawada', 'vijayawada', 'guntur',
              'hyd', 'tenali', 'hyd')
)

# get the unique rows of the data frame
print(unique(data1))
Output: rows 1 to 7 of the data frame; rows 8 to 10 repeat earlier rows and are dropped.
Example 2: R program to remove duplicates in a particular column
R
# unique() is part of base R,
# so no package needs to be loaded

# create a data frame with three columns
# named id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby',
           'gnanesh', 'rohith', 'pinkey',
           'dhanush', 'sravan', 'gnanesh',
           'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali',
              'vijayawada', 'vijayawada', 'guntur',
              'hyd', 'tenali', 'hyd')
)

# get the unique values of the id column
print(unique(data1$id))

# get the unique values of the name column
print(unique(data1$name))

# get the unique values of the address column
print(unique(data1$address))
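Output: three vectors are printed: the ids 1 to 7, the seven unique names, and the five unique addresses (hyd, ponnur, tenali, vijayawada, guntur).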
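unique() can also be applied to a subset of columns, which gives a base R counterpart to selecting columns in distinct(). The following sketch assumes the same data1 data frame; unique_pairs and n_removed are illustrative names. It also counts how many rows a full-row deduplication removes.

```r
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# unique combinations of name and address (id is not returned)
unique_pairs <- unique(data1[, c("name", "address")])
print(unique_pairs)

# number of rows dropped when deduplicating on all columns
n_removed <- nrow(data1) - nrow(unique(data1))
print(n_removed)
# [1] 3
```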