How to Remove Duplicates from a Vector in R
Last Updated: 24 Apr, 2025
A vector is a basic data structure that represents an ordered collection of elements of the same data type. It is one-dimensional and can contain numeric, character, or logical values. Note that a vector in C++ and a vector in R Programming Language are not the same: in C++, a vector is a dynamic array that can grow or shrink in size, whereas in R, a vector is a fundamental data structure of the language itself.
In R, a vector is created using the c() function, which stands for "combine" or "concatenate".
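Because every element of a vector must share one data type, c() silently converts mixed inputs to a common type. The following is a minimal sketch of that behavior; the name mixedVector is only illustrative.

```r
# Mixing a number, a string, and a logical value in one call to c()
mixedVector <- c(1, "two", TRUE)

# All three elements are now character strings: "1", "two", "TRUE"
print(mixedVector)

# The type of the whole vector is character
vectorClass <- class(mixedVector)
print(vectorClass)
```

This matters for duplicate removal: after coercion, the number 1 and the string "1" become the same value.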
Initializing Numerical Vector
R
# Initializing a numeric vector
numericVector <- c(1, 2, 3, 4, 5, 6)
# Printing the numericVector
cat("numericVector:", numericVector, "\n")
Output:
numericVector: 1 2 3 4 5 6
Initializing Character Vector
R
# Initializing a Character Vector
characterVector <- c("Anakin", "Luke", "Ezra", "order66")
# Printing the characterVector.
cat("characterVector:", characterVector, "\n")
Output:
characterVector: Anakin Luke Ezra order66
Initializing Logical Vector
R
# Initializing a Logical vector
logicalVector <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
# Printing the logicalVector.
cat("logicalVector:", logicalVector, "\n")
Output:
logicalVector: TRUE FALSE FALSE FALSE FALSE TRUE FALSE
Removing the duplicates from vector
In R, the unique() function is the most common way to eliminate duplicate values from a vector. R is widely used in data science, where the quality of the data being analyzed directly affects the results.
Removing duplicates from a vector is a fundamental step in data cleaning and Exploratory Data Analysis (EDA). It eliminates redundant or repeated values, which leads to more consistent and reliable results and avoids unnecessary repetition in the dataset.
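Before removing anything, it can help to check whether a vector contains duplicates at all. The sketch below uses the base R functions anyDuplicated() and duplicated() for this; the variable names are illustrative and not part of the original examples.

```r
# Example vector with some repeated values
x <- c(1, 2, 2, 3, 4, 5, 6, 6, 5, 7, 9)

# Index of the first duplicated element, or 0 if there are none
firstDup <- anyDuplicated(x)
print(firstDup)        # 3

# Number of elements that repeat an earlier value
nDup <- sum(duplicated(x))
print(nDup)            # 3

# The values that occur more than once, in order of their first repeat
repeatedValues <- unique(x[duplicated(x)])
print(repeatedValues)  # 2 6 5
```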
unique() Function
R provides the unique() function, which removes duplicates from a vector and keeps the first occurrence of each value.
Using unique() on numerical vector
R
# Creating a vector with duplicates.
myVector <- c(1,2,2,3,4,5,6,6,5,7,9)
# Using the unique() function to remove duplicates.
uniqueVector <- unique(myVector)
# Printing the result
print(uniqueVector)
Output:
[1] 1 2 3 4 5 6 7 9
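Two details of unique() are worth seeing on a small input: it keeps values in the order of their first occurrence rather than sorting them, and it treats NA as a value like any other. This is a minimal sketch with an illustrative vector.

```r
# Unsorted vector that also contains repeated missing values
x <- c(3, 1, NA, 3, NA, 1)

# Order of first occurrence is preserved; a single NA is kept
uniqueValues <- unique(x)
print(uniqueValues)   # 3 1 NA

# sort() can be applied afterwards; by default it drops NA
sortedValues <- sort(unique(x))
print(sortedValues)   # 1 3
```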
Using unique() function on character vector
R
# Creating a character vector with duplicates.
duplicatedCharVec <- c("Anakin", "anakin", "Anakin", "Luke", "Ashoka")
# Using unique() function to remove duplicates from duplicatedCharVec.
uniqueCharVec <- unique(duplicatedCharVec)
# Printing the uniqueCharVec.
print(uniqueCharVec)
Output:
[1] "Anakin" "anakin" "Luke" "Ashoka"
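In the output above, "Anakin" and "anakin" are both kept because unique() compares strings exactly, including letter case. If the two spellings should count as the same value, one possible approach is to compare lowercased copies. This is a sketch under that assumption, not the only way to do it.

```r
duplicatedCharVec <- c("Anakin", "anakin", "Anakin", "Luke", "Ashoka")

# Lowercased copy used only for comparison
lowered <- tolower(duplicatedCharVec)

# Option 1: return the lowercased unique values
uniqueLower <- unique(lowered)
print(uniqueLower)      # "anakin" "luke" "ashoka"

# Option 2: keep the original spelling of the first occurrence
uniqueOriginal <- duplicatedCharVec[!duplicated(lowered)]
print(uniqueOriginal)   # "Anakin" "Luke" "Ashoka"
```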
duplicated() Function with Indexing
The duplicated() function takes a vector as input and returns a logical vector of the same length, indicating whether each element is a duplicate (i.e., has occurred previously in the vector). Negating this result with ! and using it as an index keeps only the first occurrence of each value.
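To make the logical vector concrete, the sketch below prints what duplicated() returns for a short input. It also shows the fromLast argument, which scans from the end so that the last occurrence of each value is kept instead of the first. The vector x is illustrative.

```r
x <- c(1, 2, 2, 3, 1)

# TRUE marks elements that repeat an earlier value
dupFlags <- duplicated(x)
print(dupFlags)       # FALSE FALSE TRUE FALSE TRUE

# Scanning from the end marks the earlier copies instead
dupFlagsLast <- duplicated(x, fromLast = TRUE)
print(dupFlagsLast)   # TRUE TRUE FALSE FALSE FALSE

# Keeping the last occurrence of each value
keepLast <- x[!duplicated(x, fromLast = TRUE)]
print(keepLast)       # 2 3 1
```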
Using duplicated() function along with indexing on numerical vector
R
# Example vector with duplicates.
myVector <- c(1,2,2,2,3,3,3,4,3,4,3,6)
# Removing duplicates using duplicated() and indexing.
uniqueVector <- myVector[!duplicated(myVector)]
# Printing the uniqueVector
print(uniqueVector)
Output:
[1] 1 2 3 4 6
Using duplicated() function along with indexing on character vector
R
# Example vector with duplicates.
myVector <- c("Anakin", "Luke","Anakin","Ezra","Darth Vader","Obi-Wan")
# Removing duplicates using duplicated() and indexing.
uniqueVector <- myVector[!duplicated(myVector)]
# Printing the uniqueVector
print(uniqueVector)
Output:
[1] "Anakin" "Luke" "Ezra" "Darth Vader" "Obi-Wan"
Using `dplyr` Package
- The dplyr package is used for data manipulation tasks and makes code more readable and efficient.
- The key functions provided by dplyr are as follows:
  - filter(): Filter rows based on specified conditions.
  - select(): Select specific columns.
  - mutate(): Add new variables or modify existing ones.
  - arrange(): Reorder rows based on variable values.
  - group_by(): Group data by one or more variables.
  - summarize(): Summarize data, typically using aggregate functions.
  - distinct(): Get distinct (unique) rows.
- Because distinct() works on the rows of a data frame, the examples that follow first wrap the vector in a data frame and then extract the column again.
Using distinct() function of dplyr Package to remove duplicate values from numerical vector
R
# Install and load the dplyr package if not already installed
# install.packages("dplyr")
library(dplyr)
# Example vector with duplicates
myVector <- c(1,2,2,3,3,2,1,5,6)
# Remove duplicates using distinct() from dplyr
uniqueVector <- distinct(data.frame(value = myVector))$value
# Printing the value of uniqueVector
print(uniqueVector)
Output:
[1] 1 2 3 5 6
Using distinct() function of dplyr Package to remove duplicate values from character Vector
R
# Install and load the dplyr package if not already installed
# install.packages("dplyr")
library(dplyr)
# Example vector with duplicates
myVector <- c("Anakin","Ezra","Luke","Anakin")
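# Remove duplicates using distinct() from dplyr
# (stringsAsFactors = FALSE keeps the column as character in R versions before 4.0)
uniqueVector <- distinct(data.frame(value = myVector, stringsAsFactors = FALSE))$value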
# Printing the value of uniqueVector
print(uniqueVector)
Output:
[1] "Anakin" "Ezra" "Luke"
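Since distinct() is designed for data frames, it is most useful when the values to deduplicate sit next to other columns. The sketch below assumes a small hypothetical data frame (the names jedi and id are made up for illustration) and uses the .keep_all argument of distinct() to keep the first full row for each name.

```r
library(dplyr)

# Hypothetical data frame: a name column with a duplicate and a row id
jedi <- data.frame(
  id = 1:4,
  name = c("Anakin", "Ezra", "Luke", "Anakin"),
  stringsAsFactors = FALSE
)

# No complete row repeats, so all four rows are returned
allRows <- distinct(jedi)
print(nrow(allRows))   # 4

# Deduplicate on name only, keeping the other columns of the first match
byName <- distinct(jedi, name, .keep_all = TRUE)
print(byName$name)     # "Anakin" "Ezra" "Luke"
print(byName$id)       # 1 2 3
```

For a plain vector, unique() or duplicated() from base R gives the same result without an extra package.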