
How to Transform Data in R?

Last Updated : 25 Apr, 2025

Data transformation in R is commonly done with the tidyverse, a collection of packages that includes dplyr (for functions such as arrange(), select(), filter(), mutate(), group_by() and summarise()) and tidyr (for reshaping functions such as spread() and gather()). This article walks through each of these functions with small examples.

Installing Required Packages

Both packages can be installed with the install.packages() function. Installing tidyverse also installs dplyr and tidyr, so the second call below is optional.

R
install.packages("tidyverse")
install.packages("dplyr")

Method 1: Using arrange() method

The arrange() function from dplyr (attached by library(tidyverse)) reorders the rows of a data frame by the values of one or more columns. By default, rows are sorted in ascending order; wrap a column in desc() to sort it in descending order.

Syntax: arrange(col-name) 

Parameter:

  • col-name - Name of the column.

Example 1:

We are creating a data frame with numeric and character columns, then arranging the data frame by the col1 values in ascending order using the arrange() function from the tidyverse package. We print both the original and the rearranged data frames.

R
library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0))

rownames(data_frame) <- c("r1","r2","r3","r4","r5","r6","r7","r8")
print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>% arrange(col1)
print("Arranged Data Frame")
print(arr_data_frame)

Output:

[Output image: Using arrange() function]

Example 2:

We are creating a data frame with numeric and character columns, then arranging the data frame by col1 in descending order using the arrange() function from the tidyverse package. We print both the original and the rearranged data frames.

R
library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0))


rownames(data_frame) <- c("r1","r2","r3","r4","r5","r6","r7","r8")
print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
                  arrange(desc(col1)) 
                
print("Arranged Data Frame")
print(arr_data_frame)

Output:

[Output image: Using arrange() function]
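Sorting by more than one column is a common follow-up. The sketch below assumes dplyr is installed and reuses the article's col1 and col2 values. It passes two sort keys to arrange(), so the second key only decides the order of rows that tie on the first one.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0)
)

# Sort by col1 in descending order; the two rows that tie on col1 = 5
# (col2 "e" and "g") are then ordered by col2 in descending order,
# so "g" comes before "e"
sorted <- data_frame %>% arrange(desc(col1), desc(col2))
print(sorted)
```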

Method 2: Using select() method

The select() function from dplyr keeps only the columns you name, in the order you list them. It returns a new data frame with all of the original rows but only the selected columns.

Syntax: select(list-of-col-names)

Parameter:

  • list-of-col-names - Column names separated by commas. A range such as col2:col4 selects every column from col2 through col4.

Example 1:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the select() function from the tidyverse package to select only the col2 and col4 columns. The result is a subset of the original data frame, which is then printed.

R
library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)


arr_data_frame <- data_frame %>%
                select(col2,col4)
              
print("Selecting col2 and col4 in Data Frame")
print(arr_data_frame)

Output:

[Output image: Using select() function]

Example 2:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the select() function from the tidyverse package to select columns from col2 to col4. The result is a subset of the original data frame containing only the selected columns, which is then printed.

R
library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>% 
  select(col2:col4)
  
print("Selecting col2 to col4 in Data Frame")
print(arr_data_frame)

Output:

[Output image: Using select() function]
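Besides listing names or giving a range, select() accepts a few other forms. This sketch, assuming dplyr is installed, drops a column with a minus sign, moves a column to the front with the everything() helper, and renames columns while selecting them using new_name = old_name.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = 9:16
)

# Drop a column with a minus sign instead of listing the ones to keep
without_col1 <- data_frame %>% select(-col1)

# Move col4 to the front and keep every remaining column after it
col4_first <- data_frame %>% select(col4, everything())

# Rename while selecting: new_name = old_name
renamed <- data_frame %>% select(letter = col2, value = col4)

print(names(without_col1))
print(names(col4_first))
print(names(renamed))
```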

Method 3: Using filter() method

The filter() function from dplyr keeps only the rows for which the supplied conditions evaluate to TRUE, returning a data frame with the same columns but usually fewer rows. Conditions are written with comparison and logical operators such as ==, >, %in%, & and |. Rows for which a condition evaluates to NA are dropped as well.

Syntax: filter(cond1, cond2)

Parameter:

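  • cond1, cond2 - Logical conditions applied to each row. Conditions separated by commas are combined with AND, so a row is kept only if all of them are TRUE.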

Example 1:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to select rows where the value of col1 is greater than 4. The filtered data frame is then printed.

R
library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
    filter(col1>4)
  
print("Selecting rows where col1 > 4")
print(arr_data_frame)

Output:

[Output image: Using filter() function]


Example 2:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to keep the rows where col3 is either "there" or "this", using the %in% operator. The filtered data frame is then printed.

R
library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c("this","that","there","here","there","this","that","here"),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)


arr_data_frame <- data_frame %>% 
  filter(col3 %in% c("there", "this"))

print('Selecting rows where col3 is "there" or "this"')
print(arr_data_frame)

Output:

[Output image: Using filter() function]

Example 3:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to select rows where col3 is "there" and col1 is 5. The filtered data frame is then printed.

R
library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c("this","that","there","here","there","this","that","here"),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
    filter(col3=="there",col1==5)
  
print('Selecting rows where col3 is "there" and col1 is 5')
print(arr_data_frame)

Output:

[Output image: Using filter() function]
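The examples above combine conditions with commas, which means AND. The sketch below, assuming dplyr is installed and reusing the Example 2 data, shows OR with | and negation with !. Note that !col3 %in% c(...) is read as !(col3 %in% c(...)), because %in% binds more tightly than !.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c("this", "that", "there", "here", "there", "this", "that", "here"),
  col4 = 9:16
)

# OR: keep rows where at least one of the two conditions is TRUE
either <- data_frame %>% filter(col1 > 6 | col3 == "this")

# NOT: keep rows whose col3 is neither "there" nor "this"
not_in <- data_frame %>% filter(!col3 %in% c("there", "this"))

print(either)
print(not_in)
```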

Method 4: Using spread() method

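The spread() function from the tidyr package converts data from long to wide format. It takes a key column and a value column and spreads each key-value pair across new columns: every unique value of the key column becomes a column name, and the matching entries of the value column fill those columns. Wide data is often easier to read at a glance. In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), but it still works.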

Syntax: spread(data, key, value)

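Parameters:

  • data - The data frame (supplied through %>% in the examples below).
  • key - Column whose unique values become the new column names.
  • value - Column whose values fill the new columns.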

Example 1:

We are creating a data frame with three columns (col1, col2, col3), then using the spread() function from the tidyverse package to reshape the data by spreading col2 values into individual columns and filling the col3 values accordingly. The reshaped data frame is then printed.

R
library(tidyr)

data_frame = data.frame(
  col1 = c("A","A","A","A","A","A",
           "B","B","B","B","B","B"),
  col2 = c("Eng","Phy","Chem","MAQ","Bio","SST",
           "Eng","Phy","Chem","MAQ","Bio","SST"),
  col3 = c(34,56,46,23,72,67,89,43,88,45,78,99)
  )

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
    spread(col2,col3)
  
print("Spread using col2 and col3")
print(arr_data_frame)

Output:

[Output image: Using spread() function]

Example 2:

We are creating a data frame with three columns (col1, col2, col3), then using the spread() function to reshape the data by turning unique values from col1 ("A" and "B") into separate columns, and filling them with corresponding values from col3, using col2 as the row identifier. The transformed data frame is then printed.

R
library(tidyr)

data_frame = data.frame(
  col1 = c("A","A","A","A","A","A",
           "B","B","B","B","B","B"),
  col2 = c("Eng","Phy","Chem","MAQ","Bio","SST",
           "Eng","Phy","Chem","MAQ","Bio","SST"),
  col3 = c(34,56,46,23,72,67,89,43,88,45,78,99)
)

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
    spread(col1,col3)
  
print("Spread using col1 and col3")
print(arr_data_frame)

Output:

[Output image: Using spread() function]
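Since spread() is superseded, here is a minimal sketch of Example 1 rewritten with pivot_wider(), assuming tidyr 1.0.0 or later is installed. rep() is used only to build the same data more compactly; names_from plays the role of spread()'s key and values_from the role of its value.

```r
library(tidyr)

data_frame <- data.frame(
  col1 = rep(c("A", "B"), each = 6),
  col2 = rep(c("Eng", "Phy", "Chem", "MAQ", "Bio", "SST"), times = 2),
  col3 = c(34, 56, 46, 23, 72, 67, 89, 43, 88, 45, 78, 99)
)

# names_from: column whose values become the new column names (spread's key)
# values_from: column whose values fill those new columns (spread's value)
wide <- data_frame %>%
  pivot_wider(names_from = col2, values_from = col3)

print(wide)
```

pivot_wider() returns a tibble, and the new columns may appear in a different order than with spread(), which sorts the key values; the cell values are the same.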

Method 5: Using mutate() method

The mutate() function from dplyr adds new columns to a data frame or modifies existing ones. Each column is defined by an expression that can use existing columns and constants, and the returned data frame contains the new columns alongside the original ones.

Syntax: mutate(new-col-name = expr)

Parameters:

  • new-col-name - Name of the column to create (or an existing column name to overwrite).
  • expr - Expression whose result becomes the values of that column.

Example:

We are creating a data frame with four columns and then using the mutate() function to add two new columns: col5 (the sum of col1 and col4) and col6 (col3 incremented by 1). The updated data frame with the new columns is then printed.

R
library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

data_frame_mutate <- data_frame %>% 
  mutate(col5 = col1 + col4 ,
         col6 = col3+1)
         
print("Mutated Data Frame")
print(data_frame_mutate)

Output:

[Output image: Using mutate() function]
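mutate() can also overwrite an existing column, and each expression in the call can use columns created or changed by the expressions before it. This sketch assumes dplyr is installed; if_else() is dplyr's type-strict counterpart to base R's ifelse().

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = 9:16
)

scaled <- data_frame %>%
  mutate(
    col1 = col1 * 10,                          # overwrite an existing column
    size = if_else(col1 > 40, "big", "small")  # uses the updated col1
  )

print(scaled)
```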

Method 6: Using group_by() and summarise() methods

The group_by() function splits a data frame into groups based on the values of one or more columns, and summarise() then reduces each group to a single row of summary values, such as a count or a mean. Used together, they produce a compact summary table with one row per group.

Syntax: group_by(col, ...) %>% summarise(new-col = summary-expr, ...)

Example:

We are grouping the data frame by col3 and then using summarise() to calculate the count of rows and the mean of col1 within each group. The resulting summary table is then printed.

R
library(dplyr)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

data_frame_summary <- data_frame %>%
  group_by(col3) %>%
  summarise(
    count = n(),            # number of rows in each group
    mean_col1 = mean(col1)  # average of col1 in each group
  )

print("Summarised Data Frame")
print(data_frame_summary)

Output:

[Output image: Using group_by() and summarise() functions]
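summarise() can compute several summaries per group in one call. This sketch assumes dplyr 1.0.0 or later (for the .groups argument); it adds the maximum of col4 and explicitly asks for an ungrouped result. With this data, the col3 == 0 group has 5 rows and the col3 == 1 group has 3.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = 9:16
)

summary_by_col3 <- data_frame %>%
  group_by(col3) %>%
  summarise(
    count = n(),             # rows in the group
    mean_col1 = mean(col1),  # average of col1
    max_col4 = max(col4),    # largest col4 value
    .groups = "drop"         # return an ungrouped result
  )

print(summary_by_col3)
```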

Method 7: Using the gather() method

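The gather() function from tidyr is the opposite of spread(): it reshapes data from wide to long format by collapsing several columns into key-value pairs. The names of the gathered columns go into a new key column, and their values go into a new value column. In tidyr 1.0.0 and later, gather() is superseded by pivot_longer().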

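Syntax: gather(data, key, value, ...)

Parameters:

  • key, value - Names of the new key and value columns.
  • ... - The columns to gather, for example 2:4 or Maths:Chemistry.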

Example:

We are using the gather() function from the tidyr package to reshape the data frame from wide to long format. The Maths, Physics, and Chemistry columns are combined into two columns: "Subject" (holding the subject names) and "Marks" (holding the corresponding values).

R
library(tidyr)

data_frame = data.frame(
  col1 = c("Jack","Jill","Yash","Mallika",
           "Muskan","Keshav","Meenu","Sanjay"),
  Maths = c(26,47,14,73,65,83,95,48),
  Physics = c(24,53,45,88,68,35,78,24),
  Chemistry = c(67,23,79,67,33,66,25,78)
)

print("Data Frame")
print(data_frame)

# Collapse columns 2 to 4 (Maths, Physics, Chemistry) into Subject/Marks pairs
data_frame_long <- data_frame %>%
  gather("Subject", "Marks", 2:4)

print("Gathered Data Frame")
print(data_frame_long)

Output:

[Output image: Using gather() function]
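Likewise, the same reshaping can be written with pivot_longer(), assuming tidyr 1.0.0 or later is installed. The sketch below produces the same Subject and Marks columns; the cols argument accepts the same column selections as select().

```r
library(tidyr)

data_frame <- data.frame(
  col1 = c("Jack", "Jill", "Yash", "Mallika",
           "Muskan", "Keshav", "Meenu", "Sanjay"),
  Maths = c(26, 47, 14, 73, 65, 83, 95, 48),
  Physics = c(24, 53, 45, 88, 68, 35, 78, 24),
  Chemistry = c(67, 23, 79, 67, 33, 66, 25, 78)
)

# names_to: new column holding the old column names (gather's key)
# values_to: new column holding the old cell values (gather's value)
long <- data_frame %>%
  pivot_longer(cols = Maths:Chemistry,
               names_to = "Subject", values_to = "Marks")

print(long)
```

One visible difference is row order: pivot_longer() keeps each student's rows together (Jack's Maths, Physics and Chemistry first), whereas gather() stacks the gathered columns one after another.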

In this article, we explored how to reorder, subset, filter, extend, summarise and reshape data in R using arrange(), select(), filter(), mutate(), group_by() with summarise(), spread() and gather() from the dplyr and tidyr packages of the tidyverse. These functions make it easier to restructure data to suit different analysis needs.
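Because each of these functions takes a data frame and returns one, they can be chained in a single %>% pipeline. A minimal sketch, assuming dplyr is installed and reusing the col1 to col4 data from the earlier examples: keep the rows where col3 is 0, add a total column, sort by it, and keep two columns.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = 9:16
)

result <- data_frame %>%
  filter(col3 == 0) %>%            # keep rows where col3 is 0
  mutate(total = col1 + col4) %>%  # new column from two existing ones
  arrange(desc(total)) %>%         # largest total first
  select(col2, total)              # keep only two columns

print(result)
```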

