How to Transform Data in R?

Last Updated : 25 Apr, 2025

Data transformation in R is commonly done with the tidyverse, a collection of packages that includes dplyr (for manipulating rows and columns) and tidyr (for reshaping data). These packages are easy to install and provide a range of functions for transforming data frames.

Installing Required Packages

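Both packages can be installed with the install.packages() function. Installing tidyverse also installs dplyr and tidyr, so the second call is only needed if you want dplyr on its own.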

install.packages("tidyverse")
install.packages("dplyr")
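If the packages may already be present, a conditional install avoids downloading them again. The following is a minimal sketch rather than a required step; the loop and the CRAN mirror URL are assumptions chosen for illustration.

```r
# Packages used in this article (dplyr and tidyr are part of the tidyverse).
pkgs <- c("dplyr", "tidyr")

for (pkg in pkgs) {
  # Install only when the package is not already available.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # Attach the package by its name stored in a character variable.
  library(pkg, character.only = TRUE)
}
```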

Method 1: Using arrange() method

We will use the arrange() function to order the rows of a data frame. The arrange() function, provided by dplyr and loaded with the tidyverse, takes one or more column names and sorts the rows by the values in those columns. By default, arrange() sorts in ascending order; wrap a column in desc() to sort it in descending order.

Syntax: arrange(col-name)
Parameter:
col-name - Name of the column.

Example 1:

We are creating a data frame with numeric and character columns, then arranging the data frame by the col1 values in ascending order using the arrange() function from the tidyverse package. We print both the original and the rearranged data frames.

library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0))

rownames(data_frame) <- c("r1","r2","r3","r4","r5","r6","r7","r8")
print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>% arrange(col1)
print("Arranged Data Frame")
print(arr_data_frame)

Example 2:

We are creating a data frame with numeric and character columns, then arranging the data frame by col1 in descending order using the arrange() function from the tidyverse package. We print both the original and the rearranged data frames.

library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0))


rownames(data_frame) <- c("r1","r2","r3","r4","r5","r6","r7","r8")
print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
                  arrange(desc(col1)) 
                
print("Arranged Data Frame")
print(arr_data_frame)
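arrange() also accepts several columns, in which case later columns break ties in earlier ones. The sketch below reuses the data frame from the examples above and sorts by col3 first, then by col1 in descending order within each col3 value. The variable name multi_sorted is only illustrative.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0)
)

# Sort by col3 ascending; within equal col3 values,
# sort by col1 descending.
multi_sorted <- data_frame %>%
  arrange(col3, desc(col1))

print("Arranged by col3, then by descending col1")
print(multi_sorted)
```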

Method 2: Using select() method

We will use the select() function from the tidyverse package to fetch columns in the specified order. This method returns a subset of the data frame containing only the selected columns.

Syntax: select(list-of-col-names)
Parameter:
list-of-col-names - List of column names separated by comma.

Example 1:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the select() function from the tidyverse package to select only the col2 and col4 columns. The result is a subset of the original data frame, which is then printed.

library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)


arr_data_frame <- data_frame %>%
                select(col2,col4)
              
print("Selecting col2 and col4 in Data Frame")
print(arr_data_frame)

Example 2:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the select() function from the tidyverse package to select columns from col2 to col4. The result is a subset of the original data frame containing only the selected columns, which is then printed.

library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>% 
  select(col2:col4)
  
print("Selecting col2 to col4 in Data Frame")
print(arr_data_frame)
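Besides listing columns or giving a range, select() can drop columns, reorder them, and rename them while selecting. The following sketch shows these three variations on the same data frame; the result names (without_col1, col4_first, renamed) and the new column name id are assumptions made for illustration.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = c(9:16)
)

# Drop a column by prefixing its name with a minus sign.
without_col1 <- data_frame %>% select(-col1)

# Move col4 to the front; everything() selects all remaining columns.
col4_first <- data_frame %>% select(col4, everything())

# Rename while selecting, using new_name = old_name.
renamed <- data_frame %>% select(id = col2, col4)

print(names(without_col1))
print(names(col4_first))
print(names(renamed))
```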

Method 3: Using filter() method

The filter() function from dplyr keeps only the rows of a data frame that satisfy one or more conditions. Conditions are written with comparison and logical operators (such as >, ==, %in%, & and |). When several conditions are separated by commas, a row must satisfy all of them to be kept. The result is a data frame with the same columns and, usually, fewer rows.

Syntax: filter(cond1, cond2)

Parameter:

cond1, cond2 - Condition to be applied on data.

Example 1:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to select rows where the value of col1 is greater than 4. The filtered data frame is then printed.

library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
    filter(col1>4)
  
print("Selecting col1 >4 ")
print(arr_data_frame)

Example 2:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to select rows where col3 contains either "there" or "this" using the %in% operator. The filtered data frame is then printed.

library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c("this","that","there","here","there","this","that","here"),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)


arr_data_frame <- data_frame %>% 
  filter(col3 %in% c("there", "this"))

print("Selecting rows where col3 is there or this")
print(arr_data_frame)

Example 3:

We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to select rows where col3 is "there" and col1 is 5. The filtered data frame is then printed.

library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c("this","that","there","here","there","this","that","here"),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
    filter(col3=="there",col1==5)
  
print("Selecting col3 value is there and col1 is 5")
print(arr_data_frame)
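Comma-separated conditions in filter() are combined with AND. To keep rows that match either of two conditions, or to exclude a set of values, the | and ! operators can be used instead. A minimal sketch on the same data frame follows; the result names either and neither are illustrative.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c("this", "that", "there", "here", "there", "this", "that", "here"),
  col4 = c(9:16)
)

# OR: keep rows where col1 is greater than 4 or col3 is "this".
either <- data_frame %>%
  filter(col1 > 4 | col3 == "this")

# NOT: drop rows whose col3 value is "here" or "that".
neither <- data_frame %>%
  filter(!col3 %in% c("here", "that"))

print("Rows with col1 > 4 or col3 equal to this")
print(either)
print("Rows with col3 not equal to here or that")
print(neither)
```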

Method 4: Using spread() method

The spread() function from tidyr reshapes data from long to wide format. It takes a key column, whose unique values become the new column names, and a value column, whose values fill those new columns. In current versions of tidyr, spread() is superseded by pivot_wider(), although it still works.

Syntax: spread(data, key, value)
Parameters:
data - Data frame to reshape.
key - Column whose values become the new column names.
value - Column whose values fill the new columns.

Example 1:

We are creating a data frame with three columns (col1, col2, col3), then using the spread() function from the tidyr package to reshape the data by spreading the col2 values into individual columns and filling them with the corresponding col3 values. The reshaped data frame is then printed.

library(tidyr)

data_frame = data.frame(
  col1 = c("A","A","A","A","A","A",
           "B","B","B","B","B","B"),
  col2 = c("Eng","Phy","Chem","MAQ","Bio","SST",
           "Eng","Phy","Chem","MAQ","Bio","SST"),
  col3 = c(34,56,46,23,72,67,89,43,88,45,78,99)
  )

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
    spread(col2,col3)
  
print("Spread using col2 and col3")
print(arr_data_frame)

Example 2:

We are creating a data frame with three columns (col1, col2, col3), then using the spread() function to reshape the data by turning unique values from col1 ("A" and "B") into separate columns, and filling them with corresponding values from col3, using col2 as the row identifier. The transformed data frame is then printed.

library(tidyr)

data_frame = data.frame(
  col1 = c("A","A","A","A","A","A",
           "B","B","B","B","B","B"),
  col2 = c("Eng","Phy","Chem","MAQ","Bio","SST",
           "Eng","Phy","Chem","MAQ","Bio","SST"),
  col3 = c(34,56,46,23,72,67,89,43,88,45,78,99)
)

print("Data Frame")
print(data_frame)

arr_data_frame <- data_frame %>%
    spread(col1,col3)
  
print("Spread using col1 and col3")
print(arr_data_frame)
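In current versions of tidyr, spread() is superseded by pivot_wider(). The sketch below performs the same long-to-wide reshaping as the first spread() example, using the names_from and values_from arguments in place of key and value. The column order of the result may differ from that of spread(); the variable name wide is illustrative.

```r
library(tidyr)

data_frame <- data.frame(
  col1 = c("A", "A", "A", "A", "A", "A",
           "B", "B", "B", "B", "B", "B"),
  col2 = c("Eng", "Phy", "Chem", "MAQ", "Bio", "SST",
           "Eng", "Phy", "Chem", "MAQ", "Bio", "SST"),
  col3 = c(34, 56, 46, 23, 72, 67, 89, 43, 88, 45, 78, 99)
)

# names_from: column whose values become the new column names.
# values_from: column whose values fill the new columns.
wide <- data_frame %>%
  pivot_wider(names_from = col2, values_from = col3)

print("Wide format using pivot_wider()")
print(wide)
```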

Method 5: Using mutate() method

The mutate() function is used to create new variables or modify existing ones in a data frame. Each new column is defined by a name and an expression, which may use constants and the values of other columns. The output data frame contains the original columns plus the new ones.

Syntax: mutate(new-col-name = expr)

Parameters:

new-col-name - Name of the column to be created.
expr - Expression whose result is stored in the new column.

Example:

We are creating a data frame with four columns and then using the mutate() function to add two new columns: col5 (the sum of col1 and col4) and col6 (col3 incremented by 1). The updated data frame with the new columns is then printed.

library(tidyverse)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

data_frame_mutate <- data_frame %>% 
  mutate(col5 = col1 + col4 ,
         col6 = col3+1)
         
print("Mutated Data Frame")
print(data_frame_mutate)
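Within a single mutate() call, a column can refer to another column created earlier in the same call, and conditional values can be built with if_else(). The following sketch illustrates both; the column names col6 and col1_size and the threshold of 4 are assumptions made for the example.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = c(9:16)
)

data_frame_derived <- data_frame %>%
  mutate(
    col5 = col1 + col4,
    # col5 was defined on the previous line and can be used here.
    col6 = col5 * 2,
    # Label each row according to a condition on col1.
    col1_size = if_else(col1 > 4, "high", "low")
  )

print("Data Frame with derived columns")
print(data_frame_derived)
```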

Method 6: Using group_by() and summarise() method

The group_by() and summarise() functions are used together: group_by() splits the data frame into groups based on the values of one or more columns, and summarise() reduces each group to a single row of summary values, such as a count or a mean.

Syntax: group_by(col, ...) %>% summarise(new-col-name = expr)

Example:

We are grouping the data frame by col3 and then using summarise() to calculate the count of rows and the mean of col1 within each group. The resulting summary table is then printed.

library(dplyr)

data_frame = data.frame(
  col1 = c(2,4,1,7,5,3,5,8),
  col2 = letters[1:8],
  col3 = c(0,1,1,1,0,0,0,0),
  col4 = c(9:16))

print("Data Frame")
print(data_frame)

data_frame_summary <- data_frame %>%
  group_by(col3) %>%
  summarise(
    count = n(),
    mean_col1 = mean(col1)
  )

print("Summarised Data Frame")
print(data_frame_summary)
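summarise() can compute several statistics per group in one call, and count() is a shortcut for the common case of counting rows per group. The sketch below assumes the same data frame; the summary column names and the use of .groups = "drop" (which returns an ungrouped result) are illustrative choices.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = c(9:16)
)

# Several summary statistics per col3 group.
summary_tbl <- data_frame %>%
  group_by(col3) %>%
  summarise(
    count = n(),
    min_col1 = min(col1),
    max_col1 = max(col1),
    total_col4 = sum(col4),
    .groups = "drop"
  )

# count() groups by col3 and counts rows in one step;
# the count is stored in a column named n.
counts <- data_frame %>% count(col3)

print("Summary per col3 group")
print(summary_tbl)
print("Row counts per col3 group")
print(counts)
```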

Method 7: Using the gather() method

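The gather() function from tidyr reshapes data from wide to long format by collapsing several columns into key-value pairs. The names of the collapsed columns are stored in a key column, and their values are stored in a value column. In current versions of tidyr, gather() is superseded by pivot_longer(), although it still works.

Syntax: gather(data, key, value, ...)
Parameters:
data - Data frame to reshape.
key - Name of the new column that holds the gathered column names.
value - Name of the new column that holds the gathered values.
... - Columns to gather, given by name or position.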

Example:

We are using the gather() function from the tidyr package to reshape the data frame from wide to long format. Columns Maths, Physics, and Chemistry (positions 2 to 4) are combined into two columns: "Subject" (holding the subject names) and "Marks" (holding the corresponding values).

library(tidyr)

data_frame = data.frame(
  col1 = c("Jack","Jill","Yash","Mallika",
           "Muskan","Keshav","Meenu","Sanjay"),
  Maths = c(26,47,14,73,65,83,95,48),
  Physics = c(24,53,45,88,68,35,78,24),
  Chemistry = c(67,23,79,67,33,66,25,78)
)

print("Data Frame")
print(data_frame)

data_frame_long <- data_frame %>%
  gather("Subject","Marks",2:4)

print("Gathered Data Frame")
print(data_frame_long)
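In current versions of tidyr, gather() is superseded by pivot_longer(). The following sketch performs the same wide-to-long reshaping, naming the columns to collapse with cols and the two new columns with names_to and values_to. Note that pivot_longer() orders the result row by row (all subjects for one student, then the next student), so the row order differs from that of gather(). The variable name long is illustrative.

```r
library(tidyr)

data_frame <- data.frame(
  col1 = c("Jack", "Jill", "Yash", "Mallika",
           "Muskan", "Keshav", "Meenu", "Sanjay"),
  Maths = c(26, 47, 14, 73, 65, 83, 95, 48),
  Physics = c(24, 53, 45, 88, 68, 35, 78, 24),
  Chemistry = c(67, 23, 79, 67, 33, 66, 25, 78)
)

# cols: the columns to collapse.
# names_to: new column holding the old column names.
# values_to: new column holding the old cell values.
long <- data_frame %>%
  pivot_longer(
    cols = Maths:Chemistry,
    names_to = "Subject",
    values_to = "Marks"
  )

print("Long format using pivot_longer()")
print(long)
```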

In this article, we explored how to transform and reshape data in R using arrange(), select(), filter(), mutate(), group_by() and summarise() from dplyr, and spread() and gather() from tidyr. These functions make it easier to manipulate and analyze data by changing its structure to suit different analysis needs.
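These functions are often chained with the pipe operator so that each step feeds the next. As a closing sketch, the pipeline below filters, adds a column, summarises by group, and sorts the result; the threshold, the column name col5, and the variable name result are assumptions made for illustration.

```r
library(dplyr)

data_frame <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col2 = letters[1:8],
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = c(9:16)
)

result <- data_frame %>%
  filter(col1 > 2) %>%                  # keep rows with col1 above 2
  mutate(col5 = col1 + col4) %>%        # add a derived column
  group_by(col3) %>%                    # group by col3
  summarise(
    count = n(),
    mean_col5 = mean(col5)
  ) %>%
  arrange(desc(mean_col5))              # largest mean first

print("Pipeline result")
print(result)
```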

yashchuahan
