How to Transform Data in R?
Last Updated: 25 Apr, 2025
Data transformation in R is commonly done with the tidyverse, a collection of packages that includes dplyr (for manipulating rows and columns) and tidyr (for reshaping data). This article walks through the most frequently used transformation functions from these packages.
Installing Required Packages
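Both tidyverse and dplyr can be installed with the install.packages() function. Installing tidyverse also installs dplyr and tidyr, so the second call is only needed if you want dplyr on its own.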
R
install.packages("tidyverse")
install.packages("dplyr")
Method 1: Using arrange() method
The arrange() function sorts the rows (observations) of a data frame by the values of one or more columns. It comes from dplyr, which is loaded as part of the tidyverse. By default, arrange() sorts in ascending order; wrap a column in desc() to sort it in descending order.
Syntax: arrange(col-name)
Parameter:
- col-name - Name of the column.
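As a rough sketch of how arrange() behaves with more than one sort key, the snippet below sorts by one column in ascending order and breaks ties with a second column in descending order. The data frame `scores` and its column names are hypothetical and are not part of the examples that follow.

```r
library(dplyr)

# Hypothetical example data (not taken from the article's examples)
scores <- data.frame(
  name  = c("a", "b", "c", "d"),
  group = c(2, 1, 2, 1),
  score = c(10, 30, 20, 40)
)

# Sort by group (ascending), then by score (descending) within each group
sorted <- scores %>% arrange(group, desc(score))
print(sorted)
```

Columns listed later in arrange() are only used to order rows that tie on the earlier columns.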
Example 1:
We are creating a data frame with numeric and character columns, then sorting it by the col1 values in ascending order using the arrange() function from the tidyverse package. We print both the original and the rearranged data frames.
R
library(tidyverse)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c(0,1,1,1,0,0,0,0))
rownames(data_frame) <- c("r1","r2","r3","r4","r5","r6","r7","r8")
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>% arrange(col1)
print("Arranged Data Frame")
print(arr_data_frame)
Output:
[Figure: Using arrange() function]
Example 2:
We are creating a data frame with numeric and character columns, then sorting it by col1 in descending order using the arrange() function from the tidyverse package. We print both the original and the rearranged data frames.
R
library(tidyverse)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c(0,1,1,1,0,0,0,0))
rownames(data_frame) <- c("r1","r2","r3","r4","r5","r6","r7","r8")
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>%
arrange(desc(col1))
print("Arranged Data Frame")
print(arr_data_frame)
Output:
[Figure: Using arrange() function]
Method 2: Using select() method
We will use the select() function (from dplyr, loaded with the tidyverse) to fetch columns in the specified order. It returns a subset of the data frame containing only the selected columns.
Syntax: select(list-of-col-names)
Parameter:
- list-of-col-names - List of column names separated by comma.
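Beyond listing names, select() also accepts a few other selection forms. The sketch below assumes dplyr 1.0.0 or later (for where()) and uses a small hypothetical data frame `df`; it shows dropping a column, selecting by a predicate, and reordering.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(
  col1 = c(2, 4, 1),
  col2 = letters[1:3],
  col3 = c(0, 1, 1),
  col4 = 9:11
)

# Drop one column by negating it
no_col1 <- df %>% select(-col1)

# Keep only the numeric columns using a predicate function
numeric_only <- df %>% select(where(is.numeric))

# Columns are returned in the order they are listed
reordered <- df %>% select(col4, col2)

print(names(no_col1))
print(names(numeric_only))
print(names(reordered))
```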
Example 1:
We are creating a data frame with four columns (col1, col2, col3, col4), then using the select() function from the tidyverse package to select only the col2 and col4 columns. The result is a subset of the original data frame, which is then printed.
R
library(tidyverse)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c(0,1,1,1,0,0,0,0),
col4 = c(9:16))
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>%
select(col2,col4)
print("Selecting col2 and col4 in Data Frame")
print(arr_data_frame)
Output:
[Figure: Using select() function]
Example 2:
We are creating a data frame with four columns (col1, col2, col3, col4), then using the select() function from the tidyverse package to select the columns from col2 to col4. The result is a subset of the original data frame containing only the selected columns, which is then printed.
R
library(tidyverse)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c(0,1,1,1,0,0,0,0),
col4 = c(9:16))
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>%
select(col2:col4)
print("Selecting col2 to col4 in Data Frame")
print(arr_data_frame)
Output:
[Figure: Using select() function]
Method 3: Using filter() method
The filter() function (from dplyr, loaded with the tidyverse) keeps only the rows of a data frame that satisfy one or more conditions on the column values. Conditions are written with comparison and logical operators such as >, ==, %in%, & and |. When several conditions are separated by commas, a row is kept only if all of them are true.
Syntax: filter(cond1, cond2)
Parameter:
- cond1, cond2 - Condition to be applied on data.
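The following sketch contrasts comma-separated conditions (combined with AND) with the | operator (OR), and shows that rows where a condition evaluates to NA are dropped. The data frame `df` is hypothetical and deliberately includes a missing value.

```r
library(dplyr)

# Hypothetical example data with one missing value in col1
df <- data.frame(
  col1 = c(2, 4, 1, 7, 5, NA),
  col3 = c(0, 1, 1, 1, 0, 1)
)

# Comma-separated conditions: both must be TRUE
both <- df %>% filter(col1 > 3, col3 == 1)

# OR: at least one condition must be TRUE
either <- df %>% filter(col1 > 4 | col3 == 0)

print(both)
print(either)
```

The last row never appears in either result, because a comparison with NA yields NA rather than TRUE.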
Example 1:
We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to select rows where the value of col1 is greater than 4. The filtered data frame is then printed.
R
library(tidyverse)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c(0,1,1,1,0,0,0,0),
col4 = c(9:16))
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>%
filter(col1>4)
print("Selecting col1 >4 ")
print(arr_data_frame)
Output:
[Figure: Using filter() function]
Example 2:
We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to select rows where col3 is either "there" or "this" using the %in% operator. The filtered data frame is then printed.
R
library(tidyverse)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c("this","that","there","here","there","this","that","here"),
col4 = c(9:16))
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>%
filter(col3 %in% c("there", "this"))
print("Selecting rows where col3 is there or this")
print(arr_data_frame)
Output:
[Figure: Using filter() function]
Example 3:
We are creating a data frame with four columns (col1, col2, col3, col4), then using the filter() function from the tidyverse package to select rows where col3 is "there" and col1 is 5. The filtered data frame is then printed.
R
library(tidyverse)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c("this","that","there","here","there","this","that","here"),
col4 = c(9:16))
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>%
filter(col3=="there",col1==5)
print("Selecting col3 value is there and col1 is 5")
print(arr_data_frame)
Output:
[Figure: Using filter() function]
Method 4: Using spread() method
The spread() function from the tidyr package spreads a key-value pair across multiple columns, converting data from long to wide format: each distinct value in the key column becomes a new column, filled with the corresponding entries of the value column. This often makes the data easier to read. In current versions of tidyr, spread() is superseded by pivot_wider(), although it still works.
Syntax: spread(key, value)
Parameters:
- key - Column whose distinct values become the new column names.
- value - Column whose values fill the new columns.
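Since spread() is superseded, it may help to see it next to its replacement. The sketch below, which assumes tidyr 1.0.0 or later and uses a small hypothetical data frame `long`, performs the same long-to-wide reshape with both functions.

```r
library(tidyr)

# Hypothetical long-format data: one row per student/subject pair
long <- data.frame(
  student = c("A", "A", "B", "B"),
  subject = c("Eng", "Phy", "Eng", "Phy"),
  marks   = c(34, 56, 89, 43)
)

# Older interface: key column first, then value column
wide_spread <- spread(long, subject, marks)

# Newer interface: the roles of the columns are named explicitly
wide_pivot <- pivot_wider(long, names_from = subject, values_from = marks)

print(wide_spread)
print(wide_pivot)
```

Both calls produce one row per student with one column per subject; pivot_wider() mainly differs in making the arguments self-describing.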
Example 1:
We are creating a data frame with three columns (col1, col2, col3), then using the spread() function from the tidyr package to reshape the data by spreading the col2 values into individual columns and filling them with the corresponding col3 values. The reshaped data frame is then printed.
R
library(tidyr)
data_frame = data.frame(
col1 = c("A","A","A","A","A","A",
"B","B","B","B","B","B"),
col2 = c("Eng","Phy","Chem","MAQ","Bio","SST",
"Eng","Phy","Chem","MAQ","Bio","SST"),
col3 = c(34,56,46,23,72,67,89,43,88,45,78,99)
)
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>%
spread(col2,col3)
print("Spread using col2 and col3")
print(arr_data_frame)
Output:
[Figure: Using spread() function]
Example 2:
We are creating a data frame with three columns (col1, col2, col3), then using the spread() function to reshape the data by turning the unique values of col1 ("A" and "B") into separate columns and filling them with the corresponding values from col3, using col2 as the row identifier. The transformed data frame is then printed.
R
library(tidyr)
data_frame = data.frame(
col1 = c("A","A","A","A","A","A",
"B","B","B","B","B","B"),
col2 = c("Eng","Phy","Chem","MAQ","Bio","SST",
"Eng","Phy","Chem","MAQ","Bio","SST"),
col3 = c(34,56,46,23,72,67,89,43,88,45,78,99)
)
print("Data Frame")
print(data_frame)
arr_data_frame <- data_frame %>%
spread(col1,col3)
print("Spread using col1 and col3")
print(arr_data_frame)
Output:
[Figure: Using spread() function]
Method 5: Using mutate() method
The mutate() function is used to create new variables or modify existing ones in a data frame. Each new column is given a name and an expression, which can use constants or the values of other columns. The output data frame contains the original columns plus the newly created ones.
Syntax: mutate(new-col-name = expr)
Parameters:
- new-col-name - Name of column to be created.
- expr - Expression whose result is stored in the new column.
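One detail worth illustrating is that expressions inside a single mutate() call are evaluated in order, so a later expression can refer to a column created earlier in the same call, and an existing column can be overwritten by reusing its name. This is a minimal sketch with a hypothetical data frame `df` and hypothetical column names `total` and `share`.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(col1 = c(2, 4, 1), col4 = c(9, 10, 11))

out <- df %>%
  mutate(
    total = col1 + col4,   # new column
    share = col1 / total,  # refers to the column created just above
    col1  = col1 * 10      # reusing an existing name overwrites that column
  )

print(out)
```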
Example:
We are creating a data frame with four columns and then using the mutate() function to add two new columns: col5 (the sum of col1 and col4) and col6 (col3 incremented by 1). The updated data frame with the new columns is then printed.
R
library(tidyverse)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c(0,1,1,1,0,0,0,0),
col4 = c(9:16))
print("Data Frame")
print(data_frame)
data_frame_mutate <- data_frame %>%
mutate(col5 = col1 + col4 ,
col6 = col3+1)
print("Mutated Data Frame")
print(data_frame_mutate)
Output:
[Figure: Using mutate() function]
Method 6: Using group_by() and summarise() method
The group_by() and summarise() functions are used together: group_by() splits the data frame into groups based on the values of one or more columns, and summarise() reduces each group to a single row of summary values such as a count or a mean.
Syntax: group_by(col-name)
Syntax: group_by(col,..) %>% summarise(action)
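As an illustration of the pattern above, the sketch below computes several summaries per group in one summarise() call. It assumes dplyr 1.0.0 or later for the `.groups` argument, and the summary column names (`count`, `total`, `max_col1`) are chosen here for illustration only.

```r
library(dplyr)

# Small data frame for illustration
df <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0)
)

summary_tbl <- df %>%
  group_by(col3) %>%
  summarise(
    count    = n(),        # number of rows in each group
    total    = sum(col1),  # sum of col1 in each group
    max_col1 = max(col1),  # largest col1 in each group
    .groups  = "drop"      # return an ungrouped result
  )

print(summary_tbl)
```

Setting `.groups = "drop"` is a design choice that avoids carrying grouping into later steps by accident.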
Example:
We are grouping the data frame by col3 and then using summarise() to calculate the count of rows and the mean of col1 within each group. The resulting summary table is then printed.
R
library(dplyr)
data_frame = data.frame(
col1 = c(2,4,1,7,5,3,5,8),
col2 = letters[1:8],
col3 = c(0,1,1,1,0,0,0,0),
col4 = c(9:16))
print("Data Frame")
print(data_frame)
# Group by col3, then compute the row count and mean of col1 per group
data_frame_summary <- data_frame %>%
  group_by(col3) %>%
  summarise(
    count = n(),
    mean_col1 = mean(col1)
  )
print("Summarised Data Frame")
print(data_frame_summary)
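Output:
[Figure: Using group_by() and summarise() functions]
Method 7: Using the gather() method
The gather() function from the tidyr package reshapes data from wide to long format by collapsing several columns into key-value pairs: the names of the gathered columns are stored in a key column (for example "Subject") and their values in a value column. In current versions of tidyr, gather() is superseded by pivot_longer(), although it still works.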
Syntax: gather(data, key, value)
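Because gather() is superseded, the sketch below shows it side by side with pivot_longer(), assuming tidyr 1.0.0 or later. The two-row data frame `wide` is a cut-down illustration; note that the two functions return the same values but in a different row order.

```r
library(tidyr)

# Small wide-format data frame for illustration
wide <- data.frame(
  name    = c("Jack", "Jill"),
  Maths   = c(26, 47),
  Physics = c(24, 53)
)

# Older interface: key name, value name, then the columns to gather
long_gather <- gather(wide, "Subject", "Marks", Maths, Physics)

# Newer interface with explicitly named arguments
long_pivot <- pivot_longer(
  wide,
  cols      = c(Maths, Physics),
  names_to  = "Subject",
  values_to = "Marks"
)

print(long_gather)  # all Maths rows first, then all Physics rows
print(long_pivot)   # rows kept together per person
```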
Example:
We are using the gather() function from the tidyr package to reshape the data frame from wide to long format. The columns Maths, Physics, and Chemistry are combined into two columns: "Subject" (holding the subject names) and "Marks" (holding the corresponding values).
R
library(tidyr)  # gather() is provided by tidyr, not dplyr
data_frame = data.frame(col1 =
c("Jack","Jill","Yash","Mallika",
"Muskan","Keshav","Meenu","Sanjay"),
Maths = c(26,47,14,73,65,83,95,48),
Physics = c(24,53,45,88,68,35,78,24),
Chemistry = c(67,23,79,67,33,66,25,78)
)
print("Data Frame")
print(data_frame)
# Collapse columns 2 to 4 into Subject/Marks pairs
data_frame_long <- data_frame %>%
  gather("Subject","Marks",2:4)
print("Gathered Data Frame")
print(data_frame_long)
Output:
[Figure: Using gather() function]
In this article, we explored how to reshape and transform data in R using functions like arrange(), select(), filter(), spread(), mutate(), group_by(), summarise(), and gather() from the dplyr and tidyr packages of the tidyverse. These functions make it easier to manipulate and analyze data efficiently by changing its structure to suit different analysis needs.
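In practice these functions are usually chained with the pipe operator rather than used one at a time. The following is a minimal sketch of such a pipeline on a small data frame; the derived column names (`col5`, `mean_col5`) are chosen here for illustration.

```r
library(dplyr)

# Small data frame for illustration
df <- data.frame(
  col1 = c(2, 4, 1, 7, 5, 3, 5, 8),
  col3 = c(0, 1, 1, 1, 0, 0, 0, 0),
  col4 = 9:16
)

result <- df %>%
  filter(col1 > 2) %>%                                   # keep rows with col1 above 2
  mutate(col5 = col1 + col4) %>%                         # add a derived column
  group_by(col3) %>%                                     # group by col3
  summarise(mean_col5 = mean(col5), .groups = "drop") %>%  # one row per group
  arrange(desc(mean_col5))                               # largest mean first

print(result)
```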