How to split a big dataframe into smaller ones in R?
Last Updated: 23 Sep, 2022
In this article, we are going to learn how to split and write very large data frames into slices in the R programming language.
Introduction
Large data frames are hard to work with, so it often helps to split them into several smaller ones. In R, the split() function is the usual tool for this task. The steps below walk through the process.
Stepwise Implementation:
Step 1: Let's take a data frame on which we are going to apply the split operation to break it into small chunks.
P Q R
SP1 2012-01 123
SP2 2022-01 143
SP3 2022-01 342
SP1 2022-02 542
SP2 2022-02 876
SP3 2022-02 982
SP1 2022-03 884
SP2 2022-03 936
SP3 2022-03 987
Step 2: Next, we need to load this data as a table, and for that we will use the read.table() function. read.table() reads data from a text file (or, through its text argument, from a character string) and returns it as a data frame. It supports various arguments, such as the file name, header, and sep.
Syntax: read.table(filename, header = FALSE, sep = "")
Parameters:
header: indicates whether the file contains a header row.
sep: the delimiter that separates fields in the file.
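As a rough illustration of the header and sep parameters, the sketch below writes a small comma-separated file to a temporary location and reads it back. The file contents and the names tmp and df_file are made up for this example.

```r
# Write a small comma-separated file to a temporary location
# (hypothetical data, used only for illustration)
tmp <- tempfile(fileext = ".csv")
writeLines(c("P,Q,R",
             "SP1,2022-01,123",
             "SP2,2022-01,143"), tmp)

# header = TRUE treats the first line as column names;
# sep = "," splits fields on commas instead of whitespace
df_file <- read.table(tmp, header = TRUE, sep = ",")
print(df_file)
```

With the default sep = "", fields are separated by any whitespace, which is why the space-separated data in this article needs no explicit sep.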
R
# Read the data into a data frame
df <- read.table(text =
"P Q R
SP1 2012-01 123
SP2 2022-01 143
SP3 2022-01 342
SP1 2022-02 542
SP2 2022-02 876
SP3 2022-02 982
SP1 2022-03 884
SP2 2022-03 936
SP3 2022-03 987",
header = TRUE)
# Printing original data frame
print(df)
Output:
P Q R
1 SP1 2012-01 123
2 SP2 2022-01 143
3 SP3 2022-01 342
4 SP1 2022-02 542
5 SP2 2022-02 876
6 SP3 2022-02 982
7 SP1 2022-03 884
8 SP2 2022-03 936
9 SP3 2022-03 987
Step 3: In this step, we will split the data frame into smaller ones using the split() function. It is a built-in R function that divides a vector or data frame into groups defined by a factor.
Syntax: split(x, f, drop = FALSE)
Parameters:
x: represents data vector or data frame
f: represents factor to divide the data
drop: represents logical value which indicates if levels that do not occur should be dropped
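To make the drop parameter concrete, here is a minimal sketch. The vector x and the factor f are made-up illustration values, not part of the data used in this article; f deliberately has a level "c" that never occurs.

```r
# Hypothetical values: a numeric vector and a factor with an unused level "c"
x <- c(10, 20, 30, 40)
f <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))

# Default (drop = FALSE): the empty group "c" is kept in the result
groups_all <- split(x, f)
print(names(groups_all))

# drop = TRUE: levels that do not occur are dropped from the result
groups_used <- split(x, f, drop = TRUE)
print(names(groups_used))
```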
We can create new data frames based on the contents of any column, such as Q or P. Here we split on column Q. The code below calls split() and stores the result in df1, which is a named list containing one data frame for each distinct value of Q, and then prints it.
R
df1 <- split(df, df$Q)
# Print the split data frames
print(df1)
Output:
$`2012-01`
P Q R
1 SP1 2012-01 123
$`2022-01`
P Q R
2 SP2 2022-01 143
3 SP3 2022-01 342
$`2022-02`
P Q R
4 SP1 2022-02 542
5 SP2 2022-02 876
6 SP3 2022-02 982
$`2022-03`
P Q R
7 SP1 2022-03 884
8 SP2 2022-03 936
9 SP3 2022-03 987
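Because split() returns a named list, each smaller data frame can be pulled out by name. The following sketch rebuilds the same data so that it runs on its own; the names feb and rows_per_group are introduced only for this example.

```r
# Rebuild the example data so this block is self-contained
df <- read.table(text =
"P Q R
SP1 2012-01 123
SP2 2022-01 143
SP3 2022-01 342
SP1 2022-02 542
SP2 2022-02 876
SP3 2022-02 982
SP1 2022-03 884
SP2 2022-03 936
SP3 2022-03 987",
header = TRUE)

df1 <- split(df, df$Q)

# The list names are the distinct values of column Q
print(names(df1))

# Extract one slice by name with [[ ]]
feb <- df1[["2022-02"]]
print(feb)

# Count the rows in every slice
rows_per_group <- sapply(df1, nrow)
print(rows_per_group)
```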
Step 4: In this step, we will split the same data frame on the contents of column P and store the result in df2. The code below calls split() and prints df2, which is a named list containing one data frame for each distinct value of P.
R
df2 <- split(df, df$P)
# Print the split data frames
print(df2)
Output:
$SP1
P Q R
1 SP1 2012-01 123
4 SP1 2022-02 542
7 SP1 2022-03 884
$SP2
P Q R
2 SP2 2022-01 143
5 SP2 2022-02 876
8 SP2 2022-03 936
$SP3
P Q R
3 SP3 2022-01 342
6 SP3 2022-02 982
9 SP3 2022-03 987
We can see from the output that the rows for SP1, SP2, and SP3 now sit in separate data frames. This is how split() breaks a large data frame into smaller ones.
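When there is no natural grouping column, one possible approach is to split by row count and write each slice to its own file. The sketch below is only an illustration under stated assumptions: big_df is a small made-up stand-in for a large data frame, chunk_size is an arbitrary choice, and the files go to a temporary directory.

```r
# Hypothetical stand-in for a large data frame: 10 rows
big_df <- data.frame(id = 1:10, value = (1:10) * 1.5)

# Assumed number of rows per slice
chunk_size <- 4

# Build a grouping vector 1,1,1,1,2,2,2,2,3,3 and split on it
chunk_id <- ceiling(seq_len(nrow(big_df)) / chunk_size)
chunks <- split(big_df, chunk_id)

# Write each slice to its own CSV file in a temporary directory
out_dir <- tempdir()
out_files <- file.path(out_dir, paste0("chunk_", names(chunks), ".csv"))
for (i in seq_along(chunks)) {
  write.csv(chunks[[i]], out_files[i], row.names = FALSE)
}

# Number of rows in each slice
print(sapply(chunks, nrow))
```

The last slice holds whatever rows remain, so it can be shorter than chunk_size.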