How to select a subset of a DataFrame in R
Last Updated: 12 Jul, 2022
When working with large dataframes, we are usually interested in only a small portion of the data for analysis rather than every row and column in the dataframe.
Creation of Sample Dataset
Let's create a sample dataframe of students as follows:
R
# Build a sample data frame of ten students
# (note: the assignment operator is `<-` with no space inside it)
student_details <- data.frame(
  stud_id   = 1:10,
  stud_name = c("Anu", "Abhi", "Bob",
                "Charan", "Chandu",
                "Daniel", "Girish", "Harish",
                "Pandit", "Suchith"),
  age       = c(18, 19, 17, 18, 19, 15, 21,
                16, 15, 17),
  section   = c(1, 2, 1, 2, 1, 1, 2, 1,
                2, 1)
)
print(student_details)
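Output:
   stud_id stud_name age section
1        1       Anu  18       1
2        2      Abhi  19       2
3        3       Bob  17       1
4        4    Charan  18       2
5        5    Chandu  19       1
6        6    Daniel  15       1
7        7    Girish  21       2
8        8    Harish  16       1
9        9    Pandit  15       2
10      10   Suchith  17       1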
Method 1. Using Index Slicing
This method is used when the analyst already knows the positions of the rows and columns to extract from the main dataset. The position numbers of the rows and columns are called indices.
Syntax: dataframe[rows,columns]
Example: Create a subset containing the first five rows and the second and fourth columns of the dataframe.
R
# Rows 1 to 5, columns 2 (stud_name) and 4 (section)
subset_1 <- student_details[1:5, c(2, 4)]
print(subset_1)
Output:
  stud_name section
1       Anu       1
2      Abhi       2
3       Bob       1
4    Charan       2
5    Chandu       1
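Index slicing is not limited to positive positions. The sketch below shows a few common variants: selecting columns by name, dropping a column with a negative index, and keeping a single-column result as a dataframe with drop = FALSE. It rebuilds the sample dataframe so that it runs on its own; the names first_five, no_id, names_vec and names_df are illustrative and not part of the original example.

```r
# Rebuild the sample data frame so this block runs on its own
student_details <- data.frame(
  stud_id   = 1:10,
  stud_name = c("Anu", "Abhi", "Bob", "Charan", "Chandu",
                "Daniel", "Girish", "Harish", "Pandit", "Suchith"),
  age       = c(18, 19, 17, 18, 19, 15, 21, 16, 15, 17),
  section   = c(1, 2, 1, 2, 1, 1, 2, 1, 2, 1)
)

# Rows 1 to 5, with columns chosen by name instead of position
first_five <- student_details[1:5, c("stud_name", "section")]

# A negative index drops a column: everything except stud_id
no_id <- student_details[, -1]

# Selecting a single column returns a vector by default;
# drop = FALSE keeps the result as a data frame
names_vec <- student_details[1:5, 2]
names_df  <- student_details[1:5, 2, drop = FALSE]

print(first_five)
print(names(no_id))
print(is.data.frame(names_vec))
print(is.data.frame(names_df))
```

Selecting by name tends to be more robust than selecting by position, since the code keeps working if the column order changes.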
Method 2. Using subset() function
The subset() function is used when the analyst knows the column names and wants to derive a subset by imposing a condition on the rows, optionally keeping only some of the columns. Because columns are referred to by name rather than by position, it is usually easier to read than index slicing.
Syntax: subset(dataframe, row_condition, select = columns)
Example: Extract the names of students belonging to section 1
R
# Rows where section is 1, keeping only the stud_name column
subset_2 <- subset(student_details, section == 1, select = stud_name)
print(subset_2)
Output:
   stud_name
1        Anu
3        Bob
5     Chandu
6     Daniel
8     Harish
10   Suchith
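The row condition passed to subset() can combine several tests, and select can keep more than one column. The following is a minimal sketch under that assumption; it rebuilds the sample dataframe so that it runs on its own, and the names older_sec1 and older_sec1_idx are illustrative. The index-slicing version is included only to show what the same selection looks like without subset().

```r
# Rebuild the sample data frame so this block runs on its own
student_details <- data.frame(
  stud_id   = 1:10,
  stud_name = c("Anu", "Abhi", "Bob", "Charan", "Chandu",
                "Daniel", "Girish", "Harish", "Pandit", "Suchith"),
  age       = c(18, 19, 17, 18, 19, 15, 21, 16, 15, 17),
  section   = c(1, 2, 1, 2, 1, 1, 2, 1, 2, 1)
)

# Students in section 1 who are at least 17, keeping name and age
older_sec1 <- subset(student_details,
                     section == 1 & age >= 17,
                     select = c(stud_name, age))

# The same selection written with index slicing
keep_rows <- student_details$section == 1 & student_details$age >= 17
older_sec1_idx <- student_details[keep_rows, c("stud_name", "age")]

print(older_sec1)
print(all.equal(older_sec1, older_sec1_idx))
```

The R documentation describes subset() as a convenience function intended for interactive use, so inside functions or packages the bracket form is generally the safer choice.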
Method 3. Using dplyr package functions
The filter() function from the dplyr package keeps the rows of a dataframe that satisfy a specific condition.
The condition is written using column names, and several conditions can be combined. filter() works on rows only; to choose columns as well, it is typically paired with other dplyr functions such as select(). Many analysts find this the most convenient of the three methods, especially when several steps are chained together.
Syntax: filter(dataframe,condition)
Note: Make sure the dplyr package is installed and loaded in your R session, using the following commands:
install.packages("dplyr") - to install the package (needed only once)
library(dplyr) - to load the package (needed in every session)
Example: Let's extract rows that contain student names starting with the letter C.
R
library(dplyr)

# Keep only the rows whose stud_name starts with "C"
subset_3 <- filter(student_details,
                   startsWith(stud_name, "C"))
print(subset_3)
Output:
  stud_id stud_name age section
1       4    Charan  18       2
2       5    Chandu  19       1
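Since filter() only selects rows, a subset on both rows and columns is usually built by chaining it with select(). The sketch below assumes dplyr is already installed; it rebuilds the sample dataframe so that it runs on its own, and the name c_students is illustrative. stringsAsFactors = FALSE is added only so that startsWith() also receives character data on R versions before 4.0.

```r
library(dplyr)  # assumes the dplyr package is installed

# Rebuild the sample data frame so this block runs on its own
student_details <- data.frame(
  stud_id   = 1:10,
  stud_name = c("Anu", "Abhi", "Bob", "Charan", "Chandu",
                "Daniel", "Girish", "Harish", "Pandit", "Suchith"),
  age       = c(18, 19, 17, 18, 19, 15, 21, 16, 15, 17),
  section   = c(1, 2, 1, 2, 1, 1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# filter() keeps rows: names starting with "C" AND section equal to 1
# select() keeps columns: only stud_name and age
c_students <- student_details %>%
  filter(startsWith(stud_name, "C"), section == 1) %>%
  select(stud_name, age)

print(c_students)
```

Conditions separated by commas inside filter() are combined with a logical AND, so only rows that satisfy every condition are kept.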