Execute SQL queries on a dataframe using R
Last Updated: 24 Apr, 2025
In R, the sqldf package lets us execute SQL queries on a data frame, which is useful for data manipulation tasks that are easier to express in SQL syntax. Queries are written as strings and applied directly to data frames, so we can perform operations such as filtering, sorting, aggregation, joining, and more. This article walks through each of these operations.
To use the sqldf package, you first need to install it with the command below:
install.packages("sqldf")
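If the script may run more than once, it can help to install the package only when it is missing and then confirm that a query runs. The following is a minimal sketch; the data frame name tiny is purely illustrative and not part of the original example.

```r
# Install sqldf only if it is not already available, then load it
if (!requireNamespace("sqldf", quietly = TRUE)) {
  install.packages("sqldf")
}
library(sqldf)

# Smoke test: count the rows of a small throwaway data frame
# ("tiny" is a hypothetical name used only for this check)
tiny <- data.frame(x = c(1, 2, 3))
row_count <- sqldf("SELECT COUNT(*) AS n FROM tiny")
print(row_count)
```

If the query returns a one-row data frame, the package is installed and working.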
Now let's see some common SQL operations we can perform on a data frame using the sqldf package:
- Filtering
- Sorting
- Aggregation
- Joining
- Grouping
- Subsetting
- Updating
- Deleting
We will perform the operations above on the data frame below.
R
# Create vectors for different columns
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")
# Create a dataframe
df <- data.frame(
id = ids,
name = names,
year_of_exp = years_of_exp,
role = roles,
stringsAsFactors = FALSE # Prevent strings from being converted to factors
)
df
Output:
id name year_of_exp role
1 1 Alice 5 Engineer
2 2 Bob 8 Manager
3 3 Charlie 3 Analyst
4 4 David 10 Director
5 5 Eve 6 Developer
Filtering
In the example below, we select the rows where year_of_exp is greater than 5.
R
library(sqldf)
# Create vectors for different columns
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")
# Create a dataframe
df <- data.frame(
id = ids,
name = names,
year_of_exp = years_of_exp,
role = roles,
stringsAsFactors = FALSE # Prevent strings from being converted to factors
)
result <- sqldf("SELECT * FROM df WHERE year_of_exp > 5")
print(result)
Output:
id name year_of_exp role
1 2 Bob 8 Manager
2 4 David 10 Director
3 5 Eve 6 Developer
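Filters can also combine several conditions. The sketch below joins two conditions with AND and, for comparison, writes the same filter in base R. The second condition (excluding managers) is an illustrative assumption, not part of the original example.

```r
library(sqldf)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  year_of_exp = c(5, 8, 3, 10, 6),
  role = c("Engineer", "Manager", "Analyst", "Director", "Developer"),
  stringsAsFactors = FALSE
)

# Two conditions joined with AND: more than 5 years of experience
# and a role other than Manager
sql_result <- sqldf(
  "SELECT * FROM df WHERE year_of_exp > 5 AND role <> 'Manager'"
)

# The same filter written with base R logical indexing
base_result <- df[df$year_of_exp > 5 & df$role != "Manager", ]

print(sql_result)
print(base_result)
```

Both approaches should keep the same rows. The SQL result gets fresh row numbers, while the base R subset keeps the original ones.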
Sorting
In this example, we order the data frame by year_of_exp in descending order.
R
library(sqldf)
# Create vectors for different columns
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")
# Create a dataframe
df <- data.frame(
id = ids,
name = names,
year_of_exp = years_of_exp,
role = roles,
stringsAsFactors = FALSE # Prevent strings from being converted to factors
)
result <- sqldf("SELECT * FROM df ORDER BY year_of_exp DESC")
print(result)
Output:
id name year_of_exp role
1 4 David 10 Director
2 2 Bob 8 Manager
3 5 Eve 6 Developer
4 1 Alice 5 Engineer
5 3 Charlie 3 Analyst
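Sorting is often combined with LIMIT to keep only the top few rows. The sketch below returns the three most experienced employees. The secondary sort on name is an assumption added only to make the order of ties predictable.

```r
library(sqldf)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  year_of_exp = c(5, 8, 3, 10, 6),
  role = c("Engineer", "Manager", "Analyst", "Director", "Developer"),
  stringsAsFactors = FALSE
)

# Sort by experience (highest first), break ties by name,
# and keep only the first three rows
top3 <- sqldf(
  "SELECT name, year_of_exp
   FROM df
   ORDER BY year_of_exp DESC, name ASC
   LIMIT 3"
)
print(top3)
```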
Aggregation
In this example, we calculate the average years of experience across all rows.
R
library(sqldf)
# Create vectors for different columns
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")
# Create a dataframe
df <- data.frame(
id = ids,
name = names,
year_of_exp = years_of_exp,
role = roles,
stringsAsFactors = FALSE # Prevent strings from being converted to factors
)
result <- sqldf("SELECT AVG(year_of_exp) AS avg_exp FROM df")
print(result)
Output:
avg_exp
1 6.4
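Several aggregate functions can be computed in a single query. As a sketch, the query below returns the row count together with the minimum, maximum, and total years of experience; the column aliases are illustrative choices.

```r
library(sqldf)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  year_of_exp = c(5, 8, 3, 10, 6),
  role = c("Engineer", "Manager", "Analyst", "Director", "Developer"),
  stringsAsFactors = FALSE
)

# One row of summary statistics for the whole data frame
summary_stats <- sqldf(
  "SELECT COUNT(*) AS n,
          MIN(year_of_exp) AS min_exp,
          MAX(year_of_exp) AS max_exp,
          SUM(year_of_exp) AS total_exp
   FROM df"
)
print(summary_stats)
```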
Joining
In this example, we combine data from two data frames based on a common column, id.
R
library(sqldf)
# Create vectors for different columns
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")
# Create a dataframe
df <- data.frame(
id = ids,
name = names,
year_of_exp = years_of_exp,
role = roles,
stringsAsFactors = FALSE # Prevent strings from being converted to factors
)
df2 <- data.frame(id = c(1, 2, 3, 4, 5), salary = c(50000, 60000, 70000, 80000, 90000))
result <- sqldf("SELECT df.*, df2.salary FROM df LEFT JOIN df2 ON df.id = df2.id")
print(result)
Output:
id name year_of_exp role salary
1 1 Alice 5 Engineer 50000
2 2 Bob 8 Manager 60000
3 3 Charlie 3 Analyst 70000
4 4 David 10 Director 80000
5 5 Eve 6 Developer 90000
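In the example above every id has a match, so a LEFT JOIN and an INNER JOIN would return the same rows. The sketch below uses a hypothetical salaries data frame that covers only some employees (and one unknown id) to show the difference.

```r
library(sqldf)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  year_of_exp = c(5, 8, 3, 10, 6),
  role = c("Engineer", "Manager", "Analyst", "Director", "Developer"),
  stringsAsFactors = FALSE
)

# Hypothetical salary table: ids 4 and 5 are missing, id 6 has no employee
salaries <- data.frame(
  id = c(1, 2, 3, 6),
  salary = c(50000, 60000, 70000, 65000)
)

# LEFT JOIN keeps every row of df; unmatched salaries become NA
left_res <- sqldf(
  "SELECT df.id, df.name, salaries.salary
   FROM df LEFT JOIN salaries ON df.id = salaries.id
   ORDER BY df.id"
)

# INNER JOIN keeps only the rows present in both data frames
inner_res <- sqldf(
  "SELECT df.id, df.name, salaries.salary
   FROM df INNER JOIN salaries ON df.id = salaries.id
   ORDER BY df.id"
)

print(left_res)
print(inner_res)
```

The LEFT JOIN result has five rows with NA salaries for David and Eve, while the INNER JOIN result has only the three matched rows.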
Grouping
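In this example, we group the rows by role and calculate the average years of experience for each role. Because each role appears only once in this data frame, each group average equals that employee's own value.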
R
library(sqldf)
# Create vectors for different columns
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")
# Create a dataframe
df <- data.frame(
id = ids,
name = names,
year_of_exp = years_of_exp,
role = roles,
stringsAsFactors = FALSE # Prevent strings from being converted to factors
)
result <- sqldf("SELECT role, AVG(year_of_exp) AS avg_exp FROM df GROUP BY role")
print(result)
Output:
role avg_exp
1 Analyst 3
2 Developer 6
3 Director 10
4 Engineer 5
5 Manager 8
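Grouping is more informative when a group contains several rows. The sketch below assumes a hypothetical staff data frame with a department column (not part of the original data) and also shows HAVING, which filters groups after aggregation.

```r
library(sqldf)

# Hypothetical data: the department column is invented for illustration
staff <- data.frame(
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  department = c("Tech", "Ops", "Tech", "Ops", "Tech"),
  year_of_exp = c(5, 8, 3, 10, 6),
  stringsAsFactors = FALSE
)

# Headcount and average experience per department
by_dept <- sqldf(
  "SELECT department,
          COUNT(*) AS headcount,
          AVG(year_of_exp) AS avg_exp
   FROM staff
   GROUP BY department
   ORDER BY department"
)

# HAVING keeps only the groups with more than two members
large_depts <- sqldf(
  "SELECT department, COUNT(*) AS headcount
   FROM staff
   GROUP BY department
   HAVING COUNT(*) > 2"
)

print(by_dept)
print(large_depts)
```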
Subsetting
In this example, we select specific columns.
R
library(sqldf)
# Create vectors for different columns
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")
# Create a dataframe
df <- data.frame(
id = ids,
name = names,
year_of_exp = years_of_exp,
role = roles,
stringsAsFactors = FALSE # Prevent strings from being converted to factors
)
result <- sqldf("SELECT id, name FROM df")
print(result)
Output:
id name
1 1 Alice
2 2 Bob
3 3 Charlie
4 4 David
5 5 Eve
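Column selection can be combined with renaming and row filtering in one query. The following sketch renames name to employee with AS and keeps only two roles; the alias and the chosen roles are illustrative assumptions.

```r
library(sqldf)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  year_of_exp = c(5, 8, 3, 10, 6),
  role = c("Engineer", "Manager", "Analyst", "Director", "Developer"),
  stringsAsFactors = FALSE
)

# Keep two columns, rename one of them, and restrict the rows by role
subset_result <- sqldf(
  "SELECT name AS employee, role
   FROM df
   WHERE role IN ('Engineer', 'Developer')
   ORDER BY id"
)
print(subset_result)
```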
Updating
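In the example below, we update the year_of_exp column of the df data frame for the row with id = 1. sqldf does not modify df in place, so the update is expressed as a SELECT with a CASE expression, and the result is stored in a new data frame.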
R
library(sqldf)
# Create vectors for different columns
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")
# Create a dataframe
df <- data.frame(
id = ids,
name = names,
year_of_exp = years_of_exp,
role = roles,
stringsAsFactors = FALSE # Prevent strings from being converted to factors
)
# Build a new data frame with year_of_exp incremented for id = 1.
# Columns are listed explicitly so year_of_exp appears only once.
df2 <- sqldf("SELECT id, name,
CASE
WHEN id = 1 THEN year_of_exp + 1
ELSE year_of_exp
END AS year_of_exp,
role
FROM df")
print(df2)
Output:
id name year_of_exp role
1 1 Alice 6 Engineer
2 2 Bob 8 Manager
3 3 Charlie 3 Analyst
4 4 David 10 Director
5 5 Eve 6 Developer
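A query of this kind returns a new data frame and leaves the original untouched, so the change has to be assigned back to persist. The sketch below shows this with a text column; the new role value is an invented example.

```r
library(sqldf)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  year_of_exp = c(5, 8, 3, 10, 6),
  role = c("Engineer", "Manager", "Analyst", "Director", "Developer"),
  stringsAsFactors = FALSE
)

# Change the role for id = 1; every other row keeps its current role
updated <- sqldf(
  "SELECT id, name, year_of_exp,
          CASE WHEN id = 1 THEN 'Senior Engineer' ELSE role END AS role
   FROM df"
)

# The original data frame has not been modified by the query
original_role <- df$role[df$id == 1]
print(original_role)

# Assign the result back to df to keep the change
df <- updated
print(df$role[df$id == 1])
```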
Deleting
In this example, we delete rows from the data frame. As with updating, this is done by selecting only the rows we want to keep.
R
library(sqldf)
# Sample dataframe
df <- data.frame(
id = c(1, 2, 3, 4, 5),
name = c("Alice", "Bob", "Charlie", "David", "Eve"),
year_of_exp = c(5, 8, 3, 10, 6),
role = c("Engineer", "Manager", "Analyst", "Director", "Developer")
)
# Filter out rows where year_of_exp is less than 5
df2 <- sqldf("SELECT * FROM df WHERE year_of_exp >= 5")
# Print the updated dataframe
print(df2)
Output:
id name year_of_exp role
1 1 Alice 5 Engineer
2 2 Bob 8 Manager
3 4 David 10 Director
4 5 Eve 6 Developer
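Rows can also be dropped by key rather than by a condition on a value. As a sketch, the query below removes the rows with id 3 and 5 using NOT IN; the ids chosen are arbitrary.

```r
library(sqldf)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  year_of_exp = c(5, 8, 3, 10, 6),
  role = c("Engineer", "Manager", "Analyst", "Director", "Developer"),
  stringsAsFactors = FALSE
)

# Keep every row except those with id 3 or 5
remaining <- sqldf("SELECT * FROM df WHERE id NOT IN (3, 5) ORDER BY id")
print(remaining)

# Compare row counts before and after the deletion
cat("Rows before:", nrow(df), "Rows after:", nrow(remaining), "\n")
```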