Open In App

Execute SQL queries on a dataframe using R

Last Updated : 28 Jun, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

In R Programming Language we can use the sqldf package in R to execute SQL queries on a data frame. This can be useful for performing various data manipulation tasks using SQL syntax. The sqldf package provides a way to write SQL queries as strings and apply them to a data frame, allowing us to perform operations such as filtering, sorting, aggregation, joining and more.

Installing the Required Packages

To use the sqldf package, we first need to install it using the install.packages() function.

R
install.packages("sqldf")

Applying SQL Queries using sqldf

We will be performing some SQL operations on a sample data frame using the sqldf in R programming language. To do that we will first create a sample data frame.

Creating a Sample Dataset

We will create a data frame df with four columns (id, name, year_of_exp and role) each containing respective vectors for five individuals. We will first define vectors for to store the values for all the columns. Then, we will combine these vectors into a data frame using the data.frame() function with stringsAsFactors = FALSE to prevent automatic conversion of character vectors to factors. The result will be a table of data representing the individuals' IDs, names, years of experience and job roles.

R
ids <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie", "David", "Eve")
years_of_exp <- c(5, 8, 3, 10, 6)
roles <- c("Engineer", "Manager", "Analyst", "Director", "Developer")

df <- data.frame(
  id = ids,
  name = names,
  year_of_exp = years_of_exp,
  role = roles,
  stringsAsFactors = FALSE  # Prevent strings from being converted to factors
)

df

Output:

df
Creating Dataset

1. Subsetting

We will execute an SQL query to select specific columns. The sqldf() function is used to execute this query in R programming language.

  • SELECT: Selects specific columns (id, name) from the df data frame.
R
library(sqldf)

result <- sqldf("SELECT id, name FROM df")
print(result)

Output:

subset
Subsetting

2. Sorting

We will execute an SQL query to order the data frame by year_of_exp in descending order.

  • ORDER BY: Orders the rows by the year_of_exp column in descending order (DESC).
R
library(sqldf)

result <- sqldf("SELECT * FROM df ORDER BY year_of_exp DESC")
print(result)

Output:

sort
Sorting

3. Aggregation

We will execute an SQL query to calculate the average years of experience.

  • AVG: Calculates the average value of the year_of_exp column and names it as avg_exp.
R
library(sqldf)

result <- sqldf("SELECT AVG(year_of_exp) AS avg_exp FROM df")
print(result)

Output:

aggregate
Aggregation

4. Joining

We will execute an SQL query to combine data from two data frames based on a common column.

  • LEFT JOIN: Combines rows from two data frames (df and df2) where matching id values exist, retaining all rows from df.
R
library(sqldf)

df2 <- data.frame(id = c(1, 2, 3, 4, 5), 
                  salary = c(50000, 60000, 70000, 80000, 90000)
                  )

result <- sqldf("SELECT df.*, df2.salary FROM df LEFT JOIN df2 ON df.id = df2.id")
print(result)

Output:

join
Joining

5. Grouping

We will execute an SQL query to group the rows by role and calculate the average years of experience for each role.

  • GROUP BY: Groups the rows by the role column and calculates the average year_of_exp for each role.
R
library(sqldf)

result <- sqldf("SELECT role, AVG(year_of_exp) AS avg_exp FROM df GROUP BY role")
print(result)

Output:

groupby
Grouping

6. Filtering

We will execute an SQL query to select rows where year_of_exp is greater than 5.

  • WHERE: Filters rows where the year_of_exp is greater than 5.
R
library(sqldf)

result <- sqldf("SELECT * FROM df WHERE year_of_exp > 5")
print(result)

Output:

filter
Filtering

7. Updating

We will execute an SQL query to update the year_of_exp column in the df data frame. Here we updated the year_of_exp column of id=1.

  • CASE WHEN: Applies conditional logic
  • WHEN id = 1 checks if id is 1.
  • THEN: If true, adds 1 to year_of_exp.
  • ELSE: If false, keeps year_of_exp unchanged.
  • AS: Renames the new year_of_exp column.
  • END: Closes the CASE expression.
R
library(sqldf)

result <- sqldf("SELECT *,
                CASE WHEN id = 1 
                     THEN year_of_exp + 1 
                     ELSE year_of_exp 
                END AS year_of_exp 
                FROM df")
result

Output:

updating
Updating

8. Deleting

We will execute an SQL query to delete the row from data frame where years of experience is less than 5.

Note: sqldf doesn't directly support DELETE in the same way as other SQL environments

  • WHERE: Filters and selects rows with year_of_exp less than or equal to 5, effectively deleting others.
R
library(sqldf)

result <- sqldf("SELECT * FROM df WHERE year_of_exp <=5")
result

Output:

delete
Deleting

In this article ,we learned how to use sqldf to run SQL commands on a data frame in R programming Language.


Next Article
Article Tags :

Similar Reads