How to Filter Using 'in' and 'not in' Like in SQL in Polars

Last Updated : 02 Aug, 2024

Polars is a fast, versatile DataFrame library written in Rust with bindings for Python. For those transitioning from SQL, or anyone who needs to filter rows against a set of values, Polars provides expressions that closely mirror SQL operators. This article explores how to use SQL-like 'in' and 'not in' operations in Polars to filter data effectively.

SQL-like Filtering in Python Polars

In SQL, IN and NOT IN are operators used to filter records against multiple possible values, for example selecting records where a column's value appears in a specified list. Polars provides similar functionality, letting you keep or drop the rows of a DataFrame depending on whether a column's value is inside or outside a given list.

Prerequisites

Setting Up Your Environment

Before diving into the examples, ensure you have Polars installed. If not, you can install it using pip:

pip install polars

Basic DataFrame Creation

Let's create a simple data frame to demonstrate filtering:

Python
import polars as pl

# Sample data
data = {
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"]
}

df = pl.DataFrame(data)
print(df)

Output:

shape: (5, 2)
┌─────┬─────────┐
│ id  ┆ name    │
│ --- ┆ ---     │
│ i64 ┆ str     │
╞═════╪═════════╡
│ 1   ┆ Alice   │
│ 2   ┆ Bob     │
│ 3   ┆ Charlie │
│ 4   ┆ David   │
│ 5   ┆ Eve     │
└─────┴─────────┘

Loading Data into Polars DataFrame

The remaining examples use a data.csv file. Let's load it into a Polars DataFrame with read_csv():

import polars

df = polars.read_csv('data.csv')
print(df)

Output:

[Image: Loading CSV data]

Python Polars Filtering using 'in' and 'not in'

Polars filters DataFrame rows with the filter() method, which takes a boolean expression. For 'in', build that expression with the .is_in() method; for 'not in', negate it with the ~ operator.

Using 'in'

Use the .is_in() method for filtering rows with specific column values.

# List of cities to include
cities = ["New York", "Chicago"]

# Filter rows where 'city' is in the list of cities
df1 = df.filter(polars.col("city").is_in(cities))
print(df1)

Output:

[Image: How to use 'in']

Using ‘not in’

To filter out rows with certain column values, negate the .is_in() expression with the ~ operator.

# List of cities to exclude
cities = ["New York", "Chicago"]

# Filter rows where 'city' is not in the list of cities
df2 = df.filter(~polars.col("city").is_in(cities))
print(df2)

Output:

[Image: How to use 'not in']

Examples

Example 1. Filter by Age

Filter rows where the age matches one of the values in a specific list.

Python
import polars

# Load the data from the CSV file
df = polars.read_csv('data.csv')

# List of ages to filter
ages = [24, 25, 28]

# Filter rows where 'age' is in the list of ages
df3 = df.filter(polars.col("age").is_in(ages))
print(df3)

Output:

[Image: Include data by age]

Example 2. Filter by Name

Filter rows where the name is not in a specific list.

Python
import polars

# Load the data from the CSV file
df = polars.read_csv('data.csv')

# List of names to exclude
names = ["Alice", "Eve", "Frank", "Ivy"]

# Filter rows where 'name' is not in the list of excluded names
df4 = df.filter(~polars.col("name").is_in(names))
print(df4)

Output:

[Image: Exclude data by name]

Conclusion

Polars makes filtering DataFrame rows with 'in' and 'not in' conditions straightforward: use .is_in() to keep rows whose values appear in a list, and negate it with ~ to exclude them. You can get a better understanding of what Polars can do by trying different filtering criteria and incorporating them into your data processing workflows.
