How to Filter Using 'in' and 'not in' Like in SQL in Polars
Last Updated: 02 Aug, 2024
Polars, a fast and versatile DataFrame library written in Rust with bindings for Python, offers efficient data manipulation capabilities. For those transitioning from SQL or those who need to perform complex data manipulations, Polars provides a familiar set of tools that mimic SQL-like operations. This article explores how to use SQL-like in and not in operations in Polars to filter data effectively.
SQL-like Filtering in Python Polars
In SQL, IN and NOT IN are operators that filter records against a list of possible values, for example selecting records where a column's value appears in a specified list. Polars provides similar functionality, letting you keep or drop DataFrame rows according to whether a column's value is inside or outside a specified list.
Prerequisites
Setting Up Your Environment
Before diving into the examples, ensure you have Polars installed. If not, you can install it using pip:
pip install polars
Basic DataFrame Creation
Let's create a simple data frame to demonstrate filtering:
Python
import polars as pl
# Sample data
data = {
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"]
}
df = pl.DataFrame(data)
print(df)
Output:
shape: (5, 2)
┌─────┬─────────┐
│ id ┆ name │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════════╡
│ 1 ┆ Alice │
│ 2 ┆ Bob │
│ 3 ┆ Charlie │
│ 4 ┆ David │
│ 5 ┆ Eve │
└─────┴─────────┘
Loading Data into Polars DataFrame
The remaining examples read from a data.csv file that contains city, age, and name columns. Load it into a Polars DataFrame with read_csv():
import polars
df = polars.read_csv('data.csv')
print(df)
Output:
[Image: Loading csv data]
Python Polars Filtering using 'in' and 'not in'
Polars filters DataFrame rows with the filter() method. For 'in' filtering, pass it an .is_in() expression; for 'not in' filtering, negate that expression with the ~ operator.
Using 'in'
Use the .is_in() method to keep only the rows whose column value appears in a given list.
# List of cities to filter(include)
cities = ["New York", "Chicago"]
# Filter rows where 'city' is in the list of cities
df1 = df.filter(polars.col("city").is_in(cities))
print(df1)
Output:
[Image: how to use in]
Using 'not in'
To filter out rows whose column value appears in a list, negate the .is_in() expression with the ~ operator.
# List of cities to filter(exclude)
cities = ["New York", "Chicago"]
# Filter rows where 'city' is not in the list of cities
df2 = df.filter(~polars.col("city").is_in(cities))
print(df2)
Output:
[Image: how to use not in]
Examples
Example 1. Filter by Age
Filter rows where the age matches one of the values in a specific list.
Python
import polars
# Sample data
df = polars.read_csv('data.csv')
# List of ages to filter
ages = [24, 25, 28]
# Filter rows where 'age' is in the list of ages
df3 = df.filter(polars.col("age").is_in(ages))
print(df3)
Output:
[Image: Include data by age]
Example 2. Filter by Name
Filter rows where the name is not in a specific list.
Python
import polars
# Sample data
df = polars.read_csv('data.csv')
# List of names to exclude
names = ["Alice", "Eve", "Frank", "Ivy"]
# Filter rows where 'name' is not in the list of excluded names
df4 = df.filter(~polars.col("name").is_in(names))
print(df4)
Output:
[Image: Exclude data by name]
Conclusion
Polars makes filtering DataFrame rows with 'in' and 'not in' conditions straightforward: .is_in() covers the first case, and negating it with ~ covers the second. You can get a better understanding of what Polars can do by trying different filtering criteria and incorporating them into your data processing workflows.