How to Apply a Custom Function in Polars that Does the Processing Row by Row?
Last Updated: 29 Jul, 2024
Polars is a fast DataFrame library written in Rust with Python bindings, designed to handle large datasets efficiently. It provides an expressive API for data manipulation, similar to pandas, with performance optimizations that can significantly speed up data processing. One common task is applying a custom function to data row by row. In this article, we will explore how to do this using Polars.
Prerequisites
Before we begin, ensure you have Polars installed in your Python environment. You can install it using pip:
pip install polars
You should also have a basic understanding of Python programming and DataFrame operations.
Loading Data into Polars DataFrame
To demonstrate how to apply a custom function row by row in Polars, we'll first create a sample DataFrame. This code creates a DataFrame with three columns: name, age, and salary.
Python
import polars as pl

# Create a sample DataFrame
data = {
    "name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "age": [25, 30, 35, 40, 45],
    "salary": [50000, 60000, 70000, 80000, 90000],
}
df = pl.DataFrame(data)
print(df)
Output
shape: (5, 3)
┌─────────┬─────┬────────┐
│ name    ┆ age ┆ salary │
│ ---     ┆ --- ┆ ---    │
│ str     ┆ i64 ┆ i64    │
╞═════════╪═════╪════════╡
│ Alice   ┆ 25  ┆ 50000  │
│ Bob     ┆ 30  ┆ 60000  │
│ Charlie ┆ 35  ┆ 70000  │
│ David   ┆ 40  ┆ 80000  │
│ Eva     ┆ 45  ┆ 90000  │
└─────────┴─────┴────────┘
Applying a Custom Function Row by Row
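In Polars, you can apply a custom Python function to each row by packing the columns you need into a struct with pl.struct() and calling map_elements on it (this method was called apply before Polars 0.19, and the old name was removed in Polars 1.0). The function receives each row as a dictionary keyed by column name. Here are three examples to illustrate this.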
Example 1: Applying a Custom Function to Calculate Age Category
Here we define a custom function, categorize_age, that sorts individuals into three age groups: "Young", "Middle-aged", and "Senior". We then apply it to each row of the DataFrame using pl.struct().map_elements() (called apply in older Polars releases). The result is a new column named "age_category" that contains the age category for each individual.
Python
# Custom function to categorize age
def categorize_age(row):
    # row is a dict holding the struct fields for one row
    age = row["age"]
    if age < 30:
        return "Young"
    elif age < 40:
        return "Middle-aged"
    else:
        return "Senior"

# Apply the custom function row by row
df = df.with_columns(
    pl.struct(["age"])
    .map_elements(categorize_age, return_dtype=pl.String)
    .alias("age_category")
)
print(df)
Output
shape: (5, 4)
┌─────────┬─────┬────────┬──────────────┐
│ name    ┆ age ┆ salary ┆ age_category │
│ ---     ┆ --- ┆ ---    ┆ ---          │
│ str     ┆ i64 ┆ i64    ┆ str          │
╞═════════╪═════╪════════╪══════════════╡
│ Alice   ┆ 25  ┆ 50000  ┆ Young        │
│ Bob     ┆ 30  ┆ 60000  ┆ Middle-aged  │
│ Charlie ┆ 35  ┆ 70000  ┆ Middle-aged  │
│ David   ┆ 40  ┆ 80000  ┆ Senior       │
│ Eva     ┆ 45  ┆ 90000  ┆ Senior       │
└─────────┴─────┴────────┴──────────────┘
Example 2: Applying a Custom Function to Adjust Salary Based on Age
This example demonstrates a custom function, adjust_salary, that adjusts salaries based on age group, applying a different multiplier to each group. We use pl.struct().map_elements() to apply this function to each row, resulting in a new column called "adjusted_salary" that holds the adjusted salaries.
Python
# Custom function to adjust salary
def adjust_salary(row):
    age = row["age"]
    salary = row["salary"]
    if age < 30:
        return salary * 1.1
    elif age < 40:
        return salary * 1.05
    else:
        return salary * 1.03

# Apply the custom function row by row
df = df.with_columns(
    pl.struct(["age", "salary"])
    .map_elements(adjust_salary, return_dtype=pl.Float64)
    .alias("adjusted_salary")
)
print(df)
Output
shape: (5, 5)
┌─────────┬─────┬────────┬──────────────┬─────────────────┐
│ name    ┆ age ┆ salary ┆ age_category ┆ adjusted_salary │
│ ---     ┆ --- ┆ ---    ┆ ---          ┆ ---             │
│ str     ┆ i64 ┆ i64    ┆ str          ┆ f64             │
╞═════════╪═════╪════════╪══════════════╪═════════════════╡
│ Alice   ┆ 25  ┆ 50000  ┆ Young        ┆ 55000.0         │
│ Bob     ┆ 30  ┆ 60000  ┆ Middle-aged  ┆ 63000.0         │
│ Charlie ┆ 35  ┆ 70000  ┆ Middle-aged  ┆ 73500.0         │
│ David   ┆ 40  ┆ 80000  ┆ Senior       ┆ 82400.0         │
│ Eva     ┆ 45  ┆ 90000  ┆ Senior       ┆ 92700.0         │
└─────────┴─────┴────────┴──────────────┴─────────────────┘
Example 3: Combining Multiple Columns in a Custom Function
Here we create a custom function, combine_name_age, that combines the name and age columns into a single string. Applying it with pl.struct().map_elements() produces a new column, "name_age", that contains the combined string for each row.
Python
# Custom function to combine name and age
def combine_name_age(row):
    return f"{row['name']} ({row['age']})"

# Apply the custom function row by row
df = df.with_columns(
    pl.struct(["name", "age"])
    .map_elements(combine_name_age, return_dtype=pl.String)
    .alias("name_age")
)
print(df)
Output
shape: (5, 6)
┌─────────┬─────┬────────┬──────────────┬─────────────────┬──────────────┐
│ name    ┆ age ┆ salary ┆ age_category ┆ adjusted_salary ┆ name_age     │
│ ---     ┆ --- ┆ ---    ┆ ---          ┆ ---             ┆ ---          │
│ str     ┆ i64 ┆ i64    ┆ str          ┆ f64             ┆ str          │
╞═════════╪═════╪════════╪══════════════╪═════════════════╪══════════════╡
│ Alice   ┆ 25  ┆ 50000  ┆ Young        ┆ 55000.0         ┆ Alice (25)   │
│ Bob     ┆ 30  ┆ 60000  ┆ Middle-aged  ┆ 63000.0         ┆ Bob (30)     │
│ Charlie ┆ 35  ┆ 70000  ┆ Middle-aged  ┆ 73500.0         ┆ Charlie (35) │
│ David   ┆ 40  ┆ 80000  ┆ Senior       ┆ 82400.0         ┆ David (40)   │
│ Eva     ┆ 45  ┆ 90000  ┆ Senior       ┆ 92700.0         ┆ Eva (45)     │
└─────────┴─────┴────────┴──────────────┴─────────────────┴──────────────┘
Conclusion
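Applying custom functions row by row in Polars is straightforward: pack the columns you need with pl.struct() and pass your function to map_elements (apply in older Polars releases). Keep in mind that the function runs in Python once per row, so it is slower than Polars' native expressions; it is best suited to logic that the built-in expressions cannot express. Used that way, it lets you implement custom processing directly within your DataFrame operations.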