Check For A Substring In A Pandas Dataframe Column
Last Updated: 24 Apr, 2025
Pandas is a data analysis library for Python that has grown rapidly in popularity in recent years. It provides in-memory, table-like data structures with SQL-like operations, basic statistical and analytical support, and plotting capabilities. One common task in data analysis is searching for substrings within a dataset, and Pandas offers efficient tools to accomplish this.
In this article, we will explore the ways by which we can check for a substring in a Pandas DataFrame column.
Check for a Substring in a DataFrame Column
Below are some of the ways to check for a substring in a Pandas DataFrame column in Python:
- Using str.contains() method
- Using Regular Expressions
- apply() function
- List Comprehension with 'in' Operator
Check For a Substring in a Pandas Dataframe using str.contains() method
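In this example, a pandas DataFrame is created with employee information. A new column, 'NameContainsSubstring', is added to indicate whether the substring 'an' is present in each 'Name' entry, using the str.contains() method. The match is case-sensitive, and the new column is then used as a boolean mask to keep only the matching rows.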
Python3
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
'Department': ['HR', 'IT', 'Finance', 'Marketing'],
'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)
# Checking for substring 'an' in the 'Name' column
substring = 'an'
df['NameContainsSubstring'] = df['Name'].str.contains(substring)
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID   Name  Department  Salary  NameContainsSubstring
0         101   Aman          HR   60000                   True
3         104  Rohan   Marketing   65000                   True
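Two optional parameters of str.contains() are often useful here: `case` controls case sensitivity, and `na` sets the result for missing entries. The following is a minimal sketch using a small hypothetical Series (the names and the missing value are assumptions for illustration, not part of the article's DataFrame):

```python
import pandas as pd

# Hypothetical names, including a missing value, for illustration only
names = pd.Series(['Aman', 'ANITA', None, 'Rohan'])

# Default behaviour: the match is case-sensitive, so 'ANITA' does not match 'an'
default_mask = names.str.contains('an')

# case=False ignores letter case; na=False treats missing values as "no match"
relaxed_mask = names.str.contains('an', case=False, na=False)

print(relaxed_mask.tolist())
```

Passing na=False is a common choice when the result will be used as a boolean filter, since a mask that contains missing values may not be usable for row selection.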
Check For A Substring In A Pandas Dataframe Using Regular Expressions
In this example, a pandas DataFrame is created with employee information. A new column, 'NameContainsPattern', is added to indicate whether each 'Name' entry matches a regular expression. The str.contains() method is called with regex=True (which is also the default) so that the pattern is interpreted as a regular expression. The pattern r'ma(?!$)' matches 'ma' only where it is not immediately followed by the end of the string; this condition is expressed by the negative lookahead (?!$).
Python3
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'Name': ['aman', 'bhavna', 'madhav', 'rohan'],
'Department': ['HR', 'IT', 'Finance', 'Marketing'],
'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)
# regular expression pattern with negative lookahead
pattern = r'ma(?!$)'
df['NameContainsPattern'] = df['Name'].str.contains(pattern, regex=True)
filtered_df = df[df['NameContainsPattern']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  NameContainsPattern
0         101    aman          HR   60000                 True
2         103  madhav     Finance   90000                 True
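Because str.contains() treats its pattern as a regular expression by default, characters such as '.' or '+' carry special meaning. When the search text should be taken literally, one option is regex=False, and another is to escape the text with re.escape. The sketch below uses hypothetical values chosen only to show the difference:

```python
import re
import pandas as pd

# Hypothetical values chosen to contain regex metacharacters
skills = pd.Series(['C++', 'C#', 'Cobol', 'a.c', 'abc'])

# regex=False searches for the literal text, so '.' matches only a dot
literal_dot = skills.str.contains('a.c', regex=False)

# As a regular expression, '.' matches any single character
regex_dot = skills.str.contains('a.c', regex=True)

# re.escape builds a pattern that matches the literal text 'C++'
escaped_plus = skills.str.contains(re.escape('C++'))

print(literal_dot.tolist())
print(regex_dot.tolist())
print(escaped_plus.tolist())
```

Here the literal search matches only 'a.c', while the regular-expression search also matches 'abc'.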
Check For A Substring In A Pandas Dataframe Using apply() function
In this example, a pandas DataFrame is created with employee information, including 'EmployeeID', 'Name', 'Department', and 'Salary'. A new column, 'NameContainsSubstring,' is added, indicating whether the substring 'av' is present in each 'Name' entry using the apply() method with a lambda function.
Python3
import pandas as pd
# Creating a relevant 4-column DataFrame
data = {
'EmployeeID': [101, 102, 103, 104],
'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
'Department': ['HR', 'IT', 'Finance', 'Marketing'],
'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)
# Checking for substring 'av' in the 'Name' column and adding a new column
substring = 'av'
df['NameContainsSubstring'] = df['Name'].apply(lambda x: substring in x)
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  NameContainsSubstring
1         102  Bhavna          IT   75000                   True
2         103  Madhav     Finance   90000                   True
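The lambda above assumes every entry in the column is a string. If the column may contain missing values, the 'in' check would raise a TypeError for a non-string entry, so a type guard can help. The following is a minimal sketch; the helper name `contains_substring` and the sample data are hypothetical:

```python
import pandas as pd

# Hypothetical names with one missing entry
names = pd.Series(['Aman', 'Bhavna', None, 'Madhav'])
substring = 'av'

def contains_substring(value, needle=substring):
    """Return True only when value is a string that contains needle."""
    return isinstance(value, str) and needle in value

# apply() calls the helper once per entry and collects the results
mask = names.apply(contains_substring)
print(names[mask].tolist())
```

The guard treats any non-string entry as "no match" instead of raising an error.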
Check For A Substring In A Pandas Dataframe Using List Comprehension with 'in' Operator
In this example, a list comprehension with the 'in' operator checks whether the substring 'Finance' is present in each 'Department' entry.
Python3
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
'Department': ['HR', 'IT', 'Finance', 'Marketing'],
'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)
# Checking for the substring 'Finance' in the 'Department' column
substring = 'Finance'
df['DepartmentContainsSubstring'] = [substring in department for department in df['Department']]
filtered_df = df[df['DepartmentContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  DepartmentContainsSubstring
2         103  Madhav     Finance   90000                         True
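For plain string data without missing values, the four approaches should produce the same boolean mask. The sketch below checks this on a reduced version of the employee data; the variable names (`via_contains`, `via_regex`, and so on) are chosen here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
})
substring = 'av'

# 1. str.contains() with a literal substring
via_contains = df['Name'].str.contains(substring, regex=False)
# 2. str.contains() with a regular expression
via_regex = df['Name'].str.contains(r'av', regex=True)
# 3. apply() with a lambda function
via_apply = df['Name'].apply(lambda name: substring in name)
# 4. List comprehension with the 'in' operator
via_listcomp = pd.Series([substring in name for name in df['Name']], index=df.index)

# Compare every mask against the first one
masks = [via_contains, via_regex, via_apply, via_listcomp]
all_agree = all(m.tolist() == via_contains.tolist() for m in masks)
print(all_agree)
print(df.loc[via_contains, 'Name'].tolist())
```

When the approaches agree, the choice between them is mostly a matter of readability and of how missing values and special characters should be handled.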