How to Find Matching Rows in Two Pandas DataFrames
Last Updated :
18 Dec, 2024
Let's learn how to find matching rows in two dataframes using Pandas.
Find Matching Rows Using merge()
merge() function is one of the most commonly used methods for finding matching rows in two DataFrames. It performs a SQL-style inner join, returning rows where matching values exist in both DataFrames.
Python
import pandas as pd
# Sample DataFrames
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd']
})
df2 = pd.DataFrame({
'A': [3, 4, 5, 6],
'B': ['c', 'd', 'e', 'f']
})
# Merge dataframes on columns 'A' and 'B' (default is inner join)
matching_rows = pd.merge(df1, df2, on=['A', 'B'], how='inner')
print(matching_rows)
The merge() function returns only the rows with matching values in both DataFrames, as shown in the output.
Using merge()
with a Common Column
You can also use merge()
to find matching rows based on a specific column, such as an ID column. The following example demonstrates an inner join between two DataFrames using the ID
column.
Python
import pandas as pd
# Sample DataFrames
df1 = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Name': ['John', 'Alice', 'Bob', 'Eve']
})
df2 = pd.DataFrame({
'ID': [2, 3, 4, 5],
'Age': [25, 30, 22, 29]
})
# Perform inner join on the 'ID' column
result = pd.merge(df1, df2, on='ID', how='inner')
print(result)
Output ID Name Age
0 2 Alice 25
1 3 Bob 30
2 4 Eve 22
The merge() function matches rows based on the ID column and returns the rows where the ID values are common in both DataFrames.
Find Matching Rows Using isin()
The isin()
method allows you to filter rows in one DataFrame based on whether the values exist in another DataFrame. This method is particularly useful when you want to check for matches in a specific column.
Python
import pandas as pd
# Create the first DataFrame: df1
df1 = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Flower': ['Rose', 'Lily', 'Tulip', 'Daisy']
})
# Create the second DataFrame: df2
df2 = pd.DataFrame({
'ID': [3, 4, 7],
'Category': ['Outdoor', 'Outdoor', 'Indoor']
})
# Find rows in df1 where 'ID' matches with 'ID' in df2
matching_rows = df1[df1['ID'].isin(df2['ID'])]
print(matching_rows)
Output ID Flower
2 3 Tulip
3 4 Daisy
Here, isin() checks if the ID in df1 matches any value in the ID column of df2, and returns the matching rows from df1.
Find Matching Rows Using query()
query()
method is another way to find matching rows in two DataFrames. You can use the in
operator within the query to filter rows based on common values between the DataFrames.
Python
import pandas as pd
# Create the first DataFrame: df1
df1 = pd.DataFrame({
'BookID': [101, 102, 103],
'Title': ['To Kill a Mockingbird', '1984', 'The Great Gatsby']
})
# Create the second DataFrame: df2
df2 = pd.DataFrame({
'BookID': [103, 105, 107],
'Genre': ['Classic', 'Romance', 'Science Fiction']
})
# Find rows in df1 where 'BookID' matches with 'BookID' in df2 using query()
# Convert df2['BookID'] to a list to avoid the ValueError
result = df1.query("BookID in @df2.BookID").merge(
df2.query("BookID in @df1.BookID"),
on="BookID",
how="inner"
)
# Print the result
print(result)
Output BookID Title Genre
0 103 The Great Gatsby Classic
In this example, query() is used to filter rows in df1 where the BookID matches any BookID in df2. The results are then merged using merge() to get the full details from both DataFrames.
Using concat() and duplicated() to Find Common Rows
You can concatenate two DataFrames using concat() and then use duplicated() to identify rows that appear in both DataFrames. This method works well when the column names are identical in both DataFrames.
Python
import pandas as pd
# Create the first DataFrame: df1
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd']
})
# Create the second DataFrame: df2
df2 = pd.DataFrame({
'A': [3, 4, 5, 6],
'B': ['c', 'd', 'e', 'f']
})
# Concatenate the two DataFrames
concatenated_df = pd.concat([df1, df2])
# Find duplicated rows (i.e., rows that appear in both DataFrames)
matching_rows = concatenated_df[concatenated_df.duplicated(keep=False)].drop_duplicates()
print(matching_rows)
Output:
Here, concat() merges the two DataFrames, and duplicated() identifies rows that appear in both, which are then filtered using drop_duplicates() to return the matching rows.
Similar Reads
How to Merge Two Pandas DataFrames on Index Merging two pandas DataFrames on their index is necessary when working with datasets that share the same row identifiers but have different columns. The core idea is to align the rows of both DataFrames based on their indices, combining the respective columns into one unified DataFrame. To merge two
3 min read
Join Pandas DataFrames matching by substring Prerequisites: Pandas In this article, we will learn how to join two Data Frames matching by substring with python. Functions used:join(): joins all the elements in an iteration into a single stringlambda(): an anonymous method which is declared without a name and can accept any number of parameter
1 min read
How to combine two DataFrames in Pandas? While working with data, there are multiple times when you would need to combine data from multiple sources. For example, you may have one DataFrame that contains information about a customer, while another DataFrame contains data about their transaction history. If you want to analyze this data tog
3 min read
How to Get the Common Index of Two Pandas DataFrames When working with large datasets in Python Pandas, having multiple DataFrames with overlapping or related data is common. In many cases, we may want to identify the common indices between two DataFrames to perform further analysis, such as merging, filtering, or comparison.This article will guide us
5 min read
How to Merge DataFrames of different length in Pandas ? Merging DataFrames of different lengths in Pandas can be done using the merge(), and concat(). These functions allow you to combine data based on shared columns or indices, even if the DataFrames have unequal lengths. By using the appropriate merge method (like a left join, right join, or outer join
3 min read