Open In App

How to Find Matching Rows in Two Pandas DataFrames

Last Updated : 18 Dec, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

Let's learn how to find matching rows in two dataframes using Pandas.

Find Matching Rows Using merge()

merge() function is one of the most commonly used methods for finding matching rows in two DataFrames. It performs a SQL-style inner join, returning rows where matching values exist in both DataFrames.

Python
import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd']
})

df2 = pd.DataFrame({
    'A': [3, 4, 5, 6],
    'B': ['c', 'd', 'e', 'f']
})

# Merge dataframes on columns 'A' and 'B' (default is inner join)
matching_rows = pd.merge(df1, df2, on=['A', 'B'], how='inner')
print(matching_rows)

Output
   A  B
0  3  c
1  4  d

The merge() function returns only the rows with matching values in both DataFrames, as shown in the output.

Using merge() with a Common Column

You can also use merge() to find matching rows based on a specific column, such as an ID column. The following example demonstrates an inner join between two DataFrames using the ID column.

Python
import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['John', 'Alice', 'Bob', 'Eve']
})

df2 = pd.DataFrame({
    'ID': [2, 3, 4, 5],
    'Age': [25, 30, 22, 29]
})

# Perform inner join on the 'ID' column
result = pd.merge(df1, df2, on='ID', how='inner')
print(result)

Output
   ID   Name  Age
0   2  Alice   25
1   3    Bob   30
2   4    Eve   22

The merge() function matches rows based on the ID column and returns the rows where the ID values are common in both DataFrames.

Find Matching Rows Using isin()

The isin() method allows you to filter rows in one DataFrame based on whether the values exist in another DataFrame. This method is particularly useful when you want to check for matches in a specific column.

Python
import pandas as pd

# Create the first DataFrame: df1
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Flower': ['Rose', 'Lily', 'Tulip', 'Daisy']
})

# Create the second DataFrame: df2
df2 = pd.DataFrame({
    'ID': [3, 4, 7],
    'Category': ['Outdoor', 'Outdoor', 'Indoor']
})

# Find rows in df1 where 'ID' matches with 'ID' in df2
matching_rows = df1[df1['ID'].isin(df2['ID'])]
print(matching_rows)

Output
   ID Flower
2   3  Tulip
3   4  Daisy

Here, isin() checks if the ID in df1 matches any value in the ID column of df2, and returns the matching rows from df1.

Find Matching Rows Using query()

query() method is another way to find matching rows in two DataFrames. You can use the in operator within the query to filter rows based on common values between the DataFrames.

Python
import pandas as pd

# Create the first DataFrame: df1
df1 = pd.DataFrame({
    'BookID': [101, 102, 103],
    'Title': ['To Kill a Mockingbird', '1984', 'The Great Gatsby']
})

# Create the second DataFrame: df2
df2 = pd.DataFrame({
    'BookID': [103, 105, 107],
    'Genre': ['Classic', 'Romance', 'Science Fiction']
})

# Find rows in df1 where 'BookID' matches with 'BookID' in df2 using query()
# Convert df2['BookID'] to a list to avoid the ValueError
result = df1.query("BookID in @df2.BookID").merge(
    df2.query("BookID in @df1.BookID"),
    on="BookID",
    how="inner"
)

# Print the result
print(result)

Output
   BookID             Title    Genre
0     103  The Great Gatsby  Classic

In this example, query() is used to filter rows in df1 where the BookID matches any BookID in df2. The results are then merged using merge() to get the full details from both DataFrames.

Using concat() and duplicated() to Find Common Rows

You can concatenate two DataFrames using concat() and then use duplicated() to identify rows that appear in both DataFrames. This method works well when the column names are identical in both DataFrames.

Python
import pandas as pd

# Create the first DataFrame: df1
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd']
})

# Create the second DataFrame: df2
df2 = pd.DataFrame({
    'A': [3, 4, 5, 6],
    'B': ['c', 'd', 'e', 'f']
})

# Concatenate the two DataFrames
concatenated_df = pd.concat([df1, df2])

# Find duplicated rows (i.e., rows that appear in both DataFrames)
matching_rows = concatenated_df[concatenated_df.duplicated(keep=False)].drop_duplicates()

print(matching_rows)

Output:

Screenshot-2024-12-15-233549

Here, concat() merges the two DataFrames, and duplicated() identifies rows that appear in both, which are then filtered using drop_duplicates() to return the matching rows.


Next Article

Similar Reads