How to Find Matching Rows in Two Pandas DataFrames

Last Updated : 18 Dec, 2024

Let's learn how to find matching rows in two dataframes using Pandas.

Find Matching Rows Using merge()

merge() function is one of the most commonly used methods for finding matching rows in two DataFrames. It performs a SQL-style inner join, returning rows where matching values exist in both DataFrames.

Python

import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd']
})

df2 = pd.DataFrame({
    'A': [3, 4, 5, 6],
    'B': ['c', 'd', 'e', 'f']
})

# Merge dataframes on columns 'A' and 'B' (default is inner join)
matching_rows = pd.merge(df1, df2, on=['A', 'B'], how='inner')
print(matching_rows)

Output

   A  B
0  3  c
1  4  d

The merge() function returns only the rows with matching values in both DataFrames, as shown in the output.

Using `merge()` with a Common Column

You can also use merge() to find matching rows based on a specific column, such as an ID column. The following example demonstrates an inner join between two DataFrames using the ID column.

Python

import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['John', 'Alice', 'Bob', 'Eve']
})

df2 = pd.DataFrame({
    'ID': [2, 3, 4, 5],
    'Age': [25, 30, 22, 29]
})

# Perform inner join on the 'ID' column
result = pd.merge(df1, df2, on='ID', how='inner')
print(result)

Output

   ID   Name  Age
0   2  Alice   25
1   3    Bob   30
2   4    Eve   22

The merge() function matches rows based on the ID column and returns the rows where the ID values are common in both DataFrames.

Find Matching Rows Using `isin()`

The isin() method allows you to filter rows in one DataFrame based on whether the values exist in another DataFrame. This method is particularly useful when you want to check for matches in a specific column.

Python

import pandas as pd

# Create the first DataFrame: df1
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Flower': ['Rose', 'Lily', 'Tulip', 'Daisy']
})

# Create the second DataFrame: df2
df2 = pd.DataFrame({
    'ID': [3, 4, 7],
    'Category': ['Outdoor', 'Outdoor', 'Indoor']
})

# Find rows in df1 where 'ID' matches with 'ID' in df2
matching_rows = df1[df1['ID'].isin(df2['ID'])]
print(matching_rows)

Output

   ID Flower
2   3  Tulip
3   4  Daisy

Here, isin() checks if the ID in df1 matches any value in the ID column of df2, and returns the matching rows from df1.

Find Matching Rows Using `query()`

query() method is another way to find matching rows in two DataFrames. You can use the in operator within the query to filter rows based on common values between the DataFrames.

Python

import pandas as pd

# Create the first DataFrame: df1
df1 = pd.DataFrame({
    'BookID': [101, 102, 103],
    'Title': ['To Kill a Mockingbird', '1984', 'The Great Gatsby']
})

# Create the second DataFrame: df2
df2 = pd.DataFrame({
    'BookID': [103, 105, 107],
    'Genre': ['Classic', 'Romance', 'Science Fiction']
})

# Find rows in df1 where 'BookID' matches with 'BookID' in df2 using query()
# Convert df2['BookID'] to a list to avoid the ValueError
result = df1.query("BookID in @df2.BookID").merge(
    df2.query("BookID in @df1.BookID"),
    on="BookID",
    how="inner"
)

# Print the result
print(result)

Output

   BookID             Title    Genre
0     103  The Great Gatsby  Classic

In this example, query() is used to filter rows in df1 where the BookID matches any BookID in df2. The results are then merged using merge() to get the full details from both DataFrames.

Using concat() and duplicated() to Find Common Rows

You can concatenate two DataFrames using concat() and then use duplicated() to identify rows that appear in both DataFrames. This method works well when the column names are identical in both DataFrames.

Python

import pandas as pd

# Create the first DataFrame: df1
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd']
})

# Create the second DataFrame: df2
df2 = pd.DataFrame({
    'A': [3, 4, 5, 6],
    'B': ['c', 'd', 'e', 'f']
})

# Concatenate the two DataFrames
concatenated_df = pd.concat([df1, df2])

# Find duplicated rows (i.e., rows that appear in both DataFrames)
matching_rows = concatenated_df[concatenated_df.duplicated(keep=False)].drop_duplicates()

print(matching_rows)

Output:

Here, concat() merges the two DataFrames, and duplicated() identifies rows that appear in both, which are then filtered using drop_duplicates() to return the matching rows.

How to Find Matching Rows in Two Pandas DataFrames

baidehi1874

Improve

Article Tags :

How to Find Matching Rows in Two Pandas DataFrames

Find Matching Rows Using merge()

Using merge() with a Common Column

Find Matching Rows Using isin()

Find Matching Rows Using query()

Using concat() and duplicated() to Find Common Rows

Similar Reads

Thank You!

What kind of Experience do you want to share?

Using `merge()` with a Common Column

Find Matching Rows Using `isin()`

Find Matching Rows Using `query()`