What is the difference between join and merge in Pandas?
Last Updated :
09 Apr, 2025
In Pandas, join() combines DataFrames based on their indices and defaults to a left join, while merge() joins on specified columns and defaults to an inner join. Choosing the right method depends on how your data is aligned. To illustrate the difference between join() and merge() visually, Let's understand with help of examples.
Using join() for index- based merging
The join() method is useful when merging DataFrames based on their index values. By default, it performs a left join, meaning all rows from the left DataFrame are retained, and matching rows from the right DataFrame are added based on the index.
Example: In this example, two DataFrames are joined on their indexes. An outer join retains all index values, filling missing ones with NaN.
Python
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'B': [4, 5, 6]}, index=['a', 'b', 'd'])
print(df1, "\n")
print(df2, "\n")
# joins on index
res = df1.join(df2, how='outer')
print(res)
Output
Using join()
Explanation: join() method merges df1 and df2 on their indexes. With how='outer', it retains all unique index values, combining common ones and filling missing values with NaN.
For different types of joins, refer to Different Types of Joins in Pandas to explore alternatives like merge() for column-based joins.
Using merge() for coloumn-based merging
The merge() function is more versatile than join() and is commonly used for column-based merging. By default, merge() performs an inner join, which means it returns only the rows with matching values in both DataFrames based on the specified key columns. Unlike join(), which merges based on index values, merge() allows you to explicitly specify which columns to use as keys for merging.
Example 1 : In this example, we are merging two DataFrames based on the common column 'ID'. This allows us to combine matching rows from both DataFrames where the 'ID' values are the same.
Python
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 3, 4], 'Score': [90, 85, 80]})
print(df1, "\n")
print(df2, "\n")
# Performing merge on 'ID'
res = pd.merge(df1, df2, on='ID')
print(res)
Output
Using merge()Explanation: merge() function combines df1 and df2 using the common column 'ID'. It performs an inner join, so only rows with matching 'ID' values in both DataFrames are included in the result.
Refer to Joining Two Pandas DataFrames Using merge() to understand the various types of joins available in Pandas.
Example 2: In this example, we are merging two DataFrames using their indexes. Since both DataFrames have a column named 'P', suffixes are used to differentiate the overlapping column names in the merged result.
Python
import pandas as pd
df1 = pd.DataFrame([['a', 'b'], ['d', 'f']], index=['X', 'Y'], columns=['P', 'Q'])
df2 = pd.DataFrame([[1, 2], [4, 5]], index=['X', 'Y'], columns=['P', 'R'])
print(df1, "\n")
print(df2, "\n")
res = df1.merge(df2, left_index=True, right_index=True, suffixes=['_', ''])
print(res)
Output
Using merge()Explanation: merge() function merges df1 and df2 using their shared index labels ('X' and 'Y') by setting left_index=True and right_index=True. Since both have a 'P' column, suffixes=['_', ''] differentiates them as 'P_' (from df1) and 'P' (from df2).
Note: For simple index-based joins, use join(). For more complex merges involving specific columns or multiple keys, use merge().
Difference between join() and merge()
In real-world datasets, combining or enriching data is common. Choosing the right method, join() or merge() ensures accurate relationships, efficient processing and clear code. Understanding their differences helps avoid issues like missing rows, duplicates or incorrect joins.
Feature | join() | merge() |
---|
Merge Basis | Index-based | Column-based (default) |
---|
Default Join Type | Left join | Inner join |
---|
Specify Merge Columns? | No (works on index) | Yes (can specify columns) |
---|
Can Merge on Both Index & Columns? | No | Yes |
---|
Handles Overlapping Columns? | Requires suffix | Allows explicit column selection |
---|
Performance | Generally faster for index-based joins | More flexible but can be slower |
---|
Similar Reads
Python Pandas - Difference between INNER JOIN and LEFT SEMI JOIN
In this article, we see the difference between INNER JOIN and LEFT SEMI JOIN. Inner Join An inner join requires two data set columns to be the same to fetch the common row data values or data from the data table. In simple words, and returns a data frame or values with only those rows in the data fr
3 min read
Difference Between Left Join and Right Join
In DBMS(Database Management System) Join is an operation that combines the row of two or more tables based on related columns between them. The main purpose of Join is to retrieve the data from multiple tables in other words Join is used to perform multi-table queries. So for that purpose, joins com
5 min read
Difference Between Right Join and Right Outer Join
Joins in a Database (SQL) are mostly used for combining data or the rows of two or more table records that are based on the same or common attribute. There are various types of Joins like Right Join, Left Join, Full Join, etc. Each join has its own syntax and data-returning capability. In this artic
5 min read
Difference between Inner Join and Outer Join in SQL
JOINS in SQL are fundamental operations used to combine data from multiple tables based on related columns. They are essential for querying data that is distributed across different tables, allowing you to retrieve and present it as a single or similar result set.In this article, We will learn about
5 min read
Difference Between âINNER JOINâ and âOUTER JOINâ
Are you working with SQL and struggling to understand the differences between INNER JOIN and OUTER JOIN? These two types of joins are among the most commonly used tools for combining tables in SQL queries. Whether you're analyzing student data or managing customer records, knowing when to use INNER
5 min read
Difference Between Left Join and Left Outer Join
In SQL language, different joins are used to assemble rows from two or more tables from the related column. The terms "Left Join" and "Left Outer Join" are used interchangeably in SQL but they refer to the same concept. A Left Join retrieves all records from the left table (the first table in the qu
5 min read
Difference between JOIN and UNION in SQL
Pre-requisites: JOIN, UNION JOIN in SQL is used to combine data from many tables based on a matched condition between them. The data combined using the JOIN statement results in new columns. Consider the two tables: Boys Girls Example: sql> SELECT Boys.Name, Boys.Age, Girls.Address, FROM Boys INN
2 min read
Difference Between Anti-Join and Semi-Join
In the context of SQL, Anti-join, and semi-join are two essential operations in relational databases used for querying and manipulating data. These operations focus on comparing data from two related tables, but they serve distinct purposes. In this article let us discuss these two operations in det
4 min read
Difference between Pandas and PostgreSQL
Pandas: Python supports an in-built library Pandas, to perform data analysis and manipulation is a fast and efficient way. Pandas library handles data available in uni-dimensional arrays, called series, and multi-dimensional arrays called data frames. It provides a large variety of functions and uti
4 min read
Git - Difference Between Merging and Rebasing
When working with Git, two common strategies for integrating changes from different branches are merging and rebasing. Both techniques serve the purpose of combining code from multiple branches, but they do so in different ways. This article will help you understand the differences between merging a
3 min read