How To Concatenate Two or More Pandas DataFrames?

Last Updated : 16 Apr, 2025

In real-world data, information is often spread across multiple tables or files. To analyze it properly, we need to bring that data together. This is where the pd.concat() function in Pandas comes in: it lets you combine two or more DataFrames in one of two ways:

  • Vertically (stacking rows on top of each other)
  • Horizontally (joining columns side by side)
Python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

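# axis=0 stacks rows; axis=1 places the DataFrames side by side
vertical_concat = pd.concat([df1, df2], axis=0)
horizontal_concat = pd.concat([df1, df2], axis=1)

# print() works in any Python session; display() exists only in Jupyter/IPython
print("Vertical:")
print(vertical_concat)
print("Horizontal:")
print(horizontal_concat)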

Output:

Dataframe Concatenation
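
The same call extends to more than two DataFrames, since pd.concat() takes a list of any length. The following is a minimal sketch; the third frame and the names frames and combined are made up for illustration.

```python
import pandas as pd

# Three small, made-up DataFrames that share the same columns.
frames = [
    pd.DataFrame({'A': [1, 2], 'B': [3, 4]}),
    pd.DataFrame({'A': [5, 6], 'B': [7, 8]}),
    pd.DataFrame({'A': [9, 10], 'B': [11, 12]}),
]

# pd.concat() accepts the whole list in a single call.
combined = pd.concat(frames, axis=0)
print(combined.shape)  # (6, 2)
```

Concatenating the whole list once is usually preferable to calling pd.concat() repeatedly inside a loop, because each call builds a new DataFrame.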

1. Vertical Concatenation (Row-wise)

When axis=0 is used, Pandas stacks the rows on top of one another but keeps the original index of each DataFrame, so the vertically concatenated result in the first example has duplicate index labels (0, 1, 0, 1). Passing ignore_index=True, as in the code below, discards the original labels and assigns a new sequential index (0, 1, 2, 3).

Python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenate along rows and renumber the index from 0
vertical_concat = pd.concat([df1, df2], axis=0, ignore_index=True)

print("Vertical:")
print(vertical_concat)

Output:

Vertical Concatenation (Row-wise)
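
To see why duplicate labels matter, the sketch below compares the index with and without ignore_index=True and then looks up label 0. The variable names kept and reset are chosen here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

kept = pd.concat([df1, df2], axis=0)                      # original labels retained
reset = pd.concat([df1, df2], axis=0, ignore_index=True)  # labels renumbered

print(list(kept.index))   # [0, 1, 0, 1]
print(list(reset.index))  # [0, 1, 2, 3]

# With duplicate labels, .loc[0] matches one row from each source DataFrame.
print(kept.loc[0])
```

With the retained index, a label lookup returns two rows instead of one, which is easy to overlook in later processing.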

2. Horizontal Concatenation (Column-wise)

With horizontal concatenation (axis=1), the columns are placed side by side. If the DataFrames share column names, as df1 and df2 did in the first example, the result contains repeated names such as A and B, which makes column selection ambiguous.
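
The ambiguity can be illustrated with a short sketch. Renaming with add_suffix() is just one possible workaround, and the suffixes _left and _right are arbitrary choices made for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

wide = pd.concat([df1, df2], axis=1)
print(list(wide.columns))  # ['A', 'B', 'A', 'B']

# Selecting 'A' now returns a two-column DataFrame rather than a single Series.
print(type(wide['A']).__name__)

# One option: give each source its own suffix before concatenating.
labelled = pd.concat(
    [df1.add_suffix('_left'), df2.add_suffix('_right')],
    axis=1,
)
print(list(labelled.columns))
```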

Generally horizontal concatenation is best suited for cases where:

  • The DataFrames have different columns: Each DataFrame has unique column names.
  • You are appending additional features: You want to combine data along different dimensions where each DataFrame represents a different set of features for the same index.
Python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})

# Column names are unique, so the result has columns A, B, C and D
horizontal_concat = pd.concat([df1, df2], axis=1)
print(horizontal_concat)

Output:

Horizontal Concatenation (Column-wise)
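
Horizontal concatenation lines rows up by index label, not by position, which is why it suits feature sets that describe the same index. The sketch below uses made-up frames (features, extra) with partly different labels, and the join parameter of pd.concat() to choose between keeping every label and keeping only the shared ones.

```python
import pandas as pd

# Hypothetical data: 'extra' has no row for label 'x'.
features = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
extra = pd.DataFrame({'C': [10, 20]}, index=['y', 'z'])

# Default join='outer' keeps every label and fills the gaps with NaN.
outer = pd.concat([features, extra], axis=1)
print(outer)

# join='inner' keeps only the labels present in both DataFrames.
inner = pd.concat([features, extra], axis=1, join='inner')
print(inner)
```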

Using Keys When Concatenating DataFrames

When concatenating DataFrames, you can pass the keys argument to create a hierarchical index, also known as a MultiIndex. Each key labels the DataFrame it was assigned to, so the resulting multi-level index tracks the origin of every row. This is useful when the original index labels are the same or overlap.

Python
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

result = pd.concat([df1, df2], axis=0, keys=['First', 'Second'])
print(result)

Output:

Concatenating dataframes
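
Once the keys are in place, each original DataFrame can be pulled back out by its label. The sketch below also passes the names parameter of pd.concat() to label the index levels; the level names source and row are assumptions made for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

result = pd.concat(
    [df1, df2],
    axis=0,
    keys=['First', 'Second'],
    names=['source', 'row'],  # illustrative level names
)

# Select every row that came from the second DataFrame.
second = result.loc['Second']
print(second)

# Select a single cell with a (key, original index) tuple.
value = result.loc[('First', 1), 'B']
print(value)  # 4
```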

Handling Missing Values in Concatenated DataFrame

If the DataFrames being concatenated don't have matching columns or indexes, Pandas fills the missing positions with NaN to maintain the structure of the resulting DataFrame.

Python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

result = pd.concat([df1, df2], axis=0)
print(result)

Output:

Handle missing values

In this example, df1 lacks column C and df2 lacks column A. Pandas adds NaN in those positions to indicate that no data is available for the affected rows. You can use the .fillna() function to replace NaN values with a specific value. This is useful if you have a default value to apply, such as 0, the column average or a string.

Python
result_filled = result.fillna(0)
print(result_filled)

Output:

using fillna() function

Here we filled the NaN values with 0. The fillna() function is widely used because it lets you choose exactly what replaces each missing value.
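
A single default does not always suit every column. As a rough sketch, fillna() also accepts a dictionary that maps column names to fill values; the choice of 0 for A and the column mean for C is arbitrary and only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

result = pd.concat([df1, df2], axis=0, ignore_index=True)

# NaN is a float, so the integer columns A and C are upcast to float64.
print(result.dtypes)

# Fill each column with its own value: 0 for A, the column mean for C.
filled = result.fillna({'A': 0, 'C': result['C'].mean()})
print(filled)
```

Note that the columns containing NaN stay float64 after filling; an explicit astype(int) would be needed to get integers back.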


