How to join datasets with same columns and select one using Pandas?

Last Updated : 18 Mar, 2022

In data manipulation, data often comes from different sources, so it is common to need to combine two datasets into one. In this article, let us discuss how to join datasets that have the same columns in Python using pandas.

Using Pandas concat()

The pandas library provides a function, concat(), that joins two or more datasets into one.

Syntax:

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)

Parameters:

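objs - sequence of Series or DataFrame objects
axis - {0 or 'index', 1 or 'columns'}, default 0
join - {'inner', 'outer'}, default 'outer'
ignore_index - bool, default False. If True, the index values along the concatenation axis are not used; the result is labeled 0, 1, ..., n - 1.
keys - sequence, default None. If given, builds a hierarchical index using the keys as the outermost level.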
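A minimal sketch of how these parameters behave, assuming two small in-memory DataFrames. The names df_a and df_b and their contents are hypothetical stand-ins, not the article's data_1 and data_2; they share the columns name and score, and only df_b has a city column.

```python
import pandas as pd

# Hypothetical data: both frames share "name" and "score"; only df_b has "city".
df_a = pd.DataFrame({"name": ["Asha", "Ben"], "score": [85, 92]})
df_b = pd.DataFrame({"name": ["Chen", "Dina"], "score": [78, 88], "city": ["Pune", "Oslo"]})

# join='outer' (the default) keeps every column; rows from df_a get NaN in "city".
outer = pd.concat([df_a, df_b], ignore_index=True)

# join='inner' keeps only the columns present in both frames.
inner = pd.concat([df_a, df_b], join="inner", ignore_index=True)

# With ignore_index=False (the default) the original row labels are kept,
# so the labels 0 and 1 appear twice.
kept = pd.concat([df_a, df_b])

# keys adds an outer index level recording which frame each row came from.
keyed = pd.concat([df_a, df_b], keys=["a", "b"])

print(outer)
print(inner)
print(kept.index.tolist())  # [0, 1, 0, 1]
print(keyed.loc["b"])       # only the rows that came from df_b
```

In this sketch, outer has three columns with NaN in city for the rows taken from df_a, while inner drops city because df_a does not have it.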

The datasets used for demonstration can be downloaded here: data_1 and data_2.

Example:

Here, we set ignore_index=True, which means concat() discards the original indexes of the individual datasets and creates a new default index (0, 1, 2, ...).

Python3

import pandas as pd

# read the datasets
df1 = pd.read_csv(r"your_path/data_1.csv")
df2 = pd.read_csv(r"your_path/data_2.csv")

# print the datasets
print(df1.head())
print(df2.head())

# stack the rows of df2 below those of df1 and build a fresh 0..n-1 index
concat_data = pd.concat([df1, df2], ignore_index=True)
print(concat_data)

Output:

Using Pandas merge()

Pandas provides a single function, merge(), as the entry point for all standard database-style join operations between DataFrame objects. There are four basic ways to perform the join (inner, left, right, and outer), depending on which rows must be kept in the result.

Syntax: pandas.merge(left, right, how='inner')

Parameters:

left - dataframe (left reference)
right - dataframe (right reference)
how - {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
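As a rough illustration of the how parameter when both DataFrames have exactly the same columns, here is a sketch with two hypothetical frames (left and right are made-up data, not the article's datasets). Since no on argument is passed, merge() uses every shared column as a join key.

```python
import pandas as pd

# Hypothetical data: both frames have exactly the same columns, "id" and "value".
left = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "value": ["b", "x", "d"]})

# With no `on` argument, merge() joins on all columns the frames share,
# so a row matches only when both "id" and "value" are equal.
inner = pd.merge(left, right)                     # default how='inner'
outer = pd.merge(left, right, how="outer")        # rows from either frame
left_join = pd.merge(left, right, how="left")     # every row of `left`
right_join = pd.merge(left, right, how="right")   # every row of `right`

print(inner)
print(outer)
print(left_join)
print(right_join)
```

Because every column is a join key, the rows (3, 'c') and (3, 'x') do not match, so the inner join keeps only (2, 'b') while the outer join keeps all five distinct rows.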

Example:

Like concat(), the merge() function can join datasets that have the same columns. When no on argument is given, merge() joins on all the columns the two datasets have in common. We can pass both datasets to merge() and set how='outer' to keep the rows from both datasets, as shown below: