How to join datasets with same columns and select one using Pandas?
Last Updated: 18 Mar, 2022
Data often arrives from different sources, so a common data manipulation task is joining two datasets into one. This article discusses how to join datasets that have the same columns in Python.
The pandas package provides a function called concat() that joins two or more datasets into one.
Syntax:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)
Parameters:
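- objs - a sequence of Series or DataFrame objects
- axis - 0 for index (stack rows), 1 for columns (place side by side); default 0
- join - 'inner' or 'outer'; default 'outer'
- ignore_index - bool, default False. If True, the original index values along the concatenation axis are not used and the result is labeled 0, ..., n - 1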
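To make the parameters above concrete, here is a minimal sketch using two small in-memory DataFrames. The frames and their column names (id, name, city) are hypothetical and are not the article's datasets; they only illustrate how join and axis change the shape of the result.

```python
import pandas as pd

# Hypothetical toy frames; the column names are illustrative only.
left = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
right = pd.DataFrame({"id": [3, 4], "city": ["Oslo", "Rome"]})

# join='outer' (default): keep the union of columns, filling gaps with NaN.
outer = pd.concat([left, right], axis=0, join="outer", ignore_index=True)

# join='inner': keep only the columns present in every input.
inner = pd.concat([left, right], axis=0, join="inner", ignore_index=True)

# axis=1: place the frames side by side, aligning rows on the index.
side_by_side = pd.concat([left, right], axis=1)

print(outer.columns.tolist())   # ['id', 'name', 'city']
print(inner.columns.tolist())   # ['id']
print(side_by_side.shape)       # (2, 4)
```

When both datasets already have the same columns, as in this article, 'inner' and 'outer' give the same set of columns, so the default is usually enough.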
The datasets used for demonstration are data_1 and data_2 (read as data_1.csv and data_2.csv in the code below).
Example:
Here, we set ignore_index to True, which means the concat function discards the original index of each dataset and creates a new index for the combined result.
Python3
import pandas as pd
# read the datasets
df1 = pd.read_csv(r"your_path/data_1.csv")
df2 = pd.read_csv(r"your_path/data_2.csv")
# print the datasets
print(df1.head())
print(df2.head())
# stack the rows of both datasets and renumber the index from 0
concat_data = pd.concat([df1, df2], ignore_index=True)
print(concat_data)
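If the CSV files are not at hand, the effect of ignore_index can be checked with stand-in data. The sketch below assumes two small frames with identical columns; the column names and values are made up for illustration and are not the contents of data_1 or data_2.

```python
import pandas as pd

# Stand-in data with identical columns (hypothetical values).
df1 = pd.DataFrame({"id": [1, 2], "score": [90, 85]})
df2 = pd.DataFrame({"id": [3, 4], "score": [70, 95]})

# Default (ignore_index=False): each frame keeps its own index labels.
kept = pd.concat([df1, df2])

# ignore_index=True: the result gets a fresh 0..n-1 index.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Repeated index labels, as in the first result, can make label-based lookups return more than one row, which is why renumbering is often preferred when stacking datasets.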

Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects. There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.

Syntax: pandas.merge(left, right, how)
Parameters:
- left - dataframe (left reference)
- right - dataframe (right reference)
- how - {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
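As a rough illustration of the how parameter, the sketch below merges two hypothetical frames that share an id column and prints which ids survive each join type. It also passes on="id", a pandas.merge parameter not listed above, to name the key column explicitly; the data is made up for the example.

```python
import pandas as pd

# Hypothetical frames sharing one key column, "id".
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "city": ["Oslo", "Rome", "Lima"]})

# Collect the surviving ids for each join type.
ids = {}
for how in ["inner", "left", "right", "outer"]:
    result = pd.merge(left, right, how=how, on="id")
    ids[how] = result["id"].tolist()
    print(how, ids[how])

# inner -> [2, 3]        (keys present in both)
# left  -> [1, 2, 3]     (all keys from the left frame)
# right -> [2, 3, 4]     (all keys from the right frame)
# outer -> [1, 2, 3, 4]  (union of keys)
```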
Example:
Like concat(), the merge() function can join datasets with the same columns. When no key columns are specified, merge() joins on all columns the two datasets have in common, so an outer join keeps the rows from both datasets and matches rows that are identical in both, as shown:
Python3
merge_data = pd.merge(df1, df2, how='outer')
print(merge_data)
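One practical difference between the two approaches shows up when the datasets overlap. The sketch below uses hypothetical frames with one row in common; it is an illustration under that assumption, not the article's data.

```python
import pandas as pd

# Hypothetical frames with identical columns and one shared row (id=2, score=85).
df1 = pd.DataFrame({"id": [1, 2], "score": [90, 85]})
df2 = pd.DataFrame({"id": [2, 3], "score": [85, 70]})

# With no key given, merge joins on all shared columns (id and score),
# so the row present in both frames is matched and appears once.
merged = pd.merge(df1, df2, how="outer")

# concat simply stacks the rows, so the shared row appears twice.
stacked = pd.concat([df1, df2], ignore_index=True)

print(len(merged))                      # 3
print(len(stacked))                     # 4
print(len(stacked.drop_duplicates()))   # 3
```

If repeated rows are unwanted after concat, drop_duplicates() removes them; if every row must be preserved, concat is the safer choice.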