
Panda Joins

1. There are three main join functions in pandas: merge(), join(), and concat(). merge() allows more flexible joins on columns or indices, concat() is less structured, and join() combines on the index by default.
2. The document explains four types of pandas joins (inner, outer, left, and right) and provides examples of each using the merge() function and interview questions from Meta and Amazon.
3. An inner join returns the intersection of rows from two data frames. An outer join returns all rows and fills unmatched rows with NA, while left and right joins return all rows from the left or right data frame respectively and fill unmatched rows with NA.

Uploaded by

Tanishi Gupta


Types of Pandas Joins and How to Use Them in Python

Here are different types of pandas joins and how to use them in Python. Inner,
Outer, Right, and Left joins are explained with examples from Amazon and Meta.

Pandas is a widely used Python library for data analysis and manipulation, and most of that
work happens in data frames. When you work with several data frames, you can combine them
using three different functions and four different join types.

How Many Methods for Joining Data Frames Are There in Pandas?

There are three join functions in pandas.

1. Merge: It merges two data frames on columns or indices.

2. Join: The join() function joins two data frames on the index or on a key column.

3. Concat: The concat() function concatenates data frames along the rows or columns.

It sounds rather similar, so what is the difference between these three approaches?

merge() is the most flexible of the three: you can join on any columns or on the index and
choose the join type, whereas concat() is less structured and simply stacks data frames along
an axis. join() matches against the index of the other data frame by default, while merge()
lets you name the columns to join on explicitly.
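
To make the difference concrete, here is a minimal sketch of the three functions applied to the same pair of data frames. The frames `left` and `right` and their values are hypothetical and are not part of the interview datasets used later.

```python
import pandas as pd

# Hypothetical toy data, not taken from the article's datasets.
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [20, 30, 40]})

# merge(): match rows on a named column (inner join by default).
merged = pd.merge(left, right, on="key")

# join(): match the caller against the index of the other frame
# (left join by default), so the key is moved into the index first.
joined = left.set_index("key").join(right.set_index("key"))

# concat(): stack frames along an axis without matching on any key.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

print(merged)
print(joined)
print(stacked)
```

Note that merge() keeps only the keys present in both frames here, join() keeps every row of the calling frame, and concat() simply places one frame under the other.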

In our examples, we will use merge() to show you how different types of joins in python work.

What’s the Point of Joins in Pandas?

In simple terms, pandas joins in python are used to combine two data frames. When doing that,
you have to specify the type of join. The defined pandas join type specifies how the data frames
will be joined.

Now, let’s look at the types of Python pandas joins with the merge function.

Types of Pandas Joins in Python

There are four main types of pandas joins in Python, which we will explain in this article.
● Inner
● Outer
● Left
● Right

We will use the pandas merge() function, described in the official pandas documentation, to
join two data frames.
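
Before the worked examples, a short sketch can show how the `how` argument of merge() selects the join type. The two frames below are hypothetical: the left one holds keys 1 to 3 and the right one holds keys 2 to 4, so each join type keeps a different set of keys.

```python
import pandas as pd

# Hypothetical toy frames: keys 1-3 on the left, keys 2-4 on the right.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Collect the keys that survive each join type.
keys_by_join = {}
for how in ["inner", "outer", "left", "right"]:
    result = pd.merge(left, right, on="key", how=how)
    keys_by_join[how] = sorted(result["key"].tolist())
    print(how, keys_by_join[how])
```

The inner join keeps only the shared keys, the outer join keeps all of them, and the left and right joins keep every key of their respective side.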

Now, let’s get started with Inner Join.

Inner Join in Python

What is Inner Join in Python?

An inner join in pandas merges two data frames at their intersection. It returns only the
rows whose key values appear in both.

Inner Join Example
Now, let’s look at an example from the StrataScratch platform. Our inner join question comes
from Meta (Facebook).

Meta developed a new programming language called Hack. To measure its popularity, they ran
a survey with their employees. Due to an error, location data was not collected, but your
supervisor demands a report showing the average popularity of Hack by office location.
The aim is to find the average popularity of Hack per office location.
The output has to contain the location along with the average popularity.

Link to the question: https://platform.stratascratch.com/coding/10061-popularity-of-hack

Data
We have two data frames. Our first data frame is facebook_employees. The table has the
following columns.
The data preview is shown below.

The second data frame is facebook_hack_survey, and it has the following columns.

Also, the data preview is shown below.

Solution Approach
1. Let’s load the libraries.
2. Now, we have to merge the two data frames to bring location and popularity together.
3. Since the question asks for the average popularity per location, we will group by
location, use the mean() function to find the average, and call reset_index() to turn
the group labels that groupby() creates back into a regular column.

Coding
1. Let’s import the NumPy and Pandas libraries first to manipulate the data and use the
statistical methods with it.

import pandas as pd
import numpy as np

If you want to know how to import pandas as pd in python and its importance for doing data
science, check out our article “How to Import Pandas as pd in Python”.

2. The question asks us to return each location with its popularity.

We have the location in the first data frame and the popularity in the second. So to bring
popularity and location together, let’s merge the two data frames using an inner join on the
employee id. The age and gender columns appear in both data frames, but the id column is
named differently in each. That's why we set the left_on argument to id and the right_on
argument to employee_id.

We want to find the popularity of Hack per office location. Each popularity score needs a
matching location, so we need the intersection of the two data frames, which is what an inner
join returns.
Selecting the right join type is crucial to getting the correct answer. In this case, the left
and inner joins return the same result: 14 rows, which are the rows common to both tables.

The right join, however, returns the whole right data frame, which contains 17 rows. For the
rows with no match, the columns coming from the left data frame are filled with NA.
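
One way to check which join type fits is to compare row counts and to look at where each row comes from. The sketch below uses small hypothetical stand-ins named `employees` and `survey` (not the real tables) and the `indicator=True` option of merge(), which adds a `_merge` column.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables; the values are made up.
employees = pd.DataFrame({"id": [1, 2, 3], "location": ["USA", "UK", "USA"]})
survey = pd.DataFrame({"employee_id": [1, 2, 3, 4], "popularity": [5, 3, 4, 2]})

# Row counts for each join type.
row_counts = {
    how: len(pd.merge(employees, survey, left_on="id",
                      right_on="employee_id", how=how))
    for how in ["inner", "left", "right"]
}
print(row_counts)

# indicator=True adds a _merge column saying where each row came from:
# "both", "left_only", or "right_only".
check = pd.merge(employees, survey, left_on="id", right_on="employee_id",
                 how="outer", indicator=True)
print(check["_merge"].value_counts())
```

In this toy case every employee answered the survey, so the inner and left joins agree, while the right join keeps one extra survey row that has no employee record.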

Below is the info table of three data frames to see the information of the rows of the first, the
second, and the merged data frames.

OK, let’s get back to writing the answer using the inner join.
import pandas as pd
import numpy as np
merged = pd.merge(facebook_employees,facebook_hack_survey, left_on = 'id',
right_on = 'employee_id', how = 'inner')

3. The question wants us to return the average popularity based on the location, so let’s
use the groupby() function with mean() and reset indexes that groupby() creates
using the reset_index() method.

import pandas as pd
import numpy as np

merged = pd.merge(facebook_employees, facebook_hack_survey, left_on='id',
                  right_on='employee_id', how='inner')
result = merged.groupby(['location'])['popularity'].mean().reset_index()
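
To try the solution without access to the platform, here is a self-contained sketch. The column names follow the article, but the rows are invented for illustration and the age and gender columns are left out.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the article, values are invented.
facebook_employees = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "location": ["USA", "USA", "India", "UK"],
})
facebook_hack_survey = pd.DataFrame({
    "employee_id": [1, 2, 3, 5],
    "popularity": [4, 6, 8, 10],
})

# Inner join: employee 4 (no survey answer) and survey row 5 (no employee) drop out.
merged = pd.merge(facebook_employees, facebook_hack_survey,
                  left_on="id", right_on="employee_id", how="inner")

# Average popularity per location, with location as a regular column.
result = merged.groupby("location")["popularity"].mean().reset_index()
print(result)
```

With this sample data, the UK office disappears from the result because its only employee has no survey row.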

Output
Here is the output, the average popularity based on the locations.

By the way, if you want to learn more about Pandas, here are Pandas Interview Questions for
Data Science.

Outer Join in Python

What is Outer Join in Python?

It is also called a full outer join and returns all rows from both DataFrames. Rows whose
keys exist in both data frames are matched. Rows found in only one of the two DataFrames
are kept, and the columns coming from the other DataFrame are filled with NA.

Outer Join Example

Now, let’s look at an example of Outer join from the platform.

An app has product features to help guide users through a marketing funnel. Each funnel has
steps as a guide to complete the funnel. Meta asks us to find the average percentage of
completion for each feature.
Link to the question: https://platform.stratascratch.com/coding/9792-user-feature-completion

Data
We have two data frames. Our first data frame is facebook_product_features. The data frame
has the following columns.

Also, the data preview is shown below.

The second data frame is facebook_product_features_realizations with the following columns.

The data preview is shown below.

Solution Approach
1. Load the pandas library.
2. Group by feature_id and user_id, and calculate the maximum step reached.
3. Merge two data frames on feature_id using the outer join and fill NAs with zero.
4. Calculate the share of completion by dividing step_reached by n_steps and multiplying
by 100 to get the percentage.
5. Group the data frame by feature_id, select the share of completion, calculate the
mean, save the result to a frame, and reset the index.

Coding
1. Let’s import the pandas library first to manipulate the data.

import pandas as pd

2. Now it is time to find the maximum step. We group by feature_id and user_id first,
then select step_reached and apply the max() function. After that, we reset the index
that the groupby() function creates.

import pandas as pd

max_step = facebook_product_features_realizations.groupby(
    ["feature_id", "user_id"])["step_reached"].max().reset_index()

3. Next, we have to calculate the share of completion. We will divide the step reached by
n_steps and multiply by 100. So we have to select n_steps from the first data frame and
step_reached from the second data frame. We will combine them on feature_id using
the outer join because we need all values from both data frames to do the math. The
non-matching values will be NA, so we will replace these values with zero after merging.

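import pandas as pd

max_step = facebook_product_features_realizations.groupby(
    ["feature_id", "user_id"])["step_reached"].max().reset_index()
df = pd.merge(facebook_product_features, max_step, how='outer',
              on='feature_id').fillna(0)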

4. At this stage, we will calculate the share of completion by dividing the step reached by
the number of steps and multiplying by 100.

import pandas as pd

max_step = facebook_product_features_realizations.groupby(
    ["feature_id", "user_id"])["step_reached"].max().reset_index()
df = pd.merge(facebook_product_features, max_step, how='outer',
              on='feature_id').fillna(0)
df["share_of_completion"] = (df["step_reached"] / df["n_steps"]) * 100

5. Now, we will group the data frame by feature_id, select the share of completion, and
calculate the mean. Then we will store these results in the column
avg_share_of_completion and reset the index.

import pandas as pd

max_step = facebook_product_features_realizations.groupby(
    ["feature_id", "user_id"])["step_reached"].max().reset_index()
df = pd.merge(facebook_product_features, max_step, how='outer',
              on='feature_id').fillna(0)
df["share_of_completion"] = (df["step_reached"] / df["n_steps"]) * 100
result = df.groupby("feature_id")["share_of_completion"].mean().to_frame(
    "avg_share_of_completion").reset_index()
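
The whole pipeline can be tried on a few made-up rows. In the sketch below the column names follow the article, while the values are hypothetical; feature 2 has no realizations at all, which is the case the outer join and fillna(0) are meant to cover.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the article, values are invented.
facebook_product_features = pd.DataFrame({
    "feature_id": [0, 1, 2],
    "n_steps": [5, 4, 2],
})
facebook_product_features_realizations = pd.DataFrame({
    "feature_id": [0, 0, 0, 1],
    "user_id": [10, 10, 11, 10],
    "step_reached": [1, 2, 5, 2],
})

# Highest step each user reached for each feature.
max_step = (
    facebook_product_features_realizations
    .groupby(["feature_id", "user_id"])["step_reached"]
    .max()
    .reset_index()
)

# Outer join keeps feature 2 even though nobody used it; its NAs become 0.
df = pd.merge(facebook_product_features, max_step,
              how="outer", on="feature_id").fillna(0)
df["share_of_completion"] = df["step_reached"] / df["n_steps"] * 100

# Average completion share per feature.
result = (
    df.groupby("feature_id")["share_of_completion"]
    .mean()
    .to_frame("avg_share_of_completion")
    .reset_index()
)
print(result)
```

In this sample, feature 2 ends up with an average of zero instead of being dropped, which is what an inner join would have done.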

Output
Here is the expected output.

If you want to enhance your Python skills, here are Python Interview Questions and Answers.

Left Outer Join in Python

What is Left Outer Join in Python?

The left outer join returns all rows from the left data frame, merged with the corresponding
rows from the right data frame. Where a left row has no match, the columns coming from the
right data frame are filled with NA.

Left Outer Join Example

This time, our coding question is from Amazon.

Amazon asks us to sort records based on the customer's first name and the order details in
ascending order.
Link to the question: https://platform.stratascratch.com/coding/9891-customer-details

Data
We have two data frames. The first data frame is customers. The data frame has the following
columns.

The data preview is shown below.

Our second data frame is orders. The data frame has the following columns.

Also, the data preview is shown below.

Solution Approach
1. Let’s load the libraries.
2. Merge the data frames from the left on id and cust_id.
3. Sort values by the first names and order details and show first_name, last_name,
city, and order_details in the output.

Coding
1. Now first, let’s import pandas and NumPy libraries.

import pandas as pd
import numpy as np

2. Here, we will merge both data frames using the left join because the output should
contain the sorted records based on the customer's first name and the order details. We
need the list of all customers.

import pandas as pd
import numpy as np

merged = pd.merge(customers, orders, left_on='id', right_on='cust_id',
                  how='left')

3. It is time to sort values according to the first name and order details and select the first
name, last name, city, and order details.

import pandas as pd
import numpy as np
merged = pd.merge(customers, orders, left_on='id', right_on='cust_id',
                  how='left')
result = merged[['first_name', 'last_name', 'city', 'order_details']].sort_values(
    ['first_name', 'order_details'])
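
Here is a small self-contained sketch of the same steps. The column names follow the article, but the customers and orders are invented; one customer has no order, so the left join leaves an NA in order_details for that row.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the article, values are invented.
customers = pd.DataFrame({
    "id": [1, 2, 3],
    "first_name": ["Mia", "Eva", "Jill"],
    "last_name": ["Miller", "Lucas", "Austin"],
    "city": ["Miami", "Arizona", "Austin"],
})
orders = pd.DataFrame({
    "cust_id": [1, 1, 3],
    "order_details": ["Shoes", "Coat", "Skirts"],
})

# Left join keeps every customer, including the one without orders.
merged = pd.merge(customers, orders, left_on="id", right_on="cust_id", how="left")

# Keep the requested columns and sort by first name, then order details.
result = merged[["first_name", "last_name", "city", "order_details"]].sort_values(
    ["first_name", "order_details"]
)
print(result)
```

An inner join would have dropped the customer without orders, which is why the left join is the safer choice when the full customer list is required.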

Output
Here is the expected output.

Right Outer Join in Python

What is Right Outer Join in Python?

The right outer join returns all rows from the right data frame, merged with the matching rows
from the left. Where a right row has no match, the columns coming from the left data frame
are filled with NA.

Right Outer Join Example

This question is from Amazon.

Amazon asks us to find the number of customers without an order.

Link to the question:
https://platform.stratascratch.com/coding/10089-find-the-number-of-customers-without-an-order

Data
We have two data frames. The orders data frame has the following columns.

Here is the preview of the data.

The second data frame is customers. The data frame has the following columns.
Also, here is the preview.

Solution Approach
1. First, let’s load the libraries.
2. Merge two data frames from the right.
3. Find the customers without an order by using the isnull() method.
4. Find the number of customers without an order by using the len() function.

Coding
1. Let’s import the NumPy and pandas libraries first to manipulate the data and use the
statistical methods with it.

import pandas as pd
import numpy as np
2. Now, we will merge these two data frames from the right to find the number of customers
without an order. After merging two data frames from the right, the customer's order data
column will be null if there’s no order. And we can find the customers who haven’t had
any orders in the next step.

import pandas as pd
import numpy as np

merged = pd.merge(orders, customers, left_on='cust_id', right_on='id', how='right')

3. Here we will use the isnull() function to find the customer ids that don't have any
orders.

import pandas as pd
import numpy as np

merged = pd.merge(orders, customers, left_on='cust_id', right_on='id', how='right')
null_cust = merged[merged['cust_id'].isnull()]

4. By using the len() function, we find the number of customers that have not had any
orders.

import pandas as pd
import numpy as np

merged = pd.merge(orders, customers, left_on='cust_id', right_on='id', how='right')
null_cust = merged[merged['cust_id'].isnull()]
result = len(null_cust)
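
The same three steps can be run on a handful of invented rows. The column names follow the article and the values are hypothetical; two of the four customers have no order.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the article, values are invented.
orders = pd.DataFrame({
    "cust_id": [1, 1, 3],
    "order_details": ["Shoes", "Coat", "Skirts"],
})
customers = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "first_name": ["Mia", "Eva", "Jill", "Farida"],
})

# Right join keeps every customer; cust_id is NA where no order matched.
merged = pd.merge(orders, customers, left_on="cust_id", right_on="id", how="right")

# Customers whose order columns are empty have never ordered.
null_cust = merged[merged["cust_id"].isnull()]
result = len(null_cust)
print(result)
```

Because the key columns have different names, both cust_id and id survive the merge, and the NA values in cust_id are what identify customers without an order.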

Output
Here is the expected output.

If you want to explore joins in SQL too, here are Different Types of SQL JOINs.

Conclusion
In this article, you learned about the four pandas joins in Python through interview questions
from companies like Meta and Amazon. These questions showed you, step by step, how to use
joins in Python and, more specifically, where to use them during data manipulation.

Practicing similar interview questions will keep you ready for interviews. You should turn it into a
habit. So join the StrataScratch community and sign up today to help us find your dream job.
