Python Pandas - DataFrame.copy() function

Last Updated : 28 Nov, 2024

The DataFrame.copy() function in Pandas creates a duplicate of a DataFrame. The duplicate can be either a deep copy, where the new DataFrame is independent of the original, or a shallow copy, where the new DataFrame shares its data with the original, so changes to one can show up in the other. The main takeaway is that copy() helps avoid unintended modifications to the original data. Here is the syntax, followed by a quick example:

df_copy = df.copy(deep=True)
deep: A boolean value (True by default) that specifies whether to make a deep or shallow copy.

Python

import pandas as pd

data = {"name": ["Sally", "Mary", "John"], "qualified": [True, False, False]}
df = pd.DataFrame(data)

# Create a deep copy of the DataFrame
df_copy = df.copy()

print("Original DataFrame:")
print(df)
print("\nCopied DataFrame:")
print(df_copy)

In this example, df_copy is a deep copy of df, meaning any changes made to df_copy will not affect df.
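
To make that independence concrete, the short sketch below reuses the same data, changes one value in the copy, and then reads the same cell from both DataFrames. The names original_value and copied_value are introduced here only for illustration.

```python
import pandas as pd

data = {"name": ["Sally", "Mary", "John"], "qualified": [True, False, False]}
df = pd.DataFrame(data)

# deep=True is the default, so this is a deep copy
df_copy = df.copy()

# Change a value in the copy only
df_copy.loc[0, "qualified"] = False

# Read the same cell from both DataFrames
original_value = bool(df.loc[0, "qualified"])
copied_value = bool(df_copy.loc[0, "qualified"])

print("Original:", original_value)
print("Copy:", copied_value)
```

The original should still report True for Sally, while the copy reports False.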

Deep Copy vs. Shallow Copy

The copy() function duplicates a DataFrame. Its deep parameter controls whether the new object only references the original's data and index (shallow copy) or gets its own independent copy of both (deep copy).

Deep Copy: When deep=True (the default setting), a new DataFrame is created with its own set of data and indices. This means any changes made to the copied DataFrame will not affect the original. This is particularly useful when you want to experiment with data transformations without altering the original dataset.
Shallow Copy: When deep=False, the new DataFrame shares the same data and indices as the original. Before Pandas 3.0, and without copy-on-write enabled, changes in one are therefore reflected in the other. While this method is more memory-efficient, it requires caution to avoid unintended side effects.
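
One way to see the difference without modifying anything is to check whether the copies share the original's underlying memory. The sketch below is a minimal illustration that assumes a plain integer column (so that to_numpy() returns a view rather than a converted array) and uses NumPy's shares_memory() for the check. The column name score and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30]})

deep = df.copy(deep=True)      # own data
shallow = df.copy(deep=False)  # references the original's data

# Compare the underlying NumPy buffers of the "score" column
deep_shares = np.shares_memory(df["score"].to_numpy(), deep["score"].to_numpy())
shallow_shares = np.shares_memory(df["score"].to_numpy(), shallow["score"].to_numpy())

print("Deep copy shares memory:", deep_shares)
print("Shallow copy shares memory:", shallow_shares)
```

The deep copy is expected to report no shared memory, while the shallow copy is expected to report shared memory, which is where its memory savings come from.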

The value of DataFrame.copy() lies in safeguarding the original data during analysis or transformation. Because the duplicate can be modified independently, you can run operations on it without risking changes to the initial dataset.

New Shallow Copy Behavior in Pandas 3.0

Starting from Pandas 3.0, shallow copies behave differently due to a new lazy copy mechanism (also called "copy-on-write"). This can also be enabled in earlier versions by setting pd.options.mode.copy_on_write = True.

Lazy Copy Mechanism: Even with deep=False, a shallow copy will no longer directly share data with the original DataFrame. Changes to either the original or the copy will not affect the other. Instead of duplicating data immediately, the copy is created "lazily," and changes trigger the actual duplication behind the scenes.
Backward Compatibility: Before Pandas 3.0, shallow copies (deep=False) shared data between the original and the copy, meaning changes in one reflected in the other. This behavior can be controlled in earlier versions of Pandas as well by enabling lazy copying with:

pd.options.mode.copy_on_write = True
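
The lazy copy behavior described above can be sketched as follows. This is a minimal illustration, assuming a Pandas version in which the mode.copy_on_write option exists. On Pandas 3.0 and later the behavior is already the default, so setting the option there should be unnecessary and may emit a deprecation warning. The column name score and the variable names are made up for this example.

```python
import pandas as pd

# Assumption: this option exists in the installed Pandas version
pd.options.mode.copy_on_write = True

df = pd.DataFrame({"score": [10, 20, 30]})
shallow = df.copy(deep=False)

# Writing to the shallow copy triggers the real copy behind the scenes
shallow.loc[0, "score"] = 99

original_first = int(df.loc[0, "score"])
shallow_first = int(shallow.loc[0, "score"])

print("Original:", original_first)
print("Shallow copy:", shallow_first)
```

With copy-on-write active, the original should keep its first value while only the shallow copy shows the new one.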


svrrrsvr
