How to Convert to Best Data Types Automatically in Pandas?

Last Updated : 03 Dec, 2024

Let’s learn how to automatically convert columns to the best data types in a Pandas DataFrame using the convert_dtypes() method.

Convert Data Type of a Pandas Series using convert_dtypes() Function

To convert the data type of a pandas Series, use the following syntax:

Syntax: series_name.convert_dtypes()

Let’s consider the following example:

Python
import pandas as pd

# Creating a sample Series
s = pd.Series(['Geeks', 'for', 'Geeks'])

# Before using convert_dtypes()
print("Original Series:")
print(s)

# Automatically converting data types
print("\nAfter convert_dtypes:")
print(s.convert_dtypes())

Output
Original Series:
0    Geeks
1      for
2    Geeks
dtype: object

After convert_dtypes:
0    Geeks
1      for
2    Geeks
dtype: string

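Here, the object data type is converted to pandas' dedicated string type, which holds only text and represents missing values with pd.NA. Any memory saving depends on the pandas version and the storage backend in use.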
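Whether the string type saves memory is worth measuring rather than assuming. The sketch below is one way to compare the two; the variable names are illustrative and no particular result is implied.

```python
import pandas as pd

# Start from an explicit object-dtype Series so the starting point is unambiguous
s = pd.Series(['Geeks', 'for', 'Geeks'], dtype=object)
converted = s.convert_dtypes()

# deep=True asks pandas to include the memory held by the string values themselves
before = int(s.memory_usage(deep=True))
after = int(converted.memory_usage(deep=True))

print("dtype before:", s.dtype, "| bytes:", before)
print("dtype after: ", converted.dtype, "| bytes:", after)
```

The numbers printed will differ between pandas versions, so treat them as a measurement for your own environment.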

convert_dtypes() is a pandas method, introduced in version 1.0.0, that converts DataFrame and Series columns to the best possible data types that support pd.NA. Its main benefit is consistent handling of missing values: integer and boolean columns can hold missing data without falling back to float64 or object.
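convert_dtypes() also accepts keyword arguments, such as convert_string, convert_integer, and convert_boolean, that switch individual conversions off. The following is a minimal sketch with a made-up two-column DataFrame; the column names and variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data; the integer column is given an explicit int64 dtype
df = pd.DataFrame({
    "name": pd.Series(["a", "b"], dtype=object),
    "count": pd.Series([1, 2], dtype="int64"),
})

# Default behaviour: every supported conversion is applied
default = df.convert_dtypes()

# Leave integer columns as they are, but still convert the rest
no_int = df.convert_dtypes(convert_integer=False)

print(default.dtypes)
print(no_int.dtypes)
```

With convert_integer=False the count column keeps its NumPy int64 type, while the name column is still converted to the string type.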

Convert Data Types in a Pandas DataFrame

You can apply convert_dtypes() to a Pandas DataFrame in the same way. Appending .dtypes shows the resulting data type of each column:

dataframe_name.convert_dtypes().dtypes

Let’s consider the following example:

Python
import pandas as pd
import numpy as np

# Creating a sample DataFrame
df = pd.DataFrame({
    "Roll_No.": [1, 2, 3],
    "Name": ["Raj", "Ritu", "Rohan"],
    "Result": ["Pass", "Fail", np.nan],
    "Promoted": [True, False, np.nan],
    "Marks": [90.33, 30.6, np.nan]
})

# Before using convert_dtypes()
print("Original DataFrame:")
print(df)

# Checking the data types before conversion
print("\nData Types Before Conversion:")
print(df.dtypes)

# Automatically converting data types
print("\nData Types After Conversion:")
print(df.convert_dtypes().dtypes)

Output:

[Figure: Converted Data Types Using the convert_dtypes() Function]

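In this example, convert_dtypes() changes the column data types as follows:

  • The Name and Result columns are converted from object to the string type.
  • The Promoted column is converted from object to the nullable boolean type.
  • The Roll_No. column is converted to the nullable integer type of the same width (Int64 on most platforms). convert_dtypes() does not downcast to a smaller integer type.
  • The Marks column is converted from float64 to the nullable Float64 type (in pandas 1.2 and later).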
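A practical effect of these conversions is that missing values are represented by pd.NA instead of np.nan. The sketch below reuses two columns from the example above to illustrate this; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Result": ["Pass", "Fail", np.nan],
    "Promoted": [True, False, np.nan],
})

converted = df.convert_dtypes()
print(converted.dtypes)

# The missing entry in the boolean column is now pd.NA
missing = converted.loc[2, "Promoted"]
print(missing is pd.NA)

# isna() still detects missing values after conversion
promoted_missing = converted["Promoted"].isna().tolist()
print(promoted_missing)
```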

Creating a DataFrame with Explicit Data Types

You can also create a DataFrame with explicitly specified data types. convert_dtypes() then maps each column to the corresponding type that supports pd.NA.

Python
import pandas as pd
import numpy as np

# Creating a DataFrame with explicit data types for each column
df = pd.DataFrame({
    "Column_1": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
    "Column_2": pd.Series(["Apple", "Ball", "Cat"], dtype=np.dtype("object")),
    "Column_3": pd.Series([True, False, np.nan], dtype=np.dtype("object")),
    "Column_4": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
    "Column_5": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float"))
})

# Before using convert_dtypes()
print("Original DataFrame:")
print(df)

# Checking the data types before conversion
print("\nData Types Before Conversion:")
print(df.dtypes)

# Automatically converting data types
print("\nData Types After Conversion:")
print(df.convert_dtypes().dtypes)

Output:

[Figure: Creating a DataFrame with Explicit Data Types]
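The two float columns in this example are treated differently: a float column whose non-missing values are all whole numbers is converted to a nullable integer type, while one with fractional values stays a float. The sketch below isolates that behaviour using the same values as Column_4 and Column_5; it assumes pandas 1.2 or later, where the nullable Float64 type is available.

```python
import numpy as np
import pandas as pd

# Same values as Column_4: whole numbers plus a missing value
whole = pd.Series([10, np.nan, 20], dtype="float64")

# Same values as Column_5: includes a fractional value
fractional = pd.Series([np.nan, 100.5, 200], dtype="float64")

whole_converted = whole.convert_dtypes()
fractional_converted = fractional.convert_dtypes()

print(whole_converted.dtype)
print(fractional_converted.dtype)

# Opting out of the integer conversion keeps the column as a nullable float
whole_as_float = whole.convert_dtypes(convert_integer=False)
print(whole_as_float.dtype)
```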



