Pandas: Detect Mixed Data Types and Fix Them
Last Updated :
06 Oct, 2023
Pandas is a Python library commonly used for working with data sets; it helps users analyze, explore, and manipulate data. When a column of a Pandas data frame does not hold a single type of data (only numeric or only string) but holds both numeric and string values, that column is called a mixed data type column.
What are mixed types in Pandas columns?
A Pandas data frame can have multiple columns. When a column does not hold one consistent kind of data, and instead contains a mix such as numeric and string values, that column is said to have a mixed data type.
For example:
data_frame = pd.DataFrame( [['tom', 10], ['nick', '15'], ['juli', 14.8]], columns=['Name', 'Age'])
Here, the Age column contains both string and numeric values, so it has a mixed data type.
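As a minimal sketch of what "mixed" means here, the Python type of each value in the Age column can be listed directly. The variable names `age_types` and `type_names` are introduced only for this illustration.

```python
import pandas as pd

# Same data frame as in the example above
data_frame = pd.DataFrame(
    [['tom', 10], ['nick', '15'], ['juli', 14.8]],
    columns=['Name', 'Age'],
)

# Look up the Python type of every value stored in the Age column
age_types = data_frame['Age'].map(type)
type_names = [t.__name__ for t in age_types]

print(type_names)  # ['int', 'str', 'float']
```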
Causes of mixed data types
- Missing Values (NaN)
- Inconsistent Formatting
- Data Entry Errors
Missing Values (NaN):
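NaN is a floating-point value that represents missing, undefined, or unrepresentable data (for example, the result of 0/0). Because NaN is a float, a missing value in a column of strings leaves that column holding both strings and floats, and a missing value in a column of integers forces the whole column to be stored as floats. If this is not handled, it can lead to incorrect results.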
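The effect of a missing value can be sketched as follows. The sample values are made up for illustration; `skipna` is a parameter of `pd.api.types.infer_dtype()` that controls whether missing values are ignored during inference.

```python
import numpy as np
import pandas as pd

# Hypothetical values: two strings and one missing entry
values = ['a', np.nan, 'b']

# Ignoring the missing entry, the values look like plain strings
inferred_skipping_nan = pd.api.types.infer_dtype(values, skipna=True)

# Counting the missing entry, the values are a mix of str and float
inferred_with_nan = pd.api.types.infer_dtype(values, skipna=False)

# In an integer column, a missing value upcasts the column to float
ints_with_gap = pd.Series([1, 2, np.nan])

print(inferred_skipping_nan)  # string
print(inferred_with_nan)      # mixed
print(ints_with_gap.dtype)    # float64
```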
Inconsistent Formatting:
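Inconsistent formatting arises when cells in the same column store the same kind of value in different formats, for example the number 15 stored as the string '15' next to the integer 10. It is therefore important to convert every cell of the column to one consistent format.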
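One possible way to normalize such a column is sketched below. The `prices` series and its formatting problems (extra spaces, thousands separators) are assumptions chosen for illustration; real data may need different cleaning steps.

```python
import pandas as pd

# Hypothetical column: numbers written in several different formats
prices = pd.Series([15, ' 20 ', '1,200', '30'], dtype=object)

# Convert everything to text, trim whitespace, drop thousands separators
cleaned = (
    prices.astype(str)
    .str.strip()
    .str.replace(',', '', regex=False)
)

# Parse the normalized text back into numbers
prices_numeric = pd.to_numeric(cleaned)

print(prices_numeric.tolist())  # [15, 20, 1200, 30]
```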
Data Entry Errors:
Users sometimes make mistakes while entering data into a column of a Pandas data frame, such as typing a string into a numeric column or leaving a cell empty. Such errors can also lead to mixed data types and therefore need to be fixed.
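A rough way to locate such entries in a column that is supposed to be numeric is to try parsing every value and flag the ones that fail. The `ages` series below is invented for this sketch.

```python
import pandas as pd

# Hypothetical numeric column with one typo and one empty cell
ages = pd.Series([25, 31, 'abc', None, '40'], dtype=object)

# Values that cannot be parsed as numbers become NaN
parsed = pd.to_numeric(ages, errors='coerce')

# An entry error is a value that was present but could not be parsed
is_entry_error = parsed.isna() & ages.notna()
bad_values = ages[is_entry_error].tolist()

print(bad_values)  # ['abc']
```

The empty cell is not flagged here because it was already missing before parsing; it can be found separately with `ages.isna()`.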
How to identify mixed types in Pandas columns
You might have used the info() function to check the data types of a Pandas data frame, but info() reports a mixed column only as object and does not tell you that the column mixes types. To detect mixed data types, traverse each column of the data frame and get its inferred type using the pd.api.types.infer_dtype() function.
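The difference between the stored dtype and the inferred type can be seen in a short sketch that reuses the sample data frame from earlier in this article:

```python
import pandas as pd

data_frame = pd.DataFrame(
    [['tom', 10], ['nick', '15'], ['juli', 14.8]],
    columns=['Name', 'Age'],
)

# The stored dtype only says "object", which hides the mix of types
stored_dtype = data_frame['Age'].dtype

# infer_dtype() inspects the values and reports the mix
inferred = pd.api.types.infer_dtype(data_frame['Age'])

print(stored_dtype)  # object
print(inferred)      # mixed-integer
```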
Syntax:
for column in data_frame.columns:
    print(pd.api.types.infer_dtype(data_frame[column]))
Here,
- data_frame: It is the Pandas data frame for which you want to detect if it has mixed data types or not.
Example:
The data frame used in this example to detect mixed data type is as follows:
Python3
# Python program to detect mixed data types in Pandas data frame
# Import the library Pandas
import pandas as pd
# Create the pandas DataFrame
data_frame = pd.DataFrame( [['tom', 10], ['nick', '15'], ['juli', 14.8]], columns=['Name', 'Age'])
# Traverse data frame to detect mixed data types
for column in data_frame.columns:
    print(column, ':', pd.api.types.infer_dtype(data_frame[column]))
Output:
Name : string
Age : mixed-integer
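For data frames with many columns, the loop above could be wrapped in a small helper that returns only the problematic columns. The function name `find_mixed_columns` is hypothetical, and the check relies on the fact that `infer_dtype()` labels mixed columns with strings beginning with "mixed" (such as "mixed" and "mixed-integer").

```python
import pandas as pd


def find_mixed_columns(df):
    """Return {column name: inferred type} for columns with mixed types."""
    mixed = {}
    for column in df.columns:
        inferred = pd.api.types.infer_dtype(df[column])
        # Labels such as 'mixed' and 'mixed-integer' mark mixed columns
        if inferred.startswith('mixed'):
            mixed[column] = inferred
    return mixed


data_frame = pd.DataFrame(
    [['tom', 10], ['nick', '15'], ['juli', 14.8]],
    columns=['Name', 'Age'],
)

mixed_columns = find_mixed_columns(data_frame)
print(mixed_columns)  # {'Age': 'mixed-integer'}
```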
How to deal with mixed types in Pandas columns
To fix mixed data types in a Pandas data frame, convert the entire column to a single data type. This can be done using the astype() function or the to_numeric() function.
Using astype() function:
The astype() function casts a Pandas object to a specified data type. This section shows how to fix mixed data types using astype().
Syntax:
data_frame[column] = data_frame[column].astype(int)
Here,
- data_frame: It is the Pandas data frame for which you want to fix mixed data types.
- column: It is the name of the column whose mixed data types you want to fix.
- int: Here, int is the data type to which the column is converted. You can also use str, float, etc., depending on the data type you want. Note that converting to int drops the fractional part of floating-point values.
Example:
The data frame used in this example to fix mixed data type is as follows:
Python3
# Python program to fix mixed data types using astype() in Pandas data frame
# Import the library Pandas
import pandas as pd
# Create the pandas DataFrame
data_frame = pd.DataFrame( [['tom', 10], ['nick', '15'], ['juli', 14.8]], columns=['Name', 'Age'])
# Transforming mixed data types to single data type
# (casting to int truncates 14.8 to 14)
data_frame['Age'] = data_frame['Age'].astype(int)
# Traverse data frame to detect data types after fix
for column in data_frame.columns:
    print(column, ':', pd.api.types.infer_dtype(data_frame[column]))
Output:
Name : string
Age : integer
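Two caveats of astype() are worth checking before relying on it, sketched below with made-up series: casting to int silently drops the fractional part, and a value that cannot be converted at all raises an error instead of being skipped.

```python
import pandas as pd

ages = pd.Series([10, '15', 14.8], dtype=object)

# Casting to int truncates 14.8 to 14
as_int = ages.astype(int)

# Casting to float keeps the fractional part
as_float = ages.astype(float)

# A value that is not a number at all makes astype() fail
dirty = pd.Series([10, 'fifteen', 14.8], dtype=object)
try:
    dirty.astype(int)
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(as_int.tolist())    # [10, 15, 14]
print(as_float.tolist())  # [10.0, 15.0, 14.8]
print(conversion_failed)  # True
```

If the fractional part matters, float is likely the safer target type for a column like Age.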
Using to_numeric() function:
The to_numeric() function converts its argument to a numeric data type. This section shows how to fix mixed data types using to_numeric().
Syntax:
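data_frame[column] = pd.to_numeric(data_frame[column], errors='coerce')
Here,
- data_frame: It is the Pandas data frame for which you want to fix mixed data types.
- column: It is the name of the column whose mixed data types you want to fix.
- errors='coerce': Values that cannot be parsed as numbers are replaced with NaN. The older errors='ignore' option is deprecated in recent Pandas versions.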
Example:
The data frame used in this example to fix mixed data type is as follows:
Python3
# Python program to fix mixed data types using to_numeric() in Pandas data frame
# Import the library Pandas
import pandas as pd
# Create the pandas DataFrame
data_frame = pd.DataFrame( [['tom', 10], ['nick', '15'], ['juli', 14.8]], columns=['Name', 'Age'])
# Transforming mixed data types to single data type
# (values that cannot be parsed as numbers become NaN)
data_frame['Age'] = pd.to_numeric(data_frame['Age'], errors='coerce')
# Traverse data frame to detect data types after fix
for column in data_frame.columns:
    print(column, ':', pd.api.types.infer_dtype(data_frame[column]))
Output:
Name : string
Age : floating
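Unlike astype(), to_numeric() chooses the resulting numeric type from the data. The sketch below uses three made-up series to show the behavior: whole numbers give an integer column, any fractional value gives a float column, and with errors='coerce' an unparseable value becomes NaN.

```python
import pandas as pd

# All values are whole numbers, so the result is an integer column
whole = pd.to_numeric(pd.Series([10, '15', 14], dtype=object))

# One fractional value makes the result a float column
fractional = pd.to_numeric(pd.Series([10, '15', 14.8], dtype=object))

# With errors='coerce', a value that cannot be parsed becomes NaN
coerced = pd.to_numeric(
    pd.Series([10, 'n/a', 14.8], dtype=object), errors='coerce'
)

print(whole.dtype)             # int64
print(fractional.dtype)        # float64
print(coerced.isna().tolist()) # [False, True, False]
```

Counting the NaN values before and after conversion is a quick way to see how many entries were lost to coercion.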
Conclusion
Pandas columns with mixed types can cause problems when analyzing data, but they can be found and resolved using the techniques in this article. By properly cleaning and preparing the data, data scientists and software developers can make their analysis more accurate and dependable.