Assignment No. 12
Write a Python program that creates a DataFrame from a CSV file and
applies one-hot encoding.
import csv
# Define the header and rows
header = ['Name', 'Color', 'Size']
rows = [
    ['Alice', 'Red', 'Small'],
    ['Bob', 'Blue', 'Medium'],
    ['Charlie', 'Red', 'Large'],
    ['David', 'Green', 'Medium'],
    ['Eva', 'Blue', 'Small'],
]
# Specify the file name
filename = '[Link]'
# Write to the CSV file
with open(filename, 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)  # Write the header
    writer.writerows(rows)  # Write the rows
print(f"CSV file '{filename}' created successfully.")
import pandas as pd
# Load the CSV file into a DataFrame
df = pd.read_csv('[Link]')
# Display the original DataFrame
print("Original DataFrame:")
print(df)
# Perform one-hot encoding on categorical columns
df_encoded = pd.get_dummies(df, columns=['Color', 'Size'])
# Display the DataFrame with one-hot encoding
print("\nDataFrame with One-Hot Encoding:")
print(df_encoded)
# Optionally, save the encoded DataFrame to a new CSV file
df_encoded.to_csv('data_encoded.csv', index=False)
Output:
Original DataFrame:
      Name  Color    Size
0    Alice    Red   Small
1      Bob   Blue  Medium
2  Charlie    Red   Large
3    David  Green  Medium
4      Eva   Blue   Small

DataFrame with One-Hot Encoding:
      Name  Color_Blue  Color_Green  ...  Size_Large  Size_Medium  Size_Small
0    Alice           0            0  ...           0            0           1
1      Bob           1            0  ...           0            1           0
2  Charlie           0            0  ...           1            0           0
3    David           0            1  ...           0            1           0
4      Eva           1            0  ...           0            0           1

[5 rows x 7 columns]
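
The "..." in the printed output is pandas shortening the display, which hides the Color_Red column. A minimal sketch of one way to show all seven columns, assuming the same five rows as in the assignment (built in memory here rather than read from the CSV file). The dtype=int argument is an addition to the original call: newer pandas versions print True/False by default, and this keeps the indicators as 0/1.

```python
import pandas as pd

# Same sample data as the assignment, built in memory so the sketch is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Color': ['Red', 'Blue', 'Red', 'Green', 'Blue'],
    'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small'],
})

# dtype=int requests 0/1 indicator values instead of booleans
df_encoded = pd.get_dummies(df, columns=['Color', 'Size'], dtype=int)

# Temporarily raise the display limits so no column is replaced by "..."
with pd.option_context('display.max_columns', None, 'display.width', 200):
    print(df_encoded)
```

The option change applies only inside the with block, so the display settings for the rest of the program stay as they were.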
To perform one-hot encoding on categorical data from a CSV file using
Pandas, you will follow these steps:
1. Load the CSV File: Use Pandas to read the data from a CSV file
into a DataFrame.
2. Perform One-Hot Encoding: Use Pandas' get_dummies() function
to encode categorical features into one-hot vectors.
3. Display or Save the Result: Show the DataFrame with the one-hot
encoded features, and optionally save it to a new CSV file.
Explanation
1. Load CSV File:
   - pd.read_csv('[Link]') reads the CSV file into a DataFrame.
2. One-Hot Encoding:
   - pd.get_dummies(df, columns=['Color', 'Size']) performs one-hot
     encoding on the specified categorical columns. The columns
     parameter specifies which columns to encode; the Name column is
     left unchanged.
   - Each unique value in an encoded column becomes its own indicator
     column (for example, Color_Red), which is 1 in the rows that hold
     that value and 0 in all other rows.
3. Display the Result:
   - Print the original and the one-hot encoded DataFrames to compare
     them.
4. Save to CSV (Optional):
   - df_encoded.to_csv('data_encoded.csv', index=False) saves the
     one-hot encoded DataFrame to a new CSV file. index=False leaves
     the row index out of the file.
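
To make the encoding step concrete, here is a rough sketch that builds the indicator columns for Color by hand and compares them with the result of get_dummies(). The helper name manual_one_hot is hypothetical and introduced only for illustration; it is not part of pandas. The dtype=int argument is likewise an assumption added so that both results hold 0/1 integers.

```python
import pandas as pd

def manual_one_hot(series, prefix):
    """Hypothetical helper: build one 0/1 column per unique value of `series`."""
    columns = {}
    for value in sorted(series.unique()):
        # 1 where the row holds this value, 0 everywhere else
        columns[f"{prefix}_{value}"] = (series == value).astype(int)
    return pd.DataFrame(columns)

# The Color column from the assignment's sample data
colors = pd.Series(['Red', 'Blue', 'Red', 'Green', 'Blue'], name='Color')

by_hand = manual_one_hot(colors, prefix='Color')
by_pandas = pd.get_dummies(colors, prefix='Color', dtype=int)

# Compare column names and cell values of the two results
same_columns = list(by_hand.columns) == list(by_pandas.columns)
same_values = bool((by_hand.to_numpy() == by_pandas.to_numpy()).all())

print(by_hand)
print("Matches get_dummies:", same_columns and same_values)
```

Every row contains exactly one 1 among the Color_ columns, because each row holds exactly one colour.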