f10

The document outlines a feature engineering process for a restaurant dataset. It creates new features such as name length, address length, cuisine count, and binary indicators for table booking, online delivery, and current delivery status. It also buckets ratings and costs into categories, converting costs to approximate USD so they can be compared across currencies. Finally, it measures the correlation between the new numeric features and the aggregate rating, visualizes it as a heatmap, and uses box plots to compare rating distributions across the rating and cost categories.

Uploaded by sambhaviasingh


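# LEVEL 2 - TASK 3: FEATURE ENGINEERING

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

print("LEVEL 2 - TASK 3: FEATURE ENGINEERING")
print("===================================")

# Create a copy of the dataframe for feature engineering
# (df_processed is assumed to be the cleaned dataframe from the earlier preprocessing task)
df_features = df_processed.copy()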

# 1. Extract additional features from existing columns

# Length of restaurant name (0 for missing values)
df_features['Name_Length'] = df_features['Restaurant Name'].apply(
    lambda x: len(str(x)) if pd.notna(x) else 0
)

# Length of address (0 for missing values)
df_features['Address_Length'] = df_features['Address'].apply(
    lambda x: len(str(x)) if pd.notna(x) else 0
)

# Number of cuisines in the comma-separated list (0 for missing values)
df_features['Cuisine_Count'] = df_features['Cuisines'].apply(
    lambda x: len(str(x).split(',')) if pd.notna(x) else 0
)
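The three length and count features above can also be computed with pandas' vectorized `.str` methods instead of a row-by-row `apply`. This is an alternative sketch, not the author's code: the toy DataFrame is hypothetical, and it assumes the columns hold strings (a non-string value would become 0 here, whereas `len(str(x))` counts the characters of its text form).

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset
df = pd.DataFrame({
    'Restaurant Name': ['Le Petit Souffle', None, 'Ooma'],
    'Cuisines': ['French, Japanese, Desserts', 'Japanese', None],
})

# Vectorized string length; missing names give NaN, which is mapped to 0
df['Name_Length'] = df['Restaurant Name'].str.len().fillna(0).astype(int)

# Split on commas and count the pieces; missing cuisines give 0
df['Cuisine_Count'] = df['Cuisines'].str.split(',').str.len().fillna(0).astype(int)

print(df[['Name_Length', 'Cuisine_Count']])
```

For string and missing values the results match the `apply` versions, while avoiding a Python-level call per row.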

# 2. Create new features by encoding categorical variables
# Yes/No columns are mapped to 1/0; any other value becomes NaN

# Has Table Booking (binary)
df_features['Has_Table_Booking_Binary'] = df_features['Has Table booking'].map({'Yes': 1, 'No': 0})

# Has Online Delivery (binary)
df_features['Has_Online_Delivery_Binary'] = df_features['Has Online delivery'].map({'Yes': 1, 'No': 0})

# Is Delivering Now (binary)
df_features['Is_Delivering_Now_Binary'] = df_features['Is delivering now'].map({'Yes': 1, 'No': 0})
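`Series.map` with a dictionary turns any value that is not a key (different casing, stray whitespace, or a missing entry) into `NaN`, and those rows then drop out of the pairwise correlations computed later. A minimal check, assuming hypothetical messy values that may never occur in the real columns, is to normalize the text before mapping and list whatever still fails to map:

```python
import pandas as pd

# Hypothetical values; the real 'Has Table booking' column may be clean Yes/No
df = pd.DataFrame({'Has Table booking': ['Yes', 'No', ' yes', None]})

# Plain mapping, as in the script: ' yes' and the missing value become NaN
binary = df['Has Table booking'].map({'Yes': 1, 'No': 0})

# Trim whitespace and normalize casing before mapping
normalized = df['Has Table booking'].str.strip().str.title()
binary_robust = normalized.map({'Yes': 1, 'No': 0})

# Rows that still fail to map (here only the missing value)
unmapped = df.loc[binary_robust.isna(), 'Has Table booking']

print(binary.tolist())
print(binary_robust.tolist())
print(unmapped.tolist())
```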

# Rating Category
def categorize_rating(rating):
    """Map a numeric aggregate rating to a text band."""
    if pd.isna(rating):
        return 'Unknown'  # avoid silently labelling missing ratings 'Very Poor'
    if rating >= 4.5:
        return 'Excellent'
    elif rating >= 4.0:
        return 'Very Good'
    elif rating >= 3.5:
        return 'Good'
    elif rating >= 3.0:
        return 'Average'
    elif rating >= 2.0:
        return 'Poor'
    else:
        return 'Very Poor'

df_features['Rating_Category'] = df_features['Aggregate rating'].apply(categorize_rating)
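The `if`/`elif` chain in `categorize_rating` is a binning operation, so the same bands can be produced in one vectorized call with `pd.cut`. This sketch swaps in that technique under stated assumptions; the sample ratings are made up. With `right=False` each bin is closed on the left, e.g. [4.0, 4.5), which matches the `>=` comparisons, and missing ratings stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample ratings, including a missing one
ratings = pd.Series([4.9, 4.0, 3.2, 2.0, 0.0, np.nan])

bins = [-np.inf, 2.0, 3.0, 3.5, 4.0, 4.5, np.inf]
labels = ['Very Poor', 'Poor', 'Average', 'Good', 'Very Good', 'Excellent']

# right=False makes every interval closed on the left: [2.0, 3.0), [3.0, 3.5), ...
rating_category = pd.cut(ratings, bins=bins, labels=labels, right=False)
print(rating_category.tolist())
```

Because `pd.cut` returns an ordered categorical, sorting and seaborn's categorical plots follow the band order (Very Poor to Excellent) rather than alphabetical order.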

# Cost Category
def categorize_cost(cost, currency):
    """Convert the average cost for two to approximate USD and bucket it."""
    if pd.isna(cost) or pd.isna(currency):
        return 'Unknown'

    # Normalize to USD for comparison (very simplified: fixed approximate rates)
    if currency == 'Dollar($)':
        normalized_cost = cost
    elif currency == 'Indian Rupees(Rs.)':
        normalized_cost = cost / 75  # Approximate conversion
    elif currency == 'Pounds(£)':
        normalized_cost = cost * 1.3  # Approximate conversion
    elif currency == 'Turkish Lira(TL)':
        normalized_cost = cost / 8  # Approximate conversion
    elif currency == 'Brazilian Real(R$)':
        normalized_cost = cost / 5  # Approximate conversion
    elif currency == 'Indonesian Rupiah(IDR)':
        normalized_cost = cost / 14000  # Approximate conversion
    else:
        # Fallback: any other currency is left unconverted, i.e. treated as USD
        normalized_cost = cost

    if normalized_cost < 20:
        return 'Budget'
    elif normalized_cost < 50:
        return 'Moderate'
    elif normalized_cost < 100:
        return 'Expensive'
    else:
        return 'Very Expensive'

df_features['Cost_Category'] = df_features.apply(
    lambda row: categorize_cost(row['Average Cost for two'], row['Currency']),
    axis=1
)
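The row-wise `apply` over `categorize_cost` can be replaced by a lookup table plus `pd.cut`. The sketch below reuses the same approximate rates from the function, expressed as USD per unit of each currency, and mirrors its fallback of leaving unlisted currencies unconverted. The name `USD_PER_UNIT` and the sample rows are hypothetical. Keeping the rates in one dictionary makes them easy to update, or to switch unlisted currencies to 'Unknown' instead.

```python
import numpy as np
import pandas as pd

# Approximate USD value of one unit of each currency, taken from the
# fixed rates in categorize_cost (e.g. 1 rupee = 1/75 USD)
USD_PER_UNIT = {
    'Dollar($)': 1.0,
    'Indian Rupees(Rs.)': 1 / 75,
    'Pounds(£)': 1.3,
    'Turkish Lira(TL)': 1 / 8,
    'Brazilian Real(R$)': 1 / 5,
    'Indonesian Rupiah(IDR)': 1 / 14000,
}

# Hypothetical sample rows
df = pd.DataFrame({
    'Average Cost for two': [1200, 30, 40, 2000000, 10, np.nan],
    'Currency': ['Indian Rupees(Rs.)', 'Dollar($)', 'Pounds(£)',
                 'Indonesian Rupiah(IDR)', 'Rand(R)', 'Dollar($)'],
})

rate = df['Currency'].map(USD_PER_UNIT)
# Mirror the original fallback: currencies missing from the table stay
# unconverted (rate 1.0), while a missing currency stays missing
rate = rate.fillna(1.0).where(df['Currency'].notna())
usd = df['Average Cost for two'] * rate

# Same thresholds as categorize_cost: <20, <50, <100, otherwise Very Expensive
cost_category = pd.cut(
    usd,
    bins=[-np.inf, 20, 50, 100, np.inf],
    labels=['Budget', 'Moderate', 'Expensive', 'Very Expensive'],
    right=False,
)
# Missing cost or currency -> 'Unknown', as in categorize_cost
cost_category = cost_category.cat.add_categories('Unknown').fillna('Unknown')

print(cost_category.tolist())
```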

# Display the new features
print("\nNew Features Created:")
print(df_features[['Name_Length', 'Address_Length', 'Cuisine_Count',
                   'Has_Table_Booking_Binary', 'Has_Online_Delivery_Binary',
                   'Is_Delivering_Now_Binary', 'Rating_Category',
                   'Cost_Category']].head())

# Analyze the relationship between new features and rating
# (Pearson correlation; for the 0/1 columns this is the point-biserial correlation)
print("\nCorrelation between new features and rating:")
feature_corr = df_features[['Name_Length', 'Address_Length', 'Cuisine_Count',
                            'Has_Table_Booking_Binary',
                            'Has_Online_Delivery_Binary',
                            'Is_Delivering_Now_Binary',
                            'Aggregate rating']].corr()
print(feature_corr['Aggregate rating'].sort_values(ascending=False))
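`DataFrame.corr()` computes Pearson correlation by default, which measures linear association. With count features such as `Cuisine_Count` and a rating bounded between 0 and 5, a rank-based Spearman correlation (`method='spearman'`) is a useful cross-check. The sketch below uses hypothetical toy values: the relationship is perfectly monotonic, so Spearman is 1.0, while Pearson is lower because the points do not lie on a straight line.

```python
import pandas as pd

# Hypothetical values shaped like the engineered columns
df = pd.DataFrame({
    'Cuisine_Count': [1, 2, 3, 4, 8],
    'Has_Table_Booking_Binary': [0, 0, 1, 1, 1],
    'Aggregate rating': [3.1, 3.4, 3.9, 4.2, 4.3],
})

pearson = df.corr()['Aggregate rating']
spearman = df.corr(method='spearman')['Aggregate rating']

print(pd.DataFrame({'pearson': pearson, 'spearman': spearman}).round(3))
```

A large gap between the two columns suggests the relationship is monotonic but not linear, or that a few outliers dominate the Pearson value.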

# Visualize the correlation

plt.figure(figsize=(12, 10))
sns.heatmap(feature_corr, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Between Features and Rating', fontsize=16)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.tight_layout()
plt.show()

# Analyze the relationship between categorical features and rating
# (order= keeps the boxes in band order instead of order of first appearance)
plt.figure(figsize=(14, 6))
sns.boxplot(x='Rating_Category', y='Aggregate rating', data=df_features,
            order=['Very Poor', 'Poor', 'Average', 'Good', 'Very Good', 'Excellent'],
            palette='viridis')
plt.title('Rating Distribution by Rating Category', fontsize=16)
plt.xlabel('Rating Category', fontsize=14)
plt.ylabel('Aggregate Rating', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', alpha=0.3)
plt.show()

plt.figure(figsize=(14, 6))
sns.boxplot(x='Cost_Category', y='Aggregate rating', data=df_features,
            order=['Budget', 'Moderate', 'Expensive', 'Very Expensive', 'Unknown'],
            palette='viridis')
plt.title('Rating Distribution by Cost Category', fontsize=16)
plt.xlabel('Cost Category', fontsize=14)
plt.ylabel('Aggregate Rating', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', alpha=0.3)
plt.show()
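Box plots show the shape of each group; a small table of counts and medians gives the numbers behind them and exposes categories with very few restaurants. A minimal sketch with hypothetical data, using an ordered categorical so the rows follow the cost bands and `observed=False` so empty bands still appear with a count of 0:

```python
import pandas as pd

# Hypothetical ratings for a few cost bands
df = pd.DataFrame({
    'Cost_Category': ['Budget', 'Budget', 'Moderate', 'Expensive', 'Expensive'],
    'Aggregate rating': [3.0, 3.4, 3.8, 4.1, 4.5],
})

order = ['Budget', 'Moderate', 'Expensive', 'Very Expensive']
df['Cost_Category'] = pd.Categorical(df['Cost_Category'], categories=order, ordered=True)

# observed=False keeps bands with no rows, so 'Very Expensive' shows a count of 0
summary = (
    df.groupby('Cost_Category', observed=False)['Aggregate rating']
      .agg(['count', 'median', 'mean'])
)
print(summary)
```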
