ML_assignment_lab_7

The document outlines a multi-task learning project for predicting real estate prices in urban Indian markets, focusing on regression and classification tasks. It details the dataset, key features, mathematical formulations, optimization strategies, and the rationale for multi-task learning. The submission guidelines include a comprehensive report and Python code implementation with hyperparameter tuning.

Uploaded by

2022mcb1318

Multi-Task Learning for Real Estate Prediction
Course: AI211/ CS503 Machine Learning
Instructor: Dr. Santosh Kumar Vipparthi
Due Date: 01-04-2025

1 Problem Scenario
You are a data scientist at BharatHomes, a real estate analytics firm
focusing on urban Indian markets. The dataset includes 79 features describing
residential properties across Mumbai, Delhi, Bangalore, and Chennai.

1.1 Key Features in the Indian Context


Analyze the feature values carefully before applying the regularization and
optimization methods you choose. All feature values must be represented in
Euclidean space (that is, as real-valued vectors) for those methods to apply.

Feature        Indian Contextualization
SalePrice      Price in INR (lakhs/crores)
Neighborhood   Mumbai: Bandra, Andheri; Delhi: Gurgaon, Noida
Condition1     Proximity to metro stations (e.g., Delhi Metro)
Utilities      Water supply consistency (24/7 vs. tanker-dependent)
LandContour    Flood-prone zones (e.g., Chennai's coastal areas)
OverallQual    Builder reputation (e.g., Tata, DLF)
GarageQual     Security features (gated community vs. standalone)
PoolQC         Presence of clubhouse amenities

Objective:
1. Regression: Predict SalePrice (INR).
2. Classification: Classify properties as "Premium" (top 20% of prices)
or "Standard".
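
The classification target can be derived directly from SalePrice. The following is a minimal sketch, assuming "top 20%" means prices at or above the empirical 80th percentile; the function name `premium_labels`, the tie-handling rule, and the sample prices are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def premium_labels(sale_price, top_fraction=0.20):
    """Label properties in the top `top_fraction` of prices as 1 (Premium), else 0 (Standard)."""
    sale_price = np.asarray(sale_price, dtype=float)
    # Assumption: the cut-off is the empirical (1 - top_fraction) quantile,
    # and prices exactly at the cut-off count as Premium.
    threshold = np.quantile(sale_price, 1.0 - top_fraction)
    return (sale_price >= threshold).astype(int), threshold

# Made-up prices in lakhs, for illustration only.
prices = np.array([45.0, 60.0, 75.0, 90.0, 120.0, 150.0, 210.0, 300.0, 480.0, 950.0])
labels, threshold = premium_labels(prices)
print(labels, threshold)
```

With heavily tied prices the Premium share can deviate from exactly 20%, so it is worth checking `labels.mean()` on the real data.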

2 Mathematical Formulation
Let the dataset be:
• Features: $X \in \mathbb{R}^{n \times 79}$ (e.g., GrLivArea, OverallQual, Neighborhood)

• Targets:

– Regression: $y_r \in \mathbb{R}^{n}$ (SalePrice)
– Classification: $y_c \in \{0, 1\}^{n}$ (Premium or Standard)

2.1 Loss Functions


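1. Regression: Mean Squared Error (MSE)

$$L_r = \frac{1}{n} \sum_{i=1}^{n} \left( y_r^{(i)} - f(x^{(i)}) \right)^2 \tag{1}$$

2. Classification: Cross-Entropy

$$L_c = -\frac{1}{n} \sum_{i=1}^{n} y_c^{(i)} \log g(x^{(i)}) \tag{2}$$

3. Joint Loss

$$L(\theta) = \alpha L_r + \beta L_c + \lambda \lVert \theta \rVert_2^2 \tag{3}$$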
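
The joint loss can be evaluated numerically as a sanity check before any optimization. The sketch below assumes linear models for both tasks, with f(x) a linear predictor and g(x) a sigmoid of a linear score, and treats θ as the concatenation of two hypothetical weight vectors `theta_r` and `theta_c`. It also uses the usual two-term binary cross-entropy, whereas the cross-entropy as printed shows only the y log g term; that substitution is an assumption about the intended loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_loss(theta_r, theta_c, X, y_r, y_c, alpha=1.0, beta=1.0, lam=0.01):
    """alpha * MSE + beta * cross-entropy + lam * squared L2 norm of all weights."""
    f = X @ theta_r                                         # regression predictions
    g = np.clip(sigmoid(X @ theta_c), 1e-12, 1.0 - 1e-12)   # class probabilities
    loss_r = np.mean((y_r - f) ** 2)
    # Assumption: standard binary cross-entropy with both terms.
    loss_c = -np.mean(y_c * np.log(g) + (1.0 - y_c) * np.log(1.0 - g))
    penalty = lam * (np.sum(theta_r ** 2) + np.sum(theta_c ** 2))
    return alpha * loss_r + beta * loss_c + penalty

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y_r = rng.normal(size=8)
y_c = (y_r > np.quantile(y_r, 0.8)).astype(float)
zero = np.zeros(3)
loss_at_zero = joint_loss(zero, zero, X, y_r, y_c)
print(loss_at_zero)
```

At zero weights the classifier outputs 0.5 everywhere, so the value should equal the mean of the squared targets plus log 2, which gives a quick correctness check.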

3 Optimization
3.1 Key Analysis
1. Convexity:
• Prove whether L(θ) is convex under linear models for both tasks.

• Discuss the implications of non-convexity in deep neural networks.


2. Hessian Conditioning:
• Let $H_r = \nabla^2 L_r$ and $H_c = \nabla^2 L_c$. Derive the condition number of the
joint Hessian $H = \alpha H_r + \beta H_c$.

• Show how κ(H) affects optimization convergence.


3. Regularization:

• Compare L1 (sparse feature selection) vs. L2 (smooth feature weighting).

• Justify the choice of regularization strength λ using the bias-variance
trade-off.
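
The Hessian conditioning question can be explored numerically. The sketch below assumes linear models that share one weight vector, so that the MSE Hessian is (2/n) XᵀX and the two-term logistic cross-entropy Hessian is (1/n) Xᵀ diag(g(1−g)) X. It also adds the 2λI contribution of the L2 penalty, which the assignment's H does not include. The function name and the synthetic, nearly collinear features are illustrative assumptions.

```python
import numpy as np

def joint_hessian_condition(X, theta, alpha=1.0, beta=1.0, lam=0.0):
    """Condition number of alpha*H_r + beta*H_c + 2*lam*I for linear models."""
    n, d = X.shape
    H_r = (2.0 / n) * (X.T @ X)               # Hessian of the MSE loss
    g = 1.0 / (1.0 + np.exp(-(X @ theta)))
    w = g * (1.0 - g)                          # per-sample logistic curvature
    H_c = ((X.T * w) @ X) / n                  # X^T diag(w) X / n
    H = alpha * H_r + beta * H_c + 2.0 * lam * np.eye(d)
    eig = np.linalg.eigvalsh(H)                # ascending eigenvalues of symmetric H
    return eig[-1] / eig[0]

# Synthetic features with two nearly collinear columns, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 1] + 1e-3 * rng.normal(size=200)
theta = np.zeros(3)

kappa_raw = joint_hessian_condition(X, theta, lam=0.0)
kappa_reg = joint_hessian_condition(X, theta, lam=1.0)
print(kappa_raw, kappa_reg)
```

For a quadratic objective, gradient descent with the best fixed step size contracts the error by roughly (κ − 1)/(κ + 1) per iteration, so the drop from `kappa_raw` to `kappa_reg` illustrates how the L2 term can speed up convergence.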

4 Multi-Task Learning Model


1. Problem Formulation:

• Why is multi-task learning suitable for this problem?


• How does feature sharing between tasks improve model performance?

2. Data Understanding:

• Compute the correlation between OverallQual and SalePrice in Delhi.
• Handle missing values in YearBuilt using imputation techniques.

3. Model Selection:

• Justify using linear regression + logistic regression vs. decision trees.

• Discuss the role of feature engineering in improving model accuracy.
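
The data-understanding steps above can be sketched as follows. This is a minimal example on made-up values: the `city` array is a hypothetical way of identifying Delhi rows (the assignment does not name such a column), Pearson correlation is assumed, and per-city median imputation is only one of several reasonable choices for YearBuilt.

```python
import numpy as np

# Made-up sample rows, for illustration only.
city = np.array(["Delhi", "Delhi", "Delhi", "Delhi", "Mumbai", "Mumbai"])
overall_qual = np.array([4, 6, 7, 9, 5, 8], dtype=float)
sale_price = np.array([60.0, 95.0, 110.0, 180.0, 150.0, 320.0])
year_built = np.array([1998.0, np.nan, 2010.0, 2018.0, np.nan, 2005.0])

# Pearson correlation between OverallQual and SalePrice, restricted to Delhi.
in_delhi = city == "Delhi"
corr_delhi = np.corrcoef(overall_qual[in_delhi], sale_price[in_delhi])[0, 1]

# Median imputation of YearBuilt within each city (assumed strategy).
year_imputed = year_built.copy()
for name in np.unique(city):
    rows = city == name
    median_year = np.nanmedian(year_built[rows])   # ignores missing values
    year_imputed[rows & np.isnan(year_built)] = median_year

print(round(corr_delhi, 3), year_imputed)
```

Imputing within each city rather than globally keeps city-specific construction periods from leaking into one another, at the cost of noisier medians for small groups.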

5 Data and Statistical Understanding


• Predict the 25th, 50th, and 75th percentiles (quartiles) of SalePrice in Mumbai.

• Analyze the distribution of SalePrice (e.g., normality, skewness).

• Define the hypothesis space for both regression and classification tasks.

• Explain how the choice of hypothesis space affects model performance.

• Modify hyperparameters (α, β, λ) based on quantile analysis.
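
The quantile and distribution analysis can be sketched as below. The example uses empirical quartiles and the moment-based sample skewness on made-up Mumbai prices; the `skewness` helper and the log transform are illustrative assumptions, and a predictive approach such as quantile regression would be a separate step.

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

# Made-up Mumbai prices in lakhs, for illustration only.
mumbai_prices = np.array([80.0, 95.0, 110.0, 130.0, 150.0,
                          175.0, 220.0, 310.0, 520.0, 1200.0])

# Empirical quartiles (NumPy's default linear interpolation).
q25, q50, q75 = np.quantile(mumbai_prices, [0.25, 0.50, 0.75])

# Right-skewed prices often look closer to symmetric after a log transform.
skew_raw = skewness(mumbai_prices)
skew_log = skewness(np.log(mumbai_prices))

print(q25, q50, q75, skew_raw, skew_log)
```

A strongly positive `skew_raw` suggests that squared-error training on raw prices will be dominated by the most expensive properties, which is one possible reason to revisit α, β, and λ or to model log prices instead.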

6 Submission Guidelines
1. Report (12 pages) should include:

• Proofs of convexity and Hessian bounds.

• Feature importance plots for OverallQual and YearBuilt.

• Regularization path analysis for λ.

• Quantile predictions and data distribution analysis.

• Hypothesis space definition and its impact on performance.

• Hyperparameter tuning results and justification.

2. Code: Python implementation with adaptive hyperparameter tuning,
following the stepwise instructions in the sample report.
