LAB5_Regularization
Prerequisites:
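# Use a limited set of columns that is most relevant for our purpose
# (assumes `dataset` was loaded earlier as a pandas DataFrame)
cols_to_use = ['Suburb', 'Rooms', 'Type', 'Method', 'SellerG',
               'Regionname', 'Propertycount',
               'Distance', 'CouncilArea', 'Bedroom2', 'Bathroom',
               'Car', 'Landsize', 'BuildingArea', 'Price']
# .copy() gives an independent frame, so the column assignments below
# do not trigger pandas' SettingWithCopyWarning
dataset = dataset[cols_to_use].copy()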
# Other continuous features can be imputed with the mean for faster results,
# since our focus is on reducing overfitting using Lasso and Ridge Regression
dataset['Landsize'] = dataset['Landsize'].fillna(dataset.Landsize.mean())
dataset['BuildingArea'] = dataset['BuildingArea'].fillna(dataset.BuildingArea.mean())
# Drop any rows that still contain missing values
dataset.dropna(inplace=True)
dataset.shape
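A minimal sketch of the impute-then-drop step on a tiny hypothetical frame (the `toy` values are made up for illustration, not taken from the housing data). Only `Landsize` and `BuildingArea` are filled with their column means; any row still missing a value in another column is then removed by `dropna`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the housing data
toy = pd.DataFrame({
    "Landsize": [120.0, np.nan, 300.0, 180.0],
    "BuildingArea": [np.nan, 90.0, 150.0, 120.0],
    "Car": [1.0, 2.0, np.nan, 1.0],
    "Price": [500_000, 650_000, 720_000, 580_000],
})

# Mean-impute the two continuous columns, as in the lab
for col in ["Landsize", "BuildingArea"]:
    toy[col] = toy[col].fillna(toy[col].mean())

# Rows still missing a value (here: the NaN in Car) are dropped
toy = toy.dropna()
print(toy.shape)  # (3, 4)
```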
X = dataset.drop('Price', axis=1)
y = dataset['Price']
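The score calls below use a train/test split and three fitted models (`reg`, `ridge_reg`, `lasso_reg`) that this section does not create. The sketch below shows one way to produce them, under stated assumptions: the helper name `fit_models`, `alpha=50`, `test_size=0.3` and `random_state=2` are illustrative choices rather than the lab's confirmed settings, and the demo frame is synthetic. Columns such as `Suburb` and `Type` hold strings, which scikit-learn regressors cannot fit directly, so they are one-hot encoded with `pd.get_dummies` first.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split


def fit_models(X, y, alpha=50.0, test_size=0.3, random_state=2):
    """Encode, split and fit the three regressors (hypothetical helper).

    alpha, test_size and random_state are illustrative defaults, not tuned values.
    """
    # One-hot encode string columns; drop_first removes one redundant dummy
    # per category, dtype=float keeps every column numeric
    X = pd.get_dummies(X, drop_first=True, dtype=float)
    train_X, test_X, train_y, test_y = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    reg = LinearRegression().fit(train_X, train_y)
    ridge_reg = Ridge(alpha=alpha).fit(train_X, train_y)
    lasso_reg = Lasso(alpha=alpha, max_iter=10_000).fit(train_X, train_y)
    return (reg, ridge_reg, lasso_reg), (train_X, test_X, train_y, test_y)


# Demo on a small synthetic frame (column names borrowed, values random)
rng = np.random.default_rng(0)
n = 200
demo_X = pd.DataFrame({
    "Type": rng.choice(["h", "u", "t"], size=n),
    "Rooms": rng.integers(1, 6, size=n),
    "Distance": rng.uniform(1, 30, size=n),
})
demo_y = (100_000 * demo_X["Rooms"] - 5_000 * demo_X["Distance"]
          + rng.normal(0, 50_000, size=n))

(reg, ridge_reg, lasso_reg), (train_X, test_X, train_y, test_y) = fit_models(demo_X, demo_y)
for name, model in [("Linear", reg), ("Ridge", ridge_reg), ("Lasso", lasso_reg)]:
    print(name, model.score(train_X, train_y), model.score(test_X, test_y))
```

On the lab's data the equivalent call would be `fit_models(X, y)` with the `X` and `y` defined above.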
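# Linear Regression: R² on test and train data
reg.score(test_X, test_y)
reg.score(train_X, train_y)
# Ridge Regression (L2 penalty)
ridge_reg.score(test_X, test_y)
ridge_reg.score(train_X, train_y)
# Lasso Regression (L1 penalty)
lasso_reg.score(test_X, test_y)
lasso_reg.score(train_X, train_y)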
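`.score` returns the coefficient of determination, R² = 1 - SS_res / SS_tot, on whatever data it is given (1.0 is a perfect fit, and it can be negative on test data). A training score well above the test score means the model is fitting noise in the training set, which is overfitting. The sketch below makes that gap visible on synthetic data (assumed setup: 80 samples, 60 features of which only 5 matter, `Ridge(alpha=10.0)`), so it illustrates the pattern rather than reproducing the lab's numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_samples, n_features = 80, 60
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]  # only 5 features influence y
y = X @ true_coef + rng.normal(scale=2.0, size=n_samples)

train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for name, model in [("Linear", LinearRegression()), ("Ridge", Ridge(alpha=10.0))]:
    model.fit(train_X, train_y)
    train_r2 = model.score(train_X, train_y)
    test_r2 = model.score(test_X, test_y)
    scores[name] = (train_r2, test_r2)
    print(f"{name}: train R²={train_r2:.3f}, test R²={test_r2:.3f}, "
          f"gap={train_r2 - test_r2:.3f}")
```

With as many features as training rows, plain least squares can fit the training set almost perfectly, so its train R² is near 1 while its test R² collapses; the Ridge penalty trades a little training fit for a smaller gap.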
10. Visualizations
# Store R² scores
lin_train_score = reg.score(train_X, train_y)
lin_test_score = reg.score(test_X, test_y)
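A grouped bar chart of train vs test R² makes the gap between the models easy to compare. This is a sketch: `plot_r2_comparison` is a hypothetical helper, and it selects matplotlib's non-interactive `Agg` backend so it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_r2_comparison(scores):
    """Grouped bar chart of train/test R² per model (hypothetical helper).

    scores: dict mapping model name -> (train_r2, test_r2)
    """
    names = list(scores)
    train = [scores[name][0] for name in names]
    test = [scores[name][1] for name in names]
    x = np.arange(len(names))
    width = 0.35

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(x - width / 2, train, width, label="Train R²")
    ax.bar(x + width / 2, test, width, label="Test R²")
    ax.set_xticks(x)
    ax.set_xticklabels(names)
    ax.set_ylabel("R² score")
    ax.set_title("Train vs test R² by model")
    ax.legend()
    fig.tight_layout()
    return fig
```

With the lab's variables, a call could look like `fig = plot_r2_comparison({"Linear": (lin_train_score, lin_test_score), "Ridge": (ridge_train_score, ridge_test_score), "Lasso": (lasso_train_score, lasso_test_score)})`, where the Ridge and Lasso names are hypothetical variables computed with `.score` in the same way as `lin_train_score`. Save the figure with `fig.savefig("r2_scores.png")` (file name arbitrary), or drop the `matplotlib.use("Agg")` line to display it inline in a notebook.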
1. Data Preprocessing:
• What are the R² scores for the Linear Regression model on training and testing data?
• What does the difference between the train and test scores indicate?
• What are the train and test scores for Ridge Regression?
• How does Ridge Regression help in reducing overfitting?
• What are the train and test scores for Lasso Regression?
• How does Lasso affect feature selection compared to Ridge?
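For reference when answering the Ridge and Lasso questions, these are the objectives that scikit-learn documents for `LinearRegression`, `Ridge` and `Lasso` (w is the coefficient vector, α is the `alpha` parameter; the intercept is not penalized). The squared L2 penalty shrinks all coefficients smoothly toward zero but generally leaves them nonzero, while the L1 penalty can drive some coefficients to exactly zero, which is why Lasso also acts as a form of feature selection.

```latex
\begin{aligned}
\text{Linear:}\quad & \min_{w}\ \lVert y - Xw \rVert_2^2 \\
\text{Ridge:}\quad  & \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso:}\quad  & \min_{w}\ \frac{1}{2\, n_{\text{samples}}} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{aligned}
```

Because Lasso scales the squared error by 1/(2 n_samples) and Ridge does not, the same `alpha` value is not directly comparable between the two models.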
6. Regularization Impact:
• What happens when you increase the alpha value in Ridge and Lasso Regression?
• If you had to choose one model for this dataset, which one would it be and why?
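A sketch of what increasing `alpha` does, on synthetic data from `make_regression` (assumed settings: 200 samples, 20 features, 5 informative, `noise=10.0`; the alpha grid is arbitrary). As alpha grows, the Ridge coefficient vector shrinks in overall size but its entries stay nonzero, while Lasso sets more and more coefficients exactly to zero. Too large an alpha underfits, so in practice alpha is tuned on held-out data, for example with cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, only 5 actually influence y
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

alphas = [0.1, 1.0, 10.0, 100.0]
results = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X, y)
    lasso = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    results.append((
        alpha,
        float(np.linalg.norm(ridge.coef_)),  # overall size of Ridge weights
        int(np.sum(ridge.coef_ == 0)),       # Ridge: coefficients exactly zero
        int(np.sum(lasso.coef_ == 0)),       # Lasso: features dropped entirely
    ))

for alpha, ridge_norm, ridge_zeros, lasso_zeros in results:
    print(f"alpha={alpha:>6}: ||ridge coef||={ridge_norm:8.2f}  "
          f"ridge zeros={ridge_zeros:2d}  lasso zeros={lasso_zeros:2d}")
```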