Advanced Regression Assignment
Question 1
Rahul built a logistic regression model with a training accuracy of 97% and a test accuracy of 48%.
What could be the reason for the gap between the test and train accuracies, and how can this
problem be solved?
Answer: A regression model can have a high training accuracy and, at the same time, a very low test
accuracy, as seen in the model above. The reason for such a large gap between the two accuracies is a
condition known as overfitting.
Overfitting occurs when the model memorises the training data completely and can explain nearly
100% of the variance in that data. In such a case the model has not captured the underlying trends in
the data; rather, it has simply memorised every data point in the training set. This problem mainly
arises from the complexity of the model: as we increase the number of independent variables we get a
more accurate model on the training data, but this model will not perform well on test data. In
regression analysis, overfitting can lead to misleading R-squared values, regression coefficients and
p-values.
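Below is a minimal sketch of this effect. It assumes scikit-learn and a synthetic dataset, neither of which is part of the assignment; the point is only that a model with many features relative to the sample size scores far higher on its training data than on held-out data.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: few samples, many mostly irrelevant features.
X, y = make_classification(n_samples=100, n_features=200, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("Train accuracy:", model.score(X_train, y_train))  # typically close to 1.0
print("Test accuracy:", model.score(X_test, y_test))     # noticeably lower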
To avoid overfitting, we should draw a random sample that is large enough to support all of the terms
we expect to include in the model. The goal here is to identify the relevant variables and terms that
need to be used to arrive at an optimal model.
In some cases, we do not have a large enough sample. In such cases we can use what is called
cross-validation.
Cross-validation is a technique to use the data efficiently regardless of how big our dataset is. The idea
is to use the initial training data to generate multiple mini train-test splits, which can then be used to
tune the model. In standard k-fold cross-validation, we divide the data into k subsets, also known as
folds. We then iteratively train the algorithm on k-1 folds while using the remaining fold as the test
set; at each iteration the model is trained on all but one partition and is used to predict the held-out
partition.
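A minimal sketch of k-fold cross-validation follows; it assumes scikit-learn and a synthetic dataset, which are illustrative choices rather than part of the assignment.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 5-fold CV: each fold serves as the test set exactly once while the model
# is trained on the remaining 4 folds.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean CV accuracy:", scores.mean())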
Another technique we can use is regularisation. This is the process of penalising the model when it
includes more independent variables, and it is done by adding a penalty term, controlled by a
hyperparameter, to the regression objective. Adjusted R-squared, AIC and BIC are used to validate the
accuracy of the model. Two common regularisation techniques are Lasso and Ridge.
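A minimal sketch of how regularisation reduces the train-test gap is shown below; scikit-learn and a synthetic dataset are assumed, and alpha plays the role of the penalty hyperparameter.

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=50, n_informative=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),                  # penalises the sum of squared coefficients
                    ("Lasso", Lasso(alpha=1.0, max_iter=10000))]: # penalises the sum of absolute coefficients
    model.fit(X_train, y_train)
    print(name, "train R2:", round(model.score(X_train, y_train), 3),
          "test R2:", round(model.score(X_test, y_test), 3))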
Question 2
Answer: The two regularisation techniques work on the same principle, i.e. they add a regularisation
term to the loss function.
1) L1 or Lasso adds the sum of the absolute values of the coefficients, whereas L2 or Ridge adds the
sum of the squares of the coefficients.
2) Both techniques regularise the coefficients by shrinking their magnitudes as close to 0 as possible
(or exactly to 0), but they cause this shrinkage differently. L1 or Lasso shrinks some of the coefficients
exactly to zero, thus performing feature selection.
3) L2 or Ridge always has a closed-form matrix representation of the solution, whereas L1 or Lasso
requires an iterative procedure to reach the final solution (see the sketch after this list).
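The closed-form point can be illustrated with a minimal sketch; NumPy and scikit-learn are assumed, and the intercept is omitted for simplicity. The Ridge solution beta = (X'X + alpha*I)^(-1) X'y computed directly matches the fitted estimator, while Lasso has no such formula and is solved iteratively (e.g. by coordinate descent).

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)
alpha = 1.0

# Closed-form Ridge solution (no intercept).
beta = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge without an intercept should agree up to numerical precision.
model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(np.allclose(beta, model.coef_))  # True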
Question 3
Given the fact that both the models perform equally well on the test data set, which one would you
prefer and why?
Answer: At first glance we do not find much difference between the two solutions. Both L1 and L2
comprise one slope (m) and an intercept (c), so the only thing separating the two solutions is the
number of decimal places in the slope (m) and the intercept (c).
Consider L1:
y = 39.76x + 32.648628
m = 39.76
c = 32.648628
m has 2 digits after the decimal point and c has 6 digits after the decimal point.
Consider L2:
y = 43.2x + 19.8
m = 43.2
c = 19.8
m has 1 digit after the decimal point and c has 1 digit after the decimal point.
Therefore it is evident that the model L2 is the better choice: it requires fewer bits to represent its
slope (m) and intercept (c), and with equal test performance the simpler model should be preferred.
Question 4
How can you make sure that a model is robust and generalisable? What are the implications of the
same for the accuracy of the model and why?
Answer: To make sure that a model is robust and generalisable, the easiest solution is to keep the
model simple. A simple model requires less training data and generalises more easily. However, if the
model is too simple it can lead to underfitting. Thus, one needs to find the right amount of complexity
in the model to maintain optimal accuracy. This is represented by what is known as the bias-variance
trade-off.
Bias is the difference between the average prediction of our model and the correct value which we
are trying to predict.
Variance is the variability of the model's prediction for a given data point, i.e. how spread out the
predictions are.
An overly simple model will have high bias and low variance and will show high error on both training
and test data, while a very complex model will have high variance and low bias and will perform very
well on training data but will have high error rates on test data.
To build a good model, we need to find the right balance between bias and variance such that it
minimizes the total error.
Thus, a model with an optimal balance of bias and variance will neither overfit nor underfit.
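In terms of the standard bias-variance decomposition for squared-error loss, the expected test error at a point can be written as

Expected error = Bias^2 + Variance + Irreducible error

so lowering one term at the cost of a large increase in the other raises the total error rather than reducing it.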
Question 5
You have determined the optimal value of lambda for ridge and lasso regression during the
assignment. Now, which one will you choose to apply and why?
Answer: After performing ridge and lasso regression on the dataset, we found the optimal value of
lambda for ridge regression to be 2 and the optimal value for lasso regression to be 100. On fitting
the models on our dataset, we can see that ridge regression retains 389 non-zero coefficients while
lasso regression has 109 non-zero coefficients. So essentially lasso regression has performed feature
selection, and we will therefore apply lasso regression to the dataset: it is computationally efficient,
as it only adds the sum of the absolute values of the coefficients, and it has performed the feature
selection that even RFE was not able to achieve successfully.
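The non-zero coefficient counts can be obtained with a sketch along the following lines; the feature matrix here is a synthetic stand-in for the assignment's prepared data, and alpha plays the role of lambda with the optimal values reported above, so the printed counts will differ on other data.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Stand-in data; in the assignment this would be the prepared training set.
X_train, y_train = make_regression(n_samples=300, n_features=100, n_informative=20, noise=10.0, random_state=0)

ridge = Ridge(alpha=2).fit(X_train, y_train)    # lambda = 2 for ridge
lasso = Lasso(alpha=100).fit(X_train, y_train)  # lambda = 100 for lasso

print("Ridge non-zero coefficients:", np.sum(ridge.coef_ != 0))
print("Lasso non-zero coefficients:", np.sum(lasso.coef_ != 0))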