Linear Regression for ML
Learning! Knowledge! Intelligence!
• Learning: the ability to
  • see patterns
  • recognize patterns
  • add constraints
• Knowledge: the variability in constraints (patterns with variability).
• Intelligence: invoking that knowledge on a given test case.
Machine Learning
[Figure: the data pipeline, from Databases through Data Integration and Data Cleaning to Task-relevant Data]
Machine Learning
Traditional Programming:
  Data + Program → Computer → Output
Machine Learning:
  Data + Output → Computer → Program
Machine Learning
• Seeds = Algorithms
• Nutrients = Data
• Gardener = You
• Plants = Programs
Figure 2: Regression
Machine Learning - Supervised
Machine Learning – Unsupervised
Why do we use regression analysis?
• Regression analysis offers several benefits:
• It indicates significant relationships between the dependent variable and the independent variables.
• It indicates the strength of the impact of multiple independent variables on a dependent variable.
Regression Means…
Linear Regression
• Linear regression is usually among the first topics people pick up when learning predictive modeling.
• In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.
• The difference between simple and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one. Now, the question is: "How do we obtain the best-fit line?" (a sketch follows below).
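As one concrete way to obtain a best-fit line, the sketch below solves the ordinary least-squares problem with NumPy's polyfit. The pizza size/price numbers are made up for illustration; they are not from the slides.

```python
# A minimal sketch of fitting a best-fit line with ordinary least squares.
# The size/price values below are hypothetical.
import numpy as np

size = np.array([6.0, 8.0, 10.0, 14.0, 18.0])    # pizza size in inches (hypothetical)
price = np.array([7.0, 9.0, 13.0, 17.5, 18.0])   # pizza price in $ (hypothetical)

# polyfit with deg=1 solves the least-squares problem for slope and intercept
slope, intercept = np.polyfit(size, price, deg=1)
print(f"price ≈ {slope:.2f} * size + {intercept:.2f}")
```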
Linear regression: Introduction
• A scatter plot of the data on a 2-dimensional plane.
[Scatter plot: Pizza Price versus Pizza Size]
• Choose a linear (straight-line) model and tweak it to match the data points by changing its intercept/bias.
Example data:
X   Y
1   1
2   2
3   3
Update $\theta_0$ and $\theta_1$ simultaneously:
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1), \qquad j = 0, 1$$
[Figure: surface and contour plots of the cost function $J(\theta_0, \theta_1)$]
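To make the simultaneous update concrete, here is a minimal batch gradient descent sketch for the two-parameter model, run on the toy (X, Y) table above; the learning rate and iteration count are arbitrary choices.

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x on the toy table above.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

theta0, theta1 = 0.0, 0.0   # initial parameters
alpha = 0.1                 # learning rate (arbitrary choice)
m = len(x)

for _ in range(1000):
    error = theta0 + theta1 * x - y          # h(x^(i)) - y^(i) for all i
    grad0 = error.sum() / m                  # dJ/dtheta0
    grad1 = (error * x).sum() / m            # dJ/dtheta1
    # simultaneous update: both gradients are computed before either parameter changes
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)   # approaches (0, 1), i.e. the line y = x
```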
Stochastic Gradient Descent:
• Only a single training example is considered before taking a step in the direction of the gradient.
• We are forced to loop over the training set, and thus cannot exploit the speed associated with vectorizing the code.
• It makes very noisy updates in the parameters.

Batch Gradient Descent:
• The entire training set is considered before taking a step in the direction of the gradient.
• It therefore takes a lot of time to make a single update.
• It makes smooth updates in the model parameters.

Mini-Batch Gradient Descent:
• A subset of training examples is considered, hence it can make quick updates in the model parameters.
• It can also exploit the speed associated with vectorizing the code.
• Depending upon the batch size, the updates can be made less noisy: the greater the batch size, the less noisy the update (a runnable sketch follows below).
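The three variants differ only in how many examples feed each update. A minimal mini-batch sketch follows; setting batch_size to the full dataset size recovers batch gradient descent, and batch_size=1 recovers stochastic gradient descent. The function name and defaults are illustrative, not from the slides.

```python
# Mini-batch gradient descent for linear regression.
# X is assumed to carry a leading column of ones for the intercept.
import numpy as np

def minibatch_gd(X, y, theta, alpha=0.01, batch_size=32, epochs=100):
    m = len(y)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(m)                  # reshuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # vectorized gradient computed over the current batch only
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)
            theta = theta - alpha * grad
    return theta
```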
Size (feet²)   Bedrooms   Floors   Age (years)   Price ($1000s)
2104           5          1        45            460
1416           3          2        40            232
1534           3          2        30            315
852            2          1        36            178
…              …          …        …             …
Notation:
• $n$ = number of features
• $x^{(i)}$ = input (features) of the $i^{\text{th}}$ training example
• $x_j^{(i)}$ = value of feature $j$ in the $i^{\text{th}}$ training example
Gradient descent:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}$$
} (simultaneously update $\theta_j$ for $j = 0, \dots, n$)
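Under the assumption that the table above lists size, bedrooms, floors, age, and price (the standard reading of this example), the vectorized form of the repeat-update is sketched below; features are mean-normalized first because their scales differ widely.

```python
# Vectorized batch gradient descent on the housing-style table above.
import numpy as np

data = np.array([
    [2104, 5, 1, 45, 460],
    [1416, 3, 2, 40, 232],
    [1534, 3, 2, 30, 315],
    [ 852, 2, 1, 36, 178],
], dtype=float)

X_raw, y = data[:, :-1], data[:, -1]
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)  # feature scaling
X = np.column_stack([np.ones(len(y)), X_scaled])             # x_0 = 1 for the intercept

theta = np.zeros(X.shape[1])
alpha, m = 0.1, len(y)
for _ in range(500):
    # theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i), all j at once
    theta -= alpha / m * X.T @ (X @ theta - y)

print(theta)
```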
Limitation of Ridge Regression
• It is not good for feature selection.
• Ridge regression decreases the complexity of a model but does not reduce the number of variables, since it never drives a coefficient to exactly zero; it only shrinks coefficients toward zero.
Lasso Regression
• Lasso regression stands for Least Absolute Shrinkage and Selection
Operator.
• It adds a penalty term to the cost function.
• This term is the sum of the absolute values of the coefficients. As coefficients grow away from 0, this term penalizes the model, causing it to shrink the coefficients in order to reduce the loss.
• The difference between ridge and lasso regression is that lasso tends to drive coefficients to exactly zero, whereas ridge never sets a coefficient to exactly zero (see the sketch below).
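A small sketch of this contrast on synthetic data (all numbers invented for illustration): ridge shrinks every coefficient but leaves them non-zero, while lasso drives the irrelevant ones exactly to zero.

```python
# Ridge keeps all coefficients small but non-zero; lasso zeroes some out.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# only the first three features matter in this synthetic target
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge:", np.round(ridge.coef_, 3))   # small but non-zero everywhere
print("lasso:", np.round(lasso.coef_, 3))   # irrelevant coefficients are exactly 0.0
```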
Limitations of Lasso Regression
• Lasso sometimes struggles with certain types of data:
• If the number of predictors (p) is greater than the number of observations (n), lasso will select at most n predictors as non-zero, even if all predictors are relevant (or might be useful on the test set).
• If there are two or more highly collinear variables, lasso selects one of them essentially at random, which is not good for the interpretability of the data.
The Elastic Net:
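• The elastic net combines the L1 penalty of lasso with the L2 penalty of ridge, addressing the limitations above. A minimal sketch, assuming scikit-learn's ElasticNet and synthetic data like the previous example:

```python
# Elastic net mixes the lasso (L1) and ridge (L2) penalties.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=100)

# l1_ratio=1.0 is pure lasso; l1_ratio near 0.0 approaches ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 3))
```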