
CS550: Machine Learning

Regression

Dr. Gagan Gupta


Slides based on Aurélien Géron's book, the ISL and ESL books, and Harvard's CS109A
11/24/2023 Regression 1
Lecture Objectives
• What will we learn in this lecture?
• Regression Analysis and examples
• Machine Learning Methodology
• How to assess goodness of the models
• Understand the bias-variance trade-off
• Understand the method of K nearest neighbors
• Understand the method of linear (least-squares) regression

11/24/2023 Regression 2
Prediction problem motivation
• The variable we'd like to predict may be:
• more difficult to measure,
• more important than the other(s), or
• directly or indirectly influenced by the values of the other variable(s)
• Thus, we'd like to define two categories of variables:
• variables whose value we want to predict
• variables whose values we use to make our prediction

11/24/2023 Regression 3
Regression Analysis
• Definition: A class of techniques that seeks to make predictions about
unknown continuous target variables given observed input variables.
• Applications:
• Predicting a person’s height given the height of their parents.
• Predicting the amount of time someone will take to pay back a loan
given their credit history.
• Predicting what time a package will arrive given current weather and
traffic conditions.
• Predicting the production of a particular crop given the rainfall

11/24/2023 Regression 4
Response and Predictor Variables
• We are observing numerical variables and we are making sets of
observations.

• We call the variable we'd like to predict the outcome or response variable; typically, we denote this variable by Y and the individual measurements by y_i.

• The variables we use in making the predictions are called the features or predictor variables; typically, we denote these variables by X and the individual measurements by x_ij.

Note: i indexes the observation (i = 1, …, n) and j indexes the j-th predictor variable (j = 1, …, J).
Total number of predictor variables, J = p

11/24/2023 Regression 5
True vs. Statistical Model
• We will assume that the response variable, Y, relates to the predictors, X, through some unknown function 'f', expressed generally as:

Y = f(X) + ε

• Here, f is the unknown function expressing an underlying rule for relating Y to X, and ε is the random amount (unrelated to X) by which Y differs from the rule f(X)

• A statistical model is any algorithm that estimates f. We denote the estimated function as f̂

11/24/2023 Regression 6
Machine Learning
• ML algorithms have the objective of generalization, i.e. they use one dataset to generate models that perform well on data that they have not seen.
• Thus, they prove to be effective in generating predictive models.
• There are many ML techniques, and ML models have several parameters
• How to choose the best one?

Data Science Process: Ask an interesting question → Data preparation → Explore the Data → Model the Data → Communicate/Visualize the Results
11/24/2023 Regression 7
Machine Learning Methodology
• The input dataset is divided by a random split (80/20 or 90/10) to be used for training and testing, respectively
• For each split, a model is generated and tested for accuracy
• This is repeated 5 or 10 times and the average error is computed
• The best model is selected and used in the real-world scenario

[Diagram: Train Dataset (Historical) → random samples → Training set / Test set → apply ML algorithms → Best Model → validate results on Test Data (Real)]
11/24/2023 Regression 8
Flexibility vs. Interpretability Tradeoff
• There are many methods of
regression (that estimate f)
• Some are less flexible but
more interpretable
• These are useful for
inference problems where
we want to study the
relationships between
predictor variables
• But highly flexible methods
can also lead to over-fitting!
11/24/2023 Regression 9
Error Evaluation
In order to quantify how well a model performs, we define a loss or error function.
A common loss function for quantitative outcomes is the Mean Squared Error (MSE):

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

The quantity yᵢ − ŷᵢ is called a residual and measures the error at the i-th prediction.
The square root of the MSE is the RMSE: RMSE = √MSE
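
A minimal Python sketch of computing these metrics; the observed and predicted values below are made up for illustration:

import numpy as np

def mse(y, y_hat):
    # mean squared error between observed y and predictions y_hat
    return np.mean((y - y_hat) ** 2)

y = np.array([22.1, 10.4, 9.3, 18.5, 12.9])        # observed responses (illustrative)
y_hat = np.array([20.5, 12.0, 10.1, 17.0, 14.2])   # model predictions (illustrative)
print("MSE:", mse(y, y_hat), "RMSE:", np.sqrt(mse(y, y_hat)))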

10
R-squared Error

• R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²
• If our model is as good as the mean value, ȳ, then R² = 0
• If our model is perfect, then R² = 1
• R² can be negative if the model is worse than the average. This can happen when
we evaluate the model on the real-life test set.

11
Bias Variance Tradeoff

• Total Error = Bias² + Variance + Irreducible Error

Bias is the average distance of the estimate f̂(x) from the true mean of f(x)
Variance is the squared deviation of the estimate around its mean
11/24/2023 Regression 12
Bias Variance Tradeoff
“All models are wrong, but some models are useful.” : George Box (1919-2013)
• Occam’s razor: This philosophical principle states that “the
simplest explanation is best”.
• Bias is error from erroneous assumptions in the model, like
making it linear/simplistic. (underfitting)
• Variance is error from sensitivity to small fluctuations in the
training set, indicating it will not work in real world. (overfitting)
• First-principle models are likely to suffer from bias, while data-driven
models are in greater danger of overfitting.
11/24/2023 Regression 13
Example Problem (Advertising)

The Advertising data set consists of the sales of a product in 200
different markets, along with advertising budgets for the product in each
of those markets for three different media: TV, radio, and newspaper.
Everything is given in units of $1,000.
Some of the figures in this presentation are taken from ISL book: "An Introduction to Statistical Learning, with applications in R"
(Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani "
11/24/2023 Regression 14
Response vs. Predictor Variables
X (predictors / features / covariates)        Y (outcome / response variable / dependent variable)

   TV      radio   newspaper   sales
  230.1    37.8      69.2       22.1
   44.5    39.3      45.1       10.4
   17.2    45.9      69.3        9.3
  151.5    41.3      58.5       18.5
  180.8    10.8      58.4       12.9

(n observations as rows; the p predictors are TV, radio and newspaper; sales is the response)

11/24/2023 Regression
k-Nearest Neighbors
The k-Nearest Neighbor (kNN) model is an intuitive way to predict a
quantitative response variable:

to predict a response for a set of observed predictor values, we use


the responses of other observations most similar to it

Note: this strategy can also be applied in classification to predict a


categorical variable.

11/24/2023 Regression 16
k-Nearest Neighbors
For a fixed value of k, the predicted response for the i-th observation is the
average of the observed responses of the k closest observations:

ŷᵢ = (1/k) Σ_{x ∈ N_k(xᵢ)} y(x)

where N_k(xᵢ) is the set of k observations most similar to xᵢ ('similar' refers to a notion of
distance between predictors).
Usually, Euclidean distance is chosen (the square root of the sum of squared coordinate
differences).

Python: sklearn.neighbors.KNeighborsRegressor(n_neighbors=3)
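
A hedged sketch of fitting a kNN regressor with scikit-learn; it assumes the advertising data is already loaded in a pandas DataFrame named ads with columns TV, radio, newspaper and sales:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

X = ads[["TV", "radio", "newspaper"]].values
y = ads["sales"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsRegressor(n_neighbors=3)   # k = 3 nearest neighbors, Euclidean distance by default
knn.fit(X_train, y_train)
rmse = np.sqrt(np.mean((knn.predict(X_test) - y_test) ** 2))   # test-set RMSE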
11/24/2023 Regression 17
4-Nearest Neighbors

• Very few assumptions are made here about the nature of 'f'
• Equal weights are given to the values of y, regardless of their distance from x
• As dimensionality increases, it becomes hard to find neighbors close by, and f
may change significantly
11/24/2023 Regression 18
Model Comparison

• Do the same for all k's and compare the RMSEs
• Which k is best?
• Q. What is the reason for the discontinuity?
• Q. Where is the bias-variance tradeoff?
19
kNN: Kernel Regression
• What if we took all the points, not just the k nearest points, and
introduced a weighting function that weights by distance (so that we
weight the value of closer points more)?

• Traditional kNN method can be seen as a 0/1 weighting model


• Examples of kernels: Epanechnikov quadratic kernel, Tri-cube Kernel,
and Gaussian Kernel.
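
A toy sketch of kernel-weighted (Nadaraya-Watson style) regression with a Gaussian kernel; the bandwidth h and the 1-D synthetic data are illustrative assumptions, not part of the slides:

import numpy as np

def gaussian_kernel(d, h):
    # weight decays smoothly with distance d; h controls the bandwidth
    return np.exp(-0.5 * (d / h) ** 2)

def kernel_predict(x0, X, y, h=0.5):
    w = gaussian_kernel(np.abs(X - x0), h)   # weight every training point by its distance to x0
    return np.sum(w * y) / np.sum(w)         # weighted average of all responses

X = np.linspace(0, 10, 50)
y = np.sin(X) + 0.1 * np.random.randn(50)
print(kernel_predict(2.0, X, y))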
11/24/2023 Regression 20
Comparison of Kernels

11/24/2023 Regression 21
Linear Models
• In the kNN approach, we didn’t assume a form of the function ‘f’
• Such approaches are called non-parametric approaches
• In the linear regression approach, we assume that the response is a
linear function of the predictor variables
• Note that this technique can be easily extended by creating extra
predictor variables (features) from a combination (transformation) of
the original predictor variables.
• So let's assume: y = β₀ + β₁x + ε
Linear Regression

• … then it follows that our estimate is: ŷ = β̂₀ + β̂₁x

• where β̂₀ and β̂₁ are estimates of β₀ and β₁, respectively, that we compute
using the n observations.

23
Estimate of the regression coefficients

For a given data set

24
Estimate of the regression coefficients (cont)

Which of the above three lines fits the data points the best?
a. The one which goes through the maximum number of points
b. The one with the least slope
c. The one from which no point is too far, i.e. it is approximately in the middle of
all points
25
Estimate of the regression coefficients (cont)
To compute the best fit, we first calculate the residuals, eᵢ = yᵢ − ŷᵢ = yᵢ − (β̂₀ + β̂₁xᵢ),
and then choose β̂₀ and β̂₁ to minimize the sum of squared residuals.
26
Python package

11/24/2023 Regression 27
So how do Linear Regression solvers work?
• Matrix Methods
• Exact methods that solve the set of linear equations
• Involve computation of matrix inverse or pseudoinverse (more efficient)
• Gradient Descent
• A generic method of solving optimization problems
• Begin with a random point and reach the optimal solution through a
sequence of improvements
• Faster improvements can be achieved using stochastic methods

11/24/2023 Regression 28
Matrix Algebra for n-dimensions
• Loss (L) = MSE(β) = (1/n) (Xβ − y)ᵀ(Xβ − y), where X is an n×(p+1) matrix with each row as an input
vector (including '1' for the intercept) and y is an n-dimensional
vector of the outputs in the training set
• To minimize, we differentiate with respect to β and we get
• Xᵀ(Xβ − y) = 0
• If XᵀX is non-singular, meaning its inverse exists, then
• β = (XᵀX)⁻¹ Xᵀ y
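
A small numpy sketch of this closed-form solution; X_raw (an n-by-p array of predictors) and y (a length-n array of responses) are assumed to exist:

import numpy as np

n, p = X_raw.shape
X = np.hstack([np.ones((n, 1)), X_raw])   # prepend a column of 1s for the intercept
beta = np.linalg.inv(X.T @ X) @ X.T @ y   # β = (XᵀX)⁻¹ Xᵀ y
beta_pinv = np.linalg.pinv(X) @ y         # Moore-Penrose pseudoinverse: works even if XᵀX is singular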

11/24/2023 Regression 29
Matrix Algebra for n-dimensions
• Computational complexity of computing the matrix inverse: O(p^2.4) to
O(p^3), depending on the implementation
• Scikit-learn's LinearRegression class uses an SVD approach: O(p^2)
• SVD stands for singular value decomposition
• It uses the pseudoinverse approach (Moore-Penrose): numpy.linalg.pinv()
• β = X⁺ y, where X⁺ is the Moore-Penrose pseudoinverse of X
• Both have linear complexity in terms of the number of
instances, n, but at least quadratic complexity in p
• So, we need to look at alternate techniques if p is very large, e.g.,
100,000
11/24/2023 Regression 30
Gradient Descent Approach
• Start from a random point, i.e. generate a random β
1. Determine which direction will reduce the MSE
2. Compute the slope of the function (its derivative) at
this point and go in the reverse direction:
   β^(i+1) = β^(i) − λ dL/dβ
3. λ is the learning rate parameter
4. Go to step 1 until convergence, i.e. the MSE is minimized
• For linear regression, the MSE is a convex function
• There is no local minimum, just a global minimum
• It is continuous, with a slope that never changes abruptly
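
A minimal batch gradient-descent sketch for linear regression; the learning rate and iteration count are illustrative choices, and X is assumed to already include the intercept column of 1s:

import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    n, d = X.shape
    beta = np.random.randn(d)                     # start from a random β
    for _ in range(n_iters):
        grad = (2.0 / n) * X.T @ (X @ beta - y)   # gradient of the MSE with respect to β
        beta -= lr * grad                         # step in the direction that reduces the MSE
    return beta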
11/24/2023 Regression 31
Stochastic Approaches
• Batch GD update equation
• Uses the whole batch of training data at each gradient step
• Stochastic GD
• Picks only a random instance of training data to update gradients
• Causes irregular descent, but better chance of finding global minimum
• Simulated annealing: Reduce the learning rate gradually to reduce
irregularity
• Mini-batch GD
• Small set of random instances of training data are used
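
A hedged scikit-learn sketch of stochastic gradient descent for regression; the hyper-parameter values are illustrative, and X_train / y_train are assumed from an earlier split. SGDRegressor's default invscaling learning-rate schedule gradually decays the step size, in the spirit of simulated annealing:

from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

sgd = make_pipeline(
    StandardScaler(),                                    # SGD is sensitive to feature scale
    SGDRegressor(max_iter=1000, eta0=0.01, random_state=0),
)
sgd.fit(X_train, y_train)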

11/24/2023 Regression 32
Parametric or Non-Parametric?
Linear Regression (parametric) vs. k-NN Approach (non-parametric):

• Assumption on function f: linear regression assumes a linear function; k-NN can work even if the function is non-linear, but it has to be locally constant
• High dimensions: linear regression has complexity problems, which can be overcome by efficient algorithms; for k-NN it is difficult to find neighbors nearby, which can cause errors
• Bias: low for linear regression; for k-NN, small K => low bias, large K => high bias
• Variance: depends on the problem for linear regression; for k-NN, small K => high variance, large K => low variance
• Computations: linear regression computes once, during the model-fitting phase, after which predictions are quick; k-NN looks at all the training points every time a prediction has to be made

11/24/2023 Regression 33
Lecture Objectives
• Understanding the outputs of a Linear Model
• Limitations of Linear Models and their extensions
• How to reduce over-fitting/variance via regularization
• Support Vector Machines (SVM)

11/24/2023 Regression 34
Brief review of the linear model

11/24/2023 Regression 35
Possible Questions
• How accurately do we know our model parameters?
• Is at least one predictor variable useful in the prediction?
• We have to examine the p-values
• Which subset of the predictor variables are important?
• There are several techniques of predictor variable/feature selection
• What would be the accuracy of predictions on unseen data?
• We can generate confidence intervals on our estimates
• Cross-validation gives us an estimate.
• Do I need more predictor variables/features?
• Look at patterns in the residual errors
11/24/2023 Regression 36
Confidence intervals for predictor estimators
• What causes errors in the estimation of β̂?

• we do not know the exact form of f

• limited sample size
• The variance of β̂ is called the standard error, SE(β̂)
• To estimate the SE, we use bootstrapping
• sampling from the training data (X, Y) to estimate its statistical properties
• in our case, we can sample with replacement
• compute β̂ multiple times by random sampling
• the variance of the multiple estimates approximates the true variance
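
A sketch of bootstrapping the standard error of the slope in simple linear regression; x and y are assumed to be 1-D numpy arrays, and B is the number of bootstrap resamples:

import numpy as np

def bootstrap_se_slope(x, y, B=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)               # sample indices with replacement
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]   # refit and keep the slope estimate
    return slopes.std(ddof=1)                          # spread of the estimates approximates SE(β̂₁)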
11/24/2023 Regression 37
Standard Errors Intuition from Formulae
• Better model: smaller residual variance σ²
• More data: larger n and larger Σᵢ (xᵢ − x̄)²
• Larger coverage: a larger spread, or a wider range, of the predictor values
• Better data: less measurement noise

General formula: SE(β̂)² = σ² (XᵀX)⁻¹

38
Significance of predictor variables
• As we saw, there are inherent uncertainties in estimation of β
• We evaluate the importance of predictors using hypothesis testing, using
the t-statistics and p-values (Small p-value(<0.05) => significant)
• Null hypothesis is that βᵢ = 0

The test statistic here would be t = β̂ᵢ / SE(β̂ᵢ),

which measures the distance of the
estimated mean from zero in units of standard
deviation.

11/24/2023 Regression 39
Sample Results

import statsmodels.api as sm
X2 = sm.add_constant(X)   # add an intercept column to the predictor matrix X
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())     # reports coefficients, standard errors, t-statistics and p-values

11/24/2023 Regression 40
Subset Selection Techniques
• Total number of subsets of a set of size J = ?
• Goal: All the variables in the model should have sufficiently low p-
values, and all the variables outside the model should have a large p-
value if added to the model.
• Three possible approaches
• Forward selection
• Backward selection
• Mixed selection

11/24/2023 Regression 41
Subset Selection Techniques
• Forward selection (a sketch follows below):
• Begin with a null (empty) set, S
• Perform J linear regressions, each with exactly one variable
• Add the variable that results in the lowest cross-validation error to the set, S
• Again, perform J−1 linear regressions, each with 2 variables
• Add the variable that results in the lowest cross-validation error to the set, S
• Continue until some stopping criterion is reached, e.g. the CV error is no longer decreasing
• Backward selection begins with all the variables and removes the
variable with the highest p-value at successive steps
• Mixed selection is similar to forward selection, but it may also remove
a variable if it doesn't yield any improvement to the model
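
A rough sketch of forward selection using cross-validated MSE as the selection criterion; the DataFrame ads and its column names are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

candidates = ["TV", "radio", "newspaper"]
selected, best_err = [], np.inf
while candidates:
    errs = {}
    for c in candidates:
        X = ads[selected + [c]].values
        errs[c] = -cross_val_score(LinearRegression(), X, ads["sales"].values,
                                   cv=5, scoring="neg_mean_squared_error").mean()
    best_c = min(errs, key=errs.get)
    if errs[best_c] >= best_err:       # stop when the CV error no longer decreases
        break
    best_err = errs[best_c]
    selected.append(best_c)
    candidates.remove(best_c)
print(selected)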
11/24/2023 Regression 42
Do I need more predictors/change of model?
• When we estimated the variance of ϵ, we assumed that the residuals
were uncorrelated and normally distributed with mean 0 and fixed
variance.
• These assumptions need to be verified using the data. In residual
analysis, we typically create two types of plots:
1. a plot of the residuals eᵢ with respect to xᵢ or ŷᵢ. This allows us to compare the
distribution of the noise at different values of x.
2. a histogram of the residuals eᵢ. This allows us to explore the distribution of the
noise independent of x or ŷ.

11/24/2023 Regression 43
Patterns in Residuals

• Residuals are easier to interpret than the model

• We plot the residuals (eᵢ) against ŷᵢ, so the graph is always 2-D
11/24/2023 Regression 44
Confidence intervals on predictions of y

• Depends on our confidence in β̂
• Different values of β̂ => different values of ŷ
• Given x, examine the distribution of ŷ over the different estimates of β̂, and determine its mean and standard deviation
• For each such β̂, compute the prediction ŷ for x; the spread of these predictions gives the confidence interval
11/24/2023 Regression 45
Potential problems of Linear Models
• Non-linearity
• Can use polynomial linear regression or design better features
• Outliers
• Disturbs the models because of quadratic penalty, Discard outliers carefully
• High-leverage points
• Outliers in the predictor variables
• Collinearity (2 or more predictor variables have high correlation)
• Keep one of them or design a good combined feature
• Correlation of error terms, non-constant variance of error terms
• These give falsely high confidence in the model; we can't trust the CIs on the model parameters
11/24/2023 Regression 46
Polynomial Regression
• The simplest non-linear model we can consider, for a
response Y and a predictor X, is a polynomial model of
degree M:
  y = β₀ + β₁x + β₂x² + … + β_M x^M + ε
• Just as in the case of linear regression with cross terms,
polynomial regression is a special case of linear regression:
we treat each power of x as a separate predictor. Thus, we can write
  ŷ = β̂₀ + β̂₁x + β̂₂x² + … + β̂_M x^M
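
A sketch of polynomial regression as plain linear regression on expanded features; the degree and the 1-D arrays x and y are illustrative assumptions:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

poly_model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),   # builds the x, x^2, x^3 columns
    LinearRegression(),                                 # then fits an ordinary linear model on them
)
poly_model.fit(x.reshape(-1, 1), y)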

11/24/2023 Regression 47
Polynomial Regression

• Which of the above three is the best model?


• Check RMSE
• Check R2
• Remember bias and variance??
11/24/2023 Regression 48
Benefit of Cross-Validation

CV(Model) = (1/K) Σᵢ₌₁ᴷ L(Model trained without fold i, evaluated on fold i)

• Using cross-validation, we validate the models on a portion
of the training data which our learning algorithm has never seen.
• The leave-one-out method is used when the number of sample points is
very small.
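
A sketch of using K-fold cross-validation to compare model choices, here the polynomial degree; the arrays x and y and the range of degrees are assumptions for illustration:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, x.reshape(-1, 1), y,
                             cv=5, scoring="neg_mean_squared_error")
    print(degree, np.sqrt(-scores.mean()))   # average cross-validated RMSE for each degree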
11/24/2023 Regression 49
Regularization of Linear Models
• Goal: Reduce over-fitting of the data by reducing degrees of freedom
• For a linear model, regularization is typically achieved by constraining
the weights of the model

L_reg = L + λ R(β),

where λ is a scalar that gives the weight (or importance) of the

regularization term R(β).

• Fitting the model using the modified loss function Lreg would result in
model parameters with desirable properties (specified by R).

11/24/2023 Regression 50
Ridge Regression
• Alternatively, we can choose a regularization term that penalizes the
squares of the parameter magnitudes. Then, our regularized loss function
is:

L_ridge = (1/n) Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ βⱼ²

• Works best when the least-squares estimates have high variance

• As λ increases, flexibility decreases, variance decreases, and bias increases
slightly

• Note that Σⱼ βⱼ² is the squared l2 norm of the vector β
51
Ridge Regression
• We often say that Lridge is the loss function for l2 regularization.

• Finding the model parameters β ridge that minimize the l2 regularized loss
function is called ridge regression.

52
LASSO (least absolute shrinkage and selection operator) Regression
• Ridge regression reduces the parameter values but doesn't force them
to go to zero. LASSO is very effective in doing that.
• It uses the following regularized loss function:

L_LASSO = (1/n) Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ |βⱼ|

• Note that Σⱼ |βⱼ| is the l1 norm of the vector β
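
A hedged scikit-learn sketch comparing ridge and LASSO fits; alpha plays the role of λ above, its values are illustrative, and X_train / y_train are assumed from an earlier split:

from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0).fit(X_train, y_train)   # l2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X_train, y_train)   # l1 penalty: can drive some coefficients exactly to zero
print(ridge.coef_)
print(lasso.coef_)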

53
LASSO Regression
• Hence, we often say that LLASSO is the loss function for l1 regularization.

• Finding the model parameters β LASSO that minimize the l1 regularized loss function
is called LASSO regression.

54
Choosing λ
• In both ridge and LASSO regression, we see that the larger our choice of the
regularization parameter λ, the more heavily we penalize large values in β
• If λ is close to zero, we recover the MSE, i.e. ridge and LASSO regression are just
ordinary regression.
• If λ is sufficiently large, the MSE term in the regularized loss function will be
insignificant and the regularization term will force β_ridge and β_LASSO to be close
to zero.
• To avoid ad-hoc choices, we should select λ using cross-validation.
• Once the model is trained, we use the unregularized performance measure to
evaluate the model's performance.
55
Elastic Net
• Middle ground between ridge and LASSO regression
• The regularization term is a simple mix of the l1 and l2 penalties, controlled by a mix parameter 'r' (r = 1 gives LASSO, r = 0 gives ridge)
• Elastic Net has better convergence properties than LASSO

from sklearn.linear_model import ElasticNet

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
11/24/2023 Regression 56
SVM (Support Vector Machines)
• Uses a different approach to regression
• Instead of thinking of the fit as a line, let us think of it as a channel
• Fit as many instances as possible within the channel while limiting the
margin violations (i.e. instances off the channel)
• Width of the channel is the hyper-parameter ‘ε’
• Adding more training instances within the channel doesn’t change
the model parameters
• Hence these models are more robust against over-fitting

11/24/2023 Regression 57
SVM Regression

11/24/2023 Regression 58
ε insensitive Loss function

The ε-insensitive loss is zero for errors inside the band [−ε, ε] and grows linearly outside it:

L_ε(y, ŷ) = max(0, |y − ŷ| − ε)
11/24/2023 Regression 59
SVM Regression

from sklearn.svm import LinearSVR

svm_reg = LinearSVR(epsilon=1, C=2)
11/24/2023 Regression 60
Parameters in SVM regression
• Parameter ε controls the width of the channel and can affect the
number of support vectors used to construct the regression function.
• Adding more training vectors within the channel does not change the model
• Bigger ε => fewer support vectors
• Parameter C determines the trade-off between the model complexity
and the degree to which the deviations larger than ε can be tolerated
• It is interpreted as a traditional regularization parameter that can be
estimated by Cross Validation, for example

11/24/2023 Regression 61
Non-linear data
• SVMs allow for a computationally efficient method of transforming
the dataset to higher dimensions using the kernel trick.
• Common kernels that are used are
• Linear, polynomial, Gaussian RBF, Sigmoid
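
A sketch of kernelized SVM regression with an RBF kernel in scikit-learn; the hyper-parameter values and the reuse of X_train / y_train are illustrative assumptions:

from sklearn.svm import SVR

svr_rbf = SVR(kernel="rbf", C=10, gamma=0.1, epsilon=0.5)   # the kernel trick handles the non-linear mapping
svr_rbf.fit(X_train, y_train)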

11/24/2023 Regression 62
