Elements of Statistical Learning II - Ch.6 Kernel Smoothing Methods - Notes

Ch.6 Kernel Smoothing Methods

Applications

- (Non-parametric) Regression
- Unsupervised Learning – Density estimation
- (Non-parametric) Classification
- To do better on density estimation and classification: Gaussian mixture models (see below)

Kernel Smoothing Regression

Summary – Kernel smoother


A kernel smoother is a statistical technique to estimate a real-valued function f: Rp -> R as the
weighted average of neighbouring observed data. The weight is defined by the kernel, such that
closer points are given higher weights. The estimated function is smooth, and the level of
smoothness is set by a single parameter.

This technique is most appropriate when the dimension of the predictor is low (p < 3), for
example for data visualization.

Further info
- A class of regression techniques that achieve flexibility in estimating the
regression function f(X) over the domain Rp by fitting a different but simple model
separately at each query point x0.
- This is done by using only those observations close to the target point x0 to fit the
simple model, and in such a way that the resulting estimated function ˆf(X) is
smooth in Rp.
o This localization is achieved via a weighting function or kernel Kλ(x0, xi),
which assigns a weight to xi based on its distance from x0.
o The kernels Kλ are typically indexed by a parameter λ that dictates the
width of the neighbourhood.
Training
- These memory-based methods require in principle little or no training; all the work
gets done at evaluation time.
- The only parameter that needs to be determined from the training data is λ. The
model, however, is the entire training data set.

Simple kernel smoothing (locally weighted average of our points)


Mathematically (for a 1-d kernel smoother), the Nadaraya–Watson kernel-weighted average is

ˆf(x0) = Σi Kλ(x0, xi) yi / Σi Kλ(x0, xi),  with Kλ(x0, x) = D(|x − x0| / hλ(x0))

- The model is the entire training data set, because every observation enters through Kλ(x0, xi). x0 is any query point on the continuous x-axis; by sweeping x0 across the axis we construct the whole function approximation (see the sketch below).
- hλ(x0) is a width function (indexed by λ) that determines the width of the neighbourhood at x0.
o It can be adaptive or constant (e.g. hλ(x0) = λ for the Epanechnikov quadratic kernel).
- E.g. D(t) = ¾ * (1 – t²) if |t| ≤ 1, 0 otherwise (Epanechnikov quadratic kernel).
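As an illustration (not from the book), a minimal NumPy sketch of the Nadaraya–Watson smoother with the Epanechnikov kernel; the toy sine data and all names are my own:

import numpy as np

def epanechnikov(t):
    # D(t) = 3/4 * (1 - t^2) for |t| <= 1, and 0 otherwise
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    # Locally weighted average of the y's around the query point x0,
    # with constant width h_lambda(x0) = lam
    w = epanechnikov(np.abs(x - x0) / lam)
    return np.sum(w * y) / np.sum(w) if np.sum(w) > 0 else np.nan

# Toy example: smooth noisy samples of sin(x)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(0, 2 * np.pi, 200)
f_hat = np.array([nadaraya_watson(x0, x, y, lam=0.5) for x0 in grid])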
Extra Kernel Smoothing Regression parts

Kernel smoothing of local regressions (kernel-weighted local regression; an improvement to simple kernel smoothing)
- Local Linear regression
- Local Polynomial regression

What are they


- With kernel smoothing, instead of taking a raw moving average we essentially take a smoothly varying, locally weighted average (through the kernel weighting)
- We now go one step further: instead of taking the locally weighted average around a point, we fit a weighted local regression there (and repeat this for every query point)

What do they help with?


We have progressed from the raw moving average to a smoothly varying locally
weighted average by using kernel weighting. The smooth kernel fit still has problems,
however, as exhibited in Figure 6.3 (left panel). Locally weighted averages can be badly
biased on the boundaries of the domain, because of the asymmetry of the kernel in that
region. By fitting straight lines rather than constants locally, we can remove this bias
exactly to first order; see Figure 6.3 (right panel). Actually, this bias can be present in the
interior of the domain as well, if the X values are not equally spaced (for the same
reasons, but usually less severe). Again locally weighted linear regression will make a
first-order correction.

To summarize some collected wisdom on this issue:


- Local linear fits can help bias dramatically at the boundaries at a modest cost in
variance. Local quadratic fits do little at the boundaries for bias, but increase the
variance a lot.
- Local quadratic fits tend to be most helpful in reducing bias due to curvature in
the interior of the domain.
- Asymptotic analysis suggests that local polynomials of odd degree dominate those of even degree. This is largely due to the fact that asymptotically the MSE is dominated by boundary effects.
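A minimal sketch (my own illustrative code, not the book's) of locally weighted linear regression: at each query point x0 we solve a kernel-weighted least squares problem for an intercept and slope, then evaluate the fitted line at x0.

import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def local_linear(x0, x, y, lam):
    # Weighted least squares fit of a straight line around x0
    w = epanechnikov(np.abs(x - x0) / lam)
    B = np.column_stack([np.ones_like(x), x])      # local basis (1, x)
    sw = np.sqrt(w)
    # Re-weight rows by sqrt(w) and solve an ordinary least squares problem
    beta, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
    return beta[0] + beta[1] * x0                  # value of the local line at x0

Compared with the locally weighted average above, this local linear fit removes the first-order bias at the boundaries, as discussed in the text.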

Selecting width of kernel


In each of the kernels Kλ, λ is a parameter that controls its width:
- For the Epanechnikov or tri-cube kernel with metric width, λ is the radius of the
support region.
- For the Gaussian kernel, λ is the standard deviation.
- For k-nearest neighbourhoods, λ is the number k of nearest neighbors, often expressed as a fraction or span k/N of the total training sample.

Note: There is a natural bias–variance tradeoff as we change the width of the averaging window, which is most explicit for local averages:
- If the window is narrow, ˆf(x0) is an average of a small number of yi close to x0, so its variance will be relatively large, while the bias will tend to be small.
- If the window is wide, the variance of ˆf(x0) will be small relative to the variance of any yi, but the bias will be higher, because observations further from x0 are used, and f(x) there may differ from its value at x0.
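In practice a common way to make this tradeoff is to choose λ by cross-validation. A self-contained sketch (my own code, reusing the same Epanechnikov Nadaraya–Watson smoother and leave-one-out CV over a grid of candidate widths):

import numpy as np

def nw_predict(x0, x, y, lam):
    # Nadaraya-Watson smoother with the Epanechnikov kernel
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)
    return np.sum(w * y) / np.sum(w) if np.sum(w) > 0 else np.nan

def loocv_error(x, y, lam):
    # Leave-one-out CV: predict each y_i from the remaining N-1 points
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = nw_predict(x[i], x[mask], y[mask], lam)
        if not np.isnan(pred):
            errs.append((y[i] - pred) ** 2)
    return np.mean(errs)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
candidate_lams = np.linspace(0.1, 2.0, 20)
best_lam = min(candidate_lams, key=lambda lam: loocv_error(x, y, lam))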

Structured local regression


- Structured kernels
- Structured regression functions

Kernel Density Estimation

- The (Parzen) kernel density estimate at a point x0 is ˆf(x0) = (1 / (N λ)) Σi Kλ(x0, xi).
- e.g. with the Gaussian kernel (1D): ˆf(x0) = (1/N) Σi φλ(x0 – xi), where φλ is the Gaussian density with mean zero and standard deviation λ.
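A small sketch (my own) of the 1-d Gaussian kernel density estimate; for real use, scipy.stats.gaussian_kde does essentially the same thing with an automatic bandwidth:

import numpy as np

def gaussian_kde_1d(x0, data, lam):
    # Parzen estimate: average of Gaussian densities (std. dev. lam) centred at the data
    z = (x0 - data) / lam
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return np.sum(phi) / (data.size * lam)

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1.0, 300), rng.normal(3, 0.5, 200)])
grid = np.linspace(-6, 6, 400)
density = np.array([gaussian_kde_1d(g, data, lam=0.4) for g in grid])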


Radial Basis functions (RBF)
- Note: Kernels can be thought of as radial functions/kernels (functions whose value depends only on the distance between the input (xi or x) and some fixed point (x0 or ξj))
- Using kernels as basis functions -> radial basis functions
o i.e. transforming the data into another feature space before applying an algorithm
- They can be used, for example, for least squares regression, or for any other algorithm that assumes linearity in the feature space (where, without the RBF transformation of the features, we would not expect linearity)
o e.g. PCA (kernel PCA), k-means (kernel k-means; spectral clustering), etc.

E.g. Kernel Linear Regression model

o ˆf(x) = Σj βj Kλ(ξj, x), i.e. a linear expansion in kernel basis functions centred at prototype points ξj
o With the Gaussian kernel this is an RBF network

Note: Radial basis functions form the bridge between the modern “kernel methods” (ML –
e.g. kernel SVM) and local fitting technology (i.e. kernel smoothing methods)

Training:
- Similar to what is explained below for RBFs represented as neural networks, except that we use OLS to estimate the betas:
o Get the ξj from the centres of k-means clustering (one way to do it)
o λ can be determined from the kernel (e.g. for the Gaussian kernel it is the standard deviation)
o OLS for the betas

Explained as Neural Networks


https://en.wikipedia.org/wiki/Radial_basis_function_network
Training
RBF networks are typically trained from pairs of input and target values x(t), y(t) by a two-step algorithm.
In the first step, the center vectors ci of the RBF functions in the hidden layer are chosen. This
step can be performed in several ways; centers can be randomly sampled from some set of
examples, or they can be determined using k-means clustering. Note that this step
is unsupervised.
The second step simply fits a linear model with coefficients wi to the hidden layer's outputs with
respect to some objective function. A common objective function, at least for regression/function
estimation, is the least squares function:
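A minimal sketch of that two-step recipe (my own code; assumes scikit-learn is available for the k-means step, and uses a Gaussian RBF with a common width):

import numpy as np
from sklearn.cluster import KMeans

def rbf_features(x, centers, lam):
    # One Gaussian RBF feature per centre xi_j, with common width lam
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * lam ** 2))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Step 1 (unsupervised): choose the centres xi_j via k-means on the inputs
centers = KMeans(n_clusters=10, n_init=10, random_state=0).fit(
    x.reshape(-1, 1)).cluster_centers_.ravel()

# Step 2 (supervised): ordinary least squares for the output weights beta
Phi = rbf_features(x, centers, lam=0.5)
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Predictions at new points
x_new = np.linspace(0, 2 * np.pi, 100)
y_hat = rbf_features(x_new, centers, lam=0.5) @ beta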

Kernels and RBFs in SVMs: http://cs229.stanford.edu/notes/cs229-notes3.pdf (Andrew Ng lecture notes)

Mixture Models for Density Estimation and Classification

Density estimation, f(x)


- f(x) is a linear combination (mixture) of densities of x, each with its own parameters (e.g. mean and covariance matrix), weighted by mixing proportions αm that sum to one
o Gaussian mixture model – Gaussian densities with different means and covariance matrices: f(x) = Σm αm φ(x; μm, Σm)
o Fit by maximum likelihood, using the EM algorithm (Ch.8)
o Each component density can be viewed as a kind of kernel

Classification
- The probability that observation xi belongs to class/component m is the responsibility rim, i.e. the weighted component m (term m of the mixture) divided by the whole mixture f(xi); for the Gaussian mixture, rim = αm φ(xi; μm, Σm) / Σk αk φ(xi; μk, Σk)
o e.g. xi is age, and f(x) is as defined above under density estimation
o Suppose we threshold each value ri2 and hence define δi = I(ri2 > 0.5)
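A quick sketch using scikit-learn's GaussianMixture (EM under the hood); the two-group "age"-like toy data is made up for illustration:

import numpy as np
from sklearn.mixture import GaussianMixture

# Toy 1-d "age"-like data drawn from two latent groups
rng = np.random.default_rng(0)
ages = np.concatenate([rng.normal(35, 5, 300), rng.normal(60, 8, 150)]).reshape(-1, 1)

# Fit a two-component Gaussian mixture by maximum likelihood (EM)
gm = GaussianMixture(n_components=2, random_state=0).fit(ages)

density = np.exp(gm.score_samples(ages))    # mixture density f(x_i) at each observation
r = gm.predict_proba(ages)                  # responsibilities r_im for each component
delta = (r[:, 1] > 0.5).astype(int)         # delta_i = I(r_i2 > 0.5), thresholding component 2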

Additional notes
Computational Considerations
- Kernel and local regression and density estimation are memory-based methods.
For many real-time applications, this can make this class of methods
(computationally) infeasible.
