Linear Methods For Classification

1) The document discusses various linear methods for classification problems, including linear regression of an indicator matrix, linear discriminant analysis, logistic regression, and perceptron learning algorithms. 2) These linear methods use linear decision boundaries to divide the feature space and classify data points, such as using a hyperplane for problems with two classes. 3) The methods are compared, such as logistic regression being more general than linear discriminant analysis with fewer assumptions, though they often produce similar results. Finding the optimal separating hyperplane that minimizes overlap is discussed.


Linear Methods for Classification

Assoc. Prof. Dr. Abdulhamit Subasi

Machine Learning Presentation
By Hakan

Introduction

Basic setup of a classification problem.
Understanding the Bayes classification rule.
Understanding the classification approach by linear regression of an indicator matrix.
Understanding the phenomenon of masking.


Setup for Supervised Learning


Training data: {(x1, g1), (x2, g2), ..., (xN, gN)}.
The feature vector X = (X1, X2, ..., Xp), where each variable Xj is quantitative.
The response variable G is categorical and takes values in the set G = {1, 2, ..., K}.
Form a predictor G(x) to predict G based on X.

Setup for Supervised Learning

G(x) divides the input space (feature vector space) into a collection of regions, each labeled by one class.


Linear Methods

Decision boundaries are linear: hence "linear methods" for classification.
Two-class problem: the decision boundary between the two classes is a hyperplane in the feature vector space.
A hyperplane in the p-dimensional input space is the set
{ x : beta_0 + beta_1 x_1 + ... + beta_p x_p = 0 }.

Linear Methods

The two regions separated by a hyperplane:
{ x : beta_0 + beta^T x > 0 }  and  { x : beta_0 + beta^T x < 0 }.

More than two classes: the decision boundary between any pair of classes k and l is a hyperplane.
How do you choose the hyperplane?

Linear Methods

Example methods for deciding the hyperplane:


Linear regression of an indicator matrix.
Linear discriminant analysis.
Logistic regression.
Rosenblatt's perceptron learning algorithm.

Note: linear decision boundaries are not necessarily a severe restriction; expanding the feature set (e.g. with squares and cross-products) gives boundaries that are linear in the augmented space but nonlinear in the original inputs.

The Bayes Classification Rule

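The slide content for this section is image-only and not recoverable from this export. As a standard statement of the rule (my reconstruction, using the f_k and pi_k notation that appears on the LDA slide below), the Bayes classifier assigns each x to the most probable class given x:

```latex
\hat{G}(x) \;=\; \arg\max_{k \in \{1,\dots,K\}} \Pr(G = k \mid X = x)
           \;=\; \arg\max_{k} \; \pi_k \, f_k(x)
```

No other rule achieves a lower expected misclassification rate (the Bayes rate).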

Linear Regression of an Indicator Matrix


 g | Y1 Y2 Y3 Y4
 1 |  1  0  0  0
 3 |  0  0  1  0
 2 |  0  1  0  0
 4 |  0  0  0  1

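As a small illustration of this approach (a sketch of my own, not code from the slides; the function names are mine), one can build the indicator response matrix Y from the class labels g, fit a linear regression to each of its columns, and classify to the class with the largest fitted value:

```python
import numpy as np

def indicator_matrix(g, K):
    """One-hot encode class labels g (values 1..K) into an N x K matrix Y."""
    Y = np.zeros((len(g), K))
    Y[np.arange(len(g)), np.asarray(g) - 1] = 1.0
    return Y

def fit_indicator_regression(X, g, K):
    """Fit Y ~ X by least squares; returns the (p+1) x K coefficient matrix B."""
    Xa = np.column_stack([np.ones(len(X)), X])   # augment with an intercept column
    Y = indicator_matrix(g, K)
    B, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
    return B

def predict_class(X, B):
    """Classify each row of X to the class with the largest fitted value."""
    Xa = np.column_stack([np.ones(len(X)), X])
    return np.argmax(Xa @ B, axis=1) + 1
```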

Linear Regression Fit to the Class Indicator Variables

Verification that the fitted values sum to one. We want to prove

    sum_{k=1}^{K} fhat_k(x) = 1  for every x,

which is equivalent to proving

    Yhat 1_K = 1_N.    (Eq. 1)

Notice that the fit is Yhat = X (X^T X)^{-1} X^T Y, and that each row of the indicator matrix Y contains a single 1, so

    Y 1_K = 1_N.    (Eq. 2)

And the augmented X has a column of ones (the intercept), so 1_N lies in the column space of X and is left unchanged by the projection: X (X^T X)^{-1} X^T 1_N = 1_N.

From Eq. 2 we can see that Eq. 1 becomes

    Yhat 1_K = X (X^T X)^{-1} X^T Y 1_K = X (X^T X)^{-1} X^T 1_N = 1_N.

True for any x.


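A quick numerical check of this property (simulated data and variable names are mine): because the augmented X contains a column of ones, the fitted values of the indicator regression sum to one for every observation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 20, 3, 4
X = rng.normal(size=(N, p))
g = rng.integers(1, K + 1, size=N)             # labels in {1, ..., K}

Xa = np.column_stack([np.ones(N), X])          # augmented with intercept column
Y = np.zeros((N, K))
Y[np.arange(N), g - 1] = 1.0                   # indicator response matrix
B, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
Yhat = Xa @ B

print(np.allclose(Yhat.sum(axis=1), 1.0))      # True: fitted values sum to one
```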

Linear discriminant analysis


f_k(x): class-conditional density of X in class G = k; pi_k: prior probability of class k.
Assume each f_k(x) is Gaussian and that the classes share a common covariance matrix Sigma.
Then the log-ratio log[ Pr(G = k | X = x) / Pr(G = l | X = x) ] is linear in x, so the decision boundaries are linear.
Linear discriminant function: delta_k(x) = x^T Sigma^{-1} mu_k - (1/2) mu_k^T Sigma^{-1} mu_k + log pi_k.
Classification: G(x) = argmax_k delta_k(x).
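A minimal sketch of this computation (my own illustration, not the authors' code): estimate class priors, means and the pooled covariance from the training data, then classify by the largest discriminant score delta_k(x).

```python
import numpy as np

def lda_fit(X, g):
    """Estimate LDA parameters: class priors, class means and pooled covariance."""
    classes = np.unique(g)
    N, p = X.shape
    priors, means = [], []
    pooled = np.zeros((p, p))
    for k in classes:
        Xk = X[g == k]
        priors.append(len(Xk) / N)
        means.append(Xk.mean(axis=0))
        pooled += (len(Xk) - 1) * np.cov(Xk, rowvar=False)
    pooled /= (N - len(classes))
    return classes, np.array(priors), np.array(means), pooled

def lda_predict(X, classes, priors, means, pooled):
    """delta_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log(pi_k)."""
    Sinv = np.linalg.inv(pooled)
    scores = X @ Sinv @ means.T \
             - 0.5 * np.sum(means @ Sinv * means, axis=1) \
             + np.log(priors)
    return classes[np.argmax(scores, axis=1)]
```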

Remarks

With 2 classes, classification by linear discriminant analysis essentially coincides with classification by linear least squares.
With more than 2 classes, LDA avoids the masking problems of indicator regression.
If the classes do not share a common covariance matrix, we obtain quadratic discriminant analysis instead.

Regularized discriminant analysis (RDA)

A compromise between linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA).
Regularized covariance matrices: Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma,
where Sigma is the pooled covariance matrix used in LDA and alpha in [0, 1] is determined by cross-validation.
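A sketch of the regularized covariance estimate (assuming the standard RDA form above; in practice alpha would be chosen by cross-validation as the slide notes, which is omitted here):

```python
import numpy as np

def rda_covariances(X, g, alpha):
    """Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma_pooled."""
    classes = np.unique(g)
    N, p = X.shape
    per_class = {}
    pooled = np.zeros((p, p))
    for k in classes:
        Xk = X[g == k]
        Sk = np.cov(Xk, rowvar=False)          # per-class covariance estimate
        per_class[k] = Sk
        pooled += (len(Xk) - 1) * Sk
    pooled /= (N - len(classes))               # pooled (LDA) covariance
    return {k: alpha * Sk + (1 - alpha) * pooled for k, Sk in per_class.items()}
```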

Computations

Computations for LDA and QDA are simplified by diagonalizing the covariance matrices (eigen-decomposition Sigma_k = U_k D_k U_k^T).
Algorithm:
Sphere the data X with respect to the common covariance matrix Sigma = U D U^T: X* = D^{-1/2} U^T X, so that the common covariance of X* is the identity.
Classify to the closest class centroid in the transformed space, modulo the effect of the class prior probabilities pi_k.
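A short sketch of the sphering step (my notation; cov is assumed positive definite): eigen-decompose the common covariance and whiten the data, after which classification reduces to comparing prior-adjusted distances to class centroids.

```python
import numpy as np

def sphere(X, cov):
    """Whiten the rows of X w.r.t. a common covariance: x* = D^{-1/2} U^T x."""
    eigval, U = np.linalg.eigh(cov)            # cov = U diag(eigval) U^T
    return X @ U @ np.diag(1.0 / np.sqrt(eigval))
```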

Reduced-rank linear discriminant analysis

Fisher: find the linear combination Z = a^T X such that the between-class variance is maximized relative to the within-class variance.

This amounts to maximizing the Rayleigh quotient a^T B a / (a^T W a), where B is the between-class covariance matrix and W is the within-class covariance matrix.


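The maximizing directions can be obtained from the generalized eigenproblem B a = lambda W a. A small sketch using scipy (my own illustration; W is assumed positive definite):

```python
import numpy as np
from scipy.linalg import eigh

def fisher_directions(X, g):
    """Return discriminant directions a maximizing a^T B a / a^T W a."""
    classes = np.unique(g)
    xbar = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for k in classes:
        Xk = X[g == k]
        mk = Xk.mean(axis=0)
        B += len(Xk) * np.outer(mk - xbar, mk - xbar)   # between-class scatter
        W += (Xk - mk).T @ (Xk - mk)                    # within-class scatter
    eigvals, eigvecs = eigh(B, W)            # generalized eigenproblem B a = lambda W a
    order = np.argsort(eigvals)[::-1]        # largest Rayleigh quotient first
    return eigvecs[:, order]
```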

Logistic regression

The model is specified by K - 1 log-odds (logit) transformations:
log[ Pr(G = k | X = x) / Pr(G = K | X = x) ] = beta_{k0} + beta_k^T x,  for k = 1, ..., K - 1.


Fitting logistic regression model

Usually fitted by maximum likelihood, using the Newton-Raphson algorithm to solve the score equations.
Example: K = 2 (two groups); encode the response as y_i = 1 when g_i = 1 and y_i = 0 when g_i = 2, then write down the log-likelihood.
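A sketch of the two-class fit by Newton-Raphson, equivalently iteratively reweighted least squares (the variable names, iteration cap and tolerance are my own choices):

```python
import numpy as np

def logistic_fit(X, y, n_iter=25, tol=1e-8):
    """Two-class logistic regression via Newton-Raphson on the log-likelihood.
    y must be coded 0/1; returns the coefficient vector (intercept first)."""
    Xa = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ beta))     # fitted probabilities
        W = p * (1.0 - p)                        # Newton weights
        grad = Xa.T @ (y - p)                    # score equations (gradient)
        hess = Xa.T @ (Xa * W[:, None])          # X^T W X (negative Hessian)
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:           # converged
            break
    return beta
```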

Example: South African heart disease


Correlation between the set of predictors leads to surprising results: some variables end up not being included in the logistic model.

Quadratic approximations and inference


The Newton-Raphson fit can be re-expressed as an iteratively reweighted least squares problem with weights w_i = p_i (1 - p_i); this quadratic approximation is the basis for inference about the fitted coefficients.


Differences between LDA and logistic regression

The two models have the same linear form, BUT they differ in the way the coefficients are estimated.
Logistic regression is more general and makes fewer assumptions (the marginal density of X is left arbitrary), so it is more robust; in practice, however, the two give very similar results.


Separating hyperplanes

A perceptron is a classifier of the form G(x) = sign(beta_0 + beta^T x).
The hyperplane, or affine set, L is defined by the equation f(x) = beta_0 + beta^T x = 0.


Properties

beta / ||beta|| is the vector normal to the surface L.
For any point x_0 in L, beta^T x_0 = -beta_0, so the signed distance of any point x to L is given by:
(1 / ||beta||) (beta^T x + beta_0) = f(x) / ||f'(x)||.
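A tiny sketch of the signed-distance formula just stated (my own helper name):

```python
import numpy as np

def signed_distance(x, beta, beta0):
    """Signed distance from x to the hyperplane {x : beta0 + beta^T x = 0}."""
    return (beta0 + beta @ x) / np.linalg.norm(beta)
```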

Rosenblatt's perceptron learning algorithm

Tries to find a separating hyperplane by minimizing the distance of misclassified points to the decision boundary.
If a point with y_i = 1 is misclassified, then x_i^T beta + beta_0 < 0; if a point with y_i = -1 is misclassified, then x_i^T beta + beta_0 > 0.
The goal is therefore to minimize D(beta, beta_0) = - sum_{i in M} y_i (x_i^T beta + beta_0), where M is the index set of misclassified points.
The algorithm uses stochastic gradient descent to minimize this piecewise linear criterion.
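A sketch of the stochastic-gradient update described above (the learning rate and stopping rule are my own choices; labels are assumed to be coded +1/-1):

```python
import numpy as np

def perceptron_fit(X, y, lr=1.0, n_epochs=100):
    """Rosenblatt's perceptron: update (beta, beta0) on each misclassified point."""
    beta = np.zeros(X.shape[1])
    beta0 = 0.0
    for _ in range(n_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):                 # y coded as +1 / -1
            if yi * (xi @ beta + beta0) <= 0:    # misclassified point
                beta += lr * yi * xi             # gradient step on the criterion
                beta0 += lr * yi
                mistakes += 1
        if mistakes == 0:                        # data separated: stop
            break
    return beta, beta0
```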

Optimal separating hyperplanes

Find the hyperplane that minimizes some measure of overlap in the training data; the optimal separating hyperplane maximizes the margin between the two classes.
Advantages over Rosenblatt's algorithm:
a unique solution;
better classification performance on test data.
Figure: the least-squares solution and two solutions found by the perceptron algorithm with different random starting points.
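One hedged illustration of the optimal separating hyperplane (my example, not from the slides) is a hard-margin linear support vector machine; scikit-learn's SVC with a very large C approximates the hard-margin solution on separable data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-3, size=(30, 2)),
               rng.normal(loc=+3, size=(30, 2))])   # two well-separated clusters
y = np.array([0] * 30 + [1] * 30)

clf = SVC(kernel="linear", C=1e6).fit(X, y)         # large C ~ hard margin
beta, beta0 = clf.coef_[0], clf.intercept_[0]
print("margin width:", 2.0 / np.linalg.norm(beta))  # maximized margin
```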

Resources

Celine Bugli, `The Elements of Statistical Learning`


Thank you

