Linear Methods For Classification
5/16/12 By Hakan
Introduction
Basic setup of a classification problem. Understanding the Bayes classification rule. Understanding the classification approach by linear regression of an indicator matrix. Understanding the phenomenon of masking.
Training data: {(x_1, g_1), (x_2, g_2), ..., (x_N, g_N)}. The feature vector X = (X_1, X_2, ..., X_p), where each variable X_j is quantitative. The response variable G is categorical and takes values in 𝒢 = {1, 2, ..., K}. Form a predictor Ĝ(x) to predict G based on X.
Ĝ(x) divides the input space (the feature vector space) into a collection of regions, each labeled by one class; see the figure of decision regions.
Linear Methods
Decision boundaries are linear: linear methods for classification. Two-class problem: the decision boundary between the two classes is a hyperplane in the feature vector space. A hyperplane in the p-dimensional input space is the set {x : β_0 + βᵀx = 0}.
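As a minimal sketch (not from the slides), a two-class linear classifier reduces to checking which side of such a hyperplane a point falls on; the coefficients below are made-up numbers purely for illustration.

import numpy as np

# Hypothetical hyperplane in p = 2 dimensions: beta0 + beta^T x = 0.
beta0 = -1.0
beta = np.array([2.0, -1.0])

def classify(X):
    """Assign class 1 if beta0 + beta^T x > 0, else class 2."""
    scores = beta0 + X @ beta          # signed score for each row of X
    return np.where(scores > 0, 1, 2)  # points with score == 0 lie on the boundary

X = np.array([[1.0, 0.0],   # score =  1.0 -> class 1
              [0.0, 1.0],   # score = -2.0 -> class 2
              [0.5, 0.0]])  # score =  0.0 -> on the boundary, labeled class 2 here
print(classify(X))          # [1 2 2]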
Linear Methods
More than two classes: the decision boundary between any pair of classes k and l is a hyperplane. How do you choose the hyperplane?
Linear Methods
Linear regression of an indicator matrix. Linear discriminant analysis. Logistic regression. Rosenblatt's perceptron learning algorithm.
Linear regression of an indicator matrix: code the response with indicator variables, one per class, giving the N × K indicator response matrix Y. Fit the linear model by least squares, B̂ = (XᵀX)⁻¹XᵀY, and classify a new observation x to the class with the largest fitted value, Ĝ(x) = argmax_k f̂_k(x). For example, with K = 4 classes:
Y1  Y2  Y3  Y4
 1   0   0   0
 0   0   1   0
 0   1   0   0
 0   0   0   1
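As a small illustration (not from the slides; the class labels below are made up), the indicator response matrix Y can be built from the vector of class labels g as follows:

import numpy as np

# Hypothetical labels for N = 4 observations and K = 4 classes.
g = np.array([1, 3, 2, 4])
K = 4

# Indicator response matrix: row i has a 1 in column g_i - 1 and 0 elsewhere.
Y = np.zeros((len(g), K))
Y[np.arange(len(g)), g - 1] = 1.0
print(Y)
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 0. 1.]]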
Verification of Σ_k f̂_k(x) = 1

We want to prove that the fitted values sum to one for every x, which is equivalent to proving

Ŷ 1_K = 1_N.   (Eq. 1)

Notice that each row of Y contains exactly one 1, so Y 1_K = 1_N, and that the model contains an intercept, so 1_N lies in the column space of X. Hence

Ŷ 1_K = X(XᵀX)⁻¹Xᵀ Y 1_K = X(XᵀX)⁻¹Xᵀ 1_N = 1_N.   (Eq. 2)
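A minimal numerical check of this property, assuming the least-squares fit B̂ = (XᵀX)⁻¹XᵀY with an intercept column; the data below are random and purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
N, p, K = 20, 3, 4

# Random features and labels, purely for illustration.
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # intercept + p features
g = rng.integers(1, K + 1, size=N)
Y = np.zeros((N, K))
Y[np.arange(N), g - 1] = 1.0

# Least-squares fit of the indicator matrix and fitted values.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ B_hat

# Each row of Y_hat sums to 1 (up to rounding error), as proved above.
print(np.allclose(Y_hat.sum(axis=1), 1.0))  # True

# Classification rule: assign each observation to the largest fitted value.
G_hat = np.argmax(Y_hat, axis=1) + 1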
Remarks
With 2 classes, linear discriminant analysis corresponds to classification with linear least squares. With more than 2 classes, linear discriminant analysis avoids the masking problems of indicator regression. If the classes do not share a common covariance matrix, use quadratic discriminant analysis instead.
Regularized discriminant analysis: a compromise between linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), via a regularized covariance matrix Σ̂_k(α) = α Σ̂_k + (1 − α) Σ̂, where Σ̂ is the pooled covariance matrix used in LDA and α ∈ [0, 1] is determined by cross-validation.
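A sketch of the regularized covariance estimate, assuming the pooled LDA covariance Σ̂ and per-class QDA covariances Σ̂_k have already been computed; the matrices and the α grid below are illustrative, and in practice α is chosen by cross-validation as stated above.

import numpy as np

def regularized_covariances(Sigma_k, Sigma, alpha):
    """Interpolate each class covariance between QDA (alpha=1) and LDA (alpha=0)."""
    return [alpha * S_k + (1.0 - alpha) * Sigma for S_k in Sigma_k]

# Illustrative 2x2 covariances for two classes and their pooled estimate.
Sigma_1 = np.array([[1.0, 0.2], [0.2, 1.0]])
Sigma_2 = np.array([[2.0, -0.3], [-0.3, 0.5]])
Sigma_pooled = 0.5 * (Sigma_1 + Sigma_2)

for alpha in (0.0, 0.5, 1.0):  # a grid one might cross-validate over
    print(alpha, regularized_covariances([Sigma_1, Sigma_2], Sigma_pooled, alpha)[0])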
Computations
The computations for LDA and QDA are simplified by the eigen-decomposition of the covariance estimate, Σ̂ = U D Uᵀ (and Σ̂_k = U_k D_k U_kᵀ for QDA). Algorithm: sphere the data with respect to the common covariance, X* = D^(−1/2) Uᵀ X, so that the covariance estimate of X* is the identity, then classify x* to the closest class centroid, taking the class prior probabilities π_k into account.
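A rough sketch of that computation on made-up data: eigen-decompose the pooled covariance, sphere the inputs, and classify to the closest class centroid (class priors are taken as equal here for simplicity).

import numpy as np

rng = np.random.default_rng(1)
# Two illustrative Gaussian classes with a common covariance.
X1 = rng.normal(loc=[0, 0], size=(50, 2))
X2 = rng.normal(loc=[2, 1], size=(50, 2))
X = np.vstack([X1, X2])
g = np.array([1] * 50 + [2] * 50)

# Pooled within-class covariance and its eigen-decomposition Sigma = U D U^T.
means = np.array([X[g == k].mean(axis=0) for k in (1, 2)])
centered = X - means[g - 1]
Sigma = centered.T @ centered / (len(X) - 2)
D, U = np.linalg.eigh(Sigma)

# Sphere the data: X* = X U D^(-1/2), so the pooled covariance becomes the identity.
sphere = U / np.sqrt(D)          # scales each eigenvector column by 1/sqrt(eigenvalue)
X_star = X @ sphere
means_star = means @ sphere

# Classify each point to the closest sphered centroid (equal priors assumed).
dists = ((X_star[:, None, :] - means_star[None, :, :]) ** 2).sum(axis=2)
G_hat = dists.argmin(axis=1) + 1
print((G_hat == g).mean())       # training accuracy on this toy data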
Fisher: find the linear combination Z = aᵀX such that the between-class variance is maximized relative to the within-class variance, i.e., maximize the Rayleigh quotient aᵀBa / aᵀWa, where B and W are the between-class and within-class covariance matrices.
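For two classes this direction is proportional to W⁻¹(μ_2 − μ_1); a minimal sketch on made-up data (the class means and covariances below are illustrative):

import numpy as np

rng = np.random.default_rng(2)
# Two illustrative classes.
X1 = rng.normal(loc=[0, 0], size=(50, 2))
X2 = rng.normal(loc=[2, 1], size=(50, 2))

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class covariance W (pooled) and Fisher's direction a = W^{-1}(mu2 - mu1).
W = (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)) / 2.0
a = np.linalg.solve(W, mu2 - mu1)

# Project onto Z = a^T X: the between-class separation is large relative to
# the within-class spread along this single direction.
z1, z2 = X1 @ a, X2 @ a
print(z1.mean(), z2.mean())  # well-separated projected means on this toy data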
Logistic regression
The parameters are usually fit by maximum likelihood, using the Newton-Raphson algorithm to solve the score equations. Example: K = 2 (two groups). Encode the response as y_i = 1 for class 1 and y_i = 0 for class 2, and write p(x; β) = Pr(G = 1 | X = x) = exp(βᵀx) / (1 + exp(βᵀx)), with the intercept absorbed into β. The log-likelihood is ℓ(β) = Σ_i [ y_i βᵀx_i − log(1 + exp(βᵀx_i)) ].
The score equations are ∂ℓ/∂β = Σ_i x_i (y_i − p(x_i; β)) = 0, and each Newton-Raphson step updates β_new = β_old + (XᵀWX)⁻¹ Xᵀ(y − p), where W is the diagonal matrix of weights p(x_i; β)(1 − p(x_i; β)).
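A bare-bones sketch of this Newton-Raphson iteration for the two-class case, on random toy data with a made-up true coefficient vector and no convergence safeguards:

import numpy as np

rng = np.random.default_rng(3)
N, p = 200, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])   # intercept + features
true_beta = np.array([-0.5, 2.0, -1.0])                      # illustrative truth
y = rng.random(N) < 1.0 / (1.0 + np.exp(-X @ true_beta))     # 0/1 responses

beta = np.zeros(p + 1)
for _ in range(10):
    prob = 1.0 / (1.0 + np.exp(-X @ beta))         # p(x_i; beta)
    W = prob * (1.0 - prob)                        # diagonal of the weight matrix
    score = X.T @ (y - prob)                       # score equations
    hessian = X.T @ (W[:, None] * X)               # X^T W X
    beta = beta + np.linalg.solve(hessian, score)  # Newton-Raphson update
print(beta)  # should be close to true_beta on this toy example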
LDA and logistic regression produce linear logits of the same form, BUT they differ in the way the coefficients are estimated: logistic regression is more general and makes fewer assumptions (it leaves the marginal density of X arbitrary), so it is more robust; BUT the two give very similar results in practice.
Separating hyperplanes
For the hyperplane L = {x : β_0 + βᵀx = 0}, the vector β* = β / ‖β‖ is normal to the surface L. For any point x_0 in L, the signed distance of any point x to L is given by β*ᵀ(x − x_0) = (βᵀx + β_0) / ‖β‖.
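A quick numerical check of this formula, with a made-up hyperplane:

import numpy as np

beta0, beta = -4.0, np.array([3.0, 4.0])   # hypothetical hyperplane beta0 + beta^T x = 0

def signed_distance(x):
    """Signed distance from x to the hyperplane L: (beta^T x + beta0) / ||beta||."""
    return (beta @ x + beta0) / np.linalg.norm(beta)

print(signed_distance(np.array([0.0, 1.0])))   # 0.0 -> the point lies on L
print(signed_distance(np.array([1.0, 2.0])))   # 1.4 -> on the positive side of L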
The perceptron learning algorithm tries to find a separating hyperplane by minimizing the distance of misclassified points to the decision boundary. If y_i = 1 is misclassified, then x_iᵀβ + β_0 < 0; if y_i = −1 is misclassified, then x_iᵀβ + β_0 > 0. The goal is therefore to minimize D(β, β_0) = − Σ_{i ∈ M} y_i (x_iᵀβ + β_0), where M is the index set of misclassified points. The algorithm uses stochastic gradient descent to minimize this piecewise linear criterion.
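A minimal sketch of Rosenblatt's stochastic gradient updates on this criterion, using labels y in {−1, +1} and made-up, well-separated data; the learning rate and iteration count are arbitrary choices.

import numpy as np

rng = np.random.default_rng(4)
# Illustrative, nearly separable two-class data.
X = np.vstack([rng.normal(loc=[0, 0], size=(50, 2)),
               rng.normal(loc=[4, 4], size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

beta, beta0, rho = np.zeros(2), 0.0, 1.0   # rho is the learning rate
for _ in range(100):                       # passes over the data
    for i in rng.permutation(len(X)):
        if y[i] * (X[i] @ beta + beta0) <= 0:   # x_i is misclassified
            # Stochastic gradient step on D(beta, beta0) = -sum_M y_i (x_i^T beta + beta0)
            beta = beta + rho * y[i] * X[i]
            beta0 = beta0 + rho * y[i]

print(((X @ beta + beta0) * y > 0).all())  # typically True on this well-separated toy data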
Optimal separating hyperplane: minimize ½‖β‖² subject to y_i (x_iᵀβ + β_0) ≥ 1 for all i, i.e., find the separating hyperplane that maximizes the margin; when the classes overlap, find the hyperplane that minimizes some measure of overlap in the training data. Advantages over Rosenblatt's algorithm: the solution is unique and, by maximizing the margin, it tends to classify test data better.
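One common way to compute such a hyperplane in practice is the hinge-loss ("overlap") form of the maximum-margin problem; the crude subgradient sketch below, on made-up data with illustrative settings for C, the learning rate, and the iteration count, is only an approximation of the exact optimization.

import numpy as np

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=[0, 0], size=(50, 2)),
               rng.normal(loc=[4, 4], size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Subgradient descent on (1/2)||beta||^2 + C * sum_i max(0, 1 - y_i (x_i^T beta + beta0)).
beta, beta0, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(2000):
    margins = y * (X @ beta + beta0)
    viol = margins < 1                                   # points inside the margin or misclassified
    grad_beta = beta - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_beta0 = -C * y[viol].sum()
    beta, beta0 = beta - lr * grad_beta, beta0 - lr * grad_beta0

print(((X @ beta + beta0) * y > 0).mean())  # fraction correctly separated on the toy data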
Thank you