Machine Learning - Overview

This document provides an overview of machine learning algorithms, covering generative and discriminative algorithms. It summarizes parametric and non-parametric density-based classification methods such as QDA, LDA, NMC, histograms, Parzen windows, k-NN, and Naive Bayes. It also covers linear classifiers such as logistic regression and SVMs, non-linear classifiers including decision trees and MLPs, and classifier combination. Dimensionality reduction techniques such as PCA are discussed. Finally, it outlines evaluation methods for machine learning models, including test/train splits, cross-validation, bootstrapping, learning curves, and performance measures such as ROC curves and confusion matrices.
Generative algorithm

Covariance matrix: Σ = (1/n) Σᵢ (xᵢ - μ)(xᵢ - μ)ᵀ, where μ is the (class) mean

Parametric density-based classification


- QDA
- For each class, compute covariance matrix and mean
- Get g(x) for each class → pdf estimate of each class
- Boundary → g1(x) - g2(x) = 0 (see the sketch after this list)
- LDA
- For each class, compute mean
- Take averaged covariance matrix
- Boundary same as before
- NMC
- For each class, compute mean
- Take identity covariance matrix
- Boundary → same as before
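A minimal sketch of these three discriminants in Python (assuming numpy and a small labelled dataset; the function and variable names are illustrative, not from the course material):

```python
import numpy as np

def fit_gaussian_discriminants(X, y, mode="qda"):
    """Estimate per-class mean/covariance and return a classifier based on g_c(x)."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    if mode == "qda":                            # one covariance matrix per class
        covs = {c: np.cov(X[y == c].T) for c in classes}
    elif mode == "lda":                          # averaged (shared) covariance matrix
        pooled = np.mean([np.cov(X[y == c].T) for c in classes], axis=0)
        covs = {c: pooled for c in classes}
    else:                                        # "nmc": identity covariance matrix
        covs = {c: np.eye(X.shape[1]) for c in classes}

    def g(x, c):                                 # log Gaussian density (constant term and priors omitted)
        d = x - means[c]
        return -0.5 * d @ np.linalg.solve(covs[c], d) - 0.5 * np.log(np.linalg.det(covs[c]))

    # Two-class boundary is g1(x) - g2(x) = 0; taking the argmax generalises this to any number of classes.
    return lambda x: max(classes, key=lambda c: g(x, c))
```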
Non-parametric density-based classification
- Histogram
- Parzen
- Place cells around points
- Define cell shape
- Fix cell size
- Optimize the cell size h by:
- Heuristic
- Optimize likelihood
- Average k-nearest neighbor distance
- From the new pdf → classify with a boundary

- K-nn (a small sketch follows this list)
- Start at the new data point
- Grow a sphere (no fixed size)
- Stop once at least k points are included in the sphere
- Majority vote among those k neighbours
- Naive Bayes
- For each class
- For each feature → compute p(x_j|y) (just count the occurrences)
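A minimal k-NN sketch (assuming numpy arrays for the training data; the Euclidean distance and the helper name are illustrative choices):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)            # distance from x to every training point
    nearest = np.argsort(dists)[:k]                         # the k points inside the 'sphere'
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority vote among those neighbours
```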
Discriminative algorithms

Linear classifiers
Logistic regression
- Cost function → come up with a good weight vector
- Gradient descent
- Algorithm :
- For each class :
- Start with random values for theta
- Compute Cost() → if below the tolerance: break
- Cost uses the hypothesis (which uses the sigmoid function) on each element
- The sigmoid function outputs the probability p(y|x); the cost function checks how well it matches the labels
- Compute new values of theta with gradient descent (see the sketch after this list)
- → the resulting discriminant is a linear equation in the features
- Decision boundary: p(y1|x) = p(y2|x)
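A sketch of the gradient-descent loop described above (binary case with labels in {0, 1}; the learning rate, tolerance and variable names are assumptions, not prescribed values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, tol=1e-6, max_iter=10_000):
    """Fit theta by gradient descent on the logistic (cross-entropy) cost."""
    theta = 0.01 * np.random.randn(X.shape[1])        # start with random values for theta
    prev_cost = np.inf
    for _ in range(max_iter):
        h = sigmoid(X @ theta)                        # hypothesis: estimated p(y=1|x)
        cost = -np.mean(y * np.log(h + 1e-12) + (1 - y) * np.log(1 - h + 1e-12))
        if abs(prev_cost - cost) < tol:               # stop once the cost no longer improves
            break
        theta -= lr * X.T @ (h - y) / len(y)          # gradient-descent update of theta
        prev_cost = cost
    return theta
```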
Multiclass classifiers
- One versus one
- One versus the rest
SVM
- Points not linearly separable → penalty for missing a point
- Hinge loss (a small sketch follows below)
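A small sketch of the soft-margin objective with the hinge loss (labels assumed to be in {-1, +1}; the regularisation constant C is an illustrative parameter):

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """0.5*||w||^2 plus a linear penalty for every point on the wrong side of the margin."""
    margins = y * (X @ w + b)                # signed margins of all points
    hinge = np.maximum(0.0, 1.0 - margins)   # hinge loss: zero once the margin exceeds 1
    return 0.5 * (w @ w) + C * hinge.sum()
```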
Non-linear classifiers
Decision trees
- ID3
- Decide which attribute to split on
- For each value → new child node
- Split training examples
- If pure / acceptable → stop
- Best attribute ?
- Purity
- On each attribute
- Calculate entropy
- Information gain
- Penalize attributes with many children → gain ratio = information gain / split information,
where split information = -Σ_v (|S_v|/|S|) · log2(|S_v|/|S|)
(entropy and information gain are sketched after this list)
- Avoid overfitting : pruning
- Measure performance when removing the node and children from the tree
- Remove node resulting in greatest improvement
- Repeat until further pruning is harmful
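A sketch of the purity computations used to pick the best attribute to split on (plain Python lists; the helper names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over classes of p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the parent node minus the weighted entropy of the children after splitting."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):                                        # one child node per attribute value
        child = [l for val, l in zip(values, labels) if val == v]
        gain -= (len(child) / n) * entropy(child)
    return gain
```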
MLP
- Single class
- Activation function
- Linear
- Multiple class
- Non-linear
Multi-layer perceptron
- 1. Calculate the position of the data point relative to each of the decision boundaries
- 2. Combine the results of 1. to determine the position of the data point relative to both decision boundaries and determine the class (a feed-forward sketch follows this list)
- Passes
- Feed-forward
- Backpropagation
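A minimal feed-forward pass matching the two steps above (the weights W1, b1, W2, b2 are assumed to have been trained already, e.g. with backpropagation; tanh is just one possible non-linear activation):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One feed-forward pass through a single hidden layer."""
    h = np.tanh(W1 @ x + b1)       # 1. position of x relative to each hidden-unit decision boundary
    scores = W2 @ h + b2           # 2. combine those positions into one score per class
    return int(np.argmax(scores))  # predicted class
```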
Combining classifiers
- Steps
- Take the input pattern
- Run it through all base classifiers
- Feed all their outputs into a combiner
- Kullback-Leibler
- Majority voting (sketched below)
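A tiny majority-voting combiner (each base classifier is assumed to be a callable that maps a pattern to a label; this only illustrates the combining step):

```python
from collections import Counter

def combine_by_majority_vote(classifiers, x):
    """Run the input pattern through all base classifiers and let the combiner take a majority vote."""
    votes = [clf(x) for clf in classifiers]   # one label per base classifier
    return Counter(votes).most_common(1)[0][0]
```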

PCA
- Find eigenpairs of the covariance XXᵀ (X being zero-mean)
- Pivotal condensation
- Power iteration: repeat v ← Σv / ||Σv|| until v converges to the leading eigenvector
(a PCA sketch follows this list)
- Sensitive to large values → scaling
- Scree plot
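A PCA sketch via eigendecomposition of the covariance matrix (numpy; scaling is left to the caller, and the function name is illustrative):

```python
import numpy as np

def pca(X, n_components=2):
    """Project zero-mean data onto the leading eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)                 # make X zero-mean (scale features first if needed)
    cov = Xc.T @ Xc / len(Xc)               # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenpairs of the symmetric covariance matrix
    order = np.argsort(eigvals)[::-1]       # largest eigenvalues first (these feed the scree plot)
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, eigvals[order]  # projected data and the eigenvalue spectrum
```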
Clustering
- Proximity measure
- Partition
- K-means (a sketch follows this list)
- Hierarchical techniques
- Single linkage
- Average linkage
- Complete linkage
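A plain k-means sketch (Euclidean distance as proximity measure; the initialisation and iteration count are arbitrary choices for illustration):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # pick k points as initial centroids
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                          # partition: nearest centroid wins
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):              # stop when the partition is stable
            break
        centroids = new_centroids
    return labels, centroids
```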
Pros and cons

- QDA
  - Pros: more precise; fast classification
  - Cons: needs a lot of data; curse of dimensionality; Gaussian assumptions; training time
- LDA
  - Pros: linear decision boundary; fast classification; easy
  - Cons: Gaussian assumptions; training time
- NMC
  - Pros: needs less data
  - Cons: less accurate; doesn't take variance into account
- Parzen
  - Pros: small computation time
  - Cons: ???
- K-NN
  - Pros: simple / easy to implement; distance metrics; (almost) no assumptions
  - Cons: curse of dimensionality; imbalanced data cause problems; outlier sensitive; slow; memory; hard to find optimal k; missing data
- Naive Bayes
  - Cons: can't deal with correlated features
- Hierarchical clustering
  - Pros: dendrogram gives an overview of all possible clusterings
  - Cons: computationally expensive / limited to hierarchical nestings
When is scaling important?


- If classifier depends on distances
- If you don’t know anything beforehand
When and why are non-linear classifiers used?

Evaluation
Error estimation
- Test/training set division
- Bootstrapping
- Error = avg(errors)
- k-fold cross-validation (a sketch follows this list)
- Error = avg(errors over the k folds)
- LOO cross-validation
- Double cross validation
- Cross-validation within cross-validation ("cross-validation inception")
- Learning curves
- Feature curves
- Bias-variance dilemma
- Confusion matrices
- Rejection curves
- Outlier
- Ambiguity
- ROC curve
- Plot the errors of the two classes against each other (e.g. false positive rate vs true positive rate) while varying the decision threshold
- AUC
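A k-fold cross-validation sketch (the train_and_score callable, which should fit on the training fold and return the error on the held-out fold, is a hypothetical placeholder):

```python
import numpy as np

def k_fold_error(X, y, train_and_score, k=5, seed=0):
    """Average the test error over k folds, each fold held out exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)     # random split into k folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # train_and_score is a hypothetical user-supplied function: fit on train, return error on test
        errors.append(train_and_score(X[train], y[train], X[test], y[test]))
    return np.mean(errors)                                  # Error = avg(errors)
```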
- Evaluate a trained machine learning model
- Explain why a training and a test dataset are needed
- Explain what cross-validation and bootstrapping are
- Explain and compute learning curves
- Avoid overfitting by performing regularisation
- Explain reject curves and ROC curves
