Introduction
Yen-Yu Lin, Professor
Computer Science, National Yang Ming Chiao Tung University
Pattern Recognition and Machine Learning (Christopher Bishop)
Applications of machine learning
• Computer vision
• Speech recognition
• Information retrieval
• Natural language processing
• Robotics
• Bioinformatics
• Data mining
• Finance
• …
Problem definition of a machine learning task
• Training data
➢ A set of N training examples {x1, x2, …, xN}, sometimes together with their target vectors {t1, t2, …, tN}
• Feature extraction
➢ The original input variables are usually transformed into a new space of variables in which the problem can be handled better
• Model learning
➢ A suitable model for the problem is learned from the training data
• Generalization or testing
➢ The learned model should correctly predict new examples (testing data) that differ from those used for training; a minimal end-to-end sketch follows
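Below is a minimal, illustrative NumPy sketch of these four stages. It is not from the slides: the feature map and the nearest-centroid "model" are arbitrary stand-ins chosen only to make the pipeline concrete.

```python
import numpy as np

def extract_features(images):
    # Hypothetical feature extraction: per-image mean intensity and variance.
    return np.stack([images.mean(axis=(1, 2)), images.var(axis=(1, 2))], axis=1)

rng = np.random.default_rng(0)
# Toy training data: 100 random 8x8 "images"; roughly half get +1 brightness.
train_images = rng.random((100, 8, 8)) + rng.integers(0, 2, 100)[:, None, None]
train_targets = (train_images.mean(axis=(1, 2)) > 1.0).astype(int)  # target vectors

X = extract_features(train_images)                      # feature extraction
# Model learning: one centroid per class (a nearest-centroid classifier).
centroids = np.stack([X[train_targets == c].mean(axis=0) for c in (0, 1)])

# Generalization / testing: classify unseen images by the nearest centroid.
test_images = rng.random((10, 8, 8)) + rng.integers(0, 2, 10)[:, None, None]
pred = np.argmin(
    np.linalg.norm(extract_features(test_images)[:, None] - centroids[None], axis=2),
    axis=1,
)
```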
Cat image classification: Training data
Cat image classification: Feature extraction
(Figure: cat training images with pose variations)
Cat image classification: Model learning
(Figure: a classifier is learned from the training data)
Cat image classification: Testing
Regression
(Figure: regression maps an input x to a real-valued output y)
Supervised vs. Unsupervised learning
Good vs. bad features for classification
(Figures: class scatter plots over axes Feature A and Feature B; one panel shows bad features, the other good features)
Good vs. bad features for regression
(Figures: Output Value plotted against Feature; one panel shows a bad feature, the other a good feature)
Supervised vs. Unsupervised learning
Unsupervised learning for clustering
(Figure: k-means clustering)
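A minimal k-means sketch in NumPy, for illustration only; the choice of k, the toy data, and the iteration cap are arbitrary, not from the slides.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated Gaussian blobs as toy data.
X = np.random.default_rng(1).normal(size=(300, 2)) + np.repeat(np.eye(2) * 4, 150, axis=0)
labels, centers = kmeans(X, k=2)
```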
Unsupervised learning for dimensionality reduction
(Figure: PCA, principal component analysis)
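A minimal PCA sketch (illustrative, not from the slides): project the centered data onto the top-d eigenvectors of the sample covariance matrix.

```python
import numpy as np

def pca(X, d):
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :d]              # top-d principal directions
    return Xc @ W                            # low-dimensional projection

X = np.random.default_rng(0).normal(size=(200, 10))
Z = pca(X, d=2)   # 10-D data reduced to 2-D
```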
Unsupervised learning for density estimation
(Figure: kernel density estimation, KDE)
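For reference, with a Gaussian kernel of bandwidth h, the density estimated from samples x1, …, xN takes the standard form (PRML §2.5.1):

$$\hat{p}(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{\sqrt{2\pi h^2}} \exp\!\left(-\frac{(x - x_n)^2}{2h^2}\right)$$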
Unsupervised learning for data generation
Generative Adversarial Networks (GAN): given a set of images, generate new images from the same distribution
Applications of data generation
• Face synthesis
Polynomial curve fitting: Error function
➢ The error function is differentiable with respect to the coefficients
➢ Its minimizer has a closed-form solution (see below)
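The error function meant here is presumably the sum-of-squares error of PRML §1.1 (the equation itself did not survive in the slide text). It is differentiable in w, and because the polynomial y(x, w) is linear in the coefficients, minimizing it reduces to solving a linear system:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2, \qquad y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j$$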
Polynomial curve fitting: Model selection
• Under-fitting: M = 0 or M = 1
➢ The constant or first-order polynomial gives a poor fit due to insufficient flexibility
• The third-order polynomial (M = 3) gives the best fit
• Over-fitting: M = 9
➢ All training points are fitted perfectly
➢ Poor representation of the underlying green curve
➢ Generalization is poor (see the sketch below)
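A minimal NumPy sketch (not from the slides) contrasting the three regimes: fit polynomials of degree M = 0, 3, 9 to noisy samples of sin(2πx), the green curve in PRML's running example. The sample size and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 10)  # noisy targets
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test)                             # true curve

for M in (0, 3, 9):
    w = np.polyfit(x_train, t_train, deg=M)   # least-squares polynomial fit
    rms_train = np.sqrt(np.mean((np.polyval(w, x_train) - t_train) ** 2))
    rms_test = np.sqrt(np.mean((np.polyval(w, x_test) - t_test) ** 2))
    print(f"M={M}: train RMS {rms_train:.3f}, test RMS {rms_test:.3f}")
# Typically: M=0 underfits (both errors high), M=3 fits well, and M=9
# interpolates the 10 training points exactly while the test error grows.
```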
Polynomial curve fitting: Generalization
Polynomial curve fitting: Data size vs. Over-fitting
(Figure: M = 9 fits for training sets of increasing size)
Polynomial curve fitting: Regularization
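Regularization discourages large coefficients by adding a penalty term to the error function; the standard regularized sum-of-squares error used at this point in PRML (§1.1, ridge regression) is:

$$\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2$$

where λ controls the trade-off between fitting the data and keeping the coefficients small, and thus controls over-fitting.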
Probability theory
A toy example
Probability theory: A two-variable case
• Notation
➢ Let the number of trials where 𝑋 = 𝑥𝑖 and 𝑌 = 𝑦𝑗 be 𝑛𝑖𝑗
➢ Let the number of trials where 𝑋 takes value 𝑥𝑖 be 𝑐𝑖
➢ Let the number of trials where 𝑌 takes value 𝑦𝑗 be 𝑟𝑗
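With N trials in total, these counts give the standard definitions (PRML §1.2):

$$p(X = x_i, Y = y_j) = \frac{n_{ij}}{N}, \qquad p(X = x_i) = \frac{c_i}{N}, \qquad p(Y = y_j \mid X = x_i) = \frac{n_{ij}}{c_i}$$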
Joint, marginal, and conditional probabilities
• The conditional probability is defined by $p(Y = y_j \mid X = x_i) = n_{ij} / c_i$, the fraction of trials with 𝑋 = 𝑥𝑖 in which 𝑌 = 𝑦𝑗
Bayes’ theorem
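Stated in the two-variable notation above (PRML §1.2):

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad p(X) = \sum_{Y} p(X \mid Y)\, p(Y)$$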
Probability with continuous variables
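For a continuous variable, probabilities are defined through a density (standard definitions, PRML §1.2.1):

$$p(x \in (a, b)) = \int_a^b p(x)\, dx, \qquad p(x) \ge 0, \qquad \int_{-\infty}^{\infty} p(x)\, dx = 1$$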
Sum rule and product rule
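In the same notation (PRML §1.2):

$$\text{sum rule: } \; p(X) = \sum_{Y} p(X, Y), \qquad \text{product rule: } \; p(X, Y) = p(Y \mid X)\, p(X)$$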
Expectations and covariances
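The standard definitions for a discrete variable (PRML §1.2.2; an integral replaces the sum in the continuous case):

$$\mathbb{E}[f] = \sum_{x} p(x) f(x), \qquad \operatorname{var}[f] = \mathbb{E}\big[(f(x) - \mathbb{E}[f])^2\big]$$

$$\operatorname{cov}[x, y] = \mathbb{E}_{x,y}\big[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\big] = \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\, \mathbb{E}[y]$$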
Gaussian distribution
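The univariate Gaussian density with mean 𝜇 and variance 𝜎² (PRML §1.2.4):

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$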
Mean and variance of a Gaussian distribution
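For this density the first two moments are (PRML §1.2.4):

$$\mathbb{E}[x] = \mu, \qquad \mathbb{E}[x^2] = \mu^2 + \sigma^2, \qquad \operatorname{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 = \sigma^2$$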
Multivariate Gaussian
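For a D-dimensional vector 𝐱 with mean vector 𝛍 and D×D covariance matrix 𝚺 (PRML §1.2.4):

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} \lvert \boldsymbol{\Sigma} \rvert^{1/2}} \exp\!\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$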
Bayes’ theorem for polynomial curve fitting
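Applied to curve fitting, Bayes' theorem combines a prior over the polynomial coefficients 𝐰 with the data likelihood (PRML §1.2.3); here D denotes the observed training data:

$$p(\mathbf{w} \mid D) \propto p(D \mid \mathbf{w})\, p(\mathbf{w})$$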
Determining Gaussian parameters by maximum likelihood
• Since the data 𝐱 = (x1, …, xN) are i.i.d., the likelihood function of the data given mean 𝜇 and variance 𝜎² is

$$p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2)$$
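Maximizing the log-likelihood with respect to 𝜇 and 𝜎² then gives the sample mean and the (biased) sample variance (PRML §1.2.4):

$$\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2$$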
Probabilistic perspective of polynomial curve fitting
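In this probabilistic view (PRML §1.2.5), the target t is the polynomial prediction plus zero-mean Gaussian noise with precision β (inverse variance):

$$p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\!\left(t \mid y(x, \mathbf{w}),\ \beta^{-1}\right)$$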
Maximum likelihood solution
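Maximizing this likelihood over 𝐰 is equivalent to minimizing the sum-of-squares error, and the noise precision is then set by the residuals (PRML §1.2.5):

$$\mathbf{w}_{\mathrm{ML}} = \arg\min_{\mathbf{w}} \frac{1}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2, \qquad \frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}_{\mathrm{ML}}) - t_n\}^2$$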
Maximum a posteriori (MAP) solution
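With a zero-mean isotropic Gaussian prior $p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$, maximizing the posterior is equivalent to minimizing the regularized sum-of-squares error (PRML §1.2.5); MAP thus recovers the regularization introduced earlier, with λ = α/β:

$$\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w}$$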
Bayesian curve fitting
Probabilistic polynomial curve fitting
$$p(t \mid x, D) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w}$$
Model selection
Model selection via cross-validation
• S-fold cross-validation
➢ Partition the training data into S equal-sized groups
➢ Train the model on S − 1 groups and evaluate it on the remaining group
➢ Repeat the procedure for all S possible held-out groups
➢ Average the performance over the S runs (see the sketch below)
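A minimal S-fold cross-validation sketch in NumPy (illustrative, not from the slides; the degree-M polynomial fit is an arbitrary stand-in for the model, and S = 5 is an arbitrary choice):

```python
import numpy as np

def s_fold_cv(x, t, S, M):
    idx = np.random.default_rng(0).permutation(len(x))
    folds = np.array_split(idx, S)                       # S roughly equal groups
    scores = []
    for i in range(S):
        val = folds[i]                                   # held-out group
        train = np.concatenate(folds[:i] + folds[i+1:])  # remaining S-1 groups
        w = np.polyfit(x[train], t[train], deg=M)        # train the model
        err = np.polyval(w, x[val]) - t[val]             # evaluate on held-out data
        scores.append(np.sqrt(np.mean(err ** 2)))        # RMS error for this run
    return np.mean(scores)                               # average performance

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 30)
for M in (1, 3, 9):
    print(f"M={M}: mean CV RMS {s_fold_cv(x, t, S=5, M=M):.3f}")
```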
Drawbacks of model selection via cross-validation
• Some drawbacks
➢ The number of training runs grows by a factor of S
➢ The number of hyperparameter value combinations to evaluate grows exponentially with the number of hyperparameters
Summary
• Probability density
➢ Expectation, variance, and covariance
➢ Gaussian distribution
• Bayes’ theorem
Thank You for Your Attention!