01 Introduction
Machine Learning
What is Machine Learning?
“Learning is any process by which a system
improves performance from experience.”
- Herbert Simon
Traditional programming: Data + Program → Computer → Output
Machine learning: Data + Output → Computer → Program
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Some more examples of tasks that are best solved by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]
Slide credit: Pedro Domingos
Samuel’s Checkers-Player
“Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.” - Arthur Samuel (1959)
Defining the Learning Task
Improve on task T, with respect to performance metric P, based on experience E
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
Autonomous Cars
Autonomous Car Technology
[Figure: Stanley, annotated with components including path planning and laser terrain mapping]
Deep Belief Net on Face Images
[Figure: learned feature hierarchy, from pixels to edges to object parts (combinations of edges) to object models]
Based on materials by Andrew Ng
Learning of Object Parts
Slide credit: Andrew Ng
Training on Multiple Objects
Slide credit: Andrew Ng
Scene Labeling via Deep Learning
[Farabet et al., ICML 2012; PAMI 2013]
Inference from Deep Learned Models
Generating posterior samples from faces by “filling in” experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.
[Figure: input images; samples from feedforward inference (control); samples from full posterior inference]
Machine Learning in Automatic Speech Recognition
[Figure: a typical speech recognition system]
Types of Learning
[Figure: an example prediction task: a quantity measured in km-based units (axis label truncated) plotted against Year, 1970–2020. Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)]
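Fitting a function to numeric targets like the plotted quantity is regression, the real-valued counterpart of the classification setting on the next slides. A minimal sketch on synthetic data (assumed for illustration; not the plotted dataset):

```python
import numpy as np

# Minimal regression sketch on synthetic year/value data
# (assumed for illustration; not the dataset plotted above).
rng = np.random.default_rng(0)
years = np.arange(1970, 2021)
values = 7.0 - 0.08 * (years - 1970) + rng.normal(0, 0.3, years.size)

# Least-squares fit of a line: value ≈ m * year + c
m, c = np.polyfit(years, values, deg=1)
pred_2025 = m * 2025 + c  # extrapolate the fitted trend to a future year
print(f"slope = {m:.3f} per year, prediction for 2025: {pred_2025:.2f}")
```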
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification
[Figure: breast cancer example, y = 1 (malignant) or 0 (benign) plotted against tumor size; a learned threshold splits the axis into “predict benign” and “predict malignant” regions]
Based on example by Andrew Ng
Supervised Learning
• x can be multi-dimensional
– Each dimension corresponds to an attribute, e.g.:
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
- …
[Figure: breast cancer data plotted by Age vs. Tumor Size]
Based on example by Andrew Ng
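To make the setup concrete, here is a minimal classification sketch: logistic regression over two attributes, trained by gradient descent. The attribute names echo the slide’s example, but the data and labels are entirely synthetic assumptions:

```python
import numpy as np

# Minimal classification sketch: logistic regression on two attributes
# (tumor size, age). Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (200, 2))                  # columns: tumor size, age
y = (X[:, 0] + 0.5 * X[:, 1] > 7).astype(float)   # made-up malignant(1)/benign(0) labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1 | x)
    grad = p - y                            # gradient of the log-loss
    w -= lr * (X.T @ grad) / len(X)
    b -= lr * grad.mean()

def f(x):
    """Learned classifier: maps an attribute vector to a categorical label."""
    return int(1.0 / (1.0 + np.exp(-(x @ w + b))) > 0.5)
```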
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
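A minimal sketch of one such clustering method, k-means (Lloyd’s algorithm), on made-up two-blob data; everything here is an assumed example:

```python
import numpy as np

# Minimal k-means clustering sketch (Lloyd's algorithm) on made-up data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),     # blob around (0, 0)
               rng.normal(5, 1, (50, 2))])    # blob around (5, 5)

k = 2
centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
for _ in range(20):
    # assign each point to its nearest center
    labels = ((X[:, None, :] - centers) ** 2).sum(axis=2).argmin(axis=1)
    # move each center to the mean of its assigned points
    centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
```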
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Figure: heat map of genes (rows) by individuals (columns)]
[Source: Daphne Koller]
Unsupervised Learning
Image credit: statsoft.com
Unsupervised Learning
• Independent component analysis – separate a combined signal into its original sources
Image credit: statsoft.com
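A minimal ICA sketch using scikit-learn’s FastICA: two synthetic sources are mixed by an assumed mixing matrix, then recovered from the combined signal (sources, noise level, and matrix are all illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Minimal ICA sketch: mix two synthetic sources, then try to recover them.
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t),                   # source 1: sinusoid
          np.sign(np.sin(3 * t))]          # source 2: square wave
S += 0.05 * rng.standard_normal(S.shape)   # small additive noise

A = np.array([[1.0, 0.5],                  # assumed mixing matrix
              [0.5, 1.0]])                 # (unknown in a real application)
X = S @ A.T                                # observed, combined signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)               # recovered sources, up to order/scale
```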
Reinforcement Learning
• Given a sequence of states and actions with (delayed) rewards, output a policy
– A policy is a mapping from states to actions that tells you what to do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand
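A minimal tabular Q-learning sketch on a toy corridor environment (entirely an assumed example), showing how a policy, a mapping from states to actions, emerges from delayed rewards:

```python
import numpy as np

# Minimal tabular Q-learning sketch on a toy 1-D corridor (assumed example):
# states 0..4, actions 0 = left, 1 = right; reward +1 for reaching state 4.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.2
rng = np.random.default_rng(0)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1      # next state, reward, episode done?

for _ in range(300):                       # episodes
    s = int(rng.integers(n_states - 1))    # random non-goal start state
    for _ in range(100):                   # cap episode length
        # epsilon-greedy: mostly exploit current Q, occasionally explore
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # credit assignment: back up the delayed reward through Q-values
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if done:
            break

policy = Q.argmax(axis=1)                  # learned policy: states -> actions
```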
The Agent-Environment Interface
... s_t --a_t--> r_{t+1}, s_{t+1} --a_{t+1}--> r_{t+2}, s_{t+2} --a_{t+2}--> r_{t+3}, s_{t+3} --a_{t+3}--> ...
Slide credit: Sutton & Barto
Reinforcement Learning
https://www.youtube.com/watch?v=4cgWya-wjgY
Inverse Reinforcement Learning
• Learn a policy from user demonstrations
Framing a Learning Problem
Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
– i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the
target function from the experience
[Diagram: Environment/Experience → Training data → Learner → Knowledge → Performance Element; the Performance Element is evaluated on Testing data]
Based on slide by Ray Mooney
Training vs. Test Distribution
• We generally assume that the training and test examples are independently drawn from the same overall distribution of data
– We call this “i.i.d.”, which stands for “independent and identically distributed”
Slide credit: Pedro Domingos
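The i.i.d. assumption is what justifies the usual practice of evaluating on a random held-out split. A minimal sketch on made-up data:

```python
import numpy as np

# Minimal sketch: a random train/test split, justified by the i.i.d. assumption
# (if examples are not i.i.d., e.g. time series, a random split can mislead).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # made-up features
y = rng.integers(0, 2, size=100)          # made-up labels

idx = rng.permutation(len(X))             # shuffle, then split 80/20
cut = int(0.8 * len(X))
X_train, y_train = X[idx[:cut]], y[idx[:cut]]
X_test, y_test = X[idx[cut:]], y[idx[cut:]]
```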
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden Markov Models (HMMs)
– Probabilistic Context-Free Grammars (PCFGs)
– Markov networks
Slide credit: Ray Mooney
Various Search/Optimization Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
Slide credit: Ray Mooney
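As a concrete instance of the first family, here is a minimal gradient-descent sketch fitting a line by squared error (synthetic data, assumed for illustration):

```python
import numpy as np

# Minimal gradient-descent sketch: fit y ≈ w*x + b by minimizing squared error.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, 100)   # synthetic data (true w=3, b=2)

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    err = (w * x + b) - y                 # residuals of the current fit
    w -= lr * 2.0 * np.mean(err * x)      # d/dw of mean squared error
    b -= lr * 2.0 * np.mean(err)          # d/db of mean squared error
```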
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.
Slide credit: Pedro Domingos
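A minimal sketch computing a few of these metrics by hand for made-up binary predictions:

```python
import numpy as np

# Minimal sketch of a few evaluation metrics on made-up binary predictions.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

accuracy = np.mean(y_pred == y_true)
precision = tp / (tp + fp)     # of predicted positives, fraction truly positive
recall = tp / (tp + fn)        # of true positives, fraction recovered
squared_error = np.mean((y_pred - y_true) ** 2)
```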
ML in Practice
• Understand domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
(the three steps above form a loop, repeated as needed)
• Consolidate and deploy discovered knowledge
Based on a slide by Pedro Domingos
Lessons Learned about Learning
• Learning can be viewed as using direct or indirect experience to approximate a chosen target function.
Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
– ???
Based on slide by Ray Mooney
What We’ll Cover in this Course
• Supervised learning
– Decision tree induction
– Linear regression
– Logistic regression
– Support vector machines & kernel methods
– Model ensembles
– Bayesian learning
– Neural networks & deep learning
– Learning theory
• Unsupervised learning
– Clustering
– Dimensionality reduction
• Reinforcement learning
– Temporal difference learning
– Q learning
• Evaluation
• Applications
Our focus will be on applying machine learning to real applications.