Machine Learning
• What is learning?
Definitions
• Webster
– To gain knowledge or understanding of, or skill in, by study, instruction, or experience; to memorize; to
acquire knowledge, skill, or a behavioral tendency; to discover or obtain knowledge of for the first
time
• Simon
– Any process by which a system improves its performance
• So far we have programmed knowledge into the agent (expert rules, probabilities, search
space representations), but an autonomous agent should acquire this knowledge on its own.
• Machine Learning will make this possible
A General Model of Learning Agents
• Learning Element
– Adds knowledge, makes improvement to system
• Performance Element
– Performs task, selects external actions
• Critic
– Monitors results of performance, provides feedback to learning element
• Problem Generator
– Actively suggests experiments, generates examples to test
• Performance Standard
– Method / standard of measuring performance
The Learning Problem
• Learning = Improving with experience at some task
– Improve over task T
– With respect to performance measure P
– Based on experience E
• Example: Learn to play checkers (Chinook)
– T: Play checkers
– P: % of games won in world tournament
– E: opportunity to play against self
• Example: Learn to Diagnose Patients
– T: Diagnose patients
– P: Percent of patients correctly diagnosed
– E: Pre-diagnosed medical histories of patients
Categories of Learning
• Learning by being told
• Learning by examples / Supervised learning
– Example: Syskill and Webert perform web page rating
• Learning by discovery / Unsupervised learning
• Learning by experimentation / Reinforcement learning
Learning From Examples
• Learn general concepts or categories from examples
• Learn a task (drive a vehicle, win a game of backgammon)
• Examples of objects or tasks are gathered and stored in a database
• Each example is described by a set of attributes or features
• Each example used for training is classified with its correct label (chair vs. not chair, horse vs. not horse, 1 vs. 2 vs. 3, good move vs. bad move, etc.)
• The machine learning program learns a general concept description from these specific examples
• The ML program then applies the learned concept to classify examples, or perform tasks, it has never seen before (see the sketch below)
Learning From Examples
• First algorithm: naïve Bayes classifier
• D is training data
– Each data point is described by attributes a1..an
• Learn mapping from data point to a class value
– Class values v1..vj
• We are searching through the space of
possible concepts
– Functions that map data to class value
Supervised Learning Algorithm – Naïve Bayes
P(h | D) = P(D | h) P(h) / P(D)
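A minimal sketch of a naive Bayes classifier built from this rule, assuming discrete attributes and ignoring probability smoothing; all function and variable names are illustrative.

    from collections import Counter, defaultdict

    def train_naive_bayes(data):
        # data: list of (attribute_tuple, class_value) pairs
        class_counts = Counter(v for _, v in data)
        value_counts = defaultdict(Counter)      # (attr_index, class) -> value counts
        for attrs, v in data:
            for i, a in enumerate(attrs):
                value_counts[(i, v)][a] += 1
        return class_counts, value_counts, len(data)

    def classify(attrs, class_counts, value_counts, n):
        # Choose the class v maximizing P(v) * product over i of P(a_i | v);
        # P(D) is the same for every class, so it can be dropped.
        def score(v):
            s = class_counts[v] / n
            for i, a in enumerate(attrs):
                s *= value_counts[(i, v)][a] / class_counts[v]
            return s
        return max(class_counts, key=score)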
Prediction Problems
• Software that customizes itself to the user
Inductive Learning Hypothesis
• Any hypothesis found to approximate the target function
well over a sufficiently large set of training examples will
also approximate the target function well over other
unobserved examples
Inductive Bias
• There can be a number of hypotheses consistent with
training data
• Each learning algorithm has an inductive bias that imposes a
preference on the space of all possible hypotheses
Decision Trees
• A decision tree takes a description of an object or situation as
input, and outputs a yes/no "decision".
• It can also be used to output a greater variety of answers.
• Here is a decision tree for the concept PlayTennis
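A minimal sketch of the standard PlayTennis tree (Mitchell's example) written as nested conditionals; the attribute values are the usual ones and are assumed here, since the slide's figure is not reproduced.

    def play_tennis(outlook, humidity, wind):
        # Root test: Outlook
        if outlook == "Sunny":
            return humidity == "Normal"   # High humidity -> No
        if outlook == "Overcast":
            return True                   # Always play
        if outlook == "Rain":
            return wind == "Weak"         # Strong wind -> No
        return False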
Decision Tree Representation
The information content of a node with p positive and n negative examples is
I(p/(p+n), n/(p+n)) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))
One solution to overfitting: prune the decision tree
How Do We Prune a Decision Tree?
• Delete a decision node
• This causes the entire subtree rooted at that node to be removed
• Replace the node with a leaf, and assign the leaf the majority-vote class
• Reduced error pruning: remove nodes as long as performance improves on a validation set (sketched below)
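A minimal sketch of reduced-error pruning under an assumed tree representation (a leaf is a class label; an internal node is a dict holding the attribute tested, its branches, and the majority label at that node); examples are dicts of attribute values plus a "label" key. All names are illustrative.

    def classify(node, example):
        while isinstance(node, dict):
            node = node["branches"].get(example[node["attr"]], node["majority"])
        return node

    def accuracy(root, validation):
        return sum(classify(root, ex) == ex["label"] for ex in validation) / len(validation)

    def reduced_error_prune(root, validation):
        # Bottom-up: tentatively replace each subtree with its majority-class leaf
        # and keep the replacement only if validation accuracy does not drop.
        def visit(node):
            if not isinstance(node, dict):
                return
            for value, child in list(node["branches"].items()):
                visit(child)
                if isinstance(child, dict):
                    before = accuracy(root, validation)
                    node["branches"][value] = child["majority"]   # prune tentatively
                    if accuracy(root, validation) < before:
                        node["branches"][value] = child           # undo the prune
        visit(root)
        return root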
Measure Performance of a Learning Algorithm
[Figure: a learned decision tree with root node Outlook; branches sunny, overcast, and windy lead to Yes/No leaves]
Performance Measures
• Percentage correctly classified, averaged over folds
• Confusion matrix
                    Predicted Negative   Predicted Positive
Actual Negative            TN                   FP
Actual Positive            FN                   TP
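A minimal sketch of computing the confusion matrix and the percentage correctly classified from binary predictions (0 = negative, 1 = positive); the names are illustrative.

    def confusion_matrix(actual, predicted):
        tn = fp = fn = tp = 0
        for a, p in zip(actual, predicted):
            if a == 0 and p == 0:
                tn += 1
            elif a == 0 and p == 1:
                fp += 1
            elif a == 1 and p == 0:
                fn += 1
            else:
                tp += 1
        return tn, fp, fn, tp

    def percent_correct(actual, predicted):
        tn, fp, fn, tp = confusion_matrix(actual, predicted)
        return 100.0 * (tn + tp) / (tn + fp + fn + tp)

    # In k-fold cross-validation this percentage would be averaged over the folds.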
y = 1 if Σ(i=1..n) w_i x_i > Threshold, 0 otherwise
• Transfer function
• Learning rate η; weight update: w_new = w_old + η (y_d − y) x_i
• Threshold update function
• Number of epochs
Learn Logical AND of x1 and x2
• Fire (output 1) when Σ_i w_i x_i > Threshold
• Training examples (x1, x2, y_d): (0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)
• Initially let w1 = 0, w2 = 0, T = 0, eta = 1
• Epoch 1, 2, ... until no errors remain in a full epoch (a sketch of the loop follows below)
CONVERGENCE!
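A minimal sketch of this training loop; the threshold is assumed to be adjusted by the analogous rule T_new = T_old − eta (y_d − y), which the slide does not spell out.

    # Perceptron learning of logical AND.
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w = [0.0, 0.0]          # w1, w2
    T = 0.0                 # threshold
    eta = 1.0               # learning rate

    for epoch in range(20):                  # cap on the number of epochs
        errors = 0
        for x, y_d in data:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > T else 0
            if y != y_d:
                errors += 1
                for i in range(len(w)):
                    w[i] += eta * (y_d - y) * x[i]   # w_new = w_old + eta (y_d - y) x_i
                T -= eta * (y_d - y)                 # assumed threshold update
        if errors == 0:                      # a full epoch with no errors: CONVERGENCE!
            break

    print(w, T)   # this run converges to w = [2, 1], T = 2

The learned weights correspond to the separating line 2x1 + x2 = 2 shown in the figure on the next slide.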
The AND Function
• Notice that the classes can be separated by a line (hyperplane)
[Figure: two scatter plots of Class 1 (+) and Class 2 (−) points in the (x1, x2) plane; in the AND example the learned line 2x1 + x2 = 2 separates the two classes]
Examples
• Perceptron Example
• Perceptron Example
Linearly Separable
• If the classes can be separated by a hyperplane,
then they are linearly separable.
• Linearly Separable ⇒ Learnable by a Perceptron
• Here is the XOR space: No line can separate these data points
into two classes – need two lines
[Figure: the four XOR points; (0,1) and (1,0) are positive, (0,0) and (1,1) are negative]
How Can We Learn These Functions?
Since the gradient specifies direction of steepest increase of error, the training rule for gradient
descent is to update each weight by the derivative of the error with respect to each weight, or
∂E/∂W_j = Err × ∂Err/∂W_j = −Err × g′(in) × a_j
where g′(in) is the derivative of the transfer function and a_j is the activation value at source node j.
We want to eliminate the error when we adjust the weights, so we multiply the formula by −1.
We want to constrain the adjustment, so we multiply the formula again by the learning rate η,
giving the update rule W_j ← W_j + η × Err × g′(in) × a_j.
Same general idea as before. If error is positive, then network output is too small
so weights are increased for positive inputs and decreased for negative inputs.
The opposite happens when the error is negative.
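A minimal sketch of this update for a single output unit, assuming a sigmoid transfer function g (the slides do not fix a particular g); the names are illustrative.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def gradient_descent_step(W, a, y, eta):
        # W_j <- W_j + eta * Err * g'(in) * a_j
        in_value = sum(wj * aj for wj, aj in zip(W, a))   # weighted sum of inputs
        out = sigmoid(in_value)                           # g(in)
        err = y - out                                     # Err
        g_prime = out * (1.0 - out)                       # g'(in) for the sigmoid
        return [wj + eta * err * g_prime * aj for wj, aj in zip(W, a)]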
Hidden-to-output Weights
• ... and the data in the new space is separable, as shown on the right.
• We can search for such mappings that leave maximal margins between the classes.
Reinforcement Learning
• Learn action selection for probabilistic applications
– Robot learning to dock on battery charger
– Learning to choose actions to optimize factory output
– Learning to play Backgammon
• Note several problem characteristics:
– Delayed reward
– Opportunity for active exploration
– Possibility that state only partially observable
– Possible need to learn multiple tasks with same
sensors/effectors
Reinforcement Learning
• Learning an optimal strategy for maximizing future reward
• Agent has little prior knowledge and no immediate
feedback
• Credit assignment to actions is difficult when reward arrives only in the future
• Two basic agent designs
– Agent learns utility function U(s) on states
• Used to select actions maximizing expected utility
• Requires a model of action outcomes (T(s,a,s'))
– Agent learns action-value function
• Gives expected utility Q(s,a) of action a in state s
• Q-learning learns Q(s,a) directly
• Requires no model of action outcomes, but cannot look ahead
Passive Learning in a Known Environment
• Here are utilities for our navigation problem with γ = 1 and R(s) = −0.04 for nonterminal states
Calculate Utility Values
• When selecting an action, the agent chooses the action that maximizes
the Expected Utility of the resulting state (see the sketch after the grid below)
[Figure: 4×3 grid world showing the learned utility value of each state, with terminal states +1 and −1]
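A minimal sketch of that selection rule, assuming a transition model T(s, a, s') and learned utilities U(s) are available; the names are illustrative.

    def expected_utility(state, action, T, U):
        # EU(a | s) = sum over s' of T(s, a, s') * U(s')
        return sum(p * U[s_next] for s_next, p in T(state, action).items())

    def best_action(state, actions, T, U):
        # Choose the action whose resulting state has the highest expected utility.
        return max(actions, key=lambda a: expected_utility(state, a, T, U))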
Learning an Action-Value Function: Q-Learning
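This final heading introduces Q-learning; a minimal sketch of the standard tabular update Q(s,a) ← Q(s,a) + α [r + γ max over a' of Q(s',a') − Q(s,a)], with illustrative names.

    from collections import defaultdict
    import random

    Q = defaultdict(float)   # Q[(state, action)] defaults to 0

    def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        # One Q-learning update after observing the transition (s, a, r, s').
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    def choose_action(s, actions, epsilon=0.1):
        # Epsilon-greedy exploration: mostly exploit, occasionally explore.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])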