AI Unit 5: Learning

1. Machine learning allows computer programs to improve their performance through experience without being explicitly programmed. It is concerned with the development of computer programs that are able to learn from data.
2. There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled examples to learn a function that maps inputs to outputs. Unsupervised learning finds hidden patterns in unlabeled data. Reinforcement learning learns from interactions by trial and error using feedback from its environment.
3. Inductive learning is a form of machine learning that involves generalizing from observed experiences to predict possibilities and make decisions. It allows a system to operate in unknown environments by learning patterns from examples to form general rules.
LEARNING

An agent is learning if it improves its performance on future tasks after making observations about the world.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Why would an agent learn?
1. The designers cannot anticipate all possible situations that the agent might find itself in.
2. The designers cannot anticipate all changes over time.
3. Sometimes human programmers have no idea how to program a solution themselves.
Machine Learning
Machine learning is concerned with computer programs that automatically improve their performance through experience.
It is the field of study that gives computers the ability to learn without being explicitly programmed.
Traditional programming vs. machine learning:
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Learning Agents
A learning agent can operate in an initially unknown environment.
It has 4 components:
1) Learning element.
2) Performance element.
3) Critic.
4) Problem generator.
Learning element: responsible for making improvements.
Performance element: responsible for selecting external actions. It is what we previously considered the entire agent: it takes in percepts and decides on actions.
Critic: the learning element uses feedback from the critic on how the agent is doing, and determines how the performance element should be modified to do better in the future.
Problem generator: responsible for suggesting actions that will lead to new and informative experiences.
LEARNING ELEMENT

The design of the learning element is affected by 4 major issues:
1) Which component is to be improved.
2) What prior knowledge the agent has.
3) What representation is used for the data and the component.
4) What feedback is available to learn from.
Forms of Learning:
1. Supervised Learning.
2. Unsupervised Learning.
3. Reinforcement Learning.
• Supervised Learning: learning from examples of inputs and their corresponding outputs.
Eg: a taxi-driver agent learning from the instructor and camera images.
• Unsupervised Learning: involves learning patterns in the input when no output values are supplied.
Eg: the taxi-driver agent gradually learning to distinguish good traffic from bad traffic.
• Reinforcement Learning: the agent learns from reinforcement (feedback), i.e. rewards and punishments.
Eg: a tip at the end of the journey is a reward for the taxi driver; no tip (or a complaint) is a punishment.
Machine learning algorithms
Every machine learning algorithm has three components:
Representation: how to represent knowledge. Examples include decision trees, sets of rules, instances.
Evaluation: the way to evaluate candidate programs (hypotheses). Examples include accuracy, precision and recall, squared error.
Optimization: the way candidate programs are generated, known as the search process. Examples include combinatorial optimization, convex optimization, constrained optimization.
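The evaluation measures named above can be computed directly. A minimal, illustrative sketch (not from the slides; function names are our own):

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    # Of the examples predicted positive, the fraction that really are.
    true_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in true_of_predicted) / len(true_of_predicted)

def recall(y_true, y_pred, positive=1):
    # Of the truly positive examples, the fraction that were found.
    pred_of_actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in pred_of_actual) / len(pred_of_actual)

def squared_error(y_true, y_pred):
    # Sum of squared differences, for regression-style outputs.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
```

Here 3 of 5 predictions match, so accuracy is 0.6; precision and recall are each 2/3.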
Supervised Learning
Supervised learning is the type of machine learning in which machines are trained using well-labeled training data and, on the basis of that data, predict the output.
 Regression algorithms are used if there is a relationship between the input variable and the output variable. They are used for the prediction of continuous variables, such as weather forecasting, market trends, etc.
 Classification algorithms are used when the output variable is categorical, i.e. it takes one of a set of classes such as Yes/No, Male/Female, True/False, etc.
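As a minimal regression sketch (illustrative, not from the slides): fitting a straight line y = a·x + b by the closed-form least-squares solution, then predicting a continuous value for a new input.

```python
def fit_line(xs, ys):
    # Ordinary least squares for a single input variable:
    # slope a = covariance(x, y) / variance(x); intercept b from the means.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # these points lie exactly on y = 2x + 1
a, b = fit_line(xs, ys)
prediction = a * 5 + b     # continuous prediction for a new input x = 5
```

Since the toy data lie exactly on a line, the fit recovers a = 2 and b = 1.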

Types of supervised machine learning
Unsupervised learning
 Unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things.
 Clustering: a method of grouping objects into clusters such that the objects with the most similarities remain in the same group.
 Association: a method used for finding relationships between variables in a large database.
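Clustering can be sketched in a few lines. A hedged, illustrative example (names and data are our own): grouping one-dimensional points into k = 2 clusters with a few iterations of the k-means idea — assign each point to the nearest centre, then move each centre to the mean of its group.

```python
def kmeans_1d(points, centres, iters=10):
    # Repeatedly (1) assign points to nearest centre, (2) recompute centres.
    for _ in range(iters):
        groups = {c: [] for c in centres}
        for p in points:
            nearest = min(centres, key=lambda c: abs(p - c))
            groups[nearest].append(p)
        centres = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centres)

points = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]   # two obvious groups
centres = kmeans_1d(points, [0.0, 5.0])
```

The points near 1 and the points near 10 end up in separate clusters, with centres at their group means.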

Types of unsupervised learning
Reinforcement learning
 Reinforcement learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing their results. For each good action the agent gets positive feedback, and for each bad action it gets negative feedback or a penalty.
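The reward/penalty idea can be shown in miniature. A hedged sketch (the environment and names are invented for illustration): an agent tries two actions, one always rewarded +1 and one penalised -1, keeps a running average value per action, and learns to prefer the better one.

```python
import random

def learn_action_values(trials=100, epsilon=0.1, seed=0):
    random.seed(seed)
    rewards = {"good": 1.0, "bad": -1.0}   # environment feedback (assumed)
    values = {"good": 0.0, "bad": 0.0}     # agent's estimated action values
    counts = {"good": 0, "bad": 0}
    for _ in range(trials):
        # Explore occasionally; otherwise exploit the best current estimate.
        if random.random() < epsilon:
            action = random.choice(["good", "bad"])
        else:
            action = max(values, key=values.get)
        counts[action] += 1
        # Incremental average of the observed rewards for this action.
        values[action] += (rewards[action] - values[action]) / counts[action]
    return values

values = learn_action_values()
```

After training, the estimated value of the rewarded action exceeds that of the penalised one — the trial-and-error feedback loop described above.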
Supervised learning
Given inputs, the learner must try to recover the unknown function, or something close to it.
Eg: pairs (x, f(x)), where x is an input and f(x) is the output of the function applied to x.
The learning algorithm returns a function h that approximates f.
h is called a hypothesis.
The hypothesis space H is the set of hypotheses the learner can produce.
Supervised learning [contd…]
The task of pure induction is: given a collection of examples, return a function h that approximates the true function f.
The function h is called a hypothesis.
To measure the accuracy of a hypothesis, we give it a test set of examples that is distinct from the training set.
We say a hypothesis generalizes well if it correctly predicts the outputs for these novel examples.
Hypothesis space
The hypothesis space is the set of all legal hypotheses. From this set the ML algorithm selects the single best hypothesis, the one that best describes the target function/outputs.
If a hypothesis agrees with all the data, it is called a consistent hypothesis.
But a problem arises when there are multiple consistent hypotheses: which one should be selected?
The answer is to select the simplest hypothesis that is consistent with the data.
This principle is called Ockham's razor: "all other things being equal, a shorter explanation of the given data should be favored over a lengthier explanation."
With respect to ML, the hypothesis that is least complex to deploy and easiest to interpret should be used.
Eg: given data points (x, f(x)), a straight line (a polynomial of degree 1) that exactly fits every point is a consistent hypothesis; so is a polynomial of degree n that passes through all the points.
We say that a learning problem is realizable if the hypothesis space contains the true function.
Ex: f(x) = ax + b + c sin(x)
Otherwise it is unrealizable. (We cannot always tell whether a learning problem is realizable, because the true function is not known.)
Inductive learning (concept learning)
 A field of machine learning known as inductive learning has been introduced to help in inducing general rules and predicting future activities.
 Inductive learning is learning from observation and earlier knowledge by generalization of rules and conclusions.
 Inductive learning allows for the identification of patterns in training data or earlier knowledge.
 The identified and extracted generalized rules are then used in reasoning and problem solving.
Inductive learning
Given examples of a function as pairs (X, F(X)), predict F(X) for new examples X.
A Framework for Studying Inductive Learning
Terminology used in machine learning:
Training example: a sample from X together with its output from the target function.
Target function: the mapping function f from x to f(x).
Hypothesis: an approximation of f; a candidate function.
Concept: a Boolean target function; positive and negative examples correspond to the 1/0 class values.
Classifier: the learning program outputs a classifier that can be used to classify new examples.
Learner: the process that creates the classifier.
Hypothesis space: the set of possible approximations of f that the algorithm can create.
Version space: the subset of the hypothesis space that is consistent with the observed data.
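The hypothesis space / version space distinction can be made concrete for a tiny Boolean concept. A hedged sketch (the concept and example set are invented for illustration): the hypothesis space is all 16 Boolean functions of two inputs; the version space is the subset consistent with the observed examples.

```python
from itertools import product

inputs = list(product([0, 1], repeat=2))            # (0,0),(0,1),(1,0),(1,1)
# Represent each hypothesis as a tuple of 4 output bits, one per input row.
hypothesis_space = list(product([0, 1], repeat=4))  # all 16 Boolean functions

# Observed training examples: the target behaves like AND on the rows seen.
observed = {(0, 0): 0, (1, 1): 1}

def consistent(h):
    # A hypothesis is consistent if it agrees with every observed example.
    return all(h[inputs.index(x)] == y for x, y in observed.items())

version_space = [h for h in hypothesis_space if consistent(h)]
```

Two of the four output bits are pinned down by the data, so 4 of the 16 hypotheses remain in the version space (including AND itself).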
WORKING…
1) First, determine the type of training dataset.
2) Collect/gather the labeled training data.
3) Split the data into a training set, a test set, and a validation set.
4) Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
5) Determine a suitable algorithm for the model, such as a support vector machine, decision tree, etc.
6) Execute the algorithm on the training dataset. Sometimes we need validation sets as control parameters; these are subsets of the training data.
7) Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
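The splitting step in the workflow above can be sketched as follows (a minimal illustration; the 60/20/20 proportions and names are assumptions, not from the slides):

```python
import random

def split_dataset(data, train=0.6, val=0.2, seed=42):
    # Shuffle before splitting so each subset is representative.
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return (data[:n_train],                 # training set
            data[n_train:n_train + n_val],  # validation set
            data[n_train + n_val:])         # test set

data = [(x, x % 2) for x in range(10)]      # toy (feature, label) pairs
train_set, val_set, test_set = split_dataset(data)
```

With 10 examples this yields 6 training, 2 validation, and 2 test examples, and together the three subsets contain every original example exactly once.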
Why inductive learning?
 It is an alternative method of knowledge acquisition: knowledge is learned, or induced, from examples.
 Human experts are capable of using their knowledge in their daily work, but they usually cannot summarize and generalize their knowledge explicitly in a form that is sufficiently systematic, correct and complete for machine representation and application.
 While it is very difficult for an expert to articulate his knowledge, it is relatively easy to document case studies of the expert's skills at work.
Learning Decision Trees
 Decision tree induction is one of the simplest and most successful forms of learning algorithm.
 A decision tree is a supervised learning technique and is very easy to implement.
Decision tree as a performance element:
 It takes as input an object or situation described by a set of attributes and returns a decision (the predicted output value for the input).
 Input and output attributes can be discrete or continuous.
 Learning a discrete-valued function is called classification learning; learning a continuous-valued function is called regression learning.
 We will concentrate on Boolean classification, where each example is classified as true (positive) or false (negative).
A decision tree reaches its decision by performing a sequence of tests. It is a tree-structured classifier, where:
Each internal node in the tree corresponds to a test of the value of one of the properties.
Branches from a node are labelled with the possible values of the test.
Each leaf node specifies the value to be returned if that leaf is reached.
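The tree structure just described can be represented directly as a nested dictionary. A hedged sketch (the particular tree is invented, loosely in the style of the restaurant example used later): an internal node names the attribute to test, branches are labelled with attribute values, and a leaf holds the value to return.

```python
tree = {
    "Patrons": {                       # internal node: test Patrons
        "None": "No",                  # leaf
        "Some": "Yes",                 # leaf
        "Full": {"Hungry": {"Yes": "Yes", "No": "No"}},  # subtree
    }
}

def classify(node, example):
    # Walk from the root, following the branch matching the example's
    # attribute value, until a leaf (a plain string) is reached.
    while isinstance(node, dict):
        attribute = next(iter(node))
        node = node[attribute][example[attribute]]
    return node

decision = classify(tree, {"Patrons": "Full", "Hungry": "Yes"})
```

Classifying {"Patrons": "Full", "Hungry": "Yes"} performs two tests and returns "Yes"; {"Patrons": "None"} reaches a leaf after one test.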
Exercise: write decision trees for the following Boolean functions.
Decision Trees
Let us consider a simple example of waiting for a table in a restaurant.
The aim here is to learn a definition for the goal predicate WillWait.
First we have to state what attributes are available to describe examples in the domain.
Let's decide on the following list of attributes (goal: WillWait):
• Alternate
• Bar
• Fri/Sat
• Hungry
• Patrons (Full, Some, None)
• Price ($, $$, $$$)
• Raining
• Reservation
• Type (Indian, Chinese, Italian)
• WaitEstimate (0-10, 10-30, 30-60, >60)
Goal: WillWait.
Expressiveness of decision trees
Hypothesis: any particular decision tree hypothesis for the goal predicate WillWait can be seen as an assertion of the form

∀s WillWait(s) ⇔ (P1(s) ∨ P2(s) ∨ … ∨ Pn(s))

where each condition Pi(s) is a conjunction of tests corresponding to a path from the root of the tree to a leaf with a positive outcome.
At first glance this looks like a first-order sentence, but in a sense it is propositional: it contains just one variable, and all the predicates are unary.
Expressiveness of decision trees…
• Decision trees are fully expressive within the class of propositional languages.
• Any Boolean function can be written as a decision tree by having each row in the truth table for the function correspond to a path in the tree.
• Some kinds of functions, however, require very large trees:
Parity function: returns 1 if and only if an even number of inputs are 1 (requires an exponentially large decision tree).
Majority function: returns 1 if more than half of its inputs are 1 (also difficult to represent with a compact decision tree).
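A short illustration of why parity is hard for decision trees: no subset of the inputs determines the answer, so the tree must distinguish every input combination, and its truth table (hence the tree) grows as 2^n in the number of inputs. A minimal sketch:

```python
from itertools import product

def parity(bits):
    # 1 if and only if an even number of inputs are 1.
    return 1 if sum(bits) % 2 == 0 else 0

n = 3
# Full truth table: one row (and, in the worst case, one tree path) per
# input combination -- 2^n rows.
table = {bits: parity(bits) for bits in product([0, 1], repeat=n)}
```

For n = 3 the table already has 8 rows; doubling n squares the table size.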
This complete set of examples is called the training set for the restaurant domain.
Here we are given 12 training examples, which we classify into true and false sets.
As discussed, we could use any attribute as the first test in the tree.
But if we take the Type attribute, it has 4 values and each value contains both true and false examples, so it is a poor attribute to test first.
Patrons, in contrast, is a fairly important attribute: for most of its values we can immediately say yes or no.
Generally, after the first test the tree splits up, and each outcome is a new decision tree problem in itself.
Testing good attributes first allows us to minimize the tree depth.
After the first attribute splits the samples, the remaining samples become decision tree problems themselves (subtrees), but with fewer samples and one less attribute.
This suggests a recursive approach to building decision trees.
Decision Tree Algorithm
Aim: find the smallest tree consistent with the training samples.
Idea: recursively choose the "most significant" attribute as the root of each (sub)tree.
Decision Tree Algorithm cont…
1) If there are some positive and some negative samples, choose the best attribute to split them, e.g., test Patrons at the root.
2) If the remaining samples are all positive or all negative, we have reached a leaf node. Assign the label positive (or negative).
3) If there are no samples left, it means that no such sample has been observed. Return a default value calculated from the majority classification at the node's parent.
4) If there are no attributes left but both positive and negative samples remain, these samples have exactly the same feature values but different classifications. This may happen because
 some of the data could be incorrect, or
 the attributes do not give enough information to describe the situation fully (i.e. we lack other useful attributes), or
 the problem is truly non-deterministic, i.e., given two samples describing exactly the same conditions, we may make different decisions.
Solution: call it a leaf node and assign the majority vote as the label.
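The four cases above can be sketched as a recursive learner in the style of ID3. A hedged sketch, not a definitive implementation: "best attribute" is chosen by information gain (defined later in these notes), and the helper names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def learn_tree(examples, attributes, parent_majority=None):
    # examples: list of (attribute_values_dict, label) pairs
    labels = [label for _, label in examples]
    if not examples:                      # case 3: no samples left
        return parent_majority
    if len(set(labels)) == 1:             # case 2: all one class -> leaf
        return labels[0]
    if not attributes:                    # case 4: attributes exhausted
        return majority(labels)           #         -> majority vote
    # case 1: split on the attribute with the largest information gain
    def gain(a):
        remainder = 0.0
        for v in set(x[a] for x, _ in examples):
            subset = [l for x, l in examples if x[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder
    best = max(attributes, key=gain)
    node = {best: {}}
    for v in set(x[best] for x, _ in examples):
        subset = [(x, l) for x, l in examples if x[best] == v]
        rest = [a for a in attributes if a != best]
        node[best][v] = learn_tree(subset, rest, majority(labels))
    return node

# Toy data: the target concept is A AND B.
examples = [({"A": "T", "B": "T"}, "yes"),
            ({"A": "T", "B": "F"}, "no"),
            ({"A": "F", "B": "T"}, "no"),
            ({"A": "F", "B": "F"}, "no")]
tree = learn_tree(examples, ["A", "B"])
```

On this toy data the learner recovers the AND concept: the A = "F" branch becomes a "no" leaf, and the A = "T" branch tests B.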
Which attribute should be tested at each node?
We want to build a small decision tree.
Information gain
◦ measures how well a given attribute separates the training examples according to their target classification
◦ it is the reduction in entropy achieved by the split
Entropy
◦ measures the (im)purity of an arbitrary collection of examples
Entropy
If there are only two classes:

Entropy(S) = -p+ log2 p+ - p- log2 p-

In general, for c classes:

Entropy(S) = - sum_{i=1}^{c} p_i log2 p_i
Information Gain
• The expected reduction in entropy achieved by splitting the training examples on attribute A:

Gain(S, A) = Entropy(S) - sum_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)
Find the entropy and information gain for the attributes A, B and C, and derive the decision tree for the given classification.

Instance | A | B | C | Classification
1        | T | T | F | YES
2        | T | T | F | YES
3        | T | F | F | NO
4        | F | F | F | YES
5        | F | T | T | NO
6        | F | T | T | NO
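A sketch of how the exercise above can be checked numerically (the code is ours, not from the slides): compute the information gain of each attribute over the six instances to see which should be the root of the tree.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

# The six instances from the table: (A, B, C, classification).
rows = [("T", "T", "F", "YES"), ("T", "T", "F", "YES"),
        ("T", "F", "F", "NO"),  ("F", "F", "F", "YES"),
        ("F", "T", "T", "NO"),  ("F", "T", "T", "NO")]

def gain(col):
    # Gain = root entropy minus the size-weighted entropy of each subset.
    labels = [r[3] for r in rows]
    remainder = 0.0
    for v in set(r[col] for r in rows):
        subset = [r[3] for r in rows if r[col] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {"A": gain(0), "B": gain(1), "C": gain(2)}
```

The root entropy is 1 bit (3 YES, 3 NO). B splits the data into subsets with the same YES/NO mix as the root, so its gain is 0; C separates the two C = T instances (both NO) perfectly and has the highest gain (about 0.459), so C should be tested at the root.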
Find the entropy and information gain for the attributes a1 and a2, and derive the decision tree.
Choosing the Best Attribute
We need a measure of "good" and "bad" for attributes.
One way to do this is to compute the information content at a node, i.e. at node R:

I(R) = - sum_{i=1}^{L} P(c_i) log2 P(c_i)

where {c_1, …, c_L} are the L class labels present at the node, and P(c_i) is the probability of getting class c_i at the node.
A unit of information is called a "bit".
Choosing the Best Attribute (cont.)
Example: at the root node of the restaurant problem, c1 = True, c2 = False, and there are 6 True samples and 6 False samples. Therefore:

P(c1) = (no. of samples in c1) / (total no. of samples) = 6 / (6 + 6) = 0.5
P(c2) = (no. of samples in c2) / (total no. of samples) = 6 / (6 + 6) = 0.5
I(Root) = -0.5 × log2 0.5 - 0.5 × log2 0.5 = 1 bit

In general, the amount of information is maximal when all classes are equally likely, and minimal when the node is homogeneous (all samples have the same label).
What are the maximum and minimum attainable values for I(R)?
Choosing the Best Attribute (cont.)
An attribute A divides the samples at a node into different subsets (child nodes) E_{A=v1}, …, E_{A=vM}, where A has M distinct values {v1, …, vM}.
Generally each subset E_{A=vi} will contain samples with different labels, so if we go along that branch we will need an additional I(E_{A=vi}) bits of information.
Overfitting
Growing each branch of the tree deeply enough to perfectly classify the training examples is not a good strategy:
◦ the resulting tree may overfit the training data.
Overfitting
◦ The tree can explain the training data very well but performs poorly on new data.
Alleviating the overfitting problem
Several approaches:
◦ Stop growing the tree earlier.
◦ Post-prune the tree.
How can we evaluate the classification performance of the tree on new data?
◦ The available data are separated into two sets of examples: a training set and a validation (development) set.
Validation (development) set
Use a portion of the original training data to estimate the generalization performance.

The original training set is split into a training set and a validation set; the test set is kept separate.


THANK YOU
