This document discusses computational learning theory and frameworks for analyzing learning algorithms. It introduces the Probably Approximately Correct (PAC) learning model and the mistake bound model. The PAC model examines sample complexity, computational complexity, and the conditions under which a learning algorithm can output a hypothesis that has low error. The mistake bound model analyzes the number of mistakes a learning algorithm makes during training. Key concepts discussed include VC-dimension, sample complexity bounds for finite and infinite hypothesis spaces, and analyzing learning algorithms like the weighted majority algorithm within these frameworks.


Machine Learning: Lecture 8

Computational Learning Theory
(Based on Chapter 7 of Mitchell, T., Machine Learning, 1997)

1
Overview
• Are there general laws that govern learning?
• Sample Complexity: How many training examples are needed for a learner to converge (with high probability) to a successful hypothesis?
• Computational Complexity: How much computational effort is needed for a learner to converge (with high probability) to a successful hypothesis?
• Mistake Bound: How many training examples will the learner misclassify before converging to a successful hypothesis?
• These questions will be answered within two analytical frameworks:
  • The Probably Approximately Correct (PAC) framework
  • The Mistake Bound framework


2
Overview (Cont’d)
• Rather than answering these questions for individual learners, we will answer them for broad classes of learners. In particular, we will consider:
  • The size or complexity of the hypothesis space considered by the learner.
  • The accuracy to which the target concept must be approximated.
  • The probability that the learner will output a successful hypothesis.
  • The manner in which training examples are presented to the learner.
3
The PAC Learning Model
• Definition: Consider a concept class C defined over a set of instances X of length n and a learner L using hypothesis space H. C is PAC-learnable by L using H if for all c ∈ C, distributions D over X, ε such that 0 < ε < 1/2, and δ such that 0 < δ < 1/2, learner L will, with probability at least (1 - δ), output a hypothesis h ∈ H such that errorD(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c). (A sketch estimating errorD(h) follows below.)

4
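Here errorD(h) denotes the probability, over instances x drawn according to D, that h(x) disagrees with c(x). The short Python sketch below (my own illustration, not part of the lecture; the toy sampler, concept, and hypothesis are assumptions) estimates this error by sampling:

import random

def estimated_error(h, c, draw_instance, num_samples=10_000):
    """Monte Carlo estimate of error_D(h) = Pr_{x ~ D}[h(x) != c(x)]."""
    disagreements = 0
    for _ in range(num_samples):
        x = draw_instance()
        if h(x) != c(x):
            disagreements += 1
    return disagreements / num_samples

# Toy example: instances are 5-bit vectors drawn uniformly (this plays the role of D),
# the target concept c is "the first bit is 1", and the hypothesis h always predicts 1.
draw = lambda: [random.randint(0, 1) for _ in range(5)]
c = lambda x: x[0]
h = lambda x: 1
print(estimated_error(h, c, draw))   # close to 0.5: h errs whenever the first bit is 0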
Sample Complexity for Finite Hypothesis Spaces
• Given any consistent learner, the number of examples sufficient to assure that any hypothesis will be probably (with probability (1 - δ)) approximately (within error ε) correct is m ≥ (1/ε)(ln|H| + ln(1/δ)) (evaluated numerically in the sketch below).
• If the learner is not consistent, m ≥ (1/(2ε²))(ln|H| + ln(1/δ)).
• Conjunctions of Boolean literals are also PAC-learnable, and m ≥ (1/ε)(n·ln3 + ln(1/δ)).
• k-term DNF expressions are not PAC-learnable because, even though they have polynomial sample complexity, their computational complexity is not polynomial.
• Surprisingly, however, k-term CNF is PAC-learnable.
5
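As a quick illustration of the finite-hypothesis-space bounds above, here is a minimal Python sketch (my own; the function names and the example values of ε, δ, and n are assumptions, not from the lecture):

import math

def sample_size_consistent(h_size, eps, delta):
    """m >= (1/eps) * (ln|H| + ln(1/delta)) for a consistent learner."""
    return math.ceil((1.0 / eps) * (math.log(h_size) + math.log(1.0 / delta)))

def sample_size_agnostic(h_size, eps, delta):
    """m >= (1/(2*eps^2)) * (ln|H| + ln(1/delta)) when no consistent hypothesis is assumed."""
    return math.ceil((1.0 / (2 * eps ** 2)) * (math.log(h_size) + math.log(1.0 / delta)))

# Conjunctions of up to n Boolean literals: |H| = 3^n, so ln|H| = n * ln 3.
n, eps, delta = 10, 0.1, 0.05
print(sample_size_consistent(3 ** n, eps, delta))   # bound for Boolean conjunctions
print(sample_size_agnostic(3 ** n, eps, delta))     # weaker bound without consistency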
Sample Complexity for Infinite Hypothesis Spaces I: VC-Dimension
• The PAC learning framework has two disadvantages:
  • It can lead to weak bounds.
  • A sample complexity bound cannot be established for infinite hypothesis spaces.
• We introduce new ideas for dealing with these problems:
  • Definition: A set of instances S is shattered by hypothesis space H iff for every dichotomy of S there exists some hypothesis in H consistent with this dichotomy (see the sketch below).
  • Definition: The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H. If arbitrarily large finite subsets of X can be shattered by H, then VC(H) = ∞.
6
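To make shattering concrete, the following Python sketch (my own illustration, not from the lecture) checks whether a set of real-valued points is shattered by the class of closed intervals [a, b]; it shows that any 2 distinct points can be shattered but 3 points cannot, so this class has VC dimension 2.

from itertools import product

def interval_consistent(points, labels):
    """Is there an interval [a, b] that labels exactly the positive points as 1?"""
    positives = [x for x, y in zip(points, labels) if y == 1]
    if not positives:
        return True  # an interval placed away from all points handles the all-negative case
    lo, hi = min(positives), max(positives)
    # [lo, hi] is the tightest candidate; it must not capture any negatively labeled point.
    return all(not (lo <= x <= hi) for x, y in zip(points, labels) if y == 0)

def shattered_by_intervals(points):
    """True iff every dichotomy of the points is realized by some interval."""
    return all(interval_consistent(points, labels)
               for labels in product([0, 1], repeat=len(points)))

print(shattered_by_intervals([1.0, 2.0]))        # True: 2 points are shattered
print(shattered_by_intervals([1.0, 2.0, 3.0]))   # False: the dichotomy (1, 0, 1) fails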
Sample Complexity for Infinite Hypothesis Spaces II
• Upper bound on sample complexity, using the VC-dimension (evaluated in the sketch below):
  m ≥ (1/ε)(4·log2(2/δ) + 8·VC(H)·log2(13/ε))
• Lower bound on sample complexity, using the VC-dimension:
  Consider any concept class C such that VC(C) ≥ 2, any learner L, and any 0 < ε < 1/8 and 0 < δ < 1/100. Then there exists a distribution D and target concept in C such that if L observes fewer examples than
  max[(1/ε)·log(1/δ), (VC(C) - 1)/(32ε)],
  then with probability at least δ, L outputs a hypothesis h having errorD(h) > ε.
7
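For a sense of scale, this small Python sketch (my own; the example values of VC(H), ε, and δ are assumptions) evaluates the upper bound above:

import math

def vc_upper_bound(vc_dim, eps, delta):
    """m >= (1/eps) * (4*log2(2/delta) + 8*VC(H)*log2(13/eps))."""
    return math.ceil((1.0 / eps) * (4 * math.log2(2.0 / delta)
                                    + 8 * vc_dim * math.log2(13.0 / eps)))

# Example: linear separators in the plane have VC dimension 3.
print(vc_upper_bound(3, 0.1, 0.05))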
VC-Dimension for Neural Networks
• Let G be a layered directed acyclic graph with n input nodes and s ≥ 2 internal nodes, each having at most r inputs. Let C be a concept class over R^r of VC dimension d, corresponding to the set of functions that can be described by each of the s internal nodes. Let CG be the G-composition of C, corresponding to the set of functions that can be represented by G. Then VC(CG) ≤ 2ds·log(es), where e is the base of the natural logarithm.
• This theorem can help us bound the VC-dimension of a neural network and thus its sample complexity (see [Mitchell, p. 219] and the sketch below)!
8
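As a rough numeric illustration (my own, not from the slides): if each internal node is a linear threshold unit with r inputs, the class of functions it computes has VC dimension d = r + 1, and the theorem gives the bound computed below. The base of the logarithm is not stated on the slide; the sketch assumes the natural logarithm.

import math

def vc_bound_layered_network(d, s):
    """VC(CG) <= 2*d*s*log(e*s) for a layered DAG with s internal nodes, each drawn
    from a function class of VC dimension d (log taken as natural log here)."""
    return 2 * d * s * math.log(math.e * s)

r, s = 10, 5                  # 10 inputs per internal unit, 5 internal units
d = r + 1                     # VC dimension of linear threshold units over R^r
print(vc_bound_layered_network(d, s))   # upper bound on VC(CG)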
The Mistake Bound Model of Learning
• The Mistake Bound framework differs from the PAC framework in that it considers learners that receive a sequence of training examples and that predict, upon receiving each example, its target value.
• The question asked in this setting is: “How many mistakes will the learner make in its predictions before it learns the target concept?”
• This question is significant in practical settings where learning must be done while the system is in actual use.
9
Optimal Mistake Bounds
• Definition: Let C be an arbitrary nonempty concept class. The optimal mistake bound for C, denoted Opt(C), is the minimum over all possible learning algorithms A of M_A(C), where M_A(C) denotes the maximum number of mistakes algorithm A makes to learn any concept in C, over all possible training sequences:
  Opt(C) = min_{A ∈ learning algorithms} M_A(C)
• For any concept class C, the optimal mistake bound is bounded as follows:
  VC(C) ≤ Opt(C) ≤ log2(|C|)

10
A Case Study: The Weighted-Majority Algorithm
ai denotes the ith prediction algorithm in the pool A of algorithms, wi denotes the weight associated with ai, and β (0 ≤ β < 1) is a fixed factor by which the weight of an algorithm that errs is reduced. (A runnable version follows below.)
• For all i, initialize wi ← 1
• For each training example <x, c(x)>
  • Initialize q0 and q1 to 0
  • For each prediction algorithm ai
    • If ai(x) = 0 then q0 ← q0 + wi
    • If ai(x) = 1 then q1 ← q1 + wi
  • If q1 > q0 then predict c(x) = 1
  • If q0 > q1 then predict c(x) = 0
  • If q0 = q1 then predict 0 or 1 at random for c(x)
  • For each prediction algorithm ai in A do
    • If ai(x) ≠ c(x) then wi ← β·wi


11
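Below is a minimal runnable Python version of this procedure (my own sketch; the class name and the representation of predictors as callables are assumptions, not from the lecture). Calling predict and then update on each incoming example mirrors the online protocol assumed by the mistake bound model.

import random

class WeightedMajority:
    """Weighted-Majority over a pool of binary (0/1) predictors."""

    def __init__(self, predictors, beta=0.5):
        self.predictors = predictors           # list of callables: x -> 0 or 1
        self.weights = [1.0] * len(predictors)
        self.beta = beta                       # down-weighting factor, 0 <= beta < 1

    def predict(self, x):
        q0 = sum(w for a, w in zip(self.predictors, self.weights) if a(x) == 0)
        q1 = sum(w for a, w in zip(self.predictors, self.weights) if a(x) == 1)
        if q1 > q0:
            return 1
        if q0 > q1:
            return 0
        return random.randint(0, 1)            # break ties at random

    def update(self, x, label):
        # Multiply the weight of every predictor that got this example wrong by beta.
        for i, a in enumerate(self.predictors):
            if a(x) != label:
                self.weights[i] *= self.beta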
Relative Mistake Bound for the Weighted-Majority Algorithm
• Let D be any sequence of training examples, let A be any set of n prediction algorithms, and let k be the minimum number of mistakes made by any algorithm in A for the training sequence D. Then the number of mistakes over D made by the Weighted-Majority algorithm using β = 1/2 is at most 2.4(k + log2 n).
• This theorem can be generalized to any 0 ≤ β < 1, where the bound becomes
  (k·log2(1/β) + log2 n) / log2(2/(1 + β))
  (evaluated in the sketch below).
12
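As a sanity check (my own illustration, with arbitrary example values for k and n), the snippet below evaluates the general bound and shows that β = 1/2 gives roughly the 2.4(k + log2 n) bound quoted above:

import math

def wm_mistake_bound(k, n, beta):
    """(k*log2(1/beta) + log2(n)) / log2(2/(1+beta)): the relative mistake bound."""
    return (k * math.log2(1.0 / beta) + math.log2(n)) / math.log2(2.0 / (1.0 + beta))

k, n = 10, 16
print(wm_mistake_bound(k, n, 0.5))    # about 2.41 * (k + log2 n)
print(2.4 * (k + math.log2(n)))       # the specialized bound, for comparison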
