
Introduction to Machine Learning

Learning theory: generalization and VC dimension
Yifeng Tao
School of Computer Science
Carnegie Mellon University

Slides adapted from Eric Xing

Yifeng Tao Carnegie Mellon University 1


Outline
o Computational learning theories
  o PAC framework
  o Agnostic framework
o VC dimension

o These results cover both the case of finite |H| and the case of infinite |H| with finite VC(H), in both the PAC and agnostic frameworks.



Generalizability of Learning
o In machine learning it is really the generalization error that we care about, but most learning algorithms fit their models to the training set.
o Why should doing well on the training set tell us anything about generalization error? Specifically, can we relate error on the training set to generalization error?
oAre there conditions under which we can actually prove that learning
algorithms will work well?
oLecture 1:

[Slide from Eric Xing]



What General Laws Constrain Inductive Learning?
o Want a theory to relate:
  o Number of training examples: m
  o Complexity of hypothesis/concept space: H
  o Accuracy of approximation to the target concept: ε
  o Probability of successful learning: 1 − δ
o All the results are stated in O(…) form



Prototypical concept learning task
oBinary classification
o Everything we'll say here generalizes to other settings, including regression and multi-class classification problems.
oGiven:
o Instances X: Possible days, each described by the attributes Sky, AirTemp,
Humidity, Wind, Water, Forecast
o Target function c: EnjoySport: X → {0, 1}
o Hypothesis space H: conjunctions of literals, e.g.
o (?, Cold, High, ?, ?, ¬EnjoySport)
o Training examples S: iid positive and negative examples of the target
function
o (x1, c(x1)), ... (xm, c(xm))
oDetermine:
o A hypothesis h in H such that h(x) is "good" w.r.t c(x) for all x in S?
o A hypothesis h in H such that h(x) is "good" w.r.t c(x) for all x in the true
distribution D?
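To make the setting concrete, a conjunction of literals can be encoded as a per-attribute constraint, with '?' meaning "any value". This is a minimal sketch: the helper `make_conjunction` and its encoding are illustrative (it handles '?' and positive literals only), not part of the lecture.

```python
# A conjunction-of-literals hypothesis over the EnjoySport attributes.
# '?' matches any value; only positive literals and '?' are handled in
# this sketch. (The helper name and encoding are illustrative.)
def make_conjunction(constraints):
    def h(x):
        return int(all(v == '?' or x[a] == v for a, v in constraints.items()))
    return h

h = make_conjunction({'Sky': '?', 'AirTemp': 'Cold', 'Humidity': 'High',
                      'Wind': '?', 'Water': '?', 'Forecast': '?'})

x = {'Sky': 'Sunny', 'AirTemp': 'Cold', 'Humidity': 'High',
     'Wind': 'Strong', 'Water': 'Warm', 'Forecast': 'Same'}
print(h(x))  # 1: x satisfies every non-'?' constraint
```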



Sample Complexity
oHow many training examples m are sufficient to learn the target
concept?
oTraining scenarios:
o If learner proposes instances, as queries to teacher
o Learner proposes instance x, teacher provides c(x)
o If teacher (who knows c) provides training examples
o Teacher provides a sequence of m examples of the form (x, c(x))
o If some random process (e.g., nature) proposes instances
o Instance x generated randomly, teacher provides c(x)



Two Basic Competing Models



Protocol



True error of a hypothesis

o Definition: The true error (denoted ε_D(h)) of hypothesis h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to D.
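In symbols, the definition above reads:

```latex
\varepsilon_D(h) \;=\; \Pr_{x \sim D}\bigl[\, h(x) \neq c(x) \,\bigr]
```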



Two notions of error
o Training error (a.k.a. empirical risk or empirical error) of hypothesis h with respect to target concept c
  o How often h(x) ≠ c(x) over training instances from S

o True error (a.k.a. generalization error, test error) of hypothesis h with respect to c
  o How often h(x) ≠ c(x) over future instances drawn i.i.d. from D
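The two notions can be sketched in code. The toy concept c, hypothesis h, and uniform distribution below are made up for illustration:

```python
import random

random.seed(0)

# Toy target concept c and hypothesis h over instances x in {0, ..., 9};
# both are made up for illustration. Under the uniform distribution D,
# h errs exactly on x == 5, so its true error is 0.1.
c = lambda x: int(x >= 5)
h = lambda x: int(x >= 6)

def empirical_error(h, S):
    """Training error: fraction of pairs (x, y) in S that h misclassifies."""
    return sum(h(x) != y for x, y in S) / len(S)

S = [(x, c(x)) for x in (random.randrange(10) for _ in range(20))]
print(empirical_error(h, S))  # training error on this particular sample S
```

The training error depends on the particular sample S; the true error is a fixed property of h, c, and D.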



The Union Bound
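The standard statement, used repeatedly in the proofs that follow: for any events A_1, …, A_k (not necessarily independent),

```latex
\Pr\!\left[\bigcup_{i=1}^{k} A_i\right] \;\le\; \sum_{i=1}^{k} \Pr\left[A_i\right]
```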



Hoeffding inequality
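The standard form used in these bounds: for i.i.d. Bernoulli(φ) random variables Z_1, …, Z_m with empirical mean φ̂, and any γ > 0,

```latex
\Pr\bigl[\,|\phi - \hat{\phi}| > \gamma\,\bigr] \;\le\; 2\exp\!\left(-2\gamma^2 m\right)
```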



Version Space



Consistent Learner
o A learner is consistent if it outputs a hypothesis that perfectly fits the training data
  o This is a quite reasonable learning strategy
o Every consistent learner outputs a hypothesis belonging to the version space
o We want to know how such a hypothesis generalizes
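A brute-force sketch of the version space on a tiny made-up hypothesis class (not from the lecture): enumerate all conjunctions over two boolean attributes and keep those consistent with the training set.

```python
from itertools import product

# Hypothesis class: all conjunctions over two boolean attributes,
# encoded as (a, b) with values in {0, 1, '?'} ('?' = don't care).
def h_of(spec):
    a, b = spec
    return lambda x: int((a == '?' or x[0] == a) and (b == '?' or x[1] == b))

H = list(product([0, 1, '?'], repeat=2))

# Training data for the target c(x) = [x0 == 1 and x1 == 1]
S = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0)]

# Version space: hypotheses consistent with every training example
VS = [spec for spec in H if all(h_of(spec)(x) == y for x, y in S)]
print(VS)  # [(1, 1)] -- every consistent learner outputs an element of this list
```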



Probably Approximately Correct

o Double "hedging":
  o Approximately
  o Probably
o Need both!



Exhausting the version space

(figure: the version space VS_{H,D})



How many examples will ε-exhaust the VS



Proof

[Slide from Eric Xing and David Sontag]



What it means
o [Haussler, 1988]: the probability that the version space is not ε-exhausted after m training examples is at most |H|e^(−εm)

o Suppose we want this probability to be at most δ

o How many training examples suffice?

o If m ≥ (1/ε)(ln|H| + ln(1/δ)), then |H|e^(−εm) ≤ δ
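Requiring the Haussler bound |H|e^(−εm) to be at most δ and solving for m gives the standard sample-complexity result:

```latex
|H|\, e^{-\varepsilon m} \;\le\; \delta
\quad\Longleftrightarrow\quad
m \;\ge\; \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```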



Learning Conjunctions of Boolean Literals



PAC Learnability
o A concept class is PAC learnable if some algorithm
  o requires no more than polynomial computation per training example, and
  o uses no more than a polynomial number of samples
o Theorem: the class of conjunctions of Boolean literals is PAC learnable



How about EnjoySport?



PAC-Learning



Agnostic Learning



Empirical Risk Minimization Paradigm



The Case of Finite H
o H = {h1, ..., hk} consists of k hypotheses
o We would like to give guarantees on the generalization error of ĥ
o First, we will show that the empirical error ε̂(h) is a reliable estimate of ε(h) for all h
o Second, we will show that this implies an upper bound on the generalization error of ĥ
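For reference, the two steps combine into the standard finite-H guarantee (reconstructed here, since the slide bodies are not included): with ε̂ the training error over m i.i.d. examples,

```latex
\Pr\!\left[\forall h \in H:\ |\varepsilon(h) - \hat{\varepsilon}(h)|
\le \sqrt{\tfrac{1}{2m}\log\tfrac{2k}{\delta}}\right] \;\ge\; 1 - \delta,
\qquad\text{hence}\qquad
\varepsilon(\hat{h}) \;\le\; \min_{h \in H} \varepsilon(h)
+ 2\sqrt{\tfrac{1}{2m}\log\tfrac{2k}{\delta}}
```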



Misclassification Probability



Uniform Convergence





Sample Complexity



Generalization Error Bound



Agnostic framework



What if H is not finite?
oCan’t use our result for infinite H

oNeed some other measure of complexity for H


o Vapnik-Chervonenkis (VC) dimension!



How do we characterize “power”?
oDifferent machines have different amounts of “power”.
oTradeoff between:
o More power: Can model more complex classifiers but might overfit
o Less power: Not going to overfit, but restricted in what it can model

oHow do we characterize the amount of power?



Shattering a Set of Instances



Three Instances Shattered
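Shattering can be checked by brute force. A sketch with 1-D threshold classifiers (an illustrative class, not the slide's 2-D example): they can shatter any single point but no pair of points, so their VC dimension is 1.

```python
from itertools import product

def labelings(points, hyps):
    """Set of distinct label vectors the hypothesis class induces on points."""
    return {tuple(h(x) for x in points) for h in hyps}

def shattered(points, hyps):
    """True iff every one of the 2^n labelings of the points is achievable."""
    return len(labelings(points, hyps)) == 2 ** len(points)

# 1-D threshold classifiers h_t(x) = 1 if x >= t else 0
thresholds = [lambda x, t=t: int(x >= t) for t in (-10, -1, 0.5, 1.5, 10)]

print(shattered([0], thresholds))     # True: one point can get label 0 or 1
print(shattered([0, 1], thresholds))  # False: labeling (1, 0) is unachievable
```

Since a threshold assigns 1 to everything above t, the left point can never be labeled 1 while the right point is labeled 0, which is why no 2-point set is shattered.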



The Vapnik-Chervonenkis Dimension



VC dimension: examples

[Slide from Eric Xing and David Sontag]





The VC Dimension and the Number of Parameters
o The VC dimension thus gives concreteness to the notion of the capacity of a given hypothesis class H.
o Is it true that learning machines with many parameters have high VC dimension, while learning machines with few parameters have low VC dimension?

o No: there is an infinite-VC-dimension function family with just one parameter!



An infinite-VC function with just one parameter



Sample Complexity from VC Dimension
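The bound usually quoted in this setting (Blumer et al., 1989; stated here for reference since the slide body is not included): a consistent learner outputs a hypothesis with error at most ε, with probability at least 1 − δ, provided

```latex
m \;\ge\; \frac{1}{\varepsilon}\left(4\log_2\frac{2}{\delta}
+ 8\,\mathrm{VC}(H)\log_2\frac{13}{\varepsilon}\right)
```

Note that VC(H) has replaced ln|H| from the finite-H bound, so the result applies to infinite hypothesis classes.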



Consistency
o A learning process (model) is said to be consistent if the model error, measured on new data sampled from the same underlying distribution as the original sample, converges to the model error measured on the original sample as the original sample size increases.



Vapnik main theorem



Agnostic Learning: VC Bounds



Model convergence speed



How to control model generalization capacity
o Risk Expectation = Empirical Risk + Confidence Interval
o Minimizing the Empirical Risk alone will not always give good generalization capacity: one wants to minimize the sum of the Empirical Risk and the Confidence Interval
o What is important is not the numerical value of the Vapnik bound, which is most often too large to be of any practical use, but the fact that this bound is a non-decreasing function of the "richness" of the model family
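For binary classification with VC dimension d, the confidence-interval term has the classic Vapnik form (stated here for reference; η is the confidence parameter): with probability at least 1 − η,

```latex
R(h) \;\le\; R_{\mathrm{emp}}(h)
+ \sqrt{\frac{d\left(\ln\frac{2m}{d} + 1\right) - \ln\frac{\eta}{4}}{m}}
```

The second term grows with d, which is why restricting the "richness" of the model family tightens the bound.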



Structural Risk Minimization



SRM strategy



Putting SRM into action: linear models case
o There are many SRM-based strategies to build models.
o In the case of linear models
  y = wᵀx + b
o one wants to make ||w|| a controlled parameter: let us call H_C the family of linear models satisfying the constraint
  ||w|| < C
o Vapnik's theorem: when C decreases, the VC dimension d(H_C) decreases
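The hard constraint ||w|| < C corresponds, via Lagrangian duality, to adding a penalty λ||w||² to the empirical risk. A numpy sketch (the data and λ values are made up) showing that a larger penalty yields a smaller ||w||, i.e. a more restricted H_C:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    """Closed-form penalized least squares; larger lam means a tighter
    implicit budget on ||w|| (the intercept b is omitted in this sketch)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# As lam grows, ||w|| shrinks: the effective model family H_C gets smaller.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
print(norms)  # monotonically decreasing
```

This is the same capacity-control mechanism that motivates margin maximization in SVMs.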



Putting SRM into action: linear models case

(figure: a linear model, with residuals y_i − wᵀx_i − b and decision values wᵀx_i + b)



Take away message
o Sample complexity varies with the learning setting
  o Learner actively queries the trainer
  o Examples are provided at random
o Within the PAC learning setting, we can bound the probability that the learner will output a hypothesis with a given error
  o for ANY consistent learner (case where c ∈ H)
  o for ANY "best fit" hypothesis (agnostic learning, where perhaps c ∉ H)
o VC dimension as a measure of the complexity of H
o Quantitative bounds characterizing bias/variance in the choice of H



Take home message


[Slide from Matt Gormley]



References
o Eric Xing, Ziv Bar-Joseph. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701/
o Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
o David Sontag. Introduction to Machine Learning: https://people.csail.mit.edu/dsontag/courses/ml12/slides/lecture14.pdf

