PAC-Bayesian Learning: an overview
Benjamin Guedj
Inria Lille - Nord Europe
https://bguedj.github.io
6 PAC: Making PAC Learning great again
1. Active
2. Sequential
3. Structure-aware
4. Efficient
5. Ideal
6. Safe
A mathematical theory of learning: towards AI
{Statistical,Machine} learning: devise automatic procedures to
infer general rules from data.
In the (rather not so?) long term: mimic the inductive functioning
of the human brain to develop an artificial intelligence.
Learning in a nutshell
Collect data $D_n = (X_i, Y_i)_{i=1}^n$, distributed as a random variable
$(X, Y) \in \mathcal{X} \times \mathcal{Y}$. Data may be incomplete (unsupervised setting,
missing inputs), collected sequentially / actively, etc.
Bayesian learning in a nutshell
Let F be a set of candidate functions equipped with a probability
measure π (the prior). Let f be the (known) density of the (assumed)
distribution of (X, Y), and define the posterior
$$\hat\rho(\mathrm{d}\phi) \propto \pi(\mathrm{d}\phi) \prod_{i=1}^{n} f(X_i, Y_i \mid \phi)$$
(prior × likelihood). Predict with, e.g.,
- the mean $\hat\phi = \mathbb{E}_{\hat\rho}\,\phi = \int_F \phi\,\hat\rho(\mathrm{d}\phi)$,
- a realization $\hat\phi \sim \hat\rho$,
- ...
Quasi-Bayesian learning in a nutshell
A.k.a. generalized Bayes.
Let F be a set of candidate functions equipped with a probability
measure π (the prior). Let λ > 0, and define the quasi-posterior
$$\hat\rho_\lambda(\mathrm{d}\phi) \propto \exp\left(-\lambda r_n(\phi)\right)\pi(\mathrm{d}\phi),$$
where $r_n$ is an empirical risk: no likelihood is required. Model-free learning!
- ...
Why quasi-Bayes?
The quasi-posterior minimizes a risk / complexity trade-off:
$$\hat\rho_\lambda \in \operatorname*{arg\,inf}_{\rho \ll \pi} \left\{ \int_F r_n(\phi)\,\rho(\mathrm{d}\phi) + \frac{\mathcal{K}(\rho, \pi)}{\lambda} \right\},$$
where $\mathcal{K}(\rho, \pi)$ denotes the Kullback–Leibler divergence between ρ and π.
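This variational characterization is classical (Donsker–Varadhan; see also Catoni, 2007). A short derivation sketch in LaTeX, stated under the assumption ρ ≪ π:

```latex
% For any \rho \ll \pi, with Z := \int_F e^{-\lambda r_n}\,\mathrm{d}\pi,
\int_F r_n\,\mathrm{d}\rho + \frac{\mathcal{K}(\rho,\pi)}{\lambda}
  = \frac{\mathcal{K}(\rho,\hat\rho_\lambda)}{\lambda} - \frac{\log Z}{\lambda},
\qquad
\hat\rho_\lambda(\mathrm{d}\phi) = \frac{e^{-\lambda r_n(\phi)}\,\pi(\mathrm{d}\phi)}{Z}.
% Since \mathcal{K}(\rho,\hat\rho_\lambda) \ge 0, with equality iff \rho = \hat\rho_\lambda,
% the Gibbs measure \hat\rho_\lambda is the unique minimizer.
```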
Statistical aggregation revisited
$$\hat\phi_\lambda := \mathbb{E}_{\hat\rho_\lambda}\,\phi = \int_F \phi\,\hat\rho_\lambda(\mathrm{d}\phi) = \frac{\int_F \phi \exp\left(-\lambda r_n(\phi)\right)\pi(\mathrm{d}\phi)}{\int_F \exp\left(-\lambda r_n(\phi)\right)\pi(\mathrm{d}\phi)} = \sum_{i=1}^{\#F} \underbrace{\frac{\exp(-\lambda r_n(\phi_i))\,\pi(\phi_i)}{\sum_{j=1}^{\#F} \exp(-\lambda r_n(\phi_j))\,\pi(\phi_j)}}_{\omega_{\lambda,i}} \phi_i, \quad \text{if } |F| < +\infty.$$
G. (2013). Agrégation d'estimateurs et de classificateurs : théorie et méthodes, Ph.D. thesis, Université Pierre et Marie Curie.
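In the finite case the weights $\omega_{\lambda,i}$ are a softmax of the empirical risks. A minimal numerical sketch (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def aggregate(predictions, empirical_risks, prior, lam):
    """Exponential-weights aggregation: returns sum_i omega_i * phi_i.

    predictions: (|F|, m) array, candidate predictions at m points
    empirical_risks: (|F|,) array, r_n(phi_i)
    prior: (|F|,) array, pi(phi_i), summing to 1
    lam: float, temperature lambda > 0
    """
    log_w = -lam * empirical_risks + np.log(prior)
    log_w -= log_w.max()          # shift for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                  # omega_{lambda, i}
    return w @ predictions        # hat{phi}_lambda

# Toy usage: three constant predictors of a noisy target around 0.3.
rng = np.random.default_rng(0)
y = 0.3 + 0.1 * rng.standard_normal(50)
candidates = np.array([[0.0], [0.3], [1.0]]) * np.ones((3, 50))
risks = ((candidates - y) ** 2).mean(axis=1)
phi_hat = aggregate(candidates, risks, np.ones(3) / 3, lam=20.0)
print(phi_hat[:3])                # close to the best candidate, 0.3
```

As λ grows the aggregate concentrates on the empirical risk minimizer; as λ → 0 it returns the prior mean.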
PAC learning in a nutshell
Probably Approximately Correct (PAC) oracle inequalities /
generalization bounds and empirical bounds: statements that hold
with probability at least 1 − δ over the draw of the sample.
Valiant (1984). A theory of the learnable, Communications of the ACM.
The PAC-Bayesian theory
...consists in producing PAC bounds for quasi-Bayesian learning
algorithms.
While PAC bounds focus on estimators θ̂n that are obtained as
functionals of the sample and for which the risk R is small, the
PAC-Bayesian approach studies an aggregation distribution ρ̂n that
depends on the sample, for which $\int R(\theta)\,\hat\rho_n(\mathrm{d}\theta)$ is small.
Shawe-Taylor and Williamson (1997). A PAC analysis of a Bayes estimator, COLT.
Audibert (2004). Une approche PAC-bayésienne de la théorie statistique de l'apprentissage, Ph.D. thesis, Université Paris 6.
Catoni (2007). PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, IMS Lecture Notes.
Dalalyan and Tsybakov (2008). Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Machine Learning.
A flexible and powerful framework (1/2)
Alquier and Wintenberger (2012). Model selection for weakly dependent time series forecasting, Bernoulli.
Seldin, Laviolette, Cesa-Bianchi, Shawe-Taylor and Auer (2012). PAC-Bayesian inequalities for martingales, IEEE Transactions on Information Theory.
Alquier and Biau (2013). Sparse Single-Index Model, Journal of Machine Learning Research.
G. and Alquier (2013). PAC-Bayesian Estimation and Prediction in Sparse Additive Models, Electronic Journal of Statistics.
Alquier and G. (2017). An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization, Mathematical Methods of Statistics.
Dziugaite and Roy (2017). Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, UAI.
Dziugaite and Roy (2018). Data-dependent PAC-Bayes priors via differential privacy, NIPS.
A flexible and powerful framework (2/2)
Rivasplata, Parrado-Hernandez, Shawe-Taylor, Sun and Szepesvari (2018). PAC-Bayes bounds for stable algorithms with instance-dependent priors, NIPS.
G. and Robbiano (2018). PAC-Bayesian High Dimensional Bipartite Ranking, Journal of Statistical Planning and Inference.
Li, G. and Loustau (2018). A Quasi-Bayesian perspective to Online Clustering, Electronic Journal of Statistics.
G. and Li (2018). Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly, arXiv preprint.
Bégin, Germain, Laviolette and Roy (2016). PAC-Bayesian bounds based on the Rényi divergence, AISTATS.
Alquier and G. (2018). Simpler PAC-Bayesian bounds for hostile data, Machine Learning.
Existing implementation: PAC-Bayes in the real world
- (Transdimensional) MCMC (see the sketch after this list)
  G. and Alquier (2013). PAC-Bayesian Estimation and Prediction in Sparse Additive Models, Electronic Journal of Statistics.
  Alquier and Biau (2013). Sparse Single-Index Model, Journal of Machine Learning Research.
  Li, G. and Loustau (2018). A Quasi-Bayesian perspective to Online Clustering, Electronic Journal of Statistics.
  G. and Robbiano (2018). PAC-Bayesian High Dimensional Bipartite Ranking, Journal of Statistical Planning and Inference.
- Stochastic optimization
  Alquier and G. (2017). An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization, Mathematical Methods of Statistics.
  G. and Li (2018). Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly, arXiv preprint.
- Variational Bayes
  Alquier, Ridgway and Chopin (2016). On the properties of variational approximations of Gibbs posteriors, Journal of Machine Learning Research.
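To make the MCMC route concrete, here is a minimal random-walk Metropolis sketch targeting a Gibbs quasi-posterior ρ̂λ ∝ exp(−λ rn) π. It is a toy illustration under assumptions of my choosing (Gaussian prior, squared loss); names are not from the cited papers.

```python
import numpy as np

def gibbs_mh(r_n, log_prior, lam, theta0, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis for rho_lambda(theta) ∝ exp(-lam*r_n(theta)) * pi(theta)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    log_target = lambda t: -lam * r_n(t) + log_prior(t)
    cur, samples = log_target(theta), []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        new = log_target(prop)
        if np.log(rng.uniform()) < new - cur:   # Metropolis accept/reject
            theta, cur = prop, new
        samples.append(theta.copy())
    return np.array(samples)

# Toy usage: linear regression with squared loss and a standard Gaussian prior.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
y = X @ np.array([1.0, -0.5]) + 0.1 * rng.standard_normal(100)
r_n = lambda t: np.mean((y - X @ t) ** 2)
log_prior = lambda t: -0.5 * t @ t
draws = gibbs_mh(r_n, log_prior, lam=50.0, theta0=np.zeros(2))
print(draws[2500:].mean(axis=0))   # aggregated (mean) predictor, near (1.0, -0.5)
```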
(intermediary) take-home message
A unified PAC-Bayesian framework
Alquier and G. (2018). Simpler PAC-Bayesian bounds for hostile data, Machine Learning.
Motivation: towards an agnostic learning theory
PAC-Bayesian bounds are a key justification in stat/ML for using
Bayesian-flavored learning algorithms in several settings:
high-dimensional bipartite ranking, non-negative matrix factorization, sequential learning of principal curves, online
clustering, single-index models, high-dimensional additive regression, domain adaptation, neural networks, ...
Context: PAC bounds for heavy-tailed random variables
The next big thing (≥ 2015):
- PAC bounds for the (penalized) ERM, without an exponential-moment assumption, via the small-ball property:
  Mendelson (2015). Learning without concentration, Journal of the ACM.
  Lecué and Mendelson (2016). Regularization and the small-ball method, The Annals of Statistics.
  Grünwald and Mehta (2016). Fast rates for general unbounded loss functions, arXiv preprint.
  Lugosi and Mendelson (2018). Risk minimization by median-of-means tournaments, Journal of the European Mathematical Society.
  Lugosi and Mendelson (2017). Regularization, sparse recovery, and median-of-means tournaments, arXiv preprint.
  Lecué and Lerasle (2017). Learning from MoM's principles, arXiv preprint.
- PAC(-Bayesian) bounds for non-i.i.d. / dependent data:
  Ralaivola, Szafranski and Stempfel (2010). Chromatic PAC-Bayes bounds for non-iid data: Applications to ranking and stationary β-mixing processes, Journal of Machine Learning Research.
  Seldin, Laviolette, Cesa-Bianchi, Shawe-Taylor and Auer (2012). PAC-Bayesian inequalities for martingales, IEEE Transactions on Information Theory.
  Alquier and Wintenberger (2012). Model selection for weakly dependent time series forecasting, Bernoulli.
  Agarwal and Duchi (2013). The generalization ability of online algorithms for dependent data, IEEE Transactions on Information Theory.
  Kuznetsov and Mohri (2014). Generalization bounds for time series prediction with non-stationary processes, ALT.
Disclaimer
The strategy I'm about to describe yields, at best, the same rates
as those existing in known settings. Its merit is generality: a single
argument covers heavy-tailed and dependent data.
Notation
Θ is a set of candidate parameters, equipped with a prior π. For θ ∈ Θ,
R(θ) denotes the (unknown) risk, $\ell_i(\theta)$ the loss on the i-th observation,
$r_n = \frac{1}{n}\sum_{i=1}^n \ell_i$ the empirical risk, and ρ a generic
(possibly data-dependent) distribution over Θ.
Key quantities
Definition
For any p ≥ 1, with $\phi_p : x \mapsto x^p$, let
$$\mathcal{M}_{\phi_p,n} = \int \mathbb{E}\left(|r_n(\theta) - R(\theta)|^p\right)\pi(\mathrm{d}\theta).$$
Definition
Let f be a convex function with f(1) = 0. Csiszár's f-divergence
between ρ and π is defined by
$$D_f(\rho, \pi) = \int f\left(\frac{\mathrm{d}\rho}{\mathrm{d}\pi}\right)\mathrm{d}\pi$$
when ρ ≪ π (and +∞ otherwise).
Theorem (Alquier and G., 2018)
Fix p > 1, $q = \frac{p}{p-1}$ and δ ∈ (0, 1). With probability at least 1 − δ
we have, for any distribution ρ,
$$\left|\int R\,\mathrm{d}\rho - \int r_n\,\mathrm{d}\rho\right| \le \left(\frac{\mathcal{M}_{\phi_q,n}}{\delta}\right)^{1/q}\left(D_{\phi_p - 1}(\rho, \pi) + 1\right)^{1/p}.$$
Proof
Let $\Delta_n(\theta) := |r_n(\theta) - R(\theta)|$. Then
$$\left|\int R\,\mathrm{d}\rho - \int r_n\,\mathrm{d}\rho\right| \le \int \Delta_n\,\mathrm{d}\rho = \int \Delta_n \frac{\mathrm{d}\rho}{\mathrm{d}\pi}\,\mathrm{d}\pi$$
$$\le \left(\int \Delta_n^q\,\mathrm{d}\pi\right)^{\frac{1}{q}} \left(\int \left(\frac{\mathrm{d}\rho}{\mathrm{d}\pi}\right)^p \mathrm{d}\pi\right)^{\frac{1}{p}} \quad \text{(Hölder ineq.)}$$
$$\le \left(\frac{\mathbb{E}\int \Delta_n^q\,\mathrm{d}\pi}{\delta}\right)^{\frac{1}{q}} \left(\int \left(\frac{\mathrm{d}\rho}{\mathrm{d}\pi}\right)^p \mathrm{d}\pi\right)^{\frac{1}{p}} \quad \text{(Markov, w.p. } 1 - \delta\text{)}$$
$$= \left(\frac{\mathcal{M}_{\phi_q,n}}{\delta}\right)^{\frac{1}{q}} \left(D_{\phi_p - 1}(\rho, \pi) + 1\right)^{\frac{1}{p}}.$$
Inspired by:
Bégin, Germain, Laviolette and Roy (2016). PAC-Bayesian bounds based on the Rényi divergence, AISTATS.
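The p = q = 2 instance (where $D_{\phi_2-1}$ is the χ² divergence) can be sanity-checked numerically. A toy sketch under assumptions of my choosing (finite Θ, i.i.d. Bernoulli data, quadratic loss); nothing here comes from the paper:

```python
import numpy as np

# Check: with prob. >= 1 - delta,
# |int R drho - int r_n drho| <= sqrt(M_{phi_2,n}/delta) * sqrt(chi2(rho,pi) + 1).
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 1.0, 21)                   # finite Theta: candidate means
pi = np.ones(21) / 21                               # uniform prior
n, delta, trials, p_true = 200, 0.1, 2000, 0.3
R = (theta - p_true) ** 2 + p_true * (1 - p_true)   # true risk E[(Y - theta)^2]

sims = rng.binomial(1, p_true, size=(trials, n))    # i.i.d. samples
r_n = ((sims[:, :, None] - theta) ** 2).mean(axis=1)        # (trials, 21)
M = ((r_n - R) ** 2).mean(axis=0) @ pi                      # M_{phi_2, n}

# A data-dependent rho: point mass on the empirical risk minimizer.
rho = np.zeros((trials, 21))
rho[np.arange(trials), r_n.argmin(axis=1)] = 1.0
chi2 = ((rho - pi) ** 2 / pi).sum(axis=1)                   # D_{phi_2 - 1}(rho, pi)
gap = np.abs(rho @ R - (rho * r_n).sum(axis=1))
bound = np.sqrt(M / delta * (chi2 + 1))
print("coverage:", (gap <= bound).mean(), ">= 1 - delta =", 1 - delta)
```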
We can compare $\int r_n\,\mathrm{d}\rho$ (observable) to $\int R\,\mathrm{d}\rho$ (unknown, the
objective) in terms of
- the moment $\mathcal{M}_{\phi_q,n}$ (which depends on the distribution of the data),
- and the divergence $D_{\phi_p - 1}(\rho, \pi)$ (which is a measure of the complexity of the set Θ).
Computing the divergence term (discrete case)
Computing the divergence term (continuous case)
Assume that there exists d > 0 such that for any γ > 0,
$$\pi\left\{\theta \in \Theta : r_n(\theta) \le \inf_{\theta' \in \Theta} r_n(\theta') + \gamma\right\} \ge \gamma^d.$$
Fix p > 1, $q = \frac{p}{p-1}$, δ ∈ (0, 1) and
$$\pi_\gamma(\mathrm{d}\theta) \propto \pi(\mathrm{d}\theta)\,\mathbf{1}\left[r_n(\theta) - r_n(\hat\theta_{\mathrm{ERM}}) \le \gamma\right].$$
Writing $A_\gamma$ for the event above, $\frac{\mathrm{d}\pi_\gamma}{\mathrm{d}\pi} = \frac{\mathbf{1}_{A_\gamma}}{\pi(A_\gamma)}$, so that
$D_{\phi_p - 1}(\pi_\gamma, \pi) + 1 = \pi(A_\gamma)^{1-p} \le \gamma^{-d(p-1)}$.
Bounding the moment $\mathcal{M}_{\phi_q,n}$: the i.i.d. case
Assume that
$$s^2 = \int \mathrm{Var}\left[\ell_1(\theta)\right] \pi(\mathrm{d}\theta) < +\infty;$$
then (for $1 < q \le 2$, by Jensen)
$$\mathcal{M}_{\phi_q,n} \le \left(\frac{s^2}{n}\right)^{q/2}.$$
So, with probability at least 1 − δ,
$$\int R\,\mathrm{d}\rho \le \int r_n\,\mathrm{d}\rho + \frac{\left(D_{\phi_p - 1}(\rho, \pi) + 1\right)^{1/p}}{\delta^{1/q}} \sqrt{\frac{s^2}{n}}.$$
This rate cannot be improved without further assumptions.
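For q = 2 the moment bound is a direct variance computation (i.i.d. sample, $r_n = \frac{1}{n}\sum_{i=1}^n \ell_i$, Fubini):

```latex
\mathcal{M}_{\phi_2,n}
  = \int \mathbb{E}\!\left[\left(r_n(\theta) - R(\theta)\right)^2\right]\pi(\mathrm{d}\theta)
  = \int \operatorname{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n}\ell_i(\theta)\right]\pi(\mathrm{d}\theta)
  = \frac{1}{n}\int \operatorname{Var}\left[\ell_1(\theta)\right]\pi(\mathrm{d}\theta)
  = \frac{s^2}{n}.
```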
Bounding the moment $\mathcal{M}_{\phi_q,n}$: the dependent case
Definition
The α-mixing coefficients between two σ-algebras $\mathcal{F}$ and $\mathcal{G}$ are
defined by
$$\alpha(\mathcal{F}, \mathcal{G}) = \sup_{A \in \mathcal{F},\, B \in \mathcal{G}} \left| P(A \cap B) - P(A)P(B) \right|.$$
Define
$$\alpha_j = \alpha\left[\sigma(X_0, Y_0), \sigma(X_j, Y_j)\right].$$
When the future of the series depends strongly on the past, the $\alpha_j$
remain constant or decay slowly. When the near future is almost
independent of the past, the $\alpha_j$ decay quickly to 0.
Bounding the moment $\mathcal{M}_{\phi_q,n}$: the dependent case
Bounded case: assume $0 \le \ell \le 1$ and $(X_i, Y_i)_{i \in \mathbb{Z}}$ is a stationary
process which satisfies $\sum_{j \in \mathbb{Z}} \alpha_j < \infty$. Then
$$\mathcal{M}_{\phi_2,n} \le \frac{1}{n} \sum_{j \in \mathbb{Z}} \alpha_j.$$
In the unbounded case (for suitable exponents r and s), one then gets
$$\mathcal{M}_{\phi_2,n} \le \frac{1}{n} \int \left\{\mathbb{E}\left[\ell_i^s(\theta)\right]\right\}^{2/s} \pi(\mathrm{d}\theta) \sum_{j \in \mathbb{Z}} \alpha_j^{1/r}.$$
Example
Consider autoregression with quadratic loss and linear predictors:
$X_i = (1, Y_{i-1}) \in \mathbb{R}^2$, $\Theta = \mathbb{R}^2$ and $f_\theta(\cdot) = \langle \theta, \cdot \rangle$. Let
$$\nu^2 = 32\, \mathbb{E}\left[Y_i^6\right]^{2/3} \left(\sum_{j \in \mathbb{Z}} \alpha_j^{1/3}\right) \int \left(1 + 4\|\theta\|^6\right) \pi(\mathrm{d}\theta).$$
PAC-Bayesian bounds to elicit new learning algorithms
Definition
We define $\bar{r}_n = \bar{r}_n(\delta, p)$ as
$$\bar{r}_n = \min\left\{ u \in \mathbb{R} : \int \left[u - r_n(\theta)\right]_+^q \pi(\mathrm{d}\theta) = \frac{\mathcal{M}_{\phi_q,n}}{\delta} \right\}.$$
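For a finite Θ, $\bar{r}_n$ can be computed by bisection, since $u \mapsto \int [u - r_n(\theta)]_+^q \pi(\mathrm{d}\theta)$ is continuous and nondecreasing. A minimal sketch (the finite-Θ setting and all names are my own choices):

```python
import numpy as np

def r_bar(risks, pi, M, delta, q, tol=1e-10):
    """Smallest u with sum_i [u - r_n(theta_i)]_+^q * pi_i = M / delta."""
    target = M / delta
    g = lambda u: ((np.clip(u - risks, 0.0, None) ** q) * pi).sum()
    lo = risks.min()                               # g(lo) = 0 < target
    hi = risks.max() + target ** (1.0 / q) + 1.0   # large enough: g(hi) >= target
    while hi - lo > tol:                           # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if g(mid) >= target else (mid, hi)
    return hi

# Toy usage: four candidates under a uniform prior.
risks = np.array([0.10, 0.12, 0.30, 0.45])
print(r_bar(risks, pi=np.full(4, 0.25), M=1e-3, delta=0.1, q=2.0))
```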
With probability at least 1 − δ,
$$\int R\,\mathrm{d}\hat\rho_n \le \bar{r}_n \le \inf_\rho \left\{ \int R\,\mathrm{d}\rho + 2\left(\frac{\mathcal{M}_{\phi_q,n}}{\delta}\right)^{1/q} \left(D_{\phi_p - 1}(\rho, \pi) + 1\right)^{1/p} \right\}.$$
Assume that there exists d > 0 such that for any γ > 0,
$$\pi\left\{\theta \in \Theta : r_n(\theta) \le \inf_{\theta' \in \Theta} r_n(\theta') + \gamma\right\} \ge \gamma^d.$$
As before, this margin assumption allows to control the divergence term explicitly.
Highlights
- 6 PAC
- 2 ANR-funded projects for the period 2019–2023:
  - APRIORI: representation learning and deep neural networks, with PAC-Bayes
  - BEAGLE (PI): agnostic learning, with PAC-Bayes
- H2020 European Commission project PERF-AI: machine learning algorithms (including PAC-Bayes) applied to aviation
We are hiring!