AEPSHEP Lecture 1
Practical Statistics for Particle Physicists
Nicolas Berger (LAPP Annecy)
Statistics are everywhere
“There are three kinds of lies: lies, damned lies, and statistics.” – Benjamin Disraeli
And physics?
“If your experiment needs statistics, you ought to have done a better experiment.” – E. Rutherford
Introduction
Statistical methods play a critical role in many areas of physics.
[Figure: a “5σ” result]
Introduction
New Physics? 3.9σ? 2.1σ?
[Figure: high-mass X→γγ search, JHEP 09 (2016) 1]
Introduction
Precision measurements are another window into BSM effects
→ How to compute (and interpret) measurement intervals?
→ How to model systematic uncertainties?
→ How to get the smallest achievable uncertainties?
[Figure: decays, detector response, reconstruction. Image credit: S. Höche, SLAC-PUB-16160]
Measurement Errors: Energy measurement
Example: measuring the energy of a photon (γ) in a calorimeter.
→ Perfect case: the energy deposit is fully contained in the calorimeter readout.
→ Real life: part of the energy escapes → measure the leakage behind the calorimeter.
[Figure: calorimeter readout with the γ energy deposit, perfect case vs. real life]
Quantum Randomness: H→ZZ*→4ℓ
[Figure: H→ZZ*→4ℓ measurement, Phys. Rev. D 91, 012006]
http://www.phdcomics.com/comics/archive.php?comicid=1489
Probability Distributions
Probability distribution: { Pᵢ } for i = 0, 1, 2, …
Properties:
• Pᵢ ≥ 0
• Σᵢ Pᵢ = 1
Continuous Variables: PDFs
Continuous variable: can consider per-bin probabilities pᵢ, i = 1 … n_bins.
[Figure: contours of P(x, y)]
PDF Properties: Mean
E(X) = ⟨X⟩ : mean of X, the expected outcome on average over many measurements
⟨X⟩ = Σᵢ xᵢ Pᵢ   (discrete)   or   ⟨X⟩ = ∫ x P(x) dx   (continuous)
Sample mean: x̄ = (1/n) Σᵢ xᵢ
→ Property of the sample
→ Approximates the PDF mean
PDF Properties: (Co)variance
Variance of X: Var(X) = ⟨ (X − ⟨X⟩)² ⟩
→ Average squared deviation from the mean
→ RMS(X) = √Var(X) = σ_X, the standard deviation
Can be approximated by the sample variance: σ̂² = (1/(n−1)) Σᵢ (xᵢ − x̄)²
Covariance of X and Y: Cov(X, Y) = ⟨ (X − ⟨X⟩)(Y − ⟨Y⟩) ⟩
→ Large if variations of X and Y are “synchronized”
Correlation coefficient: ρ = Cov(X, Y) / √( Var(X) Var(Y) ),   −1 ≤ ρ ≤ 1
[Figure: scatter plots with Cov(x, y) > 0 and Cov(x, y) < 0]
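A minimal numpy sketch (toy data invented here for illustration, not from the lecture) showing how the sample quantities above estimate the PDF mean, variance, covariance and correlation:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Toy data with "synchronized" variations: y depends on x plus independent noise
x = rng.normal(loc=0.0, scale=1.0, size=10_000)
y = 0.8 * x + rng.normal(loc=0.0, scale=0.6, size=10_000)

x_bar = x.mean()                               # sample mean, approximates <X>
var_x = x.var(ddof=1)                          # sample variance, 1/(n-1) convention
cov_xy = np.cov(x, y, ddof=1)[0, 1]            # sample covariance Cov(X, Y)
rho = cov_xy / np.sqrt(var_x * y.var(ddof=1))  # correlation coefficient

print(f"mean = {x_bar:.3f}, var = {var_x:.3f}, cov = {cov_xy:.3f}, rho = {rho:.3f}")
print("numpy cross-check of rho:", np.corrcoef(x, y)[0, 1])
```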
“Linear” vs. “non-linear” correlations
For non-Gaussian cases, the correlation coefficient ρ is not the whole story:
[Figure: scatter plots with various linear and non-linear correlation patterns. Source: Wikipedia]
Gaussian PDF
Gaussian distribution:
G(x ; X₀, σ) = 1/(σ√(2π)) exp( −(x − X₀)² / (2σ²) )
→ Mean: X₀
→ Variance: σ² (⇒ RMS = σ)
Generalize to N dimensions:
G(x ; X₀, C) = 1/[(2π)ᴺ |C|]^(1/2) exp( −½ (x − X₀)ᵀ C⁻¹ (x − X₀) )
→ Mean: X₀
→ Covariance matrix:
C = [ Var(X₁)      Cov(X₁, X₂) ]   =   [ σ₁²     ρσ₁σ₂ ]
    [ Cov(X₂, X₁)  Var(X₂)     ]       [ ρσ₁σ₂   σ₂²   ]
Tilt angle of the error ellipse in the (x₁, x₂) plane: tan 2α = 2ρσ₁σ₂ / (σ₁² − σ₂²)
[Figure: 1-D Gaussian centred at X₀ with width σ; 2-D error ellipse with tilt angle α]
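As a hedged illustration (the parameter values below are made up, not the lecture's), scipy can evaluate and sample the 2-D Gaussian above from a covariance matrix built out of σ₁, σ₂ and ρ:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters: sigma1, sigma2 and correlation rho
sigma1, sigma2, rho = 1.0, 2.0, 0.5
X0 = np.array([0.0, 0.0])
C = np.array([[sigma1**2,             rho * sigma1 * sigma2],
              [rho * sigma1 * sigma2, sigma2**2            ]])

g = multivariate_normal(mean=X0, cov=C)
print("PDF value at X0:", g.pdf(X0))               # maximum of the 2-D Gaussian

sample = g.rvs(size=100_000, random_state=0)
print("sample covariance:\n", np.cov(sample, rowvar=False))   # ~ C

# Tilt angle of the error ellipse: tan(2 alpha) = 2 rho s1 s2 / (s1^2 - s2^2)
alpha = 0.5 * np.arctan2(2 * rho * sigma1 * sigma2, sigma1**2 - sigma2**2)
print("ellipse tilt angle [deg]:", np.degrees(alpha))
```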
Central Limit Theorem
For an observable X with any(*) distribution, one has
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ  ∼  G( ⟨X⟩, σ_X/√n )   as n → ∞
(*) assuming σ_X < ∞ and other regularity conditions
What this means:
• The average of many measurements is always Gaussian, whatever the distribution for a single measurement
• The mean of the Gaussian is the average of the single measurements
• The RMS of the Gaussian decreases as 1/√n: smaller fluctuations when averaging over many measurements
Another version: Σᵢ₌₁ⁿ xᵢ ∼ G( n⟨X⟩, √n σ_X )   as n → ∞
Central Limit Theorem in action
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ
[Figure: distribution of x̄ for increasing n]
→ Distribution becomes Gaussian, although very non-Gaussian originally
→ Distribution becomes narrower as expected (as 1/√n)
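A small numpy demonstration of this behaviour (the exponential single-measurement distribution is an arbitrary, deliberately non-Gaussian choice):

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Single measurements follow an exponential distribution (mean 1, RMS 1): very non-Gaussian.
# Average n of them, repeat for many pseudo-experiments, and look at the spread of x_bar.
for n in (1, 5, 50):
    xbar = rng.exponential(scale=1.0, size=(100_000, n)).mean(axis=1)
    print(f"n = {n:3d}: mean(x_bar) = {xbar.mean():.3f}, RMS(x_bar) = {xbar.std():.3f}, "
          f"expected sigma_X/sqrt(n) = {1.0/np.sqrt(n):.3f}")
# The RMS shrinks as 1/sqrt(n); a histogram of x_bar becomes increasingly Gaussian.
```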
Gaussian Quantiles
Consider z = (x − X₀)/σ, the “pull” of x.
Cumulative distribution: Φ(z) = ∫₋∞^z G(u ; 0, 1) du
Two-sided tail probabilities:
  Z    P(|x − X₀| > Zσ)
  1    0.317
  2    0.045
  3    0.003
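These quantiles are easy to reproduce with scipy (a quick sketch; the Z = 5 row is added here only as a reference for the “5σ” convention mentioned earlier):

```python
from scipy.stats import norm

# Two-sided tail probability P(|x - X0| > Z sigma)
for Z in (1, 2, 3, 5):
    p = 2 * norm.sf(Z)                  # sf(Z) = 1 - Phi(Z), the one-sided tail
    print(f"Z = {Z}: P(|x - X0| > Z sigma) = {p:.2e}")

# Inverse direction: which Z corresponds to a given two-sided tail probability?
print("Z for p = 0.05:", norm.isf(0.05 / 2))
```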
Chi-squared
χ² = Σᵢ₌₁ⁿ zᵢ² = Σᵢ₌₁ⁿ ( (xᵢ − μᵢ)/σᵢ )², the sum of n squared independent Gaussian pulls.
The χ² distribution depends on n (the number of degrees of freedom).
Rule of thumb: χ²/n should be ≲ 1.
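The rule of thumb follows from the fact that a χ² with n degrees of freedom has mean n; a quick scipy check (n = 20 chosen arbitrarily for illustration):

```python
from scipy.stats import chi2

n = 20                                        # illustrative number of degrees of freedom
print("mean of chi2_n:", chi2.mean(df=n))     # = n, hence chi2/n ~ 1 on average
print("RMS of chi2_n :", chi2.std(df=n))      # = sqrt(2n)
print("P(chi2 > 30)  :", chi2.sf(30, df=n))   # tail probability (p-value) for chi2 = 30
```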
Histogram Chi-squared
Histogram χ² with respect to a reference shape:
• Assume an independent Gaussian distribution in each bin
• Degrees of freedom = (number of bins) − (number of fit parameters)
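A minimal sketch of such a histogram χ² and its p-value (the bin contents, reference shape and number of fitted parameters below are invented for illustration):

```python
import numpy as np
from scipy.stats import chi2

n_obs = np.array([12., 18., 25., 30., 22., 15., 9., 5.])   # observed bin contents
y_ref = np.array([10., 20., 27., 28., 21., 14., 10., 6.])  # reference (fitted) shape
sigma = np.sqrt(y_ref)                                     # Gaussian approx. of the per-bin errors

chi2_val = np.sum(((n_obs - y_ref) / sigma) ** 2)
ndof = len(n_obs) - 2                                      # e.g. 2 parameters fitted to the reference
p_value = chi2.sf(chi2_val, df=ndof)
print(f"chi2 = {chi2_val:.1f} for {ndof} dof -> chi2/ndof = {chi2_val/ndof:.2f}, p = {p_value:.2f}")
```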
Statistical Modeling
Example 1: Z counting (Phys. Lett. B 759 (2016) 601)
σ_fid = (n_data − N_bkg) / (C_fid · L)
with n_data = 35000 ± 187, N_bkg = 175 ± 8, C_fid = 0.552 ± 0.006, L = (81 ± 2) pb⁻¹.
“Single-bin counting”: the only data input is n_data.
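A quick numerical sketch of the cross-section formula using the numbers quoted above, with naive uncorrelated error propagation (an assumption made here purely for illustration; the paper's treatment is more careful):

```python
import numpy as np

# Inputs quoted above: (value, uncertainty)
n_data, dn_data = 35000., 187.
n_bkg,  dn_bkg  = 175., 8.
c_fid,  dc_fid  = 0.552, 0.006
lumi,   dlumi   = 81., 2.             # integrated luminosity in pb^-1

sigma_fid = (n_data - n_bkg) / (c_fid * lumi)   # fiducial cross section in pb

# Relative uncertainties added in quadrature (assumes uncorrelated inputs)
rel = np.sqrt((dn_data**2 + dn_bkg**2) / (n_data - n_bkg)**2
              + (dc_fid / c_fid)**2 + (dlumi / lumi)**2)
print(f"sigma_fid = {sigma_fid:.0f} +/- {sigma_fid * rel:.0f} pb")
```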
Example 2: ttH→bb arXiv:2111.06712
Example 3: unbinned modeling (ATLAS-CONF-2017-045)
P(n ; S, B) = e^−(S+B) (S + B)ⁿ / n!
S : # of events from the signal process
B : # of events from the bkg. process(es)
Multiple counting bins
Count in bins of a variable ⇒ histogram n₁ … n_N (N : number of bins).
P({nᵢ} ; S, B) = ∏ᵢ₌₁ᴺ e^−(S f_S,i + B f_B,i) (S f_S,i + B f_B,i)^nᵢ / nᵢ!
→ Poisson distribution in each bin
→ f_S,i , f_B,i : per-bin fractions (= shapes) of signal and background
→ HEP: generally good modeling from simulation, although some uncertainties need to be accounted for.
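A hedged sketch of evaluating this binned Poisson probability in Python (the shapes, yields and observed counts are invented illustrative values):

```python
import numpy as np
from scipy.stats import poisson

f_S = np.array([0.05, 0.15, 0.40, 0.30, 0.10])   # per-bin signal shape, sums to 1
f_B = np.array([0.30, 0.25, 0.20, 0.15, 0.10])   # per-bin background shape, sums to 1

def log_prob(n, S, B):
    """log P({n_i}; S, B) = sum_i log Pois(n_i; S*f_S,i + B*f_B,i)."""
    mu = S * f_S + B * f_B
    return poisson.logpmf(n, mu).sum()

n_obs = np.array([32, 30, 55, 42, 15])           # illustrative observed histogram
print("log P at (S=50, B=100):", log_prob(n_obs, 50., 100.))
print("log P at (S=0,  B=150):", log_prob(n_obs, 0., 150.))
```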
What a PDF is for
Model describes the distribution of the observable: P(data ; parameters)
⇒ Possible outcomes of the experiment, for given parameter values
Can draw random events according to the PDF: generate pseudo-data
Example: generate from P(n ; λ = 5) → 2, 5, 3, 7, 4, 9, …
Each entry = a separate “experiment”
[Figure: generated pseudo-data, including an unbinned example]
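A one-line numpy version of this pseudo-data generation (λ = 5 as in the example above; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Ten pseudo-experiments drawn from P(n; lambda = 5): each entry is a separate "experiment"
pseudo_data = rng.poisson(lam=5, size=10)
print(pseudo_data)        # counts fluctuating around 5, e.g. [2 5 3 7 4 9 ...]
```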
What a PDF is also for: Likelihood
Model describes the distribution of the observable: P(data ; parameters)
⇒ Possible outcomes of the experiment, for given parameter values
We want the other direction: use data to get information on parameters, e.g. estimate P(λ = ?) from an observed count.
Maximum Likelihood Estimator (MLE) μ̂ : μ̂ = arg max L(μ)
[Figure: P(n ; S) at the observed value n = 5; L(S ; n = 5) is maximal at Ŝ = 5, and low at S = 0.5 and S = 20]
→ MLE: the value of μ for which this data was most likely to occur
→ The MLE is a function of the data, itself an observable
→ No guarantee it is the true value (the data may be “unlikely”), but a sensible estimate
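A minimal sketch of finding the MLE numerically with scipy (here for the single-bin Poisson example with n = 5 discussed in the backup slides; MINUIT would be the tool in ROOT-based practice):

```python
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

n_obs = 5                                           # observed count

def nll(S):
    """lambda(S) = -2 log L(S; n_obs) for a single Poisson bin."""
    return -2.0 * poisson.logpmf(n_obs, S)

res = minimize_scalar(nll, bounds=(1e-6, 50.0), method="bounded")
print("MLE S_hat =", res.x)                         # ~ 5: for this model S_hat = n
```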
Gaussian case
[Figure: data points with Gaussian uncertainties compared to the model prediction yᵢ(μ)]
−2 log Likelihood:
λ(μ) = −2 log L(μ) = Σᵢ₌₁^N_bins ( (nᵢ − yᵢ(μ)) / σᵢ )²
HEP practice:
● MINUIT (C++ library within ROOT, numerical gradient descent)
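Outside ROOT, the same minimization can be done with iminuit, the Python binding to MINUIT2. A rough sketch: the bin contents, background and signal shape below are invented, and yᵢ(μ) = μ·fᵢ + bᵢ is an assumed simple model, not the lecture's:

```python
import numpy as np
from iminuit import Minuit     # pip install iminuit

n_i   = np.array([28., 35., 52., 41., 30.])    # observed bin contents (illustrative)
b_i   = np.array([25., 30., 40., 35., 28.])    # expected background per bin
f_i   = np.array([0.1, 0.2, 0.4, 0.2, 0.1])    # signal shape per bin
sig_i = np.sqrt(n_i)                           # Gaussian per-bin uncertainties

def lam(mu):
    """-2 log L(mu) in the Gaussian case: a chi2 between data and y_i(mu) = mu*f_i + b_i."""
    y_i = mu * f_i + b_i
    return np.sum(((n_i - y_i) / sig_i) ** 2)

m = Minuit(lam, mu=0.0)
m.errordef = Minuit.LEAST_SQUARES              # errordef = 1 for a chi2-like cost function
m.migrad()                                     # numerical minimization, as MINUIT does in ROOT
print("mu_hat =", m.values["mu"], "+/-", m.errors["mu"])
```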
If you have a computer, please install anaconda before the start of the class.
This provides a consistent installation of python, JupyterLab, etc.
→ Alternatively, you can also install JupyterLab as a standalone package.
→ Another solution is to run the notebooks on the public jupyter servers at
mybinder.org. This will probably be slower but avoids a local install.
● Use the notebook links if you have a local install: save the notebook
locally and open it with your JupyterLab installation.
● Use the binder links to use public servers: the links will open the
notebooks in a remote server session in your browser.
Notebooks with solutions to the exercises will be posted after the lectures.
Please let me know in case of technical issues running the notebooks!
Extra Slides
Error Bars
Strictly speaking, the uncertainty is given by the model:
→ Bin central value ~ mean of the bin PDF
→ Bin uncertainty ~ RMS of the bin PDF
The data is just what it is, a simple observed point.
Rare Processes?
Why do we get Poisson distributions?
ATLAS:
• Event rate ~ 1 GHz
Unbinned extended likelihood (Gaussian signal peak, exponential background):
P({mᵢ}ᵢ₌₁…ₙ ; S, B) = e^−(S+B) (S+B)ⁿ / n! · ∏ᵢ₌₁ⁿ [ S/(S+B) G(mᵢ ; m_H, σ) + B/(S+B) α e^−α mᵢ ]
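A sketch of evaluating this extended unbinned log-likelihood in Python (the event masses and the values of m_H, σ and α below are illustrative placeholders, not the lecture's):

```python
import numpy as np

def log_L(masses, S, B, m_H=125.0, sigma=2.0, alpha=0.02):
    """log of  Pois(n; S+B) * prod_i [ S/(S+B) G(m_i; m_H, sigma) + B/(S+B) alpha exp(-alpha m_i) ]."""
    n = len(masses)
    gauss = np.exp(-0.5 * ((masses - m_H) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    expo = alpha * np.exp(-alpha * masses)
    log_pois = -(S + B) + n * np.log(S + B) - np.sum(np.log(np.arange(1, n + 1)))  # log n!
    return log_pois + np.sum(np.log((S * gauss + B * expo) / (S + B)))

masses = np.array([110.3, 124.8, 125.6, 131.2, 118.9])   # illustrative per-event masses
print("log L(S=2, B=3) =", log_L(masses, S=2.0, B=3.0))
```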
Poisson Example
Assume a Poisson distribution with B = 0: P(n ; S) = e^−S Sⁿ / n!
Say we observe n = 5 and want to infer information on the parameter S.
→ Try different values of S for a fixed data value n = 5
→ Varying parameter, fixed data: likelihood
L(S ; n=5) = e^−S S⁵ / 5!
[Figure: P(n ; S) curves read at the observed value n = 5: low likelihood at S = 0.5, high likelihood at S = 5, low likelihood again at S = 20]
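The three values read off the plot above can be reproduced directly (a quick scipy check):

```python
from scipy.stats import poisson

n_obs = 5
for S in (0.5, 5.0, 20.0):
    print(f"L(S = {S:4.1f}; n = 5) = {poisson.pmf(n_obs, S):.2e}")
# Low likelihood at S = 0.5 and S = 20, high likelihood at S = 5, as on the plot.
```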
MLEs in Shape Analyses
Binned shape analysis:
L(S ; nᵢ) = P(nᵢ ; S) = ∏ᵢ₌₁ᴺ Pois(nᵢ ; S fᵢ + Bᵢ)
Gaussian approximation:
λ_Gaus(S) = Σᵢ₌₁ᴺ −2 log G(nᵢ ; S fᵢ + Bᵢ, σᵢ) = Σᵢ₌₁ᴺ ( (nᵢ − (S fᵢ + Bᵢ)) / σᵢ )²   ← the χ² formula!
→ Gaussian MLE (min χ², i.e. min λ_Gaus) : best-fit value in a χ² (least-squares) fit
→ Poisson MLE (min λ_Pois) : best-fit value in a likelihood fit (in ROOT, fit option “L”)
In RooFit, λ_Pois ⇒ RooAbsPdf::fitTo(), λ_Gaus ⇒ RooAbsPdf::chi2FitTo().
(ATLAS-CONF-2017-045)
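For intuition, here is a small comparison of the two best-fit values outside ROOT/RooFit, using scipy (all shapes, backgrounds and counts are illustrative, and σᵢ is approximated by √nᵢ):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

f_i = np.array([0.1, 0.2, 0.4, 0.2, 0.1])       # signal shape per bin
B_i = np.array([20., 25., 30., 25., 20.])       # background prediction per bin
n_i = np.array([21, 29, 45, 28, 22])            # observed counts (illustrative)

lam_pois = lambda S: -2.0 * poisson.logpmf(n_i, S * f_i + B_i).sum()   # Poisson -2 log L
lam_gaus = lambda S: np.sum((n_i - (S * f_i + B_i)) ** 2 / n_i)        # chi2 with sigma_i ~ sqrt(n_i)

for name, lam in (("Poisson MLE  ", lam_pois), ("Gaussian chi2", lam_gaus)):
    res = minimize_scalar(lam, bounds=(0.0, 200.0), method="bounded")
    print(f"{name}: S_hat = {res.x:.1f}")
```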
MLE Properties
• Asymptotically Gaussian: P(μ̂) ∝ exp( −(μ̂ − μ*)² / (2σ_μ̂²) ) for n → ∞, and unbiased: ⟨μ̂⟩ = μ* for n → ∞
• Asymptotically efficient: σ_μ̂ is the lowest possible value (in the limit n → ∞) among consistent estimators
  → The MLE captures all the available information in the data
• Also consistent: μ̂ converges to the true value for large n, μ̂ → μ* as n → ∞
• Log-likelihood: can also minimize λ = −2 log L
Fisher Information:
I(μ) = ⟨ (∂ log L(μ)/∂μ)² ⟩ = − ⟨ ∂² log L(μ)/∂μ² ⟩
Measures the amount of information available in the measurement of μ.
Gaussian case:
• Gaussian likelihood: I(μ) = 1/σ_Gauss² → smaller σ_Gauss ⇒ more information
• For a Gaussian estimator μ̃: P(μ̃) ∝ exp( −(μ̃ − μ*)² / (2σ_μ̃²) ); for the MLE, Var(μ̂) = σ_μ̂²
Cramér-Rao bound: Var(μ̃) ≥ 1/I(μ) for any estimator μ̃; in the Gaussian case, Var(μ̃) ≥ σ_Gauss².
Efficient estimators reach the bound: e.g. the MLE in the large dataset limit.
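A toy-MC check of the Cramér-Rao bound for the single-bin Poisson model, where the MLE is Ŝ = n and the Fisher information works out to I(S) = 1/S (the value S = 25 and the number of toys are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(seed=4)
S_true = 25.0

# For P(n; S) = Pois(n; S): d(log L)/dS = n/S - 1, so I(S) = <(n/S - 1)^2> = Var(n)/S^2 = 1/S.
# The MLE is S_hat = n, and its variance should saturate the bound 1/I(S) = S.
S_hat = rng.poisson(lam=S_true, size=200_000).astype(float)   # one toy experiment per entry
print("Var(S_hat) from toys   :", S_hat.var())
print("Cramer-Rao bound 1/I(S):", S_true)
```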
Some Examples
• High-mass X→γγ Search: JHEP 09 (2016) 1 (3.9σ)
• Higgs Discovery: Phys. Lett. B 716 (2012) 1-29