Probability and Statistics
Chapter 4
Apr. 2020
The behavior of a random variable is described by its probability
density function, which provides exact information about its mean
value and variability.
A finite number of measurements of a variable is used to estimate
the behavior of an entire population of values.
Histogram of the variable
Plot the data of Table 4.1:
The data set consists of N individual measurements, $x_i$, i = 1, 2, . . . , N,
each taken at random but under identical test operating conditions.
The measured values of this variable are plotted as data
points along a single axis.
Number of histogram intervals: $K = 1.87\,(N - 1)^{0.40} + 1$
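The interval estimate can be tried out with a short Python sketch. The sample values below are made up for illustration (they are not the data of Table 4.1), and rounding K to the nearest integer and using equal-width intervals between the smallest and largest value are assumptions, not something the slide specifies.

```python
def number_of_intervals(n_points: int) -> int:
    """Estimate K = 1.87 (N - 1)^0.40 + 1, rounded to the nearest integer."""
    return round(1.87 * (n_points - 1) ** 0.40 + 1)


def histogram(data, k):
    """Count how many values fall in each of k equal-width intervals.

    Assumes the data are not all identical (the interval width must be > 0).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # Clamp the index so the maximum value lands in the last interval.
        j = min(int((x - lo) / width), k - 1)
        counts[j] += 1
    return counts


# Hypothetical sample of N = 20 readings.
sample = [9.6, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 10.3, 10.0, 9.8,
          10.1, 9.9, 10.5, 10.0, 9.5, 10.2, 9.9, 10.1, 10.0, 9.8]

K = number_of_intervals(len(sample))
counts = histogram(sample, K)
print(K, counts)
```

Dividing each count by N gives the frequency of occurrence for that interval.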
Standard Statistical Distributions
Table columns: Distribution, Applications, Density Function, Shape
Distributions listed: Normal, Lognormal, Rectangular, Triangular, Poisson, Binomial
Histogram of the variable
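As N → ∞, the probability density function p(x) of the population
of variable x is developed. In the limit as δx → 0:
$p(x) = \lim_{\delta x \to 0} \frac{n_j}{N\,(2\,\delta x)}$
where $n_j$ is the number of the N measured values that fall within the interval x ± δx.
True mean value for a continuous variable: $x' = \int_{-\infty}^{\infty} x\,p(x)\,dx$
True mean value for discrete data: $x' = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i$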
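The width of the density function reflects the data variation.
For a continuous random variable, the true variance is
$\sigma^2 = \lim_{T \to \infty} \frac{1}{T} \int_0^T [x(t) - x']^2\,dt$
which is equivalent to
$\sigma^2 = \int_{-\infty}^{\infty} (x - x')^2\,p(x)\,dx$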
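As a rough numerical check of the idea that the mean and variance follow from p(x), the sketch below integrates a density with the trapezoidal rule. The normal density, the parameter values (mean 5.0, standard deviation 2.0), and the finite integration limits of ±10 standard deviations are all assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sigma):
    """Normal probability density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=20000):
    """Trapezoidal rule on [a, b] using n equal panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Hypothetical population parameters.
TRUE_MEAN, SIGMA = 5.0, 2.0
a, b = TRUE_MEAN - 10.0 * SIGMA, TRUE_MEAN + 10.0 * SIGMA


def p(x):
    return normal_pdf(x, TRUE_MEAN, SIGMA)


# Mean: integral of x p(x); variance: integral of (x - mean)^2 p(x).
mean = integrate(lambda x: x * p(x), a, b)
variance = integrate(lambda x: (x - mean) ** 2 * p(x), a, b)
print(mean, variance)
```

The recovered values should agree closely with the parameters used to build the density, since the tails beyond the integration limits contribute almost nothing.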
4.3 DESCRIBING THE BEHAVIOR OF A POPULATION
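Normal (or Gaussian) distribution:
$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - x')^2}{2\sigma^2}\right]$
Given p(x), how can we predict the probability that any future measurement will fall within some stated interval?
The probability that x will assume a value within the interval x' ± δx is given by the area
under p(x), which is found by integrating over the interval:
$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} p(x)\,dx$
Transformations: $\beta = (x - x')/\sigma$ and $z_1 = (x_1 - x')/\sigma$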
Table 4.3
The probability that the ith measured value of x will have a value between x' ± δx,
where δx = $z_1\sigma$, is 2 × P($z_1$) × 100% = P%.
This is written as
$x_i = x' \pm z_1\sigma$ (P%)
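The two-sided probability can be evaluated without a table by using the error function. This is a minimal sketch, assuming P(z) denotes the one-sided integral of the standard normal density from 0 to z; the function names are illustrative only.

```python
import math


def one_sided_p(z1: float) -> float:
    """One-sided integral of the standard normal density from 0 to z1."""
    return 0.5 * math.erf(z1 / math.sqrt(2.0))


def coverage_percent(z1: float) -> float:
    """Probability (in %) that a value falls within x' +/- z1*sigma."""
    return 2.0 * one_sided_p(z1) * 100.0


for z in (1.0, 2.0, 3.0):
    print(z, round(coverage_percent(z), 2))
```

For z = 1, 2, and 3 this gives roughly 68.27%, 95.45%, and 99.73%.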
4.4 STATISTICS OF FINITE-SIZED DATA SETS
Sample mean value: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample variance: $s_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
Deviation of $x_i$: $x_i - \bar{x}$
$\bar{x}$ provides a most probable estimate of the true mean value x'.
$s_x^2$ represents a probable measure of the variation found in a data
set.
The degrees of freedom, ν: the number of data points minus the
number of previously determined statistical parameters used in
estimating that value (in the sample variance, ν = N − 1).
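A short sketch of these finite-sample statistics follows. The data values are hypothetical, and the function name is only illustrative.

```python
import math


def sample_stats(data):
    """Return (mean, variance, standard deviation, degrees of freedom)."""
    n = len(data)
    mean = sum(data) / n
    # One parameter (the mean) was already estimated from the data,
    # so the variance has n - 1 degrees of freedom.
    dof = n - 1
    variance = sum((x - mean) ** 2 for x in data) / dof
    return mean, variance, math.sqrt(variance), dof


# Hypothetical data set of N = 8 readings.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar, s2, s, nu = sample_stats(data)
print(x_bar, s2, s, nu)
```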
Student's t variable: $t_{\nu,P}$ provides a coverage factor used for finite data sets
(it replaces z):
$x_i = \bar{x} \pm t_{\nu,P}\,s_x$ (P%)
The interval $\pm t_{\nu,P}\,s_x$ represents a precision interval, given at probability P%, within
which one should expect any measured value to fall.
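The precision interval can be sketched as follows. The t values are the standard tabulated two-sided 95% values for 1 to 10 degrees of freedom, hard-coded here so the example needs only the standard library; the data set and function name are hypothetical.

```python
from statistics import mean, stdev

# Standard tabulated Student's t values at P = 95%, keyed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def precision_interval_95(data):
    """Return (lower, upper) for x_bar +/- t * s_x at 95% probability."""
    nu = len(data) - 1          # degrees of freedom
    t = T_95[nu]                # raises KeyError outside the small table above
    x_bar = mean(data)
    s_x = stdev(data)           # sample standard deviation (N - 1 divisor)
    return x_bar - t * s_x, x_bar + t * s_x


# Hypothetical data set of N = 8 readings.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
lower, upper = precision_interval_95(data)
print(lower, upper)
```

Note how much larger t is than the corresponding z value of 1.96 when only a few data points are available.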
Standard Deviation of the Means
Take N measurements of x under fixed operating conditions.
Duplicate this procedure M times and calculate the different
estimates $\bar{x}$ and $s_x^2$ for each of the M data sets.
The values of $\bar{x}$ obtained from the M replications follow a normal distribution
defined by $p(\bar{x})$.
The amount of variation in $\bar{x}$ depends on $s_x^2$ and the sample size N.
Standard deviation of the means: $s_{\bar{x}} = s_x / \sqrt{N}$
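A small simulation can illustrate the replication idea. This is a sketch under assumed conditions: the population is taken to be normal with made-up parameters, and N = 25 and M = 2000 are arbitrary choices.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population parameters and experiment sizes.
TRUE_MEAN, SIGMA = 10.0, 2.0
N, M = 25, 2000

# Repeat the N-point experiment M times and keep each sample mean.
sample_means = []
for _ in range(M):
    data = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    sample_means.append(mean(data))

# Observed scatter of the M sample means.
spread_of_means = stdev(sample_means)

# Estimate of the same quantity from a single data set: s_x / sqrt(N).
one_set = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
s_xbar = stdev(one_set) / math.sqrt(N)

print(spread_of_means, s_xbar, SIGMA / math.sqrt(N))
```

The scatter of the M means should come out close to σ/√N, and the single-set estimate approximates it without repeating the experiment.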
standard deviation & standard deviation of the means
4.5 CHI-SQUARED DISTRIBUTION (χ²)
We developed the concept of the standard deviation of the means as a
precision indicator in the mean value. In the same way, we can estimate how well $s_x^2$
predicts $\sigma^2$.
If we plot the sample standard deviation for many data sets, each
having N data points, we generate the probability density function $p(\chi^2)$, where
$\chi^2 = \nu s_x^2 / \sigma^2$
Precision Interval in a Sample Variance
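$P(\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}) = 1 - \alpha$
with a probability of $P(\chi^2) = 1 - \alpha$
α: level of significance
Combining this with $\chi^2 = \nu s_x^2 / \sigma^2$:
$P\left(\frac{\nu s_x^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{\nu s_x^2}{\chi^2_{1-\alpha/2}}\right) = 1 - \alpha$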
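The interval for the variance can be sketched in Python. To stay within the standard library, the chi-squared values here come from the Wilson-Hilferty approximation rather than from a table or an exact routine, so the bounds are approximate; that substitution, the function names, and the example numbers are all assumptions. The subscript convention assumed is that $\chi^2_{a}$ is the value exceeded with probability a.

```python
from statistics import NormalDist


def chi2_exceeded_with_prob(tail: float, dof: int) -> float:
    """Approximate chi-squared value exceeded with probability `tail`.

    Uses the Wilson-Hilferty approximation, not an exact quantile.
    """
    z = NormalDist().inv_cdf(1.0 - tail)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def variance_interval(s2: float, dof: int, alpha: float = 0.05):
    """Approximate (lower, upper) bounds on sigma^2 at probability 1 - alpha."""
    chi2_hi = chi2_exceeded_with_prob(alpha / 2.0, dof)        # chi^2_{alpha/2}
    chi2_lo = chi2_exceeded_with_prob(1.0 - alpha / 2.0, dof)  # chi^2_{1-alpha/2}
    return dof * s2 / chi2_hi, dof * s2 / chi2_lo


# Hypothetical case: sample variance 4.0 from N = 20 points (19 degrees of freedom).
lower, upper = variance_interval(4.0, 19)
print(lower, upper)
```

For 19 degrees of freedom the approximation gives values near the tabulated 32.85 and 8.91, which is close enough for a sketch; tabulated or library values should be preferred for real work.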
• The dependent variable $y_{i,j}$ consists of N measurements, i = 1, 2, . . . , N,
of y at each of n values of the independent variable, $x_j$, j = 1, 2, . . . , n.
• This type of behavior is common during calibrations and in many types of
measurements in which the dependent variable y is measured under controlled
values of x.
• Repeated measurements of y yield a normal distribution with variance $s_y^2(x_j)$ about
some mean value, $\bar{y}(x_j)$.
Least-Squares Regression Analysis
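As an illustration of least-squares regression, the sketch below fits a straight line y = a0 + a1 x by minimizing the sum of squared deviations. The choice of a first-order polynomial, the function name, and the data are assumptions made for this example.

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1 * x; returns (a0, a1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Solution of the two normal equations for intercept a0 and slope a1.
    denom = n * sxx - sx * sx
    a1 = (n * sxy - sx * sy) / denom
    a0 = (sy - a1 * sx) / n
    return a0, a1


# Hypothetical calibration data: controlled x values and measured y values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 4.1, 5.3]
a0, a1 = fit_line(x, y)
print(a0, a1)
```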
4.7 DATA OUTLIER DETECTION
Data that lie outside the probability of normal variation
incorrectly offset the sample mean value estimate, inflate the
random uncertainty estimates, and influence a least-squares
correlation.
Chauvenet’s criterion identifies outliers having less
than a 1/(2N) probability of occurrence.
Let $z_0 = |x_i - \bar{x}| / s_x$, where $x_i$ is a suspected outlier in a data set of N values. The
data point is a potential outlier if
$1 - 2\,P(z_0) < \frac{1}{2N}$
The three-sigma test identifies data points that lie outside the range of
99.73% probability of occurrence, $\bar{x} \pm t_{\nu,99.7}\,s_x$, as potential
outliers.
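Chauvenet’s criterion can be sketched as below, assuming P(z) is the one-sided normal integral from 0 to z, so that 2 P(z) equals erf(z/√2). The data set and function name are hypothetical.

```python
import math
from statistics import mean, stdev


def chauvenet_outliers(data):
    """Return the values flagged as potential outliers by Chauvenet's criterion."""
    n = len(data)
    x_bar, s_x = mean(data), stdev(data)
    flagged = []
    for x in data:
        z0 = abs(x - x_bar) / s_x
        # 1 - 2 P(z0): probability of a deviation at least this large.
        prob = 1.0 - math.erf(z0 / math.sqrt(2.0))
        if prob < 1.0 / (2.0 * n):
            flagged.append(x)
    return flagged


# Hypothetical data set of N = 10 readings with one suspicious value.
data = [28, 31, 27, 28, 29, 24, 29, 28, 18, 27]
outliers = chauvenet_outliers(data)
print(outliers)
```

A flagged point is only a candidate for removal; the mean and standard deviation here are computed with the suspect value still included.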