
Probability and Statistics

Chapter 4

Apr. 2020

This chapter discusses probability density functions and how they provide information about the mean value and variability of random variables, and how finite data sets are used to estimate these density functions. Measurements of variables introduce random errors that must be accounted for; statistics are used to interpret data and to estimate true mean values from a limited number of data points with random uncertainty. Common statistical distributions, such as the normal, lognormal, and Poisson distributions, are discussed along with their applications, and regression analysis is used to establish relationships between dependent and independent variables.
 The behavior of a random variable is described by its probability density function, which provides exact information about its mean value and variability.

 The purpose of the measurements is to estimate this density function based on the acquired, limited data set.

 The normal scatter of data about some central mean value arises from the spatial distribution of the measurand under nominally fixed operating conditions, plus random errors in the measurement system and the measurement procedure.
 A finite number of measurements of a variable is used to estimate the behavior of the entire population of values.

 Estimates based on a limited data set introduce another random error into predicting the true value of the measurement.

 Statistics is a powerful tool used to interpret and present data.

 This chapter presents methods to estimate the true mean value based on a limited number of data points and the associated random uncertainty, along with the treatment of data curve fitting.
4.2 STATISTICAL MEASUREMENT THEORY

From a statistical analysis of the data set and an analysis of the sources of error that influence these data, the measured variable is reported as

x' = x̄ ± u_x̄  (P%)

where x̄ is the most probable estimate of x' based on the available data and u_x̄ is the uncertainty interval at some probability level P%.

 Uncertainties are numbers that quantify the possible range of the effects of errors.

 Uncertainty interval: the range about x̄ within which we expect x' to lie; it combines the uncertainty estimates of the random and systematic errors in the measurement of x.
Probability Density Functions

Histogram of the variable

 Plot the data of Table 4.1, which consist of N individual measurements x_i, i = 1, 2, . . . , N, each taken at random but under identical test operating conditions.

 The measured values of the variable are plotted as data points along a single axis.

 The abscissa is divided between the maximum and minimum measured values of x into K small intervals.

 n_j, the number of times the measured value falls within the interval x − δx ≤ x_i ≤ x + δx, is plotted on the ordinate.
Histogram of the variable

 Histogram: the resulting plot of n_j versus x shows both the central tendency and the probability density of the variable.

 The ordinate can be nondimensionalized as f_j = n_j / N, converting the histogram into a frequency distribution.

 An estimate for the number of intervals K:

K = 1.87 (N − 1)^0.40 + 1

 As N becomes very large, K ≈ √N is a reasonable choice.

 A good rule is that n_j ≥ 5 for at least one interval.
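As an illustration (not from the original slides), a minimal Python/NumPy sketch of this procedure, assuming K equal-width intervals spanning the measured range; the simulated data are hypothetical:

import numpy as np

def frequency_distribution(x):
    """Frequency distribution f_j = n_j / N using K = 1.87 (N - 1)**0.40 + 1 intervals."""
    x = np.asarray(x, dtype=float)
    N = x.size
    K = int(round(1.87 * (N - 1) ** 0.40 + 1))    # estimated number of intervals
    edges = np.linspace(x.min(), x.max(), K + 1)  # K equal-width intervals over [min, max]
    n_j, _ = np.histogram(x, bins=edges)          # occurrences per interval
    f_j = n_j / N                                 # nondimensionalized ordinate
    return edges, n_j, f_j

# 60 simulated measurements taken under nominally fixed conditions
rng = np.random.default_rng(0)
edges, n_j, f_j = frequency_distribution(rng.normal(10.0, 1.0, size=60))
print(n_j, f_j.sum())   # the frequencies f_j sum to 1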


Standard Statistical Distributions

The source tabulates, for each distribution, its typical applications, its density function, and the shape of its curve (the formulas and figures are not reproduced here):

 Normal
 Lognormal
 Rectangular
 Triangular
 Poisson
 Binomial
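For reference (added here; these are the standard textbook forms rather than a reproduction of the original table — for the lognormal, μ and σ denote the mean and standard deviation of ln x), the density functions are:

\begin{align*}
\text{Normal:}      && p(x) &= \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{(x-x')^2}{2\sigma^2}\right]\\
\text{Lognormal:}   && p(x) &= \frac{1}{x\,\sigma\sqrt{2\pi}}\exp\!\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right],\quad x>0\\
\text{Rectangular:} && p(x) &= \frac{1}{b-a},\quad a\le x\le b\\
\text{Triangular:}  && p(x) &= \frac{2(x-a)}{(b-a)(c-a)}\ \text{for } a\le x\le c,\qquad \frac{2(b-x)}{(b-a)(b-c)}\ \text{for } c< x\le b\\
\text{Poisson:}     && P(n) &= \frac{\lambda^{n}e^{-\lambda}}{n!},\quad n=0,1,2,\dots\\
\text{Binomial:}    && P(n) &= \binom{N}{n}p^{n}(1-p)^{N-n},\quad n=0,1,\dots,N
\end{align*}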
Histogram of the variable

 As N → ∞, the probability density function p(x) of the population of the variable x develops. In the limit as δx → 0,

p(x) = lim (δx→0) n_j / [N (2 δx)]

 True mean value (central tendency) of a random variable:

for a continuous signal (in time or space):  x' = lim (T→∞) (1/T) ∫₀ᵀ x(t) dt

for discrete data:  x' = lim (N→∞) (1/N) Σ_{i=1}^{N} x_i
Histogram of the variable

 The width of the density function reflects the data variation. For a continuous random variable, the true variance is

σ² = ∫_{−∞}^{∞} (x − x')² p(x) dx

which is equivalent, for discrete data, to

σ² = lim (N→∞) (1/N) Σ_{i=1}^{N} (x_i − x')²

 Standard deviation, σ: the statistical parameter defined as the square root of the variance,

σ = √(σ²)
4.3 DESCRIBING THE BEHAVIOR OF A POPULATION

Normal (or Gaussian) distribution:

p(x) = [1 / (σ√(2π))] exp[−(x − x')² / (2σ²)]

 Given p(x), how can we predict the probability that any future measurement will fall within a stated interval?

 The probability that x will assume a value within the interval x' ± δx is given by the area under p(x), which is found by integrating over the interval.

 With the transformations β = (x − x')/σ and z₁ = (x₁ − x')/σ, the integral reduces to one over the standard normal density; for a normal distribution, p(x) is symmetrical about x', so only the one-sided integral P(z₁) needs to be tabulated (Table 4.3).
4.3 DESCRIBING THE BEHAVIOR OF A POPULATION

The probability that the ith measured value of x will lie between x' − z₁σ and x' + z₁σ is 2 × P(z₁) × 100% = P%. This is written as

x_i = x' ± z₁ σ  (P%)
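The one-sided values P(z) in Table 4.3 come from the standard normal integral; a small illustrative Python sketch of the corresponding two-sided probability, using the error function:

import math

def p_two_sided(z):
    """Probability that a measurement falls within x' +/- z*sigma for a
    normally distributed variable (area under p(x) over that interval)."""
    return math.erf(z / math.sqrt(2.0))

for z in (1.0, 1.96, 2.0, 3.0):
    print(f"z = {z:4.2f}  ->  P = {100 * p_two_sided(z):5.2f}%")
# expected: 68.27%, 95.00%, 95.45%, 99.73%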
4.4 STATISTICS OF FINITE-SIZED DATA SETS

 N measurements of x (N repetitions), each measurement represented by x_i, i = 1, 2, . . . , N, where N is a finite value.

 Finite-sized data sets provide the statistical estimates: the sample mean value x̄, the sample variance s_x², and its outcome, the sample standard deviation s_x:

x̄ = (1/N) Σ_{i=1}^{N} x_i        s_x² = [1/(N − 1)] Σ_{i=1}^{N} (x_i − x̄)²        s_x = √(s_x²)
4.4 STATISTICS OF FINITE-SIZED DATA SETS

 Deviation of x_i: x_i − x̄.

 x̄ provides the most probable estimate of the true mean value x'.

 s_x² represents a probable measure of the variation found in the data set.

 The degrees of freedom, ν: the number of data points minus the number of previously determined statistical parameters used in estimating that value (for the sample variance, ν = N − 1).
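A minimal NumPy sketch of these sample estimators (the eight data values are made up for illustration):

import numpy as np

x = np.array([10.2, 9.8, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3])   # N = 8 repeated measurements
N = x.size

x_bar = x.mean()                              # sample mean value
s2_x  = np.sum((x - x_bar) ** 2) / (N - 1)    # sample variance, nu = N - 1 degrees of freedom
s_x   = np.sqrt(s2_x)                         # sample standard deviation

print(f"x_bar = {x_bar:.3f}, s_x^2 = {s2_x:.4f}, s_x = {s_x:.4f}")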
4.4 STATISTICS OF FINITE-SIZED DATA SETS

 The relation between probability and infinite statistics can be extended to data sets of finite sample size with only some modification.

 When data sets are finite, or smaller than the population, the z variable does not provide a reliable estimate of the true probability.

 The sample variance can be weighted to compensate for the difference between the finite statistical estimates and the statistics based on p(x).
4.4 STATISTICS OF FINITE-SIZED DATA SETS

 Student's t variable: t_{ν,P} provides a coverage factor used for finite data sets (it replaces z).

 The interval ±t_{ν,P} s_x represents a precision interval, given at probability P%, within which one should expect any measured value to fall:

x_i = x̄ ± t_{ν,P} s_x  (P%)

 Values of t_{ν,P} are given in Table 4.4, Student's t Distribution.
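In place of Table 4.4, the coverage factor can be obtained from scipy; a sketch of the precision interval for a single measured value, using the same hypothetical data as above:

import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3])
nu, P = x.size - 1, 0.95

x_bar, s_x = x.mean(), x.std(ddof=1)
t = stats.t.ppf((1 + P) / 2, df=nu)      # two-sided coverage factor t_{nu,P}

lo, hi = x_bar - t * s_x, x_bar + t * s_x
print(f"Any single measurement is expected to fall in [{lo:.2f}, {hi:.2f}] at {P:.0%}")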
Standard Deviation of the Means

 Take N measurements of x under fixed operating conditions.

 Duplicate this procedure M times and calculate the estimates x̄ and s_x² for each of the M data sets.

 The mean values x̄ obtained from the M replications themselves follow a normal distribution.

 The amount of variation in x̄ depends on s_x² and the sample size N.

 Standard deviation of the means:

s_x̄ = s_x / √N
Standard deviation and standard deviation of the means

 s_x̄ represents a measure of how well a measured mean value represents the true mean value of the population.
 The interval x̄ ± t_{ν,P} s_x̄ (P%) expresses, with coverage factor t at the assigned probability P%, the range within which the true mean value x' is expected to fall.

 This confidence interval is a quantified measure of the random error in the estimate of the true value of variable x.

 The value s_x̄ represents the random standard uncertainty in the mean value.

 The value t_{ν,P} s_x̄ represents the random uncertainty in the mean value at P% confidence due to variation in the measured data set.
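Continuing the illustrative sample used above, the confidence interval for the true mean value can be sketched as:

import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3])
N, nu, P = x.size, x.size - 1, 0.95

x_bar  = x.mean()
s_x    = x.std(ddof=1)
s_xbar = s_x / np.sqrt(N)                 # standard deviation of the means
t = stats.t.ppf((1 + P) / 2, df=nu)       # coverage factor t_{nu,P}

print(f"x' = {x_bar:.3f} +/- {t * s_xbar:.3f}  ({P:.0%} confidence)")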
4.5 CHI-SQUARED DISTRIBUTION (χ²)

 We developed the standard deviation of the means as a precision indicator for the mean value; we now estimate how well s_x² predicts σ².

 Plotting the sample variance computed from many data sets, each having N data points, generates the probability density function p(χ²).

 For a normal distribution,

χ² = ν s_x² / σ²
Precision Interval in a Sample Variance

 Formulated by the probability statement

P(χ²_{1−α/2} ≤ χ² ≤ χ²_{α/2}) = 1 − α

with a probability of P = 1 − α, where α is the level of significance.

 Combining this with χ² = ν s_x² / σ² gives the precision interval

ν s_x² / χ²_{α/2} ≤ σ² ≤ ν s_x² / χ²_{1−α/2}  (P%)

 For example, the 95% precision interval by which s_x² estimates σ² is given by

ν s_x² / χ²_{0.025} ≤ σ² ≤ ν s_x² / χ²_{0.975}  (95%)

This interval is bounded by the 2.5% and 97.5% levels of significance (for 95% coverage). Values of χ²_α are given in Table 4.5.
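A sketch of this 95% interval with scipy, using the same hypothetical data as above (note that chi2.ppf takes lower-tail probabilities, while the subscripts above denote upper-tail levels of significance):

import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3])
nu   = x.size - 1
s2_x = x.var(ddof=1)

alpha = 0.05
chi2_0025 = stats.chi2.ppf(1 - alpha / 2, nu)   # chi^2 at the 2.5% level of significance
chi2_0975 = stats.chi2.ppf(alpha / 2, nu)       # chi^2 at the 97.5% level of significance

print(f"{nu * s2_x / chi2_0025:.4f} <= sigma^2 <= {nu * s2_x / chi2_0975:.4f}  (95%)")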
4.6 REGRESSION ANALYSIS

 A measured variable is often a function of one or more independent variables that are controlled during the measurement.

 When the measured variable is sampled, these variables are controlled, to the extent possible, as are all the other operating conditions.

 One of these variables is then changed and a new sampling is made under the new operating conditions.

 Regression analysis is used to establish a functional relationship between the dependent and independent variables.
• The dependent variable y_{i,j} consists of N measurements, i = 1, 2, . . . , N, of y at each of n values of the independent variable x_j, j = 1, 2, . . . , n.

• This type of behavior is common during calibrations and in many types of measurements in which the dependent variable y is measured under controlled values of x.

• Repeated measurements of y yield a normal distribution with variance s_y²(x_j) about some mean value ȳ(x_j).
Least-Squares Regression Analysis

 For a single variable of the form y = f(x), the analysis provides an mth-order polynomial fit of the data in the form

y_c = a_0 + a_1 x + a_2 x² + · · · + a_m x^m

 For n different values of the independent variable, the highest order m of the polynomial is limited to m ≤ n − 1.

 The values of the coefficients a_0, a_1, . . . , a_m are determined by the analysis.

 The method of least squares attempts to minimize the sum of the squares of the deviations between the actual data and the polynomial fit of a stated order by adjusting the values of the polynomial coefficients.
Least-Squares Regression Analysis

Define the sum of the squares of the deviations between the measured values y_i and the fitted values y_{c_i}:

D = Σ_{i=1}^{N} (y_i − y_{c_i})²

 GOAL: reduce D to a minimum for a given order of polynomial.

 The total differential of D depends on the m + 1 coefficients:

dD = (∂D/∂a_0) da_0 + (∂D/∂a_1) da_1 + · · · + (∂D/∂a_m) da_m

 To minimize the sum of the squares of the deviations, we want dD to be zero; this is satisfied when ∂D/∂a_j = 0 for j = 0, 1, . . . , m.
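A short NumPy sketch of a least-squares polynomial fit (the x–y data are made up for illustration; np.polyfit solves the normal equations internally):

import numpy as np

# n controlled values of the independent variable and the measured response
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.9, 4.1, 5.8, 8.2, 9.9])

m = 1                                   # order of the polynomial, m <= n - 1
coeffs = np.polyfit(x, y, deg=m)        # least-squares coefficients, highest power first
y_c = np.polyval(coeffs, x)             # fitted values

D = np.sum((y - y_c) ** 2)              # minimized sum of squared deviations
print("coefficients (a_m ... a_0):", coeffs, " D =", round(float(D), 4))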
4.7 DATA OUTLIER DETECTION

 Data that lie outside the probability of normal variation incorrectly offset the sample mean value estimate, inflate the random uncertainty estimates, and influence a least-squares correlation.

 Chauvenet's criterion identifies outliers having less than a 1/(2N) probability of occurrence.

 For a suspected outlier x_0 in a data set of N values, compute z_0 = |x_0 − x̄| / s_x; the data point is a potential outlier if the probability of a deviation at least that large, based on the normal distribution, is less than 1/(2N).

 The three-sigma test identifies data points that lie outside the range of 99.73% probability of occurrence, x̄ ± t_{ν,99.7} s_x, as potential outliers.
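A sketch of Chauvenet's criterion in Python (illustrative data; the 12.9 value is planted as an obvious outlier):

import numpy as np
from scipy import stats

def chauvenet_outliers(x):
    """Flag points whose deviation has less than a 1/(2N) probability of
    occurrence for a normally distributed population (Chauvenet's criterion)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    z0 = np.abs(x - x.mean()) / x.std(ddof=1)
    p_tail = 2.0 * (1.0 - stats.norm.cdf(z0))    # probability of a deviation >= z0
    return p_tail < 1.0 / (2 * N)

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.1, 12.9, 9.7, 10.0, 10.3]
print(np.where(chauvenet_outliers(data))[0])     # -> [6], the planted outlier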