Business Inferential Statistics Lessons
By
G. S. Msukwa
2023
Lesson One
Probability Distributions
Distributions can be classified by their skewness: some are symmetrical, while others are skewed to the left or to the right (asymmetrical). They also fall under two basic categories depending on the underlying variable, namely discrete probability distributions and continuous probability distributions.
Discrete variables are those which assume only countable, separate values (e.g., 5 people). Continuous variables are those that may take any value in a range, including fractions of a unit (e.g., a weight of 2.5 kg, or a length).
Probability distributions of discrete random variables can be presented in frequency tables and frequency charts to show the relative likelihood of particular outcomes or values. Here we consider the two discrete probability distributions most relevant in economic and business settings.
Binomial Distributions
First, let us consider a class of experiments which meet the following conditions. A binomial distribution describes the outcomes of an experiment that satisfies all four conditions below:
1. The experiment consists of n identical trials.
2. Two outcomes are possible on each trial (a success and a failure).
3. The probabilities of the two outcomes do not change from one trial to the next.
4. The trials are independent (i.e., the outcome of one trial does not affect the outcome of any other trial).
Multinomial distributions also have a specified number of trials (n), but more than two outcomes are possible on each trial. The probabilities of the outcomes do not change, and the trials are also independent.
A Bernoulli process is an experiment which satisfies conditions 2, 3 and 4 above.
$$f(x) = \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}$$
That is, the probability of obtaining x = 0, 1, 2, etc. successes in the n trials is f(0), f(1), f(2), respectively, where p is the probability of success on a single trial.
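The binomial probability function can be evaluated directly, as in the minimal sketch below. The values n = 10 and p = 0.3 are hypothetical and chosen only for illustration; the function name `binomial_pmf` is likewise an assumption, not something defined in these notes.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) is n! / (x! (n - x)!)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example: n = 10 trials, success probability p = 0.3.
n, p = 10, 0.3
for x in range(3):
    print(f"f({x}) = {binomial_pmf(x, n, p):.4f}")

# The probabilities over all possible values of x should add up to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(f"sum of f(x) over x = 0..{n}: {total:.4f}")
```

Summing f(x) over every possible x is a quick sanity check that the formula has been typed correctly.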
The expected value and variance of the binomial distribution are given as follows:
E(x) = µ = np
Var(x) = σ² = np(1 − p)
Poisson Distributions
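If the number of occurrences of an event is counted over a specified interval of time or space, the occurrences are independent of one another, and the probability of an occurrence is the same for any two intervals of equal length, then the occurrence follows a Poisson distribution, where the probability of a Poisson random variable is given by: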
$$f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
where x = the number of occurrences in the interval
λ = the mean (average) number of occurrences in an interval
e = the base of the natural logarithm (approximately 2.71828)
f(x) = the probability of x occurrences in the interval
The expected value and variance of the Poisson distribution are given as follows:
E(x) = µ = λ
Var(x) = σ² = λ
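As a rough numerical check of the Poisson formula and of the statement that both the mean and the variance equal λ, the sketch below evaluates f(x) for a hypothetical λ = 4 and sums over x = 0 to 59. The cut-off at 59 is an assumption made for convenience; the probabilities beyond it are negligible for this λ.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean count is lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 4.0  # hypothetical mean number of occurrences per interval
support = range(60)  # truncated range of x; the tail beyond it is negligible

mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

print(f"f(2) = {poisson_pmf(2, lam):.4f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```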
Illustrative examples;
Continuous probability distributions are for variables which can take any value in a range. For instance, length can be measured continuously, including fractions of the chosen unit (e.g., 1.85 m, 3.78 km). For this reason the distributions are presented as continuous curves (the probability density function, pdf). The probability that the variable falls below a given value, above it, or between any two values is represented by the area under the curve for that range.
4. $Z = \frac{X - \mu}{\sigma}$

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}}$$
Note: use graphs and standard normal tables to determine the probabilities (i.e., the area under the curve).
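Where tables are not at hand, the area under the standard normal curve can also be computed with the error function from Python's standard library. The sketch below is only an illustration: the population values µ = 50 and σ = 10 are hypothetical, and the function names are assumptions introduced here.

```python
from math import erf, exp, pi, sqrt


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardise x: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def standard_normal_pdf(z: float) -> float:
    """Height of the standard normal curve at z."""
    return exp(-z * z / 2) / sqrt(2 * pi)


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Hypothetical population: mean 50, standard deviation 10.
z = z_score(65, mu=50, sigma=10)
print(f"z = {z:.2f}, P(X <= 65) = {standard_normal_cdf(z):.4f}")

# Area between two values: P(-1.96 <= Z <= 1.96)
between = standard_normal_cdf(1.96) - standard_normal_cdf(-1.96)
print(f"P(-1.96 <= Z <= 1.96) = {between:.4f}")
```

The results should agree with the standard normal table to the number of decimals the table reports.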
Some symmetrical, bell-shaped distributions which do not fall under the standard normal category may follow the (Student) t-distribution. Although the t-distribution is also symmetrical, its shape depends on the degrees of freedom (df). As the df increase, the difference between the t-distribution and the standard normal becomes smaller; the standard normal distribution can be regarded as the t-distribution with infinite degrees of freedom (df = ∞).
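The shrinking gap between the t-distribution and the standard normal can be seen numerically. The sketch below uses the standard textbook formula for the t density, which is not given in these notes and is brought in here as an assumption, and compares the two curves at their centre for a few hypothetical df values.

```python
from math import exp, lgamma, log, pi, sqrt


def standard_normal_pdf(z: float) -> float:
    """Height of the standard normal curve at z."""
    return exp(-z * z / 2) / sqrt(2 * pi)


def t_pdf(t: float, df: float) -> float:
    """Student t density, computed on the log scale so large df do not overflow."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))


# Compare the heights of the two curves at the centre (t = z = 0).
for df in (1, 5, 30, 1000):
    gap = abs(t_pdf(0.0, df) - standard_normal_pdf(0.0))
    print(f"df = {df:>4}: gap at 0 = {gap:.5f}")
```

The printed gap should fall steadily as df grows, which is the behaviour described above.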
Exponential Distribution
This is a continuous distribution of the time between events which occur continuously and independently at a constant average rate. The pdf decreases at a constant proportional rate as x (e.g., service time) increases. The probability that the variable x falls in an interval is the area under the curve over that interval. See the graph below.
The expected value and variance of the exponential distribution are given as follows, where λ is the constant average rate of occurrence:
E(x) = µ = 1/λ
Var(x) = σ² = 1/λ²
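The idea that an exponential probability is an area under the pdf can be checked with a short sketch. It writes the constant average rate as `rate` (so the mean time between events is 1/rate), uses a hypothetical rate of 0.5 events per minute, and compares the closed-form area with a simple midpoint-rule approximation. All names and values here are assumptions for illustration.

```python
from math import exp


def exponential_pdf(x: float, rate: float) -> float:
    """Density of the time between events occurring at a constant average rate."""
    return rate * exp(-rate * x) if x >= 0 else 0.0


def exponential_cdf(x: float, rate: float) -> float:
    """P(X <= x): closed-form area under the pdf from 0 to x."""
    return 1 - exp(-rate * x) if x >= 0 else 0.0


def area_under_pdf(a: float, b: float, rate: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the area under the pdf between a and b."""
    width = (b - a) / steps
    return sum(exponential_pdf(a + (i + 0.5) * width, rate) for i in range(steps)) * width


rate = 0.5  # hypothetical: 0.5 events per minute, so a mean time of 2 minutes
print(f"mean time = {1 / rate:.1f}, variance = {1 / rate**2:.1f}")
print(f"P(X <= 3), closed form: {exponential_cdf(3, rate):.4f}")
print(f"P(X <= 3), numeric area: {area_under_pdf(0, 3, rate):.4f}")
```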
Illustrative examples
1.3 Other distributions
The chi-square (χ²) distribution is a continuous distribution of the sum of squares of random variables from samples. A χ² distribution with k degrees of freedom (df) is the distribution of the sum of squares of k independent standard normal random variables; see the graph below.
In this distribution the pdf of the random variable x is given as:
$$f(x) = \frac{x^{\frac{k}{2} - 1}\, e^{-\frac{x}{2}}}{2^{\frac{k}{2}}\, \Gamma\!\left(\frac{k}{2}\right)}, \quad \text{for } x \ge 0$$
or f(x) = 0 otherwise.
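The chi-square pdf can be evaluated with the gamma function from Python's standard library. The sketch below uses a hypothetical k = 4 and checks numerically that the total area under the curve is about 1. The upper integration limit of 60 is an assumption chosen so that the remaining tail is negligible for this k.

```python
from math import exp, gamma


def chi_square_pdf(x: float, k: int) -> float:
    """Chi-square density with k degrees of freedom."""
    # x = 0 is treated as 0 here to avoid dividing by zero when k < 2;
    # a single point does not change any area under the curve.
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))


def area(k: int, upper: float = 60.0, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the area under the pdf from 0 to upper."""
    width = upper / steps
    return sum(chi_square_pdf((i + 0.5) * width, k) for i in range(steps)) * width


k = 4  # hypothetical degrees of freedom
print(f"f(2) with k = {k}: {chi_square_pdf(2.0, k):.4f}")
print(f"total area under the curve: {area(k):.4f}")
```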
Unlike the distributions above, chi-square (χ²) is mainly used as a test statistic for inference about a population variance from the sample variance of a normally distributed population. It is also used as a test statistic for tests of independence and tests of goodness of fit for multinomial, Poisson and normal distributions.
The sample variance (s²) is a point estimator of the population variance (σ²). In using s² as a basis for making inferences about σ², the sampling distribution of the quantity (n − 1)s²/σ² is important.
When a simple random sample of size n is selected from a normally distributed population, the sampling distribution of (n − 1)s²/σ² has a χ² distribution with (n − 1) degrees of freedom.
In other words, if z1, z2, ..., zk are independent standard normal variables, then the sum of their squares is distributed according to χ² with k degrees of freedom.
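Both statements can be illustrated by simulation. The sketch below draws repeated samples from a hypothetical normal population (mean 100, standard deviation 15, sample size n = 10), computes (n − 1)s²/σ² for each, and compares the results with sums of squares of n − 1 standard normal draws. It relies on the standard result, not stated in these notes, that a chi-square variable with k degrees of freedom has mean k and variance 2k.

```python
import random
from statistics import mean, variance


def scaled_sample_variance(n: int, mu: float, sigma: float, rng: random.Random) -> float:
    """(n - 1) s^2 / sigma^2 for one random sample of size n from N(mu, sigma)."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    return (n - 1) * variance(sample) / sigma**2  # variance() divides by n - 1


def sum_of_squared_normals(k: int, rng: random.Random) -> float:
    """Sum of squares of k independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(42)  # fixed seed so the run is repeatable
n, mu, sigma = 10, 100.0, 15.0  # hypothetical population and sample size
repetitions = 5000

scaled = [scaled_sample_variance(n, mu, sigma, rng) for _ in range(repetitions)]
squares = [sum_of_squared_normals(n - 1, rng) for _ in range(repetitions)]

print(f"(n-1)s^2/sigma^2: mean = {mean(scaled):.2f}, variance = {variance(scaled):.2f}")
print(f"sum of squares:   mean = {mean(squares):.2f}, variance = {variance(squares):.2f}")
```

Both rows should show a mean close to n − 1 = 9 and a variance close to 18, within simulation error.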
The F distribution
The F distribution (named after Ronald Fisher) is also a continuous probability distribution. It usually arises as the null distribution of a test statistic in analysis of variance (ANOVA) and other F tests. The F statistic is a ratio of two estimates of variance. In ANOVA, it is the ratio of two sample-based estimates of the population variance. It is also used in regression analysis.
Refer also to the graphical presentation below and the illustration of the area under the curve for α = 0.05.
As the degrees of freedom increase, the F distribution approaches the normal distribution. At df1 = 100 and df2 = 200, the F distribution is already sufficiently close to the normal distribution (the approach is asymptotic). We normally use F distribution tables to look up the probability values for relevant tests.
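The F statistic as a ratio of two sample variances, and the meaning of the area under the curve for α = 0.05, can be illustrated by simulation. The sketch below assumes two hypothetical samples of sizes 6 and 11 from the same normal population (so df1 = 5 and df2 = 10) and uses the tabulated upper 5% critical value of about 3.33 for those degrees of freedom.

```python
import random
from statistics import variance


def variance_ratio(n1: int, n2: int, rng: random.Random) -> float:
    """Ratio of two sample variances drawn from the same normal population."""
    first = [rng.gauss(0.0, 1.0) for _ in range(n1)]
    second = [rng.gauss(0.0, 1.0) for _ in range(n2)]
    return variance(first) / variance(second)


rng = random.Random(7)  # fixed seed so the run is repeatable
n1, n2 = 6, 11  # hypothetical sample sizes, giving df1 = 5 and df2 = 10
critical_value = 3.33  # tabulated upper 5% point of F with (5, 10) df

ratios = [variance_ratio(n1, n2, rng) for _ in range(5000)]
upper_tail_share = sum(r > critical_value for r in ratios) / len(ratios)

print(f"share of ratios above {critical_value}: {upper_tail_share:.3f}")
```

The printed share should be close to 0.05, which is the area under the F curve to the right of the tabulated value.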