
MA 3014 – S3 (’20 Batch) - 2022

1. Statistical Distributions

The distributions in the theory of statistics are classified mainly as discrete and continuous
distributions.

Discrete Distributions        Continuous Distributions
Uniform                       Uniform
Bernoulli                     Normal & Standard Normal
Binomial                      t-Distribution
Poisson                       Chi-square
Negative Binomial             F-Distribution
Geometric                     Exponential
Hypergeometric                Gamma
                              Weibull

1.1 Discrete Distributions


Uniform Distribution

Probability Mass Function        Cumulative Distribution Function

Bernoulli and Binomial distributions

If an experiment has two possible outcomes, “success” and “failure”, whose probabilities are, respectively, θ and (1 − θ), then the number of successes (X) has a Bernoulli distribution with pmf

f(x) = θ^x (1 − θ)^(1−x) ;  x = 0 or 1.

An experiment consisting of n independent, repeated Bernoulli trials is said to be a binomial experiment. The number of successes (X) then has a binomial distribution with pmf

f(x) = nCx · θ^x (1 − θ)^(n−x) ;  x = 0, 1, …, n
2

MA 3014 – S3 (’20 Batch) - 2022


It can be proved that the mean and the standard deviation of a binomial distribution are given by

μ = np  and  σ = √(npq);

and that for a large sample, X is approximately Normally distributed (the Normal approximation to the Binomial).
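As a quick numerical check of this approximation, the sketch below (plain Python; the values n = 100 and θ = 0.3 are illustrative choices, not from the notes) compares an exact binomial tail probability with its normal approximation:

```python
import math

n, theta = 100, 0.3                          # illustrative example values
mu = n * theta                               # mean np = 30
sigma = math.sqrt(n * theta * (1 - theta))   # sd sqrt(npq) ~ 4.58

# Exact binomial P(X <= 35)
exact = sum(math.comb(n, k) * theta**k * (1 - theta)**(n - k)
            for k in range(36))

# Normal approximation with continuity correction: Phi((35.5 - mu) / sigma)
z = (35.5 - mu) / sigma
approx = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z

print(exact, approx)   # the two values agree to about two decimal places
```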

Poisson Distribution

In a Poisson experiment, the random occurrence of number of events over an interval (usually a time
interval) is observed. In the same experiment if the time between two events is observed, the
variable will theoretically follow a continuous distribution which will be discussed later.

The probability mass function of the r.v. X is

f(x; λ) = e^(−λ) λ^x / x! ;  x = 0, 1, 2, …

 𝜆 – The mean number of counts in the interval (>0)


 E(X) = V(X) = 𝜆

Negative Binomial distribution (Pascal distribution)

A negative binomial experiment

 The experiment consists of x repeated trials.


 Each trial results only in two outcomes, one a success and the other, a failure.
 The probability of success, denoted by p, is the same on every trial.
 The trials are independent; that is, the outcome on one trial does not affect the outcome on other
trials.
 The experiment continues until r successes are observed, where r is specified in advance.

Eg: Consider the statistical experiment of flipping a coin repeatedly and counting the number of times it lands heads. Flipping continues until the coin has landed heads 5 times. Then the number of trials needed to obtain 5 heads (X) follows a negative binomial distribution.

X : 5 6 7 8 9 10 ……………………

P(X) : ? ? ? ? ? ? ……………………

Notation and terminology

x : The number of trials required to produce r successes in a negative binomial experiment.

r : The number of successes in the negative binomial experiment.

P : The probability of success on an individual trial.

Q : The probability of failure on an individual trial, 1 − P.

nCr : The number of combinations of n things, taken r at a time.

Negative Binomial probability

b*(x; r, P) : the probability that an x-trial negative binomial experiment results in the r-th success on the x-th trial, when the probability of success on an individual trial is P.

b*(x; r, P) = x-1Cr-1 · P^r · (1 − P)^(x − r)

Mean of Negative Binomial distribution:  μ_X = r/P

Variance of Negative Binomial distribution:  σ²_X = rQ/P²

NB: When dealing with negative binomial distribution, check on how the negative binomial random
variable is defined.

Alternative definitions can be:

 The negative binomial random variable is R, the number of successes before the binomial
experiment results in k failures. The mean of R is μR = kP/Q.
 The negative binomial random variable is K, the number of failures before the binomial
experiment results in r successes. The mean of K is μK = rQ/P.
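The coin example above can be checked numerically. This sketch evaluates b*(x; r, P) for r = 5 heads with a fair coin, and verifies that the probabilities sum to 1 and that the mean matches r/P = 10:

```python
import math

def neg_binomial_pmf(x, r, p):
    """P(the r-th success occurs on trial x) = (x-1)C(r-1) * p^r * q^(x-r)."""
    return math.comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 5, 0.5                                   # 5 heads, fair coin
print(neg_binomial_pmf(5, r, p))                # P(X = 5) = 0.5^5 = 0.03125

xs = range(5, 400)                              # truncated; the tail is negligible
total = sum(neg_binomial_pmf(x, r, p) for x in xs)
mean = sum(x * neg_binomial_pmf(x, r, p) for x in xs)
print(total, mean)                              # ~1 and ~r/p = 10
```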


Geometric Distribution (A special case of Negative Binomial)

This is a special case of the negative binomial distribution, where the variable of interest is the number
of trials required for a single success or the first success. Thus, the geometric distribution is negative
binomial distribution with the number of successes (r) is equal to 1.

An example of a geometric distribution would be asking for the probability that the first head occurs on the third flip. That probability is referred to as a geometric probability and is denoted by g(x; p). The formula for geometric probability is

g(x; p) = p · q^(x − 1) ;  x = 1, 2, …

Mean of Geometric distribution:  μ_X = 1/p

Variance of Geometric distribution:  σ²_X = q/p²
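The “first head on the third flip” example is a one-line computation; for a fair coin, g(3; 0.5) = 0.5 × 0.5² = 0.125:

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x) = p * q^(x - 1)."""
    return p * (1 - p) ** (x - 1)

p_third_flip = geometric_pmf(3, 0.5)
print(p_third_flip)   # 0.125
```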

Hypergeometric distribution

Hypergeometric Experiment

 A sample of size n is randomly selected without replacement from a population of N items.


 In the population, k items can be classified as successes, and N - k items can be classified as
failures.

Eg: Consider the statistical experiment of randomly selecting 2 marbles without replacement from an
urn of 10 marbles - 5 red and 5 green. The variable of interest is the number of red marbles selected.
This is a hyper geometric experiment.

Note: Since a binomial experiment requires that the probability of success be constant on every trial, the above is not a binomial experiment; the probability of a success changes on every trial. If the marbles were instead selected with replacement, the probability of success would not change; it would be 5/10 on every trial, and the experiment would then be binomial.

Notations and terminology


N : The number of items in the population.
k : The number of items in the population that are classified as successes.
n : The number of items in the sample.
x : The number of items in the sample that are classified as successes.
kCx : The number of combinations of k things, taken x at a time.

Hypergeometric probability

h(x; N, n, k) : the probability that an n-trial hypergeometric experiment results in exactly x successes, when the population consists of N items, k of which are classified as successes.

h(x; N, n, k) = [ kCx ] [ N-kCn-x ] / [ NCn ]

Mean of the Hypergeometric distribution

μ_X = nk / N

Variance of the Hypergeometric distribution

V_X = nk(N − k)(N − n) / [N²(N − 1)]
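The marble example can be worked out with the formulas above; the sketch computes the whole pmf for 2 draws from 10 marbles with 5 red (successes), plus the mean nk/N:

```python
import math

def hypergeom_pmf(x, N, n, k):
    """P(exactly x successes in n draws without replacement)."""
    return math.comb(k, x) * math.comb(N - k, n - x) / math.comb(N, n)

N, n, k = 10, 2, 5
pmf = [hypergeom_pmf(x, N, n, k) for x in range(n + 1)]
print(pmf)          # P(0 red), P(1 red), P(2 red)
mean = n * k / N
print(mean)         # 1.0
```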

1.2 Continuous Distributions


Normal Distribution and Standard Normal Distribution

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) , −∞ < x < ∞ ; Normal pdf

φ(z) = (1/√(2π)) e^(−z²/2) ; Standard Normal pdf

Cumulative distribution function: F(x) = P(X ≤ x)

Exponential Distribution

If a random variable X has the pdf


f(x) = λe^(−λx) ,  x ≥ 0 and λ > 0,
then it is said to have the exponential distribution with parameter 𝜆 and written as
𝑋~𝐸𝑥𝑝(𝜆).

The exponential distribution is often used to model the length of time until an event occurs.
The exponential distribution can be thought of as the continuous analogue of the geometric
distribution.
The parameter λ represents the “mean number of events per unit time”, e.g. the rate of arrivals or the rate of failures, just as in the Poisson distribution.

[Plot: exponential pdfs for λ = 0.5 (solid), λ = 1 (dotted), λ = 2 (dashed)]

Applications
 Model inter arrival times (time between arrivals) when arrivals are completely random;
𝝀 = arrivals / hour
 Model service times; 𝝀 = services / minute
 Model the lifetime of a component that fails catastrophically (i.e. light bulb);
𝝀 = failure rate

Properties of the random variable X which has exponential distribution

1. It is closely related to the Poisson distribution: if X describes the time between two failures, then the number of failures per unit time has the Poisson distribution with the same parameter λ.

2. The cdf is F_X(x) = λ ∫₀ˣ e^(−λy) dy = 1 − e^(−λx)

3. The 100(1 − α)% percentile is x_α = −(1/λ) ln α

4. Mean 𝝁𝒙 = 𝟏⁄𝝀

5. Variance 𝑽𝒙 = 𝟏⁄𝝀𝟐

6. Moment Generating Function (mgf) M_X(t) = λ/(λ − t) , t < λ

7. “Memoryless” property
For all s>= 0 and t >=0
P(X > s + t | X > s) = P(X > t)

Instance 1: If it is known that a component has survived s hours so far, the remaining amount of time that it survives follows the same distribution as the original distribution. It does not “remember” that it has already been used for s hours.

Instance 2: This means that the distribution of the waiting time to the next event remains the same regardless of how long we have already been waiting. This only happens when events occur (or not) totally at random, i.e., independently of past history.

Exercise: Suppose the life of an industrial lamp (in thousands of hours) is exponentially distributed with failure rate λ = 1/3 (one failure every 3000 hours on the average). Determine the probability that
a) the lamp will last no longer than its mean life time. (constant for any 𝝀)
b) the lamp will last longer than its mean life time
c) the industrial lamp will last between 2000 and 3000 hours.
d) the lamp will last for another 1000 hours given that it is operating after 2500 hours.
Answer:
a) 𝑃(𝑋 ≤ 3)=
b) 𝑃(𝑋 > 3)=
c) 𝑃(2 ≤ 𝑋 ≤ 3)=

d) 𝑃(𝑋 > 3.5|𝑋 > 2.5)=𝑃(𝑋 > 2.5 + 1|𝑋 > 2.5)=𝑃(𝑋 > 1)
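The lamp exercise can be completed with the cdf F(x) = 1 − e^(−λx), working in thousands of hours with λ = 1/3:

```python
import math

lam = 1 / 3   # failures per 1000 hours; mean life = 1/lam = 3 (i.e. 3000 hours)

def exp_cdf(x, lam):
    """F(x) = 1 - e^(-lam * x) for x >= 0."""
    return 1 - math.exp(-lam * x)

a = exp_cdf(3, lam)                    # P(X <= mean) = 1 - e^-1, for any lam
b = 1 - a                              # P(X > 3) = e^-1
c = exp_cdf(3, lam) - exp_cdf(2, lam)  # P(2 <= X <= 3)
d = 1 - exp_cdf(1, lam)                # memoryless: P(X > 3.5 | X > 2.5) = P(X > 1)
print(a, b, c, d)
```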

Theorem: X has an exponential distribution iff X is a positive continuous r.v. and
P(X > s + t | X > s) = P(X > t) for all s, t > 0.
Proof: Omitted

Gamma distribution
The Gamma distribution is more suitable for describing real-world applications that follow exponential-type patterns. The general form of such a probability density is

f(x) = k x^(α−1) e^(−x/β)  for x > 0
       0                   elsewhere

where α > 0 and β > 0, and k must be such that the total area under the curve is equal to 1.

In evaluating k, using calculus theory, the Gamma function, which depends only on α, is derived:

Γ(α) = ∫₀^∞ y^(α−1) e^(−y) dy  for α > 0

The Gamma function follows the recursion formula

Γ(α) = (α − 1) Γ(α − 1);  and for a positive integer α, Γ(α) = (α − 1)!

where Γ(1) = ∫₀^∞ e^(−y) dy = 1 and Γ(1/2) = √π.

Thus ∫₀^∞ k x^(α−1) e^(−x/β) dx = k β^α Γ(α) = 1

A random variable X with a Gamma distribution has the probability density function

g(x; α, β) = (1/(β^α Γ(α))) x^(α−1) e^(−x/β)  for x > 0
             0                                elsewhere

where α > 0 and β > 0.
 The mean μ = αβ and the variance V(X) = αβ²
 Observe the graphs of gamma densities for different pairs of values of α and β

Exercise: In a certain city, the daily consumption of electric power in millions of kilowatt-hours
can be treated as a random variable having a Gamma distribution with 𝛼 = 3 and 𝛽 = 2.
(i) What is the average consumption of electric power per day by the city?
(ii) If the power plant of this city has a daily capacity of 12 million kilowatt-hours, what
is the probability that this power supply will be inadequate on any given day?
Answer:
(i) Average = αβ = 3 × 2 = 6 (million kilowatt-hours)
(ii) P(daily consumption of electric power ≥ 12) = ∫₁₂^∞ (1/(2³ Γ(3))) x^(3−1) e^(−x/2) dx
     = 1 − ∫₀¹² (1/(2³ Γ(3))) x² e^(−x/2) dx
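Because α = 3 is an integer here, the integral in (ii) can be finished in closed form: for an integer shape α, P(X > x) = e^(−x/β) Σ_{j=0}^{α−1} (x/β)^j / j! (obtained by repeated integration by parts). A short sketch:

```python
import math

def gamma_sf_int_shape(x, alpha, beta):
    """P(X > x) for a Gamma r.v. with integer shape alpha and scale beta."""
    t = x / beta
    return math.exp(-t) * sum(t**j / math.factorial(j) for j in range(alpha))

mean = 3 * 2                                 # (i) alpha * beta = 6
p_inadequate = gamma_sf_int_shape(12, 3, 2)  # (ii) P(X > 12) = 25 e^-6
print(mean, p_inadequate)                    # 6 and ~0.062
```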

Sampling distributions

Let us draw all possible samples of size n from a given population of size N, and consider computing a statistic (the mean, a proportion, or the standard deviation) for each sample.

The probability distribution of this statistic is called a sampling distribution.


Variability of a Sampling Distribution

The variability of a sampling distribution is measured by its variance (or by its std. deviation).

This variability will depend on;

 𝑁 : The number of observations in the population.


 𝑛 : The number of observations in the sample.
 The method used to select the samples at random.

Note: If N is much larger than n, then n/N is fairly small and the sampling distribution has roughly the same sampling error irrespective of whether sampling is done with or without replacement.

If sampling is done without replacement and the sample represents a significant fraction (say, 1/10) of the population size, the sampling error will be clearly smaller.

The Central Limit Theorem


The Central Limit Theorem (CLT) states that the probability distribution of any statistic (or the
sampling distribution for any statistic) will be normal or nearly normal, if the sample size is
“large enough”. Thus the CLT permits approximate calculations for a variety of distributions.
Many statisticians say that a sample size of 30 is “large enough” as a rule of thumb.
These are some other instances in which the sample size can be considered as large enough.
 The population distribution is normal.
 The sampling distribution is symmetric, unimodal, without outliers, and the sample
size is 15 or less.
 The sampling distribution is moderately skewed, unimodal, without outliers, and the
sample size is between 16 and 40.
 The sample size is greater than 40, without outliers.

1. Sampling Distribution of the Mean

Let x̄ᵢ be the mean of the i-th sample of size n, and suppose m such samples are drawn from this large population.

Taking the average of the sample means,

(Σᵢ x̄ᵢ)/m = μ_x̄ = µ  (population mean)

And the standard error of the sampling distribution is

σ_x̄ = σ √(1/n − 1/N) → σ √(1/n)  as N → ∞

Thus, σ_x̄ = σ/√n

Therefore, we can specify the sampling distribution of the mean as

x̄ ~ N(µ, σ²/n);  whenever two conditions are met:

 The population is normally distributed, or the sample size is sufficiently large.
 The population standard deviation σ is known.
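A quick simulation (a hypothetical Normal population with µ = 50, σ = 10; values chosen only for illustration) shows the sample means centring on µ with spread close to σ/√n:

```python
import math
import random

random.seed(1)
mu, sigma, n, m = 50.0, 10.0, 25, 5000   # illustrative population and sample counts

# Draw m samples of size n and record each sample mean
means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(m)]

avg_of_means = sum(means) / m
se = math.sqrt(sum((x - avg_of_means) ** 2 for x in means) / (m - 1))
print(avg_of_means)               # close to mu
print(se, sigma / math.sqrt(n))   # empirical SE vs sigma/sqrt(n) = 2.0
```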

2. Sampling Distribution of the Proportion

Let the probability of success be P and the probability of failure be Q in a population. From this population of size N, suppose that we draw all possible samples of size n. Finally, within each sample, suppose that we determine the proportion of successes p and failures q. In this way, we create a sampling distribution of the proportion.

Let pᵢ be the proportion of successes in the i-th sample of size n, and suppose m such samples are drawn from this large population.

Taking the mean of the sample proportions,

(Σᵢ pᵢ)/m = μ_p = P  (population proportion of success)

And the standard error of the sampling distribution is

σ_p = √(PQ(1/n − 1/N)) → √(PQ/n)  as N → ∞

Thus, σ_p = √(PQ/n)

Therefore, we can specify the sampling distribution of the proportion as

p ~ N(P, PQ/n);  whenever the sample size is sufficiently large and the population probability of success (P) is known.

Example:

1. Suppose that a biased coin has probability p = 0.4 of heads. In 1000 tosses, what is the probability that the number of heads exceeds 410?

2. Find the probability that of the next 120 births, no more than 40% will be boys. Assume equal probabilities for the births of boys and girls. Assume also that the number of births in the population (N) is very large, essentially infinite.
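Example 1 can be answered with the normal approximation above: the head count has mean nP = 400 and sd √(nPQ) ≈ 15.49 (the standard normal cdf is evaluated via math.erfc):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

P, n = 0.4, 1000
mu = n * P                               # 400 expected heads
sigma = math.sqrt(n * P * (1 - P))       # ~15.49
p_exceeds = 1 - phi((410 - mu) / sigma)  # P(heads > 410), no continuity correction
print(p_exceeds)                         # ~0.26
```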

Exercises:

1. A true-false examination has 48 questions. Jane has probability 3/4 of answering a question
correctly. Ama just guesses on each question. A passing score is 30 or more correct answers.
Compare the probability that Jane passes the exam with the probability that Ama passes it.
Jane’s score has distribution B(48,0.75), so the probability that Jane’s score is 30 or more is
1-P(X<=29) = 0.9627. In case your calculator doesn’t give an answer, you will have to use a
normal approximation to the Binomial distribution (based on the Central Limit Theorem)

2. A restaurant feeds 400 customers per day. On the average 20 percent of the customers order
apple pie.
(a) Give a range for the number of pieces of apple pie ordered on a given day such that you
can be 95 percent sure that the actual number will fall in this range.
(b) How many customers must the restaurant have, on the average, to be at least 95 percent
sure that the number of customers ordering pie on that day falls in the 19 to 21 percent
range?

3. A rookie is brought to a baseball club on the assumption that he will have a 0.3 batting
average. (Batting average is the ratio of the number of hits to the number of times at bat.) In
the first year, he comes to bat 300 times and his batting average is 0.267. Assume that his at
bats can be considered Bernoulli trials with probability 0.3 for success. Could such a low
average be considered just bad luck or should he be sent back to the minor leagues?

3. Student’s t Distribution

A particular form of the t distribution is determined by its degrees of freedom. The “degrees of
freedom” refers to the number of independent observations in a set of data.

In general, the degrees of freedom of an estimate of a parameter is equal to the number of


independent scores that go into the estimate minus the number of parameters used as
intermediate steps in the estimation of the parameter itself (which, in sample variance, is one,
since the sample mean is the only intermediate step).

Lane, David M. “Degrees of Freedom”. HyperStat Online. Statistics Solutions. http://davidmlane.com/hyperstat/A42408.html. Retrieved 2008-08-21.

Suppose we have a simple random sample of size n drawn from a Normal population with mean
𝜇 and standard deviation 𝜎. Let 𝑥̅ denote the sample mean and s, the sample standard deviation.


Then the quantity t = (x̄ − µ)/(s/√n) has a t distribution with n − 1 degrees of freedom.

The 𝑡 score produced by this transformation can be associated with a unique cumulative
probability. This cumulative probability represents the likelihood of finding a sample mean less
than or equal to 𝑥̅ , given a random sample of size n.

The notation 𝑡 represents the t-score that has a cumulative probability of (1 - α).

Example: t0.05 = 2.92, then t0.95 = -2.92 for df=3

Properties of the t Distribution

 The mean of the distribution is equal to 0 .


 The variance is equal to v / (v − 2), where v is the degrees of freedom and v > 2.
 The variance is always greater than 1, although it is close to 1 when there are many
degrees of freedom. With infinite degrees of freedom, the t distribution is the same as the
standard normal distribution.

When to use the t Distribution

The t distribution can be used with any statistic having a bell-shaped (i.e., approximately normal) distribution. In particular, when the population is large but the sample size is small and the standard deviation of the population is unknown, the t-distribution can be applied.

Example: t = (p − P)/√(PQ/n) has a t distribution with n − 1 degrees of freedom

When not to use the t-distribution

The t distribution should not be used with small samples from populations that are not
approximately normal.

Example:

1. A random sample of 12 observations from a normal population with mean 48 produced the following estimates: x̄ = 47.1 and s = 4.7. Find the probability of getting a sample of the same size with its mean less than or equal to the population mean.

2. The MD of Orrange light bulb manufactures claims that an average of their light bulbs lasts
300 days. An investigator randomly selects 15 bulbs for testing and those bulbs last an
average of 290 days, with a standard deviation of 50 days. Assuming MD’s claim as true,
determine the probability that 15 randomly selected bulbs would have an average life of no
more than 290 days?
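For Example 2, the quantity t = (x̄ − µ)/(s/√n) is computed below; the probability P(x̄ ≤ 290) is then read from t tables with 14 degrees of freedom (a table lookup, not computed here):

```python
import math

mu, xbar, s, n = 300, 290, 50, 15
t = (xbar - mu) / (s / math.sqrt(n))   # df = n - 1 = 14
print(t)                               # ~ -0.775
```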

4. Chi-square Distribution

The chi-square statistic can be calculated from a sample of size n drawn from a normal population using the following equation:

χ² = (n − 1)s²/σ²

When sampling is done for an infinite number of times, and by calculating the chi-square statistic
for each sample, the sampling distribution for the chi-square statistic can be obtained. It is then
called the chi-square distribution.

The chi-square distribution also depends on the degrees of freedom; (𝑛 − 1).

Properties of the chi-square distribution:

 The mean of the distribution is equal to the number of degrees of freedom: μ = v.


 The variance is equal to two times the number of degrees of freedom: σ2 = 2v
 When the degrees of freedom v ≥ 2, the pdf of the chi-square distribution attains its maximum value at x = v − 2.


 As the degrees of freedom increase, the chi-square curve approaches a normal
distribution.

Cumulative Probability of the Chi-Square Distribution

The chi-square distribution is constructed so that the total area under the curve is equal to 1. The probability that the value of a chi-square statistic will fall between 0 and A, P(χ² ≤ A), is illustrated by the following diagram.

Using the following Chi-Square Distribution table, one can find the critical χ² value when the probability of exceeding the critical value is given.

Example: My Cell company has developed a new cell phone battery. On average, the battery
lasts 60 minutes on a single charge. The standard deviation is 5 minutes. Suppose the
manufacturing department runs a quality control test. They randomly select 10 batteries. The
standard deviation of the selected batteries is 6 minutes.

a) What is the chi-square statistic which represents this test?

b) What is the probability that the standard deviation of any sample of size 10 would be greater
than 6 minutes?
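Part (a) is a direct substitution into χ² = (n − 1)s²/σ²; for part (b), P(χ²₉ > 12.96) would then be read from the chi-square table with 9 degrees of freedom (it falls between the 0.25 and 0.10 tail probabilities):

```python
n, s, sigma = 10, 6, 5
chi2 = (n - 1) * s**2 / sigma**2   # 9 * 36 / 25
print(chi2)                        # 12.96
```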

5. F Distribution

The distribution of all possible values of the f statistic is called an F distribution, with v1 = n1 -
1 and v2 = n2 - 1 degrees of freedom. The f statistic, also known as an f value, is a random
variable that has an F distribution.

How to compute an f statistic:

 Select a random sample of size n1 from a normal population, having a standard deviation
equal to σ1.
 Select an independent random sample of size n2 from a normal population, having a
standard deviation equal to σ2.
 The f statistic is the ratio of s12/σ12 and s22/σ22.

The following equivalent expressions are commonly used for the f statistic:

f(v1, v2) = [ s12/σ12 ] / [ s22/σ22 ]


f(v1, v2) = [ s12. σ22 ] / [ s22. σ12 ]

f(v1, v2) = [ χ21 / v1 ] / [ χ22 / v2 ]


f(v1, v2) = [ χ21.v2 ] / [ χ22.v1 ]

The curve of the F distribution depends on the degrees of freedom, v1 and v2.

Properties of the F distribution:

 The mean of the distribution is equal to v2 / (v2 − 2) for v2 > 2.

 The variance is equal to [ 2v2²( v1 + v2 − 2 ) ] / [ v1( v2 − 2 )²( v2 − 4 ) ] for v2 > 4.

Cumulative Probability of the F Distribution

This cumulative probability represents the likelihood that the f statistic is less than or equal to a
specified value.

F-distribution table can be used to find the value of an f statistic having a cumulative probability
of (1 - α); represented by fα.

Thus, f0.05(v1, v2) refers to the value of the f statistic having a cumulative probability of (1 − 0.05) = 0.95, with v1 and v2 degrees of freedom.

Example:

Suppose a sample of 11 cows was selected at random from a population in which the population standard deviation of weight is 5 kg; the estimated sample sd is 4.5 kg. Another sample of 7 bulls was taken in a similar way, with population sd 3.5 kg and sample sd 4 kg.

a) Compute an f-statistic.

b) Determine the associated cumulative probability by finding an approximate f-value to the


above answer from the f-tables available for different significance levels ( 𝛼 ).

c) Interpret the probability you found.

Reference for f-table: http://www.socr.ucla.edu/Applets.dir/F_Table.html
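Part (a) follows directly from f = (s1²/σ1²)/(s2²/σ2²), with v1 = 10 and v2 = 6 degrees of freedom:

```python
s1, sig1, n1 = 4.5, 5.0, 11   # cows: sample sd, population sd, sample size
s2, sig2, n2 = 4.0, 3.5, 7    # bulls

f = (s1**2 / sig1**2) / (s2**2 / sig2**2)
v1, v2 = n1 - 1, n2 - 1
print(f, v1, v2)              # ~0.620 with (10, 6) degrees of freedom
```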

Upgrade your knowledge by:

 Finding the pdf ‘s of the above sampling distributions.


 Studying the patterns of cdf ‘s of the above sampling distributions.

Confidence Intervals

A confidence interval gives a range of values for a population parameter; over repeated sampling, intervals constructed in this way contain the parameter 100(1 − α)% of the time.

In general a Confidence Interval (CI) is

Statistic ± margin of error at 100(1 − α)% confidence level

Confidence interval for mean and variance

The following table includes the standard errors of some statistics to help you in finding the confidence
intervals for the respective population parameters.

Statistic Standard Error (SE)

Sample mean, 𝑥 s/√n

Sample proportion, p √ [ p(1 - p) / n ]

Difference between means, 𝑥̅ − 𝑥̅ √ [ s21 / n1 + s22 / n2 ]

Difference between proportions, p1 - p2 √ [ p1(1-p1) / n1 + p2(1-p2) / n2 ]

100(1 − α)% confidence interval for population mean (µ)

x̄ ± Z_(α/2) (SE)

Margin of Error = Z_(α/2) (SE)

100(1 − α)% confidence interval for population proportion of success (P)

p ± Z_(α/2) (SE)


100(1 − α)% confidence interval for population variance (σ²)

You know that

Chi-square statistic = (n − 1)s²/σ²  and  (n − 1)s²/σ² ~ χ²₍ₙ₋₁₎

Thus, P( χ²_(1−α/2),(n−1) ≤ (n − 1)s²/σ² ≤ χ²_(α/2),(n−1) ) = 1 − α

⇒ P( 1/χ²_(α/2),(n−1) ≤ σ²/[(n − 1)s²] ≤ 1/χ²_(1−α/2),(n−1) ) = 1 − α

⇒ (n − 1)s²/χ²_(α/2),(n−1) ≤ σ² ≤ (n − 1)s²/χ²_(1−α/2),(n−1)

Computational Exercise

Breakdown voltage is a characteristic of an insulator that defines the maximum voltage difference that
can be applied across the material before the insulator collapses and conducts. In solid insulating
materials, this usually creates a weakened path within the material by creating permanent molecular or
physical changes by the sudden current. Within rarefied gases found in certain types of lamps,
breakdown voltage is also sometimes called the "striking voltage". [Wikipedia]
The breakdown voltage of a material is observed on 17 experimental units, as it is not a definite value but a form of failure. Thus we have n = 17 and s² = 137324.3. Find the 95% confidence interval for σ², to describe the population variance (the variance of the breakdown voltage of the material).
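A sketch of the computation; the two critical values χ²_(0.025, 16) ≈ 28.845 and χ²_(0.975, 16) ≈ 6.908 are taken from standard chi-square tables:

```python
n, s2 = 17, 137324.3
chi2_upper = 28.845   # chi-square(0.025, df = 16), from tables
chi2_lower = 6.908    # chi-square(0.975, df = 16), from tables

lower = (n - 1) * s2 / chi2_upper   # lower 95% limit for sigma^2
upper = (n - 1) * s2 / chi2_lower   # upper 95% limit for sigma^2
print(lower, upper)
```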

Hypothesis Testing
A statistical hypothesis is an intelligent educated guess/assumption about a population parameter,
which may or may not be true. There are two forms of statistical hypotheses.

 Null hypothesis: This is denoted by H0, is usually the hypothesis that sample observations result
purely from chance.

 Alternative hypothesis: This is denoted by H1 or Ha, is the hypothesis that sample observations
are influenced by some non-random cause.

The process of hypothesis testing needs

 A hypothesis on a population parameter


 A test statistic under the null hypothesis
 The p value of the test statistic
 Decision rule based on a significance level

Decision Errors

              Reject H0            Do not reject H0
H0 True       Type I error (α)     Correct decision
H0 False      Correct decision     Type II error (β)

 Significance level =𝜶=P(Type I error)


 Power of the test = 1- 𝜷

Rejection Criterion for null hypothesis

 P-value: The strength of the evidence against the null hypothesis is measured by the P-value; a small P-value indicates strong evidence against H0.

 Region of rejection in One-Tailed and Two-Tailed Tests




Mean test (One sample)

Null Hypothesis          Alternative Hypothesis      Type of test
H0 : µ = A               H1 : µ ≠ A                  Two tail test
H0 : µ ≤ A               H1 : µ > A                  One tail test
H0 : µ ≥ A               H1 : µ < A                  One tail test

Test statistics (Normal population with mean “A” under H0):

 Z = (x − µ)/σ ; σ known; testing on a single sample point
 Z = (x̄ − µ)/(σ/√n) ; σ known; testing on a single sample mean
 T = (x̄ − µ)/(s/√n) ; σ unknown; df = n − 1

Examples:
1. Bon Air Elementary School has 300 students. The principal of the school thinks that the average IQ
of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly
selected students. Among the sampled students, the average IQ is 108 with a standard deviation of
10. Based on these results, should the principal accept or reject her original hypothesis? Assume a
significance level of 0.01.
2. An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine
will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. Suppose a
simple random sample of 50 engines is tested. The engines run for an average of 295 minutes, with
a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes
against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of
significance. (Assume that run times for the population of engines are normally distributed.)
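Example 2 worked as a two-tailed Z test (n = 50 is large, so s is used in place of σ); the two-sided p-value uses math.erfc:

```python
import math

mu0, xbar, s, n = 300, 295, 20, 50
z = (xbar - mu0) / (s / math.sqrt(n))           # ~ -1.77
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
print(z, p_two_sided)
print(p_two_sided < 0.05)   # False: do not reject H0 at the 5% level
```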


Mean Test (Two Sample)

Hypotheses: H0 : µ1 = µ2 against H1 : µ1 ≠ µ2 (two tail test; one-tailed alternatives are handled analogously).

When the two population variances are known and not equal:

Z = [(x̄1 − x̄2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2)

When the two population variances are known and equal (σ1 = σ2 = σ):

Z = [(x̄1 − x̄2) − (µ1 − µ2)] / [σ √(1/n1 + 1/n2)]

When the two population variances are unknown but equal (pooled t, df = n1 + n2 − 2):

T = [(x̄1 − x̄2) − (µ1 − µ2)] / [s_p √(1/n1 + 1/n2)],  where s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

When the two population variances are unknown and not equal:

T = [(x̄1 − x̄2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)

df = (s1²/n1 + s2²/n2)² / { [ (s1²/n1)² / (n1 − 1) ] + [ (s2²/n2)² / (n2 − 1) ] }

or the smaller of n1 − 1 and n2 − 1.


Proportion Tests

Null Hypothesis        Alternative Hypothesis      Type of test
H0 : P = A             H1 : P ≠ A                  Two tail test
H0 : P ≤ A             H1 : P > A                  One tail test
H0 : P ≥ A             H1 : P < A                  One tail test

Test Statistic: Z = (p − A)/√(A(1 − A)/n), where p is the sample proportion, for sufficiently large n.




Paired t-test

Example:

The weights of 9 obese women before and after 12 weeks on a very low calorie diet were as follows:

Before After Difference


117.3 83.3 -34.0
111.4 85.9 -25.5
98.6 75.8 -22.8
104.3 82.9 -21.4
105.4 82.3 -23.1
100.4 77.7 -22.7
81.7 62.7 -19.0
89.5 69.0 -20.5
78.2 63.9 -14.3
Test whether the expected weight loss is at least 20kg for obese women after the treatment of this low-
calorie diet for 12 weeks. Use 5% significance level.
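A sketch of the paired test: with d = after − before, “expected loss of at least 20 kg” corresponds to µ_d = −20 under H0, tested against H1: µ_d < −20 with 8 degrees of freedom (the 5% critical value t(0.05, 8) ≈ 1.86 comes from tables):

```python
import math

diffs = [-34.0, -25.5, -22.8, -21.4, -23.1, -22.7, -19.0, -20.5, -14.3]
n = len(diffs)
dbar = sum(diffs) / n                                          # ~ -22.59
sd = math.sqrt(sum((d - dbar) ** 2 for d in diffs) / (n - 1))  # ~ 5.32

t = (dbar - (-20)) / (sd / math.sqrt(n))   # ~ -1.46, df = 8
print(dbar, sd, t)
# |t| = 1.46 < 1.86 ~ t(0.05, df = 8), so H0 is not rejected at the 5% level
```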

Variance Tests

Acknowledgement: http://www.xycoon.com/

Example:

1. The data 159.9, 187.2, 180.1, 158.1, 225.5, 163.7, and 217.3 consists of the weights, in pounds, of a
random sample of seven individuals taken from a population that is normally distributed. The
variance of this sample is given as 753.04.
Test the null hypothesis H0: σ² = 750.0 against the alternative hypothesis H1: σ² ≠ 750.0 at a level of significance of 0.3.

2. Students have collected the data 27, 29, 22, 21, 26, 28, 24, and 29 from one population and the data 19, 18, 24, 18, 22, and 15 from another. The variance of the first sample is 9.64286 and the variance of the second sample is 10.2667. The ratio of the first variance to the second is 0.939239. Test the null hypothesis that the two variances are equal, H0: σ1²/σ2² = 1, against the alternative hypothesis that the two variances are not equal, H1: σ1²/σ2² ≠ 1, where σ1² is the variance of the first population and σ2² is the variance of the second population, at a significance level of 0.10.
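The two test statistics are simple ratios; for exercise 1, χ² = (n − 1)s²/σ₀² with 6 degrees of freedom, and for exercise 2 the F statistic reduces to s1²/s2² (since H0 sets σ1² = σ2²):

```python
# Exercise 1: chi-square statistic for H0: sigma^2 = 750
n1, s2_sample, sigma0_sq = 7, 753.04, 750.0
chi2 = (n1 - 1) * s2_sample / sigma0_sq   # ~6.02, df = 6

# Exercise 2: F statistic for H0: equal variances, df = (7, 5)
f = 9.64286 / 10.2667                     # ~0.939
print(chi2, f)
```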

Exercises on hypothesis testing:

1. An insurance company is reviewing its current policy rates. When originally setting the rates
they believed that the average claim amount was $1,800. They are concerned that the true
mean is actually higher than this, because they could potentially lose a lot of money. They
randomly select 40 claims, and calculated a sample mean of $1,950. Assuming that the standard
deviation of claims is $500, and set 𝛼= 0.05, test to see if the insurance company should be
concerned.

2. Trying to encourage people to stop driving to campus, the university claims that on average it
takes people 30 minutes to find a parking space on campus. I do not think it takes so long to find
a spot. In fact I have a sample of the last five times I drove to campus, and I calculated 𝑥̅ = 20.
Assuming that the time it takes to find a parking spot is normal, and that s² = 6 minutes, perform a hypothesis test at level α = 0.10 to see if my claim is correct.

3. A sample of 40 sales receipts from a grocery store has 𝑥̅ = $137 and s2= $30.2. Use these values
to test whether or not the mean value of a receipt at the grocery store is different from $150.

4. The actual proportion of families in a certain city who own, rather than rent their home is 0.70.
If 84 families in this city are interviewed at random and their response to the question of
whether they own their home, are recorded. 61 of them have responded saying that they own
the home. Using a suitable test statistic test the claim that the population proportion of owning
a home is 0.7.
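Exercises 1 and 4 are large-sample z-tests, so they can be sketched with the standard library's `statistics.NormalDist` (a quick check, not a substitute for working the tests by hand):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Exercise 1: one-sided z-test for a mean with sigma known.
z1 = (1950 - 1800) / (500 / sqrt(40))
p1 = 1 - Z.cdf(z1)                  # upper-tail p-value
print(round(z1, 3), round(p1, 4))   # 1.897 0.0289

# Exercise 4: z-test for a proportion, H0: p = 0.7.
p_hat = 61 / 84
z4 = (p_hat - 0.7) / sqrt(0.7 * 0.3 / 84)
print(round(z4, 3))                 # 0.524

# One-sided critical value for alpha = 0.05
print(round(Z.inv_cdf(0.95), 3))    # 1.645
```

Since 1.897 > 1.645 (p ≈ 0.029 < 0.05), the insurance company's concern is supported; since |0.524| < 1.96, the claim that the proportion of home owners is 0.7 is not rejected.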

Chi-square tests
1. Chi-square test of Association

2 × 2 Contingency Table

The outcome of a certain IQ test is tabularized as follows.

           Pass            Fail            Total

Male       O = 28, E =     O = 12, E =     40

Female     O = 34, E =     O = 26, E =     60

Total      62              38              100

The two variables here are “Gender of Candidate” and “Results of the Candidate”.

When you have two categorical variables from the same population, you may test whether there
is a significant association between the two variables using the Chi-square Test.

Hypothesis

H₀: There is no relationship between gender and results

H₁: There is a significant relationship between gender and results

Test Statistic

χ²_cal = Σ (Oᵢ − Eᵢ)² / Eᵢ ~ χ²_(df), α% ; df = (r − 1)(c − 1)

Under H₀,

Expected frequency E = (row total × column total) / grand total

χ²_cal =

χ²_(1, 5%) = 3.84

Decision
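The expected frequencies and the test statistic for the table above can be verified with a short plain-Python sketch:

```python
# Observed counts for the IQ-test example (rows: Male, Female; cols: Pass, Fail).
observed = [[28, 12],
            [34, 26]]

row_totals = [sum(row) for row in observed]        # [40, 60]
col_totals = [sum(col) for col in zip(*observed)]  # [62, 38]
grand = sum(row_totals)                            # 100

# Expected frequency = (row total * column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

print([[round(e, 1) for e in row] for row in expected])  # [[24.8, 15.2], [37.2, 22.8]]
print(round(chi2, 3))                                    # 1.811
```

Since 1.811 < 3.84, H₀ is not rejected at the 5% level: there is no significant association between gender and results.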

𝒉 × 𝒌 Contingency Table

Example: A survey of 200 families known to be regular television viewers was undertaken.
They were asked which of the TV channels they watched most during a common week, and the
observations are as follows.

TV channel                        Region
watched most      North     East     South     West

1                 19        16       42        23

2                 6         11       26        7

3                 15        3        12        10

Test the hypothesis that there is no association between the TV channel watched most and the
Region, using the Chi-square test.

2. Chi-square Goodness of Fit test

Example 1

From a list of 500 digits the occurrence of each distinct digit is observed. Test at 5% significance
level, whether the sequence is a random sample from the Uniform distribution.

Digit 0 1 2 3 4 5 6 7 8 9

Frequency 40 58 49 53 38 56 61 53 60 32
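Under the Uniform hypothesis every digit has expected frequency 500/10 = 50, so the statistic is quick to check (a sketch in plain Python):

```python
# Observed frequencies of the digits 0-9 in the 500-digit list.
observed = [40, 58, 49, 53, 38, 56, 61, 53, 60, 32]
n = sum(observed)                 # 500
expected = n / len(observed)      # 50 under the Uniform hypothesis

chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi2, 2))             # 18.16
```

With df = 10 − 1 = 9, the 5% critical value is χ²₍₉, ₅%₎ = 16.92, so 18.16 > 16.92 and the hypothesis of a random sample from the Uniform distribution is rejected.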

Example 2

The table below gives the number of heavy rainstorms reported by 330 weather stations over a
one year period.
a) Find the expected frequencies of rainstorms given by the Poisson distribution having the
same mean and the total as the observed distribution.

b) Use the Chi-square test to check the adequacy of the Poisson distribution to model these
data.

# rainstorms          0     1     2     3     4     5     More than 5

# weather stations    102   114   74    28    10    2     0
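Part (a) only needs the sample mean and the Poisson pmf, so the expected frequencies can be sketched with the standard library:

```python
from math import exp, factorial

# Stations reporting 0, 1, ..., 5 and "more than 5" rainstorms.
counts = [102, 114, 74, 28, 10, 2, 0]
n = sum(counts)                                        # 330

# Sample mean; the "more than 5" class contributes nothing here (its count is 0).
mean = sum(k * c for k, c in enumerate(counts[:6])) / n
print(mean)                                            # 1.2

# Expected frequency for k storms: n * e^(-mean) * mean^k / k!
expected = [n * exp(-mean) * mean ** k / factorial(k) for k in range(6)]
expected.append(n - sum(expected))                     # "more than 5" takes the remainder
print([round(e, 1) for e in expected])                 # [99.4, 119.3, 71.6, 28.6, 8.6, 2.1, 0.5]
```

For part (b), the tail classes with small expected counts are pooled before applying the chi-square statistic, with df = (number of classes after pooling) − 1 − 1, since the mean was estimated from the data.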



Moments and Moment Generating Function


Moments

In Statistics, the mathematical expectations of powers of a random variable are called the moments of
its distribution.

Definition (rth moment of a r.v.)

The rth moment of a random variable X, denoted by μ′ᵣ, is the expected value of the random
variable's rth power; i.e. E(Xʳ).

For r = 1, 2, 3, …

μ′ᵣ = E(Xʳ) = Σ xʳ P(x) ; when X is discrete

μ′ᵣ = E(Xʳ) = ∫₋∞^∞ xʳ f(x) dx ; when X is continuous

Special Application: μ′₁ = E(X¹) = E(X) = μ (mean of r.v. X)

Definition (rth moment about the mean of a r.v.)

The rth moment about the mean of a random variable X, denoted by μᵣ, is the expected value of
(X − μ)ʳ; i.e. E[(X − μ)ʳ].

For r = 1, 2, 3, …

μᵣ = E[(X − μ)ʳ] = Σ (x − μ)ʳ P(x) ; when X is discrete

μᵣ = E[(X − μ)ʳ] = ∫₋∞^∞ (x − μ)ʳ f(x) dx ; when X is continuous

Special Application: μ₂ = E[(X − μ)²] = V(X) = σ² (variance of r.v. X)

Theorem:

σ² = μ′₂ − μ² ; i.e. V(X) = E(X²) − [E(X)]²

Proof: V(X) = E[(X − μ)²] = E(X² − 2μX + μ²) = E(X²) − 2μE(X) + μ² = E(X²) − [E(X)]².

Moment Generating Function (MGF)

- The moments of most distributions can be determined directly by evaluating the respective
  integrals or sums.
- MGF is an alternative procedure, which sometimes provides considerable simplifications to
  find the moments.
- MGF can be used to find the expected value of a r.v. and its variance.

Definition

M_X(t) is the value which the function M_X assumes for the real variable t.

The MGF of a r.v. X is given by;

M_X(t) = E(e^{tX}) = Σ e^{tx} P(x) ; when X is discrete

M_X(t) = E(e^{tX}) = ∫₋∞^∞ e^{tx} f(x) dx ; when X is continuous

Expanding e^{tX} as a series,

M_X(t) = E(e^{tX}) = E[1 + tX + (tX)²/2! + (tX)³/3! + ⋯ + (tX)ʳ/r! + ⋯]

M_X(t) = E(e^{tX}) = 1 + tE(X) + t²E(X²)/2! + t³E(X³)/3! + ⋯ + tʳE(Xʳ)/r! + ⋯

Differentiating with respect to t,

M′_X(t) = d M_X(t)/dt = 0 + E(X) + 2tE(X²)/2! + 3t²E(X³)/3! + ⋯ + r tʳ⁻¹E(Xʳ)/r! + ⋯

M″_X(t) = d² M_X(t)/dt² = 0 + 0 + 2E(X²)/2! + 6tE(X³)/3! + ⋯ + r(r − 1)tʳ⁻²E(Xʳ)/r! + ⋯
Properties of the MGF

1. M′_X(0) = E(X) = μ
2. M″_X(0) = E(X²)
3. V(X) = M″_X(0) − [M′_X(0)]²

Example 1

Find the MGF, E(X) and V(X) for the r.v. X whose pdf is given by

f(x) = { e^(−x) , for x > 0
       { 0 ,      otherwise
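Taking the density in Example 1 to be f(x) = e^(−x) for x > 0 (the standard exponential), the integral gives M_X(t) = 1/(1 − t) for t < 1. A small sketch checks Properties 1-3 numerically with finite differences:

```python
# MGF of the standard exponential: M(t) = 1/(1 - t), valid for t < 1.
def M(t):
    return 1 / (1 - t)

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)             # M'(0)  ~ E(X)
m2 = (M(h) - 2 * M(0) + M(-h)) / h ** 2   # M''(0) ~ E(X^2)
var = m2 - m1 ** 2                        # V(X) = E(X^2) - [E(X)]^2

print(round(m1, 4), round(m2, 4), round(var, 4))  # 1.0 2.0 1.0
```

This matches the exact results E(X) = 1, E(X²) = 2 and V(X) = 1 obtained by differentiating M(t) analytically.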

Example 2

Suppose Y ~ Bin(n, p). Find E(Y) and V(Y) using its MGF.

Exercises

1. Let Y be a continuous r.v. with pdf f(y) = 2e^(−3y); y ≥ 0. Find the mean and the variance of Y.
2. Given that the probability distribution of a r.v. R is (1/8)·³Cᵣ for r = 1, 2, 3. Find the MGF, mean
   and variance for this random variable.

Linear Regression
The simplest technique in Statistics for predicting the values of a random variable can be considered
to be linear regression.

Simple Linear Regression


Mainly, the relationship between two continuous variables that can be measured simultaneously
on an experimental unit in an experiment is considered, where one variable is taken as the
dependent variable (y) and the other as the independent or explanatory variable (x).

Eg: Temperature and Pressure

Indications

1. Scatter Diagram
2. Correlation Coefficient

Relationship between X and Y


[Scatter plot of Y against X]

Simple Linear Regression

In simple linear regression, we allow only one independent variable to predict the dependent
variable. Under multiple linear regression there can be many independent variables predicting a
single dependent variable.

In here, a set of measurements (xᵢ, yᵢ); i = 1, 2, 3, …, n on n individuals is taken, and if evidence is
available on a scatter diagram for a linear relationship between x and y, a regression function
will be established to model the relationship.

Relationship between X and Y


[Scatter plot of Y against X with fitted line y = −0.5113x + 38.405, R² = 0.5231; ε marks an
observation's vertical deviation from the line]

The coefficient of determination (denoted by R2) is a key output of regression analysis. It is


interpreted as the proportion of the variance in the dependent variable that is predictable from the
independent variable.

General Model

𝒀 = 𝜶 + 𝜷𝑿

For an existing individual

yᵢ = α + βxᵢ + εᵢ ;

εᵢ = the observation's deviation from the model Y = α + βX

OR

εᵢ = error in prediction

Fitting a Linear Regression Function

First Step :Plot the scatters and look for any evidence for a linear relationship.

Assumption:

Error terms are independently and identically distributed as Normal with mean zero and
variance σ².

i.e. ε ~ N(0, σ²)

Parameter Estimation (estimating α & β):

Estimation will be carried out based on the principle of “least squares”.

In least square estimation, the “sum of squares of errors” will be minimized.

i.e. we will find α̂, β̂ such that Σ εᵢ² is at a minimum.

Let ESS = Error Sum of Squares

ESS = Σ εᵢ² = Σ (yᵢ − α − βxᵢ)²

ESS is at a minimum when ∂(ESS)/∂α = 0 and ∂(ESS)/∂β = 0.

Computing β̂ ; the least square estimate of β (regression coefficient)

∂(ESS)/∂β = ∂[Σ (yᵢ − α − βxᵢ)²]/∂β = −2 Σ xᵢ(yᵢ − α̂ − β̂xᵢ) = 0

Σ xᵢyᵢ − α̂ Σ xᵢ − β̂ Σ xᵢ² = 0

Σ xᵢyᵢ = α̂ Σ xᵢ + β̂ Σ xᵢ² − − − − − − − − (1)

Let α̂ be the least square estimate of α and

∂(ESS)/∂α = ∂[Σ (yᵢ − α − βxᵢ)²]/∂α = −2 Σ (yᵢ − α̂ − β̂xᵢ) = 0

Σ yᵢ − nα̂ − β̂ Σ xᵢ = 0

Σ yᵢ = nα̂ + β̂ Σ xᵢ − − − − − − − − (2)

Equation (1) and (2) are called the Normal Equations.

From (2),

α̂ = Σ yᵢ / n − β̂ Σ xᵢ / n

α̂ = ȳ − β̂x̄

Substituting for α̂ in equation (1),

Σ xᵢyᵢ = (Σ yᵢ / n − β̂ Σ xᵢ / n) Σ xᵢ + β̂ Σ xᵢ²

Σ xᵢyᵢ = Σ xᵢ Σ yᵢ / n − β̂ (Σ xᵢ)² / n + β̂ Σ xᵢ²

β̂ = [n Σ xᵢyᵢ − Σ xᵢ Σ yᵢ] / [n Σ xᵢ² − (Σ xᵢ)²]

β̂ = Sxy / Sxx = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)²

α̂ = ȳ − β̂x̄   and   β̂ = (Σ xᵢyᵢ − n x̄ȳ) / (Σ xᵢ² − n x̄²)

Example:

X y
26.8 26.5
28.9 24.2
23.6 27.1
28.1 22.5
22.6 25.8
27.7 23.6
24.7 26.3
25.6 24.9
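The least-squares formulas above can be applied to this data set with a few lines of plain Python; the results agree with the fitted line y = −0.5113x + 38.405 annotated on the earlier scatter plot:

```python
# Example data: x = predictor, y = response.
x = [26.8, 28.9, 23.6, 28.1, 22.6, 27.7, 24.7, 25.6]
y = [26.5, 24.2, 27.1, 22.5, 25.8, 23.6, 26.3, 24.9]
n = len(x)

x_bar = sum(x) / n                  # 26.0
y_bar = sum(y) / n                  # 25.1125

Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
Sxx = sum((xi - x_bar) ** 2 for xi in x)

beta_hat = Sxy / Sxx                # slope
alpha_hat = y_bar - beta_hat * x_bar  # intercept

print(round(beta_hat, 4), round(alpha_hat, 3))  # -0.5113 38.405
```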

Confidence Interval for 𝜷

It can be proved that under the assumption ε ~ N(0, σ²); β̂ ~ N(β, σ²/Sxx).

When σ² is unknown it can be estimated by σ̂² = ESS/(n − 2).

P( −t₍n−2, α/2₎ ≤ (β̂ − β)/√(σ̂²/Sxx) ≤ t₍n−2, α/2₎ ) = (1 − α)

so a (1 − α)100% confidence interval for β is β̂ ± t₍n−2, α/2₎ √(σ̂²/Sxx).

Hypothesis Testing on Regression Coefficient

H₀: β = 0

H₁: β ≠ 0 ; regression coefficient is significantly different from 0



Test Statistic and its distribution

T = (β̂ − 0)/√(σ̂²/Sxx) ~ t₍n−2₎ under H₀

Analysis of Variance (ANOVA)

H₀: Regression line does not fit the data well

H₁: Regression line fits the data well

Source of          Sum of Squares (SS)    Degrees of      Mean Sum of      F-ratio                  p-value
variation                                 Freedom (df)    Squares (MS)                              (prob. > F)

REGRESSION         RSS = Σ (ŷᵢ − ȳ)²      1               RSS/1            F_cal =
(estimation via                                                            (RSS/1)/(ESS/(n−2))      P(F₁,ₙ₋₂ ≥ F_cal)
reg. line)                                                                 ~ F₁,ₙ₋₂

ERROR (Residual)   ESS = Σ (yᵢ − ŷᵢ)²     n − 2           ESS/(n − 2)
(error in
estimation)

TOTAL              TSS = Σ (yᵢ − ȳ)²      n − 1
(estimation +
error)

Example :

Description: These data are on the production of power from wind mills. Direct Current (DC)
output was measured against wind speed (in miles per hour).

Number of observations: 25

Variable Description
output Current output produced by the wind mill
speed Windspeed (in miles per hour)

Source: Joglekar, G., Schuenemeyer, J.H. and LaRiccia, V. (1989) Lack-of-fit testing when
replicates are not available, American Statistician, 43, pp. 135-143.

(speed,output)≡ (𝑥, 𝑦)
0.123,2.45 1.582,5.00 2.166,8.15
0.500,2.70 1.501,5.45 2.112,8.80
0.653,2.90 1.737,5.80 2.303,9.10
0.558,3.05 1.822,6.00 2.294,9.55
1.057,3.40 1.866,6.20 2.386,9.70
1.137,3.60 1.930,6.35 2.236,10.00
1.144,3.95 1.800,7.00 2.310,10.20
1.194,4.10 2.088,7.40
1.562,4.60 2.179,7.85

ANOVAa

Model Sum of Squares df Mean Square F Sig.

Regression 134.282 1 134.282 160.257 .000b

1 Residual 19.272 23 .838

Total 153.554 24

a. Dependent Variable: Electricity Production at the Wind Mill


b. Predictors: (Constant), Wind Speed (in miles per hour)
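As a sketch, the SPSS ANOVA table above can be reproduced from the raw data in plain Python, using the simple-regression identity RSS = β̂·Sxy:

```python
# Windmill data as listed above: x = first element of each pair, y = second.
data = [(0.123, 2.45), (0.500, 2.70), (0.653, 2.90), (0.558, 3.05),
        (1.057, 3.40), (1.137, 3.60), (1.144, 3.95), (1.194, 4.10),
        (1.562, 4.60), (1.582, 5.00), (1.501, 5.45), (1.737, 5.80),
        (1.822, 6.00), (1.866, 6.20), (1.930, 6.35), (1.800, 7.00),
        (2.088, 7.40), (2.179, 7.85), (2.166, 8.15), (2.112, 8.80),
        (2.303, 9.10), (2.294, 9.55), (2.386, 9.70), (2.236, 10.00),
        (2.310, 10.20)]
n = len(data)
x = [p[0] for p in data]
y = [p[1] for p in data]
x_bar, y_bar = sum(x) / n, sum(y) / n

Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta = Sxy / Sxx

tss = sum((yi - y_bar) ** 2 for yi in y)
rss = beta * Sxy              # regression SS for simple regression
ess = tss - rss
F = (rss / 1) / (ess / (n - 2))

print(round(tss, 3), round(rss, 3), round(ess, 3), round(F, 2))
# should reproduce the SPSS table: TSS 153.554, RSS 134.282, ESS 19.272, F about 160.26
```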

Multiple Linear Regression


Simple linear regression analysis is done when there is only one predictor variable. More often
the regression analysis is conducted with multiple predictor variables to predict a single response
variable.

We can represent the multiple regression model using matrix notation as follows,

Y = Xβ + ε

where,

Y = [y₁ y₂ ⋯ yₙ]ᵀ ,  β = [β₀ β₁ ⋯ βₖ]ᵀ ,  ε = [ε₁ ε₂ ⋯ εₙ]ᵀ ,

      | 1  X₁₁ ⋯ X₁ₖ |
X  =  | ⋮   ⋮   ⋱  ⋮  |
      | 1  Xₙ₁ ⋯ Xₙₖ |

ANOVA for multiple regression model

In multiple regression also, the total variation is decomposed as follows,

Σ (yᵢ − ȳ)² = Σ (ŷᵢ − ȳ)² + Σ (yᵢ − ŷᵢ)²

ANOVA table

SOV          SS                       DF       MS                    F-ratio

Regression   RSS = Σ (ŷᵢ − ȳ)²        k − 1    MRS = RSS/(k − 1)     F = MRS/MSE ~ F₍k−1, n−k₎

Error        ESS − by subtraction     n − k    MSE = ESS/(n − k)

Total        TSS = Σ (yᵢ − ȳ)²        n − 1

Partial F-test

This test assesses whether the addition of any specific independent variable, given others
already in the model significantly contributes to the prediction of 𝑌.

Suppose that we wish to test whether adding a variable X* to the model significantly improves the
prediction of the response Y, given that variables X₁, X₂, …, Xₚ are already in the model.

H₀: X* doesn't significantly add to the prediction of Y, given X₁, X₂, …, Xₚ are already in the
model.

H₁: X* does significantly add to the prediction of Y.

Test Statistics

F = { [ESS(RM) − ESS(FM)] / 1 } / { ESS(FM) / (n − p − 1) } ~ F₍1, n−p−1₎

where RM and FM denote the reduced and full models, and p is the number of predictors in the
full model.

Multi-collinearity in Regression Analysis

Multi-collinearity is defined as having predictor variables that are strongly correlated
with one or more of the other predictor variables.

Effect of multi-collinearity

- Effects on regression coefficients
- Effects on extra sum of squares
- Effects on fitted values and predictions
- Effects on tests of regression coefficients
Detecting multi-collinearity

- Correlation coefficients between predictor variables
- VIFs of the predictors
- Eigenvalues of the design matrix
Example: Refer Data Set 1-MLR in SPSS and determine the best model to describe 𝑌 and test
the multicollinearity. Comment on your results.

Exercise: Refer Data Set 2-MLR in SPSS,

a) Identify the relationship among variables.


b) Determine the best model to describe fish’s weight and comment on your results.
c) Test the multicollinearity

Design of Experiment
The statistical Design of Experiment is an efficient procedure for planning experiments
so that the data obtained can be analyzed to yield valid and objective conclusions.

An experimental design is the laying out of a detailed experimental plan in advance of


doing the experiment. Well-chosen experimental designs maximize the amount of information
that can be obtained for a given amount of experimental effort.

Treatments: These are the different procedures whose effects are to be measured and compared.

Experimental Units: The group of material to which a treatment is applied in a single trial of
the experiment.

Replication: Each treatment appearing more than once, or being applied to more than one unit, in
the experiment.

Example:

1) An experiment was conducted to determine how five different kind of work tasks affect a
worker’s pulse rate. In this experiment, 60 male workers were assigned at random to five
different groups so that there were 12 workers in each group. The five tasks were
randomly assigned to the groups so that each group gets one and only one task. The pulse
rates of workers were measured after each task.
a. Identify the experimental unit here.
b. How many treatments are there, and what are they?
c. How many replications are there in the experiment?
d. What is the response variable that should be analyzed to compare the effects of
the treatments?
2) A plant breeder wants to study the effect of three levels of nitrogen and three levels of
potassium on his new variety of paddy. He wants to try all possible nitrogen potassium
combinations in his experiment. He has four blocks of lands, each of which is divided
into 9 parts. Each combination is assigned at random to one part in each block.
a. Identify the experimental unit here.
b. How many treatments or combinations are there, and what are they?
c. How many replications are there?
Structures of the Experimental Design

Treatment Structure: Consists of the set of treatments, treatment combinations or population, that
the experimenter has selected for comparison.

Design Structure: Consists of the grouping of the units into homogeneous groups or blocks.
There are 4 types of design structures which are commonly used.

1. Completely Randomized Design (CRD)


CRD is the simplest of all experimental designs. In here, all experimental units are assumed
homogeneous and treatments are assigned to the experimental units completely at random.

Testing ANOVA

The model for one way classification in CRD is,

y_ij = μ + αᵢ + ε_ij ; i = 1, 2, …, t ; j = 1, 2, …, nᵢ

In ANOVA we are interested in testing the equality of the 𝑡 treatment means. Therefore, the
appropriate hypothesis of interest is,

H₀: μ₁ = μ₂ = ⋯ = μₜ

H₁: μᵢ ≠ μⱼ for at least one pair of (i, j)

Total sum of squares can be decomposed as,

𝑆𝑆(𝑇𝑜𝑡𝑎𝑙) = 𝑆𝑆(𝑇𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡) + 𝑆𝑆(𝐸𝑟𝑟𝑜𝑟)

Σᵢ Σⱼ (y_ij − ȳ..)² = Σᵢ nᵢ (ȳᵢ. − ȳ..)² + Σᵢ Σⱼ (y_ij − ȳᵢ.)²

ANOVA table for CRD model in general as follows,

SOV              DF       SS                              MS                      Test Statistic

Treatment        t − 1    SSTrt = Σ yᵢ.²/nᵢ − y..²/N      MSTrt = SSTrt/(t − 1)   F = MSTrt/MSE
(between Trts)                                                                    ~ F₍t−1, N−t₎

Error            N − t    SSE − by subtraction            MSE = SSE/(N − t)
(within Trts)

Total            N − 1    SST = ΣΣ y_ij² − y..²/N

Example:

1) An engineer is interested in determining if the RF power setting affects the etch rate, and
she runs a completely randomized experiment with four levels of power and five
replicates. Data are given below. Construct one way ANOVA table stating hypothesis,
model clearly. What is the conclusion from the hypothesis test?

RF Power Rate 1 Rate 2 Rate 3 Rate 4 Rate 5

160 575 542 530 539 570

180 565 593 590 579 610

200 600 651 610 637 629

220 725 700 715 685 710
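The ANOVA table for this CRD can be built from the sums-of-squares formulas above with a short plain-Python sketch:

```python
# Etch-rate replicates for each RF power level (the treatments).
groups = [
    [575, 542, 530, 539, 570],   # 160 W
    [565, 593, 590, 579, 610],   # 180 W
    [600, 651, 610, 637, 629],   # 200 W
    [725, 700, 715, 685, 710],   # 220 W
]
t = len(groups)
N = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups)
CF = grand ** 2 / N                                  # correction factor y..^2 / N

ss_total = sum(y ** 2 for g in groups for y in g) - CF
ss_trt = sum(sum(g) ** 2 / len(g) for g in groups) - CF
ss_error = ss_total - ss_trt

ms_trt = ss_trt / (t - 1)
ms_error = ss_error / (N - t)
F = ms_trt / ms_error

print(round(ss_trt, 2), round(ss_error, 2), round(F, 2))  # 66870.55 5339.2 66.8
```

Since F ≈ 66.8 greatly exceeds F₍₀.₀₅, ₃, ₁₆₎ = 3.24, H₀ is rejected: the RF power setting does affect the etch rate.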

2) A manufacturer suspects that the batches of raw material furnished by her supplier differ
significantly in calcium content. There is a large number of batches currently in the
warehouse. Five of these are randomly selected for study. A chemist makes five
determinations on each batch and obtains the following data.

Batch 1 Batch 2 Batch 3 Batch 4 Batch 5

23.46 23.59 23.51 23.28 23.29

23.48 23.46 23.64 23.40 23.46

23.56 23.42 23.46 23.37 23.37

23.39 23.49 23.52 23.46 23.32

23.40 23.50 23.49 23.39 23.38

Is there a significant variation in calcium content from batch to batch?



2. Randomized Complete Block Design (RCBD)


RCBD is any design in which the number of experimental units within a block is a multiple of
the number of treatments, and thus a complete set of treatments can be assigned completely at
random in each block.

Usually blocking is done in such a way that experimental units within a block are as
homogeneous as possible and experimental units between blocks are not homogeneous.

Testing ANOVA

The model for an RCBD with 𝑡 treatments and 𝑏 blocks is given by,

y_ij = μ + αᵢ + βⱼ + ε_ij ; i = 1, 2, …, t ; j = 1, 2, …, b

H₀: α₁ = α₂ = ⋯ = αₜ (treatments have no effects)

H₁: αᵢ ≠ αⱼ for at least one pair

Total sum of squares can be decomposed for RCBD as,

SS(Total) = SS(Treatment) + SS(Block) + SS(Error)

ΣΣ (y_ij − ȳ..)² = b Σ (ȳᵢ. − ȳ..)² + t Σ (ȳ.ⱼ − ȳ..)² + ΣΣ (y_ij − ȳᵢ. − ȳ.ⱼ + ȳ..)²

ANOVA table for RCBD model in general as follows,

SOV                  DF                SS                              MS                           Test Statistic

Between Blocks       b − 1             SSB = Σ y.ⱼ²/t − y..²/(bt)      MSB = SSB/(b − 1)

Between Treatments   t − 1             SSTrt = Σ yᵢ.²/b − y..²/(bt)    MSTrt = SSTrt/(t − 1)        F = MSTrt/MSE
                                                                                                    ~ F₍t−1, (b−1)(t−1)₎

Error                (b − 1)(t − 1)    SSE − by subtraction            MSE = SSE/[(b − 1)(t − 1)]

Total                bt − 1            SST = ΣΣ y_ij² − y..²/(bt)

In an RCBD one usually is not interested in comparison between blocks as the experimental
design is based on the assumptions that block effects are different.

Example:

1) In a study to reduce stress on air traffic controllers, 3 alternative systems A, B and C have
been proposed. Six controllers were chosen for the study and each controller had to use
all 3 systems but in random order. The following data provide a measure of the stress for
each controller on each system.

Controller Controller Controller Controller Controller Controller


1 2 3 4 5 6

System 15 14 10 13 16 13
A
System 15 14 11 12 13 13
B
System 18 14 15 17 16 13
C

Analyze the data and test if the 3 treatments are different in their level of stress.
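Following the RCBD decomposition above (systems as treatments, controllers as blocks), the computation can be sketched in plain Python:

```python
# Stress scores: rows = systems (treatments), columns = controllers (blocks).
data = [
    [15, 14, 10, 13, 16, 13],   # System A
    [15, 14, 11, 12, 13, 13],   # System B
    [18, 14, 15, 17, 16, 13],   # System C
]
t, b = len(data), len(data[0])
grand = sum(map(sum, data))
CF = grand ** 2 / (b * t)

ss_total = sum(y ** 2 for row in data for y in row) - CF
ss_trt = sum(sum(row) ** 2 for row in data) / b - CF
ss_block = sum(sum(col) ** 2 for col in zip(*data)) / t - CF
ss_error = ss_total - ss_trt - ss_block

ms_trt = ss_trt / (t - 1)
ms_error = ss_error / ((b - 1) * (t - 1))
F = ms_trt / ms_error

print(round(ss_trt, 1), round(ss_block, 1), round(ss_error, 1), round(F, 2))
# 21.0 30.0 19.0 5.53
```

Since F ≈ 5.53 > F₍₀.₀₅, ₂, ₁₀₎ = 4.10, H₀ is rejected at the 5% level: the three systems differ in their stress levels.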

2) Three laboratories, A, B, and C are used by food manufacturing companies for making
nutrition analyses of their products. The following data are the fat contents (in grams) of
the same weight of three similar types of peanut butter.

Peanut Butter Lab A Lab B Lab C Lab D

Brand 1 16.6 17.7 16.0 16.3

Brand 2 16.0 15.5 15.6 15.9

Brand 3 16.4 16.3 15.9 16.2

Analyze the data at 5% significance by,


a. carrying out a one-way ANOVA to see if there is a difference between the fat
content of the three brands
b. performing a two-way ANOVA to see if there is any difference between the
Brands using the laboratories as blocks.
c. Do you think there is any evidence that the results were not reasonably consistent
between the four laboratories?

3. Latin Squares Design (LSD)


The Latin Squares design structure consists of blocking in two directions. For an experiment
involving t treatments, t² experimental units are arranged into a t × t square. To construct the
design, the treatments are randomly assigned to experimental units in the square such that each
treatment occurs once and only once in each row and each column.

Example: Consider five gasolines A, B, C, D, and E that have to be tested for the miles they do per
gallon on cars. Suppose five different brands of cars are available and each can only test one
gasoline per day. The day-to-day weather difference is also likely to influence the mileage of the
cars. (Here there are two sources of variation other than gasoline differences that one needs to
remove.) Show how we can effectively use a Latin Square design for this experiment.

Assuming that interaction do not exist between any pair of factors, the statistical model of LS
can be given as,

y_ij(k) = μ + αᵢ + βⱼ + γ₍k₎ + ε_ij(k) ; i, j = 1, 2, …, t, where the treatment k is determined by the cell (i, j)

The ANOVA table, which is an extension of the ANOVA for an RCBD as follows,

SOV          DF                SS                              MS                           Test Statistics

Rows         t − 1             SSR = Σ yᵢ..²/t − y...²/t²      MSR = SSR/(t − 1)

Columns      t − 1             SSC = Σ y.ⱼ.²/t − y...²/t²      MSC = SSC/(t − 1)

Treatments   t − 1             SSTrt = Σ y..ₖ²/t − y...²/t²    MSTrt = SSTrt/(t − 1)        F = MSTrt/MSE
                                                                                            ~ F₍t−1, (t−1)(t−2)₎

Errors       (t − 1)(t − 2)    SSE − by subtraction            MSE = SSE/[(t − 1)(t − 2)]

Total        t² − 1            SST = ΣΣ y² − y...²/t²

Example:

1) The following data resulted from an experiment to compare three burners B1, B2, and
   B3. A Latin square design was used, as the tests were made on three engines and were
   spread over three days.
Engine 1 Engine 2 Engine 3

Day 1 B1 16 B2 17 B3 20

Day 2 B2 16 B3 21 B1 15

Day 3 B3 15 B1 12 B2 13

Test the hypothesis that there is no difference between the burners at the 5% level of
significance.
2) In an experiment to compare the effects of three treatments A, B, and C on the milk yield
of 3 dairy cows and 3 successive periods during lactation were used as columns and rows
respectively, in a LS. Total nutrient consumptions are given as follows,

Cow 1 Cow 2 Cow 3

I A 608 B 885 C 940

II B 715 C 1087 A 766

III C 844 A 711 B 832

Compute the ANOVA and test whether the treatments are significantly different at 5%
level of significance.
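The Latin square ANOVA formulas above can be applied to the milk-yield data with a plain-Python sketch (rows = periods, columns = cows, letters = treatments):

```python
# Each cell of the 3x3 Latin square holds (treatment, yield).
square = [
    [("A", 608), ("B", 885), ("C", 940)],
    [("B", 715), ("C", 1087), ("A", 766)],
    [("C", 844), ("A", 711), ("B", 832)],
]
t = len(square)
values = [y for row in square for (_, y) in row]
grand = sum(values)
CF = grand ** 2 / t ** 2                         # correction factor y...^2 / t^2

row_tot = [sum(y for _, y in row) for row in square]
col_tot = [sum(y for _, y in col) for col in zip(*square)]
trt_tot = {}
for row in square:
    for trt, y in row:
        trt_tot[trt] = trt_tot.get(trt, 0) + y

ss_total = sum(y ** 2 for y in values) - CF
ss_rows = sum(r ** 2 for r in row_tot) / t - CF
ss_cols = sum(c ** 2 for c in col_tot) / t - CF
ss_trt = sum(v ** 2 for v in trt_tot.values()) / t - CF
ss_error = ss_total - ss_rows - ss_cols - ss_trt

F = (ss_trt / (t - 1)) / (ss_error / ((t - 1) * (t - 2)))
print(round(ss_trt, 1), round(ss_error, 1), round(F, 2))  # 103436.2 4842.9 21.36
```

Since F ≈ 21.36 > F₍₀.₀₅, ₂, ₂₎ = 19.00, the treatments are significantly different at the 5% level.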

Wish You All The Best!
