Interval of Confidence

This document discusses confidence intervals. It defines confidence intervals and explains that they give a range of values that have a certain probability of including the true population parameter. It then covers how to calculate confidence intervals for the mean when sample sizes are large or small, using the normal and t-distributions, respectively. It also addresses how to calculate confidence intervals for proportions. An example is provided to demonstrate calculating a 90% confidence interval for the mean time to write a file to disk based on 8 measurements.

Uploaded by

Henrique Calhau

Confidence intervals

2022/2023

Luís Paquete
University of Coimbra

Outline

● Point estimation
● Confidence interval for the mean (large and small number of measurements)
● Confidence interval for the proportion

Confidence intervals

● A confidence interval gives an interval of values that has a certain probability (confidence level) of including the true value.
● Note that the confidence interval is an indication only of the precision (random
errors) of the measuring process, not its accuracy (systematic errors).

Point estimation

● A point estimator over the measurements x1, x2, ..., xn is a statistic used to estimate a population parameter. For example, the function

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

is a point estimator of the population mean $\mu$. The function

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

is a point estimator of the population variance $\sigma^2$.


● The value of a point estimator over x1, x2, ..., xn is a point estimate of the population parameter.

● Example: if xi are the observed grade averages of a sample of 88 students, then

$\bar{x} = \frac{1}{88} \sum_{i=1}^{88} x_i$

is a point estimate of $\mu$, the mean grade average of all the students in the population.
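
As a minimal sketch in R, the two point estimates can be computed by hand and compared with the built-in mean() and var(). The grade averages are simulated for illustration only (the rnorm parameters are assumptions, not data from the course), and the variance uses n - 1 in the denominator, which is what var() computes.

```r
# Hypothetical sample: 88 simulated grade averages (not data from the source)
set.seed(1)
x <- rnorm(88, mean = 14, sd = 2)
n <- length(x)

x_bar <- sum(x) / n                     # point estimate of the population mean
s2    <- sum((x - x_bar)^2) / (n - 1)   # point estimate of the population variance

# The built-in functions compute the same quantities
print(c(by_hand = x_bar, built_in = mean(x)))
print(c(by_hand = s2,    built_in = var(x)))
```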

Central Limit Theorem

● If the n measurements used to compute $\bar{x}$ are independent and come from the same population with mean $\mu$ and standard deviation $\sigma$, the Central Limit Theorem ensures that for large n (typically, n ≥ 30), the sample mean $\bar{x}$ approximately follows a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.

A side note: Test the Central Limit Theorem

● The Standard Uniform distribution has $\mu = 0.5$ and $\sigma = 1/\sqrt{12}$.

● Generate 100 000 samples of n = 30 following the Standard Uniform distribution and collect the mean for each sample. Print the mean and the standard deviation of the sample means and plot the histogram of the sample means.

# means of 100 000 samples of size 30 from the Standard Uniform distribution
all <- replicate(100000, mean(runif(30)))
# observed mean and standard deviation of the sample means
print(mean(all)); print(sd(all))
# values predicted by the Central Limit Theorem
print(0.5); print(1/sqrt(12)/sqrt(30))
hist(all)

[Figure: "Histogram of all"; x-axis: all (0.3 to 0.7), y-axis: Frequency (0 to 15000)]

● The mean is 0.5001 and the standard deviation is 0.05257 (it may differ slightly if you repeat).

Confidence intervals for the mean (large number of measurements)

● The confidence interval has the form ($\bar{x}$ - margin of error, $\bar{x}$ + margin of error).

● Find the margin of error such that

Pr[$\bar{x}$ - margin of error ≤ $\mu$ ≤ $\bar{x}$ + margin of error] = $1 - \alpha$

● $(1 - \alpha) \times 100\%$ is the confidence level, which is defined a priori.

● Recall the Central Limit Theorem: $\bar{x}$ follows $N(\mu, \sigma/\sqrt{n})$.

● Since the number of measurements is large, the sample mean $\bar{x}$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. This is our reference distribution (let's assume that $\sigma$ is known).

● In order to have the smallest interval, we choose c1 and c2 to form a symmetric interval:

$c_1 = \bar{x} - \text{margin of error}, \quad c_2 = \bar{x} + \text{margin of error}$, so that $\Pr[\mu < c_1] = \Pr[\mu > c_2] = \alpha/2$

● We use the standard normal distribution (SND), which is a normal distribution with mean 0 and standard deviation 1. The transformation from a normal distribution to the SND is performed as follows:

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

● We define the extreme points $-z_{1-\alpha/2}$ and $z_{1-\alpha/2}$ such that

$\Pr[-z_{1-\alpha/2} \le z \le z_{1-\alpha/2}] = 1 - \alpha$

Then, we have the following interval:

$\left( \bar{x} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$

● The length of the interval is $2 \, z_{1-\alpha/2} \, \sigma / \sqrt{n}$.

● Meaning of the confidence interval: if we collect 100 sample means, we expect about $(1 - \alpha) \times 100\%$ of the confidence intervals centered on each sample mean to contain $\mu$.
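
This interpretation can be checked with a small simulation. The sketch below assumes the Standard Uniform population ($\mu = 0.5$, $\sigma = 1/\sqrt{12}$, treated as known), n = 30, and a 95% confidence level; the number of repetitions and the seed are arbitrary choices made for illustration.

```r
set.seed(42)
mu    <- 0.5            # true mean of the Standard Uniform distribution
sigma <- 1 / sqrt(12)   # true standard deviation (assumed known here)
n     <- 30
alpha <- 0.05
reps  <- 10000          # arbitrary number of repeated experiments

z      <- qnorm(1 - alpha / 2)
margin <- z * sigma / sqrt(n)

# One sample mean per repeated experiment
x_bars <- replicate(reps, mean(runif(n)))

# Does the interval centered on each sample mean contain mu?
covered  <- (x_bars - margin <= mu) & (mu <= x_bars + margin)
coverage <- mean(covered)
print(coverage)   # should be close to 1 - alpha
```

The observed fraction varies from run to run, but it should stay near 0.95.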

● In practice, we do not know $\sigma$, but we can use its estimate s.

● This leads to the following extreme points for the confidence interval:

$c_1 = \bar{x} - z_{1-\alpha/2} \frac{s}{\sqrt{n}}, \quad c_2 = \bar{x} + z_{1-\alpha/2} \frac{s}{\sqrt{n}}$
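
A minimal sketch of this computation in R, assuming a large sample so that the normal distribution is an acceptable reference. The helper name ci_mean_z and the simulated data are hypothetical and only serve as an illustration.

```r
# Hypothetical helper: normal-based confidence interval for the mean,
# using the sample standard deviation s as the estimate of sigma
ci_mean_z <- function(x, conf_level = 0.95) {
  n     <- length(x)
  alpha <- 1 - conf_level
  z     <- qnorm(1 - alpha / 2)
  mean(x) + c(-1, 1) * z * sd(x) / sqrt(n)
}

set.seed(7)
x  <- runif(100)          # simulated measurements, n = 100
ci <- ci_mean_z(x, 0.95)
print(ci)
```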

A side note: How to obtain $z_{1-\alpha/2}$

● In R for $\alpha$ = 0.05: qnorm(1-0.05/2)

● From a z-table
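
As a quick check, the sketch below computes the critical value for three common choices of $\alpha$ and confirms that the area between $-z$ and $z$ under the standard normal density equals $1 - \alpha$. The choice of $\alpha$ values is illustrative.

```r
alpha <- c(0.10, 0.05, 0.01)

z    <- qnorm(1 - alpha / 2)    # critical values z_{1 - alpha/2}
area <- pnorm(z) - pnorm(-z)    # area between -z and z

print(data.frame(alpha = alpha, z = round(z, 3), area = area))
```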

Confidence intervals for the mean (small number of measurements)

● When the number of measurements is small (n<30) and $\sigma$ is unknown, the value $z = (\bar{x} - \mu)/(s/\sqrt{n})$ follows a t distribution with n-1 degrees of freedom (assuming the measurements themselves are approximately normally distributed). This should be the reference distribution to compute the confidence interval:

$\left( \bar{x} - t_{1-\alpha/2;\,n-1} \frac{s}{\sqrt{n}},\ \bar{x} + t_{1-\alpha/2;\,n-1} \frac{s}{\sqrt{n}} \right)$

where $t_{1-\alpha/2;\,n-1}$ is the value of the t distribution with n-1 degrees of freedom that has an area of $\alpha/2$ to its right.

● For large n, the t distribution approximates the standard normal distribution quite well.
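
The convergence of the t distribution to the standard normal distribution can be seen by comparing critical values. The sketch below uses $\alpha$ = 0.05 and a few illustrative sample sizes.

```r
alpha <- 0.05
n     <- c(5, 8, 30, 100, 1000)

t_crit <- qt(1 - alpha / 2, df = n - 1)   # t critical values, n - 1 degrees of freedom
z_crit <- qnorm(1 - alpha / 2)            # standard normal critical value

print(data.frame(n = n, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3)))
```

The t critical value is always larger than the normal one, so the t-based interval is wider, and the gap shrinks as n grows.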

[Figure: t distribution with n-1 degrees of freedom]

A side note: How to obtain $t_{1-\alpha/2;\,n-1}$

● In R for $\alpha$ = 0.05 and n = 8: qt(1-0.05/2,7)

● From a t-table

Example

● Determine the average time required to write a file of a particular size to a disk drive. You collected the following values in seconds from 8 measurements: 8.0, 7.0, 5.0, 9.0, 9.5, 11.3, 5.2, and 8.5. Then, $\bar{x}$ = 7.94 and s = 2.14.

● Assume a 90% confidence level ($\alpha$ = 0.1). Since n=8, we use the t distribution with seven degrees of freedom, for which $t_{0.95;\,7} = 1.895$:

$c_1 = 7.94 - 1.895 \cdot \frac{2.14}{\sqrt{8}} = 6.5, \quad c_2 = 7.94 + 1.895 \cdot \frac{2.14}{\sqrt{8}} = 9.4$

● Then, the 90% confidence interval is (6.5, 9.4).

● In R:

> d <- c(5.0,9.0,9.5,11.3,5.2,8.5,8.0,7.0)   # measured write times (s)
> n <- length(d)
> alpha <- 0.1                               # 90% confidence level
> t <- qt(1-alpha/2,n-1)                     # critical value, n-1 degrees of freedom
> mean(d) + c(-1, 1) * t * sd(d) / sqrt(n)
[1] 6.500893 9.374107

● In R, using t.test:

> d <- c(5,9,9.5,11.3,5.2,8.5,8.0,7.0)
> t.test(d,conf.level=0.9)

One Sample t-test

data: d
t = 10.468, df = 7, p-value = 1.581e-05
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
6.500893 9.374107
sample estimates:
mean of x
7.9375

confidence level    confidence interval
90%                 (6.5, 9.4)
95%                 (6.1, 9.7)
99%                 (5.3, 10.6)

● Increasing the confidence level widens the confidence interval.

● Increasing the sample size narrows the confidence interval.
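
Both effects can be sketched in R with the measurements of the disk-write example. The first loop recomputes the interval at the three confidence levels. The second part is hypothetical: it holds s fixed at the value observed for n = 8 and only varies n, to isolate the effect of the sample size on the margin of error. The helper name margin is an assumption made for this illustration.

```r
d <- c(8.0, 7.0, 5.0, 9.0, 9.5, 11.3, 5.2, 8.5)   # measurements from the example
n <- length(d)

# Half-width of the t-based interval for a confidence level, sample size and s
margin <- function(level, n, s) {
  qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
}

# Effect of the confidence level (same data)
conf_levels <- c(0.90, 0.95, 0.99)
for (level in conf_levels) {
  ci <- mean(d) + c(-1, 1) * margin(level, n, sd(d))
  cat(sprintf("%.0f%%: (%.1f, %.1f)\n", 100 * level, ci[1], ci[2]))
}

# Effect of the sample size (hypothetical: s kept fixed, 90% level)
sizes        <- c(8, 32, 128)
margins_by_n <- sapply(sizes, function(k) margin(0.90, k, sd(d)))
print(round(margins_by_n, 2))
```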

Confidence intervals for the proportions

● One may be interested in proportions, for instance, the fraction of the time each event occurs.

● The proportion p can be estimated by the sample proportion $\hat{p} = m/n$, where m is the number of times the desired outcome occurs out of n measurements.

● The binomial distribution with parameters n and p is the probability distribution of the number of successes in a sequence of n independent trials, each with probability p of success. Its mean is np and its standard deviation is $\sqrt{np(1-p)}$.
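
In the spirit of the earlier side note, the mean and standard deviation of the binomial distribution can be checked by simulation. The values n = 100 and p = 0.3, the number of repetitions, and the seed are arbitrary choices for illustration.

```r
set.seed(123)
n <- 100    # number of trials per experiment
p <- 0.3    # probability of success

# Number of successes in each of 100 000 simulated experiments
m <- rbinom(100000, size = n, prob = p)

print(c(observed = mean(m), expected = n * p))
print(c(observed = sd(m),   expected = sqrt(n * p * (1 - p))))
```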

● The distribution of the sample proportion over n measurements has mean np/n = p and standard deviation $\sqrt{p(1-p)/n}$.

● This distribution can be approximated by a normal distribution with mean p and standard deviation $\sqrt{p(1-p)/n}$ if np ≥ 5 and n(1-p) ≥ 5.

● The confidence interval for the proportion p is computed as follows:

$\left( \hat{p} - z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right)$
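
A minimal sketch of this interval as an R function. The helper name ci_prop and the counts in the usage line are hypothetical. Since p is unknown, the sketch checks the normal-approximation condition with the sample proportion in place of p, which is an assumption of this illustration.

```r
# Hypothetical helper: normal-approximation confidence interval for a proportion
ci_prop <- function(m, n, conf_level = 0.95) {
  p_hat <- m / n
  # Condition for the normal approximation, checked with p_hat in place of p
  if (n * p_hat < 5 || n * (1 - p_hat) < 5) {
    warning("normal approximation may be poor: n*p or n*(1-p) is below 5")
  }
  z <- qnorm(1 - (1 - conf_level) / 2)
  p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
}

ci <- ci_prop(40, 200)   # hypothetical counts: 40 occurrences in 200 measurements
print(ci)
```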

Example

● Determine how much time the processor spends executing the operating system compared with how much time it spends executing the users' application programs.
● A counter, n, is incremented every time the interrupt-service routine is executed, so it counts how many times the interrupt occurs. A second counter, m, is incremented every time the operating system was executing when the interrupt occurred. During one minute, you recorded m=658 and n=6000.
● Then, $\hat{p} = m/n = 658/6000 = 0.1097$ is the proportion of time that the operating system is being executed. A 95% confidence interval is

$0.1097 \mp 1.96 \sqrt{\frac{0.1097 (1 - 0.1097)}{6000}} = (0.1018, 0.1176)$

● In R:

> p <- 658/6000
> n <- 6000
> alpha <- 0.05
> z <- qnorm(1-alpha/2)
> p + c(-1, 1) * z * sqrt(p * (1 - p) / n)
[1] 0.1017601 0.1175732

● In R (an exact test):

> binom.test(658,6000,conf.level=0.95)

Exact binomial test

data: 658 and 6000
number of successes = 658, number of trials = 6000, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1018680 0.1178471
sample estimates:
probability of success
0.1096667

● In R (a variant):

> prop.test(x=658,n=6000,conf.level=0.95)

1-sample proportions test with continuity correction

data: 658 out of 6000, null probability 0.5
X-squared = 3655.1, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.1019278 0.1179103
sample estimates:
p
0.1096667

Recap:

● A confidence interval gives an interval of values that has a certain probability of including the true value (mean, proportion).
● Increasing the confidence level widens the confidence interval; increasing the sample size narrows it.

References:

● D. J. Lilja, Measuring computer performance, Cambridge University Press, 2002 (see chapter 4)
