Statistics and Probability Module 4 Moodle
Introduction:
Inferential statistics focuses on estimating and predicting the results of a research study. Through
the process of estimation, a parameter value is obtained using the information gathered from a particular
sample. Estimation is needed because it is not always possible to collect data from the whole population,
yet interpretations and generalizations about the population must still be made. You have learned earlier
that a parameter is a numerical description of the entire population.
Definition
Estimation is the process used to calculate these population parameters by analyzing only a
small random sample from the population. The value or range of values used to approximate a
parameter is called an estimate.
The two types of parameter estimates are the point estimate and the interval estimate.
Point estimate refers to a single value that best estimates the true parameter value of the
population. Interval estimate, on the other hand, gives a range of values within which the
parameter value possibly falls. Suppose you work for a manufacturer of light bulbs and you
want to predict the average life expectancy of all the light bulbs you produced. Using a
sample of 100 light bulbs, you claim that the light bulbs last for an average of 5 months.
This is a point estimate of the population parameter. However, if you state that the true life
expectancy of the light bulbs is between 4 and 6 months, then you are giving an interval
estimate of the parameter.
Sample measures, such as the sample mean, can be used to estimate population parameters, say
the population mean. These sample measures are called estimators. The following are the properties of a
good estimator:
1. Unbiasedness – Any parameter estimate can be considered a random variable since its value may
change depending on certain factors, including the selection of the members of the sample. Like all
random variables, it has an expected value. An estimator is said to be unbiased when the expected
value (i.e., the mean) of all the estimates taken from samples of size n is equal to the parameter
being estimated.
2. Consistency – In the previous lesson, you learned that the standard deviation of a sample statistic
is also called its standard error. Thus, the standard deviation of an estimate is the standard error of
the estimate, and it gives the possible amount of error in predicting the population parameter.
Consistency of an estimator is achieved when the estimator produces a relatively smaller standard
error. This may be done by increasing the size of the sample used to estimate the population parameter:
as the sample size increases, the value of the estimator approaches the value of the parameter being
estimated.
3. Efficiency – Among all the unbiased estimators of a population parameter, the efficient estimator is
the one with the smallest variance.
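The functions used to estimate the population mean 𝜇 and the population variance 𝜎² are:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

As you can see, these functions are the same formulas as those for the sample mean and the sample
variance, respectively. Dividing by n − 1 rather than n is what makes s² an unbiased estimator of 𝜎².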
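The three properties can be checked empirically. The Python sketch below uses only the standard library and draws many random samples from a made-up population; the population itself (normal, mean 80, standard deviation 8), the seed, and the numbers of trials are assumptions for illustration only. It shows that the average of the x̄ values is close to 𝜇, that dividing by n − 1 gives an average close to 𝜎² while dividing by n falls short, that the spread of x̄ (its standard error) shrinks as n grows, and that the sample median, another estimator of the centre of a symmetric population, varies more than x̄ does.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population (an assumption for illustration): 5,000 values
# drawn from a normal distribution with mean 80 and standard deviation 8.
population = [random.gauss(80, 8) for _ in range(5_000)]
MU = sum(population) / len(population)                             # population mean
SIGMA2 = sum((x - MU) ** 2 for x in population) / len(population)  # population variance


def sample_mean(xs):
    """x-bar: the sum of the values divided by n."""
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs)


def simulate(n, trials):
    """Draw `trials` samples of size n (with replacement) and record estimates."""
    means, medians, s2_unbiased, s2_biased = [], [], [], []
    for _ in range(trials):
        xs = random.choices(population, k=n)
        means.append(sample_mean(xs))
        medians.append(statistics.median(xs))
        s2_unbiased.append(sum_sq_dev(xs) / (n - 1))  # divisor n - 1
        s2_biased.append(sum_sq_dev(xs) / n)          # divisor n, for comparison
    return means, medians, s2_unbiased, s2_biased


means, medians, s2_u, s2_b = simulate(n=5, trials=10_000)

# 1. Unbiasedness: the average estimate is close to the parameter.
avg_mean = sample_mean(means)
avg_s2_unbiased = sample_mean(s2_u)
avg_s2_biased = sample_mean(s2_b)  # systematically about (n - 1)/n of SIGMA2

# 2. Consistency: the standard error of x-bar shrinks as n grows.
se_n5 = statistics.pstdev(means)
se_n100 = statistics.pstdev(simulate(n=100, trials=2_000)[0])

# 3. Efficiency: for this symmetric population the median is also centred on
#    the mean, but it varies more from sample to sample than x-bar does.
se_median_n5 = statistics.pstdev(medians)

print(f"mu = {MU:.2f}, average x-bar = {avg_mean:.2f}")
print(f"sigma^2 = {SIGMA2:.2f}, average s^2 (n-1) = {avg_s2_unbiased:.2f}, "
      f"average (n) = {avg_s2_biased:.2f}")
print(f"SE of x-bar: n=5 -> {se_n5:.2f}, n=100 -> {se_n100:.2f}")
print(f"SE of median (n=5) = {se_median_n5:.2f}")
```

Sampling is done with replacement (`random.choices`) so that every draw is independent; the printed averages are simulation results and will shift slightly with a different seed.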
Example 1: A researcher wants to estimate the average grade of all mathematics students in a certain school.
He determines the grades of five students as follows: 76, 82, 88, 90, and 96. Estimate the
average mathematics grade of all the students and the variance of their grades.
Solution:
Given: n = 5; x1 = 76; x2 = 82; x3 = 88; x4 = 90; x5 = 96.
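Substitute the given values into the formula for the sample mean, then find the sum:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{5}(76 + 82 + 88 + 90 + 96) = \frac{1}{5}(432) = 86.4$$

The estimate of the population mean is 86.4. Now, to get the estimate of the population variance,
substitute the data and $\bar{x} = 86.4$ into the formula for the sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
$$= \frac{1}{4}\left[(76 - 86.4)^2 + (82 - 86.4)^2 + (88 - 86.4)^2 + (90 - 86.4)^2 + (96 - 86.4)^2\right]$$
$$= \frac{1}{4}\left[(-10.4)^2 + (-4.4)^2 + (1.6)^2 + (3.6)^2 + (9.6)^2\right]$$
$$= \frac{1}{4}\left[108.16 + 19.36 + 2.56 + 12.96 + 92.16\right]$$
$$= \frac{1}{4}(235.2) = 58.8$$

The estimate of the population variance is 58.8.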
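As a quick arithmetic check of Example 1, this Python sketch applies the two formulas directly to the five grades and then compares the results with the standard library's `statistics.mean` and `statistics.variance` (the latter divides by n − 1, like s²).

```python
import statistics

grades = [76, 82, 88, 90, 96]  # the sample from Example 1
n = len(grades)

x_bar = sum(grades) / n                               # 432 / 5
s2 = sum((x - x_bar) ** 2 for x in grades) / (n - 1)  # 235.2 / 4

print(f"x_bar = {x_bar:.1f}")  # x_bar = 86.4
print(f"s^2   = {s2:.1f}")     # s^2   = 58.8

# Cross-check with the standard library (statistics.variance uses divisor n - 1).
assert abs(statistics.mean(grades) - x_bar) < 1e-9
assert abs(statistics.variance(grades) - s2) < 1e-9
```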
Interval Estimation
A sample statistic contains all the information the sample provides. However, it is
sometimes insufficient to estimate the value of the population parameter using only a single
point estimator. For example, although the sample mean 𝑥̅ is a “good” estimator of the
population mean 𝜇, 𝑥̅ is not likely to be exactly equal to 𝜇, and you are unsure of the accuracy of
such a point estimate. But you can use your knowledge of the sampling distribution of 𝑥̅ to
construct an interval around the point estimate and to state your degree of certainty that
the population mean 𝜇 is within that interval. That is, you state that a < 𝜇 < b, where a and b are
the endpoints of the interval within which the population mean lies. This is the process of
interval estimation, and the interval of values that predicts where the true population
parameter lies is called the interval estimate or, more commonly, the confidence interval.
An interval estimate therefore indicates how accurate the point estimate is: it specifies a range
of values around the point estimate within which the population parameter is expected to lie.
Suppose, for instance, that you want to estimate the average salary of employees in a certain company
whose salary records are confidential. You do not know exactly how much each employee earns, so you
cannot compute the exact mean salary; you can only estimate a range within which it lies. The degree
of certainty that the true population parameter falls within the constructed confidence
interval is referred to as the confidence level. For example, an interval estimate made with an 80%
confidence level means that if the same procedure were repeated on many samples, about 80% of the
intervals constructed would contain the true parameter value.
Common choices for confidence levels are 90%, 95%, and 99%.
Confidence levels correspond to probabilities (or percentages of area) associated with the
normal curve. To illustrate, a 90% confidence interval covers the middle 90% of the area under the
normal curve. Since the curve is symmetric, the interval covers 45% of the area to the right of the
mean and 45% to the left, and the probability of observing a value outside this interval, usually
denoted as 𝛼, is 10%: 5% in the left tail and 5% in the right tail.
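Figure 4.1 illustrates a 90% confidence interval on the standard normal curve.
[Figure 4.1: standard normal curve with 45% of the area on each side of the mean inside the interval and a tail area of 5% (𝛼/2) on each side.]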
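The claim that a 90% interval leaves 𝛼 = 10% in the tails can be checked numerically. This Python sketch uses `statistics.NormalDist` (Python 3.8+) for the exact area between −z and z, and a small simulation of standard normal values as a second check. The cutoff 1.645 (the commonly quoted z-value for 90%), the seed, and the number of draws are illustrative choices, not values taken from this module.

```python
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
z = 1.645         # approximate z-value leaving 5% in each tail (assumed)

central_area = Z.cdf(z) - Z.cdf(-z)  # area between -z and z
alpha = 1 - central_area             # total area in the two tails

# Monte Carlo check (seed and number of draws are arbitrary choices).
random.seed(0)
draws = [random.gauss(0, 1) for _ in range(100_000)]
inside = sum(-z < d < z for d in draws) / len(draws)

print(f"central area = {central_area:.4f}")
print(f"alpha        = {alpha:.4f} (each tail {alpha / 2:.4f})")
print(f"simulated fraction inside = {inside:.3f}")
```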
In general, if C represents the confidence level of the interval estimate
and 𝛼 the area outside the boundaries of the interval estimate, then under the standard
normal curve, the interval estimate may be written as $-z_{\alpha/2} < Z < z_{\alpha/2}$, where $z_{\alpha/2}$ is simply the
z-score corresponding to the probability $\frac{C}{2}$ found in the z-distribution table.
Note that 𝛼 = 1 – C is the total area in the two tails under the normal curve. One may use 𝛼
in locating the z-value when constructing the confidence interval, as shown in Figure 4.1.
Further, remember that because the standard normal table is based on areas between $z = 0$
and $z_{\alpha/2}$, the z-value is found by locating the area $0.5 - \frac{\alpha}{2}$, which is the part of the normal curve
between the middle of the curve and one of the tails. Alternatively, you may locate this z-value by
changing the confidence level from a percentage to a proportion, dividing it in half, and going to the
table with this value.
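The table-lookup procedure above can also be carried out with the inverse of the normal cumulative distribution function. In this Python sketch the helper name `z_critical` is hypothetical; it relies on `statistics.NormalDist().inv_cdf` from the standard library and on the fact that the area to the left of $z_{\alpha/2}$ is 0.5 + C/2 (the half of the curve below 0 plus the table area C/2).

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z_{alpha/2} for a confidence level C given as a proportion.

    The area between z = 0 and z_{alpha/2} is C/2, so the cumulative area
    to the left of z_{alpha/2} is 0.5 + C/2 (equivalently 1 - alpha/2).
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence level must be between 0 and 1")
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for c in (0.80, 0.90, 0.95, 0.99):
    z = z_critical(c)
    print(f"C = {c:.0%}: {-z:.2f} < Z < {z:.2f}")
```

Rounded to two decimals, these values can differ from a printed table in the last digit; for example, the exact 90% value is about 1.645, which prints here as 1.64.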
Example 2: Find an interval of values for Z using the standard normal distribution, corresponding to an
area of 95%.
Solution: Given: C = 95% or 0.95, so $\frac{C}{2} = 0.475$.
The value of z corresponding to the area 0.475 is 1.96. Thus, the 95% confidence interval is
-1.96 < Z < 1.96.
Example 3: Find an interval of values for Z using the standard normal distribution, corresponding to an
area of 90.1%.
Solution: C = 90.1%
C = 0.901 In decimal form: move the decimal point 2 places to the left and remove the percent symbol.
$\frac{C}{2} = \frac{0.901}{2} = 0.4505$ Dividing by 2 splits the area into 2 equal parts, one on each side of the mean.
Hence, the area is 0.4505. Locate it in the z-table: the area 0.4505 is found in row 1.6, column 0.05,
which corresponds to z = 1.65. The value to the left of zero is negative and the value to the right of
zero is positive. Hence, $z_{\alpha/2} = \pm 1.65$. The 90.1% confidence interval is -1.65 < Z < 1.65, or
(-1.65, 1.65).
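The lookup in Examples 2 and 3 can be mimicked by building a small z-table in code. This Python sketch generates the table from `statistics.NormalDist().cdf` instead of copying a printed one (an assumption: a printed table may differ in the last digit), and the helper `lookup_z` is a hypothetical name that returns the z whose tabulated area is nearest to C/2.

```python
from statistics import NormalDist

Z = NormalDist()

# A z-table like the one used in the examples: for z = 0.00, 0.01, ..., 3.49
# it lists the area between the mean (z = 0) and z, rounded to 4 decimals.
Z_TABLE = {round(i / 100, 2): round(Z.cdf(i / 100) - 0.5, 4) for i in range(350)}


def lookup_z(area: float) -> float:
    """Return the table z whose tabulated area is closest to `area`."""
    return min(Z_TABLE, key=lambda z: abs(Z_TABLE[z] - area))


for c in (0.95, 0.901):
    z = lookup_z(c / 2)
    print(f"C = {c}: area {c / 2:.4f} -> z = {z:.2f}, interval ({-z:.2f}, {z:.2f})")
```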
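Example 4: Find an interval of values for Z using the standard normal distribution, corresponding to an
area of 80%.
Solution:
Given: C = 80%
C = 0.80 Change the percent to a decimal by moving the decimal point 2 places to the left.
$\frac{C}{2} = \frac{0.80}{2} = 0.4$ The curve is divided into 2 equal parts.
The area is 0.4. Locate this on the z-table. The area 0.4 lies between the table entries 0.3997 and
0.4015, whose corresponding z-values are:

Area in table    z-value
0.3997           1.28
0.4015           1.29

Instead of using the interpolation method, we simply take the average of 1.28 and 1.29:
$\frac{1.28 + 1.29}{2} = \frac{2.57}{2} = 1.285$
Hence, $z_{\alpha/2} = \pm 1.285$, and the 80% confidence interval is -1.285 < Z < 1.285.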
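The averaging shortcut in the 80% example can be compared with linear interpolation and with the exact value from the inverse normal CDF. This Python sketch uses only the two table entries quoted in that example and `statistics.NormalDist().inv_cdf`; the variable names are illustrative.

```python
from statistics import NormalDist

# Neighbouring z-table entries quoted in the example: (z, area from 0 to z).
z_lo, area_lo = 1.28, 0.3997
z_hi, area_hi = 1.29, 0.4015
target = 0.80 / 2  # C/2 for an 80% confidence level

average = (z_lo + z_hi) / 2  # the shortcut used in the example

# Linear interpolation: move from z_lo toward z_hi in proportion to how far
# the target area lies between the two tabulated areas.
fraction = (target - area_lo) / (area_hi - area_lo)
interpolated = z_lo + fraction * (z_hi - z_lo)

exact = NormalDist().inv_cdf(0.5 + target)  # inverse of the normal CDF

print(f"average      : {average:.4f}")
print(f"interpolation: {interpolated:.4f}")
print(f"exact        : {exact:.4f}")
```

Interpolation lands closer to the exact value (about 1.2816) than the average 1.285 does, although for most classroom purposes the difference is negligible.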
Like all approximations made in real-world settings, an estimate needs to be as close as
possible to the actual value of the parameter being estimated. So it is with our walk with Him:
we should draw as close to Him as possible.