The document discusses different types of statistical intervals used for estimating population parameters from sample data. It introduces confidence intervals, which estimate where the true population parameter lies with a certain probability. It then describes how to construct confidence intervals for a single sample when the population variance is known and unknown. The document also discusses prediction intervals, which predict where a future observation will fall, and tolerance intervals, which specify a range that is expected to contain a certain proportion of the population.
Week 8 Statistical Intervals
STATISTICAL INTERVALS
EDDIE SANTILLAN JR., ECE INSTRUCTOR

TOPICS:
Confidence Intervals: Single Sample
Confidence Intervals: Multiple Samples
Prediction Intervals
Tolerance Intervals

INTRODUCTION
Even the most efficient unbiased estimator is unlikely to estimate the population parameter exactly. It is true that estimation accuracy increases with large samples, but there is still no reason we should expect a point estimate from a given sample to be exactly equal to the population parameter it is supposed to estimate. There are many situations in which it is preferable to determine an interval within which we would expect to find the value of the parameter. Such an interval is called an interval estimate.

Suppose that X₁, X₂, …, Xₙ is a random sample from a normal distribution with unknown mean μ and known variance σ². We know that the sample mean X̄ is normally distributed with mean μ and variance σ²/n. We may standardize X̄ by subtracting the mean and dividing by the standard deviation, which results in the variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
The random variable Z has a standard normal distribution.
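
The standardization above can be illustrated with a small simulation. This is only a sketch: the values μ = 10, σ = 2, n = 25, the number of trials, and the function name `standardized_means` are arbitrary choices made for illustration and do not come from the slides.

```python
import random
import statistics


def standardized_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` normal samples of size n and return Z for each sample mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    zs = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Z = (xbar - mu) / (sigma / sqrt(n))
        zs.append((xbar - mu) / (sigma / n ** 0.5))
    return zs


# Illustrative (assumed) parameters, not taken from the slides.
zs = standardized_means(mu=10.0, sigma=2.0, n=25, trials=20000)
inside = sum(1 for z in zs if -1.96 <= z <= 1.96) / len(zs)

print("mean of Z:", round(statistics.fmean(zs), 3))
print("std. dev. of Z:", round(statistics.stdev(zs), 3))
print("share of Z within +/-1.96:", round(inside, 3))
```

If Z is standard normal, the simulated mean should be near 0, the standard deviation near 1, and roughly 95% of the values should fall within ±1.96.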
A confidence interval (CI) estimate for μ is an interval of the form l ≤ μ ≤ u, where the endpoints l and u are computed from the sample data. Because different samples will produce different values of l and u, these endpoints are values of random variables L and U, respectively. Suppose that we can determine values of L and U such that the following probability statement is true:

$$P(L \le \mu \le U) = 1 - \alpha,$$

where 0 ≤ α ≤ 1. There is a probability of 1 − α of selecting a sample for which the CI will contain the true value of μ. Once we have selected the sample, so that X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ, and computed l and u, the resulting confidence interval for μ is

$$l \le \mu \le u$$

The endpoints or bounds l and u are called the lower- and upper-confidence limits, respectively, and 1 − α is called the confidence coefficient.

For example, a random sample of SAT verbal scores for students in the entering freshman class might produce an interval from 530 to 550, within which we expect to find the true average of all SAT verbal scores for the freshman class. The values of the endpoints, 530 and 550, will depend on the computed sample mean x̄ and the sampling distribution of X̄. As the sample size increases, we know that the variance of X̄, σ²/n, decreases, and consequently our estimate is likely to be closer to the parameter μ, resulting in a shorter interval.

CONFIDENCE INTERVALS: SINGLE SAMPLE

Confidence Interval on μ, σ² Known
If x̄ is the mean of a random sample of size n from a population with known variance σ², a 100(1 − α)% confidence interval (CI) for μ is given by

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}},$$

where z_{α/2} is the z-value leaving an area of α/2 to the right.

EXAMPLE 1: The average zinc concentration recovered from a sample of measurements taken in 36 different locations in a river is found to be 2.6 grams per milliliter. Find the 95% and 99% confidence intervals for the mean zinc concentration in the river.
Assume that the population standard deviation is 0.3 gram per milliliter.
Answer:
95%: 2.502 < μ < 2.698
99%: 2.47 < μ < 2.73

Sample Size for Specified Error on the Mean, Variance Known
If x̄ is used as an estimate of μ, we can be 100(1 − α)% confident that the error e satisfies

$$e \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
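
Example 1 and the error bound above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `z_confidence_interval` is hypothetical, and the z-value comes from `statistics.NormalDist` rather than a printed table, so the last digit can differ slightly from table-based answers.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence):
    """Two-sided CI on the mean when the population sigma is known.

    Returns (lower, upper, margin), where margin = z_{alpha/2} * sigma / sqrt(n).
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin


# Example 1: xbar = 2.6, sigma = 0.3, n = 36
for level in (0.95, 0.99):
    lower, upper, margin = z_confidence_interval(2.6, 0.3, 36, level)
    print(f"{level:.0%}: {lower:.3f} < mu < {upper:.3f} (error bound {margin:.3f})")
```

The returned margin is the right-hand side of the error bound, so a wider confidence level at the same n directly shows up as a larger bound on e.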
If x̄ is used as an estimate of μ, we can be 100(1 − α)% confident that the error will not exceed a specified amount e when the sample size is

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2,$$

rounding any fractional value up to the next whole number.

EXAMPLE 2: How large a sample is required if we want to be 95% confident that our estimate of μ in Example 1 is off by less than 0.05? (139)

One-Sided Confidence Bounds on μ, σ² Known
If x̄ is the mean of a random sample of size n from a population with variance σ², the one-sided 100(1 − α)% confidence bounds for μ are given by

$$\text{upper one-sided bound: } \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}; \qquad \text{lower one-sided bound: } \bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}}.$$

EXAMPLE 3: In a psychological testing experiment, 25 subjects are selected randomly and their reaction time, in seconds, to a particular stimulus is measured. Past experience suggests that the variance in reaction times to these types of stimuli is 4 sec² and that the distribution of reaction times is approximately normal. The average time for the subjects is 6.2 seconds. Give an upper 95% bound for the mean reaction time. (6.858 seconds)

t-Distribution
Let X₁, X₂, …, Xₙ be a random sample from a normal distribution with unknown mean μ and unknown variance σ². The random variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t distribution with n − 1 degrees of freedom.
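
As a check on Examples 2 and 3 above, the sample-size formula and the one-sided bound can be sketched in a few lines. The function names are hypothetical, and the z-values are computed with `statistics.NormalDist` instead of being read from a table. Note that the one-sided bound uses z_α, not z_{α/2}.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(sigma, e, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= e, rounded up."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    return ceil((z * sigma / e) ** 2)


def upper_confidence_bound(xbar, sigma, n, confidence):
    """One-sided upper bound xbar + z_alpha * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(confidence)  # z_alpha: all of alpha in one tail
    return xbar + z * sigma / sqrt(n)


# Example 2: sigma = 0.3, e = 0.05, 95% confidence
print(required_sample_size(0.3, 0.05, 0.95))

# Example 3: variance 4 sec^2 (sigma = 2), n = 25, xbar = 6.2, 95% upper bound
print(round(upper_confidence_bound(6.2, 2.0, 25, 0.95), 3))
```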
Confidence Interval on μ, σ² Unknown
If x̄ and s are the mean and standard deviation of a random sample from a normal population with unknown variance σ², a 100(1 − α)% confidence interval for μ is

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}},$$

where t_{α/2} is the t-value with v = n − 1 degrees of freedom, leaving an area of α/2 to the right.

EXAMPLE 4: The contents of seven similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2, and 9.6 liters. Find a 95% confidence interval for the mean contents of all such containers, assuming an approximately normal distribution. (9.74 < μ < 10.26)

CONFIDENCE INTERVALS: MULTIPLE SAMPLES
Often statisticians recommend that even when normality cannot be assumed, σ is unknown, and n ≥ 30, s can replace σ and the confidence interval

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

may be used. This is often referred to as a large-sample confidence interval. The justification lies only in the presumption that with a sample as large as 30 and the population distribution not too skewed, s will be very close to the true σ and thus the Central Limit Theorem prevails. It should be emphasized that this is only an approximation and the quality of the result becomes better as the sample size grows larger.

EXAMPLE 5: Scholastic Aptitude Test (SAT) mathematics scores of a random sample of 500 high school seniors in the state of Texas are collected, and the sample mean and standard deviation are found to be 501 and 112, respectively. Find a 99% confidence interval on the mean SAT mathematics score for seniors in the state of Texas. (488.1 < μ < 513.9)

PREDICTION INTERVALS
Sometimes, other than the population mean, the experimenter may also be interested in predicting the possible value of a future observation. For instance, in quality control, the experimenter may need to use the observed data to predict a new observation. A process that produces a metal part may be evaluated on the basis of whether the part meets specifications on tensile strength. On certain occasions, a customer may be interested in purchasing a single part. In this case, a confidence interval on the mean tensile strength does not capture the required information. The customer requires a statement regarding the uncertainty of a single observation. This type of requirement is nicely fulfilled by the construction of a prediction interval.

Prediction Interval of a Future Observation, σ² Known
For a normal distribution of measurements with unknown mean μ and known variance σ², a 100(1 − α)% prediction interval of a future observation x₀ is

$$\bar{x} - z_{\alpha/2}\,\sigma\sqrt{1 + 1/n} < x_0 < \bar{x} + z_{\alpha/2}\,\sigma\sqrt{1 + 1/n},$$

where z_{α/2} is the z-value leaving an area of α/2 to the right.

EXAMPLE 6: Due to the decrease in interest rates, the First Citizens Bank received a lot of mortgage applications.
A recent sample of 50 mortgage loans resulted in an average loan amount of $257,300. Assume a population standard deviation of $25,000. For the next customer who fills out a mortgage application, find a 95% prediction interval for the loan amount. ($207,812.43 < x₀ < $306,787.57)

Prediction Interval of a Future Observation, σ² Unknown
For a normal distribution of measurements with unknown mean μ and unknown variance σ², a 100(1 − α)% prediction interval of a future observation x₀ is

$$\bar{x} - t_{\alpha/2}\,s\sqrt{1 + 1/n} < x_0 < \bar{x} + t_{\alpha/2}\,s\sqrt{1 + 1/n},$$

where t_{α/2} is the t-value with v = n − 1 degrees of freedom, leaving an area of α/2 to the right.

EXAMPLE 7: A meat inspector has randomly selected 30 packs of 95% lean beef. The sample resulted in a mean of 96.2% with a sample standard deviation of 0.8%. Find a 99% prediction interval for the leanness of a new pack. Assume normality. (93.96, 98.44)

TOLERANCE INTERVALS
If process specifications are important (e.g., manufacturing), the manager of the process is concerned about long-range performance, not the next observation. One must attempt to determine bounds that, in some probabilistic sense, “cover” values in the population (i.e., the measured values of the dimension). One method of establishing the desired bounds is to determine a confidence interval on a fixed proportion of the measurements. This is best motivated by visualizing a situation in which we are doing random sampling from a normal distribution with known mean μ and variance σ².

Clearly, a bound that covers the middle 95% of the population of observations is μ ± 1.96σ. This is called a tolerance interval, and indeed its coverage of 95% of measured observations is exact. However, in practice, μ and σ are seldom known; thus, the user must apply x̄ ± ks.
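
The prediction intervals in Examples 6 and 7 above can be reproduced with one small helper, since the known-σ and unknown-σ formulas differ only in the spread (σ or s) and the critical value (z or t). This is a sketch under stated assumptions: the function name `prediction_interval` is hypothetical, and the critical values 1.96 (z for 95%) and 2.756 (t for 99% with 29 degrees of freedom) are rounded table values supplied by hand, because the Python standard library has no t-quantile function.

```python
from math import sqrt


def prediction_interval(xbar, spread, n, critical):
    """Interval xbar +/- critical * spread * sqrt(1 + 1/n) for one future observation.

    spread   -- sigma if the variance is known, s otherwise
    critical -- z_{alpha/2} or t_{alpha/2} (n - 1 d.f.), taken from a table
    """
    half_width = critical * spread * sqrt(1.0 + 1.0 / n)
    return xbar - half_width, xbar + half_width


# Example 6: sigma known, z_{0.025} = 1.96 (table value)
low, high = prediction_interval(257300.0, 25000.0, 50, 1.96)
print(f"Example 6: {low:,.2f} < x0 < {high:,.2f}")

# Example 7: sigma unknown, t_{0.005} with 29 d.f. = 2.756 (table value)
low, high = prediction_interval(96.2, 0.8, 30, 2.756)
print(f"Example 7: ({low:.2f}, {high:.2f})")
```

The extra factor sqrt(1 + 1/n) is what separates these intervals from a confidence interval on the mean: it adds the variability of the single new observation to the variability of x̄.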
Tolerance Limits
For a normal distribution of measurements with unknown mean μ and unknown standard deviation σ, tolerance limits are given by x̄ ± ks, where k is determined such that one can assert with 100(1 − γ)% confidence that the given limits contain at least the proportion 1 − α of the measurements.

EXAMPLE 8: A meat inspector has randomly selected 30 packs of 95% lean beef. The sample resulted in a mean of 96.2% with a sample standard deviation of 0.8%. Find a tolerance interval that gives two-sided 95% bounds on 90% of the distribution of packages of 95% lean beef. Assume the data came from an approximately normal distribution.

Distinction among Confidence Intervals, Prediction Intervals, and Tolerance Intervals
In the case of confidence intervals, one is attentive only to the population mean. The tolerance limit interpretation is somewhat related to the confidence interval, but when specifications must be met, the tolerance interval is more important than the confidence interval. Prediction intervals are applicable when it is important to determine a bound on a single value. The mean is not the issue here, nor is the location of the majority of the population. Rather, the location of a single new observation is required.

ASSIGNMENT:
1. ASTM Standard E23 defines standard test methods for notched bar impact testing of metallic materials. The Charpy V-notch (CVN) technique measures impact energy and is often used to determine whether or not a material experiences a ductile-to-brittle transition with decreasing temperature. Ten measurements of impact energy (J) on specimens of A238 steel cut at 60°C are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume that impact energy is normally distributed with σ = 1 J. Find a 95% CI for μ, the mean impact energy. (63.84 ≤ μ ≤ 65.08)

2. Consider the CVN test described in Assignment 1, and suppose that we wanted to determine how many specimens must be tested to ensure that the 95% CI on μ for A238 steel cut at 60°C has a length of at most 1.0 J. Since the bound on error in estimation e is one-half of the length of the CI, use e = 0.5 J to determine the sample size n. (16)

3. The same data for impact testing from Assignment 1 are used to construct a lower, one-sided 95% confidence interval for the mean impact energy. Find the lower one-sided interval. (63.9 ≤ μ)

4. An article in the journal Materials Engineering (1989, Vol. II, No. 4, pp. 275–281) describes the results of tensile adhesion tests on 22 U-700 alloy specimens. The load at specimen failure is as follows (in megapascals):
19.8, 10.1, 14.9, 7.5, 15.4, 15.4, 15.4, 18.5, 7.9, 12.7, 11.9, 11.4, 11.4, 14.1, 17.6, 16.7, 15.8, 19.5, 8.8, 13.6, 11.9, 11.4
(a) Determine the sample mean and the sample standard deviation.
(b) Find the CI if the population is normally distributed.

5. A machine produces metal pieces that are cylindrical in shape. A sample of these pieces is taken and the diameters are found to be 1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01, and 1.03 centimeters. Use these data to calculate three interval types and draw interpretations that illustrate the distinction between them in the context of the system. For all computations, assume an approximately normal distribution. The sample mean and standard deviation for the given data are x̄ = 1.0056 and s = 0.0246.
(a) Find a 99% confidence interval on the mean diameter.
(b) Compute a 99% prediction interval on a measured diameter of a single metal piece taken from the machine.
(c) Find the 99% tolerance limits that will contain 95% of the metal pieces produced by this machine.
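
To make the distinction among the three interval types concrete, the lean-beef data from Examples 7 and 8 (n = 30, x̄ = 96.2, s = 0.8) can be run through all three formulas side by side. This is only a sketch: the function names are hypothetical, t = 2.756 is the rounded table value for 99% with 29 degrees of freedom, and the tolerance factor k = 2.140 is an assumed value of the kind read from a standard table of two-sided normal tolerance factors (n = 30, γ = 0.05, 1 − α = 0.90). The slides do not state k, so it should be confirmed against the table used in class.

```python
from math import sqrt


def confidence_interval(xbar, s, n, t):
    """CI on the mean: xbar +/- t * s / sqrt(n)."""
    half = t * s / sqrt(n)
    return xbar - half, xbar + half


def prediction_interval(xbar, s, n, t):
    """Interval for one future observation: xbar +/- t * s * sqrt(1 + 1/n)."""
    half = t * s * sqrt(1.0 + 1.0 / n)
    return xbar - half, xbar + half


def tolerance_interval(xbar, s, k):
    """Tolerance limits: xbar +/- k * s, with k taken from a table."""
    return xbar - k * s, xbar + k * s


xbar, s, n = 96.2, 0.8, 30
t_99 = 2.756   # table value, t_{0.005} with 29 d.f.
k_9590 = 2.140  # ASSUMED table value: 95% confidence, 90% coverage, n = 30

ci = confidence_interval(xbar, s, n, t_99)
pi = prediction_interval(xbar, s, n, t_99)
ti = tolerance_interval(xbar, s, k_9590)

print("99% CI on the mean:     ", tuple(round(v, 2) for v in ci))
print("99% prediction interval:", tuple(round(v, 2) for v in pi))
print("95%/90% tolerance limits:", tuple(round(v, 2) for v in ti))
```

The confidence interval is the narrowest of the three because it concerns only the mean, while the other two must account for the spread of individual packages.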