Statistics Important Points: Properties of Normal Distribution
Another difference between populations and samples is the way summary measures are described. Summary measures for a population are called parameters. Summary measures for a sample are called statistics. As an example, suppose you calculate the average for a set of data. If the data set is a population of values, the average is a parameter, which is called the population mean. If the data set is a sample of values, the average is a statistic, which is called the sample average (or the average, for short). The rest of this book uses the word mean to indicate the population mean, and the word average to indicate the sample average.

The mean is a measure that describes the center of a distribution of values. The variance is a measure that describes the dispersion around the mean. The standard deviation is the square root of the variance. The sample average is denoted x̄ and is called "x bar." The population mean is denoted μ and is called "mu." The sample variance is denoted s² and is called "s-squared." The population variance is denoted σ² and is called "sigma-squared." Because the standard deviation is the square root of the variance, it is denoted s for a sample and σ for a population.
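The distinction between sample and population formulas can be sketched with Python's standard library, which provides separate functions for each (the sample functions divide by n − 1, the population functions by n). The data values here are made up for illustration.

```python
# Sample vs. population summary measures, using the stdlib statistics module.
# The data values below are an illustrative assumption, not from the text.
import statistics

data = [4.0, 8.0, 6.0, 5.0, 7.0]

x_bar = statistics.mean(data)        # sample average (x bar)
s2 = statistics.variance(data)       # sample variance (s^2), divides by n - 1
s = statistics.stdev(data)           # sample standard deviation (s)

sigma2 = statistics.pvariance(data)  # population variance (sigma^2), divides by n
sigma = statistics.pstdev(data)      # population standard deviation (sigma)

print(x_bar, s2, s)       # 6.0  2.5  ~1.581
print(sigma2, sigma)      # 2.0  ~1.414
```

Note that the sample variance (2.5) is larger than the population variance (2.0) for the same numbers; dividing by n − 1 corrects the tendency of a sample to underestimate the population's dispersion.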
*Note: Use a parametric method if the data meets the assumptions, and use a nonparametric method if it doesn't.
Recall that a normal distribution is the theoretical distribution of values for a population. Many statistical methods assume that the data values are a sample from a normal distribution. For a given sample, you need to decide whether this assumption is reasonable. Because you have only a sample, you can never be absolutely sure that the assumption is correct. What you can do is test the assumption, and, based on the results of the test, decide whether the assumption is reasonable. This testing and decision process is called testing for normality.
Statistical Test for Normality When testing for normality, you start with the idea that the sample is from a normal distribution. Then, you verify whether the data agrees or disagrees with this idea. Using the sample, you calculate a statistic and use this statistic to try to verify the idea. Because this statistic tests the idea, it is called a test statistic. The test statistic compares the shape of the sample distribution with the shape of a normal distribution.
The result of this comparison is a number called a p-value, which describes how doubtful the idea is in terms of probability. A p-value can range from 0 to 1. A p-value close to 0 means that the idea is very doubtful, and provides evidence against the idea. If you find enough evidence to reject the idea, you decide that the data is not a sample from a normal distribution. If you cannot find enough evidence to reject the idea, you proceed with the analysis based on the assumption that the data is a sample from a normal distribution. SAS provides the formal test for normality in PROC UNIVARIATE with the NORMAL option.
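The text performs this test in SAS (PROC UNIVARIATE with the NORMAL option). As a rough Python analogue, here is a sketch using SciPy's Shapiro-Wilk test; the generated sample and the 0.05 cutoff are illustrative assumptions, not part of the original example.

```python
# Testing for normality with the Shapiro-Wilk test (SciPy analogue of
# SAS PROC UNIVARIATE with the NORMAL option). Data and cutoff are
# illustrative assumptions.
import random
from scipy import stats

random.seed(42)
sample = [random.gauss(mu=100, sigma=15) for _ in range(50)]

stat, p_value = stats.shapiro(sample)
print(f"W = {stat:.4f}, p-value = {p_value:.4f}")

if p_value < 0.05:
    print("Evidence against normality: reject the idea of a normal distribution.")
else:
    print("No strong evidence against normality: proceed assuming normality.")
```

The decision rule mirrors the text: a p-value close to 0 is evidence against the idea of normality; otherwise you proceed under the normality assumption.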
A Type I error means rejecting the null hypothesis when it is true. A Type II error occurs when you fail to reject the null hypothesis when it is false.
Independent Groups Independent groups of data contain measurements for two unrelated samples of items.
For example, suppose a researcher selects a random sample of children, some who use fluoride toothpaste and some who do not. There is no relationship between the children who use fluoride toothpaste and the children who do not. A dentist counts the number of cavities for each child. The goal of analysis is to compare the average number of cavities for children who use fluoride toothpaste and for children who do not.
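The comparison described above is typically done with a two-sample t-test for independent groups. A minimal sketch using SciPy, with made-up cavity counts (the original example gives no data):

```python
# Two-sample t-test for independent groups (fluoride example).
# The cavity counts are hypothetical, assumed for illustration.
from scipy import stats

fluoride = [1, 0, 2, 1, 0, 1, 2, 0, 1, 1]     # children using fluoride toothpaste
no_fluoride = [3, 2, 4, 2, 3, 5, 2, 3, 4, 2]  # children not using it

t_stat, p_value = stats.ttest_ind(fluoride, no_fluoride)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```

A small p-value here would be evidence that the average cavity counts differ between the two unrelated groups by more than chance alone would explain.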
Paired Groups Paired groups of data contain measurements for one sample of items, but there are two measurements for each item. A common example of paired groups is before-and-after measurements, where the goal of analysis is to decide whether the average change from before to after is greater than what could happen by chance. For example, a doctor weighs 30 people before they begin a program to quit smoking, and weighs them again six months after they have completed the program. The goal of analysis is to decide whether the average weight change is greater than what could happen by chance.
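Because paired groups are two measurements on the same items, the analysis works on the within-item differences rather than on two separate samples. A minimal sketch using SciPy's paired t-test, with hypothetical before/after weights (the original example gives no data):

```python
# Paired t-test for before-and-after measurements (quit-smoking example).
# The weights (kg) are hypothetical, assumed for illustration.
from scipy import stats

before = [70, 82, 64, 91, 77, 68, 85, 73]
after = [74, 85, 66, 95, 79, 70, 90, 75]

t_stat, p_value = stats.ttest_rel(after, before)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```

Note the contrast with the independent-groups case: ttest_rel pairs each "after" value with its own "before" value, which is why the two lists must be the same length and in the same subject order.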