Lesson 7: Normal Distribution in Statistics
The normal distribution is a probability function that describes how the values of a variable are distributed. It is a
symmetric distribution where most of the observations cluster around the central peak and the probabilities for values
further away from the mean taper off equally in both directions. It is also known as the Gaussian distribution and the bell
curve.
Data that follow this pattern are said to be "normally distributed". A normal distribution is fully described by two parameters:
1. Mean
The mean is the central tendency of the distribution. It defines the location of the peak for normal distributions.
2. Standard Deviation
The standard deviation is a measure of variability. It defines the width of the normal distribution. The standard
deviation determines how far away from the mean the values tend to fall. It represents the typical distance
between the observations and the average.
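The roles of the two parameters can be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the parameter values (30, 40, 5, 10) are chosen only for illustration and are not part of the lesson's data.

```python
from statistics import NormalDist

# Illustrative parameter values (assumptions, chosen only for this sketch).
narrow = NormalDist(mu=30, sigma=5)
wide = NormalDist(mu=30, sigma=10)
shifted = NormalDist(mu=40, sigma=5)

# The curve is highest at the mean, so the mean locates the peak.
peak_narrow = narrow.pdf(30)
peak_shifted = shifted.pdf(40)

# A larger standard deviation spreads the same total area over a wider
# range, so the curve is wider and its peak is lower.
peak_wide = wide.pdf(30)

print(f"peak height, mean=30, sd=5:  {peak_narrow:.4f}")
print(f"peak height, mean=40, sd=5:  {peak_shifted:.4f}")
print(f"peak height, mean=30, sd=10: {peak_wide:.4f}")
```

Changing the mean moves the peak without changing its height, while doubling the standard deviation halves the peak height.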
Common Properties for All Forms of the Normal Distribution
Despite the different shapes, all forms of the normal distribution have the following characteristic properties:
• They’re all symmetric. The normal distribution cannot model skewed distributions.
• The mean, median, and mode are all equal.
• Half of the population is less than the mean and half is greater than the mean.
• The Empirical Rule allows you to determine the proportion of values that fall within certain distances from the
mean.
Example:
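Assume that a pizza restaurant has a mean delivery time of 30 minutes and a standard deviation of 5 minutes. If delivery times are normally distributed, the Empirical Rule tells us that about 68% of deliveries take 25 to 35 minutes, about 95% take 20 to 40 minutes, and about 99.7% take 15 to 45 minutes.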
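The Empirical Rule (roughly 68%, 95%, and 99.7% of values within one, two, and three standard deviations of the mean) can be checked against the pizza example. This is a minimal sketch that assumes delivery times are exactly normally distributed; the variable names are illustrative.

```python
from statistics import NormalDist

mean_minutes = 30  # mean delivery time from the example
sd_minutes = 5     # standard deviation from the example
delivery = NormalDist(mu=mean_minutes, sigma=sd_minutes)

coverage = {}
for k in (1, 2, 3):
    low = mean_minutes - k * sd_minutes
    high = mean_minutes + k * sd_minutes
    # Share of deliveries between low and high = area under the curve there.
    coverage[k] = delivery.cdf(high) - delivery.cdf(low)
    print(f"{low} to {high} minutes: {coverage[k]:.1%} of deliveries")
```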
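The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. A value on the standard normal distribution is known as a standard score or a Z-score. A standard score represents the
number of standard deviations above or below the mean that a specific observation falls.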
Standard scores are a great way to understand where a specific observation falls relative to the entire distribution. They
also allow you to take observations drawn from normally distributed populations that have different means and standard
deviations and place them on a standard scale. This standard scale enables you to compare observations that would
otherwise be difficult. This process is called standardization.
To standardize your data, you need to convert the raw measurements into Z-scores.
To calculate the standard score for an observation, take the raw measurement, subtract the mean, and divide by the
standard deviation. Mathematically, the formula for that process is the following:
$$Z = \frac{X - \bar{x}}{\sigma}$$
Example:
Suppose we literally want to compare apples to oranges. Specifically, let’s compare their weights. Imagine that we have
an apple that weighs 110 grams and an orange that weighs 100 grams.
                              Apples    Oranges
Mean weight (grams)           100       140
Standard deviation (grams)    15        25
Then:
Apple:  Z = (110 - 100) / 15 = 10 / 15 = 0.667
Orange: Z = (100 - 140) / 25 = -40 / 25 = -1.6
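The same standardization can be written as a small function. This is a minimal sketch; the function name `z_score` is an illustrative choice, and the numbers come from the apple and orange example.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

apple_z = z_score(110, mean=100, sd=15)   # 110 g apple
orange_z = z_score(100, mean=140, sd=25)  # 100 g orange

print(f"apple Z  = {apple_z:.3f}")
print(f"orange Z = {orange_z:.3f}")
```

On the standard scale the apple is somewhat heavier than a typical apple (positive Z), while the orange is well below the typical orange weight (negative Z), even though their raw weights differ by only 10 grams.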
Finding Areas Under the Curve of a Normal Distribution
The normal distribution is a probability distribution. As with any probability distribution, the proportion of the area that
falls under the curve between two points on a probability distribution plot indicates the probability that a value will fall
within that interval.
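As a sketch of this idea, the area between two points can be computed as the difference of two cumulative probabilities. The example below reuses the apple figures (mean 100 grams, standard deviation 15 grams) and assumes apple weights are normally distributed; the helper name `probability_between` is illustrative.

```python
from statistics import NormalDist

# Apple weights from the example: mean 100 g, standard deviation 15 g.
apple_weights = NormalDist(mu=100, sigma=15)

def probability_between(dist, low, high):
    """Area under the curve of dist between low and high."""
    return dist.cdf(high) - dist.cdf(low)

# Probability that a randomly chosen apple weighs between 100 g and 110 g.
p_100_to_110 = probability_between(apple_weights, 100, 110)
print(f"P(100 g <= weight <= 110 g) = {p_100_to_110:.4f}")
```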
Using a Table of Z-scores
Let’s take the Z-score for our apple (0.667) and use it to determine its weight percentile. A percentile is the proportion
of a population that falls below a specific value. In the portion of the table below, the closest Z-score to ours is 0.65,
which we’ll use.
The table value indicates that the area under the curve between -0.65 and +0.65 is 48.43%.
Because the distribution is symmetric, the area from 0 to +0.65 must be half of that: 48.43% / 2 = 24.215%.
The area below the mean (below Z = 0) is 50%, so the area for all scores up to 0.65 = 50% + 24.215% = 74.215%.
Our apple is therefore at approximately the 74th percentile of apple weights.
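The table lookup can be reproduced in code. The sketch below follows the same steps with the rounded Z-score of 0.65, then repeats the calculation with the unrounded Z-score for comparison; it relies on Python's standard `statistics.NormalDist`, and the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Table-style calculation with the rounded Z-score of 0.65.
central_area = standard_normal.cdf(0.65) - standard_normal.cdf(-0.65)
percentile_rounded = 0.5 + central_area / 2

# Direct calculation with the unrounded Z-score of the apple.
apple_z = (110 - 100) / 15
percentile_exact = standard_normal.cdf(apple_z)

print(f"area between -0.65 and +0.65: {central_area:.2%}")
print(f"percentile using Z = 0.65:    {percentile_rounded:.2%}")
print(f"percentile using Z = {apple_z:.3f}:   {percentile_exact:.2%}")
```

Rounding the Z-score to the nearest table entry shifts the result by roughly half a percentage point, which is usually acceptable for a table-based estimate.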
Example:
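We measure the heights of 40 randomly chosen men and get a mean height of 175cm. We also know the standard
deviation of men's heights is 20cm. What is the confidence interval at 95%?
Step 1: note the number of observations (n = 40), the mean (x̅ = 175) and the standard deviation (s = 20).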
Step 2: decide what Confidence Interval we want: 95% or 99% are common choices. Then find the "Z" value for that
Confidence Interval here:
Confidence Interval    Z
80%                    1.282
85%                    1.440
90%                    1.645
95%                    1.960
99%                    2.576
99.5%                  2.807
99.9%                  3.291
Step 3: use that Z value in this formula for the Confidence Interval
$$\bar{x} \pm Z \frac{s}{\sqrt{n}}$$
Where:
• x̅ is the mean
• Z is the chosen Z-value from the table above
• s is the standard deviation
• n is the number of observations
And we have:
175 ± 1.960 × 20/√40
Which is:
175cm ± 6.20cm, that is, from 168.8cm to 181.2cm
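The calculation can be sketched in code as follows. Instead of reading Z from the table, this version derives it from the standard normal distribution with `statistics.NormalDist().inv_cdf`; the variable names are illustrative, and the standard deviation is assumed to be known, as in the example.

```python
from math import sqrt
from statistics import NormalDist

n = 40               # number of observations
mean_height = 175.0  # sample mean in cm
sd = 20.0            # known standard deviation in cm
confidence = 0.95    # chosen confidence level

# Z value that leaves (1 - confidence) / 2 of the area in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error: Z times the standard deviation over the square root of n.
margin = z * sd / sqrt(n)
lower = mean_height - margin
upper = mean_height + margin

print(f"Z = {z:.3f}")
print(f"{mean_height:.0f}cm ± {margin:.2f}cm ({lower:.1f}cm to {upper:.1f}cm)")
```

Changing `confidence` to another level from the table, such as 0.99, gives the corresponding Z value and a wider interval.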