statistics notes part-2
A probability distribution is a function that shows all the possible values a random variable can take and how
likely each of them is.
A probability distribution is defined by the underlying probabilities; its graph is just a visual
representation.
Let’s take an example and find the probability distribution of all possible outcomes of rolling a fair die:
Possible outcome    Probability
1                   1/6
2                   1/6
3                   1/6
4                   1/6
5                   1/6
6                   1/6
The above table represents the probability distribution of the variable (possible outcomes).
When we plot the above table, we get the probability distribution graph as below.
Let’s take another example and find the probability distribution of the sum when rolling two dice:
Possible outcomes =
{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
 (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
 (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
 (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
 (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
 (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
Number of outcomes = 36
Possible sum    Occurrences    Probability (occurrences/36)
2               1              0.03
3               2              0.06
4               3              0.08
5               4              0.11
6               5              0.14
7               6              0.17
8               5              0.14
9               4              0.11
10              3              0.08
11              2              0.06
12              1              0.03
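The table above can be reproduced by enumerating all 36 outcomes and counting how often each sum occurs. A minimal sketch using only Python's standard library; the variable names are illustrative, not from the original notes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (die1, die2) outcomes for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each possible sum (2..12).
counts = Counter(a + b for a, b in outcomes)

# Exact probability of each sum = occurrences / number of outcomes.
distribution = {s: Fraction(counts[s], len(outcomes)) for s in sorted(counts)}

for s, p in distribution.items():
    print(f"{s:>2}  {counts[s]}  {float(p):.2f}")
```

Using `Fraction` keeps the probabilities exact (for example 6/36 = 1/6 for a sum of 7); the rounding to two decimals only happens when printing.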
This table represents the probability distribution of the sum of two dice; if we plot it, we get the
probability distribution graph below.
Probability Distributions: Discrete vs. Continuous
For example, when we toss a coin or roll a die, the possible outcomes are countable: head or tail for a coin,
or 1, 2, ..., 6 for a die. We can never get values like 1.5 or 3.2. Such variables are discrete variables.
On the other hand, suppose the selection criterion for a basketball team is a height between 170 cm
and 200 cm. Here height is a continuous variable, as it can take any value between 170 and 200.
Different types of Discrete Probability Distribution
1) Binomial Distribution
2) Poisson distribution
3) Bernoulli distribution
The die-rolling examples we took above are discrete probability distributions.
In a continuous probability distribution, unlike a discrete probability distribution, the probability that a
continuous random variable will assume a particular value is zero. Thus, we cannot express a continuous
probability distribution in tabular form. Instead, we describe it using an equation or formula known as the
probability density function (pdf).
For a continuous probability distribution, the probability density function has the following properties:
• The graph of the density function will always be continuous over the range in which the random
variable is defined.
• The area bounded by the curve of the density function and the x-axis is equal to 1, when
computed over the domain of the variable.
• The probability that a random variable assumes a value between a and b is equal to the area
under the density function bounded by a and b.
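The second and third properties can be checked numerically. This sketch assumes a standard normal density as the example pdf (any continuous density would do) and approximates areas with a hand-written trapezoid rule; `trapezoid_area` is a hypothetical helper, while `statistics.NormalDist` is part of Python's standard library (3.8+):

```python
from statistics import NormalDist

def trapezoid_area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Example continuous variable: normal distribution with mean 0 and std deviation 1.
dist = NormalDist(mu=0, sigma=1)

# Property 2: the total area under the density is 1
# (the interval -10..10 covers essentially all of it).
total_area = trapezoid_area(dist.pdf, -10, 10)

# Property 3: P(a <= X <= b) equals the area under the pdf between a and b.
a, b = -1, 2
area_ab = trapezoid_area(dist.pdf, a, b)
prob_ab = dist.cdf(b) - dist.cdf(a)

print(round(total_area, 6), round(area_ab, 6), round(prob_ab, 6))
```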
Different types of Continuous Probability Distribution
1) Normal Distribution
2) Student’s t distribution
3) Chi-squared distribution
Normal Distribution
Normal distribution, also known as the Gaussian distribution, is a continuous probability distribution
that is symmetric about the mean, showing that data near the mean are more frequent in occurrence
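than data far from the mean. In graph form, the normal distribution appears as a bell curve (as in the
figure below). For mean μ and standard deviation σ, its probability density function is:
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))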
Note: For the examples below we will use a “Normal Distribution Calculator”, as it makes the
calculation easy. We could also do the calculations with the formula above, but it is not
necessary. We will learn how to solve such problems using the z-score table in a later section.
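A normal distribution calculator essentially evaluates the cumulative probability P(X ≤ x). As a sketch of what such a calculator might compute, the functions below write the density with `math.exp` and the cumulative probability with the error function `math.erf`, then cross-check both against `statistics.NormalDist` (Python 3.8+). The function names are illustrative:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2): exp(-(x - mean)^2 / (2 sd^2)) / (sd * sqrt(2*pi))."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x), written with the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Cross-check the hand-written formulas against the standard library.
reference = NormalDist(mu=900, sigma=80)
for x in (800, 900, 1000):
    assert math.isclose(normal_pdf(x, 900, 80), reference.pdf(x), rel_tol=1e-9)
    assert math.isclose(normal_cdf(x, 900, 80), reference.cdf(x), rel_tol=1e-9)
```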
Examples:
Q) The Light Bulb Company has found that an average light bulb lasts 900 hours with a standard
deviation of 80 hours. Assume that bulb life is normally distributed. What is the probability that a
randomly selected light bulb will burn out in 1000 hours or less?
Answer:
Mean = 900
Standard deviation = 80
x(a) = 1000
We could use the formula above to find the probability, but for the time being we will take the
help of the normal distribution calculator (shown below).
P(X ≤ 1000) = 0.894, i.e. there is an 89.4% chance that the bulb will burn out within 1000 hours.
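The calculator result for the bulb example can be reproduced in a couple of lines, assuming Python 3.8+ where `statistics.NormalDist` is available:

```python
from statistics import NormalDist

# Bulb lifetime model from the example: mean 900 hours, standard deviation 80 hours.
bulb_life = NormalDist(mu=900, sigma=80)

# P(X <= 1000): probability that a bulb burns out within 1000 hours.
p_within_1000 = bulb_life.cdf(1000)
print(f"{p_within_1000:.3f}")  # 0.894
```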
Q) Suppose scores on a mathematics test are normally distributed. If the test has a mean of 55 and a
standard deviation of 10, what is the probability that a person who takes the test will score between
45 and 65?
Ans.
If we find the cumulative probability for x ≤ 45 and the cumulative probability for x ≤ 65, we can subtract
them to find the required probability.
Mean = 55
Standard deviation = 10
Again, using the normal distribution calculator, we find the two values:
P(X ≤ 65) = 0.8413
P(X ≤ 45) = 0.1587
P(45 ≤ X ≤ 65) = 0.8413 − 0.1587 = 0.6826 ≈ 0.683
So, there is about a 68.3% probability that a person taking the test scores between 45 and 65 marks.
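The same subtraction of two cumulative probabilities, sketched with `statistics.NormalDist` (Python 3.8+). Subtracting the unrounded values gives about 0.6827:

```python
from statistics import NormalDist

# Test scores from the example: mean 55, standard deviation 10.
scores = NormalDist(mu=55, sigma=10)

p_le_65 = scores.cdf(65)        # P(X <= 65)
p_le_45 = scores.cdf(45)        # P(X <= 45)
p_between = p_le_65 - p_le_45   # P(45 <= X <= 65)

print(f"{p_le_65:.4f} {p_le_45:.4f} {p_between:.4f}")  # 0.8413 0.1587 0.6827
```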
-----------------------------------------------------------------------------------------------------------------
Heights and weights of people are commonly observed to follow an approximately normal distribution. Why does
this seem natural? Simply because it is more probable to find people with heights near the average than
very short or very tall people.
Just look around your class: you will find that most people have heights close to the average
height of the class.
Distribution of wealth, by contrast, is not a good example of a normal distribution: it is strongly
right-skewed, with most people below the average and a long tail of a few very wealthy people.
Empirical rule: about 68% of the data falls within one standard deviation of the mean, about 95% within
two standard deviations, and about 99.7% within three.
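The 68% figure (and the corresponding shares for two and three standard deviations) can be checked against the standard normal distribution. A small sketch using `statistics.NormalDist`, whose defaults are mean 0 and standard deviation 1:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.2%}")
```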
Why is the normal distribution so widely used?
1) Distributions of sample means with large sample sizes can be approximated by a normal
distribution.
2) Decisions based on normal distribution insights have proven to be of good value.
3) Its statistics are mathematically elegant and easy to compute.
4) It approximates a wide variety of random variables.
The standard normal distribution is a special case of the normal distribution. It is the distribution that
occurs when a normal random variable has a mean of zero and a standard deviation of one.
The normal random variable of a standard normal distribution is called a standard score or a z score.
Every normal random variable X can be transformed into a z score via the following equation, where μ is
the mean and σ is the standard deviation of X:
z = (X - μ) / σ
Example: X = [1, 2, 2, 3, 3, 4, 4]
Steps:
1) mean = 2.71
2) standard deviation = 1.11 (sample standard deviation)
3) z = (x − 2.71) / 1.11 for each value x in X
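The z-score steps above can be sketched in Python. This assumes, as the value 1.11 suggests, that the sample standard deviation (`statistics.stdev`, divisor n − 1) is used; with the population version (`statistics.pstdev`) the z-scores would differ slightly. Either way, the transformed values have mean 0:

```python
import statistics

X = [1, 2, 2, 3, 3, 4, 4]

mean = statistics.mean(X)        # 19/7, about 2.71
std_dev = statistics.stdev(X)    # sample standard deviation, about 1.11

# Standardize every value: z = (x - mean) / std_dev.
z_scores = [(x - mean) / std_dev for x in X]

print(round(mean, 2), round(std_dev, 2))
print([round(z, 2) for z in z_scores])
```

Because the same standard deviation is used for the transformation and for the check, `statistics.stdev(z_scores)` comes out as 1.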
Let’s take another example and see how the transformation works:
Starting from an initial data set, we transform each value and find its z score, as shown in the
table below.
We can see from the graphs that our normal distribution, once transformed, has a mean of zero and a
standard deviation of 1.
z score
z-score is a measure of position that indicates the number of standard deviations a data value lies from
the mean. z-scores may be positive or negative, with a positive value indicating the score is above the
mean and a negative score indicating it is below the mean.
The z score is a very powerful tool for finding probabilities using the z-score table.
We do not need the normal distribution calculator; we can simply use the z-score table.
Z-score table:
Let’s see how to apply the z-score table:
We will use the same example as above, where we used the normal distribution calculator, and
calculate the same probability using the z-score table to compare the results.
Q) The Light Bulb Company has found that an average light bulb lasts 900 hours with a standard
deviation of 80 hours. Assume that bulb life is normally distributed. What is the probability that a
randomly selected light bulb will burn out in 1000 hours or less?
Mean = 900
Std. deviation = 80
x(a) = 1000
z-score = (1000 − 900) / 80 = 1.25
To look up 1.25 = 1.2 + 0.05 in the table, we take the row for 1.2 and the column for 0.05, which gives
0.8944, i.e. P(X ≤ 1000) ≈ 89.4%.
This is exactly the same as what we found with the normal distribution calculator.
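A printed z-score table lists P(Z ≤ z), with the row giving z to one decimal and the column giving the second decimal. The hypothetical helper below mimics that lookup for non-negative z, using `statistics.NormalDist` for the probabilities and rounding to 4 decimals as most printed tables do:

```python
from statistics import NormalDist

def z_table_lookup(z):
    """Mimic a positive z-score table lookup (hypothetical helper).

    Returns (row, column, P(Z <= z)) with z rounded to two decimals
    and the probability rounded to four decimals.
    """
    if z < 0:
        raise ValueError("this sketch only covers the positive z-score table")
    hundredths = round(z * 100)        # e.g. 1.25 -> 125
    row = (hundredths // 10) / 10      # first decimal, e.g. 1.2
    col = (hundredths % 10) / 100      # second decimal, e.g. 0.05
    prob = round(NormalDist().cdf(hundredths / 100), 4)
    return row, col, prob

# Light bulb example: mean 900, standard deviation 80, x = 1000.
z = (1000 - 900) / 80                  # 1.25
row, col, prob = z_table_lookup(z)
print(row, col, prob)                  # 1.2 0.05 0.8944
```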
Thus, once we standardize a normal distribution, the z-score becomes a very important tool for finding
probabilities.
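Q) Ravi scored 980 in a Physics Olympiad. The mean test score was 870 with a standard deviation of
120. What percentage of students scored more than Ravi? (Assume that test scores are normally distributed.)
Mean = 870
Std. deviation = 120
z-score = (980 − 870) / 120 = 0.9167 ≈ 0.92
From the z-score table, P(Z ≤ 0.92) = 0.8212, so P(Z > 0.92) = 1 − 0.8212 = 0.1788.
Thus, we can estimate that 17.88% of students scored more than Ravi in the test.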
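A sketch of the Ravi calculation with `statistics.NormalDist`. It computes the tail probability twice: once with z rounded to two decimals (as when reading a table), and once with the unrounded z, which gives a slightly different value of roughly 18%:

```python
from statistics import NormalDist

mean, sd, ravi = 870, 120, 980

z = (ravi - mean) / sd                            # about 0.9167

# Table-style: round z to 0.92, then P(Z > z) = 1 - P(Z <= z).
p_table = 1 - NormalDist().cdf(round(z, 2))

# Without rounding z.
p_exact = 1 - NormalDist(mu=mean, sigma=sd).cdf(ravi)

print(f"z = {z:.2f}, table-style: {p_table:.2%}, unrounded z: {p_exact:.2%}")
```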
Note: When you encounter a negative z-score, you can use a negative z-score table, or, by symmetry, look up
the value for the positive z-score and subtract it from 1.
E.g. P(Z ≤ −2.5) = 1 − P(Z ≤ 2.5) = 1 − 0.99379 = 0.00621, which is the same value you will find in a negative
z-score table.
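The symmetry rule can be confirmed directly; a minimal sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()                  # mean 0, standard deviation 1

direct = std_normal.cdf(-2.5)              # P(Z <= -2.5)
via_symmetry = 1 - std_normal.cdf(2.5)     # 1 - P(Z <= 2.5)

print(round(direct, 5), round(via_symmetry, 5))  # 0.00621 0.00621
```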
Central Limit Theorem
In the study of probability theory, the central limit theorem (CLT) states that the distribution of sample
means approximates a normal distribution (also known as a “bell curve”), as the sample size becomes
larger, assuming that all samples are identical in size, and regardless of the population distribution
shape.
The CLT states that, given a sufficiently large sample size from a population with a finite
variance, the mean of the sample means from the same population will be approximately equal to the
mean of the population. Furthermore, the sample means will approximately follow a normal distribution,
with a variance approximately equal to the variance of the population divided by the sample size.
A common rule of thumb is that each sample should contain at least 30 observations.
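The statement can be illustrated with a small simulation. This sketch assumes a fair six-sided die as the (non-normal) population and uses hypothetical parameters, n = 30 and 10,000 repeated samples, to compare the mean and variance of the sample means with μ and σ²/n. It uses only Python's `random` and `statistics` modules; the seed is fixed only for reproducibility:

```python
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is reproducible
n = 30                           # size of each sample
num_samples = 10_000             # number of repeated samples

# Population: one roll of a fair die, uniform on 1..6 (clearly not normal).
pop_mean = statistics.mean(range(1, 7))        # 3.5
pop_var = statistics.pvariance(range(1, 7))    # 35/12, about 2.917

# Draw many samples of size n and record each sample's mean.
sample_means = [
    statistics.fmean(rng.randint(1, 6) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)
se = (pop_var / n) ** 0.5

# For a normal distribution, about 68% of values lie within one standard deviation.
within_one_se = sum(abs(m - pop_mean) <= se for m in sample_means) / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (population mean {pop_mean})")
print(f"variance of sample means: {var_of_means:.4f} (sigma^2/n = {pop_var / n:.4f})")
print(f"share within one SE of the mean: {within_one_se:.3f}")
```

Plotting a histogram of `sample_means` would show the bell shape, even though every individual roll is uniformly distributed.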
Let’s visualize the CLT with a few data sets of different sample sizes:
x= [9,2,1]
x2= [6,6,8,3,8]
x3= [5,3,6,4,7,2,6,9,7,1,1,7]
x4 = [8,1,7,1,4,3,1,7,8,9,8,3,1,6,8,3,4]
Plotting the distribution graphs for the above samples of different sizes:
The graphs suggest that, as the sample size increases, the sample moves closer to a normal distribution.
Strictly speaking, however, the CLT describes the distribution of sample means: it is the means of repeated
samples of a given size, not the individual samples, whose distribution approaches a normal distribution.
We will talk more about the central limit theorem and see its uses, but first let’s discuss one more
important concept.
Standard Error
The standard error (SE) of a statistic is the standard deviation of its sampling distribution, or an
estimate of that standard deviation. It measures how accurately a sample statistic, such as the sample
mean, represents the corresponding population value. In statistics, a sample mean usually deviates from
the actual mean of the population; the standard error of the mean measures the typical size of this deviation.
When a population is sampled, the mean, or average, is generally calculated. The standard error can
describe the variation between the calculated sample mean and the population mean, which is considered
known or accepted as accurate. This helps account for any incidental inaccuracies related to the
gathering of the sample.
In cases where multiple samples are collected, the mean of each sample may vary slightly from the
others, creating a spread among the sample means. This spread is most often measured as the standard
error, accounting for the differences between the means across the datasets.
The more data points involved in the calculations of the mean, the smaller the standard error tends to
be. When the standard error is small, the data is said to be more representative of the true mean. In
cases where the standard error is large, the data may have some notable irregularities.
The standard deviation describes the spread of the individual data points. It is used to help judge
the validity of the data based on how many data points fall at each level of standard deviation.
Standard errors, by contrast, describe the accuracy of a sample statistic (or of multiple samples) by
measuring the spread among the sample means.
The standard error of the mean is:
σ(x̄) = σ / √n
Here σ is the standard deviation of the population, n is the sample size, and σ(x̄) is the standard
deviation of the sample mean.
We can see that as the size of our sample increases, the standard error decreases.
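A quick sketch of σ/√n for a few sample sizes, assuming a population standard deviation of σ = 4 (the value used in the next question):

```python
import math

sigma = 4  # population standard deviation (assumed, as in the next question)

# Standard error of the mean for a few sample sizes.
standard_errors = {n: sigma / math.sqrt(n) for n in (25, 100, 400)}

for n, se in standard_errors.items():
    print(f"n = {n:>3}: SE = {se}")
```

Each fourfold increase in n halves the standard error, because n enters through its square root.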
Now that we know the standard error, let’s restate the central limit theorem:
The central limit theorem states that the sample mean approximately follows a normal distribution
with mean μ and standard deviation σ/√n, where μ and σ are the mean and standard deviation of
the population from which the sample was selected. The sample size n has to be large (usually n ≥ 30)
if the population from which the sample is taken is non-normal.
So, when we standardize a sample mean X̄, we use the following formula for the z-score:
z = (X̄ - μ) / (σ/√n)
Q) Let X be a random variable with μ= 10 and σ= 4. A sample of size 100 is taken from this population.
Find the probability that the sample mean of these 100 observations is less than 9.
μ = 10, σ = 4, n = 100
Sample mean = 9
Standard error = σ/√n = 4/√100 = 0.4
z = (9 − 10) / 0.4 = −2.5
Using the z-score table, P(Z ≤ −2.5) = 1 − 0.99379 = 0.0062
P(X̄ < 9) = 0.0062
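The same steps as a short sketch, using `math.sqrt` for the standard error and `statistics.NormalDist` in place of the z-score table:

```python
import math
from statistics import NormalDist

mu, sigma, n = 10, 4, 100

se = sigma / math.sqrt(n)        # standard error of the mean: 0.4
z = (9 - mu) / se                # z-score of the sample mean 9: -2.5
p = NormalDist().cdf(z)          # P(sample mean < 9)

print(se, z, round(p, 4))        # 0.4 -2.5 0.0062
```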
Q) A large freight elevator can transport a maximum of 9800 pounds. Suppose a load of cargo
containing 49 boxes must be transported via the elevator. Experience has shown that the weight of
boxes of this type of cargo follows a distribution with mean= 205 pounds and standard deviation = 15
pounds. Based on this information, what is the probability that all 49 boxes can be safely loaded onto
the freight elevator and transported?
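Ans: For all the boxes to be loaded, the total weight must be at most 9800 pounds, i.e. the mean weight
of the 49 boxes must be at most 9800/49 = 200 pounds.
Mean = 205
Std. deviation = 15
n = 49
Standard error = 15/√49 = 15/7 ≈ 2.14
z = (200 − 205) / 2.14 ≈ −2.33
From the z-score table, P(X̄ ≤ 200) ≈ 0.0099, so there is only about a 1% chance that all 49 boxes can be
safely loaded and transported.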
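A sketch of the elevator problem. It rewrites the total-weight condition as a condition on the mean box weight (total ≤ 9800 exactly when mean ≤ 9800/49 = 200) and then applies the CLT with n = 49. With the unrounded z of −7/3 the probability comes out slightly below the table value, at roughly 0.0098:

```python
import math
from statistics import NormalDist

capacity, n = 9800, 49           # elevator limit (pounds) and number of boxes
mu, sigma = 205, 15              # mean and standard deviation of one box

max_mean = capacity / n          # 200.0 pounds per box on average
se = sigma / math.sqrt(n)        # 15/7, about 2.143
z = (max_mean - mu) / se         # about -2.33

p_safe = NormalDist().cdf(z)     # P(mean weight <= 200) = P(total <= 9800)
print(f"z = {z:.2f}, P(safe) = {p_safe:.4f}")
```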
Bernoulli Distribution
It is a type of Discrete Probability distribution. The Bernoulli distribution essentially models a single trial
of flipping a weighted coin. It is the probability distribution of a random variable taking on only two
values, 1 ("success") and 0 ("failure") with complementary probabilities p and 1−p respectively. The
Bernoulli distribution therefore describes events having exactly two outcomes, which are common in real
life.
Suppose we have a single trial with only two possible outcomes, success or failure:
P(Success) = P(X = 1) = p
P(Failure) = P(X = 0) = 1 − p
which can be written compactly as P(X = x) = p^x (1 − p)^(1 − x), for x ∈ {0, 1}.
The expected value (mean) of the Bernoulli distribution is given as:
E[X] = 1·p + 0·(1 − p) = p
We will see some more examples when the Bernoulli trial is repeated many times.
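A minimal sketch of the Bernoulli probability function and its mean. The success probability p = 0.3 is a hypothetical value; the same loop also gives the variance, which for a Bernoulli variable equals p(1 − p):

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return p ** x * (1 - p) ** (1 - x)

p = 0.3  # hypothetical success probability, e.g. a weighted coin landing tails

# Expected value: sum of x * P(X = x) over both outcomes, which equals p.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))

# Variance: sum of (x - mean)^2 * P(X = x), which equals p * (1 - p).
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean, variance)
```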
Binomial Distribution
A binomial experiment is a series of n Bernoulli trials, whose outcomes are independent of each other. A
random variable, X, is defined as the number of successes in a binomial experiment.
For example, consider a fair coin. Flipping the coin once is a Bernoulli trial, since there are exactly two
complementary outcomes (flipping a head and flipping a tail), and their probabilities are both 1/2 no matter
how many times the coin is flipped. Note that the fact that the coin is fair is not necessary; flipping a
weighted coin is still a Bernoulli trial.
A binomial experiment might consist of flipping the coin 100 times, with the resulting number of heads
being represented by the random variable X. The binomial distribution of this experiment is the
probability distribution of X.
If X is the number of successes in a binomial experiment with n independent Bernoulli trials, with
probability of success p and probability of failure 1 − p, then the probability of exactly k successes
in the experiment is given as:
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) = n! / (k! (n − k)!)
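Q) Let’s flip a coin 6 times, where the probability of getting a tail is 0.3. Let’s write the binomial
distribution for this experiment (X = number of tails).
Ans:
For example, the probability of exactly 2 tails is:
P(X = 2) = C(6, 2) · (0.3)² · (0.7)⁴ = 15 × 0.09 × 0.2401 ≈ 0.324
The mean and variance of a binomial distribution are:
Mean = n·p
Variance = n·p·(1 − p),
where n is the number of trials, p is the probability of success and 1 − p is the probability of failure.
For this experiment, Mean = 6 × 0.3 = 1.8 and Variance = 6 × 0.3 × 0.7 = 1.26.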
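The full distribution for the coin example (n = 6, p = 0.3) can be tabulated with `math.comb` (Python 3.8+). The sketch also checks the mean and variance formulas by computing them directly from the table:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 6, 0.3  # 6 flips, probability of a tail 0.3

pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
for k, prob in pmf.items():
    print(k, round(prob, 4))

# Mean and variance computed from the distribution itself.
mean = sum(k * prob for k, prob in pmf.items())
variance = sum((k - mean) ** 2 * prob for k, prob in pmf.items())
print(round(mean, 4), round(n * p, 4))                  # both n*p
print(round(variance, 4), round(n * p * (1 - p), 4))    # both n*p*(1-p)
```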
Poisson Distribution
The Poisson distribution is the discrete probability distribution of the number of events occurring in a
given time period, given the average number of times the event occurs over that time period.
Example
A certain car wash shop gets an average of 3 visitors to the center per hour. This is just an average,
however. The actual amount can vary.
A Poisson distribution can be used to analyze the probability of various events regarding how many
customers go to the center. It can allow one to calculate the probability of a dull activity (when there are
0 customers coming) as well as the probability of a high activity (when there are 5 or more customers
coming). This information can, in turn, help the owner to plan for these events with staffing and
scheduling.
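If X is the number of events observed over a given time period and λ is the average number of events
over that period, then the probability of observing k events over the time period is:
P(X = k) = λ^k · e^(−λ) / k!,  for k = 0, 1, 2, ...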
The Poisson distribution is often used as an approximation for binomial probabilities when n is large and
p is small.
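To see the approximation at work, this sketch compares binomial and Poisson probabilities for hypothetical values n = 1000 and p = 0.002, so that λ = n·p = 2. The function names are illustrative:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * e**(-lam) / k! for a Poisson(lam) variable."""
    return lam ** k * exp(-lam) / factorial(k)

n, p = 1000, 0.002   # hypothetical: many trials, small success probability
lam = n * p          # Poisson rate with the same mean

diffs = []
for k in range(6):
    b, q = binomial_pmf(k, n, p), poisson_pmf(k, lam)
    diffs.append(abs(b - q))
    print(k, round(b, 4), round(q, 4))
```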
Q) In a coffee shop, the average number of customers per hour is 2. Find the probability of getting k
customers in the shop in an hour.
With λ = 2, the probability is highest for k = 1 and k = 2 (both ≈ 0.271) and declines steadily after
that; beyond k = 6 it is below 1%.
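A sketch that tabulates the Poisson probabilities for λ = 2 and k = 0..10, which shows where the distribution peaks and how quickly it falls off:

```python
from math import exp, factorial

lam = 2  # average number of customers per hour

# P(X = k) = lam**k * e**(-lam) / k! for k = 0..10
pmf = [lam ** k * exp(-lam) / factorial(k) for k in range(11)]

for k, prob in enumerate(pmf):
    print(k, round(prob, 4))
```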
Q) Suppose the average number of elephants seen on a 1-day safari is 6. What is the probability that
tourists will see fewer than 4 elephants on the next 1-day safari?
Solution: λ = 6
P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
= 0.0025 + 0.0149 + 0.0446 + 0.0892
= 0.1512
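The elephant calculation as a short sketch: sum the Poisson probabilities for k = 0, 1, 2, 3 with λ = 6. The helper name is illustrative:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * e**(-lam) / k!"""
    return lam ** k * exp(-lam) / factorial(k)

lam = 6  # average number of elephants seen on a 1-day safari

terms = [poisson_pmf(k, lam) for k in range(4)]   # P(X=0), ..., P(X=3)
p_fewer_than_4 = sum(terms)

print([round(t, 4) for t in terms], round(p_fewer_than_4, 4))
# [0.0025, 0.0149, 0.0446, 0.0892] 0.1512
```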