statistics notes part-2

The document explains probability distributions, detailing both discrete and continuous types, with examples such as rolling dice and normal distribution. It highlights the properties of normal distribution, the empirical rule, and the significance of z-scores in calculating probabilities. Additionally, it introduces the Central Limit Theorem and standard error, emphasizing their importance in statistical analysis.


Probability Distribution

A probability distribution is a function that shows all the possible values a variable can take and how often each value occurs.

A probability distribution is defined by the underlying probabilities; a graph is just its visual representation.

Let’s take an example and find out probability distribution of all possible outcomes of rolling a die:

Possible Outcome    Probability
1                   1/6
2                   1/6
3                   1/6
4                   1/6
5                   1/6
6                   1/6

The above table represents the probability distribution of the variable (possible outcomes).

When we plot the above table, we get the probability distribution graph as below.
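As a quick sketch (assuming Python with its standard library), the table above can be reproduced exactly using fractions:

```python
from fractions import Fraction

# Each face of a fair die is equally likely, so each outcome has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
distribution = {x: Fraction(1, 6) for x in outcomes}

for outcome, p in distribution.items():
    print(outcome, p)

# A valid probability distribution sums to 1 over all possible outcomes.
assert sum(distribution.values()) == 1
```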
Let’s take another example and find the probability distribution of the sums when rolling two dice:

Possible outcomes =
{(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2)
,(4,3),(4,4),(4,5),(4,6),(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}

Number of outcomes = 36

Possible Sum    Occurrences    Probability
2               1              0.03
3               2              0.06
4               3              0.08
5               4              0.11
6               5              0.14
7               6              0.17
8               5              0.14
9               4              0.11
10              3              0.08
11              2              0.06
12              1              0.03

This table represents the Probability distribution of different outcomes and if we plot the above table,
we get the below Probability distribution graph.
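The enumeration above can be automated. A minimal Python sketch (standard library only) counts the 36 equally likely pairs:

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely (die1, die2) pairs and count each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for s in sorted(counts):
    print(s, counts[s], round(counts[s] / 36, 2))
```

The printed rows match the table above, e.g. a sum of 7 occurs 6 times with probability 0.17.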
Probability Distributions: Discrete vs. Continuous

Variables can be of two types: discrete and continuous.

For example, when we toss a coin the outcome is either heads or tails, and when we roll a die it is one of 1, 2, ..., 6. We can never get values like 1.5 or 3.2. Such variables are discrete variables.

On the other hand, suppose the criterion for selection in a basketball team is a height between 170 cm and 200 cm. Here height is a continuous variable, as it can take any value between 170 and 200.

Discrete Probability Distribution


If a random variable is discrete, its probability distribution is said to be a Discrete Probability Distribution.

Different types of Discrete Probability Distribution

1) Binomial Distribution
2) Poisson distribution
3) Bernoulli distribution

The die-rolling examples above are examples of Discrete Probability Distributions.

Continuous Probability Distribution


Similarly, when a random variable is continuous, its probability distribution is said to be a Continuous Probability Distribution.

In a continuous probability distribution, unlike a discrete probability distribution, the probability that a
continuous random variable will assume a particular value is zero. Thus, we cannot express a continuous
probability distribution in tabular form. We describe it using an equation or a formula also known as
Probability Density Function (pdf).

For a continuous probability distribution, the probability density function has the following properties:

• The graph of the density function will always be continuous over the range in which the random
variable is defined.
• The area bounded by the curve of the density function and the x-axis is equal to 1, when
computed over the domain of the variable.
• The probability that a random variable assumes a value between a and b is equal to the area
under the density function bounded by a and b.
Different types of Continuous Probability Distribution

1) Normal Distribution
2) Student’s T distribution
3) Chi-squared distribution

Normal Distribution

Normal distribution, also known as the Gaussian distribution, is a continuous probability distribution
that is symmetric about the mean, showing that data near the mean are more frequent in occurrence
than data far from the mean. In graph form, normal distribution will appear as a bell curve (as in the
figure below).

The probability density function for the Normal distribution is given as:

f(x) = (1 / (σ√(2π))) · e^(-(x - μ)² / (2σ²))

Given a value x(a), the probability that the random variable X, which follows a normal distribution, is less than or equal to x(a) is given as:

P(X ≤ x(a)) = ∫ from -∞ to x(a) of f(x) dx

Here we integrate the pdf to get the cumulative probability.

Note: For all the examples below we will use a “Normal Distribution Calculator”, as it is very convenient. We could do the calculations with the formula above, but it’s not necessary here. We will learn how to solve problems using the z-score table in the next section.

Examples:

Q) The Light Bulb Company has found that an average light bulb lasts 900 hours with a standard deviation of 80 hours. Assuming that bulb life is normally distributed, what is the probability that a randomly selected light bulb will burn out in 1000 hours or less?

Answer:

Mean = 900

Standard deviation = 80

x(a) = 1000

We could use the formula given above to find the probability, but for the time being we take the help of the Normal distribution calculator (shown below).

Putting the above values, we get a probability of:

P(X<=1000) = 89.4%, i.e. there is an 89.4% chance that the bulb will burn out within 1000 hours.
Q) Suppose scores on a mathematics test are normally distributed. If the test has a mean of 55 and a
standard deviation of 10, what is the probability that a person who takes the test will score between
45 and 65?

Ans.

Here we need to find the probability for 45<= x(a) <=65.

If we find the cumulative probability for x<=45 and cumulative probability for x<=65, we can subtract
them to find the required probability.

Mean = 55

Standard deviation = 10

Again, using the normal distribution calculator, we find both the values as:

P(x(a)< = 45) = 0.159

P(x(a)< = 65) = 0.841

So, P (45< X(a) <65) = 0.841 – 0.159

= 0.682

So, there is a 68.2% probability that a person taking the test scores between 45 and 65 marks.
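Instead of an online calculator, both examples can be checked with a short Python sketch. The helper name normal_cdf is our own; it builds the normal CDF from the standard library’s error function, using the identity P(X ≤ x) = ½(1 + erf((x - μ)/(σ√2))):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Light bulb example: mean 900, std 80, P(X <= 1000)
print(round(normal_cdf(1000, 900, 80), 3))   # ~0.894

# Test-score example: mean 55, std 10, P(45 <= X <= 65)
p = normal_cdf(65, 55, 10) - normal_cdf(45, 55, 10)
print(round(p, 3))   # ~0.683
```

The small difference from 0.682 above comes from rounding the two cumulative probabilities before subtracting.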

-----------------------------------------------------------------------------------------------------------------

The Normal Distribution has following properties:


1) mean = median = mode
2) symmetry about the center
3) 50% of values less than the mean and 50% greater than the mean
4) The probability that X is greater than ‘a’ is equal to the area under the normal curve as
shown by the non-shaded area in the figure below.
5) The probability that X is less than ‘a’ is equal to the area under the normal curve as
shown by the shaded area in the figure below.
Let’s see some real-life examples that approximately follow a normal distribution.

When we talk about the heights or weights of people, it is seen that they follow a normal distribution. Why does this seem obvious? Simply because it is more probable to find people with heights near the average than to find very short or very tall people.

Just look around in your class, you will find the majority of people will fall in the range near the average
height of the class.

Similar is the case with the weights and IQ of people.

Distribution of wealth is sometimes cited as another example, with most people falling in the middle (“middle class”), though in practice wealth distributions are strongly right-skewed rather than truly normal.

The empirical rule (Three Sigma Rule)


It states that for a normal distribution, nearly all of the data will fall within three standard deviations of the mean. The empirical rule can be understood when broken down into three parts:

1) 68% of the data falls within the first standard deviation from the mean.

2) 95% fall within two standard deviations.

3) 99.7% fall within three standard deviations.
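These three percentages can be verified numerically. For any normal distribution, the probability within k standard deviations of the mean is erf(k/√2):

```python
from math import erf, sqrt

# P(|X - mu| <= k*sigma) = erf(k / sqrt(2)) for any normal distribution.
for k in (1, 2, 3):
    print(k, round(erf(k / sqrt(2)), 3))
```

This prints approximately 0.683, 0.954, and 0.997, matching the three parts of the rule.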


Why do we use Normal Distribution?

1) Distributions of sample means with large sample sizes can be approximated by a normal
distribution
2) Decisions based on normal distribution insights have proven to be of good value
3) All computable statistics are elegant
4) It approximates a wide variety of random variables

Standard Normal Distribution

The standard normal distribution is a special case of the normal distribution. It is the distribution that
occurs when a normal random variable has a mean of zero and a standard deviation of one.

The normal random variable of a standard normal distribution is called a standard score or a z score.

Every normal random variable X can be transformed into a z score via the following equation:

z = (X - μ) / σ

where, X is a normal random variable,

μ is the mean, and σ is the standard deviation.


Steps to transform into the Standard Normal Distribution

Suppose we have a dataset with elements

X = [1,2,2,3,3,4,4]

we can see the data is approximately normally distributed.

steps:

1) mean = 2.71

2) std_deviation = 1.11 (sample standard deviation)

3) transform to z-scores using z = (x - mean)/std_deviation

4) After transforming, X = {-1.54, -0.64, -0.64, 0.26, 0.26, 1.16, 1.16}


Plotting new data:

mean=0, standard deviation=1
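The four steps above can be sketched in Python (statistics.stdev gives the sample standard deviation used here):

```python
from statistics import mean, stdev

X = [1, 2, 2, 3, 3, 4, 4]

m = mean(X)    # ~2.71
s = stdev(X)   # sample standard deviation, ~1.11

# Transform each value to a z-score.
Z = [round((x - m) / s, 2) for x in X]
print(Z)   # [-1.54, -0.64, -0.64, 0.26, 0.26, 1.16, 1.16]
```

After the transformation the data has mean 0 and standard deviation 1.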

Let’s take another example and see how the transformation works:

Given a data set, we were provided with the initial data, we transformed the data and found the z score
as shown in the below table.

Initial data    Initial data - mean    z-score = (Initial data - mean)/St. deviation


45.363 17.295 1.727449369
33.435 5.367 0.536092603
17.713 -10.355 -1.034292115
26.019 -2.049 -0.204671347
27.508 -0.560 -0.055885355
11.194 -16.874 -1.685409951
19.894 -8.174 -0.816445735
23.133 -4.935 -0.492868567
29.126 1.058 0.105720916
30.427 2.359 0.235629855
44.729 16.661 1.664114735
7.393 -20.675 -2.065026984
26.983 -1.085 -0.108386711
30.616 2.548 0.254473261
32.662 4.594 0.458842761
23.013 -5.055 -0.504921837
30.949 2.881 0.287795782
37.189 9.121 0.910991079
21.990 -6.078 -0.607100949
42.025 13.957 1.393988737

mean = 28.068        → mean = 0
St. deviation = 10.012 → St. deviation = 1
Let’s see the plots for both the initial data and transformed data:

Initial data Transformed data

We can see from the graphs that the transformed distribution has mean 0 and standard deviation 1.

z-score
A z-score is a measure of position that indicates the number of standard deviations a data value lies from the mean. z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative value indicating it is below the mean.

The z-score is a very powerful tool for finding probabilities using the z-score table.

With it, we do not need the normal distribution calculator and can simply use the z-score table.

Z-score table:
Let’s see how to use the z-score table:

We will use the same example from above where we used the Normal Distribution calculator, let’s try to
calculate the same using z-score table and compare the results.

Q) The Light Bulb Company has found that an average light bulb lasts 900 hours with a standard deviation of 80 hours. Assuming that bulb life is normally distributed, what is the probability that a randomly selected light bulb will burn out in 1000 hours or less?

Answer: let’s convert our data in standard normal form.

Mean = 900

Std. deviation = 80

x(a) = 1000

standardized x(a) = (1000-900)/80 = 1.25

z-score = 1.25 = 1.2 + 0.05 (in the table we will match the value corresponding to 1.2 and 0.05)

Let’s use the z-score table for this:

We find the probability to be 89.44%.

This is exactly the same as we found with the Normal distribution calculator.

Thus, if we standardize a normal distribution, the z-score becomes a very important tool for finding probabilities.
Q) Ravi scored 980 in a Physics Olympiad. The mean test score was 870 with a standard deviation of 120. What proportion of students scored more than Ravi? (Assume that test scores are normally distributed.)

Answer: Let’s standardize the test score

Mean = 870

St. deviation= 120

z-score = (980-870)/120 = 0.917 (we will approximate it to 0.92 )

So, P(x<=980) = 0.8212

We need to find the probability of scoring more than 980,

P(x>980) = 1 – 0.8212 = 0.1788

Thus, we can estimate 17.88% students scored more than Ravi in the test.

Note: When you encounter a negative z-score, you can use a negative z-score table, or find the value for the corresponding positive z-score and subtract it from 1.

e.g. P(Z ≤ -2.5) = 1 - P(Z ≤ 2.5) = 1 - 0.99379 = 0.00621, which is the same as the value you will find in a negative z-score table.
Central Limit Theorem

In the study of probability theory, the central limit theorem (CLT) states that the distribution of sample
means approximates a normal distribution (also known as a “bell curve”), as the sample size becomes
larger, assuming that all samples are identical in size, and regardless of the population distribution
shape.

CLT is a statistical theory stating that, given a sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the same population will be approximately equal to the mean of the population. Furthermore, the sample means will follow an approximately normal distribution, with variance approximately equal to the variance of the population divided by the sample size. Each sample should generally contain more than 30 observations.

Let’s visualize the CLT with few data sets with different sample sizes:

x= [9,2,1]

x2= [6,6,8,3,8]

x3= [5,3,6,4,7,2,6,9,7,1,1,7]

x4 = [8,1,7,1,4,3,1,7,8,9,8,3,1,6,8,3,4]
Plotting the distribution graph for above samples with different sample size

We can clearly see that, as the sample size increases, the distribution moves closer to a normal distribution.

We will talk more about the Central Limit Theorem and see its uses, but first let’s discuss one more important concept.

Standard Error

The standard error (SE) of a statistic is the approximate standard deviation of a statistical sample
population. The standard error is a statistical term that measures the accuracy with which a sample
distribution represents a population by using standard deviation. In statistics, a sample mean deviates
from the actual mean of a population—this deviation is the standard error of the mean.

When a population is sampled, the mean, or average, is generally calculated. The standard error captures the variation between the calculated sample mean and the mean that is considered known, or accepted as accurate. This helps compensate for any incidental inaccuracies related to the gathering of the sample.
In cases where multiple samples are collected, the mean of each sample may vary slightly from the
others, creating a spread among the variables. This spread is most often measured as the standard
error, accounting for the differences between the means across the datasets.

The more data points involved in the calculations of the mean, the smaller the standard error tends to
be. When the standard error is small, the data is said to be more representative of the true mean. In
cases where the standard error is large, the data may have some notable irregularities.

The standard deviation is a representation of the spread of each of the data points. The standard
deviation is used to help determine the validity of the data based on the number of data points
displayed at each level of standard deviation. Standard errors function more as a way to determine the
accuracy of the sample or the accuracy of multiple samples by analyzing deviation within the means.

The Standard Error is given by the following formula:

SE = σ(x̄) = σ / √n

Here σ is the standard deviation of the population, n is the sample size, and σ(x̄) is the standard deviation of the sample mean.

We can see that as the size of our sample increases, the standard error decreases.
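A tiny sketch (the function name standard_error is ours) shows this shrinking effect for a population standard deviation of 10:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# The standard error shrinks as the sample size grows.
for n in (4, 25, 100, 400):
    print(n, standard_error(10, n))   # 5.0, 2.0, 1.0, 0.5
```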

Now, that we know the Standard error, let’s rephrase our Central Limit theorem as:

The central limit theorem states that the sample mean follows approximately the normal distribution
with mean(μ) and standard deviation (σ/√n), where μ and σ are the mean and standard deviation of
the population from where the sample was selected. The sample size n has to be large (usually n≥30)
if the population from where the sample is taken is non normal.

So, when we transform our sample data, we will use following formula for the z-score:

z = (X - μ) / (σ/√n)

where, X is the sample mean,

μ is the mean of the population,

and σ is the standard deviation of the population.


Let’s see an example based on the above explanation.

Q) Let X be a random variable with μ= 10 and σ= 4. A sample of size 100 is taken from this population.
Find the probability that the sample mean of these 100 observations is less than 9.

Ans: population mean = 10 population std. deviation = 4 sample size(n) = 100

Sample mean = 9

z= (9-10)/ (4/ (100) ^0.5) = -2.5

We will use the z-score table and find the value to be 0.0062

P(X<9) = 0.0062
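The same calculation can be sketched in Python (normal_cdf is a helper built from math.erf, not a library function):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 10, 4, 100
sample_mean = 9

# The z-score of a sample mean uses the standard error sigma/sqrt(n).
z = (sample_mean - mu) / (sigma / sqrt(n))
print(round(z, 2))               # -2.5
print(round(normal_cdf(z), 4))   # 0.0062
```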

Q) A large freight elevator can transport a maximum of 9800 pounds. Suppose a load of cargo
containing 49 boxes must be transported via the elevator. Experience has shown that the weight of
boxes of this type of cargo follows a distribution with mean= 205 pounds and standard deviation = 15
pounds. Based on this information, what is the probability that all 49 boxes can be safely loaded onto
the freight elevator and transported?

Ans: For all the boxes to be loaded the total weight must be at most 9800.

So, the sample mean should be = 9800/49 = 200 sample size(n) = 49

Population mean = 205

Std. deviation = 15

z-score = (200-205)/(15/(49)^0.5) = -2.33

using z-score table:


P(X<200) = 0.0099

Bernoulli Distribution

It is a type of Discrete Probability distribution. The Bernoulli distribution essentially models a single trial of flipping a weighted coin. It is the probability distribution of a random variable taking on only two values, 1 (“success”) and 0 (“failure”), with complementary probabilities p and 1-p respectively. The Bernoulli distribution therefore describes events with exactly two outcomes, which are common in real life.

Suppose we have a single trial with only two possible outcomes, success or failure:

P(Success) = p

P(Failure)= 1-p

Let, X=1 when Success and X=0 when failure,

Then the probability distribution function is given as:

P(X = x) = p^x * (1-p)^(1-x), for x ∈ {0, 1}

So, P(1) = p^1 * (1-p)^0 = p

P(0) = p^0 * (1-p)^1 = 1-p

A simple graphical representation of Bernoulli’s distribution will look like this.


Here, p =0.3

The expected value (mean) for the Bernoulli distribution is given as:

E[X] = p

The variance for the Bernoulli distribution is given as:

Var(X) = p(1-p)
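A minimal sketch of these formulas (the function name bernoulli_pmf is ours):

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli(p) variable, x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

p = 0.3
print(bernoulli_pmf(1, p))   # 0.3  (success)
print(bernoulli_pmf(0, p))   # 0.7  (failure)

mean = p                        # E[X] = p
variance = p * (1 - p)          # Var(X) = p(1-p)
print(mean, round(variance, 2))   # 0.3 0.21
```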

Some real-life cases that follows a Bernoulli distribution:

1) Results of Exam (Pass or Fail)


2) Gender of a newborn baby (Male or Female)
3) Result of Cricket World Cup (Win or Lose)
4) Tossing a coin (Heads or Tails)

We will see some more examples when the Bernoulli trial is repeated many times.

Binomial Distribution

A binomial experiment is a series of n Bernoulli trials, whose outcomes are independent of each other. A
random variable, X, is defined as the number of successes in a binomial experiment.

For example, consider a fair coin. Flipping the coin once is a Bernoulli trial, since there are exactly two complementary outcomes (heads and tails), each with probability 1/2 no matter how many times the coin is flipped. Note that the coin being fair is not necessary; flipping a weighted coin is still a Bernoulli trial.

A binomial experiment might consist of flipping the coin 100 times, with the resulting number of heads
being represented by the random variable X. The binomial distribution of this experiment is the
probability distribution of X.

If X is the number of successes in a binomial experiment with n independent trials, with probability of success p and probability of failure 1-p, then the probability of exactly k successes is given as:

P(X = k) = C(n, k) * p^k * (1-p)^(n-k)

Here C(n, k), also written (n k), is the number of ways of choosing k items from n.

Q) Let’s flip a coin 6 times, with the probability of getting a tail being 0.3. Write the binomial distribution for this experiment.

Ans:

Possible outcome (k)    P(X=k)    Binomial formula    Probability


0 tail P(X=0) C(6,0) * (0.30^0)*(0.70)^6 0.118
1 tail P(X=1) C(6,1) * (0.30^1)*(0.70)^5 0.302
2 tail P(X=2) C(6,2) * (0.30^2)*(0.70)^4 0.324
3 tail P(X=3) C(6,3) * (0.30^3)*(0.70)^3 0.185
4 tail P(X=4) C(6,4) * (0.30^4)*(0.70)^2 0.06
5 tail P(X=5) C(6,5) * (0.30^5)*(0.70)^1 0.01
6 tail P(X=6) C(6,6) * (0.30^6)*(0.70)^0 0.0007

Note: C(6,1) = 6!/[(6-1)! * 1!] = 6

So, looking at the table above, we can find the probability of obtaining k tails.

e.g. what is the probability of getting exactly 2 tails in the above experiment?

P(X=2) = 0.324
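The whole table can be generated with math.comb (Python 3.8+); binomial_pmf is our own helper:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Coin flipped 6 times, P(tail) = 0.3
for k in range(7):
    print(k, round(binomial_pmf(k, 6, 0.3), 3))
```

The probabilities over all possible k sum to 1, as any probability distribution must.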

Let’s plot the above table,


Q) A basketball player takes 5 independent free throws with a probability of 0.65 of getting a basket
on each shot. Let X=the number of baskets he gets. Show the probability distribution for X.

Solution:

Possible outcome (k)    P(X=k)    Binomial formula    Probability


0 basket P(X=0) C(5,0) * (0.65^0)*(0.35)^5 0.0052
1 basket P(X=1) C(5,1) * (0.65^1)*(0.35)^4 0.049
2 baskets P(X=2) C(5,2) * (0.65^2)*(0.35)^3 0.181
3 baskets P(X=3) C(5,3) * (0.65^3)*(0.35)^2 0.336
4 baskets P(X=4) C(5,4) * (0.65^4)*(0.35)^1 0.312
5 baskets P(X=5) C(5,5) * (0.65^5)*(0.35)^0 0.116

Probability distribution Graph:

Mean and variance for Binomial Distribution

Mean= n*p

Variance = n* p(1-p),

where n is the number of trials, p is probability of success and 1-p is probability of failure
Poisson Distribution

The Poisson distribution is the discrete probability distribution of the number of events occurring in a
given time period, given the average number of times the event occurs over that time period.

Example

A certain car wash gets an average of 3 visitors per hour. This is just an average, however; the actual number can vary.

A Poisson distribution can be used to analyze the probability of various events regarding how many
customers go to the center. It can allow one to calculate the probability of a dull activity (when there are
0 customers coming) as well as the probability of a high activity (when there are 5 or more customers
coming). This information can, in turn, help the owner to plan for these events with staffing and
scheduling.

If X is the number of events observed over a given time period, and λ is the average number of events over that period, then the probability of observing exactly k events is:

P(X = k) = (λ^k * e^(-λ)) / k!
The Poisson distribution is often used as an approximation for binomial probabilities when n is large and
p is small.
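This approximation can be illustrated with a short sketch comparing the two pmfs for hypothetical values n = 1000 and p = 0.003 (so λ = np = 3); both helper names are ours:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events with average rate lam."""
    return lam**k * exp(-lam) / factorial(k)

# For large n and small p, Binomial(n, p) is close to Poisson(lam = n*p).
n, p = 1000, 0.003
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```

Each pair of printed values agrees to about three decimal places.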
Q) In a coffee shop, the average number of customers per hour is 2. Find the probability of getting k customers in an hour.

Let’s plot the probability distribution:

We can see that the probability of getting more than 6 customers becomes very small.
Q) Suppose the average number of elephants seen on a 1-day safari is 6. What is the probability that
tourists will see fewer than 4 elephants on the next 1-day safari?

Solution:

Number of Elephants Probability(X=k) Poisson Distribution


0 P(X=0) 0.0025
1 P(X=1) 0.0149
2 P(X=2) 0.0446
3 P(X=3) 0.0892
4 P(X=4) 0.1339
Mean = 6

We need values P(X<4) = P(X<=3) = P(X=0) + P(X=1) + P(X=2) + P(X=3)

= 0.0025+0.0149+0.0446+0.0892
= 0.1512
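The same sum can be sketched in Python (poisson_pmf is our own helper):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) events when the average rate is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 6   # average elephants per 1-day safari

# P(X < 4) = P(X=0) + P(X=1) + P(X=2) + P(X=3)
p_fewer_than_4 = sum(poisson_pmf(k, lam) for k in range(4))
print(round(p_fewer_than_4, 4))   # 0.1512
```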

Some applications that follow a Poisson distribution are listed below:

a. the number of mutations on a given strand of DNA per time unit


b. the number of bankruptcies that are filed in a month
c. the number of arrivals at a car wash in one hour
d. the number of network failures per day
e. the number of file server virus infections at a data center during a 24-hour period
f. the number of Airbus 330 aircraft engine shutdowns per 100,000 flight hours
g. the number of asthma patient arrivals in a given hour at a walk-in clinic
h. the number of hungry persons entering McDonald's restaurant per day
