Introduction To Probability
–
If you torture the data long enough, it will confess!
- Ronald Coase
–
Probability Theory - Terminologies
–
Random Experiment
• A random experiment is an experiment whose outcome cannot be predicted with certainty before it is conducted (for example, whether a visitor to a website will click on an advertisement).
–
Sample Space
• It is the universal set that consists of all possible outcomes of an experiment.
–
Event
• An event (E) is a subset of the sample space, and probability is usually calculated with respect to an event.
–
Probability Estimation using Relative
Frequency
• The classical approach to probability estimation of an event is based on the relative frequency of occurrence of that event: P(E) = (number of trials in which E occurred) / (total number of trials).
–
Example 3.1
A website displays 10 advertisements, and the revenue generated by the website depends on the number of visitors clicking on any of the advertisements displayed on the site. The data collected by the company reveal that out of 2500 visitors, 30 people clicked on 1 advertisement, 15 clicked on 2 advertisements, and 5 clicked on 3 advertisements. The remaining visitors did not click on any of the advertisements. Calculate
(a) The probability that a visitor to the website will click on an advertisement.
(b) The probability that the visitor will click on at least two advertisements.
(c) The probability that a visitor will not click on any advertisements.
–
Solution
(a) The number of visitors clicking on an advertisement is 30 + 15 + 5 = 50 and the total number of visitors is 2500. Thus, the probability that a visitor to the website will click on an advertisement is
P = 50/2500 = 0.02
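A minimal Python sketch of the relative-frequency calculation for Example 3.1 (the variable names are mine, not from the source):

```python
# Relative-frequency estimates for Example 3.1
total_visitors = 2500
clicks = {1: 30, 2: 15, 3: 5}            # visitors by number of advertisements clicked

clicked_any = sum(clicks.values())        # 50 visitors clicked at least one advertisement
p_click = clicked_any / total_visitors                        # (a) 50/2500 = 0.02
p_at_least_two = (clicks[2] + clicks[3]) / total_visitors     # (b) 20/2500 = 0.008
p_none = (total_visitors - clicked_any) / total_visitors      # (c) 2450/2500 = 0.98

print(p_click, p_at_least_two, p_none)
```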
–
Algebra of Events
• Assume that X, Y and Z are three events of a sample space. Then the
following algebraic relationships are valid and are useful while deriving
probabilities of events:
• Distributive rule: X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)
X ∪ (Y ∩ Z) = (X ∪ Y) ∩ (X ∪ Z)
–
Axioms of Probability
–
The elementary rules of probability are deduced directly from the original three axioms of probability, using set theory relationships.
1. For any event A, the probability of the complementary event, written A^C, is given by
P(A^C) = 1 − P(A)
2. The probability of the empty set (an impossible event) is zero:
P(∅) = 0
–
3. If occurrence of an event A implies that an event B also occurs, so that the event class A is a subset of event class B, then the probability of A is less than or equal to the probability of B:
P(A) ≤ P(B)
4. The probability that either event A or event B occurs, or both occur, is given by
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
5. If A and B are mutually exclusive events, so that P(A ∩ B) = 0, then
P(A ∪ B) = P(A) + P(B)
6. If A1, A2, …, An are n events that form a partition of sample space S, then their probabilities must add up to 1:
P(A1) + P(A2) + … + P(An) = Σ (i = 1 to n) P(Ai) = 1
–
Joint Probability
P(A ∩ B) = (Number of observations in A ∩ B) / (Total number of observations)
–
Example 3.2
At an e-commerce customer service centre a total of 112 complaints were received. 78 customers complained about late delivery of the items and 40 complained about poor product quality. Calculate the probability that a complaint received is about both late delivery and poor product quality.
–
Solution to Example 3.2
• Let A = late delivery and B = poor quality of the product, and let n(A) and n(B) be the number of complaints in favour of A and B. So n(A) = 78 and n(B) = 40. Since the total number of complaints is 112 and every complaint is about at least one of the two issues,
n(A ∩ B) = n(A) + n(B) − n(A ∪ B) = 78 + 40 − 112 = 118 − 112 = 6
• Probability of a complaint about both late delivery and poor product quality is
P(A ∩ B) = n(A ∩ B) / Total number of complaints = 6/112 = 0.0536
• Probability that the complaint is only about poor quality is
1 − P(A) = 1 − 78/112 = 0.3036
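A short Python check of the counts in Example 3.2 (assuming, as the solution does, that every complaint is about late delivery, poor quality, or both):

```python
# Joint and conditional probabilities for Example 3.2
n_total = 112    # total complaints
n_A = 78         # complaints about late delivery
n_B = 40         # complaints about poor product quality

n_A_and_B = n_A + n_B - n_total              # 6 complaints mention both issues
p_A_and_B = n_A_and_B / n_total              # P(A and B) ~ 0.0536
p_only_B = 1 - n_A / n_total                 # only poor quality ~ 0.3036
p_B_given_A = p_A_and_B / (n_A / n_total)    # P(B | A) ~ 0.0769

print(round(p_A_and_B, 4), round(p_only_B, 4), round(p_B_given_A, 4))
```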
–
• Marginal probability is simply the probability of an event X, denoted by P(X), without any conditioning on other events.
• Conditional probability of event B given that event A has occurred is
P(B | A) = P(A ∩ B) / P(A),  P(A) > 0
–
Application of Simple Probability Rules in Analytics
–
Association Rule Mining
–
Association rule learning example – Binary representation of point-of-sale data
Transaction ID | SKU 1 | SKU 2 | SKU 3 | SKU 4 | SKU 5 | SKU 6 | SKU 7
1 | 1 | 1 | 1 | 0 | 1 | 1 | 1
2 | 0 | 1 | 0 | 0 | 0 | 1 | 1
3 | 0 | 0 | 0 | 0 | 0 | 1 | 1
4 | 1 | 0 | 0 | 0 | 1 | 0 | 0
5 | 1 | 0 | 0 | 0 | 1 | 1 | 1
6 | 0 | 1 | 1 | 0 | 0 | 0 | 1
7 | 0 | 1 | 1 | 0 | 0 | 0 | 1
–
• In the table above, transaction ID is the transaction reference number, and apple, orange, etc. are the different SKUs sold by the store. A binary code is used to represent whether the SKU was purchased (equal to 1) or not (equal to 0) during a transaction. The strength of association between two mutually exclusive subsets can be measured using ‘support’, ‘confidence’, and ‘lift’.
• Support between two sets (of products purchased) is calculated using the
joint probability of those events:
Support = P(X ∩ Y) = n(X ∩ Y) / N
–
Association Rule Learning Contd…
Confidence = P(Y | X) = P(X ∩ Y) / P(X)
• Lift: The third measure in association rule mining is lift, which is given by
Lift = P(X ∩ Y) / (P(X) × P(Y))
Association rules can be generated based on threshold values of support, confidence, and lift. For example, assume that the cut-off for support is 0.25 and for confidence is 0.5 (lift should be more than 1); only rules that meet all the thresholds are retained. A small calculation on the table above is shown below.
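A small Python sketch that computes support, confidence, and lift directly from the binary table above. The column indices and the example rule are illustrative assumptions, since the SKU labels were not reproduced here:

```python
import numpy as np

# Binary point-of-sale matrix (rows = transactions, columns = SKUs, as in the table above)
data = np.array([
    [1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 1],
])

def rule_metrics(x_col, y_col):
    """Support, confidence and lift for the rule X -> Y, where X and Y are column indices."""
    p_x = data[:, x_col].mean()
    p_y = data[:, y_col].mean()
    p_xy = (data[:, x_col] & data[:, y_col]).mean()   # joint probability P(X and Y)
    return p_xy, p_xy / p_x, p_xy / (p_x * p_y)       # support, confidence, lift

# Example rule: SKU in column 0 -> SKU in column 5
support, confidence, lift = rule_metrics(0, 5)
print(support, confidence, lift)
```

With the thresholds above (support ≥ 0.25, confidence ≥ 0.5, lift > 1), this example rule would be retained.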
–
Bayes Theorem
• Bayes theorem is one of the most important concepts in analytics, since several problems are solved using Bayesian statistics.
P(A | B) = P(A ∩ B) / P(B)  and  P(B | A) = P(A ∩ B) / P(A)
Combining the two expressions gives
P(B | A) = P(A | B) × P(B) / P(A)
–
Terminologies used to describe various components in
Bayes Theorem
1. P(B) is called the prior probability (the estimate of the probability without any additional information).
P(B | A) = P(A | B) × P(B) / P(A)
2. P(B | A) is called the posterior probability (that is, given that the event A has occurred, what is the probability of occurrence of event B). That is, after the additional information (or additional evidence) that A has occurred, what is the estimated probability of occurrence of B?
–
Monty Hall Problem
• In the Monty Hall game, a car is placed behind one of three doors and goats behind the other two. The player picks a door; Monty, who knows where the car is, opens one of the remaining doors to reveal a goat and offers the player the chance to switch. Should the player switch?
–
Monty Hall Problem Using Bayes Theorem
• Let C1, C2, and C3 be the events that the car is behind door 1, 2, and 3, respectively. Let D1, D2, and D3 be the events that Monty opens door 1, 2, and 3, respectively. The prior probabilities of C1, C2, and C3 are 1/3 each.
• Assume that the player has chosen door 1 and Monty opens door 2 to reveal a goat. We would now like to calculate the posterior probability P(C1 | D2), that is, the probability that the car is behind door 1 (the door chosen initially by the player) given the additional information that the car is not behind door 2.
–
• Using Bayes theorem,
P(C1 | D2) = P(D2 | C1) × P(C1) / P(D2) = (1/2) × (1/3) / (1/2) = 1/3
• P(D2 | C1) = 1/2 (if the car is behind door 1, then Monty can open either door 2 or door 3)
• P(D2) = (1/2) × (1/3) + 0 × (1/3) + 1 × (1/3) = 1/2
Note that P(C2 | D2) = 0. Thus P(C3 | D2) = 1 − P(C1 | D2) = 1 − 1/3 = 2/3
P(D2 | C3) = 1 (if the car is behind door 3 and the player has chosen door 1, Monty has to open door 2 with probability 1)
–
• Thus, changing the initial choice increases the probability of winning the car from 1/3 to 2/3. Alternatively,
P(C3 | D2) = P(D2 | C3) × P(C3) / P(D2) = 1 × (1/3) / (1/2) = 2/3
• P(D2 | C3) = 1 (if the car is behind door 3 and the player has chosen door 1, Monty has to open door 2 with probability 1)
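The same conclusion can be checked with a quick Monte Carlo simulation in Python (a sketch, assuming the player always starts with door 1 as in the slides):

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the probability of winning the car when the player stays or switches."""
    wins = 0
    for _ in range(trials):
        car = random.randint(1, 3)           # door hiding the car
        choice = 1                            # player always picks door 1
        if car == 1:
            opened = random.choice([2, 3])    # Monty may open either goat door
        else:
            opened = 2 if car == 3 else 3     # Monty must open the only remaining goat door
        if switch:
            choice = next(d for d in (1, 2, 3) if d not in (choice, opened))
        wins += (choice == car)
    return wins / trials

print(monty_hall(switch=False))   # ~ 1/3
print(monty_hall(switch=True))    # ~ 2/3
```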
–
GENERALIZATION OF BAYES THEOREM
• If B1, B2, …, Bn are mutually exclusive and collectively exhaustive events (a partition of the sample space) and A is an event generated from these subsets, then
P(Bi | A) = P(A | Bi) × P(Bi) / [Σ (j = 1 to n) P(A | Bj) × P(Bj)]
Event generated from mutually exclusive subsets
–
Example 3.4
Black boxes are manufactured by three companies A, B, and C, which supply 75%, 15%, and 10% of the boxes, respectively. The probabilities that a black box is defective are 0.04, 0.06, and 0.08 when it is manufactured by A, B, and C, respectively. If a randomly selected black box is found to be defective, what is the probability that it was manufactured by company A?
–
Solution to Example 3.4
• Let A, B, and C be the events that the black box is manufactured by companies A, B, and C, respectively, and let D be the event that the black box is defective. We are interested in calculating the probability P(A | D).
P(A | D) = P(D | A) × P(A) / P(D)
• Now P(D | A) = 0.04 and P(A) = 0.75. Using the generalization of Bayes theorem,
P(D) = 0.75 × 0.04 + 0.15 × 0.06 + 0.10 × 0.08 = 0.047
P(A | D) = 0.04 × 0.75 / 0.047 = 0.6383
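A minimal Python sketch of this calculation using the generalization of Bayes theorem (dictionary keys are just labels for the three companies):

```python
# Posterior probability that a defective black box came from company A (Example 3.4)
prior = {"A": 0.75, "B": 0.15, "C": 0.10}        # share of black boxes from each company
p_defect = {"A": 0.04, "B": 0.06, "C": 0.08}     # P(D | company)

p_D = sum(prior[c] * p_defect[c] for c in prior)     # total probability of a defect = 0.047
posterior_A = p_defect["A"] * prior["A"] / p_D       # P(A | D) ~ 0.6383

print(round(p_D, 3), round(posterior_A, 4))
```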
–
Random Variables
• A random variable is a function that maps every outcome in the sample space of a random experiment to a real number.
• A random variable is a robust and convenient way of representing the outcome of a random experiment.
–
Discrete Random Variables
• If the random variable X can assume only a finite or countably infinite set of
values, then it is called a discrete random variable.
• Examples of discrete random variables are:
– Credit rating (usually classified into different categories such as low,
medium and high or using labels such as AAA, AA, A, BBB, etc.).
– Number of orders received at an e-commerce retailer which can be
countably infinite.
– Customer churn (the random variable takes binary values: 1. Churn and 2. Do not churn).
– Fraud (the random variable takes binary values: 1. Fraudulent transaction and 2. Genuine transaction).
– Any experiment that involves counting (for example, number of returns in
a day from customers of e-commerce portals such as Amazon, Flipkart;
number of customers not accepting job offers from an organization).
–
Probability mass function
• For a discrete random variable, the probability that the random variable X takes a specific value xi, P(X = xi), is called the probability mass function P(xi).
–
Expected Value
• The expected value (or mean) of a discrete random variable is given by
E(X) = Σ (i = 1 to n) xi × P(xi)
–
Variance and Standard Deviation
• The variance of a discrete random variable is given by
Var(X) = Σ (i = 1 to n) (xi − E(X))² × P(xi)
• The standard deviation is the square root of the variance: SD(X) = √Var(X)
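A small Python illustration with a hypothetical PMF (the values are mine, chosen only to show the formulas):

```python
import math

# Hypothetical PMF: P(X = 0) = 0.5, P(X = 1) = 0.3, P(X = 2) = 0.2
pmf = {0: 0.5, 1: 0.3, 2: 0.2}

mean = sum(x * p for x, p in pmf.items())                  # E(X) = 0.7
var = sum((x - mean) ** 2 * p for x, p in pmf.items())     # Var(X) = 0.61
std = math.sqrt(var)                                       # standard deviation ~ 0.781

print(mean, var, round(std, 3))
```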
–
Probability Density Function (pdf)
• The probability density function, f(x), is defined through the probability that the value of the random variable X lies in an infinitesimally small interval between xi and xi + δx:
f(x) = lim (δx → 0) P(xi ≤ X ≤ xi + δx) / δx
–
Cumulative Distribution Function (CDF)
• The cumulative distribution function (CDF) of a
continuous random variable is defined by
F(a) = P(X ≤ a) = ∫ (−∞ to a) f(x) dx
–
• The probability density function and cumulative distribution function of a continuous random variable satisfy the following properties:
f(x) ≥ 0
F(∞) = ∫ (−∞ to ∞) f(x) dx = 1
• The probability between two values a and b, P(a ≤ X ≤ b), is the area under the probability density function between a and b:
P(a ≤ X ≤ b) = ∫ (a to b) f(x) dx = F(b) − F(a)
–
• The expected value of a continuous random variable, E(X), is given by
E(X) = ∫ (−∞ to ∞) x f(x) dx
• The variance is given by
Var(X) = ∫ (−∞ to ∞) (x − E(X))² f(x) dx
–
Binomial Distribution
• A random variable X is said to follow a Binomial
distribution when
– The random variable can have only two outcomes, success and failure (such trials are known as Bernoulli trials).
– The objective is to find the probability of getting k successes out of n trials.
– The probability of success is p and thus the probability of failure is (1 − p).
– The probability p is constant and does not change between trials, and the trials are independent.
–
Probability Mass Function (PMF) of Binomial
Distribution
• The PMF of the binomial distribution (the probability that the number of successes will be exactly x out of n trials) is given by
PMF(x) = P(X = x) = C(n, x) p^x (1 − p)^(n − x),  0 ≤ x ≤ n
where
C(n, x) = n! / (x! (n − x)!)
–
Mean and Variance of Binomial Distribution
The mean of a binomial distribution is given by
Mean = E(X) = Σ (x = 0 to n) x × PMF(x) = Σ (x = 0 to n) x × C(n, x) p^x (1 − p)^(n − x) = np
Var(X) = Σ (x = 0 to n) (x − E(X))² × PMF(x) = Σ (x = 0 to n) (x − E(X))² × C(n, x) p^x (1 − p)^(n − x) = np(1 − p)
If the number of trials (n) in a binomial distribution is large, then it can be approximated by a normal distribution with mean np and variance npq, where q = 1 − p.
–
Example 3.5
Suppose the probability that a customer returns an item purchased is 0.1 and 20 customers purchase items. Calculate:
(a) The probability that exactly 5 customers will return the items purchased.
(b) The probability that a maximum of 5 customers will return the items purchased.
(c) The probability that more than 5 customers will return the items purchased.
(d) The average number of customers who are likely to return the items.
(e) The variance of the number of customers returning the items.
–
Solution
In this case, n = 20 and p = 0.1.
(a) The probability that exactly 5 customers will return the items purchased is
P(X = 5) = C(20, 5) (0.1)^5 (0.9)^15 = 0.03192
(b) The probability that a maximum of 5 customers will return the items purchased is
P(X ≤ 5) = Σ (k = 0 to 5) C(20, k) (0.1)^k (0.9)^(20 − k) = 0.9887
(c) The probability that more than 5 customers will return the product is
P(X > 5) = 1 − P(X ≤ 5) = 1 − 0.9887 = 0.0113
(d) The average number of customers who are likely to return the items is
E(X) = n × p = 20 × 0.1 = 2
(e) The variance of a binomial distribution is given by
Var(X) = np(1 − p) = 20 × 0.1 × 0.9 = 1.8
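The answers to Example 3.5 can be checked with scipy (a sketch, assuming scipy is available):

```python
from scipy.stats import binom

n, p = 20, 0.1
print(binom.pmf(5, n, p))     # (a) P(X = 5)  ~ 0.0319
print(binom.cdf(5, n, p))     # (b) P(X <= 5) ~ 0.9887
print(binom.sf(5, n, p))      # (c) P(X > 5)  ~ 0.0113
print(binom.mean(n, p))       # (d) E(X) = np = 2
print(binom.var(n, p))        # (e) Var(X) = np(1 - p) = 1.8
```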
–
Poisson Distribution
• The Poisson distribution is used to find the probability of a given number of events occurring in a fixed interval of time or space, when the events occur at a constant average rate and independently of one another.
• The probability mass function of a Poisson distribution is given by
P(X = k) = e^(−λ) λ^k / k!,  k = 0, 1, 2, …
• where λ is the rate of occurrence of the events per unit of measurement.
• The cumulative distribution function of a Poisson distribution is given by
P(X ≤ k) = Σ (i = 0 to k) e^(−λ) λ^i / i!
–
• The mean and variance of a Poisson random variable are given by E(X) = λ and Var(X) = λ.
–
Example
On average, about 20 customers per day cancel their order placed at Fashion Trends Online. Calculate the probability that the number of cancellations on a day is exactly 20 and the probability that the maximum number of cancellations is 25.
Solution
The probability that the number of cancellations is exactly 20 is given by
P(X = 20) = e^(−20) × 20^20 / 20! = 0.0888
The probability that the maximum number of cancellations will be 25 is given by
P(X ≤ 25) = Σ (k = 0 to 25) e^(−20) × 20^k / k! = 0.8878
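A quick scipy check of these two Poisson probabilities (a sketch, assuming scipy is available):

```python
from scipy.stats import poisson

rate = 20                         # average cancellations per day
print(poisson.pmf(20, rate))      # P(X = 20)  ~ 0.0888
print(poisson.cdf(25, rate))      # P(X <= 25) ~ 0.8878
```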
–
Geometric Distribution
• The geometric distribution models a random experiment in which the random variable counts the number of Bernoulli trials needed to obtain the first success (equivalently, the number of failures before the first success, plus one). Its cumulative distribution function is
F(x) = P(X ≤ x) = 1 − (1 − p)^x
• The mean and variance of a geometric distribution are given by
E(X) = 1/p  and  Var(X) = (1 − p)/p²
–
Probability mass function of a geometric distribution (p = 0.3).
Cumulative distribution function of a geometric distribution (p = 0.3).
–
Memoryless Property of Geometric Distribution
• The memoryless property is a special property of the geometric distribution: the conditional probability P(X > i + j | X > i) depends only on the value j, not on the value i. We know that
P(X > i) = 1 − P(X ≤ i) = 1 − [1 − (1 − p)^i] = (1 − p)^i
P(X > i + j | X > i) = P(X > i + j and X > i) / P(X > i) = P(X > i + j) / P(X > i) = (1 − p)^(i + j) / (1 − p)^i = (1 − p)^j
–
Example
Local Dhaniawala (LD) is an online grocery store with an innovative feature that predicts whether the customer has forgotten to buy an item that is commonly purchased by grocery customers. The probability that a customer buys milk in each shopping visit is 0.2.
(a) Calculate the probability that the customer's first purchase of milk happens during the 5th visit.
(b) Calculate the average time between purchases of milk.
(c) If a customer has not purchased milk during the past 3 shopping visits, what is the probability that the customer will not buy milk for another 2 visits?
–
Solution
(a) The probability that the customer's first purchase of milk happens on the 5th visit is given by
P(X = 5) = (1 − 0.2)^4 × 0.2 = 0.08192
(b) The average time between purchases of milk is
E(X) = 1/p = 1/0.2 = 5 visits
(c) Given that a customer has not purchased milk for the past 3 shopping visits, the probability that the customer will not buy for another 2 visits is given by
P(X > 3 + 2 | X > 3) = P(X > 2) = (1 − p)² = (1 − 0.2)² = 0.64
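A scipy check of the geometric calculations (scipy's geom counts the number of trials up to and including the first success, which matches the formulas used above):

```python
from scipy.stats import geom

p = 0.2                           # probability of buying milk on a visit
print(geom.pmf(5, p))             # (a) first purchase on the 5th visit = 0.08192
print(geom.mean(p))               # (b) average number of visits between purchases = 1/p = 5
print(geom.sf(2, p))              # (c) P(X > 2) = (1 - p)^2 = 0.64, by the memoryless property
```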
–
Parameters of Continuous Distributions
• Scale parameter: The scale parameter defines the spread (range) of a continuous distribution; the larger the scale parameter value, the larger the spread of the distribution.
–
Uniform Distribution
Probability density function:
f(x) = 1/(b − a),  x ∈ [a, b]
f(x) = 0,  otherwise
Cumulative distribution function:
F(x) = 0,  x < a
F(x) = (x − a)/(b − a),  a ≤ x ≤ b
F(x) = 1,  x > b
–
Exponential Distribution
• The exponential distribution is a single-parameter continuous distribution that is traditionally used for modelling the time to failure of electronic components.
f(x) = λ e^(−λx),  x ≥ 0
F(x) = 1 − e^(−λx)
• The parameter λ represents the rate of occurrence of the event; its reciprocal 1/λ (the mean time between events) acts as the scale parameter of the distribution.
–
Probability density function of an
exponential distribution
–
Memoryless Property of Exponential Distribution
• The exponential distribution is the only continuous probability distribution that has the memoryless property. That is,
P(X > t + s | X > t) = P(X > s)
P(X > t + s | X > t) = P(X > t + s and X > t) / P(X > t) = P(X > t + s) / P(X > t) = e^(−λ(t + s)) / e^(−λt) = e^(−λs)
–
Example
The time to failure of an avionic system follows an exponential
distribution with a mean time between failures (MTBF) of 1000
hours.
(a) Calculate the probability that the system will fail before 1000
hours.
(b) Calculate the probability that it will not fail up to 2000 hours.
(c) Calculate the time by which 10% of the systems will fail (that
is calculate P10 life)
–
Solution
(a) The probability that the system will fail by 1000 hours is
F(1000) = 1 − e^(−λt)
In this case λ = 1/1000 and t = 1000, so F(1000) = 1 − e^(−1000/1000) = 1 − e^(−1) = 0.6321
(b) The probability that the system will not fail up to 2000 hours is
P(X > 2000) = 1 − P(X ≤ 2000) = 1 − F(2000) = e^(−2000/1000) = e^(−2) = 0.1353
(c) The P10 life is the time t for which F(t) = 0.10, that is, 1 − e^(−t/1000) = 0.10, which gives t = −1000 × ln(0.9) ≈ 105.36 hours.
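A scipy check of the exponential calculations (scipy parameterizes the exponential by its scale, which equals the MTBF 1/λ):

```python
from scipy.stats import expon

mtbf = 1000                       # mean time between failures, in hours
dist = expon(scale=mtbf)          # scale = 1/lambda

print(dist.cdf(1000))             # (a) P(failure before 1000 h) = 1 - exp(-1) ~ 0.6321
print(dist.sf(2000))              # (b) P(no failure up to 2000 h) = exp(-2) ~ 0.1353
print(dist.ppf(0.10))             # (c) P10 life = -1000 ln(0.9) ~ 105.4 h
```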
–
Normal Distribution
• The normal distribution, also known as the Gaussian distribution, is one of the most popular continuous distributions in the field of analytics, especially due to its use in multiple contexts.
• The probability density function and the cumulative distribution function are given by
f(x) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²)),  −∞ < x < ∞
F(x) = ∫ (−∞ to x) (1 / (σ √(2π))) e^(−(t − μ)² / (2σ²)) dt
• Here μ and σ are the mean and standard deviation of the normal distribution.
–
The Excel function NORM.DIST(x, mean, standard_dev, cumulative) can be used for calculating the probability density function (cumulative = FALSE) and the cumulative distribution function (cumulative = TRUE) of a normal distribution with mean μ and standard deviation σ.
–
Properties of Normal Distribution
1. Theoretical normal density functions are defined between −∞ and +∞.
–
4. For any normal distribution, the areas between specific values measured in terms of μ and σ are given by:
Value of Random Variable | Area under the Normal Distribution
μ − σ ≤ X ≤ μ + σ (area between one standard deviation on either side of the mean) | 0.6828
μ − 3σ ≤ X ≤ μ + 3σ (area between three standard deviations on either side of the mean) | 0.9973
–
• If X1 and X2 are two independent normal random variables with means μ1 and μ2 and variances σ1² and σ2², respectively, then X1 + X2 is also normally distributed, with mean μ1 + μ2 and variance σ1² + σ2².
–
Standard Normal Variable
• A standard normal random variable Z is a normal random variable with mean 0 and standard deviation 1. Its density and distribution functions are
f(z) = (1/√(2π)) e^(−z²/2)
F(z) = ∫ (−∞ to z) (1/√(2π)) e^(−t²/2) dt
–
• By using the following transformation, any normal random variable X can be converted into a standard normal variable:
Z = (X − μ) / σ
• The random variable X can be written in terms of a standard normal random variable using the relationship
X = μ + σZ
–
• A simple approximation of the standard normal CDF is given by Tocher (1963):
P(Z ≤ z) = F(z) ≈ e^(2kz) / (1 + e^(2kz)),  where k = √(2/π)
• Another approximation is
P(Z ≤ z) = F(z) ≈ 1 − [(z² + A1 z + A2) / (√(2π) z³ + B1 z² + B2 z + 2A2)] e^(−z²/2)
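A short sketch comparing the Tocher approximation, as reconstructed above, with the exact standard normal CDF from scipy:

```python
import math
from scipy.stats import norm

def tocher_cdf(z):
    """Tocher's approximation of the standard normal CDF with k = sqrt(2/pi)."""
    k = math.sqrt(2 / math.pi)
    return math.exp(2 * k * z) / (1 + math.exp(2 * k * z))

for z in (0.5, 1.0, 1.8333):
    print(z, round(tocher_cdf(z), 4), round(norm.cdf(z), 4))   # approximate vs exact CDF
```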
–
Example
According to a survey on use of smart phones in India, the smart
phone users spend 68 minutes in a day on average in sending
messages and the corresponding standard deviation is 12
minutes. Assume that the time spent in sending messages
follows a normal distribution.
(a) What proportion of the smart phone users are spending more
than 90 minutes in sending messages daily?
(b) What proportion of customers are spending less than 20
minutes?
(c) What proportion of customers are spending between 50
minutes and 100 minutes?
–
Solution
It is given that μ = 68 minutes and σ = 12 minutes.
(a) The proportion of customers spending more than 90 minutes is given by P(X ≥ 90) = 1 − P(X ≤ 90) = 1 − F(90)
The standard normal random variable value for X = 90 is given by
Z = (x − μ)/σ = (90 − 68)/12 = 1.8333
That is, F(X = 90) = F(Z = 1.8333). From the standard normal distribution table, the area under the curve for Z = 1.8333 is 0.9666. Thus, P(X ≥ 90) = 1 − P(X ≤ 90) = 1 − F(90) = 1 − 0.9666 = 0.0334
–
(b) The proportion of customers spending less than 20 minutes is
P(X ≤ 20) = F(20)
Using the Excel function, NORM.DIST(20, 68, 12, TRUE) = 3.1671 × 10^(−5)
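A scipy check of all three parts of this example (part (c) was posed above but not worked out in the slides):

```python
from scipy.stats import norm

mu, sigma = 68, 12
print(norm.sf(90, mu, sigma))                              # (a) P(X > 90)  ~ 0.0334
print(norm.cdf(20, mu, sigma))                             # (b) P(X < 20)  ~ 3.17e-05
print(norm.cdf(100, mu, sigma) - norm.cdf(50, mu, sigma))  # (c) P(50 < X < 100) ~ 0.9294
```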
–
Chi-Square Distribution
• The chi-square distribution with k degrees of freedom [denoted as χ²(k)] is the distribution obtained by adding the squares of k independent standard normal random variables; it is widely used in non-parametric tests.
• Consider a normal random variable X1 with mean μ1 and standard deviation σ1. Then we can define Z1 (the standard normal random variable) as
Z1 = (X1 − μ1) / σ1
• Then
Z1² = ((X1 − μ1) / σ1)²
–
• Let X2 be a normal random variable with mean μ2 and standard deviation σ2, and let Z2 be the corresponding standard normal variable. Then the random variable Z1² + Z2², given by
Z1² + Z2² = ((X1 − μ1)/σ1)² + ((X2 − μ2)/σ2)²
follows a chi-square distribution with 2 degrees of freedom.
–
The probability density function of χ²(k) is given by
f(x) = (1 / (2^(k/2) Γ(k/2))) x^(k/2 − 1) e^(−x/2),  x > 0
where Γ(k) is the gamma function:
Γ(k) = ∫ (0 to ∞) x^(k − 1) e^(−x) dx
–
• The cumulative distribution function of a chi-square distribution with k degrees of freedom is given by
F(x) = γ(k/2, x/2) / Γ(k/2)
where γ(·, ·) is the lower incomplete gamma function.
–
Probability density function of chi-square distribution for different values of k.
Cumulative distribution function of chi-square distribution with k degrees of freedom.
–
Properties of chi-square distribution
• The mean and standard deviation of a chi-square distribution are k and √(2k), where k is the degrees of freedom.
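A short simulation sketch confirming the construction and the moments of the chi-square distribution (assuming numpy and scipy are available):

```python
import numpy as np
from scipy.stats import chi2

k = 5
rng = np.random.default_rng(0)
samples = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)  # sum of k squared standard normals

print(samples.mean(), samples.std())   # ~ k and ~ sqrt(2k)
print(chi2.mean(k), chi2.std(k))       # exact values: 5 and sqrt(10) ~ 3.162
```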
–
Student’s t-Distribution
• Student’s t-distribution (or simply the t-distribution) arises while estimating the population mean of a normal distribution using a sample that is small and/or when the population standard deviation is unknown.
–
• Assume that X1, X2, …, Xn are n observations (that is, a sample of size n) from a normal distribution with mean μ and standard deviation σ. Let
X̄ = (1/n) Σ (i = 1 to n) Xi
S = √[ (1/(n − 1)) Σ (i = 1 to n) (Xi − X̄)² ]
• where X̄ and S are the mean and standard deviation estimated from the sample X1, X2, …, Xn. Then the random variable t defined by
t = (X̄ − μ) / (S / √n)
follows a t-distribution with (n − 1) degrees of freedom.
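A minimal sketch of the t statistic for a hypothetical small sample (the data values and the hypothesized mean μ = 50 are made up for illustration):

```python
import numpy as np
from scipy.stats import t

sample = np.array([47.0, 52.5, 49.1, 51.2, 46.8, 50.3, 48.9, 53.4])   # hypothetical sample
mu = 50                                                               # hypothesized population mean
n = sample.size

x_bar = sample.mean()
s = sample.std(ddof=1)                        # sample standard deviation (n - 1 in the denominator)
t_stat = (x_bar - mu) / (s / np.sqrt(n))      # follows a t-distribution with n - 1 df

p_value = 2 * t.sf(abs(t_stat), df=n - 1)     # two-sided tail probability
print(t_stat, p_value)
```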
–
• The probability density function of the t-distribution with n degrees of freedom is given by
f(x) = [Γ((n + 1)/2) / (√(nπ) Γ(n/2))] (1 + x²/n)^(−(n + 1)/2)
–
Cumulative distribution function of student’s t-distribution
–
Properties of t-distribution:
• The mean of a t distribution with 2 or more degrees of freedom is 0.
• The standard deviation of the t-distribution is √(n/(n − 2)) for n > 2, where n is the number of degrees of freedom.
–
F-Distribution
The F-distribution (short for Fisher’s distribution, named after statistician Ronald Fisher) is the distribution of a ratio of two chi-square distributions. Let Y1 and Y2 be two independent chi-square random variables with k1 and k2 degrees of freedom, respectively. Then the random variable X defined as
X = (Y1 / k1) / (Y2 / k2)
follows an F-distribution. The probability density function of an F-distribution is given by
f(x) = [Γ((k1 + k2)/2) / (Γ(k1/2) Γ(k2/2))] × (k1/k2)^(k1/2) × x^(k1/2 − 1) / (1 + k1 x / k2)^((k1 + k2)/2)
–
Probability density function of F-distribution.
Cumulative distribution function of F-distribution.
–
Properties of F distribution:
• The mean of the F-distribution is k2 / (k2 − 2), for k2 > 2.
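A quick simulation sketch of the F-distribution as a ratio of two scaled chi-square variables, checking the mean k2/(k2 − 2):

```python
import numpy as np
from scipy.stats import chi2, f

k1, k2 = 4, 10
rng = np.random.default_rng(1)

# Ratio of two independent chi-square variables, each divided by its degrees of freedom
x = (chi2.rvs(k1, size=100_000, random_state=rng) / k1) / \
    (chi2.rvs(k2, size=100_000, random_state=rng) / k2)

print(x.mean())          # ~ k2 / (k2 - 2) = 1.25
print(f.mean(k1, k2))    # exact mean of the F(k1, k2) distribution
```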
–
Summary
• The concepts of probability, random variables, and probability distributions are foundations of data science. Knowledge of these concepts is important for framing and solving analytics problems.
–
• Discrete probability distributions such as binomial distribution, Poisson
distribution and geometric distribution are used for modelling discrete
random variables.