Probability
Basic concepts
Distributions
Probability
• Inferential statistics is built on the foundation of probability theory
• Random experiment: a process leading to two or more possible
outcomes, without knowing exactly which outcome will occur
• A coin is tossed and the outcome is either a head or a tail
• A Globo order may receive a score from 1 to 5
• A customer enters a store and either purchases a shirt or does not
• Tossing a die and obtaining one of six equally likely outcomes
Random variables and their probability
distributions
• A random variable is one that takes on numerical values and has an
outcome that is determined by an experiment.
• The number of heads appearing in 10 flips of a coin
• The average score of a Yandex taxi driver's trips today
• Random variables: X, Y, Z (uppercase letters)
• Outcomes of random variables: x, y, z (lowercase letters)
• X is the number of heads appearing in 10 flips of a coin and takes a value in
the set {0, 1, 2, 3, …, 10}
• x is some particular outcome, e.g., x = 6
Continuous Random Variable
• A random variable is a continuous random variable if it can take any
value in an interval
• The yearly income for a family
• The amount of oil imported to Tajikistan in a particular month
• The change in the exchange rate between KGS and USD in a month
• The CO2 emission level in Bishkek's air on a given day
Discrete Random Variables
• A discrete random variable is one that takes on only a finite (or countably
infinite) number of values.
• E.g., a Bernoulli (binary) random variable, the simplest discrete random
variable
• X can take values of 1 or 0
• In coin-flipping, P(X = 1) = ½ and P(X = 0) = ½
• More generally, P(X = 1) = θ and P(X = 0) = 1 - θ
• For a discrete random variable X taking on the k possible values {x1, x2, …, xk}, the
probabilities p1, p2, …, pk are defined by
• pj = P(X = xj), j = 1, 2, …, k, where each pj is between 0 and 1 and ∑pj = 1
Discrete Random Variables: pdf
• Probability density function (pdf) of a random variable is a
representation of the probabilities for all the possible outcomes (may
be algebraic, graphical, or tabular)
• The probability density function of X is
f(xj) = pj, j = 1, 2, …, k
The pdf of a discrete random variable X represents the probability that
X takes the value x, as a function of x. That is,
f(x) = P(X = x), for all values of x
Discrete Random Variables: pdf. Illustration
• Suppose that X is the number of free throws made by a basketball
player out of two attempts, so that X can take on the three values
{0,1,2}.
• Assume that the pdf of X is given by f(0) = .20, f(1) = .44, and f(2) = .36
• Using this pdf, what is the probability that a player makes at
least one free throw?
• P(X≥1) = ?
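As a quick check, the question can be answered directly from the pdf. A minimal Python sketch using the values on this slide:

```python
# pdf of X = free throws made out of two attempts (from the slide)
pdf = {0: 0.20, 1: 0.44, 2: 0.36}

# P(X >= 1), directly and via the complement rule 1 - f(0)
p_at_least_one = sum(p for x, p in pdf.items() if x >= 1)
assert abs(p_at_least_one - (1 - pdf[0])) < 1e-12
```

Both routes give P(X ≥ 1) = 0.80.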
Discrete Random Variables: pdf. Illustration
• The probability distribution function for the number of sandwiches
sold by a sandwich shop.
Cumulative distribution function (cdf)
• The cumulative distribution function F(x0) of a random variable X represents the
probability that X does not exceed the value x0, as a function of x0:
F(x0) = P(X ≤ x0)
For a discrete random variable, the cdf is the sum of the pdf over all values xj
such that xj ≤ x0:
F(x0) = ∑ f(xj) over all xj ≤ x0
It follows that
0 ≤ F(x0) ≤ 1 for every number x0
If x0 < x1, then F(x0) ≤ F(x1)
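The cdf of a discrete variable is just a running sum of the pdf. A small Python sketch, reusing the free-throw pdf from the earlier slide:

```python
from itertools import accumulate

# pdf of the free-throw example: X in {0, 1, 2}
xs = [0, 1, 2]
ps = [0.20, 0.44, 0.36]

# cdf: F(x0) is the running sum of the pdf over all xj <= x0
cdf = dict(zip(xs, accumulate(ps)))

# the two properties hold: F stays in [0, 1] and is nondecreasing
assert all(0.0 <= f <= 1.0 + 1e-12 for f in cdf.values())
assert all(cdf[a] <= cdf[b] for a, b in zip(xs, xs[1:]))
```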
cdf: illustration
• Fit Motors is a car dealer in Kochkor. Based on an analysis of its sales
history, the managers know that on any single day the number of
Honda Fit cars sold can vary from 0 to 5. How can the probability
distribution function shown in the table be used for inventory
planning?
Expected value of a Discrete Random
Variable
• The expected value of a random variable is also called its mean and is
denoted μ.
• The expected value E[X] of a discrete random variable X is defined as
E[X] = μ = ∑xP(x)
e.g., the probability distribution for the number of errors (X) in the
lecture is
P(0) = 0.81 P(1) = 0.17 P(2) = 0.02
Find the expected number of errors per lecture
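The expected value is a probability-weighted sum of the outcomes. A one-line Python check with the pdf above:

```python
# pdf of the number of errors X per lecture (from the slide)
pdf = {0: 0.81, 1: 0.17, 2: 0.02}

# E[X] = sum of x * P(x) over all outcomes
mean = sum(x * p for x, p in pdf.items())  # 0*0.81 + 1*0.17 + 2*0.02 = 0.21
```

So on average about 0.21 errors per lecture.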
Variance of a Discrete Random Variable
• The expectation of the squared deviations about the mean, (X – μ)2 ,
is called the variance, denoted as σ2 and given by
σ2 = E[(X – μ)2 ] = ∑ (x – μ)2 P(x)
The variance of a discrete random variable X can also be expressed as
σ2 = E[X2 ] – μ2 = ∑ x2 P(x) – μ2
The standard deviation, σ, is the positive square root of the variance
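The two variance formulas are algebraically identical, which a short Python sketch can confirm numerically (using the errors-per-lecture pdf from the previous slide):

```python
import math

# pdf of the number of errors per lecture (from the slides)
pdf = {0: 0.81, 1: 0.17, 2: 0.02}

mu = sum(x * p for x, p in pdf.items())                     # E[X]
var_def = sum((x - mu) ** 2 * p for x, p in pdf.items())    # E[(X - mu)^2]
var_alt = sum(x * x * p for x, p in pdf.items()) - mu ** 2  # E[X^2] - mu^2
sigma = math.sqrt(var_def)                                  # standard deviation

assert math.isclose(var_def, var_alt)
```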
Expected value and variance: illustration
• Fit Motors is a car dealer in Kochkor.
• Find the expected value and variance for this
probability distribution
• μx = E[X] = ∑xP(x) = 0·0.15 + 1·0.30 + … = 1.95
• σ²x = (0 − 1.95)²·0.15 + (1 − 1.95)²·0.30 + … = 1.9475
Expected value and variance for Bernoulli
random variable
• Remember that
• X can take values of 1 or 0
• P(X = 1) = θ and P(X = 0) = 1 - θ
or, equivalently,
P(1) = P and P(0) = 1 − P
Can you calculate its mean and variance?
Expected value and variance for Bernoulli
random variable
• E[X] = 1·P + 0·(1 − P) = P
• σ² = E[(X − μ)²] = (1 − P)²·P + (0 − P)²·(1 − P) = P(1 − P)
Bernoulli random variable : illustration
• Anisa, a realtor, believes that for a particular contact the probability of
making a sale is 0.4. If the random variable X is defined to take the
value 1 if a sale is made and 0 otherwise, then X has a Bernoulli
distribution with probability of success P equal to 0.4.
• Find the mean and the variance of the distribution
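A quick numerical check of the Bernoulli mean and variance, applying the general definitions to Anisa's case P = 0.4:

```python
# Bernoulli random variable with success probability P = 0.4 (Anisa's example)
P = 0.4
pdf = {1: P, 0: 1 - P}

mean = sum(x * p for x, p in pdf.items())               # equals P
var = sum((x - mean) ** 2 * p for x, p in pdf.items())  # equals P(1 - P)
```

This gives mean 0.4 and variance 0.4 × 0.6 = 0.24.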
Binomial distribution
• Suppose that a random experiment can result in two possible
mutually exclusive and collectively exhaustive outcomes, “success”
and “failure,” and that P is the probability of a success in a single trial.
• If n independent trials are carried out, the distribution of the number
of resulting successes, x, is called the binomial distribution.
• Its probability distribution function for the binomial random variable
X is
P(x) = [n! / (x!(n − x)!)] · P^x · (1 − P)^(n−x), x = 0, 1, 2, …, n
Binomial distribution
• Let X be the number of successes in n independent trials, each with
probability of success P. Then X follows a binomial distribution with mean
μx = E[X] = nP
and variance
σ²x = E[(X − μ)²] = nP(1 − P)
Can you show the derivation of the mean and variance?
Binomial distribution: practice
• Suppose that a real estate agent, Bakai, has 5 contacts, and he
believes that for each contact the probability of making a sale is 0.40.
a. Find the probability that he makes at most 1 sale.
• b. Find the probability that he makes between 2 and 4 sales
(inclusive).
• c. Graph the probability distribution function
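Parts (a) and (b) can be worked out by summing the binomial pdf. A minimal Python sketch (the helper `binom_pdf` is our own name, not from the slides):

```python
from math import comb

n, P = 5, 0.40  # Bakai: 5 contacts, sale probability 0.40 each

def binom_pdf(x):
    """P(X = x) for X ~ Binomial(n, P)."""
    return comb(n, x) * P ** x * (1 - P) ** (n - x)

p_at_most_1 = binom_pdf(0) + binom_pdf(1)          # part a
p_2_to_4 = sum(binom_pdf(x) for x in range(2, 5))  # part b (inclusive)
```

This yields P(X ≤ 1) ≈ 0.337 and P(2 ≤ X ≤ 4) ≈ 0.653.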
Poisson distribution
• The number of failures in a large computer system during a given day
• The number of replacement orders for a part received by a firm in a
given month
• The number of customers to arrive for flights during each 10-minute
time interval from 3:00 p.m. to 6:00 p.m. on weekdays
• The number of occurrences or successes of a certain event in a given
continuous interval (such as time, surface area, length, etc.)
Poisson distribution function, mean, and
variance
• The random variable X is said to follow the Poisson distribution if it has the
probability distribution
P(x) = (e^(−λ) · λ^x) / x!, x = 0, 1, 2, …
• where P(x) is the probability of x successes over a given time or space, given λ
• λ = the expected number of successes per time or space unit, λ > 0
• e is the base of the natural logarithm (≈ 2.71828)
• The mean and variance of the Poisson distribution are
• μx = E[X] = λ
• σ²x = E[(X − μ)²] = λ
Poisson distribution: illustration
• Asel, a computer center manager, reports that her computer system
experienced three component failures during the past 100 days. From
past experience the expected number of failures per day is 3/100
• a. What is the probability of no failures in a given day?
• b. What is the probability of one or more component failures in a
given day?
• c. What is the probability of at least two failures in a 3-day period?
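All three parts follow from the Poisson pdf, with λ rescaled to match the interval length in part (c). A minimal Python sketch (the helper `poisson_pdf` is our own name):

```python
from math import exp, factorial

lam = 3 / 100  # expected failures per day, from 3 failures in 100 days

def poisson_pdf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

p_none = poisson_pdf(0, lam)  # a: no failures in a given day
p_some = 1 - p_none           # b: one or more failures in a given day
lam3 = 3 * lam                # lambda scales with the interval length
p_at_least_two = 1 - poisson_pdf(0, lam3) - poisson_pdf(1, lam3)  # c
```

This gives roughly 0.970, 0.030, and 0.004 for parts (a), (b), and (c).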
Poisson approximation to the Binomial
distribution
• If the number of trials, n, is large, the distribution of the number of
successes X is binomial, and the mean nP is of only moderate size
(preferably nP ≤ 7),
• then the binomial distribution can be approximated by the Poisson
distribution with λ = nP. The probability distribution function of the
approximating distribution is then
P(x) ≈ [e^(−nP) · (nP)^x] / x!, x = 0, 1, 2, …
Poisson approximation to the Binomial
distribution
• An analyst predicted that 3.5% of all small corporations would file for
bankruptcy in the coming year. For a random sample of 100 small
corporations, estimate the probability that at least 3 will file for
bankruptcy in the next year, assuming that the analyst’s prediction is
correct.
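Here n = 100 and nP = 3.5 ≤ 7, so the Poisson approximation with λ = nP applies. A short Python sketch using the complement rule:

```python
from math import exp, factorial

n, P = 100, 0.035
lam = n * P  # Poisson approximation parameter: lambda = nP = 3.5

# P(X >= 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2)
p_less_than_3 = sum(exp(-lam) * lam ** x / factorial(x) for x in range(3))
p_at_least_3 = 1 - p_less_than_3
```

The approximation gives P(X ≥ 3) ≈ 0.68.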
Joint distributions, conditional distributions,
and independence
• Let X and Y be discrete random variables. Then, (X,Y) have a joint
distribution, which is fully described by the joint probability density
function of (X,Y):
fX,Y(x,y) = P(X = x, Y = y)
Random variables X and Y are independent iff
fX,Y(x,y) = fX(x) fY(y)
If X and Y are discrete: P(X=x, Y=y) = P(X=x)P(Y=y)
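The independence condition can be checked mechanically by comparing the joint pmf against the product of the marginals. A minimal Python sketch with a hypothetical joint pmf (the table below is illustrative, not from the slides):

```python
from itertools import product

# hypothetical joint pmf of (X, Y); the numbers are illustrative
joint = {(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.2}

# marginal pdfs: sum the joint pmf over the other variable
fx = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
fy = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

# independence: f(x, y) = fX(x) * fY(y) must hold for every pair
independent = all(abs(joint[(x, y)] - fx[x] * fy[y]) < 1e-12
                  for x, y in product((0, 1), (0, 1)))
```

For this particular table every pair satisfies the factorization, so X and Y are independent.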
Conditional distribution
• The conditional distribution of Y given X = x is described by the
conditional pdf
fY|X(y|x) = fX,Y(x,y) / fX(x), for all x with fX(x) > 0
• If X and Y are independent, then fY|X(y|x) = fY(y)
Conditional expectation
• If X and Y are discrete, then
E(Y|X = x) = ∑y y · fY|X(y|x)
• Interpret the following:
if X and Y are independent then E(Y|X)=E(Y)
Conditional expectation
• E(Y)=(−1)(0.40)+(0)(0.20)+(1)(0.40)=0
• E(Y|X=x) for each x
• E(Y|X=−1)=(−1)(10/35)+(0)(0)+(1)(20/35)=2/7
• E(Y|X=0)=(−1)(1)+(0)(0)+(1)(0)=−1
• E(Y|X=1)=(−1)(0)+(0)(20/35)+(1)(10/35)=2/7
• E(X|Y=y) for each y
• E(X|Y=−1)=(−1)(1/4)+(0)(3/4)+(1)(0)=−1/4
• E(X|Y=0)=(−1)(0)+(0)(0)+(1)(1)=1
• E(X|Y=1)=(−1)(25/40)+(0)(0)+(1)(15/40)=−1/4
• E(XY)=(−1)(−1)(0.1)+(1)(−1)(0.25)+(1)(−1)(0)+(1)(1)(0.15)+0=0
• So Cov(X,Y)=E(XY)−E(X)E(Y)=0
Expected value: properties
• For any constant c, E(c) = c.
• For any constants a and b, E(aX + b) = aE(X) + b
• If {X1, X2, …, Xn} are random variables, then E(X1 + X2 + … + Xn) = E(X1) +
E(X2) + … + E(Xn)
• If {a1, a2, …, an} are constants and {X1, X2, …, Xn} are random variables,
then E(a1X1 + a2X2 + … + anXn) = a1E(X1) + a2E(X2) + … +anE(Xn)
• If X = X1 + X2 + … + Xn, where each Xi is an independent Bernoulli(P)
random variable, then X is binomial
• What is E(X)?
Expected value: properties
• Since each Xi is Bernoulli with E(Xi) = P,
E(X) = E(X1) + E(X2) + … + E(Xn) = nP
Var(X): properties
• The variance of any constant is zero:
Var(X) = 0 if P(X = c) = 1
• For any constants a and b, Var(aX + b) = a2Var(X) (can you prove it?)
• So, adding a constant to a random variable does not change the variance
• Multiplying a random variable by a constant multiplies the variance by the
square of that constant
• Var(X1 + X2 + … + Xn) = Var(X1) + Var(X2) + … + Var(Xn) for independent
X1, X2, …, Xn
• Then what is Var(a1X1 + a2X2 + … + anXn) ?
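Combining the two properties above gives Var(a1X1 + … + anXn) = a1²Var(X1) + … + an²Var(Xn) for independent Xi. A brute-force check in Python with two hypothetical independent fair dice:

```python
import math

def var(pdf):
    """Variance of a discrete pdf given as {value: probability}."""
    mu = sum(x * p for x, p in pdf.items())
    return sum((x - mu) ** 2 * p for x, p in pdf.items())

# two independent fair dice (illustrative choice, not from the slides)
die = {x: 1 / 6 for x in range(1, 7)}
a, b = 2, 3

# build the exact distribution of aX + bY from the product of the marginals
combo = {}
for x, px in die.items():
    for y, py in die.items():
        v = a * x + b * y
        combo[v] = combo.get(v, 0.0) + px * py

assert math.isclose(var(combo), a ** 2 * var(die) + b ** 2 * var(die))
```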
Standard Deviation
• sd(X) = +√Var(X), the positive square root of the variance
• For any constants a and b, sd(aX + b) = |a|·sd(X)
Covariance
• E(X) =μx and E(Y)= μy
• Cov(X, Y) = σxy = E[(X - μx)(Y - μy)]
• Show that Cov(X, Y) = E(XY) - μx μy
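The shortcut formula can be confirmed numerically. A small Python sketch with a hypothetical joint pmf (the table is illustrative, not from the slides):

```python
import math

# hypothetical joint pmf of (X, Y); the values are illustrative
joint = {(-1, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (0, -1): 0.25}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())

# definition vs. shortcut: E[(X - mu_x)(Y - mu_y)] vs. E(XY) - mu_x * mu_y
cov_def = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
cov_alt = sum(x * y * p for (x, y), p in joint.items()) - mu_x * mu_y

assert math.isclose(cov_def, cov_alt, abs_tol=1e-12)
```

For this table both expressions give Cov(X, Y) = 0, even though X and Y are not independent.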
Properties of Cov(X, Y)
• If X and Y are independent, then Cov(X, Y) = 0 (Proof?)
• For any constants a1, b1, a2, and b2
• Cov(a1X + b1, a2Y + b2) = a1a2 Cov(X, Y)
• |Cov(X, Y)| ≤ sd(X)·sd(Y) (Cauchy-Schwarz inequality)