
MTL390: Statistical Methods

Instructor: Dr. Biplab Paul


January 15, 2025

Lecture 5

Generation of Random Numbers

In order to use a computer to initiate a simulation study, we must be able to generate the
value of a U (0, 1) random variable; such variates are called random numbers. To generate
them, most computers have a built-in subroutine, called a random-number generator,
whose output is a sequence of pseudorandom numbers—a sequence of numbers that is,
for all practical purposes, indistinguishable from a sample from the U (0, 1) distribution.
Most random-number generators start with an initial value $X_0$, called the seed, and, for specified positive integers $a$, $c$, and $m$, recursively compute values by letting

$$X_{n+1} = (aX_n + c) \bmod m, \qquad n \ge 0 \tag{1}$$

where the foregoing means that $aX_n + c$ is divided by $m$ and the remainder is taken as the value of $X_{n+1}$. Thus, each $X_n$ is one of $0, 1, \ldots, m-1$, and the quantity $X_n/m$ is taken
as an approximation to a $U(0,1)$ random variable.
It can be shown that, subject to suitable choices for a, c, and m, Equation 1 gives rise
to a sequence of numbers that look as if they were generated from independent U (0, 1)
random variables.
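To make the recursion concrete, here is a minimal Python sketch of such a linear congruential generator. The particular constants $a = 1664525$, $c = 1013904223$, $m = 2^{32}$ are a commonly used choice and are not prescribed by the lecture.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Generate n pseudorandom U(0,1) variates via X_{k+1} = (a*X_k + c) mod m."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m   # the recursion of Equation 1
        out.append(x / m)     # X_k / m approximates a U(0,1) draw
    return out

print(lcg(seed=12345, n=5))
```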

Simulating from Discrete Distributions


Here we will discuss a general method for simulating random variables from discrete
distributions. For example, suppose we want to simulate a random variable X with the
probability mass function
$$P\{X = x_j\} = p_j, \quad j = 1, 2, \ldots, \quad \text{where } \sum_j p_j = 1.$$

To simulate $X$ such that $P\{X = x_j\} = p_j$, generate a uniform random variable $U$ over $(0,1)$, and assign
$$X = \begin{cases}
x_1 & \text{if } U \le p_1, \\
x_2 & \text{if } p_1 < U \le p_1 + p_2, \\
\vdots & \\
x_j & \text{if } \sum_{i=1}^{j-1} p_i < U \le \sum_{i=1}^{j} p_i, \\
\vdots &
\end{cases}$$
This works because the probability of selecting xj is
$$P\{X = x_j\} = P\left\{\sum_{i=1}^{j-1} p_i < U \le \sum_{i=1}^{j} p_i\right\} = p_j.$$

Thus, X follows the desired distribution.
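As an illustration, the table-lookup idea above can be coded directly. The sketch below assumes the values $x_j$ and probabilities $p_j$ are supplied as Python lists and uses the standard random module only for the uniform draw.

```python
import random

def simulate_discrete(values, probs):
    """Return x_j with probability p_j by locating U among the partial sums of p."""
    u = random.random()          # U ~ U(0,1)
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p          # p_1 + ... + p_j
        if u <= cumulative:      # sum_{i<j} p_i < U <= sum_{i<=j} p_i
            return x
    return values[-1]            # guard against floating-point round-off

# Example: P(X=1)=0.2, P(X=2)=0.5, P(X=3)=0.3
print([simulate_discrete([1, 2, 3], [0.2, 0.5, 0.3]) for _ in range(10)])
```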


Simulating a Binomial Random Variable A binomial (n, p) random variable can
easily be simulated by recalling that it can be expressed as the sum of n independent
Bernoulli random variables. That is, if $U_1, \ldots, U_n$ are independent uniform $(0,1)$ variables, then letting
$$X_i = \begin{cases} 1 & \text{if } U_i < p, \\ 0 & \text{otherwise}, \end{cases}$$
it follows that
$$X = \sum_{i=1}^{n} X_i$$

is a binomial random variable with parameters n and p.
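A minimal Python sketch of this construction (the function name and the example parameters are ours, chosen for illustration):

```python
import random

def simulate_binomial(n, p):
    """Sum of n independent Bernoulli(p) indicators, each driven by its own uniform."""
    return sum(1 for _ in range(n) if random.random() < p)

print([simulate_binomial(10, 0.3) for _ in range(5)])
```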

Simulating from Continuous Distributions


Simulation of univariate continuous distributions plays a crucial role in probabilistic mod-
eling, statistical analysis, and real-world applications. These techniques allow us to gener-
ate random samples from specified continuous distributions, which are essential for tasks
such as hypothesis testing and risk analysis.
We present two general methods for using random numbers to simulate continuous
random variables.

Inverse Transformation Method


The Inverse Transformation Method is a widely recognized technique for simulating a
random variable with a continuous distribution. The method is based on the following
proposition:
Proposition 1. Let $U$ represent a uniform random variable distributed between 0 and 1. For any continuous cumulative distribution function (CDF) $F$, if we define the random variable $Y$ as follows:
$$Y = F^{-1}(U),$$
then $Y$ will have the distribution function $F$.
Here, $F^{-1}(x)$ is the value $y$ such that $F(y) = x$.
Proof. To prove this, we start with the definition of the distribution function of $Y$:
$$F_Y(a) = P\{Y \le a\} = P\{F^{-1}(U) \le a\}.$$
Since $F(x)$ is a monotonic function, the inequality $F^{-1}(U) \le a$ holds if and only if $U \le F(a)$. Substituting this into the equation,
$$F_Y(a) = P\{U \le F(a)\}.$$
Given that $U$ is uniformly distributed over $[0,1]$, $P\{U \le F(a)\} = F(a)$. Thus,
$$F_Y(a) = F(a).$$

This method is widely used because it establishes a direct link between uniform ran-
dom variables and arbitrary continuous distributions. We conclude that a random vari-
able X with a continuous distribution function F can be simulated by generating a
random number U from the uniform distribution and setting

$$X = F^{-1}(U).$$
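In code, the method needs nothing beyond a routine for $F^{-1}$. The sketch below takes the inverse CDF as an ordinary Python function; the example $F(x) = x^2$ on $(0,1)$, with $F^{-1}(u) = \sqrt{u}$, is ours, purely for illustration.

```python
import math
import random

def inverse_transform_sample(inv_cdf, n):
    """Generate n variates X = F^{-1}(U) with U ~ U(0,1)."""
    return [inv_cdf(random.random()) for _ in range(n)]

# Example: F(x) = x^2 on (0,1) has inverse F^{-1}(u) = sqrt(u)
print(inverse_transform_sample(math.sqrt, 5))
```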

Simulation of the Univariate Exponential Distribution We use the Inverse Transformation Method to simulate from the exponential distribution. If the cumulative distribution function (CDF) is given by:

$$F(x) = 1 - e^{-\lambda x},$$
then the inverse function $F^{-1}(u)$ is the value of $x$ such that
$$1 - e^{-\lambda x} = u.$$
Solving for $x$, we get
$$x = -\frac{\log(1-u)}{\lambda}.$$
Since $u$ is a random number uniformly distributed between 0 and 1, we can simplify the expression by noting that $1-u$ is also uniformly distributed over $(0,1)$. Hence, the equation reduces to
$$x = -\frac{\log(u)}{\lambda}.$$
Thus, to simulate a random variable $X$ from the exponential distribution with rate parameter $\lambda$, we generate a random number $u$ from the uniform distribution on $(0,1)$ and compute
$$X = -\frac{\log(u)}{\lambda}.$$
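A minimal Python sketch of this recipe (the rate $\lambda = 2$ in the example is an arbitrary illustrative choice):

```python
import math
import random

def simulate_exponential(lam):
    """Exponential(lam) variate via X = -log(U)/lam with U ~ U(0,1)."""
    return -math.log(random.random()) / lam

draws = [simulate_exponential(lam=2.0) for _ in range(10_000)]
print(sum(draws) / len(draws))  # sample mean should be close to 1/lam = 0.5
```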
In practice, the Inverse Transformation Method requires the cumulative distribution
function F to be well-defined and invertible. If F is not invertible or its inverse is
computationally infeasible, alternative simulation methods may be necessary. One such
method is acceptance-rejection sampling, which will be discussed now.

Acceptance-Rejection Sampling Method


Suppose that we have a method for simulating a random variable with a density function $g(x)$. We can use this method as the basis for simulating from a continuous distribution with density $f(x)$ by simulating $Y$ from $g$ and then accepting the simulated value with a probability proportional to $f(Y)/g(Y)$. Let $c$ be a constant such that
$$\frac{f(y)}{g(y)} \le c \quad \text{for all } y.$$
This leads to the following technique for simulating a random variable with density f .

1. Step 1: Simulate $Y$ having density $g$ and generate a random number $U \sim U(0,1)$.

2. Step 2: If $U \le \frac{f(Y)}{c\,g(Y)}$, set $X = Y$. Otherwise, return to Step 1.
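For illustration, the two steps can be wrapped in a short Python routine. The argument names f, g, sample_g, and c below are ours; the caller is assumed to supply the two densities, a sampler for $g$, and the bounding constant.

```python
import random

def acceptance_rejection(f, g, sample_g, c):
    """Repeat: draw Y ~ g and U ~ U(0,1); accept X = Y when U <= f(Y) / (c * g(Y))."""
    while True:
        y = sample_g()
        u = random.random()
        if u <= f(y) / (c * g(y)):
            return y
```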
Proposition 2. The random variable X, generated by the acceptance-rejection method,
has density function f .
Proof. Let $X$ be the value obtained, and let $N$ denote the number of necessary iterations. Then we compute $P(X \le x)$ as follows:
$$P(X \le x) = P(Y_N \le x).$$
From the acceptance-rejection method, the acceptance criterion $U \le \frac{f(Y)}{c\,g(Y)}$ implies
$$P(X \le x) = P\left(Y \le x \,\Big|\, U \le \frac{f(Y)}{c\,g(Y)}\right).$$
Using the definition of conditional probability,
$$P(X \le x) = \frac{P\left(Y \le x,\ U \le \frac{f(Y)}{c\,g(Y)}\right)}{K},$$
where $K = P\left(U \le \frac{f(Y)}{c\,g(Y)}\right)$ is the normalization constant. The joint density function of $Y$ and $U$, due to independence, is
$$f(y, u) = g(y), \qquad 0 \le u \le 1.$$
Thus,
$$P(X \le x) = \frac{1}{K} \int_{-\infty}^{x} \int_{0}^{f(y)/(c\,g(y))} g(y) \, du \, dy.$$
Evaluating the inner integral over $u$,
$$\int_{0}^{f(y)/(c\,g(y))} g(y) \, du = g(y) \cdot \frac{f(y)}{c\,g(y)} = \frac{f(y)}{c}.$$
Substituting this into the outer integral,
$$P(X \le x) = \frac{1}{K} \int_{-\infty}^{x} \frac{f(y)}{c} \, dy.$$
Simplifying,
$$P(X \le x) = \frac{1}{cK} \int_{-\infty}^{x} f(y) \, dy.$$
Letting $x \to \infty$ and using the fact that $f(y)$ is a valid probability density function ($\int_{-\infty}^{\infty} f(y)\,dy = 1$), we find
$$1 = \frac{1}{cK} \int_{-\infty}^{\infty} f(y) \, dy = \frac{1}{cK}.$$
Thus $cK = 1$. Therefore, we obtain
$$P(X \le x) = \int_{-\infty}^{x} f(y) \, dy,$$
which confirms that $X$ has the cumulative distribution function of the target density $f$. This completes the proof.

Simulation of the Standard Univariate Normal Distribution To simulate a
standard normal random variable Z, note first that the absolute value of Z has the
probability density function
$$f(x) = \sqrt{\frac{2}{\pi}}\, e^{-x^2/2}, \qquad 0 < x < \infty.$$
We will start by simulating from the preceding density function by using the acceptance-
rejection method, with g(x) being the exponential density function with mean 1, i.e.,

$$g(x) = e^{-x}, \qquad 0 < x < \infty.$$

Now, note that
$$\frac{f(x)}{g(x)} = \sqrt{\frac{2}{\pi}} \exp\left(-\frac{x^2 - 2x}{2}\right) \le \sqrt{\frac{2e}{\pi}}.$$
We take $c = \sqrt{2e/\pi}$. Thus,
$$\frac{f(x)}{c\,g(x)} = \exp\left(-\frac{(x-1)^2}{2}\right).$$
Using the acceptance-rejection method, we can simulate the absolute value of a standard
normal random variable as follows

1. Generate independent random variables Y and U , where

• Y follows an exponential distribution with rate 1, and


• U follows a uniform distribution on (0, 1).

2. Accept $Y$ as $X$ if
$$U \le \exp\left(-\frac{(Y-1)^2}{2}\right).$$
Otherwise, return to step 1.

Once a random variable $X$ is generated with the above method, a standard normal random variable $Z$ can be obtained by assigning $Z$ to be either $X$ or $-X$ with equal probability. In step 2, the acceptance criterion $U \le \exp\left(-\frac{(Y-1)^2}{2}\right)$ can also be expressed as
$$-\log U \ge \frac{(Y-1)^2}{2}.$$
Note that $-\log U$ is exponentially distributed with rate 1. Summing up, then, we have the following algorithm:

1. Generate independent random variables $Y_1$ and $Y_2$, both being exponential with rate 1.

2. If $Y_2 - \frac{(Y_1 - 1)^2}{2} > 0$, set $Y = Y_2 - \frac{(Y_1 - 1)^2}{2}$ and go to step 3; otherwise go back to step 1.

3. Generate $U \sim U(0,1)$ and set $Z = Y_1$ if $U \le 0.5$, and $Z = -Y_1$ otherwise.
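A Python sketch of this algorithm (using $-\log(U)$ for the exponential draws, as noted above; the final lines are an informal sanity check, not part of the algorithm):

```python
import math
import random

def simulate_standard_normal():
    """|Z| via acceptance-rejection with an exponential proposal, then a random sign."""
    while True:
        y1 = -math.log(random.random())   # exponential with rate 1
        y2 = -math.log(random.random())   # exponential with rate 1
        if y2 - (y1 - 1) ** 2 / 2 > 0:    # acceptance test of step 2
            break
    return y1 if random.random() <= 0.5 else -y1  # step 3: attach a random sign

zs = [simulate_standard_normal() for _ in range(10_000)]
print(sum(zs) / len(zs), sum(z * z for z in zs) / len(zs))  # near 0 and 1
```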

Sample Moments and Their Distributions

Random Sampling
Let X be a random variable (RV) with distribution function (DF) F , and let X1 , X2 , . . . , Xn
be independent and identically distributed (iid) random variables with common DF F .
Then the collection X1 , X2 , . . . , Xn is known as a random sample of size n from the
DF F , or simply as n independent observations on X.
The set of n values x1 , x2 , . . . , xn is called a realization of the sample. Note that
the possible values of the random variables (X1 , X2 , . . . , Xn ) can be regarded as points
in Rn , which may be called the sample space.
If X1 , X2 , . . . , Xn is a random sample from the distribution function (DF) F , their
joint DF is given by:
$$F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F(x_i).$$

Statistic A function $T$ of observable random variables $X_1, X_2, \ldots, X_n$ that does not depend on any unknown parameter(s) is called a (sample) statistic.
The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a function of $X_1, X_2, \ldots, X_n$. The sample median and the sample variance $S^2 = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar{X})^2$ are also examples of statistics. It is important to observe that even with random sampling, there is sampling variability or error. That is, if we select different samples from the same population, a statistic will take different values in different samples. Thus, a sample statistic is a random variable.

Sampling distribution The probability distribution of a sample statistic is called the sampling distribution.
Theorem 1. Let X1 , . . . , Xn be a random sample of size n from a population with mean
$\mu$ and variance $\sigma^2$. Then
$$E(\bar{X}) = \mu \quad \text{and} \quad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
Proof. The mean and variance of $\bar{X}$ are given by
$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{n\mu}{n} = \mu,$$
and
$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) \quad \text{(because the $X_i$'s are independent and } \operatorname{Var}(aX_i) = a^2\operatorname{Var}(X_i)\text{)}$$
$$= \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}.$$

We denote $E[\bar{X}] = \mu_{\bar{X}}$ and $\operatorname{Var}(\bar{X}) = \sigma_{\bar{X}}^2$. Here, $\sigma_{\bar{X}}$ is called the standard error of the mean.
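These formulas are easy to verify by simulation. The sketch below assumes, purely for illustration, an exponential population with rate 1 (so $\mu = \sigma^2 = 1$) and compares the empirical mean and variance of $\bar{X}$ across many samples of size $n$ with $\mu$ and $\sigma^2/n$.

```python
import random
import statistics

n, reps = 25, 20_000
# Population: exponential with rate 1, so mu = 1 and sigma^2 = 1 (illustrative choice)
means = [statistics.mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(statistics.mean(means))      # close to mu = 1
print(statistics.variance(means))  # close to sigma^2 / n = 1 / 25 = 0.04
```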

Theorem 2. Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Then $E(S^2) = \sigma^2$.

Proof. It can be shown that
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}.$$
Hence,
$$E[S^2] = E\left[\frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}\right] = \frac{1}{n-1}\sum_{i=1}^{n} E[X_i^2] - \frac{n}{n-1}E[\bar{X}^2].$$
Using the fact that $E(X^2) = \operatorname{Var}(X) + \mu^2$, we have
$$E[S^2] = \frac{1}{n-1}\left[n(\sigma^2 + \mu^2) - n\cdot\frac{\sigma^2}{n} - n\mu^2\right].$$
Simplifying,
$$E[S^2] = \frac{1}{n-1}\left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2.$$
Thus, the expected value of the sample variance is the same as the variance of the population under consideration.
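The unbiasedness of $S^2$ can be checked the same way. In the sketch below, the normal population with $\sigma^2 = 4$ is an arbitrary illustrative choice, and statistics.variance computes exactly the $S^2$ defined above (with divisor $n-1$).

```python
import random
import statistics

n, reps, sigma2 = 10, 20_000, 4.0
s2_values = [
    statistics.variance([random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
    for _ in range(reps)
]
print(statistics.mean(s2_values))  # close to sigma^2 = 4
```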
