
8

Transformations

The topic for this chapter is transformations of random variables and random vectors. After applying a function to a random variable X or random vector X, the goal
is to find the distribution of the transformed random variable or joint distribution
of the transformed random vector.
Transformations of random variables appear all over the place in statistics. Here are
a few examples, to preview the kinds of transformations we’ll be looking at in this
chapter.
• Unit conversion: In one dimension, we’ve already seen how standardization and
location-scale transformations can be useful tools for learning about an entire
family of distributions. A location-scale change is linear, converting an r.v. X to
the r.v. Y = aX + b where a and b are constants (with a > 0).
There are also many situations in which we may be interested in nonlinear trans-
formations, e.g., converting from the dollar-yen exchange rate to the yen-dollar
exchange rate, or converting information like “Janet’s waking hours yesterday
consisted of 8 hours of work, 4 hours visiting friends, and 4 hours surfing the web”
to the format “Janet was awake for 16 hours yesterday; she spent 1/2 of that time
working, 1/4 of that time visiting friends, and 1/4 of that time surfing the web”. The
change of variables formula, which is the first result in this chapter, shows what
happens to the distribution when a random vector is transformed.
• Sums and averages as summaries: It is common in statistics to summarize n
observations by their sum or sample average. Turning X1 , . . . , Xn into the sum
T = X1 + · · · + Xn or sample mean X̄n = T /n is a transformation from Rn to R.
The term for a sum of independent random variables is convolution. We have
already encountered stories and MGFs as two techniques for dealing with convo-
lutions. In this chapter, convolution sums and integrals, which are based on the
law of total probability, will give us another way of obtaining the distribution of
a sum of r.v.s.
• Extreme values: In many contexts, we may be interested in the distribution of
the most extreme observations. For disaster preparedness, government agencies
may be concerned about the most extreme flood or earthquake in a 100-year
period; in finance, a portfolio manager with an eye toward risk management will
want to know the worst 1% or 5% of portfolio returns. In these applications,
we are concerned with the maximum or minimum of a set of observations. The


transformation that sorts observations, turning X1 , . . . , Xn into the order statistics


min(X1 , . . . , Xn ), . . . , max(X1 , . . . , Xn ), is a transformation from Rn to Rn that is
not invertible. Order statistics are addressed in the last section in this chapter.
Furthermore, it is especially important to us to understand transformations because
of the approach we’ve taken to learning about the named distributions. Starting
from a few basic distributions, we have defined other distributions as transforma-
tions of these elementary building blocks, in order to understand how the named
distributions are related to one another. We’ll continue in that spirit here as we in-
troduce two new distributions, the Beta and Gamma, which generalize the Uniform
and Exponential.
We already have quite a few tools in our toolbox for dealing with transformations,
so let’s review those briefly. First, if we are only looking for the expectation of
g(X), LOTUS shows us the way: it tells us that the PMF or PDF of X is enough
for calculating E(g(X)). LOTUS also applies to functions of several r.v.s, as we
learned in the previous chapter.
If we need the full distribution of g(X), not just its expectation, our approach
depends on whether X is discrete or continuous.
• In the discrete case, we get the PMF of g(X) by translating the event g(X) = y
into an equivalent event involving X. To do so, we look for all values x such that
g(x) = y; as long as X equals any of these x's, the event g(X) = y will occur. This
gives the formula
$$P(g(X) = y) = \sum_{x:\,g(x)=y} P(X = x).$$
For a one-to-one g, the situation is particularly simple, because there is only one
value of x such that g(x) = y, namely $g^{-1}(y)$. Then we can use
$$P(g(X) = y) = P(X = g^{-1}(y))$$
to convert between the PMFs of X and g(X), as also discussed in Section 3.7. For
example, it is extremely easy to convert between the Geometric and First Success
distributions. (A small numerical sketch of the first formula appears after these bullet points.)
• In the continuous case, a universal approach is to start from the CDF of g(X), and
translate the event $g(X) \le y$ into an equivalent event involving X. For general g,
we may have to think carefully about how to express $g(X) \le y$ in terms of X, and
there is no easy formula we can plug into. But when g is continuous and strictly
increasing, the translation is easy: $g(X) \le y$ is the same as $X \le g^{-1}(y)$, so
$$F_{g(X)}(y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y)).$$
We can then differentiate with respect to y to get the PDF of g(X). This gives a
one-dimensional version of the change of variables formula, which generalizes to
invertible transformations in multiple dimensions.
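As promised above, here is a minimal numerical sketch of the discrete formula $P(g(X) = y) = \sum_{x: g(x)=y} P(X = x)$. The specific choices (Python, X uniform on {-2, -1, 0, 1, 2}, g(x) = x²) are purely illustrative and not from the text.

```python
from collections import defaultdict

# Sketch of P(g(X) = y) = sum of P(X = x) over all x with g(x) = y.
# Illustrative choice: X uniform on {-2, -1, 0, 1, 2}, g(x) = x**2.
support = [-2, -1, 0, 1, 2]
pmf_X = {x: 1 / len(support) for x in support}

pmf_Y = defaultdict(float)
for x, p in pmf_X.items():
    pmf_Y[x ** 2] += p      # accumulate P(X = x) into P(g(X) = y) for y = g(x)

print(dict(pmf_Y))          # {4: 0.4, 1: 0.4, 0: 0.2}
```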

8.1 Change of variables

Theorem 8.1.1 (Change of variables in one dimension). Let X be a continuous
r.v. with PDF $f_X$, and let $Y = g(X)$, where g is differentiable and strictly increasing
(or strictly decreasing). Then the PDF of Y is given by
$$f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right|,$$
where $x = g^{-1}(y)$. The support of Y is all g(x) with x in the support of X.

Proof. Let g be strictly increasing. The CDF of Y is
$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y)) = F_X(x),$$
so by the chain rule, the PDF of Y is
$$f_Y(y) = f_X(x) \frac{dx}{dy}.$$
The proof for g strictly decreasing is analogous. In that case the PDF ends up as
$-f_X(x)\frac{dx}{dy}$, which is nonnegative since $\frac{dx}{dy} < 0$ if g is strictly decreasing. Using $\left|\frac{dx}{dy}\right|$,
as in the statement of the theorem, covers both cases. ∎

When applying the change of variables formula, we can choose whether to compute
$\frac{dx}{dy}$, or compute $\frac{dy}{dx}$ and take the reciprocal. By the chain rule, these give the same
result, so we can do whichever is easier.
⚠ 8.1.2. When finding the distribution of Y, be sure to:
• Check the assumptions of the change of variables theorem carefully if you wish to
apply it (if it doesn’t apply, a good strategy is to start with the CDF of Y ).
• Express your final answer for the PDF of Y as a function of y.
• Specify the support of Y .
The change of variables formula (in the strictly increasing g case) is easy to remember when written in the form
$$f_Y(y)\,dy = f_X(x)\,dx,$$
which has an aesthetically pleasing symmetry to it. This formula also makes sense if
we think about units. For example, let X be a measurement in inches and Y = 2.54X
be the conversion into centimeters (cm). Then the units of $f_X(x)$ are inches$^{-1}$
and the units of $f_Y(y)$ are cm$^{-1}$, so it would be absurd to say something like
"$f_Y(y) = f_X(x)$". But dx is measured in inches and dy is measured in cm, so $f_Y(y)\,dy$
and $f_X(x)\,dx$ are unitless quantities, and it makes sense to equate them. Better yet,

$f_X(x)\,dx$ and $f_Y(y)\,dy$ have probability interpretations (recall from Chapter 5 that
$f_X(x)\,dx$ is essentially the probability that X is in a tiny interval of length dx,
centered at x), which makes it easier to think intuitively about what the change of
variables formula is saying.
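To make this concrete numerically, here is a small sketch (assuming, purely for illustration, that X is standard Normal and Y = 2.54X as in the inches-to-cm discussion above, with NumPy/SciPy as the tools): $f_Y(y)\,dy$ equals $f_X(x)\,dx$, and both approximate the probability that Y lands in a tiny interval around y.

```python
import numpy as np
from scipy.stats import norm

# Illustrative check of f_Y(y) dy = f_X(x) dx for Y = 2.54 * X with X ~ N(0, 1).
x, dx = 1.0, 1e-3                  # tiny interval around x ("in inches")
y, dy = 2.54 * x, 2.54 * dx        # the matching tiny interval around y ("in cm")

f_X = norm.pdf(x)                  # conceptually has units of inches^-1
f_Y = f_X / 2.54                   # f_Y(y) = f_X(x) |dx/dy|, units of cm^-1

print(f_X * dx, f_Y * dy)          # identical unitless quantities

rng = np.random.default_rng(0)
y_samples = 2.54 * rng.standard_normal(10_000_000)
print(np.mean(np.abs(y_samples - y) < dy / 2))   # Monte Carlo P(Y near y), close to f_Y(y) * dy
```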
The next two examples derive the PDFs of two r.v.s that are defined as transforma-
tions of a standard Normal r.v. In the first example the change of variables formula
applies; in the second example it does not.
Example 8.1.3 (Log-Normal PDF). Let $X \sim N(0, 1)$, $Y = e^X$. In Chapter 6
we named the distribution of Y the Log-Normal, and we found all of its moments
using the MGF of the Normal distribution. Now we can use the change of variables
formula to find the PDF of Y, since $g(x) = e^x$ is strictly increasing. Let $y = e^x$, so
$x = \log y$ and $dy/dx = e^x$. Then
$$f_Y(y) = f_X(x) \frac{dx}{dy} = \varphi(x) \frac{1}{e^x} = \varphi(\log y) \frac{1}{y}, \quad y > 0.$$
Note that after applying the change of variables formula, we write everything on
the right-hand side in terms of y, and we specify the support of the distribution. To
determine the support, we just observe that as x ranges from $-\infty$ to $\infty$, $e^x$ ranges
from 0 to $\infty$.
We can get the same result by working from the definition of the CDF, translating
the event $Y \le y$ into an equivalent event involving X. For y > 0,
$$F_Y(y) = P(Y \le y) = P(e^X \le y) = P(X \le \log y) = \Phi(\log y),$$
so the PDF is again
$$f_Y(y) = \frac{d}{dy}\Phi(\log y) = \varphi(\log y)\frac{1}{y}, \quad y > 0. \;\square$$
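As an optional sanity check (not part of the original example), the derived formula can be compared with SciPy's Log-Normal PDF; the shape parameter s = 1 corresponds to $Y = e^X$ with $X \sim N(0, 1)$.

```python
import numpy as np
from scipy.stats import norm, lognorm

# Sketch: compare the derived PDF phi(log y) / y with scipy's lognorm PDF.
y = np.linspace(0.1, 5, 50)
derived = norm.pdf(np.log(y)) / y
reference = lognorm.pdf(y, s=1)               # s=1 matches Y = e^X, X ~ N(0, 1)
print(np.max(np.abs(derived - reference)))    # essentially 0
```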

Example 8.1.4 (Chi-Square PDF). Let $X \sim N(0, 1)$, $Y = X^2$. The distribution
of Y is an example of a Chi-Square distribution, which is formally introduced in
Chapter 10. To find the PDF of Y, we can no longer apply the change of variables
formula because $g(x) = x^2$ is not one-to-one; instead we start from the CDF.
By drawing the graph of $y = x^2$, we can see that the event $X^2 \le y$ is equivalent to
the event $-\sqrt{y} \le X \le \sqrt{y}$. Then
$$F_Y(y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \Phi(\sqrt{y}) - \Phi(-\sqrt{y}) = 2\Phi(\sqrt{y}) - 1,$$
so
$$f_Y(y) = 2\varphi(\sqrt{y}) \cdot \frac{1}{2} y^{-1/2} = \varphi(\sqrt{y})\, y^{-1/2}, \quad y > 0. \;\square$$
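Again as an optional numerical check (assuming SciPy), the derived PDF matches the Chi-Square distribution with 1 degree of freedom.

```python
import numpy as np
from scipy.stats import norm, chi2

# Sketch: the derived PDF phi(sqrt(y)) * y**(-1/2) vs. the Chi-Square(1) PDF.
y = np.linspace(0.05, 8, 50)
derived = norm.pdf(np.sqrt(y)) / np.sqrt(y)
print(np.max(np.abs(derived - chi2.pdf(y, df=1))))   # essentially 0
```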

The following example sheds light on an unexpected appearance of a decidedly


non-Normal distribution.

Example 8.1.5 (Lighthouse). A lighthouse on a shore is shining light toward the
ocean at a random angle U (measured in radians), where
$$U \sim \text{Unif}\left(-\frac{\pi}{2}, \frac{\pi}{2}\right).$$

Consider a line which is parallel to the shore and 1 mile away from the shore, as
illustrated in Figure 8.1. An angle of 0 would mean the ray of light is perpendicular
to the shore, while an angle of $\pi/2$ would mean the ray is along the shore, shining
to the right from the perspective of the figure.
Let X be the point that the light hits on the line, where the line’s origin is the point
on the line that is closest to the lighthouse. Find the distribution of X.

FIGURE 8.1
A lighthouse shining light at a random angle U, viewed from above. (The figure shows the lighthouse on the beach, the line 1 mile out in the ocean, the angle U measured from the perpendicular, and the point where the ray hits the line at distance X from the line's origin 0.)

Solution: Looking at the right triangle in Figure 8.1, the length of the opposite side
of U divided by the length of the adjacent side of U is X/1 = X, so

X = tan(U ).

(The figure illustrates a case where U > 0 and, correspondingly, X > 0, but the
same relationship holds when $U \le 0$.) Let x be a possible value of X and u be the
corresponding possible value of U, so
$$x = \tan(u) \quad \text{and} \quad u = \arctan(x).$$
By the change of variables formula, which applies since tan is a differentiable, strictly
increasing function on $(-\pi/2, \pi/2)$,
$$f_X(x) = f_U(u) \frac{du}{dx} = \frac{1}{\pi} \cdot \frac{1}{1 + x^2},$$

which shows that X is Cauchy. In particular, this implies that E|X| is infinite (since
the expected value of a Cauchy does not exist), so on average X is infinitely far
from the origin of the line!

The fact that X is Cauchy also makes sense in light of universality of the Uniform.
As shown in Example 7.1.25, the Cauchy CDF is
$$F(x) = \frac{1}{\pi}\arctan(x) + 0.5.$$
The inverse is $F^{-1}(v) = \tan(\pi(v - 0.5))$, so for $V \sim \text{Unif}(0, 1)$ we have
$$F^{-1}(V) = \tan(\pi(V - 0.5)) \sim \text{Cauchy}.$$
This agrees with our earlier result since $\pi(V - 0.5) \sim \text{Unif}(-\pi/2, \pi/2)$. □
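A brief simulation sketch (illustrative, using NumPy/SciPy) agrees with this: the tangent of a Unif($-\pi/2, \pi/2$) angle behaves like a standard Cauchy r.v. Quantiles are compared rather than means, since E|X| does not exist.

```python
import numpy as np
from scipy.stats import cauchy

# Sketch: simulate the lighthouse and compare empirical quantiles of tan(U)
# with the standard Cauchy quantile function F^{-1}(v) = tan(pi * (v - 0.5)).
rng = np.random.default_rng(42)
u = rng.uniform(-np.pi / 2, np.pi / 2, size=1_000_000)
x = np.tan(u)

for q in [0.1, 0.25, 0.5, 0.75, 0.9]:
    print(q, round(np.quantile(x, q), 3), round(cauchy.ppf(q), 3))  # columns nearly agree
```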
We can also use the change of variables formula to find the PDF of a location-scale
transformation.
Example 8.1.6 (PDF of a location-scale transformation). Let X have PDF $f_X$,
and let $Y = a + bX$, with $b \ne 0$. Let $y = a + bx$, to mirror the relationship between
Y and X. Then $\frac{dy}{dx} = b$, so the PDF of Y is
$$f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right| = f_X\!\left(\frac{y - a}{b}\right) \frac{1}{|b|}. \;\square$$

The change of variables formula generalizes to n dimensions, where it tells us how to


use the joint PDF of a random vector X to get the joint PDF of the transformed ran-
dom vector Y = g(X). The formula is analogous to the one-dimensional version, but
it involves a multivariate generalization of the derivative called a Jacobian matrix ;
see sections A.6 and A.7 of the math appendix for more about Jacobians.
Theorem 8.1.7 (Change of variables). Let $X = (X_1, \ldots, X_n)$ be a continuous
random vector with joint PDF $f_X$. Let $g : A_0 \to B_0$ be an invertible function,
where $A_0$ and $B_0$ are open¹ subsets of $\mathbb{R}^n$, $A_0$ contains the support of X, and $B_0$ is
the range of g.
Let $Y = g(X)$, and mirror this by letting $y = g(x)$. Since g is invertible, we also
have $X = g^{-1}(Y)$ and $x = g^{-1}(y)$.
Suppose that all the partial derivatives $\frac{\partial x_i}{\partial y_j}$ exist and are continuous, so we can form
the Jacobian matrix
$$\frac{\partial x}{\partial y} = \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & \cdots & \frac{\partial x_1}{\partial y_n} \\ \vdots & & \vdots \\ \frac{\partial x_n}{\partial y_1} & \frac{\partial x_n}{\partial y_2} & \cdots & \frac{\partial x_n}{\partial y_n} \end{pmatrix}.$$
Also assume that the determinant of this Jacobian matrix is never 0. Then the joint
PDF of Y is
$$f_Y(y) = f_X\left(g^{-1}(y)\right) \cdot \left|\left|\frac{\partial x}{\partial y}\right|\right| \quad \text{for } y \in B_0,$$
and 0 otherwise. (The inner bars around the Jacobian say to take the determinant
and the outer bars say to take the absolute value.)
That is, to convert $f_X(x)$ to $f_Y(y)$ we express the x in $f_X(x)$ in terms of y and then
multiply by the absolute value of the determinant of the Jacobian $\partial x/\partial y$.

¹A set $C \subset \mathbb{R}^n$ is open if for each $x \in C$, there exists $\epsilon > 0$ such that all points with distance
less than $\epsilon$ from $x$ are contained in C. Sometimes we take $A_0 = B_0 = \mathbb{R}^n$, but often we would like
more flexibility for the domain and range of g. For example, if n = 2, and $X_1$ and $X_2$ have support
$(0, \infty)$, we may want to work with the open set $A_0 = (0, \infty) \times (0, \infty)$ rather than all of $\mathbb{R}^2$.
As in the 1D case,
$$\frac{\partial x}{\partial y} = \left(\frac{\partial y}{\partial x}\right)^{-1},$$
so we can compute whichever of the two Jacobians is easier, and then at the end
express the joint PDF of Y as a function of y.
We will not prove the change of variables formula here, but the idea is to apply the
change of variables formula from multivariable calculus and the fact that if A is a
region in $A_0$ and $B = \{g(x) : x \in A\}$ is the corresponding region in $B_0$, then $X \in A$
is equivalent to $Y \in B$; they are the same event. So $P(X \in A) = P(Y \in B)$, which
shows that
$$\int_A f_X(x)\,dx = \int_B f_Y(y)\,dy.$$
The change of variables formula from multivariable calculus (which is reviewed in
the math appendix) can then be applied to the integral on the left-hand side, with
the substitution $x = g^{-1}(y)$.
⚠ 8.1.8. A crucial conceptual difference between transformations of discrete r.v.s
and transformations of continuous r.v.s is that with discrete r.v.s we don't need a
Jacobian, while with continuous r.v.s we do need a Jacobian. For example, let X be
a positive r.v. and $Y = X^3$. If X is discrete, then
$$P(Y = y) = P(X = y^{1/3})$$
converts between the PMFs. But if X is continuous, we need a Jacobian (which in
one dimension is just a derivative) to convert between the PDFs:
$$f_Y(y) = f_X(x)\frac{dx}{dy} = f_X(y^{1/3})\frac{1}{3y^{2/3}}.$$

Exercise 23 is a cautionary tale about someone who failed to use a Jacobian when
it was needed.
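Here is a small numerical illustration of this warning (with $X \sim \text{Expo}(1)$ and $Y = X^3$ as an illustrative continuous choice, using SciPy): plugging $x = y^{1/3}$ into $f_X$ without the Jacobian does not even give a valid density, while including the factor $1/(3y^{2/3})$ does.

```python
import numpy as np
from scipy import integrate
from scipy.stats import expon

# Illustrative sketch: X ~ Expo(1), Y = X**3.
naive = lambda y: expon.pdf(y ** (1 / 3))                         # forgot the Jacobian
correct = lambda y: expon.pdf(y ** (1 / 3)) / (3 * y ** (2 / 3))  # includes |dx/dy|

print(integrate.quad(naive, 0, np.inf)[0])      # about 6, so not a valid PDF
print(integrate.quad(correct, 0, 1)[0]
      + integrate.quad(correct, 1, np.inf)[0])  # about 1, as a PDF must be
```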
The next two examples apply the 2D change of variables formula.
Example 8.1.9 (Box-Muller). Let $U \sim \text{Unif}(0, 2\pi)$, and let $T \sim \text{Expo}(1)$ be independent of U. Define
$$X = \sqrt{2T}\cos U \quad \text{and} \quad Y = \sqrt{2T}\sin U.$$

Find the joint PDF of (X, Y ). Are they independent? What are their marginal
distributions?

Solution:
The joint PDF of U and T is
$$f_{U,T}(u, t) = \frac{1}{2\pi} e^{-t},$$
for $u \in (0, 2\pi)$ and $t > 0$. Viewing (X, Y) as a point in the plane,
$$X^2 + Y^2 = 2T(\cos^2 U + \sin^2 U) = 2T$$
is the squared distance from the origin and U is the angle; that is, $(\sqrt{2T}, U)$ expresses
(X, Y) in polar coordinates.
Since we can recover (U, T) from (X, Y), the transformation is invertible. The
Jacobian matrix
$$\frac{\partial(x, y)}{\partial(u, t)} = \begin{pmatrix} -\sqrt{2t}\sin u & \frac{1}{\sqrt{2t}}\cos u \\ \sqrt{2t}\cos u & \frac{1}{\sqrt{2t}}\sin u \end{pmatrix}$$
exists, has continuous entries, and has absolute determinant
$$|-\sin^2 u - \cos^2 u| = 1$$
(which is never 0). Then letting $x = \sqrt{2t}\cos u$, $y = \sqrt{2t}\sin u$ to mirror the transformation from (U, T) to (X, Y), we have
$$\begin{aligned}
f_{X,Y}(x, y) &= f_{U,T}(u, t) \cdot \left|\left|\frac{\partial(u, t)}{\partial(x, y)}\right|\right| \\
&= \frac{1}{2\pi} e^{-t} \cdot 1 \\
&= \frac{1}{2\pi} e^{-\frac{1}{2}(x^2 + y^2)} \\
&= \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-y^2/2},
\end{aligned}$$
for all real x and y.
for all real x and y.
The joint PDF fX,Y factors into a function of x times a function of y, so X and Y
are independent. Furthermore, we recognize the joint PDF as the product of two
standard Normal PDFs, so X and Y are i.i.d. N (0, 1) r.v.s! This result is called the
Box-Muller method for generating Normal r.v.s. □
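Here is a minimal simulation sketch of the Box-Muller recipe from this example (NumPy is an illustrative choice of tool): draw U and T as above, form X and Y, and check that the margins look standard Normal and uncorrelated.

```python
import numpy as np
from scipy.stats import norm

# Sketch of the Box-Muller transformation from the example.
rng = np.random.default_rng(0)
n = 1_000_000
u = rng.uniform(0, 2 * np.pi, size=n)     # U ~ Unif(0, 2*pi)
t = rng.exponential(1.0, size=n)          # T ~ Expo(1), independent of U

x = np.sqrt(2 * t) * np.cos(u)
y = np.sqrt(2 * t) * np.sin(u)

print(x.mean(), x.std(), y.mean(), y.std())   # roughly 0, 1, 0, 1
print(np.corrcoef(x, y)[0, 1])                # roughly 0
print(np.mean(x < 1), norm.cdf(1))            # both roughly 0.841
```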
Example 8.1.10 (Bivariate Normal joint PDF). In Chapter 7, we saw some prop-
erties of the Bivariate Normal distribution and found its joint MGF. Now let’s find
its joint PDF.
Let (Z, W) be BVN with $N(0, 1)$ marginals and $\text{Corr}(Z, W) = \rho$. (If we want the
joint PDF when the marginals are not standard Normal, we can standardize both
components separately and use the result below.) Assume that $-1 < \rho < 1$ since
otherwise the distribution is degenerate (with Z and W perfectly correlated).

As shown in Example 7.5.10, we can construct (Z, W) as
$$Z = X, \quad W = \rho X + \tau Y,$$
with $\tau = \sqrt{1 - \rho^2}$ and X, Y i.i.d. $N(0, 1)$. We also need the inverse transformation.
Solving Z = X for X, we have X = Z. Plugging this into $W = \rho X + \tau Y$ and solving
for Y, we have
$$X = Z, \quad Y = -\frac{\rho}{\tau} Z + \frac{1}{\tau} W.$$
The Jacobian is
$$\frac{\partial(x, y)}{\partial(z, w)} = \begin{pmatrix} 1 & 0 \\ -\frac{\rho}{\tau} & \frac{1}{\tau} \end{pmatrix},$$
which has absolute determinant $1/\tau$. So by the change of variables formula,
$$\begin{aligned}
f_{Z,W}(z, w) &= f_{X,Y}(x, y) \cdot \left|\left|\frac{\partial(x, y)}{\partial(z, w)}\right|\right| \\
&= \frac{1}{2\pi\tau} \exp\left(-\frac{1}{2}(x^2 + y^2)\right) \\
&= \frac{1}{2\pi\tau} \exp\left(-\frac{1}{2}\left(z^2 + \left(-\frac{\rho}{\tau} z + \frac{1}{\tau} w\right)^2\right)\right) \\
&= \frac{1}{2\pi\tau} \exp\left(-\frac{1}{2\tau^2}\left(z^2 + w^2 - 2\rho z w\right)\right), \quad \text{for all real } z, w.
\end{aligned}$$
In the last step we multiplied things out and used the fact that $\rho^2 + \tau^2 = 1$. □
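As an optional check (with $\rho = 0.6$ as an illustrative value), the derived joint PDF can be compared against SciPy's multivariate Normal with unit variances and correlation $\rho$.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch: derived BVN joint PDF vs. scipy's multivariate normal PDF.
rho = 0.6
tau = np.sqrt(1 - rho ** 2)

def derived_pdf(z, w):
    return np.exp(-(z ** 2 + w ** 2 - 2 * rho * z * w) / (2 * tau ** 2)) / (2 * np.pi * tau)

mvn = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
rng = np.random.default_rng(1)
for z, w in rng.uniform(-3, 3, size=(5, 2)):
    print(round(derived_pdf(z, w), 6), round(mvn.pdf([z, w]), 6))   # the two columns agree
```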

8.2 Convolutions

A convolution is a sum of independent random variables. As we mentioned earlier, we


often add independent r.v.s because the sum is a useful summary of an experiment
(in n Bernoulli trials, we may only care about the total number of successes), and
because sums lead to averages, which are also useful (in n Bernoulli trials, the
proportion of successes).
The main task in this section is to determine the distribution of T = X + Y ,
where X and Y are independent r.v.s whose distributions are known. In previous
chapters, we’ve already seen how stories and MGFs can help us accomplish this

task. For example, we used stories to show that the sum of independent Binomials
with the same success probability is Binomial, and that the sum of i.i.d. Geometrics
is Negative Binomial. We used MGFs to show that a sum of independent Normals
is Normal.

A third method for obtaining the distribution of T is by using a convolution sum or


integral. The formulas are given in the following theorem. As we’ll see, a convolution
sum is nothing more than the law of total probability, conditioning on the value of
either X or Y ; a convolution integral is analogous.

Theorem 8.2.1 (Convolution sums and integrals). Let X and Y be independent
r.v.s and T = X + Y be their sum. If X and Y are discrete, then the PMF of T is
$$P(T = t) = \sum_x P(Y = t - x)P(X = x) = \sum_y P(X = t - y)P(Y = y).$$
If X and Y are continuous, then the PDF of T is
$$f_T(t) = \int_{-\infty}^{\infty} f_Y(t - x)f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(t - y)f_Y(y)\,dy.$$

Proof. For the discrete case, we use LOTP, conditioning on X:
$$\begin{aligned}
P(T = t) &= \sum_x P(X + Y = t \mid X = x)P(X = x) \\
&= \sum_x P(Y = t - x \mid X = x)P(X = x) \\
&= \sum_x P(Y = t - x)P(X = x).
\end{aligned}$$

Conditioning on Y instead, we obtain the second formula for the PMF of T .

⚠ 8.2.2. We use the assumption that X and Y are independent in order to get
from $P(Y = t - x \mid X = x)$ to $P(Y = t - x)$ in the last step. We are only justified
in dropping the condition X = x if the conditional distribution of Y given X = x
is the same as the marginal distribution of Y, i.e., X and Y are independent. A
common mistake is to assume that after plugging in x for X, we've "already used
the information" that X = x, when in fact we need an independence assumption to
drop the condition. Otherwise we destroy information without justification.

In the continuous case, since the value of a PDF at a point is not a probability, we
first find the CDF, and then differentiate to get the PDF. By LOTP,
$$\begin{aligned}
F_T(t) = P(X + Y \le t) &= \int_{-\infty}^{\infty} P(X + Y \le t \mid X = x)f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} P(Y \le t - x)f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} F_Y(t - x)f_X(x)\,dx.
\end{aligned}$$
Again, we need independence to drop the condition X = x. To get the PDF, we
then differentiate with respect to t, interchanging the order of integration and differentiation. This gives
$$f_T(t) = \int_{-\infty}^{\infty} f_Y(t - x)f_X(x)\,dx.$$

Conditioning on Y instead, we get the second formula for fT .


An alternative derivation uses the change of variables formula in two dimensions.
The only snag is that the change of variables formula requires an invertible transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$, but $(X, Y) \mapsto X + Y$ maps $\mathbb{R}^2$ to $\mathbb{R}$ and is not invertible.
We can get around this by adding a redundant component to the transformation, in
order to make it invertible. Accordingly, we consider the invertible transformation
$(X, Y) \mapsto (X + Y, X)$ (using $(X, Y) \mapsto (X + Y, Y)$ would be equally valid). Once we
have the joint PDF of X + Y and X, we integrate out X to get the marginal PDF
of X + Y.
Let T = X + Y, W = X, and let t = x + y, w = x. It may seem redundant to
give X the new name "W", but doing this makes it easier to distinguish between
pre-transformation variables and post-transformation variables: we are transforming
$(X, Y) \mapsto (T, W)$. Then
$$\frac{\partial(t, w)}{\partial(x, y)} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$$
has absolute determinant equal to 1, so $\left|\left|\frac{\partial(x, y)}{\partial(t, w)}\right|\right|$ is also 1. Thus, the joint PDF of
T and W is
$$f_{T,W}(t, w) = f_{X,Y}(x, y) = f_X(x)f_Y(y) = f_X(w)f_Y(t - w),$$
and the marginal PDF of T is
$$f_T(t) = \int_{-\infty}^{\infty} f_{T,W}(t, w)\,dw = \int_{-\infty}^{\infty} f_X(x)f_Y(t - x)\,dx,$$
in agreement with our result above. ∎


⚠ 8.2.3. It is not hard to remember the convolution integral formula by reasoning
by analogy from
$$P(T = t) = \sum_x P(Y = t - x)P(X = x)$$
to
$$f_T(t) = \int_{-\infty}^{\infty} f_Y(t - x)f_X(x)\,dx.$$
But care is still needed. For example, Exercise 23 shows that an analogous-looking
formula for the PDF of the product of two independent continuous r.v.s is wrong:
a Jacobian is needed (for convolutions, the absolute Jacobian determinant is 1 so it
isn’t noticeable in the convolution integral formula).
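As a numerical sketch of the convolution integral (using an illustrative pair of distributions, X, Y i.i.d. $N(0, 1)$, for which the answer $T \sim N(0, 2)$ is already known from MGFs), one can evaluate the integral by quadrature and compare.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Sketch: evaluate f_T(t) = integral of f_Y(t - x) f_X(x) dx numerically
# for X, Y i.i.d. N(0, 1), and compare with the N(0, 2) PDF.
def f_T(t):
    integrand = lambda x: norm.pdf(t - x) * norm.pdf(x)
    return integrate.quad(integrand, -np.inf, np.inf)[0]

for t in [-2.0, 0.0, 1.0, 3.0]:
    print(t, round(f_T(t), 6), round(norm.pdf(t, scale=np.sqrt(2)), 6))   # last two agree
```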
Since convolution sums are just the law of total probability, we have already used
them in previous chapters without mentioning the word convolution; see, for ex-
ample, the first and most tedious proof of Theorem 3.8.9 (sum of independent
Binomials), as well as the proof of Theorem 4.8.1 (sum of independent Poissons).
In the following examples, we find the distribution of a sum of Exponentials and a
sum of Uniforms using a convolution integral.
Example 8.2.4 (Exponential convolution). Let $X, Y \overset{\text{i.i.d.}}{\sim} \text{Expo}(\lambda)$. Find the distribution of T = X + Y.

Solution:
For t > 0, the convolution formula gives
$$f_T(t) = \int_{-\infty}^{\infty} f_Y(t - x)f_X(x)\,dx = \int_0^t \lambda e^{-\lambda(t - x)} \lambda e^{-\lambda x}\,dx,$$
where we restricted the integral to be from 0 to t since we need $t - x > 0$ and $x > 0$
for the PDFs inside the integral to be nonzero. Simplifying, we have
$$f_T(t) = \lambda^2 e^{-\lambda t} \int_0^t dx = \lambda^2 t e^{-\lambda t}, \quad \text{for } t > 0.$$
This is known as the $\text{Gamma}(2, \lambda)$ distribution. We will introduce the Gamma
distribution in detail in Section 8.4. □
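An optional simulation check of this example (with $\lambda = 2$ as an illustrative choice, NumPy/SciPy assumed): an empirical density estimate of X + Y agrees with $\lambda^2 t e^{-\lambda t}$, which is also the Gamma(2, $\lambda$) PDF.

```python
import numpy as np
from scipy.stats import gamma

# Sketch: simulate T = X + Y for X, Y i.i.d. Expo(lam) and compare with
# the derived PDF lam^2 * t * exp(-lam * t) and with Gamma(2, lam).
lam = 2.0
rng = np.random.default_rng(7)
n = 1_000_000
t_samples = rng.exponential(1 / lam, size=n) + rng.exponential(1 / lam, size=n)

ts = np.array([0.25, 0.5, 1.0, 2.0])
derived = lam ** 2 * ts * np.exp(-lam * ts)
empirical = np.array([np.mean(np.abs(t_samples - t) < 0.01) / 0.02 for t in ts])

print(derived)                            # e.g. f_T(0.5) = 4 * 0.5 * e^{-1} ~ 0.736
print(empirical)                          # close to the derived values
print(gamma.pdf(ts, a=2, scale=1 / lam))  # same as the derived values
```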
Example 8.2.5 (Uniform convolution). Let $X, Y \overset{\text{i.i.d.}}{\sim} \text{Unif}(0, 1)$. Find the distribution of T = X + Y.

Solution:
The PDF of X (and of Y) is
$$g(x) = \begin{cases} 1, & x \in (0, 1), \\ 0, & \text{otherwise.} \end{cases}$$
The convolution formula gives
$$f_T(t) = \int_{-\infty}^{\infty} f_Y(t - x)f_X(x)\,dx = \int_{-\infty}^{\infty} g(t - x)g(x)\,dx.$$
The integrand is 1 if and only if $0 < t - x < 1$ and $0 < x < 1$; this is a parallelogram-shaped constraint. Equivalently, the constraint is $\max(0, t - 1) < x < \min(t, 1)$.
FIGURE 8.2
Region in the (t, x)-plane where $g(t - x)g(x)$ is 1. (The region is a parallelogram, with t running from 0 to 2 on the horizontal axis and x from 0 to 1 on the vertical axis.)

From Figure 8.2, we see that for $0 < t \le 1$, x is constrained to be in (0, t), and for
$1 < t < 2$, x is constrained to be in $(t - 1, 1)$. Therefore, the PDF of T is a piecewise
linear function:
$$f_T(t) = \begin{cases} \displaystyle\int_0^t dx = t & \text{for } 0 < t \le 1, \\[2ex] \displaystyle\int_{t-1}^1 dx = 2 - t & \text{for } 1 < t < 2. \end{cases}$$

Figure 8.3 plots the PDF of T . It is shaped like a triangle with vertices at 0, 1, and
2, so it is called the Triangle(0, 1, 2) distribution.
Heuristically, it makes sense that T is more likely to take on values near the mid-
dle than near the extremes: a value near 1 can be obtained if both X and Y are
moderate, if X is large but Y is small, or if Y is large but X is small. In contrast, a
value near 2 is only possible if both X and Y are large. Thinking back to Example
3.2.5, the PMF of the sum of two die rolls was also shaped like a triangle. A single
die roll has a Discrete Uniform distribution on the integers 1 through 6, so in that
problem we were looking at a convolution of two Discrete Uniforms. It makes sense
that the PDF we obtained here is similar in shape. □
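A short simulation sketch of this example (illustrative, NumPy assumed) confirms the triangular shape.

```python
import numpy as np

# Sketch: simulate T = X + Y for X, Y i.i.d. Unif(0, 1) and compare an
# empirical density estimate with the triangle PDF (t for t <= 1, else 2 - t).
rng = np.random.default_rng(3)
n = 1_000_000
t_samples = rng.uniform(size=n) + rng.uniform(size=n)

for t in [0.25, 0.75, 1.0, 1.5, 1.9]:
    derived = t if t <= 1 else 2 - t
    empirical = np.mean(np.abs(t_samples - t) < 0.005) / 0.01
    print(t, derived, round(empirical, 3))   # derived and empirical are close
```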

FIGURE 8.3
PDF of T = X + Y, where X and Y are i.i.d. Unif(0, 1).

8.3 Beta

In this section and the next, we will introduce two continuous distributions, the
Beta and Gamma, which are related to several named distributions we have already

studied and are also related to each other via a shared story. This is an interlude
from the subject of transformations, but we’ll eventually need to use a change of
variables to tie the Beta and Gamma distributions together.
The Beta distribution is a continuous distribution on the interval (0, 1). It is a
generalization of the Unif(0, 1) distribution, allowing the PDF to be non-constant
on (0, 1).
Definition 8.3.1 (Beta distribution). An r.v. X is said to have the Beta distribution
with parameters a and b, where a > 0 and b > 0, if its PDF is
$$f(x) = \frac{1}{\beta(a, b)} x^{a-1} (1 - x)^{b-1}, \quad 0 < x < 1,$$
where the constant $\beta(a, b)$ is chosen to make the PDF integrate to 1. We write this
as $X \sim \text{Beta}(a, b)$.
Taking a = b = 1, the Beta(1, 1) PDF is constant on (0, 1), so the Beta(1, 1) and
Unif(0, 1) distributions are the same. By varying the values of a and b, we get PDFs
with a variety of shapes; Figure 8.4 shows four examples. Here are a couple of general
patterns:
• If a < 1 and b < 1, the PDF is U-shaped and opens upward. If a > 1 and b > 1,
the PDF opens down.
• If a = b, the PDF is symmetric about 1/2. If a > b, the PDF favors values larger
than 1/2; if a < b, the PDF favors values smaller than 1/2.
By definition, the constant $\beta(a, b)$ satisfies
$$\beta(a, b) = \int_0^1 x^{a-1} (1 - x)^{b-1}\,dx.$$
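As a small numerical sketch (with a = 2, b = 5 as an illustrative choice, SciPy assumed), the normalizing constant can be computed by quadrature and compared with scipy.special.beta; and $\beta(1, 1) = 1$ reflects the fact that Beta(1, 1) is Unif(0, 1).

```python
import numpy as np
from scipy import integrate, special

# Sketch: compute beta(a, b) = integral of x^(a-1) * (1-x)^(b-1) dx numerically.
a, b = 2, 5
val, _ = integrate.quad(lambda x: x ** (a - 1) * (1 - x) ** (b - 1), 0, 1)
print(val, special.beta(a, b))   # both ~ 1/30

print(special.beta(1, 1))        # 1.0, so Beta(1, 1) has constant PDF 1 on (0, 1)
```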
