
Subject CS1
CMP Upgrade 2019/20

CMP Upgrade

This CMP Upgrade lists the changes to the Syllabus objectives, Core Reading and the ActEd
material since last year that might realistically affect your chance of success in the exam. It is
produced so that you can manually amend your 2019 CMP to make it suitable for study for the
2020 exams. It includes replacement pages and additional pages where appropriate.
Alternatively, you can buy a full set of up-to-date Course Notes / CMP at a significantly reduced
price if you have previously bought the full-price Course Notes / CMP in this subject. Please
see our 2020 Student Brochure for more details.

This CMP Upgrade contains:

•  all significant changes to the Syllabus objectives and Core Reading
•  additional changes to the 2019 ActEd Course Notes and Series X Assignments that will make them suitable for study for the 2020 exams.

In order to make the process of updating your course as painless as possible, we have
wherever possible provided replacement pages with this upgrade note.

However, all the chapter numbers have changed this year. This means that the process of
replacing pages is slightly more complicated. We have tried to spell out in detail exactly which
old pages need to be removed, and what they should be replaced with. Please take care.

In addition, because the courses are printed double sided, we have provided replacements for
both sides of any page on which there is an update that cannot quickly be written in by hand.


0 Changes to the Syllabus objectives

A new section of material has been added to the CS1 Syllabus. This is Section 2.1 – Data
analysis. Syllabus items 2.1.1 to 2.1.4 are new (they have been transferred from Subject CM1).
This has knock-on effects for some other syllabus numbering: the old Section 2.1 has become Section 2.2, and the old Section 2.2 has become Section 2.3.

Replace old pages 19-24 from the 2019 Study Guide with new pages 20-25 from the 2020
Study Guide.

There are no other changes to the syllabus.


1 Changes to the Core Reading

There is an extra section of Core Reading to accompany Syllabus item 2.1. This is incorporated
into the new Chapter 1 of the course notes (see below).

Chapter numbers and page numbers below refer to the ActEd notes, not to the original Core
Reading document.

Chapter 7, Page 39. Two typos have been corrected in the R-box. In the second line, “stores”
should be “store”. In the last line in the box, there should be an extra closing bracket right at
the end of the line, ie there should be four closing brackets in all.

Chapter 10, Page 25. The Core Reading in the second R box has been deleted.

Chapter 11, Page 16. The last equation on the page should say $SS_{RES}/\sigma^2 \sim \chi^2_{n-2}$.

Chapter 12, Page 18. In the box at the foot of the page, the Core Reading should say “In R, to
use a gamma distribution…”.

Chapter 12, Page 42. A new paragraph has been added just before the second R-box. Replace
old Chapter 12 pages 41 and 42 with new Chapter 13 pages 41 and 42.

Chapter 12, Page 44. The definition of the AIC has been amended. It now reads:

$AIC = -2 \log L_M + 2 \times \text{number of parameters}$

where $\log L_M$ is the log-likelihood of the model under consideration.
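In R, this quantity can be obtained directly from a fitted model object using the built-in AIC() function. A minimal sketch, assuming model is a fitted glm object:

AIC(model)   # -2*log-likelihood + 2*(number of parameters)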

Chapter 13, Pages 17-18. The Core Reading has been altered. Replace old Chapter 13 pages
17-18 with new Chapter 14 pages 17-18.

Chapter 14, Page 19. A new paragraph of Core Reading has been added. Replace old Chapter
14 pages 19-22 with new Chapter 15 pages 19-23.

Chapter 15, Page 31. In the 4-line equation, there are two $\bar{x}_{ij}$ expressions. These should not have bars on them – they should just say $x_{ij}$.


2 Major Changes to the ActEd Course Notes

There is an additional chapter of the course notes (which incorporates the new Core Reading)
relating to new Syllabus item 2.1. This is Chapter 1 of the 2020 course notes. This means that
the chapter numbers of all the other chapters have changed – Chapter 1 in the 2019 course
has become Chapter 2 in the 2020 course, and so on.

A copy of the whole of new Chapter 1 is attached to this upgrade note. Insert this before old
Chapter 1.

In addition, the two chapters on regression have been combined into a single (long) chapter,
Chapter 12. This covers both the simple linear regression model and the multiple regression
model. So there is no longer a Chapter 11b.

Although Chapters 11 and 11b have been combined, the material remains very much the
same. So we have not attached any of this material.

As a result, cross references to other chapters throughout the course notes have been
updated.

We do not believe that students should need a new copy of the notes because of these
changes. However, students wanting to buy a new set of notes can obtain them from ActEd at
a significantly reduced price. See our current brochure for further details.

3 Minor Changes to the Course Notes – typos etc

A number of corrections have been made to the course notes this year. These are listed here.
The chapter and page numbers refer to the OLD (2019) version of the course notes.

Since the courses are printed double-sided, we provide double sided replacement pages. So if
for example there is a replacement for page 5, it will contain page 6 on the back, even if there
are no changes to page 6.

Chapter 1

Page 6. An additional sentence has been added. Replace old Chapter 1 pages 5-6 with new
Chapter 2 pages 5-6.

Page 10. An additional sentence has been added. Replace old Chapter 1 pages 9-10 with new
Chapter 2 pages 9-10.

Page 14. An additional sentence has been added. Replace old Chapter 1 pages 13-14 with
new Chapter 2 pages 13-14.

Chapter 3

Page 3. A phrase of CR has been deleted. Replace old Chapter 3 pages 3-4 with new Chapter 4
pages 3-4.


Page 11. A new section of material has been added just before the start of old Section 1.4.
Please add Chapter 4, new pages 10a and 10b between old Chapter 3, pages 10 and 11. Delete
the solution at the top of old page 11. Do not remove any pages.

Page 45. An additional question has been added to the Practice Questions. Replace old
Chapter 3 pages 45 to 57 with new Chapter 4 pages 47-60.

Chapter 4

Page 5. A new question and solution have been added. Replace old Chapter 4 pages 5-6 with
new Chapter 5 pages 5-6.

Chapter 5

Pages 15-16. Several extra paragraphs of explanation have been added here. Replace old
Chapter 5 pages 15-16 with new Chapter 6 pages 15-16.

Chapter 6

Page 18. A new paragraph has been added. Replace old Chapter 6 pages 17-18 with new
Chapter 7 pages 16-17.

Page 20. A typo has been corrected. “just over 5%” should read “just less than 5%”.

Chapter 7

Page 19. Typo corrected. The penultimate equation should say: $\hat{\lambda} = \frac{107}{35} = 3.057$.

Page 37. An extra sentence has been added. Replace old Chapter 7 pages 37-38 with new
Chapter 8 pages 37-38.

Page 58. An extra sentence has been added. Replace old Chapter 7 pages 57-58 with new
Chapter 8 pages 57-58.

Chapter 9

Page 10. Typo corrected in the last line. “Determine the probability …”.

Page 35. Typo corrected in the eighth line. “This is a one-sided test and our statistic is less
than 2.132 …”.

Page 51. Typo corrected. “ …the characteristic is dependent on the mother’s age.” The same
correction is made on page 52.

Page 54. An additional question has been added at the end of the chapter. Add new
Chapter 10 page 54 after old Chapter 9 page 54. Do not remove any pages.

Chapter 10

Page 31. Typo corrected. “ we could also use a scree test.”


Chapter 11

Page 14. There are some missing symbols in the last two paragraphs. These paragraphs
should read as follows:

Note: With the full model in place the $Y_i$’s have normal distributions and it is possible to derive maximum likelihood estimators of the parameters $\alpha$, $\beta$ and $\sigma^2$ (since maximum likelihood estimation requires us to know the distribution whereas least squares estimation does not).

It is possible to show that the maximum likelihood estimators of $\alpha$ and $\beta$ are the same as the least squares estimators, but the MLE of $\sigma^2$ has a different denominator from the least squares estimator.

Chapter 11b

Page 7. Typo corrected. “Therefore it is a similar measure …”.

Chapter 12

Page 46. Two additional lines of text have been added. Replace old Chapter 12 pages 45-48
with new Chapter 13 pages 45-48.


4 Changes to the X Assignments

Question X1.3 has been replaced with a new question. Updated pages attached for both
question and solution. There are no other changes (apart from updating the references to the
chapters).


5 Other tuition services

In addition to the CMP you might find the following services helpful with your study.

5.1 Study material

We also offer the following study material in Subject CS1:


•  Online Classroom
•  Flashcards
•  Revision Notes
•  ASET (ActEd Solutions with Exam Technique) for the 2014-2017 papers and separately for the 2019 papers
•  Paper B online resources (PBOR)
•  Mock Exam
•  Additional Mock Pack.

For further details on ActEd’s study materials, please refer to the 2020 Student Brochure,
which is available from the ActEd website at www.ActEd.co.uk.

5.2 Tutorials

We offer the following tutorials in Subject CS1:


•  a set of Regular Tutorials (lasting four full days) – either face to face or online
•  a Block Tutorial (lasting four full days) – either face to face or online
•  a Paper B preparation day (one day).

For further details on ActEd’s tutorials, please refer to our latest Tuition Bulletin, which is
available from the ActEd website at www.ActEd.co.uk.


5.3 Marking

You can have your attempts at any of our assignments or mock exams marked by ActEd. When
marking your scripts, we aim to provide specific advice to improve your chances of success in
the exam and to return your scripts as quickly as possible.

For further details on ActEd’s marking services, please refer to the 2020 Student Brochure,
which is available from the ActEd website at www.ActEd.co.uk.

5.4 Feedback on the study material

ActEd is always pleased to get feedback from students about any aspect of our study
programmes. Please let us know if you have any specific comments (eg about certain sections
of the notes or particular questions) or general suggestions about how we can improve the
study material. We will incorporate as many of your suggestions as we can when we update
the course material each year.

If you have any comments on this course please send them by email to [email protected].



All study material produced by ActEd is copyright and is sold
for the exclusive use of the purchaser. The copyright is
owned by Institute and Faculty Education Limited, a
subsidiary of the Institute and Faculty of Actuaries.
Unless prior authority is granted by ActEd, you may not hire
out, lend, give out, sell, store or transmit electronically or
photocopy any part of the study material.
You must take care of your study material to ensure that it
is not used or copied by anybody else.
Legal action will be taken if these terms are infringed. In
addition, we may seek to take disciplinary action through
the profession or through your employer.
These conditions remain in force after you have finished
using the course.

CS1: Study Guide

2.2 Subject CS1 – Syllabus and Core Reading

Syllabus
The Syllabus for Subject CS1 is given here. To the right of each objective are the chapter numbers
in which the objective is covered in the ActEd course.

Aim

The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
statistical techniques that are of particular relevance to actuarial work.

Competences

On successful completion of this subject, a student will be able to:


1. describe the essential features of statistical distributions
2. summarise data using appropriate statistical analysis, descriptive statistics and graphical
presentation
3. describe and apply the principles of statistical inference
4. describe, apply and interpret the results of the linear regression model and generalised
linear models
5. explain the fundamental concepts of Bayesian statistics and use them to compute
Bayesian estimators.

Syllabus topics

1. Random variables and distributions (20%)
2. Data analysis (15%)
3. Statistical inference (20%)
4. Regression theory and applications (30%)
5. Bayesian statistics (15%)

The weightings are indicative of the approximate balance of the assessment of this subject
between the main syllabus topics, averaged over a number of examination sessions.

The weightings also have a correspondence with the amount of learning material underlying each
syllabus topic. However, this will also reflect aspects such as:
•  the relative complexity of each topic, and hence the amount of explanation and support required for it
•  the need to provide thorough foundation understanding on which to build the other objectives
•  the extent of prior knowledge which is expected
•  the degree to which each topic area is more knowledge or application based.


Assumed knowledge

This subject assumes that a student will be competent in the following elements of foundational
mathematics and basic statistics:

1 Summarise the main features of a data set (exploratory data analysis)

1.1 Summarise a set of data using a table or frequency distribution, and display it
graphically using a line plot, a box plot, a bar chart, histogram, stem and leaf plot,
or other appropriate elementary device.

1.2 Describe the level/location of a set of data using the mean, median, mode, as
appropriate.

1.3 Describe the spread/variability of a set of data using the standard deviation,
range, interquartile range, as appropriate.

1.4 Explain what is meant by symmetry and skewness for the distribution of a set of
data.

2 Probability

2.1 Set functions and sample spaces for an experiment and an event.

2.2 Probability as a set function on a collection of events and its basic properties.

2.3 Calculate probabilities of events in simple situations.

2.4 Derive and use the addition rule for the probability of the union of two events.

2.5 Define and calculate the conditional probability of one event given the occurrence
of another event.

2.6 Derive and use Bayes’ Theorem for events.

2.7 Define independence for two events, and calculate probabilities in situations
involving independence.

3 Random variables

3.1 Explain what is meant by a discrete random variable, define the distribution
function and the probability function of such a variable, and use these functions
to calculate probabilities.

3.2 Explain what is meant by a continuous random variable, define the distribution
function and the probability density function of such a variable, and use these
functions to calculate probabilities.

3.3 Define the expected value of a function of a random variable, the mean, the
variance, the standard deviation, the coefficient of skewness and the moments of
a random variable, and calculate such quantities.


3.4 Evaluate probabilities associated with distributions (by calculation or by referring to tables as appropriate).

3.5 Derive the distribution of a function of a random variable from the distribution of
the random variable.

Detailed syllabus objectives

1 Random variables and distributions (20%)


1.1 Define basic univariate distributions and use them to calculate probabilities, quantiles and
moments. (Chapter 2)
1.1.1 Define and explain the key characteristics of the discrete distributions: geometric,
binomial, negative binomial, hypergeometric, Poisson and uniform on a finite set.

1.1.2 Define and explain the key characteristics of the continuous distributions: normal,
lognormal, exponential, gamma, chi-square, t, F, beta and uniform on an interval.

1.1.3 Evaluate probabilities and quantiles associated with distributions (by calculation
or using statistical software as appropriate).

1.1.4 Define and explain the key characteristics of the Poisson process and explain the
connection between the Poisson process and the Poisson distribution.

1.1.5 Generate basic discrete and continuous random variables using the inverse
transform method.

1.1.6 Generate discrete and continuous random variables using statistical software.

1.2 Independence, joint and conditional distributions, linear combinations of random variables (Chapter 4)
1.2.1 Explain what is meant by jointly distributed random variables, marginal
distributions and conditional distributions.

1.2.2 Define the probability function/density function of a marginal distribution and of a conditional distribution.

1.2.3 Specify the conditions under which random variables are independent.

1.2.4 Define the expected value of a function of two jointly distributed random
variables, the covariance and correlation coefficient between two variables, and
calculate such quantities.

1.2.5 Define the probability function/density function of the sum of two independent
random variables as the convolution of two functions.

1.2.6 Derive the mean and variance of linear combinations of random variables.

1.2.7 Use generating functions to establish the distribution of linear combinations of independent random variables.


1.3 Expectations, conditional expectations (Chapter 5)


1.3.1 Define the conditional expectation of one random variable given the value of
another random variable, and calculate such a quantity.

1.3.2 Show how the mean and variance of a random variable can be obtained from
expected values of conditional expected values, and apply this.

1.4 Generating functions (Chapter 3)


1.4.1 Define and determine the moment generating function of random variables.

1.4.2 Define and determine the cumulant generating function of random variables.

1.4.3 Use generating functions to determine the moments and cumulants of random
variables, by expansion as a series or by differentiation, as appropriate.

1.4.4 Identify the applications for which a moment generating function, a cumulant
generating function and cumulants are used, and the reasons why they are used.

1.5 Central Limit Theorem – statement and application (Chapter 6)


1.5.1 State the Central Limit Theorem for a sequence of independent, identically
distributed random variables.

1.5.2 Generate simulated samples from a given distribution and compare the sampling
distribution with the Normal.

2 Data analysis (15%)


2.1 Data analysis (Chapter 1)
2.1.1 Describe the possible aims of data analysis (eg descriptive, inferential, and
predictive).

2.1.2 Describe the stages of conducting a data analysis to solve real-world problems in a
scientific manner and describe tools suitable for each stage.

2.1.3 Describe sources of data and explain the characteristics of different data sources,
including extremely large data sets.

2.1.4 Explain the meaning and value of reproducible research and describe the
elements required to ensure a data analysis is reproducible.

2.2 Exploratory data analysis (Chapter 11)


2.2.1 Describe the purpose of exploratory data analysis.

2.2.2 Use appropriate tools to calculate suitable summary statistics and undertake
exploratory data visualizations.

2.2.3 Define and calculate Pearson’s, Spearman’s and Kendall’s measures of correlation
for bivariate data, explain their interpretation and perform statistical inference as
appropriate.


2.2.4 Use Principal Components Analysis to reduce the dimensionality of a complex data set.

2.3 Random sampling and sampling distributions (Chapter 7)


2.3.1 Explain what is meant by a sample, a population and statistical inference.

2.3.2 Define a random sample from a distribution of a random variable.

2.3.3 Explain what is meant by a statistic and its sampling distribution.

2.3.4 Determine the mean and variance of a sample mean and the mean of a sample
variance in terms of the population mean, variance and sample size.

2.3.5 State and use the basic sampling distributions for the sample mean and the
sample variance for random samples from a normal distribution.

2.3.6 State and use the distribution of the t-statistic for random samples from a
normal distribution.

2.3.7 State and use the F distribution for the ratio of two sample variances from
independent samples taken from normal distributions.

3 Statistical inference (20%)


3.1 Estimation and estimators (Chapter 8)
3.1.1 Describe and apply the method of moments for constructing estimators of
population parameters.

3.1.2 Describe and apply the method of maximum likelihood for constructing
estimators of population parameters.

3.1.3 Define the terms: efficiency, bias, consistency and mean squared error.

3.1.4 Define and apply the property of unbiasedness of an estimator.

3.1.5 Define the mean square error of an estimator, and use it to compare estimators.

3.1.6 Describe and apply the asymptotic distribution of maximum likelihood estimators.

3.1.7 Use the bootstrap method to estimate properties of an estimator.


3.2 Confidence intervals (Chapter 9)


3.2.1 Define in general terms a confidence interval for an unknown parameter of a
distribution based on a random sample.

3.2.2 Derive a confidence interval for an unknown parameter using a given sampling
distribution.

3.2.3 Calculate confidence intervals for the mean and the variance of a normal
distribution.

3.2.4 Calculate confidence intervals for a binomial probability and a Poisson mean,
including the use of the normal approximation in both cases.

3.2.5 Calculate confidence intervals for two-sample situations involving the normal
distribution, and the binomial and Poisson distributions using the normal
approximation.

3.2.6 Calculate confidence intervals for a difference between two means from paired
data.

3.2.7 Use the bootstrap method to obtain confidence intervals.

3.3 Hypothesis testing and goodness of fit (Chapter 10)


3.3.1 Explain what is meant by the terms null and alternative hypotheses, simple and
composite hypotheses, type I and type II errors, test statistic, likelihood ratio,
critical region, level of significance, probability-value and power of a test.

3.3.2 Apply basic tests for the one-sample and two-sample situations involving the
normal, binomial and Poisson distributions, and apply basic tests for paired data.

3.3.3 Apply the permutation approach to non-parametric hypothesis tests.

3.3.4 Use a chi-square test to test the hypothesis that a random sample is from a
particular distribution, including cases where parameters are unknown.

3.3.5 Explain what is meant by a contingency (or two-way) table, and use a chi-square
test to test the independence of two classification criteria.

CS1-13: Generalised linear models

Key Information
The scaled deviance for a particular model M is defined as:

$SD_M = 2(\ell_S - \ell_M)$

where $\ell_S$ and $\ell_M$ denote the log-likelihoods of the saturated model and of model M respectively.

The deviance for the current model, $D_M$, is defined such that:

$\text{scaled deviance} = \frac{D_M}{\phi}$

Remember that $\phi$ is a scale parameter, so it seems sensible that it should be used to connect the deviance with the scaled deviance. For a Poisson or exponential distribution, $\phi = 1$, so the scaled deviance and the deviance are identical.

The smaller the deviance, the better the model from the point of view of model fit.

However, there will be a trade-off here. A model with many parameters will fit the data well.
However a model with too many parameters will be difficult and complex to build, and will not
necessarily lead to better prediction in the future. It is possible for models to be ‘over-parameterised’, ie factors are included that lead to a slightly, but not significantly, better fit.
When choosing linear models, we will usually need to strike a balance between a model with too
few parameters (which will not take account of factors that have a substantial impact on the data,
and will therefore not be sensitive enough) and one with too many parameters (which will be too
sensitive to factors that really do not have much effect on the results). We use the principle of
parsimony here – that is we choose the simplest model that does the job.

This can be illustrated by considering the case when the data are normally distributed.

In this case, the log-likelihood for a sample of size n is:

$\ell(y; \mu, \sigma) = \sum_{i=1}^{n} \log f_Y(y_i; \mu_i, \sigma) = -\frac{n}{2}\log 2\pi\sigma^2 - \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{2\sigma^2}$

The likelihood function for a random sample of size n is $f(y_1)f(y_2) \ldots f(y_n)$. Note that when we take logs, we add the logs of the individual PDF terms to get the joint log-likelihood. Recall that for the normal distribution the natural parameter is just the mean, $\theta_i = \mu_i$.

For the saturated model, the parameter $\mu_i$ is estimated by $y_i$, and so the second term disappears. Thus, the scaled deviance (twice the difference between the values of the log-likelihood under the current and saturated models) is:

$\sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\sigma^2}$

where $\hat{\mu}_i$ is the fitted value for the current model.


The deviance (remembering that the scale parameter $\phi = \sigma^2$) is the well-known residual sum of squares:

$\sum_{i=1}^{n} (y_i - \hat{\mu}_i)^2$

This is why the deviance is defined with a factor of two in it, so that for the normal model the deviance is equal to the residual sum of squares that we met in linear regression.

The residual deviance (ie the deviance after all the covariates have been included) is displayed as part of the results from summary(model), as the sketch below illustrates.
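A minimal sketch (the data frame claims and the covariate age are hypothetical):

model <- glm(y ~ age, family = poisson, data = claims)
summary(model)

The output of summary(model) includes, near the end, lines reporting the null deviance and the residual deviance, each with its degrees of freedom.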

In R we can obtain a breakdown of how the deviance is reduced by each covariate added
sequentially by using anova(model). However, unlike for linear regression, this command
does not automatically carry out a test.

And recall that the smaller the residual (left over) deviance the better the fit of the model.

5.5 Using scaled deviance and Akaike’s Information Criterion to choose


between models
Adding more covariates will always improve the fit and thus decrease the deviance,
however we need to determine whether adding a particular covariate leads to a significant
decrease in the deviance.

For normally distributed data, the scaled deviance has a $\chi^2$ distribution. Since the scale parameter for the normal, $\phi = \sigma^2$, must be estimated, we would compare models by taking ratios of sums of squares and using F-tests (as in the analysis of variance for linear regression models).

We covered this in Section 4.3 of the previous chapter.

Thus, if we want to decide if Model 2 (which has $p + q$ parameters and scaled deviance $S_2$) is a significant improvement over Model 1 (which has $p$ parameters and scaled deviance $S_1$), we see if:

$\frac{(S_1 - S_2)/q}{S_2/(n - (p + q))}$

is greater than the 5% value for the $F_{q,\,n-p-q}$ distribution.

The code for comparing two normally distributed models, model1 and model2, in R is:

anova(model1, model2, test = "F")

In the case of data that are not normally distributed, the scale parameter may be known (for example, for the Poisson distribution $\phi = 1$), and the deviance is only asymptotically a $\chi^2$ distribution. For these reasons, the common procedure is to compare two models by looking at the difference in the scaled deviance and comparing with a $\chi^2$ distribution.

Since the distributions are only asymptotically normal, the F test will not be very accurate. Hence, by simply comparing two approximate $\chi^2$ distributions we will get a better result.
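In R, assuming model1 and model2 are nested fitted glm objects, this comparison can be carried out with:

anova(model1, model2, test = "Chisq")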

CS1-14: Bayesian statistics

Solution 

The proportionality argument will be used and any constants simply omitted as appropriate.

Prior:

$f(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$

omitting the constant $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$.

Likelihood:

$f(x|\theta) \propto \theta^x(1-\theta)^{n-x}$

omitting the constant $\binom{n}{x}$.

Combining the prior PDF with the likelihood function gives the posterior PDF:

$f(\theta|x) \propto \theta^x(1-\theta)^{n-x} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1}$

Now it can be seen that, apart from the appropriate constant of proportionality, this is the density of a beta random variable. Therefore the immediate conclusion is that the posterior distribution of $\theta$ given $X = x$ is beta with parameters $x + \alpha$ and $n - x + \beta$.

It can also be seen that the posterior density and the prior density belong to the same family of distributions. Thus the conjugate prior for the binomial distribution is the beta distribution.

The Bayesian estimate under quadratic loss is the mean of this distribution, that is:

$\frac{x + \alpha}{(x + \alpha) + (n - x + \beta)} = \frac{x + \alpha}{n + \alpha + \beta}$

We can use R to simulate this Bayesian estimate.  

The R code to obtain the Monte Carlo Bayesian estimate of the above is:

# assumes M (the number of simulations), n, alpha and beta have already been defined
pm <- rep(0,M)

for (i in 1:M)

{theta <- rbeta(1,alpha,beta)    # draw a value of theta from the beta prior

x <- rbinom(1,n,theta)           # simulate a binomial observation given theta

pm[i] <- (x+alpha)/(n+alpha+beta)}   # Bayesian estimate for this simulation


The average of these Bayesian estimates under quadratic loss is given by:

mean(pm)
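As a usage sketch, with illustrative parameter values (these values are assumptions, chosen purely for demonstration):

M <- 10000; n <- 20; alpha <- 2; beta <- 4
pm <- rep(0, M)
for (i in 1:M)
{theta <- rbeta(1, alpha, beta)
x <- rbinom(1, n, theta)
pm[i] <- (x + alpha)/(n + alpha + beta)}
mean(pm)   # close to the prior mean alpha/(alpha+beta) = 1/3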

Question 

A random sample of size 10 from a Poisson distribution with mean $\lambda$ yields the following data values:

3, 4, 3, 1, 5, 5, 2, 3, 3, 2

The prior distribution of $\lambda$ is Gamma(5, 2).

Calculate the Bayesian estimate of $\lambda$ under squared error loss.

Solution 

Using the formula for the PDF of the gamma distribution given on page 12 of the Tables, we see that the prior PDF of $\lambda$ is:

$f_{prior}(\lambda) = \frac{2^5 \lambda^4 e^{-2\lambda}}{\Gamma(5)}, \quad \lambda > 0$

Alternatively, we could say:

$f_{prior}(\lambda) \propto \lambda^4 e^{-2\lambda}, \quad \lambda > 0$

The likelihood function obtained from the data is:

$L(\lambda) = P(X_1 = 3)\,P(X_2 = 4) \cdots P(X_{10} = 2)$

where $X_1, \ldots, X_{10}$ are independent Poisson($\lambda$) random variables. So:

$L(\lambda) = \frac{e^{-\lambda}\lambda^3}{3!} \times \frac{e^{-\lambda}\lambda^4}{4!} \times \cdots \times \frac{e^{-\lambda}\lambda^2}{2!} = C e^{-10\lambda}\lambda^{31}$

where $C$ is a constant. (31 is the sum of the observed data values.)

Combining the prior distribution and the sample data, we see that:

$f_{post}(\lambda) \propto \lambda^{35} e^{-12\lambda}, \quad \lambda > 0$

This is proportional to the PDF of a Gamma(36, 12) distribution, so the posterior distribution of $\lambda$ is Gamma(36, 12). The Bayesian estimate of $\lambda$ under squared error loss is the mean of this posterior distribution, $\frac{36}{12} = 3$.
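This result can be checked quickly by simulation in R (a sketch; the number of simulations is arbitrary):

set.seed(1)
mean(rgamma(100000, shape = 36, rate = 12))   # approximately 3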

CS1-15: Credibility theory

So the posterior distribution of $\theta$ given $\mathbf{x}$ is:

$N\!\left(\frac{n\bar{x}/\sigma_1^2 + \mu/\sigma_2^2}{n/\sigma_1^2 + 1/\sigma_2^2},\ \frac{1}{n/\sigma_1^2 + 1/\sigma_2^2}\right)$

where:

$\bar{x} = \sum_{i=1}^{n} x_i / n$

The Bayesian estimate of $\theta$ under quadratic loss is the mean of this posterior distribution:

$E(\theta|\mathbf{x}) = \frac{n\bar{x}/\sigma_1^2 + \mu/\sigma_2^2}{n/\sigma_1^2 + 1/\sigma_2^2} = \frac{n/\sigma_1^2}{n/\sigma_1^2 + 1/\sigma_2^2}\,\bar{x} + \frac{1/\sigma_2^2}{n/\sigma_1^2 + 1/\sigma_2^2}\,\mu$

or:

$E(\theta|\mathbf{x}) = Z\bar{x} + (1 - Z)\mu \qquad (14.3.4)$

where:

$Z = \frac{n}{n + (\sigma_1^2/\sigma_2^2)} \qquad (14.3.5)$

Equation (14.3.4) is a credibility estimate of $E(\theta|\mathbf{x})$ since it is a weighted average of two estimates: the first, $\bar{x}$, is a maximum likelihood estimate based solely on data from the risk itself, and the second, $\mu$, is the best available estimate if no data were available from the risk itself.

Notice that, as for the Poisson/gamma model, the estimate based solely on data from the risk itself is a linear function of the observed data values.

There are some further points to be made about the credibility factor, $Z$, given by (14.3.5):

•  It is always between zero and one.

•  It is an increasing function of $n$, the amount of data available.

•  It is an increasing function of $\sigma_2$, the standard deviation of the prior distribution.

These features are all exactly what would be expected for a credibility factor.


Notice also that, as $\sigma_1^2$ increases, the denominator increases, and so $Z$ decreases. $\sigma_1^2$ denotes the variance of the distribution of the sample values. If this is large, then the sample values are likely to be spread over a wide range, and they will therefore be less reliable for estimation.

The R code to obtain the Monte Carlo credibility premiums for the above based on M
simulations is:

# assumes M, n, mu, sigma1 and sigma2 have already been defined
Z <- n/(n+sigma1^2/sigma2^2)

cp <- rep(0,M)   # vector to hold the simulated credibility premiums

for (i in 1:M)

{theta <- rnorm(1,mu,sigma2)   # draw a risk parameter from the prior

x <- rnorm(n,theta,sigma1)     # simulate n years of claims from that risk

cp[i] <- Z*mean(x)+(1-Z)*mu}   # credibility premium for this simulation

The average of these credibility estimates is given by:

mean(cp)
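For illustration, with assumed values $n = 5$, $\sigma_1^2 = 400$ and $\sigma_2^2 = 100$, the credibility factor is $Z = 5/(5 + 400/100) = 5/9 \approx 0.556$, so the credibility premium places a weight of about 56% on the sample mean and about 44% on the prior mean $\mu$.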

3.5 Further remarks on the normal/normal model


In Section 3.4 the normal/normal model for the estimation of a pure premium was discussed
within the framework of Bayesian statistics. In this section the same model will be
considered, without making any different assumptions, but in a slightly different way.

The reason for doing this is that some of the observations will be helpful when empirical
Bayes credibility theory is considered in the next chapter.

In this section, as in Section 3.4, the problem is to estimate the expected aggregate claims
produced each year by a risk. Let:

$X_1, X_2, \ldots, X_n, X_{n+1}, \ldots$

be random variables representing the aggregate claims in successive years. The following
assumptions are made.

The distribution of each $X_j$ depends on the value of a fixed, but unknown, parameter, $\theta$.

Again, $\theta$ is a random variable whose value, once determined, does not change over time.

The conditional distribution of $X_j$ given $\theta$ is $N(\theta, \sigma_1^2)$.

Given $\theta$, the random variables $\{X_j\}$ are independent.

The prior distribution of $\theta$ is $N(\mu, \sigma_2^2)$.


The values of $X_1, X_2, \ldots, X_n$ have already been observed and the expected aggregate claims in the coming, ie $(n+1)$th, year need to be estimated.

It is important to realise that the assumptions and problem outlined above are exactly the same as the assumptions and problem outlined in Section 3.4. Slightly different notation has been used in this section; in Section 3.4, $X_1, \ldots, X_n$ were denoted $x_1, \ldots, x_n$ since their values were assumed to be known, and $X_{n+1}$ was denoted just $X$. The assumptions that the distribution of each $X_j$ depends on $\theta$, that the conditional distribution of $X_j$ given $\theta$ is $N(\theta, \sigma_1^2)$, and that the prior distribution of $\theta$ is $N(\mu, \sigma_2^2)$ were all made in Section 3.4. The only assumption not made explicitly in Section 3.4 is that, given $\theta$, the random variables $\{X_j\}$ are independent.
Having stressed that everything is the same as in Section 3.4, some consequences of the above assumptions will be considered. Some important consequences are:

Given $\theta$, the random variables $\{X_j\}$ are identically distributed, as well as independent.

This is an immediate consequence of the assumption that, given $\theta$, each $X_j$ has the same $N(\theta, \sigma_1^2)$ distribution.

The random variables $\{X_j\}$ are (unconditionally) identically distributed. The following formula can be written down for the unconditional distribution function of $X_j$:

$P(X_j \le y) = \int_{-\infty}^{\infty} \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left(-\frac{(\theta - \mu)^2}{2\sigma_2^2}\right) \Phi\left(\frac{y - \theta}{\sigma_1}\right) d\theta$

This comes from the formula for calculating a marginal probability:

$P(X_j \le y) = \int P(X_j \le y \,|\, \theta)\, f(\theta)\, d\theta$

Since $\theta$ is assumed to have a $N(\mu, \sigma_2^2)$ distribution:

$f(\theta) = \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\theta - \mu}{\sigma_2}\right)^2\right)$

Also, since $X_j|\theta$ is assumed to have a $N(\theta, \sigma_1^2)$ distribution:

$P(X_j \le y \,|\, \theta) = P\left(N(0,1) \le \frac{y - \theta}{\sigma_1}\right) = \Phi\left(\frac{y - \theta}{\sigma_1}\right)$

$\Phi(\cdot)$ is the standardised normal distribution function.

This expression is the same for each value of $j$ and hence the random variables $\{X_j\}$ are (unconditionally) identically distributed.


The random variables $\{X_j\}$ are not (unconditionally) independent. This can be demonstrated as follows.

Using Equation (14.1.1) and the fact that, given $\theta$, $X_1$ and $X_2$ are conditionally independent:

$E(X_1 X_2) = E[E(X_1 X_2 \,|\, \theta)]$

$= E[E(X_1|\theta)\,E(X_2|\theta)]$

$= E(\theta^2) \quad (\text{since } E(X_1|\theta) = E(X_2|\theta) = \theta)$

$= \mu^2 + \sigma_2^2$

The idea used in this argument will be used repeatedly in the next chapter.

Now, if $X_1$ and $X_2$ were unconditionally independent:

$E(X_1 X_2) = E(X_1)E(X_2)$

However, using (14.1.1) again:

$E(X_1) = E[E(X_1|\theta)] = E(\theta) = \mu$

Similarly, $E(X_2) = \mu$. Hence:

$E(X_1 X_2) = \mu^2 + \sigma_2^2 \ne E(X_1)E(X_2)$

This shows that $X_1$ and $X_2$ are not unconditionally independent. The relationship between $X_1$ and $X_2$ is that their means are chosen from a common distribution. If this mean, $\theta$, is known, then this relationship is broken and there exists conditional independence.

3.6 Discussion of the Bayesian approach to credibility


This approach has been very successful in the Poisson/gamma and normal/normal models.
It has made the notion of collateral data very precise (by interpreting it in terms of a prior
distribution) and has given formulae for the calculation of the credibility factor. What, then,
are the drawbacks of this approach?

The first difficulty is whether a Bayesian approach to the problem is acceptable, and, if so, what values to assign to the parameters of the prior distribution. For example, although the Poisson/gamma model provides a formula (Equation 14.3.3) for the calculation of the credibility factor, this formula involves the parameter $\beta$. How a value for $\beta$ might be chosen has not been discussed. The Bayesian approach to the choice of parameter values for a prior distribution is to argue that they summarise the subjective degree of belief about the possible values of the quantity to be estimated, for example, the mean claim number, $\lambda$, for the Poisson/gamma model.

The second difficulty is that even if the problem fits into a Bayesian framework, the
Bayesian approach may not work in the sense that it may not produce an estimate which
can readily be rearranged to be in the form of a credibility estimate. This point can be
illustrated by using a uniform prior with a Poisson distribution for the number of claims.


Suppose that $X_1, \ldots, X_n$ is a random sample from a Poisson distribution with mean $\lambda$. If we assume that the prior distribution of $\lambda$ is $U(0, \gamma)$, then the prior PDF is constant for $0 < \lambda < \gamma$, and the posterior PDF is proportional to the likelihood function, ie:

$f_{post}(\lambda) \propto \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} \propto e^{-n\lambda}\lambda^{\sum x_i}$

for $0 < \lambda < \gamma$. The posterior PDF is 0 for other values of $\lambda$ (since the prior PDF is 0 unless $0 < \lambda < \gamma$). The posterior distribution of $\lambda$ is not one of the standard distributions listed in the Tables (in fact it is a truncated gamma distribution), so we can’t look up the formula for its mean or easily deduce whether it can be expressed in the form of a credibility estimate.
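Although no credibility formula emerges, the posterior mean can still be evaluated numerically. A sketch in R, with illustrative values ($n = 10$, $\sum x_i = 31$ and an assumed upper bound of 5 for the uniform prior):

n <- 10; sumx <- 31; upper <- 5              # illustrative values only
post <- function(lam) exp(-n*lam) * lam^sumx # unnormalised posterior PDF
const <- integrate(post, 0, upper)$value     # normalising constant
integrate(function(lam) lam * post(lam), 0, upper)$value / const   # posterior mean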

CS1-01: Data analysis

Syllabus objectives 
2.1  Data analysis 

2.1.1 Describe the possible aims of data analysis (eg descriptive, inferential and 
predictive). 
2.1.2 Describe the stages of conducting a data analysis to solve real‐world 
problems in a scientific manner and describe tools suitable for each stage. 
2.1.3 Describe sources of data and explain the characteristics of different data 
sources, including extremely large data sets. 
2.1.4 Explain the meaning and value of reproducible research and describe the 
elements required to ensure a data analysis is reproducible. 


0 Introduction 
This chapter provides an introduction to the underlying principles of data analysis, in particular 
within an actuarial context.   

Data analysis is the process by which data is gathered in its raw state and analysed or
processed into information which can be used for specific purposes. This chapter will
describe some of the different forms of data analysis, the steps involved in the process and
consider some of the practical problems encountered in data analytics.

Although this chapter looks at the general principles involved in data analysis, it does not deal 
with the statistical techniques required to perform a data analysis.  These are covered elsewhere, 
in CS1 and CS2. 


1 Aims of a data analysis 
Three key forms of data analysis will be covered in this section:

•  descriptive;

•  inferential; and

•  predictive.

1.1 Descriptive analysis 
Data presented in its raw state can be difficult to manage and draw meaningful conclusions
from, particularly where there is a large volume of data to work with. A descriptive analysis
solves this problem by presenting the data in a simpler format, more easily understood and
interpreted by the user.

Simply put, this might involve summarising the data or presenting it in a format which
highlights any patterns or trends. A descriptive analysis is not intended to enable the user
to draw any specific conclusions. Rather, it describes the data actually presented.

For example, it is likely to be easier to understand the trend and variation in the sterling/euro 
exchange rate over the past year by looking at a graph of the daily exchange rate rather than a list 
of values.  The graph is likely to make the information easier to absorb.  

Two key measures, or parameters, used in a descriptive analysis are the measure of central
tendency and the dispersion. The most common measurements of central tendency are the
mean, the median and the mode. Typical measurements of the dispersion are the standard
deviation and ranges such as the interquartile range.

Measures of central tendency tell us about the ‘average’ value of a data set, whereas measures of 
dispersion tell us about the ‘spread’ of the values.  We will use many of these measures later in 
the course. 

It can also be important to describe other aspects of the shape of the (empirical) distribution
of the data, for example by calculating measures of skewness and kurtosis.

Empirical means ‘based on observation’.  So an empirical distribution relates to the distribution of 
the actual data points collected, rather than any assumed underlying theoretical distribution. 

Skewness is a measure of how symmetrical a data set is, and kurtosis is a measure of how likely 
extreme values are to appear (ie those in the tails of the distribution).  We shall touch on these 
later.   
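As a brief sketch of these descriptive measures in R (the data vector x is hypothetical):

x <- c(23, 25, 28, 31, 31, 35, 40, 44, 51, 67)   # hypothetical data set
mean(x); median(x)                 # measures of central tendency
sd(x); IQR(x); diff(range(x))      # measures of dispersion
sum((x - mean(x))^3) / (length(x) * sd(x)^3)     # a sample measure of skewness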

1.2 Inferential analysis 
Often it is not feasible or practical to collect data in respect of the whole population,
particularly when that population is very large. For example, when conducting an opinion
poll in a large country, it may not be cost effective to survey every citizen. A practical
solution to this problem might be to gather data in respect of a sample, which is used to
represent the wider population. The analysis of the data from this sample is called
inferential analysis.


The sample analysis involves estimating the parameters as described in Section 1.1 above and testing hypotheses. It is generally accepted that if the sample is large and taken at random (selected without prejudice), then it quite accurately represents the statistics of the population, such as the distribution, probability, mean and standard deviation. However, this is also contingent upon the user making reasonably correct hypotheses about the population in order to perform the inferential analysis.

Care may need to be taken to ensure that the sample selected is likely to be representative of the 
whole population.  For example, an opinion poll on a national issue conducted in urban locations 
on weekday afternoons between 2pm and 4pm may not accurately reflect the views of the whole 
population.  This is because those living in rural areas and those who regularly work during that 
period are unlikely to have been surveyed, and these people might tend to have a different 
viewpoint to those who have been surveyed.  

Sampling, inferential analysis and parameter estimation are covered in more detail later.
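As a minimal sketch of an inferential analysis in R (the sample values are hypothetical):

sample_data <- c(2.9, 3.4, 3.1, 2.7, 3.8, 3.3, 3.0, 3.6)
t.test(sample_data, mu = 3)   # estimates the population mean, gives a 95% confidence
                              # interval and tests the hypothesis that the mean is 3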

1.3 Predictive analysis 
Predictive analysis extends the principles behind inferential analysis in order for the user to
analyse past data and make predictions about future events.

It achieves this by using an existing set of data with known attributes (also known as
features), known as the training set in order to discover potentially predictive relationships.
Those relationships are tested using a different set of data, known as the test set, to assess
the strength of those relationships.

A typical example of a predictive analysis is regression analysis, which is covered in more detail later. The simplest form of this is linear regression, where the relationship between a scalar dependent variable and an explanatory or independent variable is assumed to be linear and the training set is used to determine the slope and intercept of the line. A practical example might be the relationship between a car’s braking distance against speed.

In this example, the car’s speed is the explanatory (or independent) variable and the braking 
distance is the dependent variable. 
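In fact, R’s built-in cars data set records the speed and stopping distance of 50 cars, so this example can be sketched directly (the prediction speed of 20 is chosen arbitrarily):

model <- lm(dist ~ speed, data = cars)             # braking distance regressed on speed
coef(model)                                        # fitted intercept and slope
predict(model, newdata = data.frame(speed = 20))   # predicted braking distance at speed 20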
 
Question 

Based on data gathered at a particular weather station on the monthly rainfall in mm ($r$) and the average number of hours of sunshine per day ($s$), a researcher has determined the following explanatory relationship:

$s = 9 - 0.1r$

Using this model: 

(i)  Estimate the average number of hours of sunshine per day, if the monthly rainfall is 50mm. 

(ii)  State the impact on the average number of hours of sunshine per day of each extra 
millimetre of rainfall in a month.  

   


Solution 

(i)  When $r = 50$:

     $s = 9 - 0.1 \times 50 = 4$

ie there are 4 hours of sunshine per day on average. 

(ii)  For each extra millimetre of rainfall in a month, the average number of hours of sunshine 
per day falls by 0.1 hours, or 6 minutes. 


2 The data analysis process 
While the process to analyse data does not follow a set pattern of steps, it is helpful to
consider the key stages which might be used by actuaries when collecting and analysing
data.

The key steps in a data analysis process can be described as follows:

1. Develop a well-defined set of objectives which need to be met by the results of the
data analysis.

The objective may be to summarise the claims from a sickness insurance product by age, 
gender and cause of claim, or to predict the outcome of the next national parliamentary 
election. 

2. Identify the data items required for the analysis.

3. Collection of the data from appropriate sources.

The relevant data may be available internally (eg from an insurance company’s 
administration department) or may need to be gathered from external sources (eg from a 
local council office or government statistical service). 

4. Processing and formatting data for analysis, eg inputting into a spreadsheet, database or other model.

5. Cleaning data, eg addressing unusual, missing or inconsistent values.

6. Exploratory data analysis, which may include:


(a) Descriptive analysis; producing summary statistics on central tendency and
spread of the data.
(b) Inferential analysis; estimating summary parameters of the wider population
of data, testing hypotheses.
(c) Predictive analysis; analysing data to make predictions about future events
or other data sets.

7. Modelling the data.

8. Communicating the results.

It will be important when communicating the results to make it clear what data was used, 
what analyses were performed, what assumptions were made, the conclusion of the 
analysis, and any limitations of the analysis. 

9. Monitoring the process; updating the data and repeating the process if required.

A data analysis is not necessarily just a one‐off exercise.  An insurance company analysing 
the claims from its sickness policies may wish to do this every few years to allow for the 
new data gathered and to look for trends.  An opinion poll company attempting to predict 
an election result is likely to repeat the poll a number of times in the weeks before the 
election to monitor any changes in views during the campaign period. 


Throughout the process, the modelling team needs to ensure that any relevant professional
guidance has been complied with. For example, the Financial Reporting Council has issued
a Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work
(TAS100) which includes principles for the use of data in technical actuarial work.
Knowledge of the detail of this TAS is not required for CS1.

Further, the modelling team should also remain aware of any legal requirements to be complied with. Such legal requirements may include aspects around consumer/customer data protection and gender discrimination.


3 Data sources 
Step 3 of the process described in Section 2 above refers to collection of the data needed to
meet the objectives of the analysis from appropriate sources. As consideration of Steps 3,
4, and 5 makes clear, getting data into a form ready for analysis is a process, not a single
event. Consequently, what is seen as the source of data can depend on your viewpoint.

Suppose you are conducting an analysis which involves collecting survey data from a
sample of people in the hope of drawing inferences about a wider population. If you are in
charge of the whole process, including collecting the primary data from your selected
sample, you would probably view the ‘source’ of the data as being the people in your
sample. Having collected, cleaned and possibly summarised the data you might make it
available to other investigators in JavaScript object notation (JSON) format via a web
Application programming interface (API). You will then have created a secondary ‘source’
for others to use.

In this section we discuss how the characteristics of the data are determined both by the
primary source and the steps carried out to prepare it for analysis – which may include the
steps on the journey from primary to secondary source. Details of particular data formats
(such as JSON), or of the mechanisms for getting data from an external source into a local
data structure suitable for analysis, are not covered in CS1.

Primary data can be gathered as the outcome of a designed experiment or from an observational study (which could include a survey of responses to specific questions). In all cases, knowledge of the details of the collection process is important for a complete understanding of the data, including possible sources of bias or inaccuracy. Issues that the analyst should be aware of include:

•  whether the process was manual or automated;

•  limitations on the precision of the data recorded;

•  whether there was any validation at source; and

•  if data wasn’t collected automatically, how it was converted to an electronic form.

These factors can affect the accuracy and reliability of the data collected.  For example: 
•  in a survey, an individual’s salary may be specified as falling into given bands, eg £20,000 – £29,999, £30,000 – £39,999 etc, rather than the precise value being recorded

•  if responses were collected on handwritten forms, and then manually input into a database, there is greater scope for errors to appear.

Where randomisation has been used to reduce the effect of bias or confounding variables it
is important to know the sampling scheme used:

•  simple random sampling;

•  stratified sampling; or

•  another sampling method.


Question  

A researcher wishes to survey 10% of a company’s workforce.  

Describe how the sample could be selected using: 

(a)  simple random sampling 

(b)  stratified sampling. 

Solution 

(a)  Simple random sampling 

Using simple random sampling, each employee would have an equal chance of being selected.  
This could be achieved by taking a list of the employees, allocating each a number, and then 
selecting 10% of the numbers at random (either manually, or using a computer‐generated 
process). 

(b)  Stratified sampling 

Using stratified sampling, the workforce would first be split into groups (or strata) defined by 
specific criteria, eg level of seniority.  Then 10% of each group would be selected using simple 
random sampling.  In this way, the resulting sample would reflect the structure of the company by 
seniority. 

This aims to overcome one of the issues with simple random sampling, ie that the sample 
obtained does not fully reflect the characteristics of the population.  With a simple random 
sample, it would be possible for all those selected to be at the same level of seniority, and so be 
unrepresentative of the workforce as a whole. 
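These two schemes can be sketched in R as follows (the employees data frame, including its seniority column, is hypothetical):

# hypothetical workforce of 500 employees across three levels of seniority
employees <- data.frame(id = 1:500,
    seniority = sample(c("junior", "middle", "senior"), 500,
                       replace = TRUE, prob = c(0.6, 0.3, 0.1)))

# (a) simple random sampling: select 10% of employees at random
srs <- employees[sample(nrow(employees), size = 0.1 * nrow(employees)), ]

# (b) stratified sampling: select 10% at random within each seniority group
strata <- split(employees, employees$seniority)
strat <- do.call(rbind,
    lapply(strata, function(g) g[sample(nrow(g), size = round(0.1 * nrow(g))), ]))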

 
Data may have undergone some form of pre-processing. A common example is grouping
(eg by geographical area or age band). In the past, this was often done to reduce the
amount of storage required and to make the number of calculations manageable. The scale
of computing power available now means that this is less often an issue, but data may still
be grouped: perhaps to anonymise it, or to remove the possibility of extracting sensitive (or
perhaps commercially sensitive) details.

Other aspects of the data which are determined by the collection process, and which affect
the way it is analysed include the following:

•  Cross-sectional data involves recording values of the variables of interest for each case in the sample at a single moment in time.

For example, recording the amount spent in a supermarket by each member of a loyalty card scheme this week.

•  Longitudinal data involves recording values at intervals over time.

For example, recording the amount spent in a supermarket by a particular member of a loyalty card scheme each week for a year.


•  Censored data occurs when the value of a variable is only partially known, for example, if a subject in a survival study withdraws, or survives beyond the end of the study: here a lower bound for the survival period is known but the exact value isn’t.

Censoring is dealt with in detail in CS2.

•  Truncated data occurs when measurements on some variables are not recorded so are completely unknown.

For example, if we were collecting data on the periods of time for which a user’s internet connection was disrupted, but only recorded the duration of periods of disruption that lasted 5 minutes or longer, we would have a truncated data set.

3.1 Big data 
The term big data is not well defined but has come to be used to describe data with
characteristics that make it impossible to apply traditional methods of analysis (for
example, those which rely on a single, well-structured data set which can be manipulated
and analysed on a single computer). Typically, this means automatically collected data with
characteristics that have to be inferred from the data itself rather than known in advance
from the design of an experiment.

Given the description above, the properties that can lead data to be classified as ‘big’ include:

• size: not only does big data include a very large number of individual cases, but each might include very many variables, a high proportion of which might have empty (or null) values – leading to sparse data

• speed: the data to be analysed might be arriving in real time at a very fast rate – for example, from an array of sensors taking measurements thousands of times every second

• variety: big data is often composed of elements from many different sources which could have very different structures – or is often largely unstructured

• reliability: given the above three characteristics we can see that the reliability of individual data elements might be difficult to ascertain and could vary over time (for example, an internet-connected sensor could go offline for a period).

Examples of ‘big data’ are:
• the information held by large online retailers on items viewed, purchased and recommended by each of its customers
• measurements of atmospheric pressure from sensors monitored by a national meteorological organisation
• the data held by an insurance company received from the personal activity trackers (that monitor daily exercise, food intake and sleep, for example) of its policyholders.


Although the four points above (size, speed, variety, reliability) have been presented in the
context of big data, they are characteristics that should be considered for any data source.
For example, an actuary may need to decide if it is advisable to increase the volume of data
available for a given investigation by combining an internal data set with data available
externally. In this case, the extra processing complexity required to handle a variety of
data, plus any issues of reliability of the external data, will need to be considered.

3.2 Data security, privacy and regulation 
In the design of any investigation, consideration of issues related to data security, privacy
and complying with relevant regulations should be paramount. It is especially important to
be aware that combining different data from different ‘anonymised’ sources can mean that
individual cases become identifiable.

Another point to be aware of is that just because data has been made available on the internet, it doesn’t mean that others are free to use it as they wish. This is a very complex area and laws vary between jurisdictions.


4 Reproducible research 
An example reference for this section is in Peng (2016). For the full reference, see the end
of this section.

4.1 The meaning of reproducible research 
Reproducibility refers to the idea that when the results of a statistical analysis are reported,
sufficient information is provided so that an independent third party can repeat the analysis
and arrive at the same results.

In science, reproducibility is linked to the concept of replication, which refers to someone repeating an experiment and obtaining the same (or at least consistent) results. Replication can be hard, or expensive, or impossible, for example if:

• the study is big;
• the study relies on data collected at great expense or over many years; or
• the study is of a unique occurrence (eg the standards of healthcare in the aftermath of a particular event).

Due to the possible difficulties of replication, reproducibility of the statistical analysis is often a reasonable alternative standard.

So, rather than the results of the analysis being validated by an independent third party 
completely replicating the study from scratch (including gathering a new data set), the validation 
is achieved by an independent third party reproducing the same results based on the same data 
set. 

4.2 Elements required for reproducibility 
Typically, reproducibility requires the original data and the computer code to be made
available (or fully specified) so that other people can repeat the analysis and verify the
results. In all but the most trivial cases, it will be necessary to include full documentation
(eg description of each data variable, an audit trail describing the decisions made when
cleaning and processing the data, and full documented code). Documentation of models is
covered in Subject CP2.

Full documented code can be achieved through literate statistical programming (as defined
by Knuth, 1992) where the program includes an explanation of the program in plain
language, interspersed with code snippets. Within the R environment, a tool which allows
this is R-markdown.

R‐markdown enables documents to be produced that include the code used, an explanation of 
that code, and, if desired, the output from that code.   

As a simpler example, it may be possible to document the work carried out in a spreadsheet by 
adding comments or annotations to explain the operations performed in particular cells, rows or 
columns. 


Although not strictly required to meet the definition of reproducibility, a good version control process can ensure evolving drafts of code, documentation and reports are kept in alignment between the various stages of development and review, and changes are reversible if necessary. There are many tools available for version control; a popular one is git.

A detailed knowledge of the version control tool ‘git’ is not required in CS1. 

In addition to version control, documenting the software environment, the computing architecture, the operating system, the software toolchain, external dependencies and version numbers can all be important in ensuring reproducibility.

As an example, in the R programming language, the command:

sessionInfo()

provides information about the operating system, version of R and version of all R
packages being used.

Question  

Give a reason why documenting the version number of the software used can be important for 
reproducibility of a data analysis. 

Solution 

Some functions might be available in one version of a package that are not available in another 
(older) version.  This could prevent someone being able to reproduce the analysis. 

 
Where there is randomness in the statistical or machine learning techniques being used (for
example random forests or neural networks) or where simulation is used, replication will
require the random seed to be set.

Machine learning is covered in Subject CS2. 

Simulation will be dealt with in more detail later in the course.  At this point, it is sufficient to 
know that each simulation that is run will be based on a series of pseudo‐random numbers.  So, 
for example, one simulation will be based on one particular series of pseudo‐random numbers, 
but unless explicitly coded otherwise, a different simulation will be based on a different series of 
pseudo‐random numbers.  The second simulation will then produce different results, rather than 
replicating the original results, which is the desired outcome here. 

To ensure the two simulations give the same results, they would both need to be based on the 
same series of pseudo‐random numbers.  This is known as ‘setting the random seed’.  We will do 
this regularly when using R to carry out a simulation. 
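
As a simple illustration (the seed value 123 here is arbitrary), the following R commands show the effect of setting the random seed:

set.seed(123)
rnorm(3)   # -0.5604756 -0.2301775  1.5587083

set.seed(123)
rnorm(3)   # resetting the seed reproduces exactly the same three values

Without the calls to set.seed, the two sets of simulated values would differ.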

   


Doing things ‘by hand’ is very likely to create problems in reproducing the work. Examples of doing things by hand are:

• manually editing spreadsheets (rather than reading the raw data into a programming environment and making the changes there)
• editing tables and figures (rather than ensuring that the programming environment creates them exactly as needed)
• downloading data manually from a website (rather than doing it programmatically)
• pointing and clicking (unless the software used creates an audit trail of what has been clicked).

‘Pointing and clicking’ relates to choosing a particular operation from an on‐screen menu, for example.  This action would not ordinarily be recorded electronically.

The main thing to note here is that the more of the analysis that is performed in an automated 
way, the easier it will be to reproduce by another individual.  Manual interventions may be 
forgotten altogether, and even if they are remembered, can be difficult to document clearly. 

4.3 The value of reproducibility 
Many actuarial analyses are undertaken for commercial, not scientific, reasons and are not published, but reproducibility is still valuable:

• reproducibility is necessary for a complete technical work review (which in many cases will be a professional requirement) to ensure the analysis has been correctly carried out and the conclusions are justified by the data and analysis

• reproducibility may be required by external regulators and auditors

• reproducible research is more easily extended to investigate the effect of changes to the analysis, or to incorporate new data

• it is often desirable to compare the results of an investigation with a similar one carried out in the past; if the earlier investigation was reported reproducibly, an analysis of the differences between the two can be carried out with confidence

• the discipline of reproducible research, with its emphasis on good documentation of processes and data storage, can lead to fewer errors that need correcting in the original work and, hence, greater efficiency.

There are some issues that reproducibility does not address:

• Reproducibility does not mean that the analysis is correct. For example, if an incorrect distribution is assumed, the results may be wrong – even though they can be reproduced by making the same incorrect assumption about the distribution. However, by making clear how the results are achieved, it does allow transparency so that incorrect analysis can be appropriately challenged.

• If activities involved in reproducibility happen only at the end of an analysis, this may be too late for resulting challenges to be dealt with. For example, resources may have been moved on to other projects.
 


4.4 References

Further information on the material in this section is given in the references:

• Knuth, Donald E. (1992). Literate Programming. California: Stanford University Center for the Study of Language and Information. ISBN 978-0-937073-80-3.

• Peng, R. D. (2016). Report Writing for Data Science in R. www.Leanpub.com/reportwriting


Chapter 1 Summary

The three key forms of data analysis are:
• descriptive analysis: producing summary statistics (eg measures of central tendency and dispersion) and presenting the data in a simpler format
• inferential analysis: using a data sample to estimate summary parameters for the wider population from which the sample was taken, and testing hypotheses
• predictive analysis: extends the principles of inferential analysis to analyse past data and make predictions about future events.

The key steps in the data analysis process are:

1. Develop a well‐defined set of objectives which need to be met by the results of the data analysis.
2. Identify the data items required for the analysis.
3. Collection of the data from appropriate sources.
4. Processing and formatting data for analysis, eg inputting into a spreadsheet, database or other model.
5. Cleaning data, eg addressing unusual, missing or inconsistent values.
6. Exploratory data analysis, which may include descriptive analysis, inferential analysis or predictive analysis.
7. Modelling the data.
8. Communicating the results.
9. Monitoring the process; updating the data and repeating the process if required.

In the data collection process, the primary source of the data is the population (or population sample) from which the ‘raw’ data is obtained.  If, once the information is collected, cleaned and possibly summarised, it is made available for others to use via a web interface, this is then a secondary source of data.

Other aspects of the data determined by the collection process that may affect the analysis are:
• Cross‐sectional data involves recording values of the variables of interest for each case in the sample at a single moment in time.
• Longitudinal data involves recording values at intervals over time.
• Censored data occurs when the value of a variable is only partially known.
• Truncated data occurs when measurements on some variables are not recorded so are completely unknown.

The term ‘big data’ can be used to describe data with characteristics that make it impossible to apply traditional methods of analysis.  Typically, this means automatically collected data with characteristics that have to be inferred from the data itself rather than known in advance from the design of the experiment.

Properties that can lead to data being classified as ‘big’ include:
• size of the data set
• speed of arrival of the data
• variety of different sources from which the data is drawn
• reliability of the data elements might be difficult to ascertain.

Replication refers to an independent third party repeating an experiment and obtaining the same (or at least consistent) results.  Replication of a data analysis can be difficult, expensive or impossible, so reproducibility is often used as a reasonable alternative standard.

Reproducibility refers to reporting the results of a statistical analysis in sufficient detail that an independent third party can repeat the analysis on the same data set and arrive at the same results.

Elements required for reproducibility:
• the original data and fully documented computer code need to be made available
• good version control
• documentation of the software used, computing architecture, operating system, external dependencies and version numbers
• where randomness is involved in the process, replication will require the random seed to be set
• limiting the amount of work done ‘by hand’.


Chapter 1 Practice Questions 
1.1 The data analysis department of a mobile phone messaging app provider has gathered data on 
the number of messages sent by each user of the app on each day over the past 5 years.  The 
geographical location of each user (by country) is also known.  

(i)  Describe each of the following terms as it relates to a data set, and give an example of 
each as it relates to the app provider’s data: 

  (a)  cross‐sectional 

  (b)  longitudinal. 

(ii)  Give an example of each of the following types of data analysis that could be carried out 
using the app provider’s data: 

  (a)  descriptive 

  (b)  inferential 

  (c)  predictive. 

1.2 Explain the regulatory and legal requirements that should be observed when conducting a data 
analysis exercise. 

1.3 A car insurer wishes to investigate whether young drivers (aged 17‐25) are more likely to have an accident in a given year than older drivers.
Exam style

Describe the steps that would be followed in the analysis of data for this investigation.  [7]

1.4 (i) In the context of data analysis, define the terms ‘replication’ and ‘reproducibility’.  [2]
Exam style

(ii) Give three reasons why replication of a data analysis can be difficult to achieve in practice.  [3]
      [Total 5]

   


Chapter 1 Solutions 
1.1 (i)(a)  Cross‐sectional 

Cross‐sectional data involves recording the values of the variables of interest for each case in the 
sample at a single moment in time. 

In this data set, this relates to the number of messages sent by each user on any particular day. 

(i)(b)  Longitudinal 

Longitudinal data involves recording the values of the variables of interest at intervals over time. 

In this data set, this relates to the number of messages sent by a particular user on each day over 
the 5‐year period. 

(ii)(a)  Descriptive analysis 

Examples of descriptive analysis that could be carried out on this data set include:
• calculating the mean and standard deviation of the number of messages sent each day by users in each country
• plotting a graph of the total messages sent each day worldwide, to illustrate the overall trend in the number of messages sent over the 5 years
• calculating what proportion of the total messages sent in each year originate in each country.

(ii)(b)  Inferential analysis

Examples of inferential analysis that could be carried out on this data set include:
• testing the hypothesis that more messages are sent at weekends than on weekdays
• assessing whether there is a significant difference in the rate of growth of the number of messages sent each day by users in different countries over the 5‐year period.

(ii)(c)  Predictive analysis

Examples of predictive analysis that could be carried out on this data set include:
• forecasting which countries will be the major users of the app in 5 years’ time, and will therefore need the most technical support staff
• predicting the number of messages sent on the app’s busiest day (eg New Year’s Eve) next year, to ensure that the provider continues to have sufficient capacity.

   


1.2 Throughout the data analysis process, it is important to ensure that any relevant professional 
guidance has been complied with.  For example, the UK’s Financial Reporting Council has issued a 
Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work (TAS100).  This 
describes the principles that should be adhered to when using data in technical actuarial work. 

The data analysis team must also be aware of any legal requirements to be complied with relating to, for example:
• protection of an individual’s personal data and privacy
• discrimination on the grounds of gender, age, or other reasons.

With regard to privacy regulations, it is important to note that combining data from different 
sources may mean that individuals can be identified, even if they are anonymous in the original 
data sources. 

Finally, data that have been made available on the internet cannot necessarily be used for any 
purpose.  Any legal restrictions should be checked before using the data, noting that laws can vary 
between jurisdictions. 

1.3 The key steps in the data analysis process in this scenario are: 

1. Develop a well‐defined set of objectives that need to be met by the results of the data 
analysis.    [½] 

Here, the objective is to determine whether young drivers are more likely to have an 
accident in a given year than older drivers.  [½] 

2. Identify the data items required for the analysis.  [½] 

The data items needed would include the number of drivers of each age during the 
investigation period and the number of accidents they had.  [½] 

3. Collection of the data from appropriate sources.    [½] 

The insurer will have its own internal data from its administration department on the 
number of policyholders of each age during the investigation period and which of them 
had accidents.    [½] 

The insurer may also be able to source data externally, eg from an industry body that 
collates information from a number of insurers.  [½] 

4. Processing and formatting the data for analysis, eg inputting into a spreadsheet, database 
or other model.   [½] 

The data will need to be extracted from the administration system and loaded into 
whichever statistical package is being used for the analysis.  [½] 

If different data sets are being combined, they will need to be put into a consistent format 
and any duplicates (ie the same record appearing in different data sets) will need to be 
removed.    [½] 


5. Cleaning data, eg addressing unusual, missing or inconsistent values.  [½] 

For example, the age of the driver might be missing, or be too low or high to be plausible.  
These cases will need investigation.  [½] 

6. Exploratory data analysis, which here takes the form of inferential analysis…  [½] 

… as we are testing the hypothesis that younger drivers are more likely to have an 
accident than older drivers.  [½] 

7. Modelling the data.  [½] 

This may involve fitting a distribution to the annual number of accidents arising from the 
policyholders in each age group.  [½] 

8. Communicating the results.  [½] 

This will involve describing the data sources used, the model and analyses performed, and 
the conclusion of the analysis (ie whether young drivers are indeed more likely to have an 
accident than older drivers), along with any limitations of the analysis.  [½] 

9. Monitoring the process – updating the data and repeating the process if required.  [½] 

The car insurer may wish to repeat the process again in a few years’ time, using the data 
gathered over that period, to ensure that the conclusions of the original analysis remain 
valid.      [½] 

10. Ensuring that any relevant professional guidance and legislation (eg on age discrimination) 
has been complied with.  [½] 
        [Maximum 7] 

1.4 (i)  Definitions 

Replication refers to an independent third party repeating an analysis from scratch (including 
gathering an independent data sample) and obtaining the same (or at least consistent) results.  [1] 

Reproducibility refers to reporting the results of a statistical analysis in sufficient detail that an 
independent third party can repeat the analysis on the same data set and arrive at the same 
results.        [1] 
        [Total 2] 

(ii)  Three reasons why replication is difficult 

Replication of a data analysis can be difficult if:
• the study is big;  [1]
• the study relies on data collected at great expense or over many years; or  [1]
• the study is of a unique occurrence (eg the standards of healthcare in the aftermath of a particular event).  [1]
      [Total 3]



CS1‐02: Probability distributions

Another example of a Bernoulli random variable occurs when a fair die is thrown once.  If X is the number of sixes obtained, p = 1/6 and 1 − p = 5/6, so P(X = 0) = 5/6 and P(X = 1) = 1/6.

R code. See R code for Binomial distribution.

1.3 Binomial distribution 
Consider a sequence of n Bernoulli trials as above such that:

(i) the trials are independent of one another, ie the outcome of any trial does not
depend on the outcomes of any other trials

and:

(ii) the trials are identical, ie at each trial P({s}) = p.

Such a sequence is called a ‘sequence of n independent, identical, Bernoulli(p) trials’ or, for short, a ‘sequence of n Bernoulli(p) trials’.

A quick way of saying independent and identically distributed is IID.  We will need this idea later. 

The independence allows the probability of a joint outcome involving two or more trials to
be expressed as the product of the probabilities of the outcomes associated with each
separate trial concerned.

Sample space S: the joint set of outcomes of all n trials

Probability measure: as above for each trial

Random variable X is the number of successes that occur in the n trials.

Distribution: $P(X=x) = \binom{n}{x}p^x(1-p)^{n-x}$,  x = 0, 1, 2, …, n;  0 < p < 1

The coefficients here are the same as in the binomial expansion that can be obtained using the numbers from Pascal’s triangle, ie $\binom{n}{x} = {}^nC_x = \frac{n!}{(n-x)!\,x!}$.  We can work out these quantities using the nCr function on a calculator.

If X is distributed binomially with parameters n and p, then we can write X ~ Bin(n, p).

The fact that a Bin(n, p) distribution arises as the sum of n independent and identical Bernoulli(p) trials is important and will be used later to prove some important results.

Moments: $\mu = np$,  $\sigma^2 = np(1-p)$

Very often when using the binomial distribution we will write 1 − p = q.


As an example of the binomial distribution, suppose that X is the number of sixes obtained when a fair die is thrown 10 times.  Then $P(X=x) = {}^{10}C_x \left(\tfrac{1}{6}\right)^x \left(\tfrac{5}{6}\right)^{10-x}$ and the probability of exactly one ‘six’ in ten throws is ${}^{10}C_1 \left(\tfrac{1}{6}\right)^1 \left(\tfrac{5}{6}\right)^9 = 0.3230$.  There are $10 = {}^{10}C_1$ ways of obtaining exactly one ‘six’, ie the ‘six’ could be on the first throw, the second throw, … or the tenth throw.

Question 

Calculate the probability that at least 9 out of a group of 10 people who have been infected by a 
serious disease will survive, if the survival probability for the disease is 70%. 

Solution 

The number of survivors is distributed binomially with parameters n = 10 and p = 0.7.  If X is the number of survivors, then:

$P(X \ge 9) = P(X = 9 \text{ or } 10) = \binom{10}{9} 0.7^9 \times 0.3 + \binom{10}{10} 0.7^{10} = 0.1493$

Alternatively, we could use the cumulative binomial probabilities given on page 187 of the Tables.  The figure for x = 8 in the Tables for the Bin(10, 0.7) distribution is 0.8507.  Subtracting this from 1, we get 1 − 0.8507 = 0.1493 as before.

The R code for simulating values and calculating probabilities and quantiles from the
binomial distribution uses the R functions rbinom, dbinom, pbinom and qbinom. The
prefixes r, d, p, and q stand for random generation, density, distribution and quantile
functions respectively.

R code for simulating a random sample of 100 values from the binomial distribution with n = 20 and p = 0.3:

n = 20
p = 0.3
rbinom(100, n, p)

Calculate P(X = 2):

dbinom(2, n, p)

Similarly, the cumulative distribution function (CDF) and quantiles can be calculated with
pbinom and qbinom.

For a Bernoulli distribution the parameter n is set to n = 1.
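
As a quick check on the survival question above (this is just an illustration of pbinom and qbinom, reusing n = 10 and p = 0.7 from that question):

1 - pbinom(8, 10, 0.7)   # 0.1493083, matching P(X >= 9) above
qbinom(0.5, 20, 0.3)     # 6, the median of the Bin(20, 0.3) distribution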

   


1.5 Negative binomial distribution 
This is a generalisation of the geometric distribution.

The random variable X is the number of the trial on which the k th success occurs, where
k is a positive integer.

For example, in a telesales company, X might be the number of phone calls required to make the fifth sale.

Distribution: $P(X=x) = \binom{x-1}{k-1}p^k(1-p)^{x-k}$,  x = k, k+1, k+2, …;  0 < p < 1

We say that  X  has a Type 1 negative binomial  (k , p)  distribution. 

The probabilities satisfy the recurrence relationship:

$P(X=x) = \frac{x-1}{x-k}\,(1-p)\,P(X=x-1)$

Note that in applying this model, the value of k is known.

Moments: $\mu = \frac{k}{p}$  and:  $\sigma^2 = \frac{k(1-p)}{p^2}$

Note: The mean and variance are just k times those for the geometric(p) variable, which is itself a special case of this random variable (with k = 1). Further, the negative binomial variable can be expressed as the sum of k geometric variables (the number of trials to the first success, plus the number of additional trials to the second success, plus … to the (k − 1)th success, plus the number of additional trials to the k th success).

Question 

If the probability that a person will believe a rumour about a scandal in politics is 0.8, calculate the 
probability that the ninth person to hear the rumour will be the fourth person to believe it. 

Solution 

Let X be the ‘position’ of the fourth person who believes it.  Then p = 0.8, x = 9 and k = 4, and we have:

$P(X=9) = \binom{8}{3} \times 0.8^4 \times 0.2^5 = 0.00734$

 
 


Another formulation of the negative binomial distribution is sometimes used.

Let Y be the number of failures before the k th success.

Then $P(Y=y) = \binom{k+y-1}{y}p^k(1-p)^y$,  y = 0, 1, 2, 3, …, with mean $\mu = \frac{k(1-p)}{p}$.  Y = X − k, where X is defined as above.

This formulation is called the Type 2 negative binomial distribution and can be found on page 9 of 
the Tables.  It should be noted that in the Tables the combinatorial factor has been rewritten in 
terms of the gamma function (defined later in this chapter). 

The previous formulation is known as the Type 1 negative binomial distribution.  The formulae for 
this version are given on page 8 of the Tables. 

The R code for simulating values and calculating probabilities and quantiles from the
negative binomial distribution is similar to the R code used for the binomial distribution
using the R functions rnbinom, dnbinom, pnbinom and qnbinom.

For example:

dnbinom(15, 10, 0.3)

calculates the probability P(Y = 15) = 0.0366544 for p = 0.3 and k = 10.

By default, R uses the Type‐2 version of the negative binomial distribution. 
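
For instance, the rumour question above can be checked in R, remembering to convert from the Type 1 to the Type 2 formulation.  The ninth person being the fourth believer corresponds to y = 9 − 4 = 5 failures before the fourth success:

dnbinom(5, 4, 0.8)   # 0.00734, matching the Type 1 calculation above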

1.6 Hypergeometric distribution 
This is the ‘finite population’ equivalent of the binomial distribution, in the following sense.
Suppose objects are selected at random, one after another, without replacement, from a
finite population consisting of k ‘successes’ and N  k ‘failures’. The trials are not
independent, since the result of one trial (the selection of a success or a failure) affects the
make-up of the population from which the next selection is made.

Random variable X is the number of ‘successes’ in a sample of size n from a population of size N that has k ‘successes’ and N − k ‘failures’.

$P(X=x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$,  x = 0, 1, 2, …

Moments: $\mu = \frac{nk}{N}$

$\sigma^2 = \frac{nk(N-k)(N-n)}{N^2(N-1)}$

(The details of the derivation of the mean and variance of the number of successes are not
required by the syllabus).
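
R also provides functions for the hypergeometric distribution (rhyper, dhyper, phyper and qhyper), although R’s parameterisation differs from the notation above: dhyper(x, m, n, k) takes m successes and n failures in the population, and a sample size of k.  As an illustration, with a population of N = 50 containing 20 successes and a sample of size 10 (these numbers are chosen arbitrarily):

dhyper(3, 20, 30, 10)          # P(X = 3)
sum(dhyper(0:10, 20, 30, 10))  # 1, as the probabilities sum to 1
10 * 20 / 50                   # the mean, nk/N = 4, in the notation above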


This suggests that X has variance λ.  This is in fact also the case.  So μ = σ² = λ.

Question 

Using the probability function for the Poisson distribution, prove the formulae for the mean and variance.  Hint: for the variance, consider E[X(X − 1)].

Solution 

The mean is:

$E(X) = \sum_x xP(X=x) = \lambda e^{-\lambda} + 2\frac{\lambda^2}{2!}e^{-\lambda} + 3\frac{\lambda^3}{3!}e^{-\lambda} + 4\frac{\lambda^4}{4!}e^{-\lambda} + \cdots$

$= \lambda e^{-\lambda} + \lambda^2 e^{-\lambda} + \frac{\lambda^3}{2!}e^{-\lambda} + \frac{\lambda^4}{3!}e^{-\lambda} + \cdots = \lambda e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right)$

Since $e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$, we obtain:

$\mu = E(X) = \lambda e^{-\lambda}e^{\lambda} = \lambda$

For the variance we need to work out E(X²).  However, the easiest way to work out the variance is actually to consider E[X(X − 1)]:

$E[X(X-1)] = \sum_x x(x-1)P(X=x) = 2\times 1\times\frac{\lambda^2}{2!}e^{-\lambda} + 3\times 2\times\frac{\lambda^3}{3!}e^{-\lambda} + 4\times 3\times\frac{\lambda^4}{4!}e^{-\lambda} + \cdots$

$= \lambda^2 e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots\right) = \lambda^2 e^{-\lambda}e^{\lambda} = \lambda^2$

Since $E[X(X-1)] = E(X^2) - E(X)$, we have $E(X^2) = \lambda^2 + E(X) = \lambda^2 + \lambda$, and so:

$\text{var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$

 
We can calculate Poisson probabilities in the usual way, using the probability function or the 
cumulative probabilities given in the Tables. 

   


Question 

If goals are scored randomly in a game of football at a constant rate of three per match, calculate 
the probability that more than 5 goals are scored in a match. 

Solution 

The number of goals in a match can be modelled as a Poisson distribution with mean λ = 3.

P(X > 5) = 1 − P(X ≤ 5)

We can use the recurrence relationship given:

P(X = 0) = e⁻³ = 0.0498
P(X = 1) = (3/1) × 0.0498 = 0.1494
P(X = 2) = (3/2) × 0.1494 = 0.2240
P(X = 3) = (3/3) × 0.2240 = 0.2240
P(X = 4) = (3/4) × 0.2240 = 0.1680
P(X = 5) = (3/5) × 0.1680 = 0.1008

So we have P(X > 5) = 1 − 0.9161 = 0.0839.

Alternatively, we could obtain this directly using the cumulative Poisson probabilities given on page 176 of the Tables.  For Poi(3), the figure for x = 5 is 0.91608, and 1 − 0.91608 = 0.08392.
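
This can also be checked in R using the cumulative Poisson function:

1 - ppois(5, 3)   # 0.08391794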

 
The Poisson distribution provides a very good approximation to the binomial when n is large and p is small – typical applications have n = 100 or more and p = 0.05 or less. The approximation depends only on the product np (= λ) – the individual values of n and p are irrelevant. So, for example, the value of P(X ≤ x) in the case n = 200 and p = 0.02 is effectively the same as the value of P(X ≤ x) in the case n = 400 and p = 0.01. When dealing with large numbers of opportunities for the occurrence of ‘rare’ events (under ‘binomial assumptions’), the distribution of the number that occurs depends only on the expected number.
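
This can be illustrated numerically in R (the value x = 2 is chosen arbitrarily; the outputs shown are approximate):

pbinom(2, 200, 0.02)   # approx 0.2351
pbinom(2, 400, 0.01)   # approx 0.2366
ppois(2, 4)            # approx 0.2381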

We will look at other approximations in Chapter 6. 

   

CS1‐04: Joint distributions

1 Joint distributions 

1.1 Joint probability (density) functions 
Defining several random variables simultaneously on a sample space gives rise to a
multivariate distribution. In the case of just two variables, it is a bivariate distribution.

Discrete case 
To illustrate this for a pair of discrete variables, X and Y , the probabilities associated with
the various values of ( x , y ) are as follows:

           y
        1      2      3

   1   0.10   0.10   0.05
x  2   0.15   0.10   0.05
   3   0.20   0.05    –
   4   0.15   0.05    –

So, for example, P(X = 3, Y = 1) = 0.20, and P(X = 1, Y = 3) = 0.05.

The function f(x, y) = P(X = x, Y = y) for all values of (x, y) is the (joint/bivariate) probability function of (X, Y) – it specifies how the total probability of 1 is divided up amongst the possible values of (x, y) and so gives the (joint/bivariate) probability distribution of (X, Y).

The requirements for a function to qualify as the probability function of a pair of discrete random variables are:

f(x, y) ≥ 0 for all values of x and y in the domain

$\sum_x \sum_y f(x,y) = 1$

This parallels earlier results, where the probability function was P(X = x), which had to satisfy P(X = x) ≥ 0 for all values of x and $\sum_x P(X=x) = 1$.


For example, consider the discrete random variables M and N with joint probability function:

$P(M=m, N=n) = \frac{m}{35 \times 2^{n-2}}$,  where m = 1, 2, 3, 4 and n = 1, 2, 3

Let’s draw up a table showing the values of the joint probability function for M and N.

Starting with the smallest possible values of M and N, $P(M=1, N=1) = \frac{1}{35 \times 2^{-1}} = \frac{2}{35}$.

Calculating the joint probability for all combinations of M and N, we get the table shown below.

              M
        1      2      3      4

   1   2/35   4/35   6/35   8/35
N  2   1/35   2/35   3/35   4/35
   3   1/70   1/35   3/70   2/35
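
As an aside, this table is easy to reproduce in R (an illustrative sketch, not part of the Core Reading):

joint <- outer(1:3, 1:4, function(n, m) m / (35 * 2^(n - 2)))
dimnames(joint) <- list(N = 1:3, M = 1:4)
joint        # rows are values of N, columns are values of M
sum(joint)   # 1, confirming the probabilities sum to 1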

 
Question 

Use the table of probabilities given above to calculate: 

(i) P(M = 3, N = 1 or 2)

(ii) P(N = 3)

(iii) P(M = 2 | N = 3).

Solution

(i) Since the events N = 1 and N = 2 are mutually exclusive, we have:

P(M = 3, N = 1 or 2) = P(M = 3, N = 1) + P(M = 3, N = 2) = 6/35 + 3/35 = 9/35

   


Solution 

The marginal PDF of Y is:

$f_Y(y) = \int_{x=0}^{2} \tfrac{1}{16}(x+3y)\,dx = \tfrac{1}{16}\left[\tfrac{1}{2}x^2 + 3xy\right]_{x=0}^{2} = \tfrac{1}{16}(2+6y)$

So:

$f_{X|Y=y}(x,y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{\tfrac{1}{16}(x+3y)}{\tfrac{1}{16}(2+6y)} = \frac{x+3y}{2(1+3y)}$,  0 ≤ x ≤ 2

1.4 Distributions defined on more complex domains 
For all the distributions we have seen so far, the limits on both the  x  and the  y  integrals have 
been numbers.  So, in the previous example, the joint PDF is defined over the rectangle whose 
vertices are at the points (0, 0), (0, 2), (2, 0), and (2, 2). 

It is possible for a joint distribution to be defined over a non‐rectangular area.  In these cases, the 
limits for  y  may be dependent on  x , or vice versa.  Care needs to be taken in these cases to 
ensure that the correct limits are used when integrating. 

Question 

Let X and Y have joint density function:

f(x, y) = k(x² + xy),  0 ≤ y ≤ x ≤ 2

(i) Calculate the value of k.

(ii) Determine the PDFs of the marginal distributions for X and Y.

(iii) Determine the conditional density function of Y | X = x.


Solution 

(i) Let us integrate with respect to x first, then y.

We see from the inequality 0 ≤ y ≤ x ≤ 2 that, when x is considered as a variable, the limits for x will be from x = y to x = 2.  Once x has been integrated out, the limits for y will be from y = 0 to y = 2.

So, integrating first with respect to x:

$\int_y^2 k(x^2+xy)\,dx = k\left[\frac{x^3}{3}+\frac{x^2y}{2}\right]_y^2 = k\left(\frac{8}{3}+2y-\frac{y^3}{3}-\frac{y^3}{2}\right) = k\left(\frac{8}{3}+2y-\frac{5y^3}{6}\right)$

We now integrate this expression with respect to y, using the limits 0 and 2:

$\int_0^2 k\left(\frac{8}{3}+2y-\frac{5y^3}{6}\right)dy = k\left[\frac{8y}{3}+y^2-\frac{5y^4}{24}\right]_0^2 = k\left(\frac{16}{3}+4-\frac{10}{3}\right) = 6k$

Since this must be equal to 1, we see that $k = \frac{1}{6}$.

(ii) By integrating first with respect to x, and setting k = 1/6, we have already obtained the marginal distribution for Y:

$f_Y(y) = \frac{1}{6}\left(\frac{8}{3}+2y-\frac{5y^3}{6}\right)$,  0 ≤ y ≤ 2

To obtain the marginal distribution for X, we must integrate first with respect to y.  We see from the inequality given in the question that the limits for y are now 0 and x.  So:

$f_X(x) = \int_0^x \frac{1}{6}\left(x^2+xy\right)dy = \frac{1}{6}\left[x^2y+\frac{1}{2}xy^2\right]_0^x = \frac{1}{6}\left(x^3+\frac{1}{2}x^3\right) = \frac{1}{4}x^3$,  0 ≤ x ≤ 2

(iii) The conditional PDF for Y | X = x is obtained by dividing the joint PDF by the marginal PDF for X:

$f_{Y|X}(x,y) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{\frac{1}{6}(x^2+xy)}{\frac{1}{4}x^3} = \frac{2}{3}\left(\frac{1}{x}+\frac{y}{x^2}\right)$,  0 ≤ y ≤ x

Note that, for a given value of x, the conditional density is only defined for values of y between 0 and x.

Chapter 4 Practice Questions

4.1 Let X and Y have joint density function given by:

f(x, y) = c(x + 3y),  0 ≤ x ≤ 2, 0 ≤ y ≤ 2

(i) Calculate the value of c.

(ii) Hence, calculate P(X < 1, Y > 0.5).

4.2 The continuous random variables X, Y have the bivariate PDF:
Exam style

f(x, y) = 2,  x + y < 1, x > 0, y > 0

(i) Derive the marginal PDF of Y. [2]

(ii) Use the result from part (i) to derive the conditional PDF of X given Y = y. [1]
[Total 3]

4.3 The continuous random variables X and Y have joint PDF:

$f(x,y) = \tfrac{1}{6}\left(x^2+xy\right)$,  0 ≤ y ≤ x ≤ 2

(i) Determine the PDF of the conditional distribution X | Y = y.

(ii) Calculate the conditional probability P(1 < X < 1.5 | Y = 1).

4.4 Show that, for the joint random variables M, N, where:

$P(M=m, N=n) = \frac{m}{35 \times 2^{n-2}}$,  for m = 1, 2, 3, 4 and n = 1, 2, 3

the conditional probability functions for M given N = n and for N given M = m are equal to the corresponding marginal distributions.

4.5 Let X and Y have joint density function:
Exam style

$f_{X,Y}(x,y) = \tfrac{4}{5}\left(3x^2+xy\right)$,  0 ≤ x ≤ 1, 0 ≤ y ≤ 1

Determine:

(i) the marginal density function of X [2]

(ii) the conditional density function of Y given X = x [1]

(iii) the covariance of X and Y. [5]
[Total 8]


4.6 Calculate the correlation coefficient of X and Y, where X and Y have the joint distribution:

             X
         0     1     2

    1   0.1   0.1    0
Y   2   0.1   0.1   0.2
    3   0.2   0.1   0.1

4.7 Claim sizes on a home insurance policy are normally distributed about a mean of £800 and with a standard deviation of £100. Claim sizes on a car insurance policy are normally distributed about a mean of £1,200 and with a standard deviation of £300. All claim sizes are assumed to be independent.

To date, there have already been home claims amounting to £800, but no car claims. Calculate
the probability that after the next 4 home claims and 3 car claims the total size of car claims
exceeds the total size of the home claims.

4.8 Two discrete random variables, X and Y , have the following joint probability function:
Exam style
X

1 2 3

1 0.2 0 0.2

Y 2 0 0.2 0

3 0.2 0 0.2

Determine:

(i) E (X ) [1]

(ii) the probability distribution of Y | X  1 [1]

(iii) whether X and Y are correlated or not [2]

(iv) whether X and Y are independent or not. [1]


[Total 5]


4.9 The random variables X and Y have joint density function given by:

$f(x,y) = kx^{-\alpha}e^{-y/\beta}$,  1 < x < ∞, 1 < y < ∞

where α > 1, β > 0, and k is a constant.

Derive an expression for k in terms of α and β.

4.10 Show using convolutions that if X and Y are independent random variables and X has a $\chi^2_m$ distribution and Y has a $\chi^2_n$ distribution, then X + Y has a $\chi^2_{m+n}$ distribution.

4.11 Let X be a random variable with mean 3 and standard deviation 2, and let Y be a random
variable with mean 4 and standard deviation 1. X and Y have a correlation coefficient of –0.3.
Exam style
Let Z = X + Y.

Calculate:

(i) cov( X , Z ) [2]

(ii) var( Z ) . [2]


[Total 4]

4.12 X has a Poisson distribution with mean 5 and Y has a Poisson distribution with mean 10. If cov(X, Y) = −12, calculate the variance of Z where Z = X − 2Y + 3. [2]
Exam style

4.13 Show that if X has a negative binomial distribution with parameters k and p, and Y has a negative binomial distribution with parameters m and p, and X and Y are independent, then X + Y also has a negative binomial distribution, and specify its parameters.

4.14 For a certain company, claim sizes on car policies are normally distributed about a mean of £1,800
and with standard deviation £300, whereas claim sizes on home policies are normally distributed
Exam style
about a mean of £1,200 and with standard deviation £500. Assuming independence among all
claim sizes, calculate the probability that a car claim is at least twice the size of a home claim. [4]


4.15 (i) Two discrete random variables, X and Y, have the following joint probability function:
Exam style
             X
         1     2     3     4

    1   0.2    0    0.05  0.15
Y
    2    0    0.3   0.1   0.2

Determine var(X | Y = 2). [3]

(ii) Let U and V have joint density function:

$f_{U,V}(u,v) = \tfrac{48}{67}\left(2uv+u^2\right)$,  0 < u < 1, u² < v < 2

Determine E(U | V = v). [3]
[Total 6]


Chapter 4 Solutions

4.1 (i) Using the result $\int\int f(x,y)\,dx\,dy = 1$ gives:

$\int_{y=0}^{2}\int_{x=0}^{2} c(x+3y)\,dx\,dy = \int_{y=0}^{2} c\left[\tfrac{1}{2}x^2+3xy\right]_{x=0}^{2} dy = \int_{y=0}^{2} c(2+6y)\,dy = c\left[2y+3y^2\right]_{y=0}^{2} = 16c = 1$

$\Rightarrow c = \tfrac{1}{16}$

(ii) The probability is:

$P(X<1, Y>0.5) = \int_{y=0.5}^{2}\int_{x=0}^{1} \tfrac{1}{16}(x+3y)\,dx\,dy = \int_{y=0.5}^{2} \tfrac{1}{16}\left[\tfrac{1}{2}x^2+3xy\right]_{x=0}^{1} dy$

$= \int_{y=0.5}^{2} \tfrac{1}{16}\left(\tfrac{1}{2}+3y\right) dy = \tfrac{1}{16}\left[\tfrac{1}{2}y+\tfrac{3}{2}y^2\right]_{y=0.5}^{2} = \tfrac{51}{128} = 0.398$

4.2 (i) The marginal PDF of Y is:

$f_Y(y) = \int_0^{1-y} 2\,dx = 2\left[x\right]_0^{1-y} = 2(1-y)$,  0 < y < 1 [2]

(ii) The conditional PDF of X given Y = y is:

$f_{X|Y=y}(x,y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{2}{2(1-y)} = \frac{1}{1-y}$,  0 < x < 1 − y [1]


4.3 (i) We saw in Section 1.4 of the chapter that the marginal distribution for Y is:

$f_Y(y) = \frac{1}{6}\left(\frac{8}{3}+2y-\frac{5y^3}{6}\right)$,  0 ≤ y ≤ 2

So the PDF for the conditional distribution X | Y = y is the joint PDF divided by the marginal PDF:

$f_{X|Y=y}(x,y) = \frac{\frac{1}{6}\left(x^2+xy\right)}{\frac{1}{6}\left(\frac{8}{3}+2y-\frac{5y^3}{6}\right)} = \frac{x^2+xy}{\frac{8}{3}+2y-\frac{5y^3}{6}}$,  y ≤ x ≤ 2

(ii) So the general conditional probability is given by:

$P(1<X<1.5\,|\,Y=y) = \frac{1}{\frac{8}{3}+2y-\frac{5y^3}{6}}\int_1^{1.5}\left(x^2+xy\right)dx = \frac{\left[\frac{x^3}{3}+\frac{x^2y}{2}\right]_1^{1.5}}{\frac{8}{3}+2y-\frac{5y^3}{6}} = \frac{(1.125+1.125y)-\left(\tfrac{1}{3}+\tfrac{y}{2}\right)}{\frac{8}{3}+2y-\frac{5y^3}{6}}$

Substituting in y = 1, we obtain:

$P(1<X<1.5\,|\,Y=1) = \frac{1.125+1.125-\tfrac{1}{3}-\tfrac{1}{2}}{\frac{8}{3}+2-\frac{5}{6}} = \frac{1.4167}{3.8333} = 0.3696$

4.4 In the chapter, we found that the marginal probability functions for M and N were:

$P_M(m) = \frac{m}{10}$  for m = 1, 2, 3, 4

and:

$P_N(n) = \frac{1}{7 \times 2^{n-3}}$  for n = 1, 2, 3

So, dividing the joint probability function by the marginal probability function for N, we obtain:

$P_{M|N=n}(m,n) = \frac{P_{M,N}(m,n)}{P_N(n)} = \frac{m}{35 \times 2^{n-2}} \times \left(7 \times 2^{n-3}\right) = \frac{m}{10}$,  m = 1, 2, 3, 4

for the conditional probability function of M given N = n.

Similarly:

$P_{N|M=m}(m,n) = \frac{P(N=n, M=m)}{P(M=m)} = \frac{m}{35 \times 2^{n-2}} \times \frac{10}{m} = \frac{1}{7 \times 2^{n-3}}$,  n = 1, 2, 3

is the conditional probability function of N given M = m.

These are identical to the marginal distributions obtained in the chapter text.

4.5 (i) Marginal density

$f_X(x) = \int_{y=0}^{1} \tfrac{4}{5}\left(3x^2+xy\right)dy = \tfrac{4}{5}\left[3x^2y+\tfrac{1}{2}xy^2\right]_{y=0}^{1} = \tfrac{4}{5}\left(3x^2+\tfrac{1}{2}x\right)$ [2]

(ii) Conditional density

$f_{Y|X=x}(x,y) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{\tfrac{4}{5}\left(3x^2+xy\right)}{\tfrac{4}{5}\left(3x^2+\tfrac{1}{2}x\right)} = \frac{3x^2+xy}{3x^2+\tfrac{1}{2}x} = \frac{3x+y}{3x+\tfrac{1}{2}}$ [1]

(iii) Covariance

Using the marginal density function of X:

$E(X) = \int_{x=0}^{1} \tfrac{4}{5}\left(3x^3+\tfrac{1}{2}x^2\right)dx = \tfrac{4}{5}\left[\tfrac{3}{4}x^4+\tfrac{1}{6}x^3\right]_{x=0}^{1} = \tfrac{11}{15}$ [1]

Obtaining the marginal density function of Y:

$f_Y(y) = \int_{x=0}^{1} \tfrac{4}{5}\left(3x^2+xy\right)dx = \tfrac{4}{5}\left[x^3+\tfrac{1}{2}x^2y\right]_{x=0}^{1} = \tfrac{4}{5}\left(1+\tfrac{1}{2}y\right)$

$\Rightarrow E(Y) = \int_{y=0}^{1} \tfrac{4}{5}\left(y+\tfrac{1}{2}y^2\right)dy = \tfrac{4}{5}\left[\tfrac{1}{2}y^2+\tfrac{1}{6}y^3\right]_{y=0}^{1} = \tfrac{8}{15}$ [1]

Now:

$E(XY) = \int_{x=0}^{1}\int_{y=0}^{1} \tfrac{4}{5}\left(3x^3y+x^2y^2\right)dy\,dx = \int_{x=0}^{1} \tfrac{4}{5}\left[\tfrac{3}{2}x^3y^2+\tfrac{1}{3}x^2y^3\right]_{y=0}^{1}dx$

$= \int_{x=0}^{1} \tfrac{4}{5}\left(\tfrac{3}{2}x^3+\tfrac{1}{3}x^2\right)dx = \tfrac{4}{5}\left[\tfrac{3}{8}x^4+\tfrac{1}{9}x^3\right]_{x=0}^{1} = \tfrac{7}{18}$ [2]

Hence:

$\text{cov}(X,Y) = E(XY) - E(X)E(Y) = \tfrac{7}{18} - \tfrac{11}{15}\times\tfrac{8}{15} = -\tfrac{1}{450}$ [1]

4.6 The covariance of X and Y was obtained in Section 2.4 to be cov(X, Y) = 0.02.  The variances of the marginal distributions are:

$\text{var}(X) = E(X^2) - [E(X)]^2 = 0^2\times 0.4 + 1^2\times 0.3 + 2^2\times 0.3 - 0.9^2 = 0.69$

and:

$\text{var}(Y) = E(Y^2) - [E(Y)]^2 = 1^2\times 0.2 + 2^2\times 0.4 + 3^2\times 0.4 - 2.2^2 = 0.56$

So the correlation coefficient is:

$\text{corr}(X,Y) = \frac{\text{cov}(X,Y)}{\sqrt{\text{var}(X)\text{var}(Y)}} = \frac{0.02}{\sqrt{0.69\times 0.56}} = 0.0322$

4.7 Let X be the amount of a home insurance claim and Y the amount of a car insurance claim.  Then:

X ~ N(800, 100²)  and  Y ~ N(1200, 300²)

We require:

$P\left[(Y_1+Y_2+Y_3) > (X_1+X_2+X_3+X_4) + 800\right] = P\left[(Y_1+Y_2+Y_3) - (X_1+X_2+X_3+X_4) > 800\right]$

So we need the distribution of $(Y_1+Y_2+Y_3) - (X_1+X_2+X_3+X_4)$:

$(Y_1+Y_2+Y_3) - (X_1+X_2+X_3+X_4) \sim N(3\times 1200 - 4\times 800,\ 3\times 300^2 + 4\times 100^2)$

ie $(Y_1+Y_2+Y_3) - (X_1+X_2+X_3+X_4) \sim N(400,\ 310{,}000)$

Therefore:

$P\left[(Y_1+Y_2+Y_3) - (X_1+X_2+X_3+X_4) > 800\right] = P\left(Z > \frac{800-400}{\sqrt{310{,}000}}\right) = P(Z > 0.718) = 1 - P(Z < 0.718) = 0.236$

4.8 (i) Mean

E(X) = 1 × 0.4 + 2 × 0.2 + 3 × 0.4 = 2 [1]

Alternatively, we could use the fact that the distribution of X is symmetrical about 2.

(ii) Probability distribution of Y | X = 1

Using $P(Y=y\,|\,X=1) = \frac{P(X=1, Y=y)}{P(X=1)}$ and P(X = 1) = 0.4 gives:

Y = 1 | X = 1: 0.5    Y = 2 | X = 1: 0    Y = 3 | X = 1: 0.5 [1]

(iii) Correlated?

To calculate the correlation coefficient, we first require the covariance.

E(X) = 2 from part (i)

E(Y) = 1 × 0.4 + 2 × 0.2 + 3 × 0.4 = 2

E(XY) = (1 × 0.2 + 2 × 0 + 3 × 0.2) + (2 × 0 + 4 × 0.2 + 6 × 0) + (3 × 0.2 + 6 × 0 + 9 × 0.2) = 4

So cov(X, Y) = E(XY) − E(X)E(Y) = 4 − 2 × 2 = 0. [1]

Hence $\text{corr}(X,Y) = \frac{\text{cov}(X,Y)}{\sqrt{\text{var}(X)\text{var}(Y)}} = 0$.

Therefore X and Y are uncorrelated. [1]

(iv) Independent?

X and Y are independent if P(X = x, Y = y) = P(X = x)P(Y = y) for all x and y.

However P(X = 1, Y = 1) = 0.2 ≠ 0.4 × 0.4 = P(X = 1)P(Y = 1).

So X and Y are not independent. [1]

4.9 Since the PDF must integrate to 1:

$\int_{y=1}^{\infty}\int_{x=1}^{\infty} kx^{-\alpha}e^{-y/\beta}\,dx\,dy = 1$

Integrating over the x values gives:

$\int_{x=1}^{\infty} kx^{-\alpha}e^{-y/\beta}\,dx = ke^{-y/\beta}\left[\frac{x^{-\alpha+1}}{-\alpha+1}\right]_1^{\infty} = \frac{ke^{-y/\beta}}{\alpha-1}$

Integrating this over the y values gives:

$\int_{y=1}^{\infty} \frac{ke^{-y/\beta}}{\alpha-1}\,dy = \frac{k}{\alpha-1}\left[-\beta e^{-y/\beta}\right]_1^{\infty} = \frac{k\beta e^{-1/\beta}}{\alpha-1}$

Equating this to 1:

$\frac{k\beta e^{-1/\beta}}{\alpha-1} = 1 \Rightarrow k = \frac{(\alpha-1)e^{1/\beta}}{\beta}$

4.10 The chi-square distribution is a continuous distribution that can take any positive value. The chi-square distribution with parameter m is the same as a gamma distribution with parameters m/2 and 1/2.

So, using the PDF of the gamma distribution, the PDF of the sum Z = X + Y is given by the convolution formula:

$f_Z(z) = \int f_X(x)f_Y(z-x)\,dx = \int_0^z \frac{(\tfrac{1}{2})^{\frac{m}{2}}}{\Gamma(\tfrac{m}{2})}x^{\frac{m}{2}-1}e^{-\frac{x}{2}}\,\frac{(\tfrac{1}{2})^{\frac{n}{2}}}{\Gamma(\tfrac{n}{2})}(z-x)^{\frac{n}{2}-1}e^{-\frac{z-x}{2}}\,dx$

$= \left(\tfrac{1}{2}\right)^{\frac{m+n}{2}}\frac{1}{\Gamma(\tfrac{m}{2})\Gamma(\tfrac{n}{2})}\,e^{-\frac{z}{2}}\int_0^z x^{\frac{m}{2}-1}(z-x)^{\frac{n}{2}-1}\,dx$

Using the substitution t = x/z gives:

$f_Z(z) = \left(\tfrac{1}{2}\right)^{\frac{m+n}{2}}\frac{1}{\Gamma(\tfrac{m}{2})\Gamma(\tfrac{n}{2})}\,e^{-\frac{z}{2}}\int_0^1 (zt)^{\frac{m}{2}-1}(z-zt)^{\frac{n}{2}-1}\,z\,dt$

$= \frac{(\tfrac{1}{2})^{\frac{m+n}{2}}}{\Gamma(\tfrac{m+n}{2})}\,z^{\frac{m+n}{2}-1}e^{-\frac{z}{2}}\;\frac{\Gamma(\tfrac{m}{2}+\tfrac{n}{2})}{\Gamma(\tfrac{m}{2})\Gamma(\tfrac{n}{2})}\int_0^1 t^{\frac{m}{2}-1}(1-t)^{\frac{n}{2}-1}\,dt$

Since the last integral represents the total probability for the Beta(½m, ½n) distribution, we get:

$f_Z(z) = \frac{(\tfrac{1}{2})^{\frac{m+n}{2}}}{\Gamma(\tfrac{m+n}{2})}\,z^{\frac{m+n}{2}-1}e^{-\frac{z}{2}} \times P[0 \le \text{Beta}(\tfrac{m}{2},\tfrac{n}{2}) \le 1] = \frac{(\tfrac{1}{2})^{\frac{m+n}{2}}}{\Gamma(\tfrac{m+n}{2})}\,z^{\frac{m+n}{2}-1}e^{-\frac{z}{2}}$

Since this matches the PDF of the $\chi^2_{m+n}$ distribution (and Z can take any positive value), Z is a $\chi^2_{m+n}$ random variable.

It is much easier to prove this result using MGFs.

4.11 (i) Covariance

We have:

cov(X, Z) = cov(X, X + Y) = cov(X, X) + cov(X, Y) = var(X) + cov(X, Y)

Using the correlation coefficient between X and Y gives:

$\text{corr}(X,Y) = -0.3 = \frac{\text{cov}(X,Y)}{\sqrt{\text{var}(X)\text{var}(Y)}} = \frac{\text{cov}(X,Y)}{\sqrt{4\times 1}} \Rightarrow \text{cov}(X,Y) = -0.6$

Hence:

cov(X, Z) = 4 − 0.6 = 3.4 [2]

(ii) Variance

Using var(Z) = cov(Z, Z):

var(Z) = cov(X + Y, X + Y) = cov(X, X) + 2cov(X, Y) + cov(Y, Y)

= var(X) + 2cov(X, Y) + var(Y) = 4 + 2 × (−0.6) + 1 = 3.8 [2]

Note: var(Z) ≠ var(X) + var(Y) as X and Y are not independent.

4.12 The +3 term does not affect the variance, so:

var(Z) = var(X − 2Y + 3) = var(X − 2Y)

Now:

var(X + Y) = var(X) + var(Y) + 2cov(X, Y)

and:

cov(aX, bY) = ab cov(X, Y)

So:

var(X − 2Y) = var(X) + 4var(Y) − 2 × 2cov(X, Y) [1]

= 5 + 4 × 10 − 4 × (−12) = 93 [1]

4.13 The moment generating function of X is:

$M_X(t) = \left(\frac{pe^t}{1-qe^t}\right)^k$

Similarly, the MGF of Y is:

$M_Y(t) = \left(\frac{pe^t}{1-qe^t}\right)^m$

Since X and Y are independent, we have:

$M_{X+Y}(t) = M_X(t)M_Y(t) = \left(\frac{pe^t}{1-qe^t}\right)^k\left(\frac{pe^t}{1-qe^t}\right)^m = \left(\frac{pe^t}{1-qe^t}\right)^{k+m}$

This is the MGF of another negative binomial distribution with parameters p and k + m.  Hence, by uniqueness of MGFs, X + Y has this distribution.


4.14 Let X be the claim size on car policies, so that X ~ N(1800, 300²).

Let Y be the claim size on home policies, so that Y ~ N(1200, 500²).

We want:

P(X > 2Y) = P(X − 2Y > 0) [1]

So we need the distribution of X − 2Y:

X − 2Y ~ N(1800 − 2 × 1200, 300² + 4 × 500²)

ie X − 2Y ~ N(−600, 1,090,000) [2]

Standardising:

$z = \frac{0-(-600)}{\sqrt{1{,}090{,}000}} = 0.575$

So:

P(X − 2Y > 0) = P(Z > 0.575) = 1 − P(Z < 0.575) = 1 − 0.71735 = 0.283 [1]

4.15 (i) Conditional variance

var(X | Y = 2) = E(X² | Y = 2) − E²(X | Y = 2)

$E(X|Y=2) = \sum_x xP(X=x\,|\,Y=2) = \sum_x x\,\frac{P(X=x \cap Y=2)}{P(Y=2)} = 1\times\tfrac{0}{0.6} + 2\times\tfrac{0.3}{0.6} + 3\times\tfrac{0.1}{0.6} + 4\times\tfrac{0.2}{0.6} = 2\tfrac{5}{6}$ [1]

$E(X^2|Y=2) = \sum_x x^2P(X=x\,|\,Y=2) = 1^2\times\tfrac{0}{0.6} + 2^2\times\tfrac{0.3}{0.6} + 3^2\times\tfrac{0.1}{0.6} + 4^2\times\tfrac{0.2}{0.6} = 8\tfrac{5}{6}$ [1]

So $\text{var}(X|Y=2) = 8\tfrac{5}{6} - \left(2\tfrac{5}{6}\right)^2 = \tfrac{29}{36} = 0.80556$. [1]

(ii) Conditional expectation

We require:

$E(U|V=v) = \int_u u\,f(u|v)\,du$

Now:

$f(v) = \int_{u=0}^{1} \tfrac{48}{67}\left(2uv+u^2\right)du = \tfrac{48}{67}\left[u^2v+\tfrac{1}{3}u^3\right]_{u=0}^{1} = \tfrac{48}{67}\left(v+\tfrac{1}{3}\right)$ [1]

$\Rightarrow f(u|v) = \frac{f(u,v)}{f(v)} = \frac{\tfrac{48}{67}\left(2uv+u^2\right)}{\tfrac{48}{67}\left(v+\tfrac{1}{3}\right)} = \frac{2uv+u^2}{v+\tfrac{1}{3}}$ [1]

So:

$E(U|V=v) = \int_{u=0}^{1} \frac{2u^2v+u^3}{v+\tfrac{1}{3}}\,du = \frac{\left[\tfrac{2}{3}u^3v+\tfrac{1}{4}u^4\right]_{u=0}^{1}}{v+\tfrac{1}{3}} = \frac{\tfrac{2}{3}v+\tfrac{1}{4}}{v+\tfrac{1}{3}}$ [1]

© IFE: 2020 Examinations The Actuarial Education Company


CS1‐05: Conditional expectation  Page 5 

Hence:

$E[Y|X=x] = \int_{y=0}^{2} y\,\frac{3x(x+y)/5}{6x(x+1)/5}\,dy = \int_{y=0}^{2} \frac{y(x+y)}{2(x+1)}\,dy$

$= \frac{1}{2(x+1)}\int_{y=0}^{2}\left(xy+y^2\right)dy = \frac{1}{2(x+1)}\left[\tfrac{1}{2}xy^2+\tfrac{1}{3}y^3\right]_{y=0}^{2}$

$= \frac{2x+\tfrac{8}{3}}{2(x+1)} = \frac{x+\tfrac{4}{3}}{x+1} = \frac{3x+4}{3(x+1)}$

 
We can also calculate conditional expectations in the case where the limits for one variable 
depend on the other. 

Question

Let X and Y have joint density function given by:

$f_{X,Y}(x,y) = \tfrac{1}{6}\left(x^2+xy\right)$,  0 ≤ y ≤ x ≤ 2

Determine the conditional expectation E[Y | X = x].

Solution

We saw in the previous chapter in Section 1.4 that in this case the conditional distribution Y | X = x has PDF:

$f_{Y|X}(x,y) = \frac{2}{3}\left(\frac{1}{x}+\frac{y}{x^2}\right)$,  0 ≤ y ≤ x

So, integrating over the values of y for which the conditional PDF is defined (0 to x), the conditional expectation is given by:

$E(Y|X=x) = \int_0^x y\,\frac{2}{3}\left(\frac{1}{x}+\frac{y}{x^2}\right)dy = \int_0^x \left(\frac{2y}{3x}+\frac{2y^2}{3x^2}\right)dy = \left[\frac{y^2}{3x}+\frac{2y^3}{9x^2}\right]_0^x = \frac{x}{3}+\frac{2x}{9} = \frac{5x}{9}$
 


2 The random variable E[Y | X]

The conditional expectation E[Y | X = x] = g(x), say, is, in general, a function of x. It can be thought of as the observed value of a random variable g(X). The random variable g(X) is denoted E[Y | X].

We saw in a previous question that $E[Y|X=x] = \frac{3x+4}{3(x+1)}$, which is a function of x.  So E[Y | X] in this example is the random variable $\frac{3X+4}{3(X+1)}$.

Observe that, although E[Y | X] is an expectation, it does not have a single numerical value, as it is a function of the random variable X.

Note: E[Y | X] is also referred to as the regression of Y on X.

In a later chapter the regression line will be defined as E[Y | x] = α + βx.

E[Y | X], like any other function of X, has its own distribution, whose properties depend on those of the distribution of X itself. Of particular importance is the expected value (the mean) of the distribution of E[Y | X]. The usefulness of considering this expected value, E[E[Y | X]], comes from the following result, proved here in the case of continuous variables, but true in general.

Theorem: E[E[Y | X]] = E[Y]

Proof:

E[E[Y | X]] = ∫ E[Y | x] f_X(x) dx = ∫ [∫ y f(y | x) dy] f_X(x) dx = ∫∫ y f(x, y) dx dy = E[Y]

We are integrating here over all possible values of x and y.

Here f(y | x) represents the density function of the conditional distribution of Y | X = x. This was written as f_Y|X(x, y) in Chapter 4.

The last two steps follow by noting that f(y | x) = f(x, y) / f_X(x) and ∫ f(x, y) dx = f_Y(y), ie the marginal PDF of Y.

This formula is given on page 16 of the Tables.
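This result can also be illustrated by simulation. The sketch below assumes a purely illustrative model in which X ~ Exp(0.5) and Y | X = x ~ N(2x, 1), so that E[E[Y | X]] = E[2X] = 2E[X] = 4:

set.seed(1)
x <- rexp(100000, rate = 0.5)    # E[X] = 2
y <- rnorm(100000, mean = 2*x)   # E[Y | X] = 2X
mean(y)                          # close to 4, ie E[Y] = E[E[Y | X]]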

CS1‐06: The Central Limit Theorem  Page 15 

Another method of comparing the distribution of our sample means, x̄, with the normal distribution is to examine the quantiles.

In R we can find the quantiles of x̄ using the quantile function. Using the default setting (type 7) to obtain the sample lower quartile, median and upper quartile gives 4.775, 5.000 and 5.250, respectively. However, in Subject CS1 we prefer to use type 5 or type 6.

In R, we can find the quartiles of the normal distribution using the qnorm function. This gives a lower quartile, median and upper quartile of 4.762, 5.000 and 5.238, respectively.

The quantiles obtained here are those of a normal distribution with mean 5 and variance 5/40. 

There is no universal agreement amongst statisticians over how to define sample quantiles. The lower quartile, for example, is sometimes defined to be the (n + 1)/4 th sample value, where n is the sample size. Others may use the (n + 2)/4 th sample value, or even the (n + 3)/4 th value.

In R, if we do not specify otherwise, R will use (n + 3)/4 and (3n + 1)/4 for the positions of the lower and upper quartiles. Other definitions can be used by specifying them in the R code. In fact, when we use R, we are often using quite large sample sizes, in which case the differences between the different definitions will be minimal.
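For example, assuming the simulated sample means are stored in a vector called xbar (an illustrative name), the quantiles quoted above could be obtained as follows:

quantile(xbar, c(0.25, 0.5, 0.75))                     # default, type 7
quantile(xbar, c(0.25, 0.5, 0.75), type = 5)           # preferred in Subject CS1
qnorm(c(0.25, 0.5, 0.75), mean = 5, sd = sqrt(5/40))   # 4.762, 5.000, 5.238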

Page 16   CS1‐06: The Central Limit Theorem 

We observe that our distribution of the sample means is slightly more spread out in the tails
– which is what we observed in the previous diagram.

A quick way to compare all the quantiles in one go is by drawing a QQ-plot using the R
function qqnorm.

If our sample quantiles coincide with the quantiles of the normal distribution we would observe a perfect diagonal line (which we have added to the diagram for clarity). For our example we can see that x̄ and the normal distribution are very similar except in the tails, where we see that x̄ has a lighter lower tail and a heavier upper tail than the normal distribution.

The QQ plot gives us sample quantiles which are very close to the diagonal line. 

Care needs to be taken when interpreting a QQ plot.  In this example, we see that, at the top end 
of the distribution, the sample quantiles are slightly larger than we would expect them to be.  This 
suggests that our sample has slightly more weight in the upper tail than the corresponding normal 
distribution. 

At the lower end, again the sample quantiles are slightly larger than we would expect.  This 
suggests that our sample has slightly less weight in the lower tail than the corresponding normal 
distribution.  This might be the case if the sample distribution was (very slightly) positively 
skewed. 

If we use R to calculate the coefficient of skewness for this sample, we obtain a figure of 0.0731.  
This confirms the very slight positive sample skewness. 
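One way of computing this in R (using a common definition of the sample coefficient of skewness, and again assuming the sample means are stored in a vector xbar) is:

mean((xbar - mean(xbar))^3) / sd(xbar)^3   # approximately 0.07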
 
   

Page 16 CS1-07: Sampling and statistical inference

5 The F result for variance ratios

The F distribution is defined by F = (U/ν₁) / (V/ν₂), where U and V are independent χ² random variables with ν₁ and ν₂ degrees of freedom respectively. Thus if independent random samples of size n₁ and n₂ respectively are taken from normal populations with variances σ₁² and σ₂², then:

(S₁²/σ₁²) / (S₂²/σ₂²) ~ Fₙ₁₋₁,ₙ₂₋₁

The F distribution gives us the distribution of the variance ratio for two normal populations. ν₁ and ν₂ can be referred to as the number of degrees of freedom in the numerator and denominator respectively.

It should be noted that it is arbitrary which one is the numerator and which is the denominator, and so:

(S₂²/σ₂²) / (S₁²/σ₁²) ~ Fₙ₂₋₁,ₙ₁₋₁

Since it is arbitrary which value is the numerator and which is the denominator, and since only the upper critical points are tabulated, it is usually easier to put the larger value of the sample variance into the numerator and the smaller sample variance into the denominator.

Alternatively, if F ~ Fₙ₁₋₁,ₙ₂₋₁ then 1/F ~ Fₙ₂₋₁,ₙ₁₋₁.

This reciprocal form is needed when using tables of critical points, as only upper tail points
are tabulated. See ‘Formulae and Tables’.

This is an important result and will be used in Chapter 9 in the work on confidence intervals and
Chapter 10 in the work on hypothesis tests.

The percentage points for the F distribution can be found on pages 170-174 of the Tables.

© IFE: 2020 Examinations The Actuarial Education Company


CS1-07: Sampling and statistical inference Page 17

Question

Determine:

(i) P(F₉,₁₀ > 3.779)

(ii) P(F₁₂,₁₄ < 3.8)

(iii) P(F₁₁,₈ < 0.3392)

(iv) the value of p such that P(F₁₄,₆ < p) = 0.01.

Solution

By referring to the Tables on pages 170 to 174:

(i) 3.779 is greater than 1, so we simply use the upper critical values given:

P(F₉,₁₀ > 3.779) = 0.025

since 3.779 is the 2½% point of the F₉,₁₀ distribution (page 173).

(ii) Since 3.8 is greater than 1, it is again an upper value and so we use the Tables directly. We simply turn the probability around:

P(F₁₂,₁₄ < 3.8) = 1 − P(F₁₂,₁₄ > 3.8) = 1 − 0.01 = 0.99

(iii) Since this is a lower critical point we need to use the 1/Fₘ,ₙ result:

P(F₁₁,₈ < 0.3392) = P(1/F₁₁,₈ > 1/0.3392) = P(F₈,₁₁ > 2.948) = 0.05

(iv) Since only 1% of the distribution is below p, this implies that it must be a lower critical point and so we use the 1/Fₘ,ₙ result again:

P(F₁₄,₆ < p) = P(F₆,₁₄ > 1/p) = 0.01 ⇒ 1/p = 4.456 ⇒ p = 0.2244

The mean of the F distribution is close to 1, regardless of the number of degrees of freedom (strictly, it is ν₂/(ν₂ − 2) for ν₂ > 2). So values such as 0.3392 and 0.2244 given above are values in the lower tail, whereas 3.779 and 3.8 are upper tail values.
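These values can be checked directly in R using the pf and qf functions:

pf(3.779, 9, 10, lower.tail = FALSE)   # (i)   0.025
pf(3.8, 12, 14)                        # (ii)  0.99
pf(0.3392, 11, 8)                      # (iii) 0.05
qf(0.01, 14, 6)                        # (iv)  0.2244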

We now apply the F result to problems involving sample variances.



CS1‐08: Point estimation  Page 37 

We can use the following R code to obtain a single resample with replacement from this
original sample.

sample.data <- c(0.61, 6.47, 2.56, 5.44, 2.72, 0.87, 2.77, 6.00,
0.14, 0.75)

sample(sample.data, replace=TRUE)

If we do this, R automatically gives us a sample of the same size as the original data sample, ie we 
obtain a sample of size 10 in this case. 

Note that this is non-parametric as we are ignoring the Exp( ) assumption to obtain a new
sample.

The following R code obtains B = 1,000 estimates (λ̂₁*, λ̂₂*, …, λ̂₁₀₀₀*), using λ̂ⱼ* = 1/ȳⱼ*, and stores them in the vector estimate:

set.seed(47)

estimate<-rep(0,1000)

for (i in 1:1000)

{x<-sample(sample.data, replace=TRUE);

estimate[i]<-1/mean(x)}

An alternative would be to use:

set.seed(47)

estimate <- replicate(1000, 1/mean(sample(sample.data,


replace=TRUE)))

Page 38   CS1‐08: Point estimation 

This gives us the following empirical sampling distribution of λ̂:

[Histogram of the 1,000 bootstrap estimates of λ]

We can obtain estimates for the mean, standard error and 95% confidence interval of the
estimator ̂ using the following R code:

mean(estimate)

sd(estimate)

quantile(estimate, c(0.025,0.975))

7.3 Parametric bootstrap

If we are prepared to assume that the sample is considered to come from a given distribution, we first obtain an estimate of the parameter of interest, θ̂ (eg using maximum likelihood, or the method of moments). Then we use the assumed distribution, with parameter equal to θ̂, to draw the bootstrap samples. Once the bootstrap samples are available, we proceed as with the non-parametric method before.

Example

Using our sample of 10 values (to 2 DP) from an Exp(λ) distribution with unknown parameter λ:

0.61 6.47 2.56 5.44 2.72 0.87 2.77 6.00 0.14 0.75

our estimate for λ would be λ̂ = 1/ȳ = 1/2.833 = 0.3530. We now use the Exp(0.3530) distribution to generate the bootstrap samples.
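For example, the parametric bootstrap could be carried out along the following lines (a sketch, reusing sample.data from above and drawing the bootstrap samples with rexp):

set.seed(47)
lambda.hat <- 1/mean(sample.data)    # 0.3530
estimate <- replicate(1000, 1/mean(rexp(10, rate = lambda.hat)))
mean(estimate)
sd(estimate)
quantile(estimate, c(0.025, 0.975))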

CS1‐08: Point estimation  Page 57 

8.10 (i) Range of values

Since 0 ≤ P(X = x) ≤ 1, using this for each of the probabilities gives lower bounds for θ of −1/16, −1/6 and −3/8. Hence, θ ≥ −1/16. We also obtain upper bounds for θ of 7/16, 1/6 and 5/8. Hence, θ ≤ 1/6.

(ii) Method of moments estimator

We have one unknown, so we will use E(X) = x̄:

E(X) = 2(1/8 + 2θ) + 4(1/2 − 3θ) + 5(3/8 + θ) = 33/8 − 3θ

From the data, we have:

x̄ = (7 × 2 + 6 × 4 + 17 × 5)/30 = 123/30 = 4.1

Therefore:

33/8 − 3θ̂ = 4.1 ⇒ θ̂ = 0.0083

This value lies between the limits derived in part (i).

(iii) Maximum likelihood

The likelihood of obtaining the observed results is:

L(θ) = constant × (1/8 + 2θ)⁷ × (1/2 − 3θ)⁶ × (3/8 + θ)¹⁷

Taking logs and differentiating gives:

ln L(θ) = constant + 7 ln(1/8 + 2θ) + 6 ln(1/2 − 3θ) + 17 ln(3/8 + θ)

d ln L(θ)/dθ = 14/(1/8 + 2θ) − 18/(1/2 − 3θ) + 17/(3/8 + θ)

Equating this to zero to find the maximum value of θ gives:

14/(1/8 + 2θ̂) − 18/(1/2 − 3θ̂) + 17/(3/8 + θ̂) = 0

⇒ 14(1/2 − 3θ̂)(3/8 + θ̂) − 18(1/8 + 2θ̂)(3/8 + θ̂) + 17(1/8 + 2θ̂)(1/2 − 3θ̂) = 0

⇒ 14(3/16 − 5θ̂/8 − 3θ̂²) − 18(3/64 + 7θ̂/8 + 2θ̂²) + 17(1/16 + 5θ̂/8 − 6θ̂²) = 0

⇒ 180θ̂² + (111/8)θ̂ − 91/32 = 0

Page 58   CS1‐08: Point estimation 

(iv) MLE

Solving the quadratic equation gives:

θ̂ = [−111/8 ± √((111/8)² + 4 × 180 × 91/32)] / (2 × 180) = −0.170 or 0.0929

The maximum likelihood estimate is 0.0929.

The other solution, −0.170, does not lie between the bounds calculated in (i). It is not feasible as it is less than the smallest possible value for θ of −1/16 = −0.0625.
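The roots of this quadratic can be checked in R:

(-111/8 + c(-1, 1) * sqrt((111/8)^2 + 4 * 180 * 91/32)) / (2 * 180)
# -0.170  0.0929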

8.11 (i) Expected results using method of moments (Poisson)

The sample mean is:

x̄ = (87,889 × 0 + 11,000 × 1 + 1,000 × 2 + …) / 100,000 = 0.13345

The mean of the Poi(μ) distribution is μ.

So the method of moments estimate of μ is 0.13345. [1]

For the Poisson distribution, probabilities can be calculated iteratively using the relationship:

P(X = x) = (μ/x) × P(X = x − 1),  x = 1, 2, 3, …

The expected numbers, based on this estimate, are:

x = 0:  100,000 × e^(−0.13345) = 87,507

x = 1:  0.13345 × 87,507 = 11,678

x = 2:  (0.13345/2) × 11,678 = 779

x = 3:  (0.13345/3) × 779 = 35

x = 4:  (0.13345/4) × 35 = 1

x = 5:  (0.13345/5) × 1 = 0

x ≥ 6:  100,000 − 87,507 − 11,678 − 779 − 35 − 1 − 0 = 0 [2]

   

Page 54   CS1‐10: Hypothesis testing 

Question 

A certain company employs both graduates and non‐graduates.  A small sample of employees are 
entered for a certain test, with the following results.  Of the four graduates taking the test, all 
passed.  Of the eight non‐graduates taking the test, five passed.  Using Fisher’s exact test, assess 
whether graduates are more likely to pass the test than non‐graduates. 

Solution 

9
Given that we had nine passes, the number of ways of choosing four graduates to pass is    .  
4
 3
Given that we had three fails, the number of ways of choosing no graduates to fail is    .  The 
0
 12 
total number of ways of choosing four graduates out of 12 employees is    . 
4

So the probability of obtaining four graduate passes is: 

9  3
  
   4   0   14  0.2545  
 12  55
 
4

Since we cannot obtain more than four graduate passes when we only have four graduates, this is 
the most extreme result possible, and the total probability of obtaining as extreme a result as this 
is 0.2545.  Since this is not less than 5%, we have insufficient evidence to conclude that graduates 
are more likely to pass than non‐graduates. 

We can see that in this case it will never be possible to obtain a significant result, based on the 
small sample numbers we have here.  Fisher’s exact test needs much bigger samples for it to be 
usable to obtain satisfactory statistical results. 
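For reference, this probability can also be obtained in R, either from first principles with choose or directly with fisher.test (the matrix below puts graduates in the first row, with columns for passes and fails):

choose(9, 4) * choose(3, 0) / choose(12, 4)            # 0.2545
tab <- matrix(c(4, 0, 5, 3), nrow = 2, byrow = TRUE)
fisher.test(tab, alternative = "greater")$p.value      # 0.2545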

 
 

All study material produced by ActEd is copyright and is sold for the exclusive use of the purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of the Institute and Faculty of Actuaries.

Unless prior authority is granted by ActEd, you may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of the study material.

You must take care of your study material to ensure that it is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In addition, we may seek to take disciplinary action through the profession or through your employer.

These conditions remain in force after you have finished using the course.


CS1-13: Generalised linear models Page 45

Suppose we are modelling the number of claims on a motor insurance portfolio and we have data on the driver's age, sex and vehicle group. We would start with the null model (ie a single constant equal to the sample mean), then we would try each of the single-covariate models (linear function of age, or the factors sex or vehicle group) to see which produces the most significant improvement in a χ² test or reduces the AIC the most. Suppose this was sex. Then we would try adding a second covariate (linear function of age or the factor vehicle group). Suppose this was age. Then we would try adding the third covariate (vehicle group). Then maybe we would try a quadratic function of the variable age (and maybe higher powers), or each of the 2-term interactions (eg sex*age or sex*group or age*group). Finally we would try the 3-term interaction (ie sex*age*group).

(2) Backward selection. Start by adding all available covariates and interactions. Then
remove covariates one by one starting with the least significant until the AIC reaches a
minimum or there is no significant improvement in the deviance, and all the remaining
covariates have a statistically significant impact on the response.

So with the last example we would start with the 3-term interaction sex*age*group and look at which parameter has the largest p-value (in a test of it being zero) and remove that. Removing it should make no significant difference to the deviance in a χ² test, and the AIC should fall. Then we remove the parameter with the next largest p-value, and so on.
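R can also automate this kind of AIC-based search using the built-in step function. A minimal sketch (applied to the full model fitted in the example that follows):

full.model <- glm(vs ~ wt * disp, data = mtcars, family = binomial)
step(full.model, direction = "backward")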

The Core Reading uses R to demonstrate this procedure. Whilst this will be covered in the CS1
PBOR, it’s important to understand the process here.

Example

We demonstrate both of these methods in R using a binomial model on the mtcars dataset from R's built-in datasets package to determine whether a car has a V engine or an S engine (vs) using weight in 1000 lbs (wt) and engine displacement in cubic inches (disp) as covariates.

Forward selection

Starting with the null model:

model0 <- glm(vs ~ 1, data=mtcars, family=binomial)

The AIC of this model (which would be displayed using summary(model0)) is 45.86.

We have to choose whether we add disp or wt first. We try each and see which has the
greatest improvement in the deviance.

model1 <- update(model0, ~.+ disp)

anova(model0, model1, test="Chi")

model2 <- update(model0, ~.+ wt)



Page 46 CS1-13: Generalised linear models

anova(model0, model2, test="Chi")

From these two anova outputs, we can see that disp has produced the more significant result – so we add that covariate first.

Note that R always calls the models we are comparing ‘Model 1’ and ‘Model 2’, irrespective of
how we have named them. This can lead to confusion if we are not careful.

The AIC of model 1 (adding disp) is 26.7 whereas the AIC of model 2 (adding wt) is 35.37.
Therefore adding disp reduces the AIC more from model 0’s value of 45.86.

Let us now see if adding wt to disp produces a significant improvement:

model3 <- update(model1, ~.+ wt)

anova(model1, model3, test="Chi")

This has not led to a significant improvement in the deviance so we would not add wt (and
therefore we definitely would not add an interaction term between disp and wt).

The AIC of model 3 (adding wt) is 27.4 which is worse than model 1’s AIC of 26.7. Therefore we
would not add it.

Incidentally the AIC for models 0, 1, 2, 3 are 45.86, 26.7, 35.37 and 27.4. So using these
would have given the same results (as Model 1 produces a smaller AIC than Model 2, and
then Model 3 increases the AIC and so we would not have selected it).

Backward selection

Starting with all the possibilities:

modelA <- glm(vs ~ wt * disp, data=mtcars, family=binomial)

Looking at the summary output for this model, none of these covariates is significant – so we remove the interaction term wt:disp, which is the least significant.



CS1-13: Generalised linear models Page 47

The parameter of the interaction term has the highest p-value of 0.829 and so is most likely to be
zero.

modelB <- update(modelA, ~.-wt:disp)

The AIC has fallen from 29.361 to 27.4.

Alternatively, carrying out a χ² test using anova(modelA, modelB, test="Chi") would


show that there is no significant difference between the models (p-value of 0.8417) and therefore
we are correct to remove the interaction term between wt and disp.

The wt term is not significant, so we remove that next:

modelC <- update(modelB, ~.-wt)

Looking at the summary output of this model, both of the remaining coefficients are significant and the AIC has fallen from 27.4 to 26.696.

Alternatively, carrying out a χ² test using anova(modelB, modelC, test="Chi") would


show that there is no significant difference between the models (p-value of 0.255) and therefore
we are correct to remove the wt covariate.

We would stop at this model. Had we removed the disp term as well (to give the null model), the AIC would have increased to 45.86.

Alternatively, carrying out a χ² test between these two models would show a very significant
difference (p-value of less than 0.001) and therefore we should not remove the disp covariate.

We can see that both forward and backward selection lead to the same model being chosen.



Page 48 CS1-13: Generalised linear models

5.7 Estimating the response variable

Once we have obtained our model and its estimates, we are then able to calculate the value of the linear predictor, η, and by using the inverse of the link function we can calculate our estimate of the response variable, μ̂ = g⁻¹(η̂).

Substituting the estimated parameters into the linear predictor gives the estimated value of the linear predictor for different individuals. Now the link function links the linear predictor to the mean of the distribution. Hence we can obtain an estimate for the mean of the distribution of Y for that individual.

Let’s now return to the Core Reading example on page 45.

Suppose, we wish to estimate the probability of having a V engine for a car with weight
2100 lbs and displacement 180 cubic inches.

Using our linear predictor β₀ + β₁ × disp (ie vs ~ disp) we obtained estimates of β̂₀ = 4.137827 and β̂₁ = −0.021600.

These coefficients are displayed as part of the summary output of modelC in the example above.

Hence, for displacement 180 we have η̂ = 4.137827 − 0.021600 × 180 = 0.24983. We did not specify the link function, so we shall use the canonical binomial link function, which is the logit function:

0.24983 = log(μ̂ / (1 − μ̂)) ⇒ μ̂ = e^0.24983 / (1 + e^0.24983) = 0.562

Recall that the mean for a binomial model is the probability. So the probability of having a V
engine for a car with weight 2,100 lbs and displacement 180 cubic inches is 56.2%.

Because we removed the weight covariate, the figure 2,100 does not enter the calculation.

In R we can obtain this as follows:

newdata <- data.frame(disp=180)

predict(modelC, newdata, type="response")
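Equivalently, the fitted probability can be calculated directly from the estimated linear predictor using plogis, R's logistic distribution function:

plogis(4.137827 - 0.021600 * 180)   # 0.562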



CS1: Assignment X1 Questions  Page 1  

X1.1 An actuarial student has said that the following three distributions are the same:

(i) the chi-square distribution with 2 degrees of freedom

(ii) the exponential distribution with mean ½

(iii) the gamma distribution with α = 1 and λ = ½.

State with reasons whether the student is correct. [2]

X1.2 The number of telephone calls per hour on a working day received at an insurance office follows a 
Poisson distribution with mean 2.5. 

(i)  Calculate the probability that more than 7 telephone calls are received on a working day 
between 9am and 11am.  [1] 

(ii)  Calculate the probability that, if the office opens at 8am, there are no telephone calls 
received until after 9am.  [2] 
        [Total 3] 

X1.3 Outline the three key forms of data analysis.  [3] 

X1.4 A random variable X has probability density function:

f(x) = (2 × 5²) / (5 + x)³,  x > 0

(i) Determine an expression for the distribution function, F(x). [1]

(ii) Calculate two simulated observations from the distribution using the random numbers 0.656 and 0.285 selected from the U(0,1) distribution. [3]
[Total 4]

X1.5 The random variable X has a beta distribution with parameters α = 1 and β = 4.

(i) State the value of E(X). [1]

(ii) Determine the median of X. [3]

(iii) Hence comment on the shape of this distribution. [1]
[Total 5]

   

Page 2   CS1: Assignment X1 Questions 

X1.6 A large life office has 1,000 policyholders, each of whom has a probability of 0.01 of dying during the next year (independently of all other policyholders).

(i) Derive a recursive relationship for the binomial distribution of the form:

P(X = x) = k g(x) P(X = x − 1)

where k is a constant and g(x) is a function of x. [2]

(ii) Calculate the probabilities of the following events:

(a) there will be no deaths during the year

(b) there will be more than two deaths during the year

(c) there will be exactly twenty deaths during the year. [3]
[Total 5]

X1.7 On a portfolio of insurance policies, the claim size, Y, is assumed to depend on the age of the policyholder, X. Suppose that the conditional mean and variance of Y are:

E(Y | X = x) = 2x + 400

var(Y | X = x) = x²/2

The distribution of X over the portfolio is assumed to be normal with mean 50 and standard deviation 14.

Calculate the unconditional mean and standard deviation of Y. [5]

X1.8 (i) For a pair of jointly distributed random variables X and Y, derive the result:

var(X + Y) = var(X) + var(Y) + 2 cov(X, Y) [2]

The random variables X and Y are jointly distributed with standard deviations of 5 and 7 respectively and corr(X, Y) = −3/7.

(ii) Calculate the standard deviation of 3X − 2Y + 5. [3]
[Total 5]

   

CS1: Assignment X1 Solutions Page 1

Assignment X1 Solutions

Markers: This document sets out one approach to solving each of the questions (sometimes with
alternatives). Please give credit for any other valid approaches.

Solution X1.1

The gamma, exponential and χ² distributions are covered in Chapter 2.

By definition, the χ₂² distribution is the same as the Gamma(1, ½) distribution. [1]

The exponential distribution with mean ½ has parameter λ = 2. This is a Gamma(1, 2) distribution, and so is not equivalent to the other two. [1]

Therefore the student is wrong. [Total 2]

Be careful to distinguish between the parameter λ and the mean 1/λ for the exponential distribution.

Solution X1.2

The Poisson process is covered in Chapter 2.

(i) Probability of more than 7 calls

The number of telephone calls, N, in a two-hour period also follows a Poisson distribution:

N ~ Poisson(2 × 2.5) = Poisson(5) [½]

Using the Poisson tables:

P(N > 7) = 1 − P(N ≤ 7) = 1 − 0.86663 = 0.13337 [½]
[Total 1]

(ii) Probability of no calls until after 9am

The waiting time, T, in hours for the first telephone call has an exponential distribution:

T ~ Exp(2.5) [1]

Using the cumulative distribution function of the exponential distribution:

P(T > 1) = 1 − P(T ≤ 1) = 1 − F(1) = 1 − (1 − e^(−2.5×1)) = e^(−2.5) = 0.08208 [1]
[Total 2]
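These probabilities can be verified in R:

1 - ppois(7, 5)     # part (i):  0.13337
1 - pexp(1, 2.5)    # part (ii): 0.08208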



Page 2 CS1: Assignment X1 Solutions

Solution X1.3

This material appears in Section 1 of Chapter 1.

The three key forms of data analysis are:

• descriptive analysis: producing summary statistics (eg measures of central tendency and dispersion) and presenting the data in a simpler format [1]

• inferential analysis: using a data sample to estimate summary parameters for the wider population from which the sample was taken, and testing hypotheses [1]

• predictive analysis: extends the principles of inferential analysis to analyse past data and make predictions about future events. [1]
[Total 3]
