CS1 CMP Upgrade 2020
Subject CS1
CMP Upgrade 2019/20
CMP Upgrade
This CMP Upgrade lists the changes to the Syllabus objectives, Core Reading and the ActEd
material since last year that might realistically affect your chance of success in the exam. It is
produced so that you can manually amend your 2019 CMP to make it suitable for study for the
2020 exams. It includes replacement pages and additional pages where appropriate.
Alternatively, you can buy a full set of up-to-date Course Notes / CMP at a significantly reduced
price if you have previously bought the full-price Course Notes / CMP in this subject. Please
see our 2020 Student Brochure for more details.
This upgrade also lists additional changes to the 2019 ActEd Course Notes and Series X Assignments that will
make them suitable for study for the 2020 exams.
In order to make the process of updating your course as painless as possible, we have
wherever possible provided replacement pages with this upgrade note.
However, all the chapter numbers have changed this year. This means that the process of
replacing pages is slightly more complicated. We have tried to spell out in detail exactly which
old pages need to be removed, and what they should be replaced with. Please take care.
In addition, because the courses are printed double sided, we have provided replacements for
both sides of any page on which there is an update that cannot quickly be written in by hand.
A new section of material has been added to the CS1 Syllabus. This is Section 2.1 – Data
analysis. Syllabus items 2.1.1 to 2.1.4 are new (they have been transferred from Subject CM1).
This has knock-on effects for some other syllabus numbering; Section 2.1 has become Section
2.2 and Section 2.2 has become Section 2.3.
Replace old pages 19-24 from the 2019 Study Guide with new pages 20-25 from the 2020
Study Guide.
There is an extra section of Core Reading to accompany Syllabus item 2.1. This is incorporated
into the new Chapter 1 of the course notes (see below).
Chapter numbers and page numbers below refer to the ActEd notes, not to the original Core
Reading document.
Chapter 7, Page 39. Two typos have been corrected in the R-box. In the second line, “stores”
should be “store”. In the last line in the box, there should be an extra closing bracket right at
the end of the line, ie there should be four closing brackets in all.
Chapter 10, Page 25. The Core Reading in the second R box has been deleted.
Chapter 11, Page 16. The last equation on the page should say $SS_{RES}/\sigma^2 \sim \chi^2_{n-2}$.
Chapter 12, Page 18. In the box at the foot of the page, the Core Reading should say “In R, to
use a gamma distribution…”.
Chapter 12, Page 42. A new paragraph has been added just before the second R-box. Replace
old Chapter 12 pages 41 and 42 with new Chapter 13 pages 41 and 42.
Chapter 12, Page 44. The definition of the AIC has been amended.
Chapter 13, Pages 17-18. The Core Reading has been altered. Replace old Chapter 13 pages
17-18 with new Chapter 14 pages 17-18.
Chapter 14, Page 19. A new paragraph of Core Reading has been added. Replace old Chapter
14 pages 19-22 with new Chapter 15 pages 19-23.
Chapter 15, Page 31. In the 4-line equation, there are two $\bar{x}_{ij}$ expressions. These should not
have bars on them – they should just say $x_{ij}$.
There is an additional chapter of the course notes (which incorporates the new Core Reading)
relating to new Syllabus item 2.1. This is Chapter 1 of the 2020 course notes. This means that
the chapter numbers of all the other chapters have changed – Chapter 1 in the 2019 course
has become Chapter 2 in the 2020 course, and so on.
A copy of the whole of new Chapter 1 is attached to this upgrade note. Insert this before old
Chapter 1.
In addition, the two chapters on regression have been combined into a single (long) chapter,
Chapter 12. This covers both the simple linear regression model and the multiple regression
model. So there is no longer a Chapter 11b.
Although Chapters 11 and 11b have been combined, the material remains very much the
same. So we have not attached any of this material.
As a result, cross references to other chapters throughout the course notes have been
updated.
We do not believe that students should need a new copy of the notes because of these
changes. However, students wanting to buy a new set of notes can obtain them from ActEd at
a significantly reduced price. See our current brochure for further details.
A number of corrections have been made to the course notes this year. These are listed here.
The chapter and page numbers refer to the OLD (2019) version of the course notes.
Since the courses are printed double-sided, we provide double sided replacement pages. So if
for example there is a replacement for page 5, it will contain page 6 on the back, even if there
are no changes to page 6.
Chapter 1
Page 6. An additional sentence has been added. Replace old Chapter 1 pages 5-6 with new
Chapter 2 pages 5-6.
Page 10. An additional sentence has been added. Replace old Chapter 1 pages 9-10 with new
Chapter 2 pages 9-10.
Page 14. An additional sentence has been added. Replace old Chapter 1 pages 13-14 with
new Chapter 2 pages 13-14.
Chapter 3
Page 3. A phrase of CR has been deleted. Replace old Chapter 3 pages 3-4 with new Chapter 4
pages 3-4.
Page 11. A new section of material has been added just before the start of old Section 1.4.
Please add Chapter 4, new pages 10a and 10b between old Chapter 3, pages 10 and 11. Delete
the solution at the top of old page 11. Do not remove any pages.
Page 45. An additional question has been added to the Practice Questions. Replace old
Chapter 3 pages 45 to 57 with new Chapter 4 pages 47-60.
Chapter 4
Page 5. A new question and solution have been added. Replace old Chapter 4 pages 5-6 with
new Chapter 5 pages 5-6.
Chapter 5
Pages 15-16. Several extra paragraphs of explanation have been added here. Replace old
Chapter 5 pages 15-16 with new Chapter 6 pages 15-16.
Chapter 6
Page 18. A new paragraph has been added. Replace old Chapter 6 pages 17-18 with new
Chapter 7 pages 16-17.
Page 20. A typo has been corrected. “just over 5%” should read “just less than 5%”.
Chapter 7
Page 19. Typo corrected. The penultimate equation should give the estimate as $\frac{107}{35} = 3.057$.
Page 37. An extra sentence has been added. Replace old Chapter 7 pages 37-38 with new
Chapter 8 pages 37-38.
Page 58. An extra sentence has been added. Replace old Chapter 7 pages 57-58 with new
Chapter 8 pages 57-58.
Chapter 9
Page 10. Typo corrected in the last line. “Determine the probability …”.
Page 35. Typo corrected in the eighth line. “This is a one-sided test and our statistic is less
than 2.132 …”.
Page 51. Typo corrected. “ …the characteristic is dependent on the mother’s age.” The same
correction is made on page 52.
Page 54. An additional question has been added at the end of the chapter. Add new
Chapter 10 page 54 after old Chapter 9 page 54. Do not remove any pages.
Chapter 10
Chapter 11
Page 14. There are some missing symbols in the last two paragraphs. These paragraphs
should read as follows:

Note: With the full model in place the $Y_i$'s have normal distributions and it is …

It is possible to show that the maximum likelihood estimators of $\alpha$ and $\beta$ are the
same as the least squares estimators (but the MLE of $\sigma^2$ has a different denominator
from the least squares estimator).
Chapter 11b
Chapter 12
Page 46. Two additional lines of text have been added. Replace old Chapter 12 pages 45-48
with new Chapter 13 pages 45-48.
Question X1.3 has been replaced with a new question. Updated pages attached for both
question and solution. There are no other changes (apart from updating the references to the
chapters).
In addition to the CMP you might find the following services helpful with your study.
For further details on ActEd’s study materials, please refer to the 2020 Student Brochure,
which is available from the ActEd website at www.ActEd.co.uk.
5.2 Tutorials
For further details on ActEd’s tutorials, please refer to our latest Tuition Bulletin, which is
available from the ActEd website at www.ActEd.co.uk.
5.3 Marking
You can have your attempts at any of our assignments or mock exams marked by ActEd. When
marking your scripts, we aim to provide specific advice to improve your chances of success in
the exam and to return your scripts as quickly as possible.
For further details on ActEd’s marking services, please refer to the 2020 Student Brochure,
which is available from the ActEd website at www.ActEd.co.uk.
ActEd is always pleased to get feedback from students about any aspect of our study
programmes. Please let us know if you have any specific comments (eg about certain sections
of the notes or particular questions) or general suggestions about how we can improve the
study material. We will incorporate as many of your suggestions as we can when we update
the course material each year.
If you have any comments on this course please send them by email to [email protected].
Syllabus
The Syllabus for Subject CS1 is given here. To the right of each objective are the chapter numbers
in which the objective is covered in the ActEd course.
Aim
The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
statistical techniques that are of particular relevance to actuarial work.
Competences
Syllabus topics
The weightings are indicative of the approximate balance of the assessment of this subject
between the main syllabus topics, averaged over a number of examination sessions.
The weightings also have a correspondence with the amount of learning material underlying each
syllabus topic. However, this will also reflect aspects such as:
the relative complexity of each topic, and hence the amount of explanation and support
required for it
the need to provide thorough foundation understanding on which to build the other
objectives
the extent of prior knowledge which is expected
the degree to which each topic area is more knowledge or application based.
Assumed knowledge
This subject assumes that a student will be competent in the following elements of foundational
mathematics and basic statistics:
1.1 Summarise a set of data using a table or frequency distribution, and display it
graphically using a line plot, a box plot, a bar chart, histogram, stem and leaf plot,
or other appropriate elementary device.
1.2 Describe the level/location of a set of data using the mean, median, mode, as
appropriate.
1.3 Describe the spread/variability of a set of data using the standard deviation,
range, interquartile range, as appropriate.
1.4 Explain what is meant by symmetry and skewness for the distribution of a set of
data.
2 Probability
2.1 Set functions and sample spaces for an experiment and an event.
2.2 Probability as a set function on a collection of events and its basic properties.
2.4 Derive and use the addition rule for the probability of the union of two events.
2.5 Define and calculate the conditional probability of one event given the occurrence
of another event.
2.7 Define independence for two events, and calculate probabilities in situations
involving independence.
3 Random variables
3.1 Explain what is meant by a discrete random variable, define the distribution
function and the probability function of such a variable, and use these functions
to calculate probabilities.
3.2 Explain what is meant by a continuous random variable, define the distribution
function and the probability density function of such a variable, and use these
functions to calculate probabilities.
3.3 Define the expected value of a function of a random variable, the mean, the
variance, the standard deviation, the coefficient of skewness and the moments of
a random variable, and calculate such quantities.
3.5 Derive the distribution of a function of a random variable from the distribution of
the random variable.
1.1.2 Define and explain the key characteristics of the continuous distributions: normal,
lognormal, exponential, gamma, chi-square, t , F , beta and uniform on an interval.
1.1.3 Evaluate probabilities and quantiles associated with distributions (by calculation
or using statistical software as appropriate).
1.1.4 Define and explain the key characteristics of the Poisson process and explain the
connection between the Poisson process and the Poisson distribution.
1.1.5 Generate basic discrete and continuous random variables using the inverse
transform method.
1.1.6 Generate discrete and continuous random variables using statistical software.
1.2.3 Specify the conditions under which random variables are independent.
1.2.4 Define the expected value of a function of two jointly distributed random
variables, the covariance and correlation coefficient between two variables, and
calculate such quantities.
1.2.5 Define the probability function/density function of the sum of two independent
random variables as the convolution of two functions.
1.2.6 Derive the mean and variance of linear combinations of random variables.
1.3.2 Show how the mean and variance of a random variable can be obtained from
expected values of conditional expected values, and apply this.
1.4.2 Define and determine the cumulant generating function of random variables.
1.4.3 Use generating functions to determine the moments and cumulants of random
variables, by expansion as a series or by differentiation, as appropriate.
1.4.4 Identify the applications for which a moment generating function, a cumulant
generating function and cumulants are used, and the reasons why they are used.
1.5.2 Generate simulated samples from a given distribution and compare the sampling
distribution with the Normal.
2.1.2 Describe the stages of conducting a data analysis to solve real-world problems in a
scientific manner and describe tools suitable for each stage.
2.1.3 Describe sources of data and explain the characteristics of different data sources,
including extremely large data sets.
2.1.4 Explain the meaning and value of reproducible research and describe the
elements required to ensure a data analysis is reproducible.
2.2.2 Use appropriate tools to calculate suitable summary statistics and undertake
exploratory data visualizations.
2.2.3 Define and calculate Pearson’s, Spearman’s and Kendall’s measures of correlation
for bivariate data, explain their interpretation and perform statistical inference as
appropriate.
2.3.4 Determine the mean and variance of a sample mean and the mean of a sample
variance in terms of the population mean, variance and sample size.
2.3.5 State and use the basic sampling distributions for the sample mean and the
sample variance for random samples from a normal distribution.
2.3.6 State and use the distribution of the t -statistic for random samples from a
normal distribution.
2.3.7 State and use the F distribution for the ratio of two sample variances from
independent samples taken from normal distributions.
3.1.2 Describe and apply the method of maximum likelihood for constructing
estimators of population parameters.
3.1.3 Define the terms: efficiency, bias, consistency and mean squared error.
3.1.5 Define the mean square error of an estimator, and use it to compare estimators.
3.1.6 Describe and apply the asymptotic distribution of maximum likelihood estimators.
3.2.2 Derive a confidence interval for an unknown parameter using a given sampling
distribution.
3.2.3 Calculate confidence intervals for the mean and the variance of a normal
distribution.
3.2.4 Calculate confidence intervals for a binomial probability and a Poisson mean,
including the use of the normal approximation in both cases.
3.2.5 Calculate confidence intervals for two-sample situations involving the normal
distribution, and the binomial and Poisson distributions using the normal
approximation.
3.2.6 Calculate confidence intervals for a difference between two means from paired
data.
3.3.2 Apply basic tests for the one-sample and two-sample situations involving the
normal, binomial and Poisson distributions, and apply basic tests for paired data.
3.3.4 Use a chi-square test to test the hypothesis that a random sample is from a
particular distribution, including cases where parameters are unknown.
3.3.5 Explain what is meant by a contingency (or two-way) table, and use a chi-square
test to test the independence of two classification criteria.
Key Information
The scaled deviance for a particular model M is defined as:

$$SD_M = 2(\ell_S - \ell_M)$$

where $\ell_S$ and $\ell_M$ are the log-likelihoods of the saturated model and of model M respectively. The deviance $D_M$ is related to the scaled deviance by:

$$\text{scaled deviance} = \frac{D_M}{\phi}$$

Remember that $\phi$ is a scale parameter, so it seems sensible that it should be used to connect the
deviance with the scaled deviance. For a Poisson or exponential distribution, $\phi = 1$ so the scaled
deviance and the deviance are identical.
The smaller the deviance, the better the model from the point of view of model fit.
However, there will be a trade-off here. A model with many parameters will fit the data well.
However a model with too many parameters will be difficult and complex to build, and will not
necessarily lead to better prediction in the future. It is possible for models to be ‘over-
parameterised’, ie factors are included that lead to a slightly, but not significantly, better fit.
When choosing linear models, we will usually need to strike a balance between a model with too
few parameters (which will not take account of factors that have a substantial impact on the data,
and will therefore not be sensitive enough) and one with too many parameters (which will be too
sensitive to factors that really do not have much effect on the results). We use the principle of
parsimony here – that is we choose the simplest model that does the job.
This can be illustrated by considering the case when the data are normally distributed.
$$\ell(\mathbf{y};\,\mu,\sigma) = \sum_{i=1}^{n}\log f_Y(y_i;\,\mu_i,\sigma) = -\frac{n}{2}\log 2\pi\sigma^2 - \sum_{i=1}^{n}\frac{(y_i - \mu_i)^2}{2\sigma^2}$$

The likelihood function for a random sample of size n is $f(y_1)f(y_2)\cdots f(y_n)$. Note that when we
take logs, we add the logs of the individual PDF terms to get the joint log-likelihood. Recall that for
the normal distribution the natural parameter is just the mean, $\theta_i = \mu_i$.
For the saturated model, the parameter $\mu_i$ is estimated by $y_i$, and so the second term
disappears. Thus, the scaled deviance (twice the difference between the values of the log-
likelihood under the current and saturated models) is:

$$\sum_{i=1}^{n}\frac{(y_i - \hat{\mu}_i)^2}{\sigma^2}$$

The deviance (remembering that the scale parameter $\phi = \sigma^2$) is the well-known residual
sum of squares:

$$\sum_{i=1}^{n}(y_i - \hat{\mu}_i)^2$$
This is why the deviance is defined with a factor of two in it, so that for the normal model the
deviance is equal to the residual sum of squares that we met in linear regression.
The residual deviance (ie the deviance after all the covariates have been included) is
displayed as part of the results from summary(model). For example:
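As a minimal sketch (the data frame, response and covariate names here are hypothetical, not from the Core Reading), a GLM might be fitted and its residual deviance displayed as follows:

# fit a hypothetical Poisson GLM and display the residual deviance
model <- glm(claims ~ age + region, family = poisson, data = claimdata)
summary(model)      # the output ends with the null deviance and the residual deviance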
In R we can obtain a breakdown of how the deviance is reduced by each covariate added
sequentially by using anova(model). However, unlike for linear regression, this command
does not automatically carry out a test.
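Continuing the hypothetical model above, the sequential breakdown, and an explicit chi-square test, might be obtained as follows (a sketch only):

anova(model)                    # deviance reduction as each covariate is added in turn
anova(model, test = "Chisq")    # the same table with chi-square p-values added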
And recall that the smaller the residual (left over) deviance the better the fit of the model.
For normally distributed data, the scaled deviance has a $\chi^2$ distribution. Since the scale
parameter for the normal, $\sigma^2$, must be estimated, we would compare models by taking
ratios of sums of squares and using F-tests (as in the analysis of variance for linear
regression models).
We covered this in Section 4.3 from the previous chapter.
Thus, if we want to decide if Model 2 (which has $p+q$ parameters and scaled deviance $S_2$)
is a significant improvement over Model 1 (which has $p$ parameters and scaled deviance
$S_1$), we see if:

$$\frac{(S_1 - S_2)/q}{S_2/(n - (p+q))}$$

is greater than the 5% value for the $F_{q,\,n-p-q}$ distribution.
The code for comparing two normally distributed models, model1 and model2, in R is:
anova(model1, model2, test="F")
In the case of data that are not normally distributed, the scale parameter may be known (for
example, for the Poisson distribution $\phi = 1$), and the deviance is only asymptotically a $\chi^2$
distribution. For these reasons, the common procedure is to compare two models by
looking at the difference in the scaled deviance and comparing with a $\chi^2$ distribution.

Since the distributions are only asymptotically normal, the F test will not be very accurate. Hence,
by simply comparing two approximate $\chi^2$ distributions we will get a better result.
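By analogy with the F-test code above, two nested non-normal (eg Poisson) models, again with hypothetical names, might be compared in R as follows:

anova(model1, model2, test = "Chisq")   # compares the difference in deviance with a chi-square distribution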
Solution
The proportionality argument will be used and any constants simply omitted as appropriate.
Prior:

$$f(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$$

omitting the constant $\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$.

Likelihood:

$$f(x\,|\,\theta) \propto \theta^{x}(1-\theta)^{n-x}$$

omitting the constant $\dbinom{n}{x}$.

Combining the prior PDF with the likelihood function gives the posterior PDF:

$$f(\theta\,|\,x) \propto \theta^{x}(1-\theta)^{n-x} \times \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}$$

Now it can be seen that, apart from the appropriate constant of proportionality, this is the
density of a beta random variable. Therefore the immediate conclusion is that the posterior
distribution of $\theta$ given $X = x$ is beta with parameters $\alpha + x$ and $\beta + n - x$.
It can also be seen that the posterior density and the prior density belong to the same family
of distributions. Thus the conjugate prior for the binomial distribution is the beta
distribution.
The Bayesian estimate under quadratic loss is the mean of this distribution, that is:

$$\frac{\alpha + x}{(\alpha + x) + (\beta + n - x)} = \frac{\alpha + x}{\alpha + \beta + n}$$
We can use R to simulate this Bayesian estimate.
The R code to obtain the Monte Carlo Bayesian estimate of the above is:
# assumes M, n, theta and the prior parameters alpha, beta are already defined
pm <- rep(0,M)
for (i in 1:M)
{x <- rbinom(1,n,theta)                  # simulate the binomial data
 pm[i] <- (alpha+x)/(alpha+beta+n)}      # Bayesian estimate under quadratic loss
The average of these Bayesian estimates under quadratic loss is given by:
mean(pm)
Question
A random sample of size 10 from a Poisson distribution with mean $\lambda$ yields the following data
values:

3, 4, 3, 1, 5, 5, 2, 3, 3, 2

The prior distribution of $\lambda$ is gamma with parameters 5 and 2. Calculate the Bayesian
estimate of $\lambda$ under squared error loss.
Solution
Using the formula for the PDF of the gamma distribution given on page 12 of the Tables, we see
that the prior PDF of $\lambda$ is:

$$f_{prior}(\lambda) = \frac{2^5}{\Gamma(5)}\,\lambda^{4}\,e^{-2\lambda}, \qquad \lambda > 0$$

Alternatively, we could say that the prior distribution of $\lambda$ is $Gamma(5,\,2)$.
The likelihood function obtained from the data is:

$$L(\lambda) = \frac{e^{-\lambda}\lambda^3}{3!} \times \frac{e^{-\lambda}\lambda^4}{4!} \times \cdots \times \frac{e^{-\lambda}\lambda^2}{2!} = C\,e^{-10\lambda}\lambda^{31}$$

where C is a constant. (31 is the sum of the observed data values.)

Combining the prior distribution and the sample data, we see that the posterior PDF is:

$$f_{post}(\lambda) \propto \lambda^{4}e^{-2\lambda} \times e^{-10\lambda}\lambda^{31} = \lambda^{35}e^{-12\lambda}$$

so the posterior distribution of $\lambda$ is $Gamma(36,\,12)$. The Bayesian estimate under squared
error loss is the mean of this posterior distribution, ie $36/12 = 3$.
$$\theta \,|\, \underline{x} \;\sim\; N\!\left(\frac{\dfrac{n\bar{x}}{\sigma_1^2} + \dfrac{\mu}{\sigma_2^2}}{\dfrac{n}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}}\,,\;\; \frac{1}{\dfrac{n}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}}\right)$$

where:

$$\bar{x} = \sum_{i=1}^{n} x_i / n$$

The Bayesian estimate of $\theta$ under quadratic loss is the mean of this posterior distribution:

$$E(\theta \,|\, \underline{x}) = \frac{\dfrac{n\bar{x}}{\sigma_1^2} + \dfrac{\mu}{\sigma_2^2}}{\dfrac{n}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}} = \frac{\dfrac{n}{\sigma_1^2}}{\dfrac{n}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}}\,\bar{x} \;+\; \frac{\dfrac{1}{\sigma_2^2}}{\dfrac{n}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}}\,\mu$$

or:

$$E(\theta \,|\, \underline{x}) = Z\bar{x} + (1 - Z)\mu \qquad (14.3.4)$$

where:

$$Z = \frac{n}{n + (\sigma_1^2 / \sigma_2^2)} \qquad (14.3.5)$$
Notice that, as for the Poisson/gamma model, the estimate based solely on data from the
risk itself is a linear function of the observed data values.
There are some further points to be made about the credibility factor, Z , given by (14.3.5):
These features are all exactly what would be expected for a credibility factor.
Notice also that, as $\sigma_1^2$ increases, the denominator increases, and so Z decreases. $\sigma_1^2$ denotes
the variance of the distribution of the sample values. If this is large, then the sample values are
likely to be spread over a wide range, and they will therefore be less reliable for estimation.
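For example, with hypothetical values of $n = 5$, $\sigma_1^2 = 400$ and $\sigma_2^2 = 100$, the credibility factor would be $Z = \frac{5}{5 + 400/100} = \frac{5}{9} \approx 0.56$, so just over half the weight would be placed on the data from the risk itself.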
The R code to obtain the Monte Carlo credibility premiums for the above based on M
simulations is:
# assumes M, n, theta, mu, sigma1 and sigma2 are already defined
Z <- n/(n+sigma1^2/sigma2^2)
cp <- rep(0,M)
for (i in 1:M)
{x <- rnorm(n,theta,sigma1)                 # simulate a sample of n annual values
 cp[i] <- Z*mean(x)+(1-Z)*mu}               # credibility premium for this sample
mean(cp)
The reason for doing this is that some of the observations will be helpful when empirical
Bayes credibility theory is considered in the next chapter.
In this section, as in Section 3.4, the problem is to estimate the expected aggregate claims
produced each year by a risk. Let:
$$X_1, X_2, \ldots, X_n, X_{n+1}, \ldots$$
be random variables representing the aggregate claims in successive years. The following
assumptions are made.
The distribution of each $X_j$ depends on the value of a fixed, but unknown, parameter, $\theta$.
Again, $\theta$ is a random variable whose value, once determined, does not change over time.
The values of $X_1, X_2, \ldots, X_n$ have already been observed and the expected aggregate claims
in the coming, ie $(n+1)$th, year need to be estimated.
It is important to realise that the assumptions and problem outlined above are exactly the
same as the assumptions and problem outlined in Section 3.4. Slightly different notation
has been used in this section; in Section 3.4, $X_1, \ldots, X_n$ were denoted $x_1, \ldots, x_n$ since their
values were assumed to be known, and $X_{n+1}$ was denoted just $X$. The assumptions that
the distribution of each $X_j$ depends on $\theta$, that the conditional distribution of $X_j$ given $\theta$
is $N(\theta, \sigma_1^2)$, and that the prior distribution of $\theta$ is $N(\mu, \sigma_2^2)$ were all made in Section 3.4. The
only assumption not made explicitly in Section 3.4 is that, given $\theta$, the random variables
$X_j$ are independent.
Having stressed that everything is the same as in Section 3.4, some consequences of the
above assumptions will be considered. Some important consequences are:
$$P(X_j \le y) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\!\left(-\frac{(\theta-\mu)^2}{2\sigma_2^2}\right)\Phi\!\left(\frac{y-\theta}{\sigma_1}\right) d\theta$$

This follows by conditioning on $\theta$:

$$P(X_j \le y) = \int P(X_j \le y \,|\, \theta)\, f(\theta)\, d\theta$$

where the prior density of $\theta$ is:

$$f(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\!\left(-\frac{(\theta-\mu)^2}{2\sigma_2^2}\right)$$

and, since $X_j \,|\, \theta \sim N(\theta, \sigma_1^2)$:

$$P(X_j \le y \,|\, \theta) = P\!\left(N(0,1) \le \frac{y-\theta}{\sigma_1}\right) = \Phi\!\left(\frac{y-\theta}{\sigma_1}\right)$$
This expression is the same for each value of j and hence the random variables { X j } are
(unconditionally) identically distributed.
Using Equation (14.1.1) and the fact that, given $\theta$, $X_1$ and $X_2$ are conditionally
independent:

$$E(X_1 X_2) = E\big[E(X_1 X_2 \,|\, \theta)\big] = E\big[E(X_1 \,|\, \theta)\,E(X_2 \,|\, \theta)\big] = E(\theta^2) \quad \big(\text{since } E(X_1 \,|\, \theta) = E(X_2 \,|\, \theta) = \theta\big)$$

$$= \mu^2 + \sigma_2^2$$

The idea used in this argument will be used repeatedly in the next chapter.

However:

$$E(X_1 X_2) \ne E(X_1)\,E(X_2)$$

since:

$$E(X_1) = E\big[E(X_1 \,|\, \theta)\big] = E(\theta) = \mu$$

Similarly, $E(X_2) = \mu$. Hence:

$$E(X_1 X_2) = \mu^2 + \sigma_2^2 \ne \mu^2 = E(X_1)\,E(X_2)$$
This shows that $X_1$ and $X_2$ are not unconditionally independent. The relationship between
$X_1$ and $X_2$ is that their means are chosen from a common distribution. If this mean, $\theta$, is
known, then this relationship is broken and there exists conditional independence.
The first difficulty is whether a Bayesian approach to the problem is acceptable, and, if so,
what values to assign to the parameters of the prior distribution. For example, although the
Poisson/gamma model provides a formula (Equation 14.3.3) for the calculation of the
credibility factor, this formula involves a parameter of the gamma prior distribution. How a
value for this parameter might be chosen has not been discussed. The Bayesian approach to
the choice of parameter values for a prior distribution is to argue that they summarise the
subjective degree of belief about the possible values of the quantity to be estimated, for
example, the mean claim number, $\lambda$, for the Poisson/gamma model.
The second difficulty is that even if the problem fits into a Bayesian framework, the
Bayesian approach may not work in the sense that it may not produce an estimate which
can readily be rearranged to be in the form of a credibility estimate. This point can be
illustrated by using a uniform prior with a Poisson distribution for the number of claims.
$$f_{post}(\lambda) \propto \prod_{i=1}^{n} P(X_i = x_i) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} \propto e^{-n\lambda}\lambda^{\sum x_i}$$

for values of $\lambda$ within the range of the uniform prior. The posterior PDF is 0 for other values
of $\lambda$ (since the prior PDF is 0 outside that range). The posterior distribution of $\lambda$ is not one of
the standard distributions listed in the Tables (in fact it is a truncated gamma distribution) so we
can't look up the formula for its mean or easily deduce whether it can be expressed in the form of
a credibility estimate.
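As an illustration (not part of the Core Reading), the posterior mean can still be found numerically. The sketch below assumes, purely for illustration, a U(0, 5) prior and the data from the earlier question (n = 10, sum of the observations = 31):

# posterior mean of lambda under a U(0, upper) prior and a Poisson likelihood,
# found by numerical integration (all values below are hypothetical)
n <- 10; sumx <- 31; upper <- 5
kernel <- function(lambda) exp(-n*lambda) * lambda^sumx      # posterior kernel on (0, upper)
const <- integrate(kernel, 0, upper)$value                   # normalising constant
post_mean <- integrate(function(l) l * kernel(l), 0, upper)$value / const
post_mean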
Data analysis
Syllabus objectives
2.1 Data analysis
2.1.1 Describe the possible aims of data analysis (eg descriptive, inferential and
predictive).
2.1.2 Describe the stages of conducting a data analysis to solve real‐world
problems in a scientific manner and describe tools suitable for each stage.
2.1.3 Describe sources of data and explain the characteristics of different data
sources, including extremely large data sets.
2.1.4 Explain the meaning and value of reproducible research and describe the
elements required to ensure a data analysis is reproducible.
0 Introduction
This chapter provides an introduction to the underlying principles of data analysis, in particular
within an actuarial context.
Data analysis is the process by which data is gathered in its raw state and analysed or
processed into information which can be used for specific purposes. This chapter will
describe some of the different forms of data analysis, the steps involved in the process and
consider some of the practical problems encountered in data analytics.
Although this chapter looks at the general principles involved in data analysis, it does not deal
with the statistical techniques required to perform a data analysis. These are covered elsewhere,
in CS1 and CS2.
1 Aims of a data analysis
Three key forms of data analysis will be covered in this section:
descriptive;
inferential; and
predictive.
1.1 Descriptive analysis
Data presented in its raw state can be difficult to manage and draw meaningful conclusions
from, particularly where there is a large volume of data to work with. A descriptive analysis
solves this problem by presenting the data in a simpler format, more easily understood and
interpreted by the user.
Simply put, this might involve summarising the data or presenting it in a format which
highlights any patterns or trends. A descriptive analysis is not intended to enable the user
to draw any specific conclusions. Rather, it describes the data actually presented.
For example, it is likely to be easier to understand the trend and variation in the sterling/euro
exchange rate over the past year by looking at a graph of the daily exchange rate rather than a list
of values. The graph is likely to make the information easier to absorb.
Two key measures, or parameters, used in a descriptive analysis are the measure of central
tendency and the dispersion. The most common measurements of central tendency are the
mean, the median and the mode. Typical measurements of the dispersion are the standard
deviation and ranges such as the interquartile range.
Measures of central tendency tell us about the ‘average’ value of a data set, whereas measures of
dispersion tell us about the ‘spread’ of the values. We will use many of these measures later in
the course.
It can also be important to describe other aspects of the shape of the (empirical) distribution
of the data, for example by calculating measures of skewness and kurtosis.
Empirical means ‘based on observation’. So an empirical distribution relates to the distribution of
the actual data points collected, rather than any assumed underlying theoretical distribution.
Skewness is a measure of how symmetrical a data set is, and kurtosis is a measure of how likely
extreme values are to appear (ie those in the tails of the distribution). We shall touch on these
later.
1.2 Inferential analysis
Often it is not feasible or practical to collect data in respect of the whole population,
particularly when that population is very large. For example, when conducting an opinion
poll in a large country, it may not be cost effective to survey every citizen. A practical
solution to this problem might be to gather data in respect of a sample, which is used to
represent the wider population. The analysis of the data from this sample is called
inferential analysis.
The sample analysis involves estimating the parameters as described in Section 1.1 above
and testing hypotheses. It is generally accepted that if the sample is large and taken at
random (selected without prejudice), then it quite accurately represents the statistics of the
population, such as distribution, probability, mean, standard deviation, etc. However, this is
also contingent upon the user making reasonably correct hypotheses about the population
in order to perform the inferential analysis.
Care may need to be taken to ensure that the sample selected is likely to be representative of the
whole population. For example, an opinion poll on a national issue conducted in urban locations
on weekday afternoons between 2pm and 4pm may not accurately reflect the views of the whole
population. This is because those living in rural areas and those who regularly work during that
period are unlikely to have been surveyed, and these people might tend to have a different
viewpoint to those who have been surveyed.
Sampling, inferential analysis and parameter estimation are covered in more detail later.
1.3 Predictive analysis
Predictive analysis extends the principles behind inferential analysis in order for the user to
analyse past data and make predictions about future events.
It achieves this by using an existing set of data with known attributes (also known as
features), known as the training set in order to discover potentially predictive relationships.
Those relationships are tested using a different set of data, known as the test set, to assess
the strength of those relationships.
For example, a model could be built from data on cars' speeds and the corresponding braking
distances, and then used to predict the braking distance at speeds not in the original data. In
this example, the car's speed is the explanatory (or independent) variable and the braking
distance is the dependent variable.
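A minimal sketch of this in R, using hypothetical data frame and variable names: the data are split into a training set and a test set, a simple model is fitted to the training data, and its predictive strength is then assessed on the test data.

set.seed(1)
n_obs <- nrow(cars_data)                                  # hypothetical data frame
train_rows <- sample(1:n_obs, size = floor(0.7 * n_obs))  # 70% of cases for training
train <- cars_data[train_rows, ]
test  <- cars_data[-train_rows, ]
model <- lm(braking_distance ~ speed, data = train)       # fit on the training set
pred  <- predict(model, newdata = test)                   # predict for the test set
cor(pred, test$braking_distance)                          # crude measure of predictive strength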
Question
Based on data gathered at a particular weather station on the monthly rainfall in mm ( r ) and the
average number of hours of sunshine per day ( s ), a researcher has determined the following
explanatory relationship:
$$s = 9 - 0.1r$$
Using this model:
(i) Estimate the average number of hours of sunshine per day, if the monthly rainfall is 50mm.
(ii) State the impact on the average number of hours of sunshine per day of each extra
millimetre of rainfall in a month.
Solution
(i) When $r = 50$:

$$s = 9 - 0.1 \times 50 = 4$$
ie there are 4 hours of sunshine per day on average.
(ii) For each extra millimetre of rainfall in a month, the average number of hours of sunshine
per day falls by 0.1 hours, or 6 minutes.
2 The data analysis process
While the process to analyse data does not follow a set pattern of steps, it is helpful to
consider the key stages which might be used by actuaries when collecting and analysing
data.
1. Develop a well-defined set of objectives which need to be met by the results of the
data analysis.
The objective may be to summarise the claims from a sickness insurance product by age,
gender and cause of claim, or to predict the outcome of the next national parliamentary
election.
The relevant data may be available internally (eg from an insurance company’s
administration department) or may need to be gathered from external sources (eg from a
local council office or government statistical service).
It will be important when communicating the results to make it clear what data was used,
what analyses were performed, what assumptions were made, the conclusion of the
analysis, and any limitations of the analysis.
9. Monitoring the process; updating the data and repeating the process if required.
A data analysis is not necessarily just a one‐off exercise. An insurance company analysing
the claims from its sickness policies may wish to do this every few years to allow for the
new data gathered and to look for trends. An opinion poll company attempting to predict
an election result is likely to repeat the poll a number of times in the weeks before the
election to monitor any changes in views during the campaign period.
Throughout the process, the modelling team needs to ensure that any relevant professional
guidance has been complied with. For example, the Financial Reporting Council has issued
a Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work
(TAS100) which includes principles for the use of data in technical actuarial work.
Knowledge of the detail of this TAS is not required for CS1.
Further, the modelling team should also remain aware of any legal requirement to be
complied with. Such legal requirement may include aspects around consumer/customer
data protection and gender discrimination.
3 Data sources
Step 3 of the process described in Section 2 above refers to collection of the data needed to
meet the objectives of the analysis from appropriate sources. As consideration of Steps 3,
4, and 5 makes clear, getting data into a form ready for analysis is a process, not a single
event. Consequently, what is seen as the source of data can depend on your viewpoint.
Suppose you are conducting an analysis which involves collecting survey data from a
sample of people in the hope of drawing inferences about a wider population. If you are in
charge of the whole process, including collecting the primary data from your selected
sample, you would probably view the ‘source’ of the data as being the people in your
sample. Having collected, cleaned and possibly summarised the data you might make it
available to other investigators in JavaScript object notation (JSON) format via a web
Application programming interface (API). You will then have created a secondary ‘source’
for others to use.
In this section we discuss how the characteristics of the data are determined both by the
primary source and the steps carried out to prepare it for analysis – which may include the
steps on the journey from primary to secondary source. Details of particular data formats
(such as JSON), or of the mechanisms for getting data from an external source into a local
data structure suitable for analysis, are not covered in CS1.
These factors can affect the accuracy and reliability of the data collected. For example:
in a survey, an individual’s salary may be specified as falling into given bands, eg £20,000 ‐
£29,999, £30,000 ‐ £39,999 etc, rather than the precise value being recorded
if responses were collected on handwritten forms, and then manually input into a
database, there is greater scope for errors to appear.
Where randomisation has been used to reduce the effect of bias or confounding variables, it
is important to know the sampling scheme used, for example simple random sampling or
stratified sampling.
Question
A researcher wishes to survey 10% of a company’s workforce.
Describe how the sample could be selected using:
(a) simple random sampling
(b) stratified sampling.
Solution
(a) Simple random sampling
Using simple random sampling, each employee would have an equal chance of being selected.
This could be achieved by taking a list of the employees, allocating each a number, and then
selecting 10% of the numbers at random (either manually, or using a computer‐generated
process).
(b) Stratified sampling
Using stratified sampling, the workforce would first be split into groups (or strata) defined by
specific criteria, eg level of seniority. Then 10% of each group would be selected using simple
random sampling. In this way, the resulting sample would reflect the structure of the company by
seniority.
This aims to overcome one of the issues with simple random sampling, ie that the sample
obtained does not fully reflect the characteristics of the population. With a simple random
sample, it would be possible for all those selected to be at the same level of seniority, and so be
unrepresentative of the workforce as a whole.
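As an illustrative sketch only (the employees data frame and its seniority column are hypothetical), the two schemes might be implemented in R as follows:

set.seed(42)
take_10pc <- function(df) df[sample(nrow(df), ceiling(0.1 * nrow(df))), ]

# (a) simple random sampling: every employee equally likely to be selected
srs <- take_10pc(employees)

# (b) stratified sampling: select 10% from each seniority group separately
groups <- split(employees, employees$seniority)
strat  <- do.call(rbind, lapply(groups, take_10pc))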
Data may have undergone some form of pre-processing. A common example is grouping
(eg by geographical area or age band). In the past, this was often done to reduce the
amount of storage required and to make the number of calculations manageable. The scale
of computing power available now means that this is less often an issue, but data may still
be grouped: perhaps to anonymise it, or to remove the possibility of extracting sensitive (or
perhaps commercially sensitive) details.
Other aspects of the data which are determined by the collection process, and which affect
the way it is analysed include the following:
Cross-sectional data involves recording values of the variables of interest for each
case in the sample at a single moment in time.
For example, recording the amount spent in a supermarket by each member of a loyalty
card scheme this week.
Censored data occurs when the value of a variable is only partially known, for
example, if a subject in a survival study withdraws, or survives beyond the end of
the study: here a lower bound for the survival period is known but the exact value
isn’t.
Censoring is dealt with in detail in CS2.
Truncated data occurs when measurements on some variables are not recorded so
are completely unknown.
For example, if we were collecting data on the periods of time for which a user’s internet
connection was disrupted, but only recorded the duration of periods of disruption that
lasted 5 minutes or longer, we would have a truncated data set.
3.1 Big data
The term big data is not well defined but has come to be used to describe data with
characteristics that make it impossible to apply traditional methods of analysis (for
example, those which rely on a single, well-structured data set which can be manipulated
and analysed on a single computer). Typically, this means automatically collected data with
characteristics that have to be inferred from the data itself rather than known in advance
from the design of an experiment.
Given the description above, the properties that can lead data to be classified as ‘big’
include:
size, not only does big data include a very large number of individual cases, but
each might include very many variables, a high proportion of which might have
empty (or null) values – leading to sparse data;
speed, the data to be analysed might be arriving in real time at a very fast rate – for
example, from an array of sensors taking measurements thousands of time every
second;
variety, big data is often composed of elements from many different sources which
could have very different structures – or is often largely unstructured;
reliability, given the above three characteristics we can see that the reliability of
individual data elements might be difficult to ascertain and could vary over time (for
example, an internet connected sensor could go offline for a period).
Examples of ‘big data’ are:
the information held by large online retailers on items viewed, purchased and
recommended by each of its customers
measurements of atmospheric pressure from sensors monitored by a national
meteorological organisation
the data held by an insurance company received from the personal activity trackers (that
monitor daily exercise, food intake and sleep, for example) of its policyholders.
Although the four points above (size, speed, variety, reliability) have been presented in the
context of big data, they are characteristics that should be considered for any data source.
For example, an actuary may need to decide if it is advisable to increase the volume of data
available for a given investigation by combining an internal data set with data available
externally. In this case, the extra processing complexity required to handle a variety of
data, plus any issues of reliability of the external data, will need to be considered.
3.2 Data security, privacy and regulation
In the design of any investigation, consideration of issues related to data security, privacy
and complying with relevant regulations should be paramount. It is especially important to
be aware that combining different data from different ‘anonymised’ sources can mean that
individual cases become identifiable.
Another point to be aware of is that just because data has been made available on the
internet, doesn't mean that others are free to use it as they wish. This is a very
complex area and laws vary between jurisdictions.
4 Reproducible research
An example reference for this section is in Peng (2016). For the full reference, see the end
of this section.
4.1 The meaning of reproducible research
Reproducibility refers to the idea that when the results of a statistical analysis are reported,
sufficient information is provided so that an independent third party can repeat the analysis
and arrive at the same results.
Full replication of a study may not be practical, for example where the study relies on data collected at great expense or over many years.
So, rather than the results of the analysis being validated by an independent third party
completely replicating the study from scratch (including gathering a new data set), the validation
is achieved by an independent third party reproducing the same results based on the same data
set.
4.2 Elements required for reproducibility
Typically, reproducibility requires the original data and the computer code to be made
available (or fully specified) so that other people can repeat the analysis and verify the
results. In all but the most trivial cases, it will be necessary to include full documentation
(eg description of each data variable, an audit trail describing the decisions made when
cleaning and processing the data, and full documented code). Documentation of models is
covered in Subject CP2.
Full documented code can be achieved through literate statistical programming (as defined
by Knuth, 1992) where the program includes an explanation of the program in plain
language, interspersed with code snippets. Within the R environment, a tool which allows
this is R-markdown.
R‐markdown enables documents to be produced that include the code used, an explanation of
that code, and, if desired, the output from that code.
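As a small illustrative sketch (not taken from the Core Reading), an R-markdown file intersperses plain-language explanation with executable R code chunks. The file name and variables below are hypothetical:

We first load the claims data and plot the distribution of claim amounts.

```{r}
claims <- read.csv("claims.csv")
hist(claims$amount)
```

When the document is rendered, the explanation, the code and its output all appear together in the final report, providing the fully documented code needed for reproducibility.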
As a simpler example, it may be possible to document the work carried out in a spreadsheet by
adding comments or annotations to explain the operations performed in particular cells, rows or
columns.
Although not strictly required to meet the definition of reproducibility, a good version
control process can ensure evolving drafts of code, documentation and reports are kept in
alignment between the various stages of development and review, and changes are
reversible if necessary. There are many tools that are used for version control. A popular
tool used for version control is git.
A detailed knowledge of the version control tool ‘git’ is not required in CS1.
In R, the command:

sessionInfo()

provides information about the operating system, version of R and version of all R
packages being used.
Question
Give a reason why documenting the version number of the software used can be important for
reproducibility of a data analysis.
Solution
Some functions might be available in one version of a package that are not available in another
(older) version. This could prevent someone being able to reproduce the analysis.
Where there is randomness in the statistical or machine learning techniques being used (for
example random forests or neural networks) or where simulation is used, replication will
require the random seed to be set.
Machine learning is covered in Subject CS2.
Simulation will be dealt with in more detail later in the course. At this point, it is sufficient to
know that each simulation that is run will be based on a series of pseudo‐random numbers. So,
for example, one simulation will be based on one particular series of pseudo‐random numbers,
but unless explicitly coded otherwise, a different simulation will be based on a different series of
pseudo‐random numbers. The second simulation will then produce different results, rather than
replicating the original results, which is the desired outcome here.
To ensure the two simulations give the same results, they would both need to be based on the
same series of pseudo‐random numbers. This is known as ‘setting the random seed’. We will do
this regularly when using R to carry out a simulation.
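A minimal sketch of setting the random seed in R so that a simulation can be reproduced exactly:

set.seed(123)        # fix the starting point of the pseudo-random number sequence
x <- rnorm(5)        # five simulated values

set.seed(123)        # resetting the same seed...
y <- rnorm(5)        # ...produces exactly the same five values
identical(x, y)      # TRUE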
Doing things ‘by hand’ is very likely to create problems in reproducing the work. Examples
of doing things by hand are:
manually editing spreadsheets (rather than reading the raw data into a programming
environment and making the changes there);
editing tables and figures (rather than ensuring that the programming environment
creates them exactly as needed);
pointing and clicking (unless the software used creates an audit trail of what has
been clicked).
‘Pointing and clicking’ relates to choosing a particular operation from an on‐screen menu, for
example. This action would not ordinarily be recorded electronically.
The main thing to note here is that the more of the analysis that is performed in an automated
way, the easier it will be to reproduce by another individual. Manual interventions may be
forgotten altogether, and even if they are remembered, can be difficult to document clearly.
4.3 The value of reproducibility
Many actuarial analyses are undertaken for commercial, not scientific, reasons and are not
published, but reproducibility is still valuable:
Reproducibility does not mean that the analysis is correct. For example, if an
incorrect distribution is assumed, the results may be wrong – even though they can
be reproduced by making the same incorrect assumption about the distribution.
However, by making clear how the results are achieved, it does allow transparency
so that incorrect analysis can be appropriately challenged.
4.4 References
Further information on the material in this section is given in the references:
The chapter summary starts on the next page so that you can
keep all the chapter summaries together for revision purposes.
Chapter 1 Summary
The three key forms of data analysis are:
descriptive analysis: producing summary statistics (eg measures of central tendency
and dispersion) and presenting the data in a simpler format
inferential analysis: using a data sample to estimate summary parameters for the
wider population from which the sample was taken, and testing hypotheses
predictive analysis: extends the principles of inferential analysis to analyse past data
and make predictions about future events.
The key steps in the data analysis process are:
1. Develop a well‐defined set of objectives which need to be met by the results of the
data analysis.
2. Identify the data items required for the analysis.
3. Collection of the data from appropriate sources.
4. Processing and formatting data for analysis, eg inputting into a spreadsheet,
database or other model.
5. Cleaning data, eg addressing unusual, missing or inconsistent values.
6. Exploratory data analysis, which may include descriptive analysis, inferential analysis
or predictive analysis.
7. Modelling the data.
8. Communicating the results.
9. Monitoring the process; updating the data and repeating the process if required.
In the data collection process, the primary source of the data is the population (or
population sample) from which the ‘raw’ data is obtained. If, once the information is
collected, cleaned and possibly summarised, it is made available for others to use via a web
interface, this is then a secondary source of data.
Other aspects of the data determined by the collection process that may affect the analysis
are:
Cross‐sectional data involves recording values of the variables of interest for each
case in the sample at a single moment in time.
Longitudinal data involves recording values at intervals over time.
Censored data occurs when the value of a variable is only partially known.
Truncated data occurs when measurements on some variables are not recorded so
are completely unknown.
The term ‘big data’ can be used to describe data with characteristics that make it impossible
to apply traditional methods of analysis. Typically, this means automatically collected data
with characteristics that have to be inferred from the data itself rather than known in
advance from the design of the experiment.
Properties that can lead to data being classified as ‘big’ include:
size of the data set
speed of arrival of the data
variety of different sources from which the data is drawn
reliability of the data elements might be difficult to ascertain.
Replication refers to an independent third party repeating an experiment and obtaining the
same (or at least consistent) results. Replication of a data analysis can be difficult, expensive
or impossible, so reproducibility is often used as a reasonable alternative standard.
Reproducibility refers to reporting the results of a statistical analysis in sufficient detail that
an independent third party can repeat the analysis on the same data set and arrive at the
same results.
Elements required for reproducibility:
the original data and fully documented computer code need to be made available
good version control
documentation of the software used, computing architecture, operating system,
external dependencies and version numbers
where randomness is involved in the process, replication will require the random
seed to be set
limiting the amount of work done ‘by hand’.
Chapter 1 Practice Questions
1.1 The data analysis department of a mobile phone messaging app provider has gathered data on
the number of messages sent by each user of the app on each day over the past 5 years. The
geographical location of each user (by country) is also known.
(i) Describe each of the following terms as it relates to a data set, and give an example of
each as it relates to the app provider’s data:
(a) cross‐sectional
(b) longitudinal.
(ii) Give an example of each of the following types of data analysis that could be carried out
using the app provider’s data:
(a) descriptive
(b) inferential
(c) predictive.
1.2 Explain the regulatory and legal requirements that should be observed when conducting a data
analysis exercise.
1.3 (Exam style) A car insurer wishes to investigate whether young drivers (aged 17‐25) are more likely to have an
accident in a given year than older drivers.
Describe the steps that would be followed in the analysis of data for this investigation. [7]
The solutions start on the next page so that you can
separate the questions and solutions.
Chapter 1 Solutions
1.1 (i)(a) Cross‐sectional
Cross‐sectional data involves recording the values of the variables of interest for each case in the
sample at a single moment in time.
In this data set, this relates to the number of messages sent by each user on any particular day.
(i)(b) Longitudinal
Longitudinal data involves recording the values of the variables of interest at intervals over time.
In this data set, this relates to the number of messages sent by a particular user on each day over
the 5‐year period.
(ii)(a) Descriptive analysis
Examples of descriptive analysis that could be carried out on this data set include:
calculating the mean and standard deviation of the number of messages sent each day by
users in each country
plotting a graph of the total messages sent each day worldwide, to illustrate the overall
trend in the number of messages sent over the 5 years
calculating what proportion of the total messages sent in each year originate in each
country.
(ii)(b) Inferential analysis
Examples of inferential analysis that could be carried out on this data set include:
testing the hypothesis that more messages are sent at weekends than on weekdays
assessing whether there is a significant difference in the rate of growth of the number of
messages sent each day by users in different countries over the 5‐year period.
(ii)(c) Predictive analysis
Examples of predictive analysis that could be carried out on this data set include:
forecasting which countries will be the major users of the app in 5 years’ time, and will
therefore need the most technical support staff
predicting the number of messages sent on the app’s busiest day (eg New Year’s Eve) next
year, to ensure that the provider continues to have sufficient capacity.
1.2 Throughout the data analysis process, it is important to ensure that any relevant professional
guidance has been complied with. For example, the UK’s Financial Reporting Council has issued a
Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work (TAS100). This
describes the principles that should be adhered to when using data in technical actuarial work.
The data analysis team must also be aware of any legal requirements to be complied with relating
to, for example:
• protection of an individual's personal data and privacy
• discrimination on the grounds of gender, age, or other reasons.
With regard to privacy regulations, it is important to note that combining data from different
sources may mean that individuals can be identified, even if they are anonymous in the original
data sources.
Finally, data that have been made available on the internet cannot necessarily be used for any
purpose. Any legal restrictions should be checked before using the data, noting that laws can vary
between jurisdictions.
1.3 The key steps in the data analysis process in this scenario are:
1. Develop a well‐defined set of objectives that need to be met by the results of the data
analysis. [½]
Here, the objective is to determine whether young drivers are more likely to have an
accident in a given year than older drivers. [½]
2. Identify the data items required for the analysis. [½]
The data items needed would include the number of drivers of each age during the
investigation period and the number of accidents they had. [½]
3. Collection of the data from appropriate sources. [½]
The insurer will have its own internal data from its administration department on the
number of policyholders of each age during the investigation period and which of them
had accidents. [½]
The insurer may also be able to source data externally, eg from an industry body that
collates information from a number of insurers. [½]
4. Processing and formatting the data for analysis, eg inputting into a spreadsheet, database
or other model. [½]
The data will need to be extracted from the administration system and loaded into
whichever statistical package is being used for the analysis. [½]
If different data sets are being combined, they will need to be put into a consistent format
and any duplicates (ie the same record appearing in different data sets) will need to be
removed. [½]
5. Cleaning data, eg addressing unusual, missing or inconsistent values. [½]
For example, the age of the driver might be missing, or be too low or high to be plausible.
These cases will need investigation. [½]
6. Exploratory data analysis, which here takes the form of inferential analysis… [½]
… as we are testing the hypothesis that younger drivers are more likely to have an
accident than older drivers. [½]
7. Modelling the data. [½]
This may involve fitting a distribution to the annual number of accidents arising from the
policyholders in each age group. [½]
8. Communicating the results. [½]
This will involve describing the data sources used, the model and analyses performed, and
the conclusion of the analysis (ie whether young drivers are indeed more likely to have an
accident than older drivers), along with any limitations of the analysis. [½]
9. Monitoring the process – updating the data and repeating the process if required. [½]
The car insurer may wish to repeat the process again in a few years’ time, using the data
gathered over that period, to ensure that the conclusions of the original analysis remain
valid. [½]
10. Ensuring that any relevant professional guidance and legislation (eg on age discrimination)
has been complied with. [½]
[Maximum 7]
Replication refers to an independent third party repeating an analysis from scratch (including
gathering an independent data sample) and obtaining the same (or at least consistent) results. [1]
Reproducibility refers to reporting the results of a statistical analysis in sufficient detail that an
independent third party can repeat the analysis on the same data set and arrive at the same
results. [1]
[Total 2]
(ii) Three reasons why replication is difficult
Replication of a data analysis can be difficult if:
the study is big; [1]
the study relies on data collected at great expense or over many years; or [1]
the study is of a unique occurrence (eg the standards of healthcare in the aftermath of a
particular event). [1]
[Total 3]
Another example of a Bernoulli random variable occurs when a fair die is thrown once. If X is the number of sixes obtained, $p = \tfrac{1}{6}$, $1-p = \tfrac{5}{6}$, and $P(X=0) = \tfrac{5}{6}$ and $P(X=1) = \tfrac{1}{6}$.
1.3 Binomial distribution
Consider a sequence of n Bernoulli trials as above such that:
(i) the trials are independent of one another, ie the outcome of any trial does not depend on the outcomes of any other trials
and:
(ii) the probability of success, p, is the same for each trial.
A quick way of saying independent and identically distributed is IID. We will need this idea later.
The independence allows the probability of a joint outcome involving two or more trials to
be expressed as the product of the probabilities of the outcomes associated with each
separate trial concerned.
Distribution: $P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, 2, \ldots, n$; $0 < p < 1$
The coefficients here are the same as in the binomial expansion that can be obtained using the numbers from Pascal's triangle, ie $\binom{n}{x} = {}^nC_x = \frac{n!}{(n-x)!\,x!}$. We can work out these quantities using the nCr function on a calculator.
Moments: $\mu = np$ and $\sigma^2 = np(1-p)$
Very often when using the binomial distribution we will write $1-p = q$.
As an example of the binomial distribution, suppose that X is the number of sixes obtained when a fair die is thrown 10 times. Then $P(X=x) = {}^{10}C_x \left(\tfrac{1}{6}\right)^x \left(\tfrac{5}{6}\right)^{10-x}$ and the probability of exactly one 'six' in ten throws is ${}^{10}C_1 \left(\tfrac{1}{6}\right)^1 \left(\tfrac{5}{6}\right)^9 = 0.3230$. There are $10 = {}^{10}C_1$ ways of obtaining exactly one 'six', ie the 'six' could be on the first throw, the second throw, … or the tenth throw.
Question
Calculate the probability that at least 9 out of a group of 10 people who have been infected by a
serious disease will survive, if the survival probability for the disease is 70%.
Solution
$P(X \ge 9) = P(X = 9 \text{ or } 10) = \binom{10}{9}\,0.7^9 \times 0.3 + \binom{10}{10}\,0.7^{10} = 0.1493$
Alternatively, we could use the cumulative binomial probabilities given on page 187 of the Tables. The figure for $x = 8$ in the Tables for the Bin(10, 0.7) distribution is 0.8507. Subtracting this from 1, we get $1 - 0.8507 = 0.1493$ as before.
The R code for simulating values and calculating probabilities and quantiles from the
binomial distribution uses the R functions rbinom, dbinom, pbinom and qbinom. The
prefixes r, d, p, and q stand for random generation, density, distribution and quantile
functions respectively.
R code for simulating a random sample of 100 values from the binomial distribution with n = 20 and p = 0.3:
n = 20
p = 0.3
rbinom(100, n, p)
Calculate $P(X = 2)$:
dbinom(2, n, p)
Similarly, the cumulative distribution function (CDF) and quantiles can be calculated with
pbinom and qbinom.
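For example, a quick sketch of these two functions, using the values from the survival question above and the Bin(20, 0.3) distribution already defined:
1 - pbinom(8, 10, 0.7)    # P(X >= 9) for X ~ Bin(10, 0.7), ie 0.1493
qbinom(0.95, n, p)        # 95% quantile of the Bin(20, 0.3) distribution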
1.5 Negative binomial distribution
This is a generalisation of the geometric distribution.
The random variable X is the number of the trial on which the k th success occurs, where
k is a positive integer.
For example, in a telesales company, X might be the number of phone calls required to make the
fifth sale.
Distribution: $P(X=x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}$, $x = k, k+1, \ldots$; $0 < p < 1$
The probabilities satisfy the recurrence relationship:
$P(X=x) = \frac{x-1}{x-k}\,(1-p)\,P(X=x-1)$
Moments: $\mu = \frac{k}{p}$ and: $\sigma^2 = \frac{k(1-p)}{p^2}$
Note: The mean and variance are just k times those for the geometric(p) variable, which is itself a special case of this random variable (with $k = 1$). Further, the negative binomial variable can be expressed as the sum of k geometric variables (the number of trials to the first success, plus the number of additional trials to the second success, plus … to the $(k-1)$th success, plus the number of additional trials to the $k$th success).
Question
If the probability that a person will believe a rumour about a scandal in politics is 0.8, calculate the
probability that the ninth person to hear the rumour will be the fourth person to believe it.
Solution
$P(X=9) = \binom{8}{3}\,0.8^4\,0.2^5 = 0.00734$
Let $Y = X - k$, where X is defined as above. Then:
$P(Y=y) = \binom{k+y-1}{y} p^k (1-p)^y$, $y = 0, 1, 2, 3, \ldots$, with mean $\frac{k(1-p)}{p}$.
This formulation is called the Type 2 negative binomial distribution and can be found on page 9 of
the Tables. It should be noted that in the Tables the combinatorial factor has been rewritten in
terms of the gamma function (defined later in this chapter).
The previous formulation is known as the Type 1 negative binomial distribution. The formulae for
this version are given on page 8 of the Tables.
The R code for simulating values and calculating probabilities and quantiles from the
negative binomial distribution is similar to the R code used for the binomial distribution
using the R functions rnbinom, dnbinom, pnbinom and qnbinom.
For example:
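(The Core Reading's R box is not reproduced in this extract; a minimal sketch of the sort of code it contains, using the rumour example above, might be the following.)
dnbinom(5, size = 4, prob = 0.8)     # P(Y = 5), ie P(X = 9) = 0.00734 in the Type 2 form
rnbinom(100, size = 4, prob = 0.8)   # 100 simulated values of Y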
By default, R uses the Type‐2 version of the negative binomial distribution.
1.6 Hypergeometric distribution
This is the ‘finite population’ equivalent of the binomial distribution, in the following sense.
Suppose objects are selected at random, one after another, without replacement, from a
finite population consisting of k ‘successes’ and N k ‘failures’. The trials are not
independent, since the result of one trial (the selection of a success or a failure) affects the
make-up of the population from which the next selection is made.
$P(X=x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$, $x = 0, 1, 2, \ldots, n$.
Moments: $\mu = \frac{nk}{N}$
$\sigma^2 = \frac{nk(N-k)(N-n)}{N^2(N-1)}$
(The details of the derivation of the mean and variance of the number of successes are not
required by the syllabus).
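Although not part of the Core Reading here, these moments can be checked numerically with R's dhyper function; for example, with N = 20, k = 8 and n = 5 (illustrative values of our own, and noting that R's argument names differ from the notation above):
dhyper(0:5, m = 8, n = 12, k = 5)                # P(X = x), x = 0, ..., 5
sum((0:5) * dhyper(0:5, m = 8, n = 12, k = 5))   # the mean, nk/N = 2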
This suggests that X has variance $\lambda$. This is in fact also the case. So $\sigma^2 = \lambda$.
Question
Using the probability function for the Poisson distribution, prove the formulae for the mean and
variance. Hint: for the variance, consider $E[X(X-1)]$.
Solution
The mean is:
$E(X) = \sum_x x\,P(X=x) = \lambda e^{-\lambda} + 2\,\frac{\lambda^2}{2!}e^{-\lambda} + 3\,\frac{\lambda^3}{3!}e^{-\lambda} + 4\,\frac{\lambda^4}{4!}e^{-\lambda} + \cdots$
$\qquad = \lambda e^{-\lambda} + \lambda^2 e^{-\lambda} + \frac{\lambda^3}{2!}e^{-\lambda} + \frac{\lambda^4}{3!}e^{-\lambda} + \cdots = \lambda e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right)$
Since $e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$, we obtain:
$E(X) = \lambda e^{-\lambda} e^{\lambda} = \lambda$
Similarly:
$E[X(X-1)] = \sum_x x(x-1)\,P(X=x) = 2 \times 1 \times \frac{\lambda^2}{2!}e^{-\lambda} + 3 \times 2 \times \frac{\lambda^3}{3!}e^{-\lambda} + 4 \times 3 \times \frac{\lambda^4}{4!}e^{-\lambda} + \cdots$
$\qquad = \lambda^2 e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots\right) = \lambda^2 e^{-\lambda} e^{\lambda} = \lambda^2$
So $E[X(X-1)] = E(X^2) - E(X) = \lambda^2$, giving $E(X^2) = \lambda^2 + \lambda$ and hence $\text{var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.
We can calculate Poisson probabilities in the usual way, using the probability function or the
cumulative probabilities given in the Tables.
Question
If goals are scored randomly in a game of football at a constant rate of three per match, calculate
the probability that more than 5 goals are scored in a match.
Solution
The number of goals in a match can be modelled as a Poisson distribution with mean $\lambda = 3$.
$P(X > 5) = 1 - P(X \le 5)$
We can use the recurrence relationship given:
$P(X=0) = e^{-3} = 0.0498$
$P(X=1) = \frac{3}{1} \times 0.0498 = 0.1494$
$P(X=2) = \frac{3}{2} \times 0.1494 = 0.2240$
$P(X=3) = \frac{3}{3} \times 0.2240 = 0.2240$
$P(X=4) = \frac{3}{4} \times 0.2240 = 0.1680$
$P(X=5) = \frac{3}{5} \times 0.1680 = 0.1008$
Summing these gives $P(X \le 5) = 0.916$, so $P(X > 5) = 1 - 0.916 = 0.084$.
Alternatively, we could obtain this directly using the cumulative Poisson probabilities given on page 176 of the Tables. For Poi(3), the figure for $x = 5$ is 0.91608, and $1 - 0.91608 = 0.08392$.
The Poisson distribution provides a very good approximation to the binomial when n is large and p is small – typical applications have $n = 100$ or more and $p = 0.05$ or less. The approximation depends only on the product $np\ (= \lambda)$ – the individual values of n and p are irrelevant. So, for example, the value of $P(X = x)$ in the case $n = 200$ and $p = 0.02$ is effectively the same as the value of $P(X = x)$ in the case $n = 400$ and $p = 0.01$. When dealing with large numbers of opportunities for the occurrence of 'rare' events (under 'binomial assumptions'), the distribution of the number that occurs depends only on the expected number.
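A quick numerical check of this statement in R (our own illustration, not part of the Core Reading):
dbinom(0:5, 200, 0.02)   # n = 200, p = 0.02, so np = 4
dbinom(0:5, 400, 0.01)   # n = 400, p = 0.01, so np = 4
dpois(0:5, 4)            # Poisson with lambda = 4
# the three sets of probabilities are very close to one another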
We will look at other approximations in Chapter 6.
1 Joint distributions
1.1 Joint probability (density) functions
Defining several random variables simultaneously on a sample space gives rise to a
multivariate distribution. In the case of just two variables, it is a bivariate distribution.
Discrete case
To illustrate this for a pair of discrete variables, X and Y , the probabilities associated with
the various values of ( x , y ) are as follows:
                 y
              1      2      3
   x    3   0.20   0.05    –
        4   0.15   0.05    –
The requirements for a function to qualify as the probability function of a pair of discrete
random variables are:
$f(x,y) \ge 0$ for all x, y, and $\sum_x \sum_y f(x,y) = 1$
This parallels earlier results, where the probability function was $P(X=x)$, which had to satisfy $P(X=x) \ge 0$ for all values of x and $\sum_x P(X=x) = 1$.
$P(M=m, N=n) = \frac{m}{35 \times 2^{n-2}}$, where m = 1, 2, 3, 4 and n = 1, 2, 3
Let’s draw up a table showing the values of the joint probability function for M and N.
Starting with the smallest possible values of M and N, $P(M=1, N=1) = \frac{1}{35 \times 2^{-1}} = \frac{2}{35}$.
Calculating the joint probability for all combinations of M and N, we get the table shown below.
                        M
                1      2      3      4
        1     2/35   4/35   6/35   8/35
   N    2     1/35   2/35   3/35   4/35
        3     1/70   1/35   3/70   2/35
Question
Use the table of probabilities given above to calculate:
(i) $P(M=3, N=1 \text{ or } 2)$
(ii) $P(N=3)$.
Solution
(i) $P(M=3, N=1 \text{ or } 2) = P(M=3, N=1) + P(M=3, N=2) = \frac{6}{35} + \frac{3}{35} = \frac{9}{35}$
Solution
The marginal PDF of Y is:
$f_Y(y) = \int_{x=0}^{2} \frac{1}{16}(x+3y)\,dx = \frac{1}{16}\left[\frac{1}{2}x^2 + 3xy\right]_{x=0}^{2} = \frac{1}{16}(2+6y)$
So:
$f_{X|Y=y}(x,y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{\frac{1}{16}(x+3y)}{\frac{1}{16}(2+6y)} = \frac{x+3y}{2(1+3y)}, \qquad 0 < x < 2$
1.4 Distributions defined on more complex domains
For all the distributions we have seen so far, the limits on both the x and the y integrals have
been numbers. So, in the previous example, the joint PDF is defined over the rectangle whose
vertices are at the points (0, 0), (0, 2), (2, 0), and (2, 2).
It is possible for a joint distribution to be defined over a non‐rectangular area. In these cases, the
limits for y may be dependent on x , or vice versa. Care needs to be taken in these cases to
ensure that the correct limits are used when integrating.
Question
The continuous random variables X and Y have joint PDF:
$f(x,y) = k(x^2 + xy), \qquad 0 < y < x < 2$
(i) Calculate the value of k.
(ii) Determine the conditional density function of $Y\,|\,X=x$.
Solution
(i) Integrating first with respect to x:
$\int_{x=y}^{2} k(x^2+xy)\,dx = k\left[\frac{x^3}{3} + \frac{x^2 y}{2}\right]_{x=y}^{2} = k\left(\frac{8}{3} + 2y - \frac{y^3}{3} - \frac{y^3}{2}\right) = k\left(\frac{8}{3} + 2y - \frac{5y^3}{6}\right)$
We now integrate this expression with respect to y, using the limits 0 and 2:
$\int_{0}^{2} k\left(\frac{8}{3} + 2y - \frac{5y^3}{6}\right)dy = k\left[\frac{8y}{3} + y^2 - \frac{5y^4}{24}\right]_{0}^{2} = k\left(\frac{16}{3} + 4 - \frac{10}{3}\right) = 6k$
Since this must be equal to 1, we see that $k = \frac{1}{6}$.
The inner integral above, multiplied by k, is the marginal PDF of Y:
$f_Y(y) = \frac{1}{6}\left(\frac{8}{3} + 2y - \frac{5y^3}{6}\right), \qquad 0 < y < 2$
(ii) The marginal PDF of X is:
$f_X(x) = \int_{0}^{x} \frac{1}{6}(x^2+xy)\,dy = \frac{1}{6}\left[x^2 y + \frac{1}{2}xy^2\right]_{0}^{x} = \frac{1}{6}\left(x^3 + \frac{1}{2}x^3\right) = \frac{1}{4}x^3, \qquad 0 < x < 2$
So the conditional PDF of $Y\,|\,X=x$ is:
$f_{Y|X}(x,y) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{\frac{1}{6}(x^2+xy)}{\frac{1}{4}x^3} = \frac{2}{3}\left(\frac{1}{x} + \frac{y}{x^2}\right), \qquad 0 < y < x$
$f(x,y) = c(x+3y), \qquad 0 < x < 2,\ 0 < y < 2$
(ii) Use the result from part (i) to derive the conditional PDF of X given $Y = y$. [1]
[Total 3]
$f(x,y) = \frac{1}{6}(x^2+xy), \qquad 0 < y < x < 2$
$P(M=m, N=n) = \frac{m}{35 \times 2^{n-2}}$, for m = 1, 2, 3, 4 and n = 1, 2, 3
Show that the conditional probability functions for M given $N = n$ and for N given $M = m$ are equal to the corresponding marginal distributions.
Exam style
$f_{X,Y}(x,y) = \frac{4}{5}(3x^2 + xy), \qquad 0 < x < 1,\ 0 < y < 1$
Determine:
4.6 Calculate the correlation coefficient of X and Y , where X and Y have the joint distribution:
0 1 2
1 0.1 0.1 0
4.7 Claim sizes on a home insurance policy are normally distributed about a mean of £800 and with a
standard deviation of £100. Claim sizes on a car insurance policy are normally distributed about a mean of £1,200 and with a standard deviation of £300. All claim sizes are assumed to be
independent.
To date, there have already been home claims amounting to £800, but no car claims. Calculate
the probability that after the next 4 home claims and 3 car claims the total size of car claims
exceeds the total size of the home claims.
4.8 Two discrete random variables, X and Y , have the following joint probability function:
Exam style
                   X
                1     2     3
         1    0.2     0   0.2
   Y     2      0   0.2     0
         3    0.2     0   0.2
Determine:
(i) E (X ) [1]
4.9 The random variables X and Y have joint density function given by:
kx ey 1 x ,1 y
2
4.10 Show using convolutions that if X and Y are independent random variables and X has a $\chi^2_m$ distribution and Y has a $\chi^2_n$ distribution, then $X + Y$ has a $\chi^2_{m+n}$ distribution.
4.11 Let X be a random variable with mean 3 and standard deviation 2, and let Y be a random
variable with mean 4 and standard deviation 1. X and Y have a correlation coefficient of –0.3.
Exam style
Let $Z = X + Y$.
Calculate:
4.12 X has a Poisson distribution with mean 5 and Y has a Poisson distribution with mean 10. If $\text{cov}(X,Y) = -12$, calculate the variance of Z where $Z = X - 2Y + 3$. [2]
Exam style
4.13 Show that if X has a negative binomial distribution with parameters k and p , and Y has a
negative binomial distribution with parameters m and p , and X and Y are independent, then
X Y also has a negative binomial distribution, and specify its parameters.
Exam style
4.14 For a certain company, claim sizes on car policies are normally distributed about a mean of £1,800 and with standard deviation £300, whereas claim sizes on home policies are normally distributed about a mean of £1,200 and with standard deviation £500. Assuming independence among all
claim sizes, calculate the probability that a car claim is at least twice the size of a home claim. [4]
4.15 (i) Two discrete random variables, X and Y , have the following joint probability function:
Exam style
X
1 2 3 4
$f_{U,V}(u,v) = \frac{48}{67}(2uv + u^2), \qquad 0 < u < 1,\ u^2 < v < 2$
Chapter 4 Solutions
$\int_{y=0}^{2}\int_{x=0}^{2} c(x+3y)\,dx\,dy = \int_{y=0}^{2} c\left[\tfrac{1}{2}x^2 + 3xy\right]_{x=0}^{2} dy = \int_{y=0}^{2} c(2+6y)\,dy = c\left[2y + 3y^2\right]_{y=0}^{2} = 16c$
Setting $16c = 1$ gives $c = \frac{1}{16}$.
$P(X<1,\,Y>0.5) = \int_{y=0.5}^{2}\int_{x=0}^{1} \frac{1}{16}(x+3y)\,dx\,dy = \int_{y=0.5}^{2} \frac{1}{16}\left[\tfrac{1}{2}x^2 + 3xy\right]_{x=0}^{1} dy$
$\qquad = \int_{y=0.5}^{2} \frac{1}{16}\left(\tfrac{1}{2} + 3y\right)dy = \frac{1}{16}\left[\tfrac{y}{2} + \tfrac{3y^2}{2}\right]_{y=0.5}^{2} = \frac{51}{128} = 0.398$
$f_Y(y) = \int_{0}^{1-y} 2\,dx = 2\left[x\right]_{0}^{1-y} = 2(1-y), \qquad 0 < y < 1$ [2]
$f_{X|Y=y}(x,y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{2}{2(1-y)} = \frac{1}{1-y}, \qquad 0 < x < 1-y$ [1]
4.3 (i) We saw in Section 1.4 of the chapter that the marginal distribution for Y is:
$f_Y(y) = \frac{1}{6}\left(\frac{8}{3} + 2y - \frac{5y^3}{6}\right), \qquad 0 < y < 2$
So the PDF for the conditional distribution $X\,|\,Y=y$ is the joint PDF divided by the marginal PDF:
$f_{X|Y=y}(x,y) = \frac{\frac{1}{6}(x^2+xy)}{\frac{1}{6}\left(\frac{8}{3} + 2y - \frac{5y^3}{6}\right)} = \frac{x^2+xy}{\frac{8}{3} + 2y - \frac{5y^3}{6}}, \qquad y < x < 2$
(ii)
$P(1<X<1.5\,|\,Y=y) = \frac{1}{\frac{8}{3} + 2y - \frac{5y^3}{6}}\int_{1}^{1.5}(x^2+xy)\,dx = \frac{\left[\frac{x^3}{3} + \frac{x^2 y}{2}\right]_{1}^{1.5}}{\frac{8}{3} + 2y - \frac{5y^3}{6}} = \frac{1.125 + 1.125y - \frac{1}{3} - \frac{y}{2}}{\frac{8}{3} + 2y - \frac{5y^3}{6}}$
Substituting in $y = 1$, we obtain:
$P(1<X<1.5\,|\,Y=1) = \frac{1.125 + 1.125 - \frac{1}{3} - \frac{1}{2}}{\frac{8}{3} + 2 - \frac{5}{6}} = \frac{1.4167}{3.8333} = 0.3696$
4.4 In the chapter, we found that the marginal probability functions for M and N were:
$P_M(m) = \frac{m}{10} \quad \text{for } m = 1, 2, 3, 4$
and:
$P_N(n) = \frac{1}{7 \times 2^{n-3}} \quad \text{for } n = 1, 2, 3$
So, dividing the joint probability function by the marginal probability function for N, we obtain:
$P_{M|N=n}(m,n) = \frac{P_{M,N}(m,n)}{P_N(n)} = \frac{m}{35 \times 2^{n-2}} \times 7 \times 2^{n-3} = \frac{m}{10}, \qquad m = 1, 2, 3, 4$
Similarly:
$P_{N|M=m}(m,n) = \frac{P(N=n, M=m)}{P(M=m)} = \frac{m}{35 \times 2^{n-2}} \times \frac{10}{m} = \frac{1}{7 \times 2^{n-3}}, \qquad n = 1, 2, 3$
These are identical to the marginal distributions obtained in the chapter text.
$f_X(x) = \int_{y=0}^{1} \frac{4}{5}(3x^2+xy)\,dy = \frac{4}{5}\left[3x^2 y + \frac{1}{2}xy^2\right]_{y=0}^{1} = \frac{4}{5}\left(3x^2 + \frac{1}{2}x\right)$ [2]
$f_{Y|X=x}(x,y) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{\frac{4}{5}(3x^2+xy)}{\frac{4}{5}\left(3x^2 + \frac{1}{2}x\right)} = \frac{3x^2+xy}{3x^2 + \frac{1}{2}x} = \frac{3x+y}{3x+\frac{1}{2}}$ [1]
(iii) Covariance
$E(X) = \int_{x=0}^{1} \frac{4}{5}\left(3x^3 + \frac{1}{2}x^2\right)dx = \frac{4}{5}\left[\frac{3}{4}x^4 + \frac{1}{6}x^3\right]_{x=0}^{1} = \frac{11}{15}$ [1]
$f_Y(y) = \int_{x=0}^{1} \frac{4}{5}(3x^2+xy)\,dx = \frac{4}{5}\left[x^3 + \frac{1}{2}x^2 y\right]_{x=0}^{1} = \frac{4}{5}\left(1 + \frac{1}{2}y\right)$
$E(Y) = \int_{y=0}^{1} \frac{4}{5}\left(y + \frac{1}{2}y^2\right)dy = \frac{4}{5}\left[\frac{1}{2}y^2 + \frac{1}{6}y^3\right]_{y=0}^{1} = \frac{8}{15}$ [1]
Now:
$E(XY) = \int_{x=0}^{1}\int_{y=0}^{1} \frac{4}{5}\left(3x^3 y + x^2 y^2\right)dy\,dx = \frac{4}{5}\int_{x=0}^{1}\left[\frac{3}{2}x^3 y^2 + \frac{1}{3}x^2 y^3\right]_{y=0}^{1} dx$
$\qquad = \frac{4}{5}\int_{x=0}^{1}\left(\frac{3}{2}x^3 + \frac{1}{3}x^2\right)dx = \frac{4}{5}\left[\frac{3}{8}x^4 + \frac{1}{9}x^3\right]_{x=0}^{1} = \frac{7}{18}$ [2]
Hence:
$\text{cov}(X,Y) = \frac{7}{18} - \frac{11}{15} \times \frac{8}{15} = -\frac{1}{450}$ [1]
4.6 The covariance of X and Y was obtained in Section 2.4 to be $\text{cov}(X,Y) = 0.02$. The variances of the marginal distributions are $\text{var}(X) = 0.69$ and $\text{var}(Y) = 0.56$. So:
$\text{corr}(X,Y) = \frac{\text{cov}(X,Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = \frac{0.02}{\sqrt{0.69 \times 0.56}} = 0.0322$
4.7 Let X be the amount of a home insurance claim and Y the amount of a car insurance claim. Then:
$X \sim N(800, 100^2)$ and $Y \sim N(1200, 300^2)$
We require:
$P\big((Y_1+Y_2+Y_3) > (X_1+X_2+X_3+X_4) + 800\big) = P\big((Y_1+Y_2+Y_3) - (X_1+X_2+X_3+X_4) > 800\big)$
Now $(Y_1+Y_2+Y_3) - (X_1+X_2+X_3+X_4) \sim N(3 \times 1200 - 4 \times 800,\ 3 \times 300^2 + 4 \times 100^2) = N(400,\ 310{,}000)$.
Therefore:
$P\big((Y_1+Y_2+Y_3) - (X_1+X_2+X_3+X_4) > 800\big) = P\left(Z > \frac{800-400}{\sqrt{310{,}000}}\right) = P(Z > 0.718) = 1 - 0.76362 = 0.236$
P(Z 0.718)
Alternatively, we could use the fact that the distribution of X is symmetrical about 2.
P( X 1,Y y)
Using P(Y y | X 1) and P ( X 1) 0.4 gives:
P( X 1)
Y 1| X 1 Y 2| X 1 Y 3| X 1
0.5 0 0.5
[1]
(iii) Correlated?
So cov( X , Y ) E ( XY ) E ( X )E (Y ) 4 2 2 0 . [1]
cov( X ,Y )
Hence corr( X ,Y ) 0.
var( X )var(Y )
(iv) Independent?
kx e y / dx dy 1
y 1 x 1
y / y /
x 1 key /
kx e dx ke
1 1 1
x 1
ke y / k k e 1/
dy ey /
y 1
1 1 1 1
Equating this to 1:
k e 1/ ( 1)e1/
1 k
1
4.10 The chi-square distribution is a continuous distribution that can take any positive value. The chi-square distribution with parameter m is the same as a gamma distribution with parameters $\frac{m}{2}$ and $\frac{1}{2}$.
So, using the PDF of the gamma distribution, the PDF of the sum $Z = X + Y$ is given by the convolution formula:
$f_Z(z) = \int_{0}^{z} \frac{(\frac{1}{2})^{\frac{m}{2}}}{\Gamma(\frac{m}{2})}\,x^{\frac{m}{2}-1}e^{-\frac{1}{2}x}\;\frac{(\frac{1}{2})^{\frac{n}{2}}}{\Gamma(\frac{n}{2})}\,(z-x)^{\frac{n}{2}-1}e^{-\frac{1}{2}(z-x)}\,dx = \frac{(\frac{1}{2})^{\frac{m+n}{2}}}{\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})}\,e^{-\frac{1}{2}z}\int_{0}^{z} x^{\frac{m}{2}-1}(z-x)^{\frac{n}{2}-1}\,dx$
Substituting $x = zt$:
$f_Z(z) = \frac{(\frac{1}{2})^{\frac{m+n}{2}}}{\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})}\,e^{-\frac{1}{2}z}\int_{0}^{1} (zt)^{\frac{m}{2}-1}(z-zt)^{\frac{n}{2}-1}\,z\,dt = \frac{(\frac{1}{2})^{\frac{m+n}{2}}}{\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})}\,z^{\frac{m+n}{2}-1}e^{-\frac{1}{2}z}\int_{0}^{1} t^{\frac{m}{2}-1}(1-t)^{\frac{n}{2}-1}\,dt$
Since the last integral is the Beta function $B\!\left(\tfrac{m}{2},\tfrac{n}{2}\right) = \frac{\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})}{\Gamma(\frac{m+n}{2})}$ (the normalising constant of the Beta(½m, ½n) distribution), we get:
$f_Z(z) = \frac{(\frac{1}{2})^{\frac{m+n}{2}}}{\Gamma(\frac{m+n}{2})}\,z^{\frac{m+n}{2}-1}e^{-\frac{1}{2}z}$
Since this matches the PDF of the $\chi^2_{m+n}$ distribution (and Z can take any positive value), Z is a $\chi^2_{m+n}$ random variable.
We have:
$\text{cov}(X,Z) = \text{cov}(X, X+Y) = \text{cov}(X,X) + \text{cov}(X,Y) = \text{var}(X) + \text{cov}(X,Y)$
Now:
$\text{corr}(X,Y) = \frac{\text{cov}(X,Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = \frac{\text{cov}(X,Y)}{\sqrt{4 \times 1}} = -0.3 \quad\Rightarrow\quad \text{cov}(X,Y) = -0.6$
Hence:
$\text{cov}(X,Z) = 4 + (-0.6) = 3.4$
(ii) Variance
$\text{var}(Z) = \text{cov}(X+Y, X+Y) = \text{var}(X) + 2\,\text{cov}(X,Y) + \text{var}(Y) = 4 - 1.2 + 1 = 3.8$ [2]
Now:
$\text{var}(X - 2Y + 3) = \text{var}(X) + \text{var}(-2Y) + 2\,\text{cov}(X, -2Y)$
and:
$\text{cov}(aX, bY) = ab\,\text{cov}(X,Y)$
So:
$\text{var}(Z) = 5 + 4 \times 10 + 2 \times (-2) \times (-12) = 93$ [1]
$M_X(t) = \left(\frac{pe^t}{1-qe^t}\right)^k$
$M_Y(t) = \left(\frac{pe^t}{1-qe^t}\right)^m$
$M_{X+Y}(t) = M_X(t)\,M_Y(t) = \left(\frac{pe^t}{1-qe^t}\right)^k \left(\frac{pe^t}{1-qe^t}\right)^m = \left(\frac{pe^t}{1-qe^t}\right)^{k+m}$
This is the MGF of another negative binomial distribution with parameters p and $k+m$. Hence, by uniqueness of MGFs, $X+Y$ has this distribution.
$P(X \ge 2Y) = P(X - 2Y \ge 0)$ [1]
$X - 2Y \sim N\big(1800 - 2 \times 1200,\ 300^2 + 2^2 \times 500^2\big) = N(-600,\ 1{,}090{,}000)$ [2]
Standardising:
$z = \frac{0 - (-600)}{\sqrt{1{,}090{,}000}} = 0.575$
So:
$P(X \ge 2Y) = P(Z > 0.575) = 1 - 0.71735 = 0.283$ [1]
$\text{var}(X\,|\,Y=2) = E(X^2\,|\,Y=2) - E^2(X\,|\,Y=2)$
$E(X\,|\,Y=2) = \sum_x x\,P(X=x\,|\,Y=2) = \sum_x x\,\frac{P(X=x, Y=2)}{P(Y=2)} = 1 \times \frac{0}{0.6} + 2 \times \frac{0.3}{0.6} + 3 \times \frac{0.1}{0.6} + 4 \times \frac{0.2}{0.6} = 2\tfrac{5}{6}$ [1]
$E(X^2\,|\,Y=2) = \sum_x x^2\,P(X=x\,|\,Y=2) = \sum_x x^2\,\frac{P(X=x, Y=2)}{P(Y=2)} = 1^2 \times \frac{0}{0.6} + 2^2 \times \frac{0.3}{0.6} + 3^2 \times \frac{0.1}{0.6} + 4^2 \times \frac{0.2}{0.6} = 8\tfrac{5}{6}$ [1]
So $\text{var}(X\,|\,Y=2) = 8\tfrac{5}{6} - \left(2\tfrac{5}{6}\right)^2 = \frac{29}{36} = 0.80556$. [1]
We require:
$E(U\,|\,V=v) = \int_u u\,f(u\,|\,v)\,du$
Now:
$f(v) = \int_{u=0}^{1} \frac{48}{67}(2uv+u^2)\,du = \frac{48}{67}\left[u^2 v + \frac{1}{3}u^3\right]_{u=0}^{1} = \frac{48}{67}\left(v + \frac{1}{3}\right)$ [1]
$f(u\,|\,v) = \frac{f(u,v)}{f(v)} = \frac{\frac{48}{67}(2uv+u^2)}{\frac{48}{67}\left(v + \frac{1}{3}\right)} = \frac{2uv+u^2}{v + \frac{1}{3}}$ [1]
So:
$E(U\,|\,V=v) = \int_{u=0}^{1} \frac{2u^2 v + u^3}{v + \frac{1}{3}}\,du = \frac{\left[\frac{2}{3}u^3 v + \frac{1}{4}u^4\right]_{u=0}^{1}}{v + \frac{1}{3}} = \frac{\frac{2}{3}v + \frac{1}{4}}{v + \frac{1}{3}}$ [1]
Hence:
$E[Y\,|\,X=x] = \int_{y=0}^{2} y\,\frac{x+y}{2(x+1)}\,dy = \int_{y=0}^{2} \frac{xy+y^2}{2(x+1)}\,dy$
$\qquad = \frac{1}{2(x+1)}\left[\frac{1}{2}xy^2 + \frac{1}{3}y^3\right]_{y=0}^{2} = \frac{2x + \frac{8}{3}}{2(x+1)} = \frac{x + \frac{4}{3}}{x+1} = \frac{3x+4}{3(x+1)}$
We can also calculate conditional expectations in the case where the limits for one variable
depend on the other.
Question
$f_{X,Y}(x,y) = \frac{1}{6}(x^2+xy), \qquad 0 < y < x < 2$
Determine the conditional expectation $E[Y\,|\,X=x]$.
Solution
We saw in the previous chapter in Section 1.4 that in this case the conditional distribution $Y\,|\,X=x$ has PDF:
$f_{Y|X}(x,y) = \frac{2}{3}\left(\frac{1}{x} + \frac{y}{x^2}\right), \qquad 0 < y < x$
So the conditional expectation is given by:
$E(Y\,|\,X=x) = \int_{0}^{x} y\,\frac{2}{3}\left(\frac{1}{x} + \frac{y}{x^2}\right)dy = \int_{0}^{x}\left(\frac{2y}{3x} + \frac{2y^2}{3x^2}\right)dy = \left[\frac{y^2}{3x} + \frac{2y^3}{9x^2}\right]_{0}^{x} = \frac{x}{3} + \frac{2x}{9} = \frac{5x}{9}$
2 The random variable E [Y | X ]
The conditional expectation E [ Y | X x ] g ( x ) , say, is, in general, a function of x . It can
be thought of as the observed value of a random variable g ( X ) . The random variable g ( X )
is denoted E [ Y | X ] .
We saw in a previous question that $E[Y\,|\,X=x] = \frac{3x+4}{3(x+1)}$, which is a function of x. So $E[Y\,|\,X]$ in this example is the random variable $\frac{3X+4}{3(X+1)}$.
In a later chapter the regression line will be defined as $E[Y\,|\,x] = \alpha + \beta x$.
E [ Y | X ] , like any other function of X , has its own distribution, whose properties depend
on those of the distribution of X itself. Of particular importance is the expected value (the
mean) of the distribution of E [ Y | X ] . The usefulness of considering this expected value,
E [ E [ Y | X ] ] , comes from the following result, proved here in the case of continuous
variables, but true in general.
Theorem: $E\big[E[Y\,|\,X]\big] = E[Y]$
Proof:
$E\big[E[Y\,|\,X]\big] = \int E[Y\,|\,x]\,f_X(x)\,dx = \int\left(\int y\,f(y\,|\,x)\,dy\right)f_X(x)\,dx = \int\!\!\int y\,f(x,y)\,dx\,dy = E[Y]$
We are integrating here over all possible values of x and y.
The last two steps follow by noting that $f(y\,|\,x) = \frac{f(x,y)}{f_X(x)}$ and $\int f(x,y)\,dx = f_Y(y)$, ie the marginal PDF of Y.
This formula is given on page 16 of the Tables.
Another method of comparing the distribution of our sample means, x , with the normal
distribution is to examine the quantiles.
In R we can find the quantiles of x using the quantile function. Using the default setting
(type 7) to obtain the sample lower quartile, median and upper quartile gives 4.775, 5.000
and 5.250, respectively. However in Subject CS1 we prefer to use type 5 or type 6.
In R, we can find the quartiles of the normal distribution using the qnorm function. This
gives a lower quartile, median and upper quartile of 4.762, 5.000 and 5.238, respectively.
The quantiles obtained here are those of a normal distribution with mean 5 and variance 5/40.
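For illustration, a sketch of the commands involved (with xbar used here as a placeholder name for the vector of simulated sample means) is:
quantile(xbar, probs = c(0.25, 0.5, 0.75), type = 5)   # CS1 prefers type 5 or 6
qnorm(c(0.25, 0.5, 0.75), mean = 5, sd = sqrt(5/40))   # 4.762, 5.000, 5.238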
There is no universal agreement amongst statisticians over how to define sample quantiles. The lower quartile, for example, is sometimes defined to be the position of the $\frac{n+1}{4}$th sample value, where n is the sample size. Others may use the $\frac{n+2}{4}$th sample value, or even the $\frac{n+3}{4}$th value.
In R, if we do not specify, R will use $\frac{n+3}{4}$ and $\frac{3n+1}{4}$ for the lower and upper quartiles. Other definitions can be used by specifying them in the R code. In fact, when we use R, we are often using quite large sample sizes, in which case the differences between the different definitions will be minimal.
We observe that our distribution of the sample means is slightly more spread out in the tails
– which is what we observed in the previous diagram.
A quick way to compare all the quantiles in one go is by drawing a QQ-plot using the R
function qqnorm.
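A minimal sketch of producing such a plot (again with xbar as a placeholder for the vector of simulated sample means) is:
qqnorm(xbar)
qqline(xbar)   # adds a reference line through the first and third quartiles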
If our sample quantiles coincide with the quantiles of the normal distribution we would
observe a perfect diagonal line (which we have added to the diagram for clarity). For our
example we can see that x and the normal distribution are very similar except in the tails
where we see that x has a lighter lower tail and a heavier upper tail than the normal
distribution.
The QQ plot gives us sample quantiles which are very close to the diagonal line.
Care needs to be taken when interpreting a QQ plot. In this example, we see that, at the top end
of the distribution, the sample quantiles are slightly larger than we would expect them to be. This
suggests that our sample has slightly more weight in the upper tail than the corresponding normal
distribution.
At the lower end, again the sample quantiles are slightly larger than we would expect. This
suggests that our sample has slightly less weight in the lower tail than the corresponding normal
distribution. This might be the case if the sample distribution was (very slightly) positively
skewed.
If we use R to calculate the coefficient of skewness for this sample, we obtain a figure of 0.0731.
This confirms the very slight positive sample skewness.
The F distribution gives us the distribution of the variance ratio for two normal populations. v1
and v2 can be referred to as the number of degrees of freedom in the numerator and
denominator respectively.
It should be noted that it is arbitrary which one is the numerator and which is the denominator, and so $\frac{S_2^2/\sigma_2^2}{S_1^2/\sigma_1^2} \sim F_{n_2-1,\,n_1-1}$.
S12 / 12
Since it is arbitrary which value is the numerator and which is the denominator, and since only the
upper critical points are tabulated, it is usually easier to put the larger value of the sample
variance into the numerator and the smaller sample variance into the denominator.
Alternatively, if $F \sim F_{n_1-1,\,n_2-1}$ then $\frac{1}{F} \sim F_{n_2-1,\,n_1-1}$.
This reciprocal form is needed when using tables of critical points, as only upper tail points
are tabulated. See ‘Formulae and Tables’.
This is an important result and will be used in Chapter 9 in the work on confidence intervals and
Chapter 10 in the work on hypothesis tests.
The percentage points for the F distribution can be found on pages 170-174 of the Tables.
Question
Determine:
Solution
(i) 3.779 is greater than 1, so we simply use the upper critical values given: $P(F_{9,10} > 3.779) = 0.025$, since 3.779 is the 2½% point of the $F_{9,10}$ distribution (page 173).
(ii) Since 3.8 is greater than 1, it is again an upper value and so we use the Tables directly.
We simply turn the probability around:
1
(iii) Since this is a lower critical point we need to use the result $\frac{1}{F_{m,n}} \sim F_{n,m}$:
$P(F_{11,8} < 0.3392) = P\left(\frac{1}{F_{11,8}} > \frac{1}{0.3392}\right) = P\left(F_{8,11} > \frac{1}{0.3392}\right) = P(F_{8,11} > 2.948) = 0.05$
(iv) Since only 1% of the distribution is below p, this implies that it must be a lower critical point and so we use the result $\frac{1}{F_{m,n}} \sim F_{n,m}$ again:
$P(F_{14,6} < p) = P\left(F_{6,14} > \frac{1}{p}\right) = 0.01 \quad\Rightarrow\quad \frac{1}{p} = 4.456 \quad\Rightarrow\quad p = 0.2244$
The mean of the F distribution is close to 1 (for $F_{m,n}$ it is $\frac{n}{n-2}$, where n is the denominator degrees of freedom). So values such as 0.3392 and 0.2244 given above are values in the lower tail, whereas 3.779 and 3.8 are upper tail values.
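These values can be checked directly in R (our own check, not part of the course material) using the pf and qf functions:
qf(0.975, 9, 10)    # 3.779, the upper 2.5% point of F(9,10)
pf(0.3392, 11, 8)   # 0.05, as in part (iii)
qf(0.01, 14, 6)     # 0.2244, as in part (iv)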
We can use the following R code to obtain a single resample with replacement from this
original sample.
sample.data <- c(0.61, 6.47, 2.56, 5.44, 2.72, 0.87, 2.77, 6.00,
0.14, 0.75)
sample(sample.data, replace=TRUE)
If we do this, R automatically gives us a sample of the same size as the original data sample, ie we
obtain a sample of size 10 in this case.
Note that this is non-parametric as we are ignoring the Exp($\lambda$) assumption to obtain a new
sample.
The following R code obtains $B = 1{,}000$ estimates $(\hat{\lambda}_1^*, \hat{\lambda}_2^*, \ldots, \hat{\lambda}_{1,000}^*)$ using $\hat{\lambda}_j^* = 1/\bar{y}_j^*$:
set.seed(47)
estimate <- rep(0, 1000)
for (i in 1:1000)
{x <- sample(sample.data, replace=TRUE);
estimate[i] <- 1/mean(x)}
We can obtain estimates for the mean, standard error and 95% confidence interval of the
estimator ̂ using the following R code:
mean(estimate)
sd(estimate)
quantile(estimate, c(0.025,0.975))
7.3 Parametric bootstrap
If we are prepared to assume that the sample is considered to come from a given
distribution, we first obtain an estimate of the parameter of interest ˆ (eg using maximum
likelihood, or method of moments). Then we use the assumed distribution, with parameter
equal to ˆ , to draw the bootstrap samples. Once the bootstrap samples are available, we
proceed as with the non-parametric method before.
Example
Using our sample of 10 values (to 2 DP) from an Exp($\lambda$) distribution with unknown parameter $\lambda$:
0.61 6.47 2.56 5.44 2.72 0.87 2.77 6.00 0.14 0.75
our estimate for $\lambda$ would be $\hat{\lambda} = 1/\bar{y} = 1/2.833 = 0.3530$. We now use the Exp(0.3530)
distribution to generate the bootstrap samples.
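The Core Reading's R code for this step is not reproduced in this extract; a sketch of one possible implementation, mirroring the non-parametric code above, is:
set.seed(47)
estimate <- rep(0, 1000)
for (i in 1:1000)
{x <- rexp(10, rate = 0.3530);   # parametric bootstrap sample from Exp(0.3530)
estimate[i] <- 1/mean(x)}
mean(estimate)
sd(estimate)
quantile(estimate, c(0.025, 0.975))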
(ii) Method of moments estimator
We have one unknown, so we will use $E(X) = \bar{x}$.
$E(X) = 2\left(\tfrac{1}{8} + 2\theta\right) + 4\left(\tfrac{1}{2} - 3\theta\right) + 5\left(\tfrac{3}{8} + \theta\right) = \tfrac{33}{8} - 3\theta$
From the data, we have:
$\bar{x} = \frac{7 \times 2 + 6 \times 4 + 17 \times 5}{30} = \frac{123}{30} = 4.1$
Therefore:
$\tfrac{33}{8} - 3\hat{\theta} = 4.1 \quad\Rightarrow\quad \hat{\theta} = 0.0083$
This value lies between the limits derived in part (i).
(iii) Maximum likelihood
The likelihood of obtaining the observed results is:
$L(\theta) = \text{constant} \times \left(\tfrac{1}{8} + 2\theta\right)^7 \left(\tfrac{1}{2} - 3\theta\right)^6 \left(\tfrac{3}{8} + \theta\right)^{17}$
Taking logs and differentiating gives:
$\ln L(\theta) = \text{constant} + 7\ln\left(\tfrac{1}{8} + 2\theta\right) + 6\ln\left(\tfrac{1}{2} - 3\theta\right) + 17\ln\left(\tfrac{3}{8} + \theta\right)$
$\frac{d}{d\theta}\ln L = \frac{14}{\tfrac{1}{8} + 2\theta} - \frac{18}{\tfrac{1}{2} - 3\theta} + \frac{17}{\tfrac{3}{8} + \theta}$
Equating this to zero to find the maximum value of $\theta$ gives:
$\frac{14}{\tfrac{1}{8} + 2\hat{\theta}} - \frac{18}{\tfrac{1}{2} - 3\hat{\theta}} + \frac{17}{\tfrac{3}{8} + \hat{\theta}} = 0$
$14\left(\tfrac{1}{2} - 3\hat{\theta}\right)\left(\tfrac{3}{8} + \hat{\theta}\right) - 18\left(\tfrac{1}{8} + 2\hat{\theta}\right)\left(\tfrac{3}{8} + \hat{\theta}\right) + 17\left(\tfrac{1}{8} + 2\hat{\theta}\right)\left(\tfrac{1}{2} - 3\hat{\theta}\right) = 0$
$14\left(\tfrac{3}{16} - \tfrac{5}{8}\hat{\theta} - 3\hat{\theta}^2\right) - 18\left(\tfrac{3}{64} + \tfrac{7}{8}\hat{\theta} + 2\hat{\theta}^2\right) + 17\left(\tfrac{1}{16} + \tfrac{5}{8}\hat{\theta} - 6\hat{\theta}^2\right) = 0$
$180\hat{\theta}^2 + \tfrac{111}{8}\hat{\theta} - \tfrac{91}{32} = 0$
(iv) MLE
Solving the quadratic equation gives:
$\hat{\theta} = \frac{-\tfrac{111}{8} \pm \sqrt{\left(\tfrac{111}{8}\right)^2 + 4 \times 180 \times \tfrac{91}{32}}}{360} = -0.170 \text{ or } 0.0929$
The maximum likelihood estimate is 0.0929.
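As an independent check (not part of the original solution), the MLE can be confirmed numerically in R by maximising the log-likelihood over the admissible range of θ:
loglik <- function(theta) 7*log(1/8 + 2*theta) + 6*log(1/2 - 3*theta) + 17*log(3/8 + theta)
optimize(loglik, interval = c(-0.06, 0.16), maximum = TRUE)$maximum   # approximately 0.0929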
The sample mean is:
$\bar{x} = \frac{1}{100{,}000}\left(87{,}889 \times 0 + 11{,}000 \times 1 + 1{,}000 \times 2 + \cdots\right) = 0.13345$
For the Poisson distribution, probabilities can be calculated iteratively using the relationship:
$P(X=x) = \frac{\lambda}{x}\,P(X=x-1), \qquad x = 1, 2, 3, \ldots$
The expected numbers, based on this estimate, are:
$x = 2: \quad 11{,}678 \times \frac{0.13345}{2} = 779$
$x = 3: \quad 779 \times \frac{0.13345}{3} = 35$
$x = 4: \quad 35 \times \frac{0.13345}{4} = 1$
$x = 5: \quad 1 \times \frac{0.13345}{5} = 0$
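Equivalently (our own check, not part of the original solution), the expected frequencies can be produced directly in R:
100000 * dpois(0:5, 0.13345)   # gives approximately 11678, 779, 35, 1 and 0 for x = 1, ..., 5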
Question
A certain company employs both graduates and non‐graduates. A small sample of employees are
entered for a certain test, with the following results. Of the four graduates taking the test, all
passed. Of the eight non‐graduates taking the test, five passed. Using Fisher’s exact test, assess
whether graduates are more likely to pass the test than non‐graduates.
Solution
Given that we had nine passes, the number of ways of choosing four graduates to pass is $\binom{9}{4}$. Given that we had three fails, the number of ways of choosing no graduates to fail is $\binom{3}{0}$. The total number of ways of choosing four graduates out of 12 employees is $\binom{12}{4}$.
So the probability of obtaining four graduate passes is:
$\frac{\binom{9}{4}\binom{3}{0}}{\binom{12}{4}} = \frac{14}{55} = 0.2545$
Since we cannot obtain more than four graduate passes when we only have four graduates, this is
the most extreme result possible, and the total probability of obtaining as extreme a result as this
is 0.2545. Since this is not less than 5%, we have insufficient evidence to conclude that graduates
are more likely to pass than non‐graduates.
We can see that in this case it will never be possible to obtain a significant result, based on the
small sample numbers we have here. Fisher’s exact test needs much bigger samples for it to be
usable to obtain satisfactory statistical results.
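For reference (our own check, not part of the course material), the same p-value can be obtained with R's fisher.test function:
passes <- matrix(c(4, 5, 0, 3), nrow = 2)      # rows: graduates, non-graduates; columns: pass, fail
fisher.test(passes, alternative = "greater")   # p-value = 0.2545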
Suppose we are modelling the number of claims on a motor insurance portfolio and we have data
on the driver’s age, sex and vehicle group. We would start with the null model (ie a single
constant equal to the sample mean) then we would try each of single covariate models (linear
function of age or the factors sex or vehicle group) to see which produces the most significant
improvement in a $\chi^2$ test or reduces the AIC the most. Suppose this was sex. Then we would try
adding a second covariate (linear function of age or the factor vehicle group). Suppose this was
age. Then we would try adding the third covariate (vehicle group). Then maybe we would try a
quadratic function of the variable age (and maybe higher powers) or each of 2 term interactions
(eg sex*age or sex*group or age*group). Finally we would try the 3 term interaction
(ie sex*age*group).
(2) Backward selection. Start by adding all available covariates and interactions. Then
remove covariates one by one starting with the least significant until the AIC reaches a
minimum or there is no significant improvement in the deviance, and all the remaining
covariates have a statistically significant impact on the response.
So with the last example we would start with the 3 term interaction sex*age*group and look at
which parameter has the largest p-value (in a test of it being zero) and remove that. We should
see a significant improvement in a $\chi^2$ test and the AIC should fall. Then we remove the next
parameter with the largest p-value and so on.
The Core Reading uses R to demonstrate this procedure. Whilst this will be covered in the CS1
PBOR, it’s important to understand the process here.
Example
We demonstrate both of these methods in R using a binomial model on the mtcars dataset
from the MASS package to determine whether a car has a V engine or an S engine (vs)
using weight in 1000 lbs (wt) and engine displacement in cubic inches (disp) as covariates.
Forward selection
The AIC of this model (which would be displayed using summary(model0)) is 45.86.
We have to choose whether we add disp or wt first. We try each and see which has the
greatest improvement in the deviance.
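The R boxes from the Core Reading are not reproduced in this extract; a sketch of the kind of code involved (the model names are our own) is:
model0 <- glm(vs ~ 1, family = binomial, data = mtcars)
model1 <- glm(vs ~ disp, family = binomial, data = mtcars)
model2 <- glm(vs ~ wt, family = binomial, data = mtcars)
anova(model0, model1, test = "Chisq")   # improvement in deviance from adding disp
anova(model0, model2, test = "Chisq")   # improvement in deviance from adding wt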
So we can see that disp has produced the more significant result – so we add that
covariate first.
Note that R always calls the models we are comparing ‘Model 1’ and ‘Model 2’, irrespective of
how we have named them. This can lead to confusion if we are not careful.
The AIC of model 1 (adding disp) is 26.7 whereas the AIC of model 2 (adding wt) is 35.37.
Therefore adding disp reduces the AIC more from model 0’s value of 45.86.
This has not led to a significant improvement in the deviance so we would not add wt (and
therefore we definitely would not add an interaction term between disp and wt).
The AIC of model 3 (adding wt) is 27.4 which is worse than model 1’s AIC of 26.7. Therefore we
would not add it.
Incidentally the AIC for models 0, 1, 2, 3 are 45.86, 26.7, 35.37 and 27.4. So using these
would have given the same results (as Model 1 produces a smaller AIC than Model 2, and
then Model 3 increases the AIC and so we would not have selected it).
Backward selection
None of these covariates is significant – so we remove the interaction term wt:disp, which is the least significant.
The parameter of the interaction term has the highest p-value of 0.829 and so is most likely to be
zero.
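Again the Core Reading's R output is omitted in this extract; a sketch of the corresponding commands (model names our own) is:
modelF <- glm(vs ~ wt * disp, family = binomial, data = mtcars)
summary(modelF)                             # wt:disp has the largest p-value (about 0.83)
modelG <- update(modelF, . ~ . - wt:disp)   # remove the interaction term
summary(modelG)                             # per the text, wt is then the least significant covariate
modelH <- update(modelG, . ~ . - wt)        # remove wt, leaving disp only
summary(modelH)
AIC(modelF, modelG, modelH)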
Both of these coefficients are significant and the AIC has fallen from 27.4 to 26.696.
We would stop at this model. Had we removed the disp term (to give the null model) the
AIC increases to 45.86.
Alternatively, carrying out a $\chi^2$ test between these two models would show a very significant
difference (p-value of less than 0.001) and therefore we should not remove the disp covariate.
We can see that both forward and backward selection lead to the same model being chosen.
Substituting the estimated parameters into the linear predictor gives the estimated value of the
linear predictor for different individuals. Now the link function links the linear predictor to the
mean of the distribution. Hence we can obtain an estimate for the mean of the distribution of Y
for that individual.
Suppose we wish to estimate the probability of having a V engine for a car with weight 2,100 lbs and displacement 180 cubic inches.
These coefficients are displayed as part of the summary output of Model C in the example above.
Hence, for displacement 180 we have $\hat{\eta} = 4.137827 - 0.021600 \times 180 = 0.24983$. We did not specify the link function so we shall use the canonical binomial link function, which is the logit function.
$0.24983 = \log\frac{\hat{\mu}}{1-\hat{\mu}} \quad\Rightarrow\quad \hat{\mu} = \frac{e^{0.24983}}{1+e^{0.24983}} = 0.562$
Recall that the mean for a binomial model is the probability. So the probability of having a V
engine for a car with weight 2,100 lbs and displacement 180 cubic inches is 56.2%.
Because we removed the weight covariate, the figure 2,100 does not enter the calculation.
newdata <-data.frame(disp=180)
predict(model,newdata,type="response")
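Equivalently (our own check), the fitted probability can be computed directly from the estimated coefficients:
plogis(4.137827 - 0.021600 * 180)   # 0.562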
X1.1 An actuarial student has said that the following three distributions are the same:
(i) the chi square distribution with 2 degrees of freedom
(ii) the exponential distribution with mean ½
(iii) the gamma distribution with parameters 1 and ½.
State with reasons whether the student is correct. [2]
X1.2 The number of telephone calls per hour on a working day received at an insurance office follows a
Poisson distribution with mean 2.5.
(i) Calculate the probability that more than 7 telephone calls are received on a working day
between 9am and 11am. [1]
(ii) Calculate the probability that, if the office opens at 8am, there are no telephone calls
received until after 9am. [2]
[Total 3]
$f(x) = \frac{2 \times 5^2}{(5+x)^3}, \qquad x > 0$
(ii) Calculate two simulated observations from the distribution using the random numbers
0.656 and 0.285 selected from the U(0,1) distribution. [3]
[Total 4]
X1.6 A large life office has 1,000 policyholders, each of whom has a probability of 0.01 of dying during
the next year (independently of all other policyholders).
(i) Derive a recursive relationship for the binomial distribution of the form:
(ii) Calculate the probabilities of the following events:
(a) there will be no deaths during the year
(b) there will be more than two deaths during the year
$E(Y\,|\,X=x) = 2x + 400$
$\text{var}(Y\,|\,X=x) = \frac{x^2}{2}$
The distribution of X over the portfolio is assumed to be normal with mean 50 and standard
deviation 14.
The random variables X and Y are jointly distributed with standard deviations of 5 and 7 respectively and $\text{corr}(X,Y) = \tfrac{3}{7}$.
Assignment X1 Solutions
Markers: This document sets out one approach to solving each of the questions (sometimes with
alternatives). Please give credit for any other valid approaches.
Solution X1.1
The exponential distribution with mean ½ has parameter 2. This is a Gamma(1, 2) distribution,
and so is not equivalent to the other two. [1]
Be careful to distinguish between the parameter $\lambda$ and the mean $1/\lambda$ for the exponential distribution.
Solution X1.2
The number of telephone calls, N, in a two-hour period also follows a Poisson distribution: $N \sim Poisson(5)$.
The waiting time, T, in hours for the first telephone call has an exponential distribution:
$T \sim Exp(2.5)$ [1]
Solution X1.3