Journal of Statistical Software: Actuar: An R Package For Actuarial Science
Abstract
actuar is a package providing additional Actuarial Science functionality to the R statistical system. The project was launched in 2005 and the package has been available on the Comprehensive R Archive Network since February 2006. The current version of the package contains functions for use in the fields of loss distributions modeling, risk theory (including ruin theory), simulation of compound hierarchical models and credibility theory. This paper presents in detail, but with few technical terms, the most recent version of the package.
Keywords: insurance, loss distributions, risk theory, ruin theory, credibility theory, distributions, phase-type, matrix exponential, simulation, hierarchical, R.
1. Introduction
actuar is a package providing additional Actuarial Science functionality to the R statistical
system (R Development Core Team 2008). Various packages on the Comprehensive R Archive
Network (CRAN, http://CRAN.R-project.org/) provide functions useful to actuaries, e.g.,
copula (Yan 2007), Rmetrics (Wuertz 2007), SuppDists (Wheeler 2008) or the R recommended
packages nlme (Pinheiro, Bates, DebRoy, Sarkar, and the R Development Core Team 2007)
and survival (Lumley 2008) just to name a few. However, actuar aims to serve as a central
location for more specifically actuarial functions and data sets. The project was officially
launched in 2005 and is under active development.
The feature set of the package can be split into four main categories: loss distributions modeling,
risk theory (including ruin theory), simulation of compound hierarchical models and credibility
theory. As much as possible, the developers have tried to keep the “user interface” of the
various functions of the package consistent. Moreover, the package follows the general R
philosophy of working with model objects.
This paper reviews the various features of version 0.9-7 of actuar. We provide enough actuarial background where needed to make the paper self-contained, but otherwise give numerous references for readers interested in diving deeper into the subject.
Future versions of the package can be obtained from CRAN at http://CRAN.R-project.org/package=actuar.
1. introduction of 18 additional probability laws and utility functions to get raw moments,
limited moments and the moment generating function;
mk = E[X^k],    (1)
when it exists. Every probability law of Table 1 is supported, plus the following ones: beta,
exponential, chi-square, gamma, lognormal, normal (no lev), uniform and Weibull of base R
Table 1: Probability laws supported by actuar classified by family and root names of the R
functions.
and the inverse Gaussian distribution of the package SuppDists (Wheeler 2008). The m and
lev functions are especially useful with estimation methods based on the matching of raw or
limited moments; see Section 2.4 for their empirical counterparts. The mgf functions come in
handy to compute the adjustment coefficient in ruin theory; see Section 3.5.
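As a quick sketch of these utility functions for, e.g., the gamma distribution — the parameter values below are illustrative, not taken from the text:

```r
## Raw moments, limited moments and the mgf of a Gamma(2, 1)
## distribution (illustrative parameter values)
library(actuar)
mgamma(1, shape = 2, rate = 1)       # first raw moment, E[X] = 2
mgamma(2, shape = 2, rate = 1)       # second raw moment, E[X^2] = 6
levgamma(10, shape = 2, rate = 1)    # first limited moment E[min(X, 10)]
mgfgamma(0.25, shape = 2, rate = 1)  # moment generating function at t = 0.25
```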
In addition to the 17 distributions of Table 1, the package provides support for a family
of distributions deserving a separate presentation. Phase-type distributions (Neuts 1981) are
defined as the distribution of the time until absorption of continuous-time, finite-state Markov processes with m transient states and one absorbing state. Let
Q = [ T   t ]
    [ 0   0 ]                                  (4)
be the transition rates (or intensity) matrix of such a process and let (π, πm+1 ) be the initial
probability vector. Here, T is an m × m non-singular matrix with tii < 0 for i = 1, . . . , m and tij ≥ 0 for i ≠ j, t = −T e and e is a column vector with all components equal to 1. Then
the cdf of the time until absorption random variable with parameters π and T is
F(x) = { πm+1,             x = 0
       { 1 − π e^{Tx} e,   x > 0,              (5)
where

e^M = Σ_{n=0}^∞ M^n / n!                       (6)
is the matrix exponential of matrix M .
The exponential, the Erlang (gamma with integer shape parameter) and discrete mixtures
thereof are common special cases of phase-type distributions.
The package provides functions {d,p,r,m,mgf}phtype for phase-type distributions. Function
pphtype is central to the evaluation of ruin probabilities; see Section 3.6.
The core of all the functions presented in this subsection is written in C for speed. The matrix
exponential C routine is based on expm() from the package Matrix (Bates and Maechler 2008).
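As an illustration, a small phase-type distribution can be manipulated as follows; the initial probability vector and subintensity matrix below are illustrative values, not taken from the text:

```r
## A two-state phase-type distribution (illustrative parameters):
## pi = (0.5, 0.3), so the mass at zero is pi_{m+1} = 0.2
library(actuar)
prob <- c(0.5, 0.3)                                   # initial probabilities pi
rates <- matrix(c(-3, 2, 0, -1), 2, 2, byrow = TRUE)  # subintensity matrix T
pphtype(1, prob, rates)    # cdf at x = 1
mphtype(1, prob, rates)    # first raw moment
rphtype(5, prob, rates)    # five random variates
```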
R> class(x)
With a suitable print method, these objects can be displayed in an unambiguous manner:
R> x
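The call that created x does not appear in this extract; a plausible construction, using the group boundaries and the two columns of frequencies displayed further below, would be:

```r
## Sketch of the construction of the grouped data object x;
## boundaries and frequencies as displayed in the text
library(actuar)
x <- grouped.data(Group = c(0, 25, 50, 100, 150, 250, 500),
                  Line.1 = c(30, 31, 57, 42, 65, 84),
                  Line.2 = c(26, 33, 31, 19, 16, 11))
class(x)
```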
Second, the package supports the most common extraction and replacement methods for
"grouped.data" objects using the usual [ and [<- operators. In particular, the following
extraction operations are supported.
(i) Extraction of the vector of group boundaries (the first column):

R> x[, 1]
(ii) Extraction of the vector or matrix of group frequencies (the second and third columns):
Line.1 Line.2
1 30 26
2 31 33
3 57 31
4 42 19
5 65 16
6 84 11
(iii) Extraction of a subset of the data (here the first three groups):

R> x[1:3, ]
Notice how extraction results in a simple vector or matrix if either the group boundaries or the group frequencies are dropped.
As for replacement operations, the package implements the usual [<- methods.
The package defines methods of a few existing summary functions for grouped data objects.
Computing the mean of a grouped data object amounts to evaluating

(1/n) Σ_{j=1}^r nj (cj−1 + cj)/2,              (7)

where n = Σ_{j=1}^r nj:
R> mean(x)
Line.1 Line.2
188.0 108.2
Higher empirical moments can be computed with emm; see Section 2.4.
The R function hist splits individual data into groups and draws a histogram of the frequency distribution. The package introduces a method for already grouped data. Only the
first frequencies column is considered (see Figure 1 for the resulting graph):
R has a function ecdf to compute the empirical cdf of an individual data set,
Fn(x) = (1/n) Σ_{j=1}^n I{xj ≤ x},             (8)
where I{A} = 1 if A is true and I{A} = 0 otherwise. The function returns a "function" object to compute the value of Fn(x) at any x.
The approximation of the empirical cdf for grouped data is called an ogive (Klugman, Panjer,
and Willmot 1998; Hogg and Klugman 1984). It is obtained by joining the known values of
Fn (x) at group boundaries with straight line segments:
        { 0,                                                       x ≤ c0
F̃n(x) = { [(cj − x) Fn(cj−1) + (x − cj−1) Fn(cj)] / (cj − cj−1),   cj−1 < x ≤ cj     (9)
        { 1,                                                       x > cr.
The package includes a function ogive that otherwise behaves exactly like ecdf. In particular, methods for functions knots and plot yield, respectively, the knots c0, c1, . . . , cr of the ogive and a graph (see Figure 2):
R> Fnt(knots(Fnt))
R> plot(Fnt)
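The construction of Fnt is not shown in this extract; a sketch using the grouped dental data set gdental shipped with the package:

```r
## Sketch: ogive of the grouped dental data set of the package
library(actuar)
data("gdental")
Fnt <- ogive(gdental)
knots(Fnt)        # group boundaries c0, ..., cr
Fnt(knots(Fnt))   # ogive values at its knots
plot(Fnt)
```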
R> data("dental")
R> dental
R> data("gdental")
R> gdental
cj nj
1 (0, 25] 30
2 (25, 50] 31
3 (50, 100] 57
4 (100, 150] 42
5 (150, 250] 65
6 (250, 500] 84
7 (500, 1000] 45
8 (1000, 1500] 10
9 (1500, 2500] 11
10 (2500, 4000] 3

[Figure 2: ogive of the grouped data; F(x) plotted against the group boundaries.]
Second, in the same spirit as ecdf and ogive, function elev returns a function to compute
the empirical limited expected value — or first limited moment — of a sample for any limit.
Again, there are methods for individual and grouped data (see Figure 3 for the graphs):
Figure 3: Empirical limited expected value function of an individual data object (left) and a
grouped data object (right)
[1] 16.0 37.6 42.4 85.1 105.5 164.5 187.7 197.9 241.1 335.5
[1] 0.00 24.01 46.00 84.16 115.77 164.85 238.26 299.77 324.90
[10] 347.39 353.34
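A sketch of such elev calls, using the dental and gdental data sets of the package:

```r
## Empirical limited expected value functions for individual
## and grouped data (sketch)
library(actuar)
data("dental"); data("gdental")
lev.ind <- elev(dental)     # individual data
lev.grp <- elev(gdental)    # grouped data
lev.ind(knots(lev.ind))     # empirical LEV at the knots
plot(lev.grp)
```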
1. The Cramér-von Mises method (CvM) minimizes the squared difference between the
theoretical cdf and the empirical cdf or ogive at their knots:
d(θ) = Σ_{j=1}^n wj [F(xj; θ) − Fn(xj)]²       (10)

for individual data and

d(θ) = Σ_{j=1}^r wj [F(cj; θ) − F̃n(cj)]²       (11)

for grouped data. Here, F(x; θ) is the theoretical cdf of a parametric family, Fn(x) is the empirical cdf, F̃n(x) is the ogive and w1 ≥ 0, w2 ≥ 0, . . . are arbitrary weights (defaulting to 1).
2. The modified chi-square method (chi-square) applies to grouped data only and minimizes the squared difference between the expected and observed frequency within each group:
d(θ) = Σ_{j=1}^r wj [n(F(cj; θ) − F(cj−1; θ)) − nj]²,     (12)

where n = Σ_{j=1}^r nj. By default, wj = 1/nj.
3. The layer average severity method (LAS) applies to grouped data only and minimizes the
squared difference between the theoretical and empirical limited expected value within
each group:
d(θ) = Σ_{j=1}^r wj [LAS(cj−1, cj; θ) − L̃ASn(cj−1, cj)]².     (13)
The arguments of mde are a data set, a function to compute F (x) or E[X ∧ x], starting values
for the optimization procedure and the name of the method to use. The empirical functions
are computed with ecdf, ogive or elev.
The expressions below fit an exponential distribution to the grouped dental data set, as per
Example 2.21 of Klugman et al. (1998):
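The fitting calls themselves are absent from this extract; with mde they would be along the following lines, where the starting value for the optimization is illustrative:

```r
## Sketch of the three fitting calls (starting value illustrative)
library(actuar)
data("gdental")
mde(gdental, pexp, start = list(rate = 1/200), measure = "CvM")
mde(gdental, pexp, start = list(rate = 1/200), measure = "chi-square")
mde(gdental, pexp, start = list(rate = 1/200), measure = "LAS")
```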
rate
0.003551
distance
0.002842

rate
0.00364
distance
13.54

rate
0.002966
distance
694.5

Coverage modification    Per-loss variable    Per-payment variable
Coinsurance (α)          αX                   αX
Inflation (r)            (1 + r)X             (1 + r)X

Table 2: Coverage modifications for the per-loss and per-payment variables as defined in Klugman et al. (2004).
Assume X has a gamma distribution. Then an R function to compute the pdf (15) at any point y for a deductible d = 1 and a limit u = 10 is obtained with coverage as follows:
[1] 0
[1] 0.1343
[1] 0.02936
[1] 0
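The call defining f does not appear above; a sketch with coverage, where the gamma shape and rate are supplied at evaluation time:

```r
## Sketch of the construction of f: per-payment variable with
## deductible 1 and limit 10; gamma parameters passed at call time
library(actuar)
f <- coverage(pdf = dgamma, cdf = pgamma, deductible = 1, limit = 10)
f(0, shape = 2, rate = 1)    # no mass at 0 for the per-payment variable
f(5, shape = 2, rate = 1)
f(12, shape = 2, rate = 1)   # density is 0 beyond the modified support
```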
Note how function f is built specifically for the coverage modifications submitted and contains as little unused code as possible. For comparison purposes, the following function contains no deductible and no limit:
The vignette "coverage" contains more detailed pdf and cdf formulas under various combinations of coverage modifications.
3. Risk theory
Risk theory refers to a body of techniques to model and measure the risk associated with a
portfolio of insurance contracts. A first approach consists in modeling the distribution of total
claims over a fixed period of time using the classical collective model of risk theory. A second
input of interest to the actuary is the evolution of the surplus of the insurance company over
many periods of time. In ruin theory, the main quantity of interest is the probability that the
surplus becomes negative, in which case technical ruin of the insurance company occurs.
The interested reader can find more on these subjects in Klugman et al. (2004); Gerber (1979);
Denuit and Charpentier (2004); Kaas, Goovaerts, Dhaene, and Denuit (2001), among others.
The current version of actuar contains four visible functions related to the above problems:
two for the calculation of the aggregate claim amount distribution and two for ruin probability
calculations.
We briefly present the underlying models before introducing each set of functions.
S = C1 + · · · + CN , (16)
where we assume that C1 , C2 , . . . are mutually independent and identically distributed random
variables each independent of N . The task at hand consists in evaluating numerically the cdf
of S, given by
FS(x) = P[S ≤ x]
      = Σ_{n=0}^∞ P[S ≤ x | N = n] pn
      = Σ_{n=0}^∞ FC^{*n}(x) pn,               (17)
where FC(x) = P[C ≤ x] is the common cdf of C1, . . . , Cn, pn = P[N = n] and FC^{*n}(x) is the n-fold convolution of FC(x) with itself.
fx = F (x + h) − F (x) (19)
The true cdf passes exactly midway through the steps of the discretized cdf.
     { (E[X ∧ a] − E[X ∧ a + h])/h + 1 − F(a),        x = a
fx = { (2E[X ∧ x] − E[X ∧ x − h] − E[X ∧ x + h])/h,   a < x < b       (22)
     { (E[X ∧ b] − E[X ∧ b − h])/h − 1 + F(b),        x = b.
The discretized and the true distributions have the same total probability and expected
value on (a, b).
[Figure 4: four panels (Upper, Lower, Rounding, Unbiased) comparing the discretized cdf F(x) with the true cdf on (0, 5).]
Figure 4 illustrates the four methods. It should be noted that although very close in this
example, the rounding and unbiased methods are not identical.
Usage of discretize is similar to R’s plotting function curve. The cdf to discretize and,
for the unbiased method only, the limited expected value function are passed to discretize
as expressions in x. The other arguments are the upper and lower bounds of the discretization interval, the step h and the discretization method. For example, upper and unbiased
discretizations of a Gamma(2, 1) distribution on (0, 17) with a step of 0.5 are achieved with,
respectively,
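A sketch of the two calls, following the description above:

```r
## Upper and unbiased discretizations of a Gamma(2, 1) cdf
## on (0, 17) with step 0.5 (sketch)
library(actuar)
fx.u <- discretize(pgamma(x, 2, 1), method = "upper",
                   from = 0, to = 17, step = 0.5)
fx.n <- discretize(pgamma(x, 2, 1), method = "unbiased",
                   lev = levgamma(x, 2, 1),
                   from = 0, to = 17, step = 0.5)
```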
Function discretize is written in a modular fashion making it simple to add other discretiza-
tion methods if needed.
1. Recursive calculation using the algorithm of Panjer (1981). This requires the severity
distribution to be discrete arithmetic on 0, 1, 2, . . . , m for some monetary unit and the
frequency distribution to be a member of either the (a, b, 0) or (a, b, 1) family of distributions (Klugman et al. 2004). (These families contain the Poisson, binomial, negative
binomial and logarithmic distributions and their extensions with an arbitrary mass at
x = 0.) The general recursive formula is:
fS(x) = [(p1 − (a + b)p0) fC(x) + Σ_{y=1}^{min(x, m)} (a + b y/x) fC(y) fS(x − y)] / (1 − a fC(0)),
with starting value fS(0) = PN(fC(0)), where PN(·) is the probability generating function of N. Probabilities are computed until their sum is arbitrarily close to 1.
The recursions are done in C to dramatically increase speed. One difficulty the programmer faces is the unknown length of the output. This was solved using a common,
simple and fast technique: first allocate an arbitrary amount of memory and double this
amount each time the allocated space gets full.
2. Exact calculation by numerical convolutions using (17) and (18). This also requires
a discrete severity distribution. However, there is no restriction on the shape of the
frequency distribution. The package merely implements the sum (17), the convolutions being computed with R's function convolve, which in turn uses the Fast Fourier
Transform. This approach is practical for small problems only, even on today’s fast
computers.
where µS = E[S] and σS2 = VAR[S]. For most realistic models, this approximation is
rather crude in the tails of the distribution.
where γS = E[(S − µS)³]/(σS²)^{3/2}. The approximation is valid for x > µS only and performs reasonably well when γS < 1. See Daykin, Pentikäinen, and Pesonen (1994) for details.
Here also, adding other methods to aggregateDist is simple due to its modular design.
The arguments of aggregateDist differ depending on the calculation method; see the help
page for details. One interesting argument to note is x.scale to specify the monetary unit of
the severity distribution. This way, one does not have to mentally do the conversion between
the support of 0, 1, 2, . . . assumed by the recursive and convolution methods and the true
support of S.
The function returns an object of class "aggregateDist" inheriting from the "function" class. Thus, one can use the object as a function to compute the value of FS(x) at any x.
For illustration purposes, consider the following model: the distribution of S is a compound
Poisson with parameter λ = 10 and severity distribution Gamma(2, 1). To obtain an approximation of the cdf of S we first discretize the gamma distribution on (0, 22) with the unbiased method and a step of 0.5, and then use the recursive method in aggregateDist:
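A sketch of the two steps, with the parameters stated above:

```r
## Discretize the Gamma(2, 1) severity, then apply Panjer's
## recursion for a compound Poisson with lambda = 10 (sketch)
library(actuar)
fx <- discretize(pgamma(x, 2, 1), method = "unbiased",
                 lev = levgamma(x, 2, 1),
                 from = 0, to = 22, step = 0.5)
Fs <- aggregateDist("recursive", model.freq = "poisson",
                    model.sev = fx, lambda = 10, x.scale = 0.5)
```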
R> knots(Fs)
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
[13] 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5
[25] 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5
[37] 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0 23.5
[49] 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5
[61] 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5
[73] 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5 40.0 40.5 41.0 41.5
[85] 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5
[97] 48.0 48.5 49.0 49.5 50.0 50.5 51.0 51.5 52.0 52.5 53.0 53.5
[109] 54.0 54.5 55.0 55.5 56.0 56.5 57.0 57.5 58.0 58.5 59.0 59.5
[121] 60.0 60.5 61.0 61.5 62.0 62.5 63.0 63.5 64.0 64.5 65.0 65.5
[133] 66.0 66.5 67.0 67.5 68.0 68.5 69.0 69.5 70.0 70.5 71.0
A nice graph of this function is obtained with a method of plot (see Figure 5):
The package defines a few summary methods to extract information from "aggregateDist"
objects. First, there are methods of mean and quantile to easily compute the mean and
obtain the quantiles of the approximate distribution:
Figure 5: Graphic of the empirical cdf of S obtained with the recursive method
R> mean(Fs)
[1] 20
R> quantile(Fs)
99.9%
49.5
Second, the package introduces the generic functions VaR and CTE with methods for objects
of class "aggregateDist". The former computes the value-at-risk VaRα such that
P[S ≤ VaRα ] = α, (25)
where α is the confidence level. Thus, the value-at-risk is nothing else than a quantile. As for
the method of CTE, it computes the conditional tail expectation
CTEα = E[S|S > VaRα ]. (26)
Here are examples using object Fs obtained above:
[Figure 6 plot: cdfs obtained with recursive + unbiased, recursive + upper, recursive + lower, simulation and the normal approximation.]
Figure 6: Comparison between the empirical or approximate cdf of S obtained with five
different methods
R> VaR(Fs)
R> CTE(Fs)
To conclude on the subject, Figure 6 shows the cdf of S using five of the many combinations
of discretization and calculation method supported by actuar.
As mentioned previously, technical ruin of the insurance company occurs when the surplus
becomes negative. Therefore, the definition of the infinite time probability of ruin is
We define some other quantities needed in the sequel. Let N (t) denote the number of claims
up to time t ≥ 0 and Cj denote the amount of claim j. Then the definition of S(t) is analogous
to (16):
S(t) = C1 + · · · + CN (t) , (29)
assuming N (0) = 0 and S(t) = 0 as long as N (t) = 0. Furthermore, let Tj denote the
time when claim j occurs, such that T1 < T2 < T3 < . . . Then the random variable of the
interarrival (or wait) time between claim j − 1 and claim j is defined as W1 = T1 and
Wj = Tj − Tj−1 , j ≥ 2. (30)
2. the sequence {Tj }j≥1 forms an ordinary renewal process, with the consequence that
random variables W1 , W2 , . . . are independent and identically distributed;
ψ(u) ≤ e^{−ρu},   u ≥ 0.
The adjustment coefficient is defined as the smallest strictly positive solution (if it exists) of
the Lundberg equation
h(t) = E[e^{tC − tcW}] = 1,    (31)
where the premium rate c satisfies the positive safety loading constraint E[C − cW ] < 0. If C
and W are independent, as in the most common models, then the equation can be rewritten
as
h(t) = MC (t)MW (−tc) = 1. (32)
Function adjCoef of actuar computes the adjustment coefficient ρ from the following arguments: either the two moment generating functions MC(t) and MW(t) (thereby assuming
independence) or else function h(t); the premium rate c; the upper bound of the support of
MC (t) or any other upper bound for ρ.
For example, if W and C are independent, W ∼ Exponential(2), C ∼ Exponential(1) and
the premium rate is c = 2.4 (for a safety loading of 20% using the expected value premium
principle), then the adjustment coefficient is
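The call producing the output below is absent from this extract; with adjCoef it would be along these lines:

```r
## Sketch: adjustment coefficient for C ~ Exponential(1),
## W ~ Exponential(2) and premium rate c = 2.4
library(actuar)
adjCoef(mgfexp(x), mgfexp(x, 2), premium.rate = 2.4, upper.bound = 1)
```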
[1] 0.1667
The function also supports models with proportional or excess-of-loss reinsurance (Centeno
2002). Under the first type of treaty, an insurer pays a proportion α of every loss and the
rest is paid by the reinsurer. Then, for fixed α the adjustment coefficient is the solution of
Under an excess-of-loss treaty, the primary insurer pays each claim up to a limit L. Again,
for fixed L, the adjustment coefficient is the solution of
For models with reinsurance, adjCoef returns an object of class "adjCoef" inheriting from
the "function" class. One can then use the object to compute the adjustment coefficient
for any retention rate α or retention limit L. The package also defines a method of plot for
these objects.
For example, using the same assumptions as above with proportional reinsurance and a 30%
safety loading for the reinsurer, the adjustment coefficient as a function of α ∈ [0, 1] is (see
Figure 7 for the graph):
R> plot(rho)
[Figure 7: adjustment coefficient R(x) as a function of the retention level under proportional reinsurance.]
Fortunately, phase-type distributions have come to the rescue since the early 1990s. Asmussen
and Rolski (1991) first show that in the classical Cramér–Lundberg model where interarrival
times are Exponential(λ) distributed, if claim amounts are Phase-type(π, T ) distributed, then
ψ(u) = 1 − F (u), where F is Phase-type(π + , Q) with
π+ = −(λ/c) π T^{−1}
Q = T + t π+,                                  (36)
with t = −T e as in Section 2.1.
In the more general Sparre Andersen model where interarrival times can have any Phase-type(ν, S) distribution, Asmussen and Rolski (1991) also show that using the same claim
severity assumption as above, one still has ψ(u) = 1 − F (u) where F is Phase-type(π + , Q),
but with parameters
π+ = e′(Q − T)/(c e′ t)                        (37)

and Q solution of

Q = Ψ(Q) = T − tπ[(In ⊗ ν)(Q ⊕ S)^{−1}(In ⊗ s)].    (38)
In the above, s = −Se, I n is the n × n identity matrix, ⊗ denotes the usual Kronecker
product between two matrices and ⊕ is the Kronecker sum defined as
Am×m ⊕ Bn×n = A ⊗ In + B ⊗ Im.                 (39)
Function ruin of actuar returns a function object of class "ruin" to compute the probability
of ruin for any initial surplus u. In all cases except the exponential/exponential model where
(35) is used, the output object calls function pphtype to compute the ruin probabilities.
Some thought went into the interface of ruin. Obviously, all models can be specified using
phase-type distributions, but the authors wanted users to have easy access to the most common models involving exponential and Erlang distributions. Hence, one first states the claim
amount and interarrival times models with any combination of "exponential", "Erlang" and
"phase-type". Then, one passes the parameters of each model using lists with components
named after the corresponding parameters of dexp, dgamma and dphtype. If a component
"weights" is found in a list, the model is a mixture of exponential or Erlang (mixtures of
phase-type are not supported). Every component of the parameter lists is recycled as needed.
The following examples should make the matter clearer. (All examples use c = 1, the default
value in ruin.) First, for the exponential/exponential model, one has
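The construction of psi is not shown in this extract; a sketch, with illustrative claim and interarrival rates:

```r
## Sketch: exponential/exponential model (rates illustrative)
library(actuar)
psi <- ruin(claims = "e", par.claims = list(rate = 5),
            wait = "e", par.wait = list(rate = 3))
psi(0:10)
```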
R> psi(0:10)
Second, for a model with a mixture of two exponentials for the claim amounts and exponential interarrival times, the simplest call to ruin is
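A sketch of such a call, with illustrative parameters (a 0.5/0.5 mixture of exponential claim amounts with rates 3 and 7):

```r
## Sketch: mixture-of-exponentials claims, exponential waits
## (all parameter values illustrative)
library(actuar)
psi <- ruin(claims = "e",
            par.claims = list(rate = c(3, 7), weights = 0.5),
            wait = "e", par.wait = list(rate = 3))
```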
Finally, one will obtain a function to compute ruin probabilities in a model with phase-type
claim amounts and mixture of exponentials interarrival times with
To ease plotting of the probability of ruin function, the package provides a method of plot
for objects returned by ruin that is a simple wrapper for curve (see Figure 8):
R> psi <- ruin(claims = "p", par.claims = list(prob = prob, rates = rates),
+ wait = "e", par.wait = list(rate = c(5, 1), weights = c(0.4, 0.6)))
R> plot(psi, from = 0, to = 50)
The random variables Φi, Λij, Ψi and Θij are generally seen as risk parameters in the actuarial literature. The wijt are known weights, or volumes. Using the convention that the data are at level 0, the above is a two-level hierarchical model.
Goulet and Pouliot (2008) describe in detail the model specification method used in simul.
For the sake of completeness, we briefly outline this method here.
A hierarchical model is completely specified by the number of nodes at each level (I, J1 , . . . , JI
and n11 , . . . , nIJ , above) and by the probability laws at each level. The number of nodes is
passed to simul by means of a named list where each element is a vector of the number of
nodes at a given level. Vectors are recycled when the number of nodes is the same throughout
a level. Probability models are expressed in a semi-symbolic fashion using an object of mode
"expression". Each element of the object must be named — with names matching those
of the number of nodes list — and should be a complete call to an existing random number
generation function, with the number of variates omitted. Hierarchical models are achieved
by replacing one or more parameters of a distribution at a given level by any combination of
the names of the levels above. If no mixing is to take place at a level, the model for this level
can be NULL.
Function simul also supports usage of weights in models. These usually modify the frequency parameters to take into account the "size" of an entity. Weights are used in simulation wherever the name weights appears in a model.
Hence, function simul has four main arguments: 1) nodes for the number of nodes list;
2) model.freq for the frequency model; 3) model.sev for the severity model; 4) weights for
the vector of weights in lexicographic order, that is all weights of entity 1, then all weights of
entity 2, and so on.
For example, assuming that I = 2, J1 = 4, J2 = 3, n11 = · · · = n14 = 4 and n21 = n22 = n23 =
5 in model (41) above, and that weights are simply simulated from a uniform distribution on
(0.5, 2.5), then simulation of a data set with simul is achieved with:
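The call itself is omitted from this extract; a sketch consistent with the frequency and severity models printed further below, with the uniform weights following the assumption just stated:

```r
## Sketch of the simul call: 2 cohorts, 4 + 3 entities,
## 4 or 5 years of observation per entity
library(actuar)
nodes <- list(cohort = 2, entity = c(4, 3), year = c(4, 4, 4, 4, 5, 5, 5))
mf <- expression(cohort = rexp(2),
                 entity = rgamma(cohort, 1),
                 year = rpois(weights * entity))
ms <- expression(cohort = rnorm(2, sqrt(0.1)),
                 entity = rnorm(cohort, 1),
                 year = rlnorm(entity, 1))
wijt <- runif(31, 0.5, 2.5)   # one weight per node: 4 * 4 + 3 * 5 = 31
pf <- simul(nodes = nodes, model.freq = mf, model.sev = ms, weights = wijt)
```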
The function returns the variates in a two-dimensional list of class "portfolio" containing all the individual claim amounts for each entity. Such an object can be seen as a three-dimensional array with a third dimension of potentially varying length. The function also
returns a matrix of integers giving the classification indexes of each entity in the portfolio
(subscripts i and j in the notation above). Displaying the complete content of the object
returned by simul can be impractical. For this reason, the print method for this class only
prints the simulation model and the number of claims in each node:
R> pf
Frequency model
cohort ~ rexp(2)
entity ~ rgamma(cohort, 1)
year ~ rpois(weights * entity)
Severity model
cohort ~ rnorm(2, sqrt(0.1))
entity ~ rnorm(cohort, 1)
year ~ rlnorm(entity, 1)
The package defines methods for four generic functions to easily access key quantities of the
simulated portfolio.
1. By default, the method of aggregate returns the values of aggregate claim amounts
Sijt in a regular matrix (subscripts i and j in the rows, subscript t in the columns). The
method has a by argument to get statistics for other groupings and a FUN argument to
get statistics other than the sum:
R> aggregate(pf)
2. The method of frequency returns the number of claims Nijt . It is a wrapper for
aggregate with the default sum statistic replaced by length. Hence, arguments by and
FUN remain available:
R> frequency(pf)
cohort freq
[1,] 1 17
[2,] 2 16
3. The method of severity (a generic function introduced by the package) returns the
individual claim amounts Cijtu in a matrix similar to those above, but with a number
of columns equal to the maximum number of observations per entity,
max_{i,j} Σ_{t=1}^{nij} Nijt.
Thus, the original period of observation (subscript t) and the identifier of the severity
within the period (subscript u) are lost and each variate now constitutes a “period” of
observation. For this reason, the method provides an argument splitcol in case one
would like to extract separately the individual claim amounts of one or more periods:
R> severity(pf)
$main
cohort entity claim.1 claim.2 claim.3 claim.4 claim.5 claim.6
[1,] 1 1 7.974 23.401 3.153 4.368 11.383 NA
[2,] 1 2 NA NA NA NA NA NA
[3,] 1 3 3.817 41.979 26.910 4.903 19.078 NA
[4,] 1 4 98.130 50.622 55.705 NA NA NA
[5,] 2 1 11.793 2.253 2.397 9.472 1.004 NA
[6,] 2 2 NA NA NA NA NA NA
[7,] 2 3 14.322 11.522 18.966 33.108 15.532 14.99
claim.7 claim.8 claim.9 claim.10 claim.11
[1,] NA NA NA NA NA
[2,] NA NA NA NA NA
[3,] NA NA NA NA NA
[4,] NA NA NA NA NA
[5,] NA NA NA NA NA
[6,] NA NA NA NA NA
[7,] 25.11 40.15 17.44 4.426 10.16
$split
NULL
$main
cohort entity claim.1 claim.2 claim.3 claim.4 claim.5 claim.6
[1,] 1 1 3.153 4.368 11.383 NA NA NA
[2,] 1 2 NA NA NA NA NA NA
[3,] 1 3 3.817 41.979 26.910 4.903 19.078 NA
[4,] 1 4 98.130 50.622 55.705 NA NA NA
[5,] 2 1 11.793 2.253 2.397 9.472 1.004 NA
[6,] 2 2 NA NA NA NA NA NA
[7,] 2 3 33.108 15.532 14.990 25.107 40.150 17.44
claim.7 claim.8
[1,] NA NA
[2,] NA NA
[3,] NA NA
[4,] NA NA
[5,] NA NA
[6,] NA NA
[7,] 4.426 10.16
$split
cohort entity claim.1 claim.2 claim.3
[1,] 1 1 7.974 23.40 NA
[2,] 1 2 NA NA NA
[3,] 1 3 NA NA NA
[4,] 1 4 NA NA NA
[5,] 2 1 NA NA NA
[6,] 2 2 NA NA NA
[7,] 2 3 14.322 11.52 18.97
4. The method of weights extracts the weights matrix from a simulated data set:
R> weights(pf)
In addition, all methods have a classification and a prefix argument. When the first is
FALSE, the classification index columns are omitted from the result. The second argument
overrides the default column name prefix; see the simul.summaries help page for details.
Function simul was used to simulate the data in Forgues, Goulet, and Lu (2006).
5. Credibility theory
Credibility models are actuarial tools to distribute premiums fairly among a heterogeneous
group of policyholders (henceforth called entities). More generally, they can be seen as prediction methods applicable in any setting where repeated measures are made for subjects with
different risk levels.
The credibility theory facilities of actuar consist of the matrix hachemeister containing the
famous data set of Hachemeister (1975) and the function cm to fit credibility models.
Function cm acts as a unified interface for all credibility models supported by the package. Currently, these are the unidimensional models of Bühlmann (1969) and Bühlmann and Straub
(1970), the hierarchical model of Jewell (1975) (of which the first two are special cases) and
the regression model of Hachemeister (1975). The modular design of cm makes it easy to add
new models if desired.
This subsection concentrates on usage of cm for hierarchical models.
There are some variations in the formulas of the hierarchical model in the literature. We
compute the credibility premiums as given in Bühlmann and Jewell (1987) or Bühlmann
and Gisler (2005). We support three types of estimators of the between variance structure
parameters: the unbiased estimators of Bühlmann and Gisler (2005) (the default), the slightly
different version of Ohlsson (2005) and the iterative pseudo-estimators as found in Goovaerts
and Hoogstad (1987) or Goulet (1998). See Belhadj, Goulet, and Ouellet (2008) for further
discussion on this topic.
The credibility modeling function assumes that data is available in the format most practical
applications would use, namely a rectangular array (matrix or data frame) with entity observations in the rows and with one or more classification index columns (numeric or character).
One will recognize the output format of simul and its summary methods.
Then, function cm works much the same as lm. It takes as arguments: a formula of the form ~ terms describing the hierarchical interactions in a data set; the data set containing
the variables referenced in the formula; the names of the columns where the ratios and the
weights are to be found in the data set. The latter should contain at least two nodes in
each level and more than one period of experience for at least one entity. Missing values are
represented by NAs. There can be entities with no experience (complete lines of NAs).
In order to give an easily reproducible example, we group states 1 and 3 of the Hachemeister
data set into one cohort and states 2, 4 and 5 into another. This shows that data does not
have to be sorted by level. The fitted model using the iterative estimators is:
R> X <- cbind(cohort = c(1, 2, 1, 2, 2), hachemeister)
R> fit <- cm(~cohort + cohort:state, data = X, ratios = ratio.1:ratio.12,
+ weights = weight.1:weight.12, method = "iterative")
R> fit
Call:
cm(formula = ~cohort + cohort:state, data = X, ratios = ratio.1:ratio.12,
weights = weight.1:weight.12, method = "iterative")
The function returns a fitted model object of class "cm" containing the estimators of the
structure parameters. To compute the credibility premiums, one calls the predict method
for this class:
R> predict(fit)
$cohort
[1] 1949 1543
$state
[1] 2048 1524 1875 1497 1585
One can also obtain a nicely formatted view of the most important results with a call to
summary:
R> summary(fit)
Call:
cm(formula = ~cohort + cohort:state, data = X, ratios = ratio.1:ratio.12,
weights = weight.1:weight.12, method = "iterative")
Detailed premiums
Level: cohort
cohort Indiv. mean Weight Cred. factor Cred. premium
1 1967 1.407 0.9196 1949
2 1528 1.596 0.9284 1543
Level: state
cohort state Indiv. mean Weight Cred. factor Cred. premium
1 1 2061 100155 0.8874 2048
2 2 1511 19895 0.6103 1524
1 3 1806 13735 0.5195 1875
2 4 1353 4152 0.2463 1497
2 5 1600 36110 0.7398 1585
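The hierarchical structure shows in these numbers: each state premium is, up to the rounding of the printed values, the credibility-weighted average of the state's individual mean and the credibility premium of its cohort. A quick base R check using the figures above:

```r
## State-level figures from the summary output and each state's cohort premium
indiv  <- c(2061, 1511, 1806, 1353, 1600)           # individual means
z      <- c(0.8874, 0.6103, 0.5195, 0.2463, 0.7398) # credibility factors
cohort <- c(1949, 1543)[c(1, 2, 1, 2, 2)]           # cohort premium per state
## Recombine; agreement with the printed state premiums is within rounding
z * indiv + (1 - z) * cohort
```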
Both the predict and summary methods can report results for a subset of the levels by means
of an argument levels. For example:
R> summary(fit, levels = "cohort")
Call:
cm(formula = ~cohort + cohort:state, data = X, ratios = ratio.1:ratio.12,
weights = weight.1:weight.12, method = "iterative")
Detailed premiums
Level: cohort
cohort Indiv. mean Weight Cred. factor Cred. premium
1 1967 1.407 0.9196 1949
2 1528 1.596 0.9284 1543
R> predict(fit, levels = "cohort")
$cohort
[1] 1949 1543
The results above differ from those of Goovaerts and Hoogstad (1987) for the same example
because the formulas for the credibility premiums are different.
Finally, cm supports the regression model of Hachemeister (1975). For example, fitting the
model
X_it = β_0 + β_1 (12 − t) + ε_it ,   t = 1, . . . , 12,
to the original data set of Hachemeister is done by passing the vector of regressors to cm
through the argument xreg; printing the fitted model displays the call:
Call:
cm(formula = ~state, data = hachemeister, ratios = ratio.1:ratio.12,
weights = weight.1:weight.12, xreg = 12:1)
Computing the credibility premiums requires giving the “future” values of the regressors, as
in predict.lm, although with a simplified syntax in the one-regressor case.
6. Documentation
In addition to the help pages required by the R packaging system, the package includes
vignettes and demonstration scripts; running
R> vignette(package = "actuar")
and
R> demo(package = "actuar")
at the R command prompt will list those available in the package.
7. Conclusion
The paper presented the facilities of the R package actuar in the fields of loss distribution
modeling, risk theory, simulation of compound hierarchical models and credibility theory. We
feel this version of the package covers most of the basic needs in these areas. In the future we
plan to improve the functions currently available and to start adding more advanced features.
For example, future versions of the package should include support for dependence models in
risk theory and better handling of regression credibility models.
Obviously, the package leaves many other fields of Actuarial Science untouched. For this
situation to change, we hope that experts in their fields will join their efforts to ours and contribute
code to the actuar project. The project will continue to grow and to improve by and for the
community of developers and users.
Finally, if you use R or actuar for actuarial analysis, please cite the software in publications.
Use
R> citation()
and
R> citation("actuar")
Acknowledgments
The package would not be at this stage of development without the stimulating contribution
of Sébastien Auclair, Louis-Philippe Pouliot and Tommy Ouellet.
This research benefited from financial support from the Natural Sciences and Engineering
Research Council of Canada and from the Chaire d’actuariat (Actuarial Science Chair) of
Université Laval.
Finally, the authors thank two anonymous referees for many improvements to the paper.
References
Bates D, Maechler M (2008). Matrix: A Matrix package for R. R package version 0.999375-7,
URL http://CRAN.R-project.org/package=Matrix.
Bühlmann H, Gisler A (2005). A Course in Credibility Theory and its Applications. Springer-
Verlag. ISBN 3-540-25753-5.
Centeno MdL (2002). “Measuring the Effects of Reinsurance by the Adjustment Coefficient
in the Sparre-Anderson Model.” Insurance: Mathematics and Economics, 30, 37–49.
Daykin C, Pentikäinen T, Pesonen M (1994). Practical Risk Theory for Actuaries. Chapman
& Hall, London. ISBN 0-412-42850-4.
Hogg RV, Klugman SA (1984). Loss Distributions. John Wiley & Sons, New York. ISBN
0-471-87929-0.
Jewell WS (1975). “The Use of Collateral Data in Credibility Theory: A Hierarchical Model.”
Giornale dell’Istituto Italiano degli Attuari, 38, 1–16.
Kaas R, Goovaerts M, Dhaene J, Denuit M (2001). Modern Actuarial Risk Theory. Kluwer
Academic Publishers, Dordrecht. ISBN 0-7923-7636-6.
Klugman SA, Panjer HH, Willmot G (1998). Loss Models: From Data to Decisions. John
Wiley & Sons, New York. ISBN 0-471-23884-8.
Klugman SA, Panjer HH, Willmot G (2004). Loss Models: From Data to Decisions. John
Wiley & Sons, New York, second edition. ISBN 0-471-21577-5.
Pinheiro J, Bates D, DebRoy S, Sarkar D, the R Development Core Team (2007). nlme:
Linear and Nonlinear Mixed Effects Models. R package version 3.1-86, URL http://CRAN.
R-project.org/package=nlme.
R Development Core Team (2008). R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.
org/.
Venables WN, Ripley BD (2002). Modern Applied Statistics with S. Springer-Verlag, New
York, 4th edition. ISBN 0-387-95457-0.
Yan J (2007). “Enjoy the Joy of Copulas: With a Package copula.” Journal of Statistical
Software, 21(4), 1–21. URL http://www.jstatsoft.org/v21/i04/.
Affiliation:
Vincent Goulet
École d’actuariat
Pavillon Alexandre-Vachon
1045, avenue de la Médecine
Université Laval
Québec (QC) G1V 0A6, Canada
E-mail: [email protected]
URL: http://vgoulet.act.ulaval.ca/