GLM Assign

The document discusses fitting generalized linear models to relate various factors to insurance claim numbers and passenger numbers on buses. It provides examples of R code to load and prepare data, fit Poisson and logit GLM models, and interpret model outputs and residuals. The tasks involve selecting appropriate error structures, building and comparing models, and making predictions based on the models.

Uploaded by

Akshita Jain

CHAPTER 12 ASSIGNMENT PAPER B

Question 1

Refer to the data file “Indices_Returns.csv”, which is provided on the system, and answer the following questions:

(i) Load the csv file into R and create a new column called “Sensex_Direction”. The value of this column will be “Positive” when
the Sensex returns are positive and “Negative” when they are negative. Convert this column to a factor variable.

(ii) Fit an appropriate generalized linear model (GLM) with a ‘logit’ link function to relate “Sensex_Direction” to the
returns of the 10 sectors as a multivariate model, and display the summary of the model.

(iii) Identify which sectors have significantly impacted the direction of Sensex returns at the 95% and 99% confidence levels.

(v) Plot the residuals of the fitted model and identify which month is the most significant outlier in the residuals.

(vi) Comment on the appropriateness of the model fitted.

(vii) Your actuarial friend has suggested that the current model can be improved by removing the variables which do not impact
the direction of Sensex returns at 95% confidence level and refitting the GLM with ‘logit’ link function.

(viii) Update the model fitted in (ii) above, as suggested by your friend and display the summary of the model.

(ix) Compare the models in (ii) and (viii) using an appropriate test and comment on the difference in the residual deviances
between the two models.
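
The workflow in Question 1 can be sketched in R as follows. This is only an illustration under stated assumptions: Indices_Returns.csv is not reproduced here, so the sketch simulates a small data frame, and the column names (Sensex, Bank, IT, Pharma), the use of three sectors instead of ten, and the choice of Pharma as the dropped variable are all hypothetical.

```r
# Simulated stand-in for Indices_Returns.csv (all names and values are assumptions).
set.seed(42)
n <- 120
returns <- data.frame(
  Bank   = rnorm(n, 0, 0.06),
  IT     = rnorm(n, 0, 0.07),
  Pharma = rnorm(n, 0, 0.05)
)
returns$Sensex <- 0.5 * returns$Bank + 0.3 * returns$IT + rnorm(n, 0, 0.03)

# (i) Direction of the Sensex return as a two-level factor.
returns$Sensex_Direction <- factor(
  ifelse(returns$Sensex > 0, "Positive", "Negative"),
  levels = c("Negative", "Positive")
)

# (ii) Binomial GLM with logit link; it models P(Direction = "Positive").
full_model <- glm(Sensex_Direction ~ Bank + IT + Pharma,
                  family = binomial(link = "logit"), data = returns)
summary(full_model)

# (v) Deviance residuals and the observation with the largest absolute residual.
res <- residuals(full_model, type = "deviance")
plot(res, xlab = "Observation", ylab = "Deviance residual")
worst <- which.max(abs(res))

# (viii) Drop a sector (Pharma, chosen only for illustration) and refit.
reduced_model <- update(full_model, . ~ . - Pharma)

# (ix) Chi-squared test on the difference in residual deviance of the nested models.
anova(reduced_model, full_model, test = "Chisq")
```

With the real file, the data frame would instead come from read.csv("Indices_Returns.csv"), and the variables to drop would be read off the p-values in the summary rather than chosen in advance.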

Question 2
A general insurance company is building a generalised linear model (GLM) to analyse claim numbers for a
motor insurance policy. For every policy in the past three years, the company has collected the number of
reported claims and the following data:
Age: Age of policyholder, a number between 18 and 100.
Car group: A code representing the car group, a value between 1 and 20.
Area: A description of the area where the policyholder lives.
No claim discount: A number representing the level of no claim discount, a value between 0 and 5.
Gender: Gender of the policyholder (male or female).
The data set has been loaded into the session for you already as “datatrain.csv”. Typing the name of the variable
will let you see this.

(i) Explain what error structure could be used in this GLM, including in your answer the R code used to justify
your choice. [5]

(ii) Fit a GLM that treats Age as a linear factor and the other four factors as categorical variables. Your answer
should show the coefficient, standard error, and p-value of each parameter estimate in the model. [10]

(iii) Describe the association between gender and the number of reported claims based on your output from part
(ii). Your answer should include a numerical comparison of this association between male and female
policyholders. [10]

(iv) Comment on the fit of the model fitted in part (ii), based on the deviance value of the
model, with reference to the suitability of the model. [10]

The company is considering a more complex model that includes a power transformation of the factor Age.

(v) (a) Add a variable representing age squared to the data.
(b) Fit an appropriate model including age squared as an explanatory variable in addition to the variables used in
part (ii).

(c) Comment on whether age squared is associated with the number of reported claims and on whether its
inclusion improves the model fitted in part (ii), based on your output from part (v)(b). [20]
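
A minimal R sketch of the steps in Question 2 is given below. The real training data is not available here, so the sketch simulates a data set; the data frame name policies, the column names (Claims, Age, Car_group, Area, NCD, Gender), the area labels and all simulated values are assumptions made for illustration only.

```r
# Simulated stand-in for the training data (column names and values are assumptions).
set.seed(1)
n <- 2000
policies <- data.frame(
  Age       = sample(18:100, n, replace = TRUE),
  Car_group = factor(sample(1:20, n, replace = TRUE)),
  Area      = factor(sample(c("Urban", "Suburban", "Rural"), n, replace = TRUE)),
  NCD       = factor(sample(0:5, n, replace = TRUE)),
  Gender    = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
policies$Claims <- rpois(n, exp(-1.5 - 0.01 * policies$Age +
                                0.2 * (policies$Gender == "Male")))

# (i) For a Poisson response the mean and variance should be roughly equal.
c(mean = mean(policies$Claims), variance = var(policies$Claims))

# (ii) Poisson GLM: Age as a numeric covariate, the rest as factors.
fit <- glm(Claims ~ Age + Car_group + Area + NCD + Gender,
           family = poisson(link = "log"), data = policies)
summary(fit)$coefficients   # estimate, standard error, z value, p-value

# (iii) Multiplicative effect of Male relative to Female (the baseline level).
gender_ratio <- exp(coef(fit)["GenderMale"])

# (iv) Residual deviance alongside its degrees of freedom.
c(deviance = deviance(fit), df = df.residual(fit))

# (v) Add age squared and compare the nested models.
policies$Age_sq <- policies$Age^2
fit_sq <- update(fit, . ~ . + Age_sq, data = policies)
anova(fit, fit_sq, test = "Chisq")
AIC(fit, fit_sq)
```

Because the link is the log, exponentiating the Gender coefficient gives a ratio of expected claim numbers, which is one way to express the numerical comparison asked for in part (iii).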

Question 3
A statistician wants to model the number of passengers boarding a bus from a bus stop close to a student
residential area. They can think of three explanatory variables: which route it is (at 8 am or 9 am), if it is during
the semester or not, and the temperature (temp) in degrees Celsius. The statistician has data for 20 days, given in
the file named CS1passenger.RData, and believes that the response variable (Passengers) follows a Poisson
distribution. After loading the data into R, the data frame data_passenger with all variables (Passengers, route,
semester, temp) will be available.
(i) State the linear predictor corresponding to models specified with the following R code, explaining all relevant
terms:
(a) temp+semester
(b) temp*semester
(c) temp*semester + route [6]
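
As a notational aid, the three formulas expand as follows under R's default treatment coding. The symbols are assumptions introduced here: \eta_i is the linear predictor for day i (equal to \log \mu_i under a log link), s_i is an indicator that day i falls in the semester, and r_i is an indicator for the non-baseline route. Which level R treats as the baseline depends on how the factors are coded in the data.

```latex
\begin{align*}
\text{(a)}\quad \eta_i &= \beta_0 + \beta_1\,\mathrm{temp}_i + \beta_2\, s_i \\
\text{(b)}\quad \eta_i &= \beta_0 + \beta_1\,\mathrm{temp}_i + \beta_2\, s_i
                          + \beta_3\,\mathrm{temp}_i\, s_i \\
\text{(c)}\quad \eta_i &= \beta_0 + \beta_1\,\mathrm{temp}_i + \beta_2\, s_i
                          + \beta_3\,\mathrm{temp}_i\, s_i + \beta_4\, r_i
\end{align*}
```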

(ii) (a) Fit a Poisson Generalised Linear Model (GLM) to the data set for the model in part (i)(c). Label this
model as Model1. Your answer should include a summary of the fitted model.
(b) Comment on the significance of the parameters of the model fitted in part (ii)(a). [6]

(iii) (a) Fit an improved model for the model in part (ii)(a), using your answer in part (ii)(b). Label this model as
Model2.
(b) Justify why Model2 improves on Model1 by referring to the R output. [4]

You are given a new model (Model3), specified by the following R code:
Model3 <- glm(Passengers ~ temp + temp:semester, family = poisson(link = "log"), data = data_passenger)

(iv) (a) Demonstrate that Model3 outperforms the models defined in parts (i)(a) and (i)(b). [8]
(b) Comment on your answer in part (iv)(a). [1]
(v) (a) Draw a suitable plot, for the residuals of Model3, for checking the model’s validity.
(b) Comment on the plot in part (v)(a). [6]
(vi) Calculate the predicted number of passengers for an 8 am route during the semester at a temperature of 0
degrees Celsius, using Model3. [4] [Total 35]
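
The model fitting, comparison and prediction steps in Question 3 might look like the sketch below. CS1passenger.RData is not reproduced here, so the data frame is simulated; the factor labels ("8am", "9am", "no", "yes"), the names ModelA and ModelB, and all values are assumptions. AIC is used for the comparison as one reasonable criterion, not necessarily the one the question intends.

```r
# Simulated stand-in for data_passenger (values are assumptions, not the real data).
set.seed(7)
data_passenger <- data.frame(
  route    = factor(rep(c("8am", "9am"), times = 10)),
  semester = factor(rep(c("no", "yes"), each = 10)),
  temp     = round(runif(20, -5, 25), 1)
)
mu <- exp(2 + 0.02 * data_passenger$temp +
          0.03 * data_passenger$temp * (data_passenger$semester == "yes"))
data_passenger$Passengers <- rpois(20, mu)

pois <- poisson(link = "log")
ModelA <- glm(Passengers ~ temp + semester,         family = pois, data = data_passenger)
ModelB <- glm(Passengers ~ temp * semester,         family = pois, data = data_passenger)
Model1 <- glm(Passengers ~ temp * semester + route, family = pois, data = data_passenger)
Model3 <- glm(Passengers ~ temp + temp:semester,    family = pois, data = data_passenger)

# Lower AIC indicates a better trade-off between fit and complexity.
AIC(ModelA, ModelB, Model1, Model3)

# Residuals against fitted values as a basic validity check.
plot(fitted(Model3), residuals(Model3, type = "deviance"),
     xlab = "Fitted values", ylab = "Deviance residuals")
abline(h = 0, lty = 2)

# Prediction on the response scale; route is not a term in Model3.
new_day <- data.frame(
  temp     = 0,
  semester = factor("yes", levels = levels(data_passenger$semester))
)
pred <- predict(Model3, newdata = new_day, type = "response")
```

Since every non-intercept term in Model3 is multiplied by temp, the prediction at 0 degrees reduces to the exponential of the intercept, whatever the route or semester.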

Question 4
The data given in the file policies_data.RData show the numbers of policies (n.policies) by sex of
policyholder (sex.code; 1 for male, 2 for female) and class of business (class.code; 5 different classes) from a certain
insurance portfolio.

(i) (a) Construct a plot of the logarithm of the number of policies (on the y axis) against the class of
business.
(b) Comment on the relationship in the data based on your plot in part (i)(a). [5]

In the plot produced in part (i) we can distinguish between male and female policyholders. The plot is shown
below, with “M” and “F” showing male and female policyholders respectively:

(ii) Comment on the relationship in the data based on this plot. [2]

For the remainder of the question you will need to ensure that the sex and class variables are treated as
categorical variables (factors). You can use the following R code:
class.code = as.factor(class.code)
sex.code = as.factor(sex.code)

(iii) Fit a generalised linear model to the data, using a Poisson distribution, with the numbers of
policies as the response variable and the class of business as the only factor. Your answer should include
estimates of the parameters, corresponding p-values and a brief interpretation of their effect. [8]

(iv) Fit a second Poisson generalised linear model to the data, using the numbers of policies as the
response variable and both the class of business and the sex of the policyholders as factors. Your answer
should include estimates of the parameters, corresponding p-values and a brief interpretation of their effect.
[8]

(v) Determine, using the deviance, which of the two models used in parts (iii) and (iv) provides a better fit to
the data. Your answer should include the null hypothesis, the p-value of the relevant test and a clear
conclusion. [6]

(vi) Calculate the predicted number of policies for male policyholders when the class of business is 2, based
on the model chosen in part (v). [3]
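
The steps in Question 4 can be sketched in R as below. policies_data.RData is not available here, so the counts are simulated; the data frame name policies_data, the object names m_class and m_both, and all values are assumptions. The variable names n.policies, sex.code and class.code follow the question.

```r
# Simulated stand-in for policies_data.RData (counts are assumptions, not the real data).
set.seed(3)
policies_data <- data.frame(
  sex.code   = factor(rep(1:2, each = 5)),
  class.code = factor(rep(1:5, times = 2))
)
true_mean <- exp(5 - 0.3 * as.integer(policies_data$class.code) -
                 0.4 * (policies_data$sex.code == "2"))
policies_data$n.policies <- rpois(10, true_mean)

# (i) Log number of policies against class, labelled by sex.
plot(as.integer(policies_data$class.code), log(policies_data$n.policies),
     pch = ifelse(policies_data$sex.code == "1", "M", "F"),
     xlab = "Class of business", ylab = "log(number of policies)")

# (iii), (iv) Nested Poisson models.
m_class <- glm(n.policies ~ class.code, family = poisson, data = policies_data)
m_both  <- glm(n.policies ~ class.code + sex.code, family = poisson,
               data = policies_data)
summary(m_both)$coefficients

# (v) H0: the sex term adds nothing once class is in the model.
anova(m_class, m_both, test = "Chisq")

# (vi) Predicted count for a male policyholder (sex.code 1) in class 2.
new_cell <- data.frame(class.code = factor(2, levels = 1:5),
                       sex.code   = factor(1, levels = 1:2))
pred <- predict(m_both, newdata = new_cell, type = "response")
```

With sex.code 1 as the baseline level, the prediction equals the exponential of the intercept plus the class 2 coefficient, which gives a quick manual check on the predict() output.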
