Assignment 1 2020

This document provides instructions for Assignment 1, which is worth 100 total marks and 20% of the student's grade. It must be submitted by April 14th at 11:55pm as a single PDF file. The assignment involves analyzing employment data for 30 countries using R and interpreting the results. It consists of 4 questions worth various point totals. Question 1 involves testing for multivariate normality of the data and exploring univariate and multivariate plots. Question 2 examines differences in employment between regions using MANOVA. Question 3 uses principal component analysis to reduce the variables. Question 4 performs a factor analysis.

Assignment 1

Total Marks: 100; Weighting: 20%

Due: 14/4/20 11.55pm

Instructions:

• Submit only one file in PDF format to the link on the Study Desk.
• Assume that your report will be read by someone familiar with the data set but
with limited statistical knowledge. Fully explain plots and, when stating statistics
or results, explain what they mean statistically AND in the context of the data.
• Presentation should be neat, consistent, spell-checked and proofread. All
questions should be clearly labelled, and all answers should clearly and concisely
address the questions.
• If you convert a Word document to PDF for submission, check that all symbols,
equations etc. have converted correctly, i.e., proofread your work.
• All answers must be typed; do not include handwritten/scanned or stylus/tablet
written responses in your document.
• If you do not use knitr to compile your submission, where asked to provide R
code, paste relevant code within the assignment document and italicise (or
otherwise highlight or distinguish from other content). Do not include code in an
appendix.
• Do not include an appendix at all. Any work included in an appendix will not be
marked.
• Please note that referencing textbooks and other resources is not the goal of this
assessment. This work requires students to demonstrate their understanding of
the analysis and interpretation, not provide quotes from resources.
• When interpreting output, you are expected to do so in the context of the data and
the method (i.e. ensure you comment on aspects of the method that affect your
interpretation with respect to the variables and sample).
• A maximum of 10 marks will be deducted from your total marks for poor
presentation.

Marks:
• Question 1: 25
• Question 2: 20
• Question 3: 30
• Question 4: 25

Data File:
The same data set will be used for all four questions in this assignment.
The data file ‘europegroup.txt’ contains data for the percentage of employment by
country (n=30 countries). The first variable identifies the region of the country (Group)
and the next nine variables represent different employment sectors: AGR=agriculture,
forests and fishing; MIN=mining; MAN=manufacturing; PS=power and water supplies;
CON=construction; SER=services; FIN=finance; SPS=social and personal services;
TC=transport and communication. Although you may not find these data to be MVN in
Question 1, you should proceed with all analysis requested in Questions 2 to 4 assuming
MVN, and comment on this limitation where relevant.

Question 1 (25 marks):

Provide R code, output and written interpretation for parts a) to d) and part g) of this
question. Provide only output that is directly relevant to address each section.
Test for multivariate normality (MVN) as follows:

a) Describe the structure of the ‘europegroup.txt’ data. (1 mark total)
b) Produce (2 marks) and interpret (2 marks) univariate QQ plots and histograms and
univariate Shapiro-Wilk tests of normality for each of the nine employment
variables. Which are the most non-normally distributed variables (1 mark)? (5 marks
total)
c) Produce (1 mark) and interpret (1 mark) perspective and contour plots for the SER
and FIN variables. What is an inherent problem with using these plots to assess MVN
(1 mark)? (3 marks total)
d) Perform the analysis necessary to provide the results of the Mardia, Henze-Zirkler
and Royston tests of MVN based on all nine employment variables. Include in your
interpretation: (8 marks total)
i. The Chi-Square QQ plot (1 mark) and interpretation (1 mark)
ii. Describe how the QQ plot is constructed and its relationship to the univariate
normal QQ plots (2 marks).
iii. Output and interpretation for the 3 tests (3 marks).
iv. What is a key limitation of these MVN statistical tests (1 mark)?

e) One way to try and meet the MVN assumption could be to remove some of the
variables from the multivariate analysis (do not perform this analysis). Suggest three
additional ways that you might improve univariate and multivariate normality for
data sets in general. (3 marks total)

f) In part e) we suggested removing some variables to try to help the data approach
MVN. Suggest one other reason why reducing the number of variables used in
multivariate analysis may be important (this question does not relate specifically to
this particular data set). (2 marks total)
g) In part e) we suggested removing some variables to try to help the data approach
MVN. Check to see if the data are MVN if only those variables that are univariate
normal (UVN) are used (1 mark). Is your result reasonable given your understanding
of the relationship between UVN and MVN (2 marks)? (3 marks total)
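
As a hedged illustration of how the Chi-Square QQ plot in part d) is usually built, the sketch below pairs ordered squared Mahalanobis distances with chi-square quantiles. It is written in Python for illustration only (the assignment itself requires R), it uses hypothetical toy values rather than ‘europegroup.txt’, and it restricts itself to two variables so that the chi-square quantile has a simple closed form. The plotting positions (i - 0.5)/n are an assumption; other conventions exist.

```python
import math

# Hypothetical toy data: 8 observations on 2 variables (NOT from europegroup.txt).
data = [(2.1, 3.4), (3.5, 4.1), (1.8, 2.9), (4.2, 5.6),
        (2.9, 3.8), (3.1, 4.9), (5.0, 5.2), (2.4, 2.7)]
n, p = len(data), 2

# Sample means and sample covariance matrix (divisor n - 1).
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)

# Inverse of the 2x2 covariance matrix.
det = sxx * syy - sxy ** 2
inv = [[syy / det, -sxy / det], [-sxy / det, sxx / det]]


def mahalanobis_sq(x, y):
    """Squared Mahalanobis distance of (x, y) from the sample mean."""
    dx, dy = x - mx, y - my
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))


# Ordered squared distances: the "sample quantiles" of the plot.
d2 = sorted(mahalanobis_sq(x, y) for x, y in data)

# Chi-square quantiles with 2 df have the closed form -2 * ln(1 - prob).
probs = [(i - 0.5) / n for i in range(1, n + 1)]  # assumed plotting positions
chi2_q = [-2.0 * math.log(1.0 - pr) for pr in probs]

# Each pair is one point on the Chi-Square QQ plot.
qq_points = list(zip(chi2_q, d2))
for q, d in qq_points:
    print(f"{q:6.3f}  {d:6.3f}")
```

If the data were MVN, these points would be expected to fall close to a straight line through the origin with slope 1, in the same way that a univariate QQ plot compares ordered observations with normal quantiles.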

Question 2 (20 marks):

For all of question 2, do not use all nine employment variables – use only those 4
identified in Question 1 as UVN.
Provide R code, output and written interpretation for parts a) to e) of this question.

a) Produce a draftsman display for the employment variables. Use the function
scatterplotMatrix (from week 2) and check the help documentation
(?scatterplotMatrix) to help you produce a plot with observations grouped by regional
group using different colours and include the associated legend. Your plot should not
include smoothing, regression lines, or distribution curves in the diagonal panels of
the plot (1 mark). Interpret these plots, relating back to the original data where it
may add to the interpretation (2 marks). What are the y and x axes on plot [3,2] of
the draftsman plot (1 mark)? (4 marks total)
b) In the context of MANOVA, list the dependent and independent variables (1 mark)
and define the relationship that the MANOVA would test (1 mark). (2 marks total)
c) Using MANOVA in R, test for differences in ‘percentage of employment’ between the
four country regions. Include tests using all four test statistics covered in this course
(2 marks) and interpret output (3 marks). (5 marks total)
d) Explain how the Wilks’ Lambda statistic is calculated and why a small statistic is likely to
indicate significant differences between at least some groups (2 marks). Which of the
four tests used in part c) would be the best to interpret if there are concerns about
multivariate normality or covariance equality (1 mark)? (3 marks total)
e) Produce output that specifically compares each of the regions (Group) with each
other (you should have 6 comparisons) using Hotelling’s T² test and a significance
level of 0.05 (2 marks). Determine the multiple-test-corrected significance level (1
mark). Do not provide R output; instead reproduce and complete the following table
for all comparisons and interpret. What were the sample sizes for each region and
how may sample sizes have affected these results and those in part c) (2 marks)?
Will deviation from MVN influence these results (1 mark)? (6 marks total)

Comparison: Region 1 | Comparison: Region 2 | Hotelling’s p-value | Significant (Y/N) | Significant after correction (Y/N)
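
The arithmetic behind the 6 comparisons and a corrected significance level can be sketched as follows. This is a minimal Python illustration (the assignment itself requires R), and it assumes a Bonferroni correction, which the assignment does not name; the region labels 1 to 4 are also placeholders, not the labels in the data file.

```python
from itertools import combinations

regions = [1, 2, 3, 4]   # placeholder labels for the four regional groups
alpha = 0.05             # nominal significance level stated in part e)

# Every unordered pair of regions is one Hotelling's T-squared comparison.
pairs = list(combinations(regions, 2))
m = len(pairs)

# Bonferroni adjustment (an assumption): divide alpha by the number of tests.
alpha_adj = alpha / m

print(pairs)
print(m, round(alpha_adj, 4))
```

Under this assumption, a comparison would count as significant after correction only if its p-value fell below `alpha_adj` rather than below 0.05.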

Question 3 (30 marks):

For all of question 3, do not use all nine employment variables – use only those 4
identified in Question 1 as UVN.
Provide R code, output and written interpretation for parts a) to e) of this question.

a) Produce (2 marks) and interpret (2 marks) the correlation and covariance matrices.
Explain the difference between these matrices in detail (i.e. explain
clearly how the values are adjusted mathematically and the effect of these changes)
(2 marks). Would using the covariance matrix in PCA on these data be appropriate (1
mark)? Why (1 mark)? (8 marks total)
b) Perform PCA on the 4 employment variables using the prcomp function.
Provide the eigenvalues (1 mark), % variation (1 mark) and scree plot (1 mark).
Interpret each of these results (3 marks) and discuss how they influence your
decision on how many PCs to interpret from this analysis (2 marks). Remember to
keep in mind the overall purpose of PCA. (8 marks total)
c) Interpret (2 marks) the first PC. Include the Z equation (1 mark) and a plot of the
loadings on the first PC in your answer (1 mark). (4 marks total)
d) What is the correlation between the first and second PCs and what does this tell you?
(2 marks total)
e) Produce (1 mark) and interpret (2 marks) a biplot based on the first 2 PCs. In
particular, explain your interpretation of the employment variables in country 19
compared to country 9 (1 mark). Relate your interpretation back to the original data
(1 mark). (5 marks total)
f) Was this a useful analysis for this data set? Explain with specific reference to the
results of your prior analysis in this question. (3 marks total)
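
The relationship between covariance and correlation raised in part a) can be illustrated with a small sketch. It is written in Python for illustration only (the assignment itself requires R) and uses hypothetical toy values, not the employment data. It shows that a correlation is a covariance divided by the product of the two standard deviations, so rescaling a variable changes the covariance but leaves the correlation unchanged.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    """Correlation: covariance divided by the product of standard deviations."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


# Hypothetical toy values (NOT from europegroup.txt).
x = [12.0, 15.5, 9.8, 20.1, 17.3]
y = [3.1, 4.0, 2.7, 5.5, 4.4]
x_scaled = [10 * v for v in x]   # same variable expressed in different units

print(cov(x, y), cov(x_scaled, y))    # covariance grows by the factor 10
print(corr(x, y), corr(x_scaled, y))  # correlation is unaffected
```

This scale dependence is why the choice between the two matrices can matter for PCA when variables have very different variances.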

Question 4 (25 marks):
For all of question 4, do not use all nine employment variables – use only those 4
identified in Question 1 as UVN.
Provide R code, output and written interpretation for parts a) to e) of this question.

a) Perform a Factor Analysis using the factanal function. Initially use the number of
components you identified as informative in Question 3 (do not use parallel analysis
to help inform your decision here) and apply no rotation. You will get an error
message. In order to resolve this issue and make further decisions about your
analysis you will need to have read the additional notes available in the Week 6 block
on the Study Desk called “notes on df limiting number of factors.pdf”. Provide your
initial line of code, the subsequent error message and your final line of code that
successfully performs the factanal analysis (2 marks). What did you need to change
and why (2 marks)? (4 marks total)
b) From your successful factanal analysis in part a) provide output and interpretation for
(8 marks total):
• Variance explained (2 marks)
• Chi-square test (2 marks)
• Variable loadings (2 marks)
• Difference in uniqueness values for the variables FIN and SPS (2 marks)
c) How would your results change if you applied a rotation? Explain your reasoning. (4
marks total)
d) Perform parallel analysis using a seed value of 245 and 500 iterations. Produce the
scree plot for the PC results only (1 mark). Discuss how many PCs are recommended
by this analysis and use the plot to help you explain these results (2 marks). As part
of your explanation provide the values for the 95th percentile for components 1 and 2
(1 mark). (4 marks total)
e) Explain in your own words how the parallel analysis works. (5 marks total)
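
The general idea of parallel analysis can be sketched as below. This is a deliberately simplified Python illustration (the assignment itself requires R): it uses only two variables, so the eigenvalues of the correlation matrix are exactly 1 + |r| and 1 - |r|, and it uses a nearest-rank 95th percentile. The observed correlation of 0.6 is hypothetical, and the numbers produced will not match those from any R function, which uses its own random number generator and percentile rule.

```python
import math
import random


def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def simulated_eigenvalues(n_obs, rng):
    """Eigenvalues of the 2x2 correlation matrix of uncorrelated random data."""
    a = [rng.gauss(0, 1) for _ in range(n_obs)]
    b = [rng.gauss(0, 1) for _ in range(n_obs)]
    r = correlation(a, b)
    return 1 + abs(r), 1 - abs(r)


def percentile_95(values):
    """Nearest-rank 95th percentile (an assumed convention)."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


rng = random.Random(245)   # seed from part d); the generator differs from R's
sims = [simulated_eigenvalues(30, rng) for _ in range(500)]
threshold_1 = percentile_95([e1 for e1, _ in sims])
threshold_2 = percentile_95([e2 for _, e2 in sims])

# Hypothetical observed correlation between the two variables.
r_obs = 0.6
observed = (1 + abs(r_obs), 1 - abs(r_obs))

# Retain components whose observed eigenvalue exceeds the random-data threshold.
retained = sum(obs > thr for obs, thr in zip(observed, (threshold_1, threshold_2)))
print(threshold_1, threshold_2, retained)
```

The design choice to illustrate is the comparison itself: a component is considered informative only when its observed eigenvalue is larger than what random, uncorrelated data of the same size would typically produce.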
