Problem Sheet Week 6

A collection of statistics problems provided to students on the University of Cambridge first-year undergraduate statistics course.

Uploaded by

John Smith

Statistics – 6th Week Michaelmas Term - Answers

Association between variables

This week we are looking at associations between variables – does the value of x depend on the
value of y? We will look at this question for both categorical variables (like sex or ethnicity) and
interval variables (like height or exam scores).

We are covering the following topics:

• Z-test for proportions
• Chi Squared test for association
• Spearman’s rank correlation
• Pearson’s correlation
• Covariance

Note – these exercises require you to use a spreadsheet program such as Excel or a free online
spreadsheet program such as Google Sheets. They are not optional. Write up your answers and
hand them in. Where you are asked to calculate quantities such as the standard deviation, give in
your write-up the equation you used and the calculated value. Where you are asked to make graphs,
copy and paste them into your write-up or, if writing up by hand, print them out and hand them in
with your write-up.

Note – in many cases you will need to repeat a calculation on lots of data, for example computing
(x − x̄) for every prisoner in Exercise 2 or every country in Exercise 3. You can do this using
formulas and functions in Excel.

If you did not learn to use formulas and functions in a spreadsheet at school, you need to learn now:
this is a skill you will need for your practical classes and in many jobs in the real world.

Fear not, you can find simple instructions here:

http://www.excel-easy.com/introduction/formulas-functions.html
Conceptual questions

a) A researcher measures weight (in stone) and height (in inches) for men. She calculates the
correlation and covariance. She then decides to convert her data to metric units, kilograms
and centimetres. One kilogram is 0.157 stone and one centimetre is 0.39 inches. What will
happen to the correlation and covariance?

b) Spearman’s rank correlation coefficient can be used when the assumptions for Pearson’s
correlation are not met. For each of the two datasets shown, state why Pearson’s correlation
is unsuitable and explain briefly why correlating the ranks rather than the data themselves
solves the problem.
[3]

[Figure: datasets i) and ii), not reproduced]

c) What conditions should be fulfilled when Pearson’s correlation coefficient is used?

Testing for association – categorical variables
Exercise 1 – Chi Square test of association

In a classic study by Clark and Clark (1939), African-American children were shown one black doll and
one white doll and asked which one they wanted to play with. Out of 252 children, 169 chose the
white doll and 83 chose the black doll.

a) Use a z-test for proportions to determine whether more children chose the white doll than would be
expected by chance.
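The z-test in a) can be checked with a few lines of Python. This is a minimal sketch, assuming the standard large-sample z-test for one proportion with the standard error computed under H0, and assuming that "chance" means p0 = 0.5 because there were two dolls. The one-tailed critical value 1.645 for alpha = 0.05 is taken from standard normal tables.

```python
import math


def one_proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, using the null standard error sqrt(p0(1 - p0)/n)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


# Clark and Clark (1939): 169 of 252 children chose the white doll.
# Assumption: with one white and one black doll, chance means p0 = 0.5.
z = one_proportion_z(169, 252, 0.5)

# One-tailed test at alpha = 0.05 (H1: p > 0.5): reject H0 if z > 1.645.
reject = z > 1.645
print(f"z = {z:.2f}; reject H0: {reject}")
```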

In 1970, Hraba and Grant carried out a similar study. 89 African-American children were offered a
choice of 4 dolls (2 white, 2 black). 28 children chose a white doll and 61 chose a black doll.

b) Use a z-test for proportions to determine whether children were more likely to choose the black
doll in Hraba and Grant’s study than in Clark and Clark’s.
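For b), a minimal Python sketch of the two-sample z-test, assuming the pooled-proportion standard error and a one-tailed alternative (the 1970 proportion choosing a black doll is higher than the 1939 one).

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion in the standard error."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


# Proportion choosing a black doll: 83 of 252 (1939) and 61 of 89 (1970).
z = two_proportion_z(83, 252, 61, 89)

# One-tailed test at alpha = 0.05 (H1: p_1970 > p_1939).
reject = z > 1.645
print(f"z = {z:.2f}; reject H0: {reject}")
```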

c) Construct a 2x2 contingency table for choice (black doll or white doll) by year (1939, 1970)

d) Carry out a chi-square test to see whether there is a dependency between year and choice of doll.
Use an alpha level of 0.05. Set out your work carefully, including your hypotheses, the working for
the calculation of the test statistic and how you determined the degrees of freedom.
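The chi-square working in d) follows the usual recipe: expected count = row total × column total / grand total, χ² = Σ(O − E)²/E, and df = (rows − 1)(columns − 1). A minimal Python sketch, assuming no continuity correction is applied and taking the critical value 3.841 (1 df, alpha = 0.05) from tables.

```python
def chi_square_test(observed):
    """Pearson chi-square statistic, degrees of freedom and expected counts
    for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        exp_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand  # expected count E
            exp_row.append(e)
            chi2 += (o - e) ** 2 / e  # contribution (O - E)^2 / E
        expected.append(exp_row)
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof, expected


# Rows: year (1939, 1970); columns: choice (white doll, black doll).
table = [[169, 83], [28, 61]]
chi2, dof, expected = chi_square_test(table)

CRITICAL_1DF_005 = 3.841  # from chi-square tables: 1 df, alpha = 0.05
print(f"chi2 = {chi2:.2f}, df = {dof}, reject H0: {chi2 > CRITICAL_1DF_005}")
```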

e) What are the differences between the z-test approach and the Chi-square approach?

f) Suppose I introduced a Chinese doll to the experiment. Could I still use both tests?
Testing for association – a straight line relationship
Exercise 2: Correlation and covariance

In the lecture we heard about the data on the heights of 3000 prisoners that Student used to verify the
t distribution. In fact, the researcher who collected the data (MacDonell 1902) also measured the
middle finger of each prisoner. Student derived the sampling distribution of Pearson’s correlation r
(which gives the formula for testing the significance of a correlation in your formula book) and
tested it on these data.

Access the Google Sheet at:

https://docs.google.com/spreadsheets/d/1PplCGDXAvS1gh3BXvtr4Mb63fjdGXLVJVlfohB5Y79M/edit?usp=sharing

a) Plot a scatter plot of height against middle finger length. Notice anything odd about this?
Comment.

b) Calculate the covariance between the two measures. Include in your written report the formula
you used (from section 11 of the formula book).

To do this you will first need to work out the means x̄ and ȳ by summing all the values in each
column of data and dividing by the number of values.

Then you can add a column containing (x − x̄) for each prisoner by entering a formula, and another
column containing (y − ȳ) for each prisoner in the same way. Add another new column in which
you multiply these together.

Finally you can use the SUM function to add up all the entries in the column as necessary according
to the formula in the formula book.
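The spreadsheet steps above translate directly into code. This is a minimal Python sketch, assuming the formula book uses the sample covariance with an n − 1 denominator (if it divides by n, change the last line of the function accordingly). The numbers are made up for illustration and are not the prisoner data.

```python
def covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]  # the (x - x_bar) column
    dy = [y - y_bar for y in ys]  # the (y - y_bar) column
    products = [a * b for a, b in zip(dx, dy)]  # the product column
    return sum(products) / (n - 1)  # SUM the products, then divide


# Made-up illustration values, not the MacDonell data.
heights = [60.0, 64.0, 66.0, 70.0]
fingers = [10.5, 11.0, 11.4, 12.1]
s_xy = covariance(heights, fingers)
print(f"s_xy = {s_xy:.3f}")
```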

c) Work out the standard deviation for height and finger length, and use these to calculate the
correlation coefficient r (see formula book section 11).
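For c), the standard deviations and r can be sketched in the same way. This assumes the formula book's r = s_xy / (s_x s_y), with n − 1 in every term. Because the same n − 1 appears in the covariance and in both standard deviations, it cancels, so r comes out the same whichever denominator is used, provided it is used consistently. The data are again made up.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with an n - 1 denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Pearson's r = s_xy / (s_x * s_y)."""
    return sample_cov(xs, ys) / (sample_sd(xs) * sample_sd(ys))


# Made-up illustration values, not the MacDonell data.
heights = [60.0, 64.0, 66.0, 70.0]
fingers = [10.5, 11.0, 11.4, 12.1]
r = pearson_r(heights, fingers)
print(f"r = {r:.3f}")
```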

d) Take a look at the axes of your scatter plot. What units do you think were used for height? What
about finger length?

e) Convert height and finger length to cm and recalculate the covariance and correlation. Only one of
these should change. Which one? Explain why with reference to the formula for covariance and
correlation.
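You can check your reasoning for e) numerically. This minimal sketch rescales made-up data by 2.54 (the number of centimetres in an inch), on the hypothetical assumption that both variables were recorded in inches, and compares the covariance and r before and after the conversion.

```python
import math

CM_PER_INCH = 2.54


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    # The variance of a variable is its covariance with itself.
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


# Made-up values, hypothetically both in inches.
heights_in = [60.0, 64.0, 66.0, 70.0]
fingers_in = [4.1, 4.3, 4.5, 4.8]

# Convert both variables to centimetres.
heights_cm = [CM_PER_INCH * h for h in heights_in]
fingers_cm = [CM_PER_INCH * f for f in fingers_in]

cov_ratio = sample_cov(heights_cm, fingers_cm) / sample_cov(heights_in, fingers_in)
r_in = pearson_r(heights_in, fingers_in)
r_cm = pearson_r(heights_cm, fingers_cm)
print(f"covariance ratio = {cov_ratio:.4f}; r before = {r_in:.4f}; r after = {r_cm:.4f}")
```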
Exercise 3: More Correlation

The United Nations publishes an annual Human Development Report, gathering data on indices of
development such as health (life expectancy, infant mortality), education, poverty and inequality.
They aggregate data on life expectancy, years of education and income per capita to calculate a
Human Development Index (HDI) and categorize countries as very high, high, medium or low HDI.

[Figure: HDI category by country; darker colours indicate a higher HDI category]

Access the Google Sheet at:

https://docs.google.com/spreadsheets/d/1bopmmS-7lqcsdNK0DwSBEg5Y2NOcK_2OJqkoQiyYHYc/edit?usp=sharing

a) Make a scatter plot comparing the proportions of males and females with secondary education.

Let x be the proportion of females with secondary education and y be the proportion of males with
secondary education.

b) Work out the covariance s_xy between the proportions of females and males with secondary
education across all countries.

c) Work out the standard deviations of the proportions with secondary education for females (s_x)
and males (s_y).

d) What is the correlation coefficient, Pearson’s r, between the proportions of females and males with
secondary education across all countries?

e) Is there a significant correlation?
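For e), a common form of the significance test for r is t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. This minimal Python sketch assumes that is the formula in your formula book (check before relying on it); the r and n below are hypothetical, and the critical value still comes from t tables.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, returned with its n - 2 degrees of freedom."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


# Hypothetical values for illustration only.
t, df = correlation_t(0.5, 27)
print(f"t = {t:.3f} on {df} df; compare |t| with the critical value from t tables")
```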

f) Calculate the correlation for each HDI group separately and comment on the results.
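For f), the data only need splitting by HDI category before r is calculated. A minimal Python sketch; the rows are made-up (category, female %, male %) values, not the UN data, and each group needs at least three countries for r to say anything (with two points r is always ±1).

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_by_group(rows):
    """Return {group: r}, computing Pearson's r separately within each group."""
    groups = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        groups[group][0].append(x)
        groups[group][1].append(y)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in groups.items()}


# Made-up (HDI category, female %, male %) rows for illustration.
rows = [
    ("very high", 90.0, 92.0), ("very high", 95.0, 94.0), ("very high", 85.0, 89.0),
    ("low", 10.0, 20.0), ("low", 20.0, 25.0), ("low", 15.0, 30.0),
]
result = r_by_group(rows)
print(result)
```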
Testing for association – non-parametric correlation
Exercise 4 – Rank Correlation

Access the file “EducationFertility” at Google Sheets:

https://docs.google.com/spreadsheets/d/1f4_2lPcoy_GTp1wyff5EchSI-MfDtBetqEqh6SXv_RE/edit?usp=sharing

This file contains data on the proportion of females having some secondary education, and the
fertility rate (number of children per woman) for many countries. It also contains rankings for each
variable.

It is proposed that when women are educated, the number of children per family is lower (and in
general, poverty is lowered as a consequence).

a) Make a scatter plot of fertility vs. percentage of women with secondary education. Think about
which variable should be on the x and y axes in relation to the hypothesis above.

b) Calculate Pearson’s correlation coefficient, r, for fertility against women’s education.

c) Looking at your scatter plot, was this an appropriate approach? Why?

d) Make a scatter plot of the ranks and comment in relation to your answer to c).

e) Calculate Spearman’s rank correlation coefficient for the data.
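Spearman's coefficient is Pearson's r applied to the ranks. This minimal Python sketch assumes tied values receive the average of the ranks they span (a common convention; check how the ranks in the sheet were assigned) and that both variables are ranked in the same direction. With no ties it agrees with the shortcut 1 − 6Σd²/(n(n² − 1)). The data are made up: monotonic but curved, so |r| is below 1 while Spearman's coefficient is exactly −1.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up illustration values, not the EducationFertility data.
education = [5.0, 20.0, 40.0, 60.0, 80.0, 95.0]
fertility = [7.0, 6.5, 5.5, 3.0, 2.0, 1.9]
r_raw = pearson_r(education, fertility)
rho = spearman_rho(education, fertility)
print(f"Pearson r = {r_raw:.3f}; Spearman rho = {rho:.3f}")
```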

f) What is the most appropriate measure of correlation for these data?

Tutor-Marked question
Exercise 5

Bystander apathy is a phenomenon in which observers are less likely to help the victim of a crime or
accident when other observers are also present.

Consider an experiment in which participants think they are supposed to be memorizing pairs of
words. In one condition [NO BYSTANDERS] the participant is alone with the experimenter. In another
condition [4 BYSTANDERS], there are four other ‘participants’ present (in fact these are actors
pretending to be participants), who we shall call the bystanders. Part way through the memory task,
the experimenter pretends to be having a seizure. The real experimental question is whether the
participant leaves the room to seek assistance for the experimenter or not.

These are the number of participants seeking assistance, or not, in each condition:

Condition          Seeks assistance   Does not seek assistance
NO BYSTANDERS      13                 5
4 BYSTANDERS       7                  10

Let A be the event that the participant seeks assistance for the experimenter. Let B be the event that
there are bystanders present.

a) Name two statistical tests that could be used to determine whether the proportion of
participants seeking assistance differed when bystanders were present vs. absent.

b) For each of the above tests, the null hypothesis is that the proportion of participants seeking
assistance is the same whether or not bystanders are present, i.e. p(A|B) = p(A|Bᶜ) = p_A. For one
test there is a single possible alternative hypothesis; for the other there are three possible
alternative hypotheses. State the possible alternative hypotheses for each test.

c) Carry out each of the tests named in part a). In each case, clearly state your hypotheses, the test
statistic, the critical value and your conclusion.

d) Say the number of subjects in the experiment was doubled and the results were as follows.

Condition          Seeks assistance   Does not seek assistance
NO BYSTANDERS      26                 10
4 BYSTANDERS       14                 20

State, without calculation but giving a reason, whether you would expect the p-value for each
test to go up or down.

e) Covariance and correlation are measures of association for continuous variables. What is the
difference between covariance and correlation and when should you use each one?
