STAT2110 Instructions For Practical Work 2021

This document provides instructions for a statistical data processing assignment using SAS EG. Students will complete the assignment in groups of 1-3 people by analyzing a dataset of bank customer information and submitting a 10 page maximum written report. The report should include an introduction describing the research questions, a statistical description of the variables, analyses of statistical dependencies between variables, and a summary. Students must select at least 3 variables and formulate two research problems related to statistical dependencies between the variables. They should then describe the variable distributions and examine pairwise relationships between variables, performing at least two statistical tests of dependency. The deadline for submitting the assignment is January 16th for one course mode and the exam date for another course mode.


STAT2110 Statistical Data Processing SAS EG

WRITTEN ASSIGNMENT

PRACTICAL WORK 2021

- The assignment is done in groups of 1-3 students. Each student chooses whether to work alone or in a group.
- The assignment report is submitted via Moodle as a PDF file. In a group of several students,
every student submits the same report separately.
- For students who follow Mode 1 to complete this course, the deadline for submitting this
assignment is Sunday 16 January 2022.
- For students who follow Mode 2 to complete this course, the deadline for submitting this
assignment is the exam date.
- Feedback is given to each student in Moodle. The report is either accepted or returned
for corrections. The corrected report should be submitted within 3 weeks after it was
returned for corrections. There are at most two retries to correct the assignment.
- The writing guidelines of the University of Vaasa should be followed when writing the report.
However, the cover page of the report must contain the following information: the name of the
written assignment, and the name(s) and student number(s) of the student(s).
- The report can be written either in English or in Finnish.
- Maximum length of the report: 10 pages (the cover page and any appendices are not included).

The report could be structured, for example, as follows:

1 Introduction
A brief introduction in which you describe the main features of your research:
- what the cases are
- what your two research problems/questions are
(for instance: do younger customers have more loan defaults than older ones
(= relationship between age and default)? Is there a linear relationship between
income and credit card debt?)

2 Statistical Description of Data


You describe in writing, and illustrate with graphs, tables and/or descriptive statistics, the
distributions of the variables related to your research problems.

3 Analyses of Statistical Dependency


You illustrate and present in writing the results of the analyses and tests related to your research
problems.

4 Summary
Finally, you summarize your research as a whole.

The dataset for the empirical work is empwork2021. You can download the empwork2021.xlsx file
from Moodle.

(You can also use a dataset of your own. In that case, you should apply these instructions to it
as far as possible. When submitting the report in Moodle, you should also submit your SAS dataset.)
Here is a short description of the dataset empwork2021:

The data is a hypothetical data file on a bank's relatively young customers. The file contains financial
and demographic information on 279 customers. The variables in the dataset are:

age        Age in years
ed         Level of education
             1 Did not complete high school
             2 High school degree
             3 Some college
             4 College degree
             5 Higher degree
employ     Years with current employer
residence  Place of residence
             1 Westside
             2 North town
             3 East city
address    Years at current address
income     Customer's yearly income in thousands (€)
debtinc    Debt to income ratio (%)
creddebt   Credit card debt in thousands (€)
othdebt    Other debt in thousands (€)
default    Previously defaulted on a loan
             0 No
             1 Yes

For your two research problems/questions you need to select at least 3 variables. You can also
create new variables on the basis of the existing ones by using variable transformations or
classifications. You formulate exactly two research problems on the basis of the selected variables.
The research problems should concern statistical dependencies between the variables.
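
As an illustration of a classification and a transformation, the sketch below derives two new variables from empwork2021-style columns. It is a minimal sketch in plain Python and shows only the logic. In SAS EG the same kind of step is typically done as a computed column in the Query Builder. The new variable names (agegroup, totdebt), the age cut points and the customer rows are all hypothetical assumptions chosen for the example, not part of the assignment.

```python
# Hypothetical example: derive new variables from empwork2021-style columns.
# ASSUMPTION: the age cut points (under 30, 30-39, 40+) are illustrative only.

def age_group(age: int) -> str:
    """Classification: turn age in years into a three-class ordinal variable."""
    if age < 30:
        return "under 30"
    if age < 40:
        return "30-39"
    return "40+"


def total_debt(creddebt: float, othdebt: float) -> float:
    """Transformation: total debt in thousands (EUR) = credit card debt + other debt."""
    return creddebt + othdebt


# Made-up customers (NOT rows of the real dataset).
customers = [
    {"age": 27, "creddebt": 1.2, "othdebt": 3.4},
    {"age": 35, "creddebt": 0.5, "othdebt": 2.0},
    {"age": 44, "creddebt": 2.1, "othdebt": 5.9},
]

for c in customers:
    c["agegroup"] = age_group(c["age"])                               # hypothetical new variable
    c["totdebt"] = round(total_debt(c["creddebt"], c["othdebt"]), 2)  # hypothetical new variable
    print(c["age"], c["agegroup"], c["totdebt"])
```

Whether a classified variable is more useful than the original ratio-scale variable depends on the test you plan to use. The method table later in these instructions lists several options that begin with "Classify X".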

You start the statistical analysis of the dataset by first describing and/or illustrating the distributions
of the selected variables. You can do this by creating suitable statistical graphs and/or by
calculating basic descriptive statistics that are appropriate for describing either the location and
dispersion or the frequency distribution of a single variable.
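
The two kinds of description mentioned above can be sketched as follows. For a ratio-scale variable such as income, we compute location and dispersion (mean, median, standard deviation, interquartile range). For a categorical variable such as ed, we build a frequency distribution. This is a minimal standard-library sketch on made-up values, not the empwork2021 data. In SAS EG you would normally use tasks such as Summary Statistics and One-Way Frequencies. Quartile definitions differ between software packages, so SAS EG may report slightly different quartiles than Python's `statistics.quantiles`.

```python
import statistics
from collections import Counter

# Hypothetical sample values (NOT taken from empwork2021).
income = [19.0, 25.0, 31.0, 28.0, 45.0, 22.0, 60.0, 33.0]  # ratio scale, thousands EUR
ed = [1, 2, 2, 3, 2, 4, 1, 2]                               # ordinal codes 1-5

# Location and dispersion of an interval/ratio variable.
mean = statistics.mean(income)
median = statistics.median(income)
sd = statistics.stdev(income)                   # sample standard deviation (divisor n - 1)
q1, q2, q3 = statistics.quantiles(income, n=4)  # quartiles (Python's default "exclusive" method)
print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f} IQR={q3 - q1:.2f}")

# Frequency distribution of a categorical (here ordinal) variable.
freq = Counter(ed)
n = len(ed)
for code in sorted(freq):
    print(code, freq[code], f"{100 * freq[code] / n:.1f}%")
```

For a skewed variable such as income, the median and interquartile range are usually more informative than the mean and standard deviation.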

The next step is to examine the possible (pairwise) relationships between the variables you
have chosen. (You can also examine multivariate relationships.) Apply just one appropriate analysis
method and statistical test per relationship. In total, you must perform at least two statistical
tests of dependency.
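
As an example of one pairwise analysis, the sketch below computes Spearman's rank correlation. It is the Pearson correlation of the ranks, with tied values given their average rank. The sketch also computes the usual large-sample t statistic (n - 2 degrees of freedom) for testing whether the correlation is zero. Only the Python standard library is used. The p-value would come from the t distribution, which SAS EG's Correlations task reports for you. The income and credit card debt values are hypothetical, and the helper function names are made up for this illustration.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values get the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_t(r, n):
    """Approximate t statistic (n - 2 degrees of freedom) for testing r = 0; needs |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical values for five customers (NOT from empwork2021).
income = [20.0, 35.0, 50.0, 28.0, 42.0]
creddebt = [0.8, 1.5, 3.1, 1.5, 2.2]

r_s = spearman(income, creddebt)
print(f"r_s = {r_s:.3f}, t = {spearman_t(r_s, len(income)):.3f}")
```

Because it uses only ranks, Spearman's coefficient measures a monotonic relationship and does not require normality. This is why it appears in the table for non-normal interval/ratio data.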
The following table gives some ideas of which statistical analysis method and test you might use
to examine, for instance, pairwise relationships. Detailed information on these methods (for
instance, the assumptions of the tests) can be found in the lecture notes.

Level of measurement (X = possible cause, Y = response) and some basic (methods and) tests
in order of “quality”:

X NOMINAL/ORDINAL, Y NOMINAL
  1) Crosstab and contingency coefficient/Cramér's V with the chi-square test for independence.
  2) Logistic regression with related tests, if Y is dichotomous.

X NOMINAL (dichotomous), Y ORDINAL
  1) Mann-Whitney test (same as the Wilcoxon two-sample test).
  2) Crosstab and contingency coefficient/Cramér's V with the chi-square test for independence.

X NOMINAL (more than 2 values), Y ORDINAL
  1) Kruskal-Wallis test.
  2) Crosstab and contingency coefficient/Cramér's V with the chi-square test for independence.

X NOMINAL (dichotomous), Y INTERVAL/RATIO
  1) If Y is normal in both groups: two-sample t-test of means.
  2) If Y is non-normal: Mann-Whitney test.
  3) If Y is normal in both groups: linear regression with related tests and a dummy
     explanatory variable.

X NOMINAL (more than 2 values), Y INTERVAL/RATIO
  1) If Y is normal in each group and the population variances are equal: ANOVA.
  2) If Y is normal in each group and the population variances are not equal: Welch's
     variance-weighted ANOVA.
  3) If Y is non-normal: Kruskal-Wallis test.
  4) If Y is normal in each group: linear regression with related tests and several
     explanatory dummy variables.

X ORDINAL, Y ORDINAL
  1) Spearman's rank correlation with a test of significance, if a monotonic relationship
     is of interest.
  2) Mann-Whitney test (same as the Wilcoxon two-sample test) or Kruskal-Wallis test.
  3) Crosstab and contingency coefficient/Cramér's V with the chi-square test for independence.

X ORDINAL, Y INTERVAL/RATIO
  1) Spearman's rank correlation with a test of significance, if a monotonic relationship
     is of interest.
  2) If Y is normal in each group and the population variances are equal: ANOVA, or a
     two-sample t-test of means assuming equal variances.
  3) If Y is normal in each group and the population variances are not equal: Welch's
     variance-weighted ANOVA, or a two-sample t-test of means assuming unequal variances.
  4) If Y is non-normal: Mann-Whitney test (same as the Wilcoxon two-sample test) or
     Kruskal-Wallis test.
  5) If Y is normal in each group: linear regression with related tests and several
     explanatory dummy variables.

X INTERVAL/RATIO, Y NOMINAL
  1) Logistic regression with related tests.
  2) Classify X: crosstab and contingency coefficient/Cramér's V with the chi-square test
     for independence.

X INTERVAL/RATIO, Y ORDINAL
  1) Spearman's rank correlation with a test of significance, if a monotonic relationship
     is of interest.
  2) Classify X: crosstab and contingency coefficient/Cramér's V with the chi-square test
     for independence.
  3) Classify X: Mann-Whitney test (same as the Wilcoxon two-sample test) or Kruskal-Wallis test.

X INTERVAL/RATIO, Y INTERVAL/RATIO
  1) If the data are normal: Pearson correlation with a test of significance, if a linear
     relationship is of interest, or linear regression with related tests.
  2) If the data are normal: Spearman's rank correlation with a test of significance, if a
     monotonic relationship is of interest, or nonlinear regression with related tests.
  3) If the data are non-normal: Spearman's rank correlation with a test of significance,
     if a monotonic relationship is of interest.
  4) Classify X: Mann-Whitney test (same as the Wilcoxon two-sample test) or Kruskal-Wallis test.
  5) Classify X, Y or both; other methods then become applicable, too.
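
Many rows of the table use a crosstab with the chi-square test for independence and Cramér's V (or the contingency coefficient). The minimal sketch below shows what these quantities are. The expected count of each cell under independence is (row total × column total) / n. The chi-square statistic sums (observed - expected)² / expected over all cells, with (r - 1)(c - 1) degrees of freedom. Cramér's V is sqrt(chi² / (n · (min(r, c) - 1))), and the contingency coefficient is C = sqrt(chi² / (chi² + n)). The crosstab counts are hypothetical and the function name is made up for the example. The p-value needs the chi-square distribution, which SAS EG's Table Analysis task (PROC FREQ with the CHISQ option) reports together with these measures. The smallest expected count is also returned so that the test's assumption can be checked. A commonly cited rule of thumb is that no more than 20% of the expected counts should be below 5; see the lecture notes for the rule used on this course.

```python
import math


def chi_square_independence(table):
    """Chi-square statistic for independence in an r x c crosstab of observed counts.

    Returns (statistic, degrees of freedom, Cramér's V, contingency coefficient C,
    smallest expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = math.inf
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    cramers_v = math.sqrt(chi2 / (n * (min(r, c) - 1)))
    contingency_c = math.sqrt(chi2 / (chi2 + n))
    return chi2, df, cramers_v, contingency_c, min_expected


# Hypothetical crosstab: age group (rows: under 30, 30-39, 40+)
# by default (columns: 0 = No, 1 = Yes). These counts are made up.
counts = [
    [30, 15],
    [50, 12],
    [40, 8],
]
chi2, df, v, cc, min_e = chi_square_independence(counts)
print(f"chi2 = {chi2:.3f}, df = {df}, Cramér's V = {v:.3f}, C = {cc:.3f}, "
      f"min expected = {min_e:.1f}")
```

Cramér's V always lies between 0 and 1, whatever the table size. For this reason it is often easier to interpret than the contingency coefficient, whose maximum depends on the number of rows and columns.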
