Assignment # 1


Department of Statistics
Program: M.Sc. Statistics
Course Number: 1569
Course Title: Data Analysis and Statistical Packages
Semester/Year: 3rd / 2019 Autumn
Instructor: Sir Zahoor Ahmad
Assignment No.: 01
Submission Date:
Due Date:
Student Name: Muhammad Hashim Javed
Student ID: BT-588221
Signature*

Name: M. Hashim Javed Roll: BT-588221

Q1. Perform the following tasks in SPSS.

a) Use the catalog.sav sample data file to fit a multiple linear regression model to predict "Sales of Men's
Clothing" on the basis of the variables "Number of Catalogs Mailed", "Number of Pages in
Catalog", "Number of Phone Lines Open for Ordering", "Amount Spent on Print
Advertising" and "Number of Customer Service Representatives".
Use the forward selection, backward elimination and enter methods. Interpret
your results in each case.
Variables Entered/Removed(a)

Model  Variables Entered  Variables Removed  Method
1      mail               .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
2      phone              .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
3      print              .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
4      page               .                  Forward (Criterion: Probability-of-F-to-enter <= .050)

a. Dependent Variable: men

Model Summary(e)

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .803(a)   .645      .642               3785.49685
2      .877(b)   .770      .766               3061.36064
3      .885(c)   .784      .778               2980.12178
4      .891(d)   .794      .787               2919.90929

a. Predictors: (Constant), mail
b. Predictors: (Constant), mail, phone
c. Predictors: (Constant), mail, phone, print
d. Predictors: (Constant), mail, phone, print, page
e. Dependent Variable: men

Conclusion:
Forward selection added one predictor at each of four steps: 'mail' first (adjusted R² = 0.642), then
'phone' (0.766), 'print' (0.778) and finally 'page' (0.787). Adjusted R² increases at every step, so the
final four-predictor model explains about 78.7% of the variation in sales of men's clothing after
adjusting for the number of predictors. The variable 'service' was never entered.
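SPSS computes the adjusted R² from R², the number of cases n and the number of predictors p as 1 - (1 - R²)(n - 1)/(n - p - 1). A minimal Python sketch that reproduces the Adjusted R Square column, assuming n = 120 (the ANOVA table has 119 total degrees of freedom):

```python
N_CASES = 120  # total df = 119 in the ANOVA table, so n = 120


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# (number of predictors, R Square) for forward-selection steps 1 to 4
steps = [(1, 0.645), (2, 0.770), (3, 0.784), (4, 0.794)]
for p, r2 in steps:
    print(f"step {p}: R2 = {r2:.3f}, adjusted R2 = {adjusted_r2(r2, N_CASES, p):.3f}")
```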


ANOVA(a)

Model               Sum of Squares    df    Mean Square      F         Sig.
1   Regression      3069712621.002      1   3069712621.002   214.216   .000(b)
    Residual        1690938397.841    118     14329986.422
    Total           4760651018.843    119
2   Regression      3664135333.036      2   1832067666.518   195.485   .000(c)
    Residual        1096515685.807    117      9371928.939
    Total           4760651018.843    119
3   Regression      3730440424.702      3   1243480141.567   140.014   .000(d)
    Residual        1030210594.141    116      8881125.812
    Total           4760651018.843    119
4   Regression      3780175938.309      4    945043984.577   110.844   .000(e)
    Residual         980475080.535    115      8525870.266
    Total           4760651018.843    119

a. Dependent Variable: men
b. Predictors: (Constant), mail
c. Predictors: (Constant), mail, phone
d. Predictors: (Constant), mail, phone, print
e. Predictors: (Constant), mail, phone, print, page

Conclusion:
The model was built in four steps, adding 'mail', 'phone', 'print' and 'page' in turn. At every step the
overall F test has Sig. = .000 (p < 0.001), which is below α = 0.05, so each fitted model explains a
significant part of the variation in sales of men's clothing. The overall F test assesses all predictors in
the model jointly; whether each added variable is individually significant is shown by its t test in the
Coefficients table (all of those p-values are below 0.05).
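The figures in the ANOVA table fit together as R² = SSR/SST and F = (SSR/p)/(SSE/df_residual). The Python sketch below checks this for step 2 and also computes the partial F for adding 'phone' at that step, that is, the extra regression sum of squares divided by the new residual mean square. For a single added variable this partial F equals the square of its t value in the Coefficients table (7.964² ≈ 63.4), which is why the t test rather than the overall F test judges each entered variable.

```python
# Sums of squares and residual df from the ANOVA table (dependent variable: men)
SST = 4760651018.843
SSR_1 = 3069712621.002   # regression SS, model 1 (mail)
SSR_2 = 3664135333.036   # regression SS, model 2 (mail, phone)
SSE_2 = 1096515685.807   # residual SS, model 2
DF_RES_2 = 117

r2_step2 = SSR_2 / SST                         # R Square of model 2
mse_step2 = SSE_2 / DF_RES_2                   # residual mean square of model 2
f_step2 = (SSR_2 / 2) / mse_step2              # overall F with 2 predictors
partial_f_phone = (SSR_2 - SSR_1) / mse_step2  # F-to-enter for 'phone' at step 2

print(f"R2 = {r2_step2:.3f}, F = {f_step2:.3f}, partial F (phone) = {partial_f_phone:.2f}")
```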

Coefficients(a)

                    Unstandardized Coefficients   Standardized Coefficients
Model               B             Std. Error      Beta                       t        Sig.
1   (Constant)      -14064.614    2099.365                                   -6.699   .000
    mail                 2.991        .204        .803                       14.636   .000
2   (Constant)      -15361.047    1705.559                                   -9.006   .000
    mail                 1.971        .209        .529                        9.424   .000
    phone              334.103      41.952        .447                        7.964   .000
3   (Constant)      -20665.869    2554.586                                   -8.090   .000
    mail                 1.862        .207        .500                        8.977   .000
    phone              339.159      40.880        .454                        8.296   .000
    print                 .218        .080        .121                        2.732   .007
4   (Constant)      -23898.558    2838.361                                   -8.420   .000
    mail                 1.847        .203        .496                        9.083   .000
    phone              327.802      40.329        .439                        8.128   .000
    print                 .208        .078        .115                        2.656   .009
    page                50.508      20.912        .104                        2.415   .017

a. Dependent Variable: men
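The step-4 coefficients give the fitted equation men = -23898.558 + 1.847 mail + 327.802 phone + 0.208 print + 50.508 page. Holding the other predictors fixed, each extra catalog mailed raises predicted sales by about 1.847, and each extra phone line by about 327.8. A small Python sketch of this equation; the input values in the example are hypothetical and are not taken from catalog.sav:

```python
# Step-4 (final forward-selection) coefficients from the Coefficients table
COEF = {"const": -23898.558, "mail": 1.847, "phone": 327.802, "print": 0.208, "page": 50.508}


def predict_men(mail: float, phone: float, print_spend: float, page: float) -> float:
    """Predicted sales of men's clothing from the fitted step-4 equation."""
    return (COEF["const"] + COEF["mail"] * mail + COEF["phone"] * phone
            + COEF["print"] * print_spend + COEF["page"] * page)


# Hypothetical inputs, for illustration only
base = predict_men(mail=8000, phone=30, print_spend=15000, page=50)
one_more_catalog = predict_men(mail=8001, phone=30, print_spend=15000, page=50)
# The change in prediction for one extra catalog equals the 'mail' coefficient
print(round(base, 2), round(one_more_catalog - base, 3))
```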

Excluded Variables(a)

                                                   Partial        Collinearity Statistics
Model          Beta In    t        Sig.   Correlation    Tolerance
1   page       .149(b)     2.773   .006    .248          .980
    phone      .447(b)     7.964   .000    .593          .625
    print      .104(b)     1.877   .063    .171          .957
    service    .153(b)     1.997   .048    .182          .501
2   page       .110(c)     2.496   .014    .226          .968
    print      .121(c)     2.732   .007    .246          .955
    service   -.064(c)     -.933   .353   -.086          .416
3   page       .104(d)     2.415   .017    .220          .965
    service   -.079(d)    -1.183   .239   -.110          .413
4   service   -.072(e)    -1.096   .275   -.102          .412

a. Dependent Variable: men
b. Predictors in the Model: (Constant), mail
c. Predictors in the Model: (Constant), mail, phone
d. Predictors in the Model: (Constant), mail, phone, print
e. Predictors in the Model: (Constant), mail, phone, print, page

Residuals Statistics(a)

                       Minimum       Maximum      Mean         Std. Deviation   N
Predicted Value        2636.2012     34103.5547   16242.8134   5636.14978       120
Residual               -8822.03613   9087.41895   .00000       2870.41572       120
Std. Predicted Value   -2.414        3.169        .000         1.000            120
Std. Residual          -3.021        3.112        .000         .983             120

a. Dependent Variable: men
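SPSS's Std. Residual is the residual divided by the standard error of the estimate of the final model (2919.90929). Dividing the two extreme residuals reproduces -3.021 and 3.112, so at least two cases lie slightly beyond ±3 standard errors and may be worth checking as possible outliers. A minimal Python check:

```python
S_EST = 2919.90929  # Std. Error of the Estimate, model 4
extremes = {"minimum": -8822.03613, "maximum": 9087.41895}  # from Residuals Statistics

for label, residual in extremes.items():
    z = residual / S_EST
    flag = "beyond +/-3" if abs(z) > 3 else ""
    print(f"{label}: standardized residual = {z:.3f} {flag}")
```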

b) Use bankloan.sav sample data file to fit a binary logistic regression model to predict
default on the basis of variables age, ed, income, debtinc, creddebt and othdebt.
Interpret your result.

Dependent Variable Encoding

Original Value   Internal Value
No               0
Yes              1

Block 0: Beginning Block

Classification Table(a,b)

                                      Predicted default   Percentage
Observed                              No       Yes        Correct
Step 0   default   No                 517      0          100.0
                   Yes                183      0             .0
         Overall Percentage                                73.9

a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation

                    B        S.E.   Wald      df   Sig.   Exp(B)
Step 0   Constant   -1.039   .086   145.782   1    .000   .354

Variables not in the Equation

                                 Score     df   Sig.
Step 0   Variables   age          13.265   1    .000
                     ed            9.205   1    .002
                     income        3.526   1    .060
                     debtinc     106.238   1    .000
                     creddebt     41.928   1    .000
                     othdebt      14.863   1    .000
         Overall Statistics      148.310   6    .000


Block 1: Method = Enter

Omnibus Tests of Model Coefficients

                 Chi-square   df   Sig.
Step 1   Step    153.662      6    .000
         Block   153.662      6    .000
         Model   153.662      6    .000

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      650.702(a)          .197                   .289

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
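Both pseudo R² values follow from the -2 log likelihoods. The model chi-square is the drop in -2LL, so the constant-only model has -2LL = 650.702 + 153.662 = 804.364. With n = 700 cases (517 + 183 in the classification table), Cox & Snell R² = 1 - exp(-chi-square/n), and Nagelkerke R² divides this by its maximum possible value, 1 - exp(-(-2LL of the constant-only model)/n). A minimal check in Python:

```python
import math

N = 700                    # 517 "No" + 183 "Yes"
NEG2LL_MODEL = 650.702     # -2 log likelihood, step 1
MODEL_CHI2 = 153.662       # omnibus model chi-square
NEG2LL_NULL = NEG2LL_MODEL + MODEL_CHI2  # -2LL of the constant-only model

cox_snell = 1 - math.exp(-MODEL_CHI2 / N)
max_cox_snell = 1 - math.exp(-NEG2LL_NULL / N)  # upper bound of Cox & Snell R^2
nagelkerke = cox_snell / max_cox_snell
print(f"Cox & Snell R2 = {cox_snell:.3f}, Nagelkerke R2 = {nagelkerke:.3f}")
```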

Classification Table(a)

                                      Predicted default   Percentage
Observed                              No       Yes        Correct
Step 1   default   No                 483      34         93.4
                   Yes                122      61         33.3
         Overall Percentage                               77.7

a. The cut value is .500
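The percentages in the classification table are simple ratios of its counts. The Python sketch below recomputes them and compares them with the Block 0 baseline, which predicts "No" for everyone; the Block 0 constant is likewise the log of the observed odds of default, ln(183/517). The model raises overall accuracy from 73.9% to 77.7%, but at the .5 cut value it identifies only a third of the customers who actually defaulted.

```python
import math

# Step 1 classification table, cut value .500
tn, fp = 483, 34    # observed No: predicted No, predicted Yes
fn, tp = 122, 61    # observed Yes: predicted No, predicted Yes
n = tn + fp + fn + tp

overall = (tn + tp) / n        # overall percentage correct
specificity = tn / (tn + fp)   # "No" cases classified correctly
sensitivity = tp / (tp + fn)   # "Yes" cases classified correctly
baseline = (tn + fp) / n       # Block 0: everyone predicted "No"
block0_constant = math.log((fn + tp) / (tn + fp))  # log odds of default

print(f"overall {overall:.1%}, specificity {specificity:.1%}, "
      f"sensitivity {sensitivity:.1%}, baseline {baseline:.1%}, constant {block0_constant:.3f}")
```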

Variables in the Equation

                     B       S.E.   Wald     df   Sig.   Exp(B)
Step 1(a)  age       -.047   .014   10.632   1    .001    .954
           ed         .392   .105   13.802   1    .000   1.480
           income    -.013   .007    3.077   1    .079    .987
           debtinc    .111   .027   16.986   1    .000   1.117
           creddebt   .341   .088   15.186   1    .000   1.407
           othdebt   -.069   .062    1.221   1    .269    .933
           Constant -1.198   .563    4.523   1    .033    .302

a. Variable(s) entered on step 1: age, ed, income, debtinc, creddebt, othdebt.
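Reading this table at α = 0.05, age, ed, debtinc and creddebt are significant predictors of default, whereas income (p = .079) and othdebt (p = .269) are not. Exp(B) is the odds ratio e^B: for example, each one-unit increase in debtinc multiplies the odds of default by about 1.117, and each additional year of age multiplies them by about 0.954 (a drop of about 4.6%), holding the other variables fixed. The Python sketch below recomputes the odds ratios from the rounded B values (the third decimal can differ slightly from SPSS, which uses unrounded estimates) and turns the fitted equation into a predicted probability; the applicant values are hypothetical.

```python
import math

# Step 1 coefficients (B) from the Variables in the Equation table
B = {"age": -0.047, "ed": 0.392, "income": -0.013,
     "debtinc": 0.111, "creddebt": 0.341, "othdebt": -0.069}
CONSTANT = -1.198

odds_ratios = {name: math.exp(b) for name, b in B.items()}


def prob_default(x: dict) -> float:
    """Predicted P(default = Yes) from the fitted logistic equation."""
    logit = CONSTANT + sum(B[name] * x[name] for name in B)
    return 1 / (1 + math.exp(-logit))


# Hypothetical applicant, for illustration only
applicant = {"age": 35, "ed": 2, "income": 40, "debtinc": 10, "creddebt": 1.5, "othdebt": 3}
p = prob_default(applicant)
print({k: round(v, 3) for k, v in odds_ratios.items()})
print(f"predicted probability of default = {p:.3f}")
```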

c) Find frequency distribution of "Preferred breakfast" for those senior citizens who
are also living an active life (Use sample data set cereal.sav)
Ans:

The first step is to select only the cases whose lifestyle is recorded as active (active = 1), and then to crosstabulate preferred breakfast by age category.

bfast * agecat Crosstabulation (active = 1)

Count
                          agecat
                          Under 31   31-45   46-60   Over 60   Total
bfast   Breakfast Bar     58         60      23      12        153
        Oatmeal            2         12      31      57        102
        Cereal            51         45      38      17        151
Total                     111        117     92      86        406

The frequency distribution for senior citizens who live an active life is given by the "Over 60" column:
Breakfast Bar 12, Oatmeal 57 and Cereal 17, a total of 86. Senior citizens with an active lifestyle most
often prefer Oatmeal (57 of 86, about 66%).
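The same frequency distribution can be expressed in percentages. A small Python sketch using the counts in the Over 60 column:

```python
# "Over 60" column of the crosstabulation (cases with active = 1)
over_60 = {"Breakfast Bar": 12, "Oatmeal": 57, "Cereal": 17}
total = sum(over_60.values())

for item, count in over_60.items():
    print(f"{item:<14} {count:>3} {100 * count / total:5.1f}%")
print(f"{'Total':<14} {total:>3} 100.0%")
```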


d) Carry out a test of independence of attributes "Preferred breakfast" and "marital
status". (Use cereal.sav)

Case Processing Summary

                  Cases
                  Valid            Missing         Total
                  N     Percent    N    Percent    N     Percent
bfast * marital   880   100.0%     0    0.0%       880   100.0%

bfast * marital Crosstabulation

Count
                          marital
                          Unmarried   Married   Total
bfast   Breakfast Bar     108         123       231
        Oatmeal            95         215       310
        Cereal            100         239       339
Total                     303         577       880

Chi-Square Tests

                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             21.157(a)   2    .000
Likelihood Ratio               20.623      2    .000
Linear-by-Linear Association   16.226      1    .000
N of Valid Cases               880

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 79.54.

Conclusion:
As the p-value (0.000, i.e. p < 0.001) is less than the chosen significance level (α = 0.05), we reject the
null hypothesis of independence and conclude that there is a significant association between Preferred
Breakfast and Marital Status.
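The Pearson statistic compares each observed count O with the count expected under independence, E = (row total × column total)/N, and sums (O - E)²/E over all cells. A minimal Python sketch that reproduces the value 21.157 and the minimum expected count 79.54; for 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-chi-square/2), so the p-value can be computed without a statistics library:

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic, df and smallest expected count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2, min_expected = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected


# Rows: Breakfast Bar, Oatmeal, Cereal; columns: Unmarried, Married
bfast_by_marital = [[108, 123], [95, 215], [100, 239]]
chi2, df, min_e = pearson_chi_square(bfast_by_marital)
p_value = math.exp(-chi2 / 2)  # exact upper-tail probability, valid only because df == 2
print(f"chi-square = {chi2:.3f}, df = {df}, min expected = {min_e:.2f}, p = {p_value:.2e}")
```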


e) Carry out a test of independence of attributes "Preferred breakfast" and "Age
category". (Use cereal.sav)

Case Processing Summary

                 Cases
                 Valid            Missing         Total
                 N     Percent    N    Percent    N     Percent
bfast * agecat   880   100.0%     0    0.0%       880   100.0%

bfast * agecat Crosstabulation

Count
                          agecat
                          Under 31   31-45   46-60   Over 60   Total
bfast   Breakfast Bar     84         90      39      18        231
        Oatmeal            4         24      97      185       310
        Cereal            93         92      95      59        339
Total                     181        206     231     262       880

Chi-Square Tests

                               Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square             309.336(a)   6    .000
Likelihood Ratio               350.688      6    .000
Linear-by-Linear Association   4.986        1    .026
N of Valid Cases               880

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 47.51.
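The part (e) output has no written conclusion. The Pearson chi-square is 309.336 with 6 degrees of freedom and Sig. = .000 (p < 0.001), so at α = 0.05 the null hypothesis of independence is rejected: preferred breakfast is associated with age category. To see which cells drive the association, each cell's contribution (O - E)²/E can be listed; the Python sketch below does this from the crosstabulation counts. The largest contributions come from Oatmeal in the Over 60 group (185 observed against about 92.3 expected) and Oatmeal in the Under 31 group (4 observed against about 63.8 expected).

```python
rows = ["Breakfast Bar", "Oatmeal", "Cereal"]
cols = ["Under 31", "31-45", "46-60", "Over 60"]
counts = [[84, 90, 39, 18], [4, 24, 97, 185], [93, 92, 95, 59]]  # bfast * agecat

row_totals = [sum(r) for r in counts]
col_totals = [sum(c) for c in zip(*counts)]
n = sum(row_totals)

# (contribution, row label, column label, observed, expected) for every cell
cells = []
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        cells.append(((observed - expected) ** 2 / expected, rows[i], cols[j], observed, expected))

chi2 = sum(cell[0] for cell in cells)
print(f"Pearson chi-square = {chi2:.3f}")
for contrib, r, c, observed, expected in sorted(cells, reverse=True)[:3]:
    print(f"{r} / {c}: observed {observed}, expected {expected:.1f}, contribution {contrib:.1f}")
```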


f) Use the grocery_coupons.sav sample data file to test that the mean amount spent is
equal to 105. Find a 90% confidence interval for the mean amount spent. Also test
that the mean amount spent by male and female customers is equal. What
would you say about the equality of means for amount spent at stores of different
sizes?
Ans:
Part 1: Use grocery_coupons.sav sample data file to test that the mean of amount spent is
equal to 105. Find 90% confidence interval for the mean of amount spent.

T-Test

One-Sample Statistics

           N      Mean      Std. Deviation   Std. Error Mean
amtspent   1404   99.9338   48.54435         1.29555

One-Sample Test

Test Value = 105
                                                                 90% Confidence Interval of the Difference
           t        df     Sig. (2-tailed)   Mean Difference     Lower      Upper
amtspent   -3.910   1403   .000              -5.06621            -7.1986    -2.9338

Conclusion:
As the p-value (0.000, i.e. p < 0.001) is less than the chosen significance level (α = 0.1), we conclude
that the mean Amount Spent is significantly different from 105. The 90% confidence interval for the
difference (mean - 105) is [-7.1986, -2.9338], so the 90% confidence interval for the mean Amount
Spent itself is [97.8014, 102.0662], which does not contain 105.
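The one-sample t statistic is (mean - 105)/SE with SE = s/√n, and the interval for the mean is the interval for the difference shifted by 105. A Python sketch from the summary statistics; because the standard library has no t distribution, it uses the normal 95th percentile (about 1.645) in place of the t quantile for 1403 df (about 1.646), so the limits agree with SPSS to about two decimals:

```python
import math
from statistics import NormalDist

n, mean_spent, sd = 1404, 99.9338, 48.54435   # One-Sample Statistics for amtspent
MU0 = 105                                      # test value

se = sd / math.sqrt(n)
t = (mean_spent - MU0) / se
# Normal approximation to the t quantile; adequate here because df = 1403 is large
q = NormalDist().inv_cdf(0.95)
ci_mean = (mean_spent - q * se, mean_spent + q * se)   # 90% CI for the mean
ci_diff = (ci_mean[0] - MU0, ci_mean[1] - MU0)          # 90% CI for mean - 105
print(f"t = {t:.3f}, 90% CI for mean = ({ci_mean[0]:.2f}, {ci_mean[1]:.2f}), "
      f"for difference = ({ci_diff[0]:.2f}, {ci_diff[1]:.2f})")
```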

Part 2: Also test that the mean of amount spent by both male and female customers is
equal.
T-TEST GROUPS=gender(0 1)
/MISSING=ANALYSIS
/VARIABLES=amtspent
/CRITERIA=CI(.95).
T-Test

Group Statistics

           gender   N     Mean       Std. Deviation   Std. Error Mean
amtspent   Male     740   107.5761   49.09908         1.80492
           Female   664    91.4168   46.49620         1.80440

Independent Samples Test

                                       Levene's Test for
                                       Equality of Variances   t-test for Equality of Means
                                                                                  Sig.         Mean         Std. Error   95% CI of the Difference
                                       F      Sig.             t       df         (2-tailed)   Difference   Difference   Lower      Upper
amtspent  Equal variances assumed      .458   .499             6.313   1402       .000         16.15930     2.55971      11.13803   21.18058
          Equal variances not assumed                          6.332   1397.923   .000         16.15930     2.55218      11.15280   21.16581

Conclusion:
We use the top row (Equal variances assumed) because the p-value of Levene's test (0.499) is above 0.05.
As the p-value of the t test (0.000, i.e. p < 0.001) is less than the chosen significance level (α = 0.05),
we conclude that the mean Amount Spent differs significantly between male and female customers. Male
customers spend 16.159 more on average, and the 95% confidence interval for this difference in means
(male - female) is [11.138, 21.181].
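With equal variances assumed, the two sample variances are pooled as s_p² = ((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2), and the standard error of the difference is s_p·√(1/n1 + 1/n2). A short Python check from the group statistics:

```python
import math

# Group statistics for amtspent
n1, mean1, sd1 = 740, 107.5761, 49.09908   # male
n2, mean2, sd2 = 664, 91.4168, 46.49620    # female

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (mean1 - mean2) / se_diff
print(f"difference = {mean1 - mean2:.4f}, SE = {se_diff:.5f}, t = {t:.3f}, df = {df}")
```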

Part 3: What would to say about the equality of means for amount spent on stores of
different sizes?

ONEWAY amtspent BY size
  /MISSING ANALYSIS.

Oneway

ANOVA
amtspent
                 Sum of Squares   df     Mean Square   F       Sig.
Between Groups   20053.727        2      10026.864     4.275   .014
Within Groups    3286191.636      1401   2345.604
Total            3306245.363      1403

Conclusion:
As the p-value (0.014) is less than the chosen significance level (α = 0.05), we conclude that the mean
Amount Spent is not equal across the three store sizes: at least one store size has a mean that differs
significantly from the others.
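The F ratio is the between-groups mean square divided by the within-groups mean square. Because there are only two numerator degrees of freedom (three store sizes), its p-value has a closed form, P(F > f) = (1 + 2f/df2)^(-df2/2), so the Sig. value can be checked with plain Python:

```python
# One-way ANOVA of amtspent by store size (values from the ANOVA table)
SS_BETWEEN, DF_BETWEEN = 20053.727, 2
SS_WITHIN, DF_WITHIN = 3286191.636, 1401

f_stat = (SS_BETWEEN / DF_BETWEEN) / (SS_WITHIN / DF_WITHIN)
# Exact upper tail of an F distribution with 2 numerator df: (1 + 2f/df2)^(-df2/2)
p_value = (1 + 2 * f_stat / DF_WITHIN) ** (-DF_WITHIN / 2)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```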


Q3 is a group assignment (Group size not more than 5 students).
a) Create a Questionnaire having 10-15 questions to collect data from 30 respondents.
Topic may be of your choice. Submit the soft copy of questionnaire through class
representative not later than 20th October 2019.
b) Enter your data in SPSS. Naming of variables and data types should be appropriate.
c) Carry out data analysis of this data set.
d) Submit report of the statistical analysis of this data in MS Word.

WHODAS 2.0
WORLD HEALTH ORGANIZATION DISABILITY ASSESSMENT
SCHEDULE 2.0

12-item version, self-administered

This questionnaire asks about difficulties due to health conditions. Health conditions include diseases or illnesses, other
health problems that may be short or long lasting, injuries, mental or emotional problems, and problems with alcohol or
drugs.

Think back over the past 30 days and answer these questions, thinking about how much difficulty you had doing the
following activities. For each question, please circle only one response.
In the past 30 days, how much difficulty did you have in:

Response options for each of S1 to S12: None / Mild / Moderate / Severe / Extreme or cannot do

S1    Standing for long periods such as 30 minutes?
S2    Taking care of your household responsibilities?
S3    Learning a new task, for example, learning how to get to a new place?
S4    How much of a problem did you have joining in community activities (for example, festivities,
      religious or other activities) in the same way as anyone else can?
S5    How much have you been emotionally affected by your health problems?
S6    Concentrating on doing something for ten minutes?
S7    Walking a long distance such as a kilometre [or equivalent]?
S8    Washing your whole body?
S9    Getting dressed?
S10   Dealing with people you do not know?
S11   Maintaining a friendship?
S12   Your day-to-day work?

H1    Overall, in the past 30 days, how many days were these difficulties present?
      (Record number of days)
H2    In the past 30 days, for how many days were you totally unable to carry out your usual activities
      or work because of any health condition? (Record number of days)
H3    In the past 30 days, not counting the days that you were totally unable, for how many days did you
      cut back or reduce your usual activities or work because of any health condition?
      (Record number of days)
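The report does not state how overal_score was built from the responses. A common choice for the 12-item WHODAS 2.0 is simple sum scoring, with None = 1, Mild = 2, Moderate = 3, Severe = 4 and Extreme or cannot do = 5, giving totals from 12 to 60 (the observed range, 27 to 50, fits inside this). A minimal Python sketch under that assumed coding; the example respondent is hypothetical:

```python
# Assumed coding for simple sum scoring of items S1-S12 (not stated in the report)
CODES = {"None": 1, "Mild": 2, "Moderate": 3, "Severe": 4, "Extreme or cannot do": 5}


def simple_score(responses: list) -> int:
    """Sum of the coded answers to S1-S12."""
    if len(responses) != 12:
        raise ValueError("expected exactly 12 answers (S1-S12)")
    return sum(CODES[answer] for answer in responses)


# Hypothetical respondent
answers = ["Mild", "None", "Moderate", "Mild", "Severe", "Mild",
           "Moderate", "None", "None", "Mild", "Mild", "Moderate"]
print(simple_score(answers))
```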

Descriptive Statistics

                     N    Minimum   Maximum   Mean    Std. Deviation
overal_score         30   27        50        35.47   5.776
Valid N (listwise)   30

Independent Samples Test

                                           Levene's Test for
                                           Equality of Variances   t-test for Equality of Means
                                                                                     Sig.         Mean         Std. Error   95% CI of the Difference
                                           F      Sig.             t      df         (2-tailed)   Difference   Difference   Lower     Upper
overal_score  Equal variances assumed      .050   .825             .052   28         .959         .125         2.427        -4.846    5.096
              Equal variances not assumed                          .054   13.517     .958         .125         2.328        -4.884    5.134

ANOVA
overal_score
                 Sum of Squares   df   Mean Square   F      Sig.
Between Groups   367.133          15   24.476        .571   .853
Within Groups    600.333          14   42.881
Total            967.467          29


Q4. Perform the following tasks in Minitab.
a) In order to ascertain the age distribution of operatives in a certain industry, random
samples of 1720 males and 1230 females are drawn. The sample means and standard
deviations were 33.93 years and 14.20 years for the males and 27.44 years and 10.79
years for the females. Calculate the 95 percent confidence interval for
i. The mean age of all the male operatives.
i – Ans:

Variable  N     Mean    StDev   SE Mean  95% CI
C1        1720  34.173  14.047  0.339    (33.509, 34.837)

ii. The difference between their mean ages.

ii – Ans:
Estimate for difference: 6.204
95% CI for difference: (5.305, 7.104)
T-Test of difference = 0 (vs not =): T-Value = 13.52 P-Value = 0.000 DF = 2930
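The Minitab output above reports a male mean of 34.173 and SD of 14.047 rather than the stated 33.93 and 14.20, which suggests it was computed from generated samples rather than from the summary statistics given in the question. The intervals can be computed directly from the given summaries as mean ± z·SE, with SE = s/√n for one mean and √(s1²/n1 + s2²/n2) for the difference. A Python sketch that uses the normal critical value 1.96, an assumption that is reasonable for samples this large; it gives roughly (33.26, 34.60) for the male mean and (5.59, 7.39) for the difference of 6.49 years:

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96; used because both samples are large

n_m, mean_m, sd_m = 1720, 33.93, 14.20   # males
n_f, mean_f, sd_f = 1230, 27.44, 10.79   # females

# (i) 95% CI for the mean age of male operatives
se_m = sd_m / math.sqrt(n_m)
ci_male = (mean_m - Z * se_m, mean_m + Z * se_m)

# (ii) 95% CI for the difference between the mean ages (males - females)
diff = mean_m - mean_f
se_diff = math.sqrt(sd_m ** 2 / n_m + sd_f ** 2 / n_f)
ci_diff = (diff - Z * se_diff, diff + Z * se_diff)

print(f"male mean: ({ci_male[0]:.3f}, {ci_male[1]:.3f})")
print(f"difference {diff:.2f}: ({ci_diff[0]:.3f}, {ci_diff[1]:.3f})")
```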

b) A psychology class performed an experiment to compare whether a recall score in
which instructions to form images of 25 words were given is better than an initial
recall score for which no imagery instructions were given. Twelve students
participated in the experiment with the following results:

With Imagery      20  24  20  18  22  19  20  19  17  21  17  20
Without Imagery    5   9   5   9   6  11   8  11   7   9   8  16

Does it appear that the average recall score is higher when imagery is used? Also
construct a 95% confidence interval for the difference between the mean scores with and
without imagery and interpret the results.
Ans:
Paired T-Test and CI: With Imagery, Without Imagery

Paired T for With Imagery - Without Imagery

                  N   Mean     StDev   SE Mean
With Imagery     12   19.750   2.006   0.579
Without Imagery  12    8.667   3.055   0.882
Difference       12   11.08    3.70    1.07

95% CI for mean difference: (8.73, 13.44)
T-Test of mean difference = 0 (vs not = 0): T-Value = 10.37 P-Value = 0.000

Conclusion:
The mean recall score is 19.75 with imagery and 8.667 without imagery, so the score appears clearly
higher when imagery is used. This is confirmed statistically: the p-value (0.000, i.e. p < 0.001) of the
paired t-test of the mean difference is less than the chosen significance level (α = 0.05), so the mean
difference between Recall Score with Imagery and Recall Score without Imagery is significant. The 95%
confidence interval for the mean difference in recall score is [8.73, 13.44]; because the whole interval
lies above zero, recall is higher with imagery by roughly 9 to 13 points on average.
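The paired t-test works on the twelve within-student differences: t = d̄/(s_d/√n) with n - 1 = 11 df. A Python sketch that reproduces the Minitab (and R) results; the critical value 2.201 is taken from a t table because the standard library has no t distribution. Since the question asks whether recall is higher with imagery, a one-sided test is arguably the natural choice; its p-value is half the two-sided one, so the conclusion does not change.

```python
import math
from statistics import mean, stdev

with_imagery = [20, 24, 20, 18, 22, 19, 20, 19, 17, 21, 17, 20]
without_imagery = [5, 9, 5, 9, 6, 11, 8, 11, 7, 9, 8, 16]

# Paired design: analyse the per-student differences
d = [w - wo for w, wo in zip(with_imagery, without_imagery)]
n = len(d)
d_bar = mean(d)
se = stdev(d) / math.sqrt(n)
t = d_bar / se

T_CRIT = 2.201  # t(0.975, df = 11) from a t table
ci = (d_bar - T_CRIT * se, d_bar + T_CRIT * se)
print(f"mean difference = {d_bar:.3f}, t = {t:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```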

c) Generate 4 samples of sizes 5, 6, 7 and 7 from normal populations with means 45, 40, 47
and 38 respectively. While the standard deviations of these distributions are 4, 6, 7 and 8
respectively. Test the equality of means.
Ans:
One-way ANOVA: Population versus Factor

Source DF SS MS F P
Factor 3 595.3 198.4 4.48 0.014
Error 21 929.3 44.3
Total 24 1524.6

S = 6.652 R-Sq = 39.05% R-Sq(adj) = 30.34%

Individual 95% CIs For Mean Based on

Pooled StDev
Level N Mean StDev ---------+---------+---------+---------+
1 5 47.891 3.939 (---------*---------)
2 6 36.004 6.483 (--------*--------)
3 7 47.847 9.382 (--------*-------)
4 7 41.373 4.636 (--------*--------)
---------+---------+---------+---------+
36.0 42.0 48.0 54.0

Pooled StDev = 6.652

Conclusion:
The F statistic is 4.48 and, as the p-value (0.014) is less than the chosen significance level
(α = 0.05), we conclude that not all of the means are equal: at least one population mean is
significantly different.


Q5.
a) Explain the procedure for testing of equality of several means in Minitab and SPSS.
b) Use Minitab/SPSS to test equality of means for the following experiment of wheat yield for
different varieties. Varieties are shown by A, B, C, D and E.

A (8)     B (5.3)   C (4.1)   D (5)     E (16)
D (6.8)   A (4.9)   B (4.1)   C (3.2)   E (18)
B (6.3)   E (16)    C (4.7)   D (4.0)   A (5.0)
C (5.7)   D (3.3)   E (25)    A (4.0)   B (4.2)
E (18)    C (4.7)   A (4.2)   D (6.6)   B (6.2)
Ans:
One-way ANOVA: resp versus fac

Source DF SS MS F P
fac 4 740.14 185.03 44.59 0.000
Error 20 83.00 4.15
Total 24 823.13

S = 2.037 R-Sq = 89.92% R-Sq(adj) = 87.90%

Individual 95% CIs For Mean Based on

Pooled StDev
Level N Mean StDev -----+---------+---------+---------+----
1 5 5.220 1.613 (--*---)
2 5 5.220 1.052 (--*---)
3 5 4.480 0.918 (---*---)
4 5 5.140 1.549 (---*---)
5 5 18.600 3.715 (---*---)
-----+---------+---------+---------+----
5.0 10.0 15.0 20.0

Pooled StDev = 2.037

Conclusion:
The F statistic is 44.59 and, as the p-value (0.000, i.e. p < 0.001) is less than the chosen significance
level (α = 0.05), we conclude that not all of the variety means are equal: at least one variety mean is
significantly different.
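One-way ANOVA splits the total variation into a between-varieties part, Σ nᵢ(ȳᵢ - ȳ)², and a within-varieties part, ΣΣ(y - ȳᵢ)², and compares their mean squares with F. The Python sketch below applies this to the yields read from the layout. It treats the data as a one-way classification by variety, as the Minitab output does (levels 1 to 5 have the means of varieties A to E), and reproduces SS = 740.14 and 83.00 and F = 44.59.

```python
from statistics import mean

yields = {
    "A": [8, 4.9, 5.0, 4.0, 4.2],
    "B": [5.3, 4.1, 6.3, 4.2, 6.2],
    "C": [4.1, 3.2, 4.7, 5.7, 4.7],
    "D": [5, 6.8, 4.0, 3.3, 6.6],
    "E": [16, 18, 16, 25, 18],
}


def one_way_anova(groups):
    """Return SS between, SS within, their df and the F statistic."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


ssb, ssw, df1, df2, f = one_way_anova(list(yields.values()))
print(f"SS between = {ssb:.2f}, SS within = {ssw:.2f}, F({df1}, {df2}) = {f:.2f}")
```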


Q6.
a) Consider the experiment in which two fair dice are tossed and the absolute
difference of dots is recorded. Simulate this experiment 600 times using Minitab.
Find the frequency distribution of the absolute differences and find the mean and
variance of this distribution.
Ans:

Commands used to generate 600 simulated values of the absolute difference:

MTB > Random 600 'abs_dif';
SUBC> Integer 0 5.
MTB >

Tabulated statistics: abs_diff

Rows: abs_diff

Count % of Total

0 104 17.33
1 99 16.50
2 110 18.33
3 89 14.83
4 90 15.00
5 108 18.00
All 600 100.00

Descriptive Statistics: abs_diff

           Total
Variable   Count   Mean     StDev    Variance
abs_diff   600     2.4767   1.7333   3.0045

Frequency Distribution Chart

[Bar chart "Chart of abs_diff": Count on the vertical axis against abs_diff (0 to 5) on the horizontal axis]
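The Minitab commands above (Random 600 'abs_dif'; Integer 0 5.) draw the difference directly as a whole number from 0 to 5 with equal probability 1/6, which is why the six frequencies are roughly equal and the mean is close to 2.5. For two fair dice the absolute difference is not uniform: counting the 36 equally likely outcomes gives P(0) = 6/36, P(1) = 10/36, P(2) = 8/36, P(3) = 6/36, P(4) = 4/36 and P(5) = 2/36, with mean 35/18 ≈ 1.944 and variance 665/324 ≈ 2.053. A Python sketch that simulates the experiment as stated, by rolling two dice 600 times, and compares the result with these exact values (in Minitab the same idea would mean generating two columns of integers from 1 to 6 and taking the absolute value of their difference):

```python
import random
from collections import Counter
from fractions import Fraction

# Exact distribution of |die1 - die2| over the 36 equally likely outcomes
counts = Counter(abs(a - b) for a in range(1, 7) for b in range(1, 7))
probs = {k: Fraction(c, 36) for k, c in sorted(counts.items())}
exact_mean = sum(k * p for k, p in probs.items())
exact_var = sum((k - exact_mean) ** 2 * p for k, p in probs.items())

# Simulate the experiment 600 times by rolling two dice each time
random.seed(2019)  # arbitrary seed so the run is repeatable
sims = [abs(random.randint(1, 6) - random.randint(1, 6)) for _ in range(600)]
freq = Counter(sims)
sim_mean = sum(sims) / len(sims)
sim_var = sum((x - sim_mean) ** 2 for x in sims) / (len(sims) - 1)  # sample variance

for k in range(6):
    print(f"{k}: simulated {freq[k]:>3}, expected {float(probs[k] * 600):6.1f}")
print(f"exact mean {float(exact_mean):.4f}, exact variance {float(exact_var):.4f}")
print(f"simulated mean {sim_mean:.4f}, simulated variance {sim_var:.4f}")
```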


b) Compare the statistical packages SPSS and Minitab with respect to statistical data
analysis in social sciences and physical sciences.
Ans:

Comparison of Minitab and IBM SPSS

1. Easy to learn and easy to use. SPSS is menu-driven and very easy to use. As in
Minitab, most of the functionality in SPSS is organized into pull-down menus in an
intuitive way. The learning curves for SPSS and Minitab are similar.

2. SPSS is generally stronger in statistical analysis, especially in some specific areas such as
ANOVA-related procedures. The add-on modules give SPSS further flexibility and
potential to extend its capabilities, and for more advanced statistical analysis SPSS is
stronger than Minitab. SPSS is therefore most suitable if your work involves large datasets,
frequent data management, and intermediate to moderately advanced statistical analysis.

3. SPSS Statistics includes a wide range of analytic techniques and time-saving features that
help you find new insights in your data quickly, make more accurate predictions and achieve
better outcomes for your organization.

4. SPSS Statistics output can be viewed interactively on smart devices (smartphones and tablets),
and presentation-ready output can be generated quickly and easily.

5. Enhanced Monte Carlo simulation to improve model accuracy, with:

a. Ability to fit a categorical distribution to string fields
b. Support for Automatic Linear Modeling (ALM)
c. Generate heat maps automatically when displaying scatterplots in which the
target or the input, or both, are categorical
d. Automatically determine and use associations between categorical inputs when
generating data for those inputs
e. Generating data in the absence of a predictive model

6. SPSS Advanced Statistics offers generalized linear mixed models (GLMM), general linear
models (GLM), mixed models procedures, generalized linear models (GENLIN) and
generalized estimating equations (GEE) procedures.


Comparison of Minitab and IBM SPSS in ANOVA

Product One-way Two-way MANOVA GLM Mixed model Post-hoc Latin squares

Minitab Yes Yes Yes Yes No Yes Yes

SPSS Yes Yes Yes Yes Yes Yes Yes

Comparison of Minitab and IBM SPSS in Regression

Product  OLS  WLS  2SLS  NLLS  Logistic  GLM  LAD  Stepwise  Quantile  Probit  Cox  Poisson  MLR

Minitab  Yes  Yes  No    Yes   Yes       No   No   Yes       No        No      No   No       No

SPSS     Yes  Yes  Yes   Yes   Yes       Yes  Yes  Yes       Yes       Yes     Yes  Yes      Yes

OLS: Ordinary Least Squares             WLS: Weighted Least Squares
2SLS: Two-Stage Least Squares           NLLS: Non-Linear Least Squares
LAD: Least Absolute Deviation           GLM: Generalized Linear Models
MLR: Multiple Linear Regression

Comparison of Minitab and IBM SPSS for Operation System Support

Product Windows Mac OS Linux BSD Unix

Minitab Yes No No No No

SPSS Yes Yes Yes No No


Comparison of Minitab and IBM SPSS for Charts and Diagrams

Chart Bar chart Box plot Correlogram Histogram Line chart Scatterplot

Minitab Yes Yes Yes Yes Yes Yes

SPSS Yes Yes Yes Yes Yes Yes

Column groups: descriptive statistics, nonparametric statistics, quality control, survival analysis, data processing

Product  Base   Normality  CTA  Nonparametric       Quality  Survival  Cluster   Discriminant  BDP  Ext.
         stat   test            comparison, ANOVA   control  analysis  analysis  analysis

Minitab  Yes    Yes        Yes  Yes                 Yes      Yes       Yes       Yes           Yes  Yes

SPSS     Yes    Yes        Yes  Yes                 Yes      Yes       Yes       Yes           Yes  Yes


Q7.
a) Perform regression analysis to predict trade on the basis of other two variables on sample
dataset Employ.MTW. Also use matrix approach to do the same task. Furthermore,
calculate predicted values.

Ans:
Regression Analysis: Trade versus Food, Metals

The regression equation is

Trade = 67.1 + 0.225 Food + 5.90 Metals

Predictor Coef SE Coef T P

Constant 67.05 22.03 3.04 0.004
Food 0.2255 0.2336 0.97 0.339
Metals 5.9001 0.5034 11.72 0.000

S = 11.2084 R-Sq = 74.8% R-Sq(adj) = 74.0%

Analysis of Variance

Source DF SS MS F P
Regression 2 21302 10651 84.78 0.000
Residual Error 57 7161 126
Total 59 28463

Source DF Seq SS
Food 1 4046
Metals 1 17257

Unusual Observations

Obs Food Trade Fit SE Fit Residual St Resid

2 53.0 317.00 340.38 1.87 -23.38 -2.12R
60 57.7 396.00 363.86 1.95 32.14 2.91R

R denotes an observation with a large standardized residual.


Residuals vs Fits for Trade

[Figure "Versus Fits (response is Trade)": Residual on the vertical axis against Fitted Value (about 310 to 390) on the horizontal axis]

Conclusion: R² is 74.8% (adjusted R² = 74.0%), so Food and Metals together explain about 74 to 75
percent of the variation in Trade, which indicates a reasonably good fit. Individually, however, only
Metals is a significant predictor (p = 0.000); the coefficient of Food is not significant (p = 0.339).
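The macro output agrees with the regression command because both compute b = (X'X)⁻¹X'y, where X holds a column of ones plus the Food and Metals columns; the predicted values asked for in the question are then ŷ = Xb. The Employ.MTW values are not reproduced in this report, so the Python sketch below uses a small hypothetical data set constructed to satisfy Trade = 67 + 0.2 Food + 6 Metals exactly, which lets the code be checked by recovering those coefficients. It solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination rather than forming the inverse explicitly, which gives the same b with better numerical behaviour.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, rhs):
    """Solve a @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]   # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, *predictors):
    """Coefficients b solving (X'X)b = X'y, and fitted values Xb, with an intercept column."""
    X = [[1.0, *row] for row in zip(*predictors)]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(col, y)) for col in Xt]
    b = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    return b, fitted


# Hypothetical data (NOT Employ.MTW), constructed so that Trade = 67 + 0.2*Food + 6*Metals
food = [53.0, 55.5, 57.7, 54.2, 56.1, 52.8]
metals = [44.1, 45.0, 47.3, 43.8, 46.2, 44.9]
trade = [67 + 0.2 * f + 6 * m for f, m in zip(food, metals)]

b, fitted = ols(trade, food, metals)
print("coefficients:", [round(v, 4) for v in b])
print("predicted values:", [round(v, 2) for v in fitted])
```

Applied to the actual Trade, Food and Metals columns, ols(trade, food, metals) should return coefficients close to 67.0512, 0.2255 and 5.9001.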

Using Matrix Approach for Regression

MTB > %"E:\AIOU\M.Sc\3rd Semester\1569\macros\regMATRIX.mac" 'Trade' 'Food' 'Metals'
Executing from file: E:\AIOU\M.Sc\3rd Semester\1569\macros\regMATRIX.mac

This macro is to find coefficients of a regression problem with

two independent variables. Dependent variable is stored in C1,
while independent variables are stored in C2 and C3

coefficients

Matrix m7

67.0512
0.2255
5.9001
coefficients by regr command

f
67.0512 0.2255 5.9001


b) Discuss the normality tests available in Minitab.
Ans:

Types of normality tests

The following are types of normality tests that you can use to assess normality.

Anderson-Darling test
This test compares the ECDF (empirical cumulative distribution function) of your sample
data with the distribution expected if the data were normal, giving extra weight to the tails. If the
observed difference is sufficiently large, you reject the null hypothesis of population normality.

Ryan-Joiner normality test

This test assesses normality by calculating the correlation between your data and the normal
scores of your data. If the correlation coefficient is near 1, the data are consistent with a normal
population. The Ryan-Joiner statistic measures the strength of this correlation; if it is less than the
appropriate critical value, you reject the null hypothesis of population normality. This test is similar
to the Shapiro-Wilk normality test.

Kolmogorov-Smirnov normality test

This test compares the ECDF (empirical cumulative distribution function) of your sample
data with the distribution expected if the data were normal. If the observed difference (the largest
vertical distance between the two) is sufficiently large, the test rejects the null hypothesis of population
normality. If the p-value of this test is less than your chosen α, you can reject the null hypothesis and
conclude that the population is nonnormal.

Comparison of Anderson-Darling, Kolmogorov-Smirnov, and Ryan-Joiner normality tests
Anderson-Darling and Kolmogorov-Smirnov tests are based on the empirical distribution
function. Ryan-Joiner (similar to Shapiro-Wilk) is based on regression and correlation.

All three tests tend to work well in identifying a distribution as not normal when the
distribution is skewed. All three tests are less able to detect nonnormality when the underlying
distribution is a t-distribution and the nonnormality is due to kurtosis. Of the two tests based on the
empirical distribution function, Anderson-Darling is usually more effective at detecting departures in
the tails of the distribution. If departure from normality in the tails is the main concern, many
statisticians would use Anderson-Darling as the first choice.

NOTE: If you are checking normality to prepare for a normal capability analysis, the tails are the
most critical part of the distribution.
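Both EDF statistics can be written directly in terms of the normal CDF F evaluated at the ordered data x(1) <= x(2) <= ... <= x(n): the Kolmogorov-Smirnov D is the largest vertical gap between the ECDF and F, and the Anderson-Darling A² = -n - (1/n) Σ (2i - 1)[ln F(x(i)) + ln(1 - F(x(n+1-i)))], whose log terms weight the tails heavily. A minimal Python sketch that computes both statistics with the mean and standard deviation estimated from the sample. It does not compute p-values, which require tables or approximations (for example Lilliefors critical values when the parameters are estimated), and Minitab may apply further adjustments:

```python
import math
from statistics import NormalDist, mean, stdev


def ks_and_ad(data):
    """Return (KS D, Anderson-Darling A^2) against a normal with estimated mean and sd."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    F = [fitted.cdf(v) for v in x]
    # KS: largest gap between the step ECDF and the fitted normal CDF
    d = max(max((i + 1) / n - F[i], F[i] - i / n) for i in range(n))
    # AD: i runs 0..n-1 here, so the weight (2i - 1) of the 1-based formula becomes 2i + 1
    s = sum((2 * i + 1) * (math.log(F[i]) + math.log(1 - F[n - 1 - i])) for i in range(n))
    a2 = -n - s / n
    return d, a2


recall_with_imagery = [20, 24, 20, 18, 22, 19, 20, 19, 17, 21, 17, 20]  # data from Q4 b)
print(ks_and_ad(recall_with_imagery))
```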


Q8. Perform the following tasks in R.
a) A psychology class performed an experiment to compare whether a recall score in which
instructions to form images of 25 words were given is better than an initial recall score for
which no imagery instructions were given. Twelve students participated in the experiment
with the following results:

With Imagery      20  24  20  18  22  19  20  19  17  21  17  20
Without Imagery    5   9   5   9   6  11   8  11   7   9   8  16

Does it appear that the average recall score is higher when imagery is used? Also
construct a 95% confidence interval for the difference between the mean scores with and without
imagery and interpret the results.

Data Input
> withImagery=c(20, 24, 20, 18, 22, 19, 20, 19, 17, 21, 17, 20)
> withoutimg=c(5,9,5,9,6,11,8,11,7,9,8,16)

T test Command
> t.test(withImagery, withoutimg, paired = TRUE, alternative = "two.sided")

Paired t-test

data: withImagery and withoutimg

t = 10.365, df = 11, p-value = 5.159e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
8.729917 13.436750
sample estimates:
mean of the differences
11.08333


b) Consider the experiment in which two fair dice are tossed and the absolute difference of
dots is recorded. Simulate this experiment 600 times. Find the frequency distribution of
the absolute differences and find mean and variance of this distribution.

Frequency Distribution
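# Simulate two fair dice 600 times and record the absolute difference of dots
die1 <- sample(1:6, 600, replace = TRUE)
die2 <- sample(1:6, 600, replace = TRUE)
die_diff <- abs(die1 - die2)
# Note: the table and the mean/variance printed below came from the earlier line
# die_diff = floor(runif(600, min=0, max=6)), which draws the difference uniformly
# from 0 to 5 instead of from two dice, so rerunning this code gives different values.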
freq <- data.frame(table(die_diff))
relFreq <- data.frame(prop.table(table(die_diff)))
relFreq$Relative_Freq <- relFreq$Freq
relFreq$Freq <- NULL
Cumulative_Freq <- cumsum(table(die_diff))
z <- cbind(merge(freq, relFreq), Cumulative_Freq)
z$Cumulative_Relative_Freq <- z$Cumulative_Freq / sum(z$Freq)
print(z)
die_diff Freq Relative_Freq Cumulative_Freq Cumulative_Relative_Freq
0 0 94 0.1566667 94 0.1566667
1 1 104 0.1733333 198 0.3300000
2 2 108 0.1800000 306 0.5100000
3 3 99 0.1650000 405 0.6750000
4 4 122 0.2033333 527 0.8783333
5 5 73 0.1216667 600 1.0000000

Mean and Variance

mean(die_diff)
[1] 2.45

var(die_diff)
[1] 2.675292


MSC Statistics
No ratings yet
MSC Statistics
36 pages
Define Statistics: Psychology
No ratings yet
Define Statistics: Psychology
6 pages
Research Method For Economist Full Lecture Note
No ratings yet
Research Method For Economist Full Lecture Note
142 pages
Nonparametric Testing in Excel PDF
No ratings yet
Nonparametric Testing in Excel PDF
72 pages
Spss Assignment
No ratings yet
Spss Assignment
1 page
Week 9 Data Analysis Using SPSS 33
0% (1)
Week 9 Data Analysis Using SPSS 33
82 pages
SPSS Data Analysis
100% (6)
SPSS Data Analysis
47 pages
Practice Test in Statistics
100% (2)
Practice Test in Statistics
3 pages
Pre Ph.D. (Education)
No ratings yet
Pre Ph.D. (Education)
10 pages
STP531 Course Syllabus Fall2013
No ratings yet
STP531 Course Syllabus Fall2013
2 pages
Asst. Prof. Florence C. Navidad, RMT, RN, M.Ed
100% (1)
Asst. Prof. Florence C. Navidad, RMT, RN, M.Ed
37 pages
Bowerman Regression CHPT 1
100% (1)
Bowerman Regression CHPT 1
18 pages
Spss
No ratings yet
Spss
50 pages
Statistics in Research 2018
No ratings yet
Statistics in Research 2018
8 pages
Spss Cheat Sheet
No ratings yet
Spss Cheat Sheet
2 pages
Starbucks Case Analysis
No ratings yet
Starbucks Case Analysis
45 pages
Assignment of Statistics
No ratings yet
Assignment of Statistics
3 pages
Two-Way Anova
No ratings yet
Two-Way Anova
19 pages
Instant Download Longitudinal Structural Equation Modeling 1st Edition Todd D. Little PDF All Chapter
100% (16)
Instant Download Longitudinal Structural Equation Modeling 1st Edition Todd D. Little PDF All Chapter
47 pages
Research 1
No ratings yet
Research 1
22 pages
Exploring Linkage Between Organizational Culture and Performance - A Case Study of Telesom Company Somaliland
No ratings yet
Exploring Linkage Between Organizational Culture and Performance - A Case Study of Telesom Company Somaliland
87 pages
Regression Analysis
100% (1)
Regression Analysis
280 pages
On The Theory of Scales of Measurement - S. S. Stevens
100% (3)
On The Theory of Scales of Measurement - S. S. Stevens
5 pages
Multiple Regression
0% (1)
Multiple Regression
41 pages
Statistical Computing Using Statistical Computing Using
No ratings yet
Statistical Computing Using Statistical Computing Using
128 pages
Applied Statistics II-2 and III
100% (1)
Applied Statistics II-2 and III
59 pages
Anova
67% (3)
Anova
55 pages
NVivo - Mind Map
No ratings yet
NVivo - Mind Map
1 page
Epsc 123 Statistical Methods in Edc
100% (1)
Epsc 123 Statistical Methods in Edc
34 pages
Statistical Methods
100% (1)
Statistical Methods
77 pages
Harvard SPSS Tutorial PDF
No ratings yet
Harvard SPSS Tutorial PDF
84 pages
Business Research Method
100% (1)
Business Research Method
138 pages
Research Texts 2012
No ratings yet
Research Texts 2012
16 pages
Types of Statistical Analysis
No ratings yet
Types of Statistical Analysis
2 pages
PSSC Maths Statistics Project Handbook Eff08 PDF
No ratings yet
PSSC Maths Statistics Project Handbook Eff08 PDF
19 pages
Data Analysis and Statistical Packages 1
No ratings yet
Data Analysis and Statistical Packages 1
19 pages
Variables Entered
No ratings yet
Variables Entered
2 pages
output-twoway
No ratings yet
output-twoway
10 pages
Tugas Ouput Regresi Sederhana
No ratings yet
Tugas Ouput Regresi Sederhana
4 pages
two way ANOVA
No ratings yet
two way ANOVA
10 pages
Q4 L1 Six Trigonometric Ratios
No ratings yet
Q4 L1 Six Trigonometric Ratios
14 pages
Error-Free Compression: Variable Length Coding
No ratings yet
Error-Free Compression: Variable Length Coding
13 pages
Ahlers Michael Lab 04
No ratings yet
Ahlers Michael Lab 04
8 pages
Residual Power Series Method For Obstacle Boundary Value Problems
No ratings yet
Residual Power Series Method For Obstacle Boundary Value Problems
5 pages
Wind Power Calculation
No ratings yet
Wind Power Calculation
8 pages
RECAP Graph Vocabulary
No ratings yet
RECAP Graph Vocabulary
5 pages
Mechatronics: C. Nicol, C.J.B. Macnab, A. Ramirez-Serrano
No ratings yet
Mechatronics: C. Nicol, C.J.B. Macnab, A. Ramirez-Serrano
12 pages
The Oxford Handbook of Quantitative Methods, Vol 1 Foundations
No ratings yet
The Oxford Handbook of Quantitative Methods, Vol 1 Foundations
32 pages
Programming Puzzles
No ratings yet
Programming Puzzles
3 pages
Application of The Cube-per-Order Index Rule For Stock Location in A Distribution Warehouse
100% (1)
Application of The Cube-per-Order Index Rule For Stock Location in A Distribution Warehouse
11 pages
A2 Module 4737 Decision Mathematics 2 Sample Pages
No ratings yet
A2 Module 4737 Decision Mathematics 2 Sample Pages
62 pages
Binary-Bot DIGITDIF
No ratings yet
Binary-Bot DIGITDIF
11 pages
Immediate download Elements of Partial Differential Equations 2 Exp Rev Edition Pavel Drábek ebooks 2024
100% (1)
Immediate download Elements of Partial Differential Equations 2 Exp Rev Edition Pavel Drábek ebooks 2024
77 pages
CBSE Physics Chapter 1 Units and Measurements Class 11 Notes PDF
No ratings yet
CBSE Physics Chapter 1 Units and Measurements Class 11 Notes PDF
22 pages
Compiler Construction - CS606 Power Point Slides Lecture 01
100% (1)
Compiler Construction - CS606 Power Point Slides Lecture 01
20 pages
Cryptographic Hash Functions
No ratings yet
Cryptographic Hash Functions
40 pages
General Relativity From An Action: 1 The Einstein-Hilbert Action
No ratings yet
General Relativity From An Action: 1 The Einstein-Hilbert Action
7 pages
Document 1
No ratings yet
Document 1
2 pages
Activity Coefficient at Infinte Dilution
No ratings yet
Activity Coefficient at Infinte Dilution
4 pages
BSC208 Revision Question
No ratings yet
BSC208 Revision Question
6 pages
Tos-Math 3 Q2
No ratings yet
Tos-Math 3 Q2
6 pages
Ut Ssrg1 Test-2 Code-A1
No ratings yet
Ut Ssrg1 Test-2 Code-A1
6 pages
GRADE 12 CALCULUS CUBIC SKETCHING
No ratings yet
GRADE 12 CALCULUS CUBIC SKETCHING
85 pages
22 Scientific Notation
No ratings yet
22 Scientific Notation
5 pages
Review of Vector Analysis
100% (2)
Review of Vector Analysis
56 pages
SP Practical Solution
No ratings yet
SP Practical Solution
10 pages
Rios Reyes Dissertation
No ratings yet
Rios Reyes Dissertation
164 pages