Assignment # 1
a) Use the catalog.sav sample data file to fit a multiple linear regression model to predict
"Sales of Men's Clothing" on the basis of the variables "Number of Catalogs Mailed",
"Number of Pages in Catalog", "Number of Phone Lines Open for Ordering", "Amount Spent on
Print Advertising" and "Number of Customer Service Representatives".
Use the forward selection, backward elimination and enter methods. Interpret your results
in each case.
Variables Entered/Removed
Model Summary
Conclusion:
The selection procedure fitted four successive models. Model 1 includes the variable ‘mail’ and
has an adjusted R² of 0.642; Model 2 adds ‘phone’ (adjusted R² = 0.766); Model 3 adds ‘print’
(adjusted R² = 0.778); and Model 4 adds ‘page’ (adjusted R² = 0.787). Each added variable
increases the adjusted R², with the largest gain coming from ‘phone’.
Conclusion:
For each of the four models, the variable entered at that step (‘mail’, then ‘phone’, ‘print’
and ‘page’) has a significance value reported as 0.000, i.e. p < 0.001. Because every p-value is
below 0.05, each variable entered has a significant relationship with the dependent variable.
Coefficients
Excluded Variables
Tolerance
Residuals Statistics
Dependent variable coding (default): No = 0, Yes = 1

Classification Table (Step 0)
Observed default    Predicted No    Predicted Yes    Percentage Correct
No                  517             0                100.0
Yes                 183             0                0.0
Score df Sig.
ed 9.205 1 .002
Chi-square df Sig.
Model Summary
Classification Table (Step 1)
Observed default    Predicted No    Predicted Yes    Percentage Correct
No                  483             34               93.4
Yes                 122             61               33.3
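The row percentages in the two classification tables can be rechecked, and the overall percentage correct (which did not survive in the output shown here) can be derived from the same counts. The following is a minimal sketch, assuming the four counts per step are read correctly from the tables; the names `step0`, `step1` and `summarize` are illustrative only.

```python
# Counts copied from the Step 0 and Step 1 classification tables.
# Keys are (observed, predicted) pairs for the outcome "default".
step0 = {("No", "No"): 517, ("No", "Yes"): 0,
         ("Yes", "No"): 183, ("Yes", "Yes"): 0}
step1 = {("No", "No"): 483, ("No", "Yes"): 34,
         ("Yes", "No"): 122, ("Yes", "Yes"): 61}


def summarize(table):
    """Return percentage correct overall, for observed No, and for observed Yes."""
    total = sum(table.values())
    observed_no = table[("No", "No")] + table[("No", "Yes")]
    observed_yes = table[("Yes", "No")] + table[("Yes", "Yes")]
    correct = table[("No", "No")] + table[("Yes", "Yes")]
    return {
        "overall": 100 * correct / total,
        "correct_no": 100 * table[("No", "No")] / observed_no,
        "correct_yes": 100 * table[("Yes", "Yes")] / observed_yes,
    }


for name, table in (("Step 0", step0), ("Step 1", step1)):
    s = summarize(table)
    print(f"{name}: overall {s['overall']:.1f}%, "
          f"No {s['correct_no']:.1f}%, Yes {s['correct_yes']:.1f}%")
```

Comparing the two overall percentages shows how much the predictor improves on the intercept-only model, which simply classifies every case as "No".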
The first step is to select only those cases for which the lifestyle is recorded as active (active = 1).
bfast * agecat Crosstabulation
Count
agecat Total
Cereal 51 45 38 17 151
The frequency distribution of the senior citizens who live an active life is highlighted in yellow.
It shows that senior citizens who live an active life most often prefer Oatmeal.
Cases
marital Total
Unmarried Married
Chi-Square Tests
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 79.54.
Conclusion:
Because the p-value (reported as 0.000, i.e. p < 0.001) is below the chosen significance level
(α = 0.05), we reject the null hypothesis of independence and conclude that there is an association
between Preferred Breakfast and Marital Status.
Cases
agecat Total
bfast
Oatmeal 4 24 97 185 310
Cereal 93 92 95 59 339
Total 181 206 231 262 880
Chi-Square Tests
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 47.51.
Conclusion:
Because the p-value (reported as 0.000, i.e. p < 0.001) is below the chosen significance level
(α = 0.05), we reject the null hypothesis of independence and conclude that there is an association
between Preferred Breakfast and Age Category.
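The chi-square statistic behind this conclusion can be recomputed from the crosstabulation counts. The sketch below is hedged: only the Oatmeal, Cereal and Total rows are visible in the table, so the remaining breakfast row is recovered by subtraction, which is an assumption. The critical value 12.592 is the tabulated chi-square value for 6 degrees of freedom at α = 0.05.

```python
# Observed counts per age category (four columns) from the crosstabulation.
oatmeal = [4, 24, 97, 185]
cereal = [93, 92, 95, 59]
column_totals = [181, 206, 231, 262]

# Assumption: one breakfast row is missing from the extracted table, so it is
# recovered as the column total minus the two visible rows.
other = [t - o - c for t, o, c in zip(column_totals, oatmeal, cereal)]

observed = [oatmeal, cereal, other]
row_totals = [sum(row) for row in observed]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in column_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(column_totals) - 1)
min_expected = min(min(row) for row in expected)

CRITICAL_0_05_DF6 = 12.592  # tabulated chi-square critical value, df = 6

print(f"chi-square = {chi_square:.2f}, df = {df}")
print(f"minimum expected count = {min_expected:.2f}")
print("reject independence:", chi_square > CRITICAL_0_05_DF6)
```

The minimum expected count computed this way agrees with the 47.51 reported in the table footnote, which supports the recovered row.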
T-Test
One-Sample Statistics
One-Sample Test
Lower Upper
Conclusion:
Because the p-value (reported as 0.000, i.e. p < 0.001) is below the chosen significance level
(α = 0.1), we conclude that the mean Amount Spent differs significantly from 105. The 90%
confidence interval for the difference between the mean Amount Spent and the test value of 105
is [-7.1986, -2.9338].
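Because the interval limits are negative, they are limits for the difference from the test value rather than for the mean itself. A minimal sketch of the conversion, assuming (as in the SPSS One-Sample Test table) that the interval is for the sample mean minus the test value of 105:

```python
TEST_VALUE = 105.0

# 90% confidence limits for (mean - test value), copied from the conclusion.
diff_ci = (-7.1986, -2.9338)

# Shift both limits by the test value to get an interval for the mean itself.
mean_ci = tuple(TEST_VALUE + bound for bound in diff_ci)

print(f"90% CI for the mean: [{mean_ci[0]:.4f}, {mean_ci[1]:.4f}]")
print("test value inside interval:", mean_ci[0] <= TEST_VALUE <= mean_ci[1])
```

The test value lies above the whole interval, which is consistent with rejecting the null hypothesis at α = 0.1.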
Part 2: Also test that the mean of amount spent by both male and female customers is
equal.
T-TEST GROUPS=gender(0 1)
/MISSING=ANALYSIS
/VARIABLES=amtspent
/CRITERIA=CI(.95).
T-Test
Group Statistics
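Independent Samples Test: amtspent
(F and first Sig. are Levene's Test for Equality of Variances; the remaining columns are the t-test for Equality of Means)
                              F      Sig.   t       df         Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       .458   .499   6.313   1402       .000              16.15930          2.55971                 11.13803       21.18058
Equal variances not assumed                 6.332   1397.923   .000              16.15930          2.55218                 11.15280       21.16581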
Conclusion:
We use the top row (Equal variances assumed) because the p-value of Levene's test (0.499) is above 0.05.
Because the p-value of the t-test (reported as 0.000, i.e. p < 0.001) is below the chosen significance
level (α = 0.05), we conclude that the mean Amount Spent differs significantly between male and female
customers. The 95% confidence interval for the difference in mean Amount Spent is [11.138, 21.181].
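The t statistics and the confidence limits in the Independent Samples Test output follow directly from the mean difference and its standard error. The sketch below rechecks them with the Python standard library only. It assumes that, with about 1400 degrees of freedom, the t quantile can be approximated by the normal quantile, so the limits agree with the SPSS values only to about two decimals.

```python
from statistics import NormalDist

# Values copied from the Independent Samples Test output for amtspent.
mean_diff = 16.15930
se_equal_var = 2.55971      # standard error, equal variances assumed
se_unequal_var = 2.55218    # standard error, equal variances not assumed

# t statistic = mean difference / standard error of the difference.
t_equal_var = mean_diff / se_equal_var
t_unequal_var = mean_diff / se_unequal_var

# Assumption: normal quantile used in place of the t quantile (df = 1402).
z = NormalDist().inv_cdf(0.975)
ci = (mean_diff - z * se_equal_var, mean_diff + z * se_equal_var)

print(f"t (equal variances assumed)     = {t_equal_var:.3f}")
print(f"t (equal variances not assumed) = {t_unequal_var:.3f}")
print(f"approximate 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```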
Part 3: What would you say about the equality of the means of the amount spent at stores of
different sizes?
Conclusion:
Because the p-value (0.014) of the test is below the chosen significance level (α = 0.05), we
conclude that the mean Amount Spent is not equal across stores of different sizes; at least one
store size has a significantly different mean.
WHODAS 2.0
WORLD HEALTH ORGANIZATION DISABILITY ASSESSMENT
SCHEDULE 2.0
Think back over the past 30 days and answer these questions, thinking about how much difficulty you had doing the
following activities. For each question, please circle only one response.
In the past 30 days, how much difficulty did you have in:
Response options for each S item: None / Mild / Moderate / Severe / Extreme or cannot do
S1 Standing for long periods such as 30 minutes?
S3 Learning a new task, for example, learning how to get to a new place?
S4 How much of a problem did you have joining in community activities (for example, festivities,
religious or other activities) in the same way as anyone else can?
S5 How much have you been emotionally affected by your health problems?
S6 Concentrating on doing something for ten minutes?
S10 Dealing with people you do not know?
H2 In the past 30 days, for how many days were you totally unable to carry out your usual
activities or work because of any health condition? (Record number of days)
H3 In the past 30 days, not counting the days that you were totally unable, for how many days
did you cut back or reduce your usual activities or work because of any health condition?
(Record number of days)
Descriptive Statistics
Descriptive Statistics
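Independent Samples Test: overal_score
(F and first Sig. are Levene's Test for Equality of Variances; the remaining columns are the t-test for Equality of Means)
                              F      Sig.   t      df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   CI Lower   CI Upper
Equal variances assumed       .050   .825   .052   28       .959              .125              2.427                   -4.846     5.096
Equal variances not assumed                 .054   13.517   .958              .125              2.328                   -4.884     5.134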
ANOVA
overal_score
Conclusion:
The mean recall score is 19.75 with imagery and 8.667 without imagery, so recall is clearly
better when imagery is used. The paired t-test confirms this: the p-value for the mean difference
(reported as 0.000, i.e. p < 0.001) is below the chosen significance level (α = 0.05), so the mean
difference between Recall Score with Imagery and Recall Score without Imagery is significant.
The 95% confidence interval for the mean difference in Recall Score is [8.73, 13.44].
c) Generate 4 samples of sizes 5, 6, 7 and 7 from normal populations with means 45, 40, 47
and 38 and standard deviations 4, 6, 7 and 8, respectively. Test the equality of means.
Ans:
One-way ANOVA: Population versus Factor
Source DF SS MS F P
Factor 3 595.3 198.4 4.48 0.014
Error 21 929.3 44.3
Total 24 1524.6
Conclusion:
The F-statistic is 4.48 and the p-value (0.014) is below the chosen significance level
(α = 0.05), so we conclude that not all of the means are equal; at least one of the means is
significantly different.
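The same exercise can be sketched outside Minitab. The code below is an illustration under stated assumptions, not the procedure used for the output above: the seed is arbitrary, so the generated samples and their F-statistic will differ from the reported table, and `one_way_anova` is a hypothetical helper name. The last lines recompute the reported F from the sums of squares and degrees of freedom in the table.

```python
import random
from statistics import mean


def one_way_anova(groups):
    """Return (df_between, df_within, ss_between, ss_within, f_statistic)."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_statistic


# (sample size, population mean, population standard deviation) per sample,
# as specified in the question.
spec = [(5, 45, 4), (6, 40, 6), (7, 47, 7), (7, 38, 8)]

rng = random.Random(1)  # arbitrary seed, chosen only for reproducibility
samples = [[rng.gauss(mu, sigma) for _ in range(n)] for n, mu, sigma in spec]

df_b, df_w, ss_b, ss_w, f_stat = one_way_anova(samples)
print(f"generated data: F({df_b}, {df_w}) = {f_stat:.2f}")

# Recompute the F-statistic of the reported table from its SS and DF columns.
f_reported = (595.3 / 3) / (929.3 / 21)
print(f"reported table: F(3, 21) = {f_reported:.2f}")
```

The p-value is left out because the standard library has no F distribution; the computed F can be compared with a tabulated critical value for (3, 21) degrees of freedom.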
Source DF SS MS F P
fac 4 740.14 185.03 44.59 0.000
Error 20 83.00 4.15
Total 24 823.13
Conclusion:
The F-statistic is 44.59 and the p-value (reported as 0.000, i.e. p < 0.001) is below the chosen
significance level (α = 0.05), so we conclude that not all of the means are equal; at least one of
the means is significantly different.
Rows: abs_diff
Count % of Total
0 104 17.33
1 99 16.50
2 110 18.33
3 89 14.83
4 90 15.00
5 108 18.00
All 600 100.00
Variable   Total Count   Mean     StDev    Variance
abs_diff   600           2.4767   1.7333   3.0045
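The descriptive statistics can be reproduced from the tally alone. A short sketch, using the counts from the tally table and the sample (n - 1) variance:

```python
# Tally of abs_diff: value -> count, copied from the table.
counts = {0: 104, 1: 99, 2: 110, 3: 89, 4: 90, 5: 108}

n = sum(counts.values())
percent = {x: 100 * f / n for x, f in counts.items()}

# Mean and sample variance computed from the grouped frequencies.
mean = sum(x * f for x, f in counts.items()) / n
variance = sum(f * (x - mean) ** 2 for x, f in counts.items()) / (n - 1)
stdev = variance ** 0.5

for x in sorted(counts):
    print(f"{x}: count {counts[x]}, {percent[x]:.2f}% of total")
print(f"n = {n}, mean = {mean:.4f}, stdev = {stdev:.4f}, variance = {variance:.4f}")
```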
[Figure: Chart of abs_diff. Bar chart of Count (vertical axis, 0 to 120) for abs_diff values 0 to 5 (horizontal axis).]
2. SPSS is generally stronger in statistical analysis, especially in some specific areas, such as
ANOVA-related procedures. The add-on modules give SPSS further flexibility and
potential to extend its capabilities. For cutting-edge statistical analysis, too, SPSS is
stronger than Minitab. SPSS is therefore most suitable if your work involves large datasets,
frequent data management, and intermediate to partially advanced statistical analysis.
3. SPSS Statistics is loaded with powerful analytic techniques and time-saving features to
help you quickly and easily find new insights in your data, so you can make more accurate
predictions and achieve better outcomes for your organization.
4. View interactive SPSS Statistics output on smart devices (smartphones and tablets) and
generate presentation-ready output quickly and easily.
6. SPSS Advanced Statistics offers generalized linear mixed models (GLMM), general linear
models (GLM), mixed models procedures, generalized linear models (GENLIN) and
generalized estimating equations (GEE) procedures.
Product One-way Two-way MANOVA GLM Mixed model Post-hoc Latin squares
Product OLS WLS 2SLS NLLS Logistic GLM LAD Stepwise Quantile Probit Cox Poisson MLR
SPSS Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Minitab Yes No No No No
Chart Bar chart Box plot Correlogram Histogram Line chart Scatterplot
Minitab Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
SPSS Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Ans:
Regression Analysis: Trade versus Food, Metals
Analysis of Variance
Source DF SS MS F P
Regression 2 21302 10651 84.78 0.000
Residual Error 57 7161 126
Total 59 28463
Source DF Seq SS
Food 1 4046
Metals 1 17257
Unusual Observations
[Figure: Versus Fits (response is Trade). Residual (vertical axis, about -30 to 40) plotted against Fitted Value (horizontal axis, 310 to 390).]
Conclusion: The adjusted R-squared value for the fitted model is 74.0%, which means that
the factors Food and Metals together explain about 74 percent of the variation in Trade after
adjusting for the number of predictors. The model is therefore considered a good fit.
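The adjusted R-squared quoted in the conclusion can be derived from the Analysis of Variance table. A minimal sketch using only the SS and DF columns of that table:

```python
# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_regression, df_regression = 21302, 2
ss_error, df_error = 7161, 57
ss_total, df_total = 28463, 59

# R-squared: share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R-squared: compares the error mean square with the total mean square.
adj_r_squared = 1 - (ss_error / df_error) / (ss_total / df_total)

# F statistic: regression mean square over error mean square.
f_statistic = (ss_regression / df_regression) / (ss_error / df_error)

print(f"R-sq = {100 * r_squared:.1f}%")
print(f"R-sq(adj) = {100 * adj_r_squared:.1f}%")
print(f"F = {f_statistic:.2f}")
```

The unadjusted value is slightly higher than the adjusted one, as expected with two predictors and 60 observations.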
coefficients
Matrix m7
67.0512
0.2255
5.9001
coefficients by regr command
f
67.0512 0.2255 5.9001
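Both routes give the same three coefficients because the matrix calculation solves the normal equations (X'X)b = X'y, which is also what a regression command does internally. The sketch below illustrates this with the Python standard library. The Food and Metals values are hypothetical (the original data are not shown here), Trade is generated without noise from the reported coefficients, and the coefficient order (constant, Food, Metals) is an assumption based on the "Trade versus Food, Metals" heading.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Reported coefficients; order (constant, Food, Metals) is an assumption.
beta_reported = [67.0512, 0.2255, 5.9001]

# Hypothetical predictor values, used only to illustrate the calculation.
food = [100.0, 120.0, 90.0, 110.0, 130.0, 105.0]
metals = [30.0, 35.0, 28.0, 40.0, 33.0, 37.0]
x_rows = [[1.0, f, m] for f, m in zip(food, metals)]

# Noise-free response built from the reported coefficients.
trade = [sum(b * v for b, v in zip(beta_reported, row)) for row in x_rows]

beta = ols(x_rows, trade)
print([round(b, 4) for b in beta])
```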
Anderson-Darling test
This test compares the ECDF (empirical cumulative distribution function) of your sample
data with the distribution expected if the data were normal. If the observed difference is sufficiently
large, you will reject the null hypothesis of population normality.
All three tests tend to work well in identifying a distribution as not normal when the
distribution is skewed. All three tests are less discriminating when the underlying distribution is a
t-distribution and the nonnormality is due to kurtosis. Among the tests based on the empirical
distribution function, Anderson-Darling usually tends to be more effective in detecting departures in
the tails of the distribution. If departure from normality in the tails is the major problem, many
statisticians would use Anderson-Darling as the first choice.
NOTE: If you are checking normality to prepare for a normal capability analysis, the tails are the
most critical part of the distribution.
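To make the ECDF comparison concrete, the Anderson-Darling statistic for normality can be computed as A² = -n - (1/n) Σ (2i - 1) [ln F(y_i) + ln(1 - F(y_{n+1-i}))], where the y_i are the sorted observations and F is the normal CDF with the sample mean and standard deviation. The sketch below is a hedged illustration, not Minitab's implementation: `anderson_darling_normal` is a hypothetical helper, no small-sample adjustment or p-value is computed, and the example data are the "With Imagery" recall scores from this assignment.

```python
from math import log
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Anderson-Darling A-squared against a normal with estimated mean and sd."""
    y = sorted(data)
    n = len(y)
    fitted = NormalDist(mean(y), stdev(y))
    cdf = [fitted.cdf(value) for value in y]
    # The weights (2i - 1), paired with both tails, emphasise the extremes.
    total = sum(
        (2 * i - 1) * (log(cdf[i - 1]) + log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


with_imagery = [20, 24, 20, 18, 22, 19, 20, 19, 17, 21, 17, 20]
print(f"A-squared = {anderson_darling_normal(with_imagery):.3f}")
```

Larger values indicate a larger gap between the ECDF and the fitted normal distribution, so the statistic is compared with a critical value (or converted to a p-value) to decide whether to reject normality.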
With Imagery:    20 24 20 18 22 19 20 19 17 21 17 20
Without Imagery: 5 9 5 9 6 11 8 11 7 9 8 16
Does it appear that the average recall score is higher when imagery is used? Also
construct a 95% confidence interval for the difference between the mean recall scores with and
without imagery, and interpret the results.
Data Input
> withImagery=c(20, 24, 20, 18, 22, 19, 20, 19, 17, 21, 17, 20)
> withoutimg=c(5,9,5,9,6,11,8,11,7,9,8,16)
T test Command
> t.test(withImagery, withoutimg, paired = TRUE, alternative = "two.sided")
Paired t-test
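The paired t-test can also be worked by hand from the twelve score pairs, which shows where the confidence interval comes from. A minimal sketch using the Python standard library; the critical value 2.201 is the tabulated t quantile for 11 degrees of freedom at the 97.5% point, entered as a constant because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

with_imagery = [20, 24, 20, 18, 22, 19, 20, 19, 17, 21, 17, 20]
without_imagery = [5, 9, 5, 9, 6, 11, 8, 11, 7, 9, 8, 16]

# A paired test works on the within-subject differences.
diffs = [a - b for a, b in zip(with_imagery, without_imagery)]
n = len(diffs)

mean_diff = mean(diffs)
se_diff = stdev(diffs) / sqrt(n)
t_stat = mean_diff / se_diff

T_CRIT_DF11 = 2.201  # tabulated t quantile, df = 11, two-sided 95%
ci = (mean_diff - T_CRIT_DF11 * se_diff, mean_diff + T_CRIT_DF11 * se_diff)

print(f"mean with = {mean(with_imagery):.3f}, mean without = {mean(without_imagery):.3f}")
print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.2f}, df = {n - 1}")
print(f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The interval excludes zero and the t statistic is far above the critical value, in line with the conclusion that recall is higher with imagery.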
Frequency Distribution
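# Simulate 600 values drawn uniformly from the integers 0 to 5
die_diff <- floor(runif(600, min = 0, max = 6))
# Absolute and relative frequency of each value
freq <- data.frame(table(die_diff))
relFreq <- data.frame(prop.table(table(die_diff)))
relFreq$Relative_Freq <- relFreq$Freq
relFreq$Freq <- NULL
# Cumulative frequencies, then combine everything into one table
Cumulative_Freq <- cumsum(table(die_diff))
z <- cbind(merge(freq, relFreq), Cumulative_Freq)
z$Cumulative_Relative_Freq <- z$Cumulative_Freq / sum(z$Freq)
print(z)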
die_diff Freq Relative_Freq Cumulative_Freq Cumulative_Relative_Freq
0 0 94 0.1566667 94 0.1566667
1 1 104 0.1733333 198 0.3300000
2 2 108 0.1800000 306 0.5100000
3 3 99 0.1650000 405 0.6750000
4 4 122 0.2033333 527 0.8783333
5 5 73 0.1216667 600 1.0000000
mean(die_diff)
[1] 2.45
var(die_diff)
[1] 2.675292
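As a reference point for the sample mean and variance above, the population values can be worked out exactly. The sketch assumes that die_diff is uniform on the integers 0 to 5, which is what the `floor(runif(600, min=0, max=6))` call implies.

```python
from fractions import Fraction

# Assumption: each of the six values 0..5 has probability 1/6.
values = range(6)
p = Fraction(1, 6)

pop_mean = sum(p * x for x in values)
pop_variance = sum(p * (x - pop_mean) ** 2 for x in values)

print(f"population mean = {pop_mean} = {float(pop_mean)}")
print(f"population variance = {pop_variance} = {float(pop_variance):.4f}")
```

The simulated mean of 2.45 and variance of about 2.675 can then be read against these population values.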