DA note

The document presents statistical analyses comparing gender differences in GPA and quiz scores, revealing significant differences favoring females in GPA and quiz2 scores. It also includes an ANOVA analysis of traveler satisfaction across different status levels, indicating significant differences between groups. Lastly, a regression analysis examines the relationship between graduation rates and various predictors, showing that 53% of the variation in graduation rates is explained by the model.

Uploaded by

Hai Dang Pham

Group Statistics (variable: total)

gender   N    Mean     Std. Deviation   Std. Error Mean
Female   64   102.03   13.896           1.737
Male     41   98.29    17.196           2.686

Independent Samples Test (variable: total)

Levene's Test for Equality of Variances: F = 2.019, Sig. = 0.158

t-test for Equality of Means (95% CI = 95% Confidence Interval of the Difference)

                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       1.224   103      0.224             3.739             3.053                   -2.317         9.794
Equal variances not assumed   1.169   72.421   0.246             3.739             3.198                   -2.637         10.114

Sig. = significance = p-value (statistical significance), compared against the 0.05 level.

GPA – Gender (grading follows the template: correct sentences, stated hypotheses, scientific and statistical reasoning; the numerical result itself accounts for only half of the mark)

Group Statistics (variable: gpa)

gender   N    Mean     Std. Deviation   Std. Error Mean
Female   64   2.8967   0.74622          0.09328
Male     41   2.5949   0.76346          0.11923

Independent Samples Test (variable: gpa)

Levene's Test for Equality of Variances: F = 0.331, Sig. = 0.566

t-test for Equality of Means (95% CI = 95% Confidence Interval of the Difference)

                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       2.004   103      0.048             0.30184           0.15062                 0.00312        0.60056
Equal variances not assumed   1.994   83.974   0.049             0.30184           0.15138                 0.00080        0.60288

1.1 Levene’s test

- H0: The variances of the groups are equal (homogeneity of variance).

- H1: The variances of the groups are not equal (heterogeneity of variance).

- If the Sig. value of Levene’s test is < 0.05, the variances of the two groups differ; read the t-test for Equality of Means from the "Equal variances not assumed" row.

- If the Sig. value of Levene’s test is > 0.05, there is no evidence that the variances differ; read the t-test for Equality of Means from the "Equal variances assumed" row.

- In the GPA example above, the Sig. of Levene’s F test is 0.566 > 0.05 → the variances of the two groups are not significantly different → use the t-test result in the "Equal variances assumed" row.
1.2 T-test for Equality of Means

- H0: There is no difference between the means of the two groups.

- H1: The means of the two groups are not equal.

- If the Sig. value of the t-test for Equality of Means is < 0.05 → there is a statistically significant difference in the mean of variable (A) between the two groups.

- If the Sig. value of the t-test for Equality of Means is > 0.05 → there is no statistically significant difference in the mean of variable (A) between the two groups.

- In the GPA example above, Sig. = 0.048 < 0.05, so it is concluded that mean GPA differs significantly between the two groups (female M = 2.90, male M = 2.59).
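As a rough cross-check of the SPSS output, the two t statistics for the GPA example can be recomputed from the Group Statistics table alone. This is a minimal sketch using the standard pooled-variance and Welch formulas; the function names are illustrative and are not part of SPSS.

```python
import math

def pooled_t(n1, m1, s1, n2, m2, s2):
    """Student's t (equal variances assumed) from group summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def welch_t(n1, m1, s1, n2, m2, s2):
    """Welch's t (equal variances not assumed) with Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, df

# (N, Mean, Std. Deviation) for gpa, copied from the Group Statistics table
female = (64, 2.8967, 0.74622)
male = (41, 2.5949, 0.76346)

t_pooled, df_pooled = pooled_t(*female, *male)
t_welch, df_welch = welch_t(*female, *male)
print(round(t_pooled, 3), df_pooled)
print(round(t_welch, 3), round(df_welch, 3))
```

The results should agree with the "Equal variances assumed" and "Equal variances not assumed" rows up to rounding of the reported means and standard deviations.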

Paired-sample t-test:

Paired Samples Statistics

Pair 1   Mean   N     Std. Deviation   Std. Error Mean
quiz1    7.47   105   2.481            0.242
quiz2    7.98   105   1.623            0.158

Paired Samples Test (95% CI = 95% Confidence Interval of the Difference)

                          Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
Pair 1: quiz1 - quiz2     -0.514   1.835            0.179             -0.869         -0.159         -2.872   104   0.005

This paired-samples t-test indicates that, for the 105 subjects, there is a statistically significant difference between the mean scores of the two quizzes (Sig. = 0.005 < 0.05). In other words, the mean score on the second quiz (M = 7.98) was significantly greater than the mean score on the first quiz (M = 7.47).
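The paired t statistic is just the mean of the differences divided by its standard error. The sketch below recomputes it from the Paired Differences columns; the helper name is illustrative.

```python
import math

def paired_t(mean_diff, sd_diff, n):
    """Paired-samples t statistic from the mean and SD of the differences."""
    se = sd_diff / math.sqrt(n)   # standard error of the mean difference
    return mean_diff / se, n - 1, se

# Mean, Std. Deviation and N for quiz1 - quiz2, copied from the SPSS table
t_stat, df, se = paired_t(-0.514, 1.835, 105)
print(round(t_stat, 2), df, round(se, 3))
```

Because the inputs are already rounded to three decimals, the recomputed t differs from the SPSS value of -2.872 in the third decimal place.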
ANOVA
Descriptives

Overall, I am satisfied with the price performance ratio of Oddjob Airways.

(95% CI = 95% Confidence Interval for Mean)

Status   N      Mean   Std. Deviation   Std. Error   95% CI Lower Bound   95% CI Upper Bound   Minimum   Maximum
Blue     677    4.47   1.641            0.063        4.35                 4.60                 1         7
Silver   245    4.03   1.560            0.100        3.84                 4.23                 1         7
Gold     143    3.99   1.556            0.130        3.73                 4.24                 1         7
Total    1065   4.31   1.625            0.050        4.21                 4.40                 1         7

Reading the results: the Blue status group has the highest mean satisfaction (4.47), whereas the Gold group has the lowest (3.99). The Silver group (4.03) falls between the two.

Test of Homogeneity of Variances

Overall, I am satisfied with the price performance ratio of Oddjob Airways.

Levene Statistic   df1   df2    Sig.
0.907              2     1062   0.404

Step 1: Levene

Since Sig. (0.404) > 0.05, the variances of the groups are not significantly different. Therefore, we use the result in the ANOVA table.

ANOVA

Overall, I am satisfied with the price performance ratio of Oddjob Airways.

                 Sum of Squares   df     Mean Square   F       Sig.
Between Groups   51.755           2      25.878        9.963   0.000
Within Groups    2758.455         1062   2.597
Total            2810.210         1064


Step 2: ANOVA

Since Sig. (0.000) < 0.05, it is concluded that there is a statistically significant difference in the mean value of price satisfaction between at least two groups.
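The F value in the ANOVA table is the ratio of the between-groups mean square to the within-groups mean square. The following sketch rebuilds it from the sums of squares and degrees of freedom reported for the Oddjob Airways example; the function name is illustrative.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F statistic from sums of squares and degrees of freedom."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Three status groups (Blue, Silver, Gold) and 1065 respondents in total
k, n_total = 3, 1065
df_between, df_within = k - 1, n_total - k

ms_b, ms_w, f_stat = anova_f(51.755, df_between, 2758.455, df_within)
print(df_between, df_within, round(ms_b, 3), round(ms_w, 3), round(f_stat, 3))
```

The degrees of freedom follow from the design (k - 1 and N - k), which is why they match the df column of the table.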

Multiple Comparisons

Dependent Variable: Overall, I am satisfied with the price performance ratio of Oddjob Airways.

Tukey HSD

(I) Traveler Status   (J) Traveler Status   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
Blue                  Silver                0.440*                  0.120        0.001   0.16                 0.72
Blue                  Gold                  0.487*                  0.148        0.003   0.14                 0.83
Silver                Blue                  -0.440*                 0.120        0.001   -0.72                -0.16
Silver                Gold                  0.047                   0.170        0.959   -0.35                0.44
Gold                  Blue                  -0.487*                 0.148        0.003   -0.83                -0.14
Gold                  Silver                -0.047                  0.170        0.959   -0.44                0.35

*. The mean difference is significant at the 0.05 level.

Since Sig. (0.001) < 0.05, it is concluded that there is a statistically significant difference in the mean value of price satisfaction between the Blue and Silver groups.

Since Sig. (0.003) < 0.05, it is concluded that there is a statistically significant difference in the mean value of price satisfaction between the Blue and Gold groups.

Since Sig. (0.959) > 0.05, it is concluded that there is no statistically significant difference in the mean value of price satisfaction between the Silver and Gold groups.

(refer to the Vietnamese-language table)


Robust Tests of Equality of Means

Overall, I am satisfied with the price performance ratio of Oddjob Airways.

        Statistic (a)   df1   df2       Sig.
Welch   10.230          2     345.211   0.000

a. Asymptotically F distributed.

Grade file: Ethnicity and total (ANOVA)

Descriptives

total

(95% CI = 95% Confidence Interval for Mean)

Ethnicity   N     Mean     Std. Deviation   Std. Error   95% CI Lower Bound   95% CI Upper Bound   Minimum   Maximum
Native      5     95.20    17.094           7.645        73.98                116.42               75        115
Asian       20    102.90   12.876           2.879        96.87                108.93               78        123
Black       24    100.08   14.714           3.004        93.87                106.30               65        124
White       45    102.27   14.702           2.192        97.85                106.68               51        123
Hispanic    11    92.91    21.215           6.397        78.66                107.16               52        120
Total       105   100.57   15.299           1.493        97.61                103.53               51        124

The Asian group has the highest mean total score (102.90), followed by the White group (102.27). The Black (100.08), Native (95.20), and Hispanic (92.91) groups rank 3rd, 4th, and last.

Test of Homogeneity of Variances

total

Levene Statistic   df1   df2   Sig.
0.930              4     100   0.450


Since Sig. (0.450) > 0.05, the variances of the groups are not significantly different. Therefore, we use the result in the ANOVA table.

ANOVA

total

                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   1033.572         4     258.393       1.109   0.357
Within Groups    23310.142        100   233.101
Total            24343.714        104

Since Sig. (0.357) > 0.05, it is concluded that there is no statistically significant difference between the ethnic groups in the mean value of the total variable.

Multiple Comparisons

Dependent Variable: total

Tukey HSD

(I) ethnicity   (J) ethnicity   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
Native          Asian           -7.700                  7.634        0.851   -28.91               13.51
Native          Black           -4.883                  7.506        0.966   -25.74               15.97
Native          White           -7.067                  7.197        0.863   -27.06               12.93
Native          Hispanic        2.291                   8.235        0.999   -20.59               25.17
Asian           Native          7.700                   7.634        0.851   -13.51               28.91
Asian           Black           2.817                   4.623        0.973   -10.03               15.66
Asian           White           0.633                   4.103        1.000   -10.77               12.03
Asian           Hispanic        9.991                   5.731        0.413   -5.93                25.91
Black           Native          4.883                   7.506        0.966   -15.97               25.74
Black           Asian           -2.817                  4.623        0.973   -15.66               10.03
Black           White           -2.183                  3.859        0.980   -12.90               8.54
Black           Hispanic        7.174                   5.559        0.698   -8.27                22.62
White           Native          7.067                   7.197        0.863   -12.93               27.06
White           Asian           -0.633                  4.103        1.000   -12.03               10.77
White           Black           2.183                   3.859        0.980   -8.54                12.90
White           Hispanic        9.358                   5.135        0.367   -4.91                23.62
Hispanic        Native          -2.291                  8.235        0.999   -25.17               20.59
Hispanic        Asian           -9.991                  5.731        0.413   -25.91               5.93
Hispanic        Black           -7.174                  5.559        0.698   -22.62               8.27
Hispanic        White           -9.358                  5.135        0.367   -23.62               4.91


Regression

Y = B0 + B1X1 + B2X2 + …

Linear

- Simple linear regression: only one independent variable

- Multiple linear regression: two or more independent variables

When running SPSS, look for the regression coefficients (Beta).

Analyze the relationship between Graduation rate and Acceptance Rate, Expenditure, Top 10% HS, and Median SAT.

Model Summary

Model   R           R Square   Adjusted R Square   Std. Error of the Estimate
1       0.731 (a)   0.534      0.492               5.308

a. Predictors: (Constant), Top 10% HS, Median SAT, Expenditures/Student, Acceptance Rate

R Square: 53% of the variation in the dependent variable (Graduation %) is explained by the independent variables in the model (specifically Top 10% HS, Median SAT, Expenditures/Student, and Acceptance Rate).

Adjusted R Square: 49% of the variation in the dependent variable (Graduation %) is explained by the same independent variables after adjusting for the sample size and the number of independent variables in the model.
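The adjustment for sample size and number of predictors can be made explicit with the usual adjusted R Square formula. This is a small sketch that assumes n = 49 observations (the N reported in the correlation table for these variables) and k = 4 predictors; the function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R Square for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R Square from the Model Summary; n and k as assumed in the lead-in
adj = adjusted_r2(0.534, n=49, k=4)
print(round(adj, 3))
```

The penalty grows with k and shrinks with n, which is why the adjusted value is always at or below the raw R Square.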
ANOVA (a)

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   1423.209         4    355.802       12.627   0.000 (b)
Residual     1239.852         44   28.178
Total        2663.061         48

a. Dependent Variable: Graduation %

b. Predictors: (Constant), Top 10% HS, Median SAT, Expenditures/Student, Acceptance Rate

Used to test the overall significance of the regression model.

Compare Sig. with 0.05:
If Sig. < 0.05 → the overall model is statistically significant at the 5% significance level
If Sig. > 0.05 → the overall model is not statistically significant at the 5% significance level
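The quantities in the regression ANOVA table are linked to the Model Summary: F is the ratio of the two mean squares, R Square is the regression share of the total sum of squares, and the Std. Error of the Estimate is the square root of the residual mean square. A minimal sketch using the reported sums of squares (variable names are illustrative):

```python
import math

# Sums of squares and degrees of freedom from the regression ANOVA table
ss_regression, df_regression = 1423.209, 4
ss_residual, df_residual = 1239.852, 44

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual                  # overall F test
r2 = ss_regression / (ss_regression + ss_residual)    # R Square
se_estimate = math.sqrt(ms_residual)                  # Std. Error of the Estimate

print(round(f_stat, 3), round(r2, 3), round(se_estimate, 3))
```

In this example Sig. = 0.000 < 0.05, so the overall model is statistically significant at the 5% level.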

Coefficients (a)

(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients)

Model 1                B        Std. Error   Beta     t        Sig.
(Constant)             17.921   24.557       -        0.730    0.469
Median SAT             0.072    0.018        0.606    4.004    0.000
Acceptance Rate        -0.249   0.083        -0.446   -2.990   0.005
Expenditures/Student   0.000    0.000        -0.282   -2.057   0.046
Top 10% HS             -0.163   0.079        -0.296   -2.051   0.046

a. Dependent Variable: Graduation %

Notes:
The signs of the Unstandardized and Standardized coefficients are always the same.
Unstandardized Coefficients: coefficients that retain the original units of the variables.
Standardized Coefficients: coefficients that have been standardized and no longer retain the original units of the variables.
To explain the economic meaning, the Unstandardized Coefficients are used.
How to read regression results

Step 1: Check whether the regression coefficient is statistically significant.

- Sig. = significance = p-value (the most common significance level is 0.05)

- If p < 0.05 → the regression coefficient is statistically significant → go to step 2

- If p > 0.05 → the coefficient is not statistically significant → there is no evidence that X affects Y

Step 2: Check whether the effect is positive or negative.

- If the sign of the coefficient is (+): X has a positive effect on Y.

◦ Example: Median SAT has a positive effect on the graduation rate.

- If the sign of the coefficient is (-): X has a negative (inverse) effect on Y.

To explain the economic meaning, use the Unstandardized Coefficients.

If X increases by 1 unit, by how many units does Y change, holding all other factors constant?

If the median SAT score increases by 1 point, the graduation rate increases by 0.072 percentage points, holding all other factors constant.

If the acceptance rate increases by 1 percentage point, the graduation rate decreases by 0.249 percentage points, holding all other factors constant.

Graduation = 17.92 + 0.072 × MedianSAT – 0.249 × AcceptanceRate – 0.000136 × Expenditures/Student – 0.163 × Top10%HS
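The fitted equation can be turned into a small prediction function, which also makes the "one-unit change, other factors held constant" reading concrete. This is a sketch only: the input values below are hypothetical and are not taken from the data set, and the function and argument names are illustrative.

```python
# Unstandardized coefficients (B) from the Coefficients table / fitted equation
INTERCEPT = 17.921
B_MEDIAN_SAT = 0.072
B_ACCEPTANCE_RATE = -0.249
B_EXPENDITURES = -0.000136
B_TOP10_HS = -0.163

def predict_graduation(median_sat, acceptance_rate, expenditures, top10_hs):
    """Predicted Graduation % from the fitted multiple regression equation."""
    return (INTERCEPT
            + B_MEDIAN_SAT * median_sat
            + B_ACCEPTANCE_RATE * acceptance_rate
            + B_EXPENDITURES * expenditures
            + B_TOP10_HS * top10_hs)

# Hypothetical college (made-up values, for illustration only)
base = predict_graduation(1200, 40, 30000, 70)
one_more_sat_point = predict_graduation(1201, 40, 30000, 70)

# Raising Median SAT by 1 point, with everything else fixed, changes the
# prediction by exactly the Median SAT coefficient.
print(round(one_more_sat_point - base, 3))
```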

Correlation
Correlations

Variable   Statistic             S1        S2        S3        S4
S1         Pearson Correlation   1         0.739**   0.619**   0.717**
S1         Sig. (2-tailed)       -         0.000     0.000     0.000
S1         N                     1038      1037      952       1033
S2         Pearson Correlation   0.739**   1         0.694**   0.766**
S2         Sig. (2-tailed)       0.000     -         0.000     0.000
S2         N                     1037      1040      952       1034
S3         Pearson Correlation   0.619**   0.694**   1         0.645**
S3         Sig. (2-tailed)       0.000     0.000     -         0.000
S3         N                     952       952       954       951
S4         Pearson Correlation   0.717**   0.766**   0.645**   1
S4         Sig. (2-tailed)       0.000     0.000     0.000     -
S4         N                     1033      1034      951       1035

**. Correlation is significant at the 0.01 level (2-tailed).

Check multicollinearity before running the regression

We run the multicollinearity check before we run the regression.

◦ If there is no multicollinearity problem, we continue with the regression estimation.

◦ If there is a multicollinearity problem, we estimate separate regressions, each containing only one of the highly correlated independent variables.

Full model (with X1 and X2 highly correlated): Y = X1 + X2 + X3 + X4

Model 1: Y = X1 + X3 + X4

Model 2: Y = X2 + X3 + X4

If the multicollinearity is too severe, collect the data again or rebuild the model.
Correlations

Variable               Statistic             Median SAT   Acceptance Rate   Expenditures/Student   Top 10% HS
Median SAT             Pearson Correlation   1            -0.602**          0.573**                0.503**
Median SAT             Sig. (2-tailed)       -            0.000             0.000                  0.000
Median SAT             N                     49           49                49                     49
Acceptance Rate        Pearson Correlation   -0.602**     1                 -0.284*                -0.610**
Acceptance Rate        Sig. (2-tailed)       0.000        -                 0.048                  0.000
Acceptance Rate        N                     49           49                49                     49
Expenditures/Student   Pearson Correlation   0.573**      -0.284*           1                      0.506**
Expenditures/Student   Sig. (2-tailed)       0.000        0.048             -                      0.000
Expenditures/Student   N                     49           49                49                     49
Top 10% HS             Pearson Correlation   0.503**      -0.610**          0.506**                1
Top 10% HS             Sig. (2-tailed)       0.000        0.000             0.000                  -
Top 10% HS             N                     49           49                49                     49

**. Correlation is significant at the 0.01 level (2-tailed).

*. Correlation is significant at the 0.05 level (2-tailed).
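One simple way to screen for multicollinearity from a correlation table like this is to flag every pair of independent variables whose absolute correlation exceeds a cutoff. The notes do not state a cutoff, so the 0.8 used below is purely an assumed rule of thumb; adjust it to whatever the course specifies. Names in the sketch are illustrative.

```python
# Pairwise Pearson correlations between the predictors, from the table above
correlations = {
    ("Median SAT", "Acceptance Rate"): -0.602,
    ("Median SAT", "Expenditures/Student"): 0.573,
    ("Median SAT", "Top 10% HS"): 0.503,
    ("Acceptance Rate", "Expenditures/Student"): -0.284,
    ("Acceptance Rate", "Top 10% HS"): -0.610,
    ("Expenditures/Student", "Top 10% HS"): 0.506,
}

THRESHOLD = 0.8  # ASSUMPTION: rule-of-thumb cutoff, not given in the notes

def flag_pairs(corr, threshold):
    """Return the predictor pairs whose |r| is at or above the threshold."""
    return [(a, b, r) for (a, b), r in corr.items() if abs(r) >= threshold]

flagged = flag_pairs(correlations, THRESHOLD)
strongest_pair, strongest_r = max(correlations.items(), key=lambda kv: abs(kv[1]))

print(flagged)
print(strongest_pair, strongest_r)
```

Under this assumed cutoff no pair is flagged, so all four predictors could stay in a single regression; a stricter cutoff would change that conclusion.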


End-of-semester presentation

Mid-term Exam: 3/3/2025

4 questions:

Run the data => Results

Analyze => as the exam question requires (write fully and in detail).

Regression with categorical variables

Dummy Variables:

- 0
- 1
Coefficients (a)

(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients)

Model 1      B           Std. Error   Beta    t        Sig.
(Constant)   893.588     1824.575     -       0.490    0.628
Age          1044.146    42.141       0.975   24.777   0.000
MBA          14767.232   1351.802     0.430   10.924   0.000

a. Dependent Variable: Salary

The coefficient of MBA is statistically significant and positive. This indicates that having an MBA degree has a positive association with salary.

An employee with an MBA degree earns, on average, 14,767.23 USD more than an employee without an MBA degree, holding all other independent variables constant.
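The dummy-variable reading can be checked by plugging MBA = 1 and MBA = 0 into the fitted equation at the same age. A minimal sketch with the coefficients from the table; the age of 30 is an arbitrary, hypothetical value and the function name is illustrative.

```python
def predict_salary(age, mba):
    """Predicted salary; mba is a dummy variable (1 = has MBA, 0 = no MBA)."""
    return 893.588 + 1044.146 * age + 14767.232 * mba

age = 30  # hypothetical age, any value gives the same gap
gap = predict_salary(age, 1) - predict_salary(age, 0)
print(round(gap, 2))
```

Because the model has no interaction term, the gap equals the MBA coefficient at every age.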
Moderating effect: a moderator can have a positive effect, a negative effect, or no effect on the Age → Salary relationship.

Interaction (interaction variable)

Age_MBA = Age * MBA
Transform → Compute → Age_MBA → Age * MBA → OK

→ Run the regression again with the new independent variable

Coefficients (a)

(B and Std. Error: Unstandardized Coefficients; Beta: Standardized Coefficients)

Model 1       B           Std. Error   Beta     t        Sig.
(Constant)    3902.509    1336.398     -        2.920    0.006
Age           971.309     31.069       0.907    31.263   0.000
MBA           -2971.080   3026.242     -0.086   -0.982   0.334
Interaction   501.848     81.552       0.531    6.154    0.000

a. Dependent Variable: Salary

The results shows that the p

Two rules when running a regression with interaction variables:

1. When regressing with an interaction variable, the component variables (e.g. Age, MBA) must also be included in the model.
2. When regressing with an interaction variable, do not interpret the coefficients of the component variables (e.g. Age, MBA) on their own.
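A short sketch can show why the component coefficients should not be read in isolation once the interaction term is in the model: the effect of Age now depends on MBA, and the MBA gap depends on Age. The coefficients come from the interaction regression table; the function names and the ages used are illustrative.

```python
# Unstandardized coefficients from the regression with the Age * MBA term
B0, B_AGE, B_MBA, B_INTERACTION = 3902.509, 971.309, -2971.080, 501.848

def predict_salary(age, mba):
    """Predicted salary with the interaction term Age * MBA included."""
    return B0 + B_AGE * age + B_MBA * mba + B_INTERACTION * age * mba

def age_slope(mba):
    """Change in predicted salary per extra year of age, for a given MBA status."""
    return B_AGE + B_INTERACTION * mba

def mba_gap(age):
    """Predicted salary difference (MBA minus no MBA) at a given age."""
    return B_MBA + B_INTERACTION * age

print(round(age_slope(0), 3), round(age_slope(1), 3))
print(round(mba_gap(30), 2), round(mba_gap(50), 2))
```

The negative MBA coefficient on its own describes the gap at Age = 0, which is outside any realistic range, so it is not meaningful without the interaction term.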
