
Analysis and comparison

The document presents statistical data on gender and ethnicity within a sample population of 105 individuals, revealing that females constitute 61% and Whites make up 42.9%. It includes various analyses, such as t-tests and ANOVA, demonstrating significant differences in mean values for GPA and price satisfaction based on gender and traveler status, respectively. Overall, the findings highlight notable disparities in gender representation and ethnic composition, as well as significant differences in various performance metrics.
Copyright © All Rights Reserved

gender
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Female        64        61.0        61.0              61.0
        Male          41        39.0        39.0             100.0
        Total        105       100.0       100.0

Percent and Valid Percent are identical because there are no missing values.
Cumulative Percent: running total (61.0, then 61.0 + 39.0 = 100.0).
Comparative: The table compares the two genders by frequency, percentage, and cumulative percentage. Females clearly predominate in the sample: they make up 61% (64 individuals), compared with 39% (41 individuals) for males. Females therefore outnumber males by 23 individuals (64 - 41), a considerable gender gap.
Cumulative Percentage: Because females are the first category, their cumulative percentage equals their individual percentage (61%). Males, the second category, bring the cumulative percentage to 100%, indicating that all individuals in the sample are accounted for.
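To cross-check how the Percent and Cumulative Percent columns are built, here is a minimal Python sketch (outside the SPSS workflow in these notes) that rebuilds a frequency table from the counts. The function name `frequency_table` is hypothetical, and it assumes there are no missing values, so Percent equals Valid Percent.

```python
def frequency_table(counts):
    """Return (label, frequency, percent, cumulative percent) rows.

    Assumes no missing values, so Percent and Valid Percent coincide.
    Categories are processed in the dict's insertion order, like SPSS's row order.
    """
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for label, freq in counts.items():
        percent = 100.0 * freq / total
        cumulative += percent  # running total of the percentages
        rows.append((label, freq, round(percent, 1), round(cumulative, 1)))
    return rows

gender_counts = {"Female": 64, "Male": 41}
ethnicity_counts = {"Native": 5, "Asian": 20, "Black": 24, "White": 45, "Hispanic": 11}

for row in frequency_table(gender_counts):
    print(row)
for row in frequency_table(ethnicity_counts):
    print(row)
```

The same function reproduces the ethnicity table further below, since only the category order and counts change.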

Case 1: The table shows that females make up the majority of the sample at 61%, while males constitute 39%.
Case 2: This table indicates that females represent 61% of the sample, making them the majority, while males account for 39%.
Case 3: The data indicate that women comprise the majority of the sample, accounting for 61% of participants, while men represent the remaining 39%.
ethnicity
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Native         5         4.8         4.8               4.8
        Asian         20        19.0        19.0              23.8
        Black         24        22.9        22.9              46.7
        White         45        42.9        42.9              89.5
        Hispanic      11        10.5        10.5             100.0
        Total        105       100.0       100.0

Analysis:
The table offers a snapshot of the ethnic composition of the sample. Whites are the largest group, accounting for 42.9% of all individuals (nearly half), followed by Black and Asian individuals at 22.9% and 19.0%, respectively.
The table shows that the largest group in the sample is White, comprising 42.9%, followed by Black individuals at 22.9%. Asian individuals make up 19% of the sample, Hispanic individuals account for 10.5%, and Native individuals represent the smallest group at 4.8%.
The demographic breakdown (technical term) reveals that White individuals constitute the largest group in the sample at 42.9%. Black participants form the second largest group, representing 22.9% of the sample. Native participants represent the smallest group, making up 4.8% of the sample.
3.
To show percentages: Analyze -> Descriptive Statistics -> Crosstabs -> Cells -> select Row under Percentages.

Row:
gender * ethnicity Crosstabulation
                                               ethnicity
                                      Native   Asian   Black   White   Hispanic   Total
gender   Female   Count                  4       13      14      26        7        64
                  % within gender      6.3%   20.3%   21.9%   40.6%    10.9%    100.0%
         Male     Count                  1        7      10      19        4        41
                  % within gender      2.4%   17.1%   24.4%   46.3%     9.8%    100.0%
Total             Count                  5       20      24      45       11       105
                  % within gender      4.8%   19.0%   22.9%   42.9%    10.5%    100.0%

Females make up the majority of the sample, and the largest proportion of them are White (40.6%), followed by Black (21.9%), Asian (20.3%), Hispanic (10.9%), and Native (6.3%).
Males also have the highest representation in the White ethnic group (46.3%), followed by Black (24.4%), Asian (17.1%), Hispanic (9.8%), and Native (2.4%).

Column:

gender * ethnicity Crosstabulation
                                         ethnicity
                                        Native   Asian    Black    White   Hispanic   Total
gender   Female   Count                    4       13       14       26        7        64
                  % within ethnicity     80.0%   65.0%    58.3%    57.8%    63.6%     61.0%
         Male     Count                    1        7       10       19        4        41
                  % within ethnicity     20.0%   35.0%    41.7%    42.2%    36.4%     39.0%
Total             Count                    5       20       24       45       11       105
                  % within ethnicity    100.0%  100.0%   100.0%   100.0%   100.0%    100.0%
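The only difference between the Row and Column versions of the crosstab is the denominator. A minimal Python sketch under that reading, using the counts from the tables above (the names `row_percentages` and `column_percentages` are hypothetical):

```python
# Counts from the gender * ethnicity crosstabulation above
ETHNICITIES = ["Native", "Asian", "Black", "White", "Hispanic"]
COUNTS = {
    "Female": [4, 13, 14, 26, 7],
    "Male": [1, 7, 10, 19, 4],
}

def row_percentages(counts):
    """% within gender: each cell divided by its row (gender) total."""
    return {g: [100 * c / sum(row) for c in row] for g, row in counts.items()}

def column_percentages(counts):
    """% within ethnicity: each cell divided by its column (ethnicity) total."""
    col_totals = [sum(col) for col in zip(*counts.values())]
    return {
        g: [100 * c / t for c, t in zip(row, col_totals)]
        for g, row in counts.items()
    }

for label, table in [("Row", row_percentages(COUNTS)), ("Column", column_percentages(COUNTS))]:
    print(label)
    for gender, pcts in table.items():
        print(" ", gender, {e: round(p, 2) for e, p in zip(ETHNICITIES, pcts)})
```

Values are left unrounded here. Python rounds an exact half such as 6.25 to 6.2, whereas SPSS displays 6.3, so compare at two decimals rather than one.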

GRAPH AND CHART:


To create a bar chart: Graphs -> select Bar -> drag the first (yellow) chart type into the canvas at the top -> drag gender onto the axis at the bottom -> OK.
To display the values, double-click the chart in the output -> Elements -> Show Data Labels.

To display percentages, choose:
Bar 2: 1 categorical variable
T1: continuous variable (summarized by its mean)
Graphs -> Chart Builder -> Bar -> gender on the horizontal axis, total on the vertical axis -> OK
The bar chart shows some variability in the mean total across ethnic groups, with an overall average of around 100. The Asian group has the highest mean total (approximately 103), suggesting that Asian individuals in this sample tend to have higher total scores than the other groups. The Hispanic group has the lowest mean total (around 93), suggesting that this group tends to have lower total scores. The remaining groups (Native, Black, and White) have similar mean totals, ranging from 95 to 102, indicating less variation in scores among these groups. These are descriptive differences only: the one-way ANOVA of total by ethnicity later in these notes (Sig. = 0.357 > 0.05) finds no statistically significant difference between the ethnic groups.

LINE GRAPH
PIE CHART
SCATTER PLOT
Relationship between 2 variables: positive, negative, no relationship

→ The relationship between these 2 variables is negative.

T-TEST:
Sig. = significance = p-value (statistical significance); the most common significance level is 0.05.
The decision rules below use only Sig. < 0.05 and Sig. > 0.05; the case Sig. = 0.05 exactly is not treated.
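The same decision rule is applied to every test in these notes, so it can be written once. A minimal Python sketch, assuming the 0.05 level above; the name `decide` is hypothetical, and treating Sig. = 0.05 exactly as "not significant" is this sketch's own assumption, since the notes leave that case open.

```python
ALPHA = 0.05  # common significance level used in these notes

def decide(sig, alpha=ALPHA):
    """Apply the Sig. (p-value) decision rule.

    Assumption: Sig. equal to alpha is treated as not significant.
    For Levene's test, "significant" means the group variances differ.
    """
    if sig < alpha:
        return "statistically significant"
    return "not statistically significant"

# Sig. values reported in the SPSS outputs of this document
examples = {
    "Levene, total by gender": 0.158,
    "t-test, total by gender": 0.224,
    "t-test, gpa by gender": 0.048,
    "paired t-test, quiz1 vs quiz2": 0.005,
}
for name, sig in examples.items():
    print(f"{name}: Sig. = {sig} -> {decide(sig)}")
```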

Group Statistics
          gender   N    Mean     Std. Deviation   Std. Error Mean
total     Female   64   102.03   13.896           1.737
          Male     41    98.29   17.196           2.686

The standard deviation for males (17.196) is greater than for females (13.896).

Independent Samples Test (total)
                              Levene's Test for       t-test for Equality of Means
                              Equality of Variances
                              F       Sig.            t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       2.019   .158            1.224   103      .224              3.739             3.053                   -2.317          9.794
Equal variances not assumed                           1.169   72.421   .246              3.739             3.198                   -2.637         10.114

Step 1: Levene's Test for Equality of Variances
Since Sig. of Levene's test = 0.158 > 0.05, there is no significant difference between the variances of the two groups. Therefore, use the t-test for Equality of Means in the "Equal variances assumed" row.
Step 2: t-test for Equality of Means
Since Sig. (2-tailed) = 0.224 > 0.05, it is concluded that there is no statistically significant difference in the mean of the variable "total" between the two gender groups (female and male).
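The two rows of the Independent Samples Test can be reproduced from the Group Statistics alone. Below is a minimal Python sketch, with hypothetical function names, of the pooled-variance (Student) t for "Equal variances assumed" and Welch's t with Welch-Satterthwaite df for "Equal variances not assumed". Because it uses the rounded means 102.03 and 98.29, the results match SPSS only to about two decimals.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t (equal variances assumed) from group summary statistics."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # Std. Error Difference
    return (m1 - m2) / se, se, n1 + n2 - 2

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t (equal variances not assumed) with Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, se, df

# Group Statistics for "total": (mean, std. deviation, N)
female = (102.03, 13.896, 64)
male = (98.29, 17.196, 41)
print("assumed:     t = %.3f, SE = %.3f, df = %d" % pooled_t(*female, *male))
print("not assumed: t = %.3f, SE = %.3f, df = %.3f" % welch_t(*female, *male))
```

The p-values still come from the t distribution, which SPSS reports as Sig. (2-tailed).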

EXERCISE:

Group Statistics
        gender   N    Mean     Std. Deviation   Std. Error Mean
gpa     Female   64   2.8967   .74622           .09328
        Male     41   2.5949   .76346           .11923

Independent Samples Test (gpa)
                              Levene's Test for       t-test for Equality of Means
                              Equality of Variances
                              F      Sig.             t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       .331   .566             2.004   103      .048              .30184            .15062                  .00312         .60056
Equal variances not assumed                           1.994   83.974   .049              .30184            .15138                  .00080         .60288

Since Sig. of Levene's test = 0.566 > 0.05, there is no significant difference between the variances of the two populations. Therefore, use the result of the t-test for Equality of Means in the "Equal variances assumed" row.
Since Sig. (2-tailed) = 0.048 < 0.05, it is concluded that there is a statistically significant difference in the mean value of the variable "gpa" between the 2 gender groups (female and male).

Analyze -> Compare Means -> Paired-Samples T Test

Paired Samples Statistics
                 Mean   N     Std. Deviation   Std. Error Mean
Pair 1   quiz1   7.47   105   2.481            .242
         quiz2   7.98   105   1.623            .158

Paired Samples Test
                         Paired Differences
                         Mean    Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
Pair 1   quiz1 - quiz2   -.514   1.835            .179              -.869          -.159          -2.872   104   .005

This paired-samples t-test indicates that, for the 105 subjects, there is a statistically significant difference between the means of the 2 quizzes (Sig. = 0.005 < 0.05). In other words, the mean score on the second quiz (M = 7.98) was significantly greater than the mean score on the first quiz (M = 7.47).
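A paired t-test is a one-sample t-test on the differences (quiz1 - quiz2), so t = mean difference / (SD of differences / sqrt(N)). A minimal Python sketch with hypothetical function names: one version works from the Paired Differences summary above, the other from raw paired scores, which are not reproduced in these notes.

```python
import math
from statistics import mean, stdev

def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired-samples t statistic from the mean and SD of the differences."""
    se = sd_diff / math.sqrt(n)  # Std. Error Mean of the differences
    return mean_diff / se, se, n - 1

def paired_t(first, second):
    """Same statistic from raw paired observations (difference = first - second)."""
    diffs = [a - b for a, b in zip(first, second)]
    return paired_t_from_summary(mean(diffs), stdev(diffs), len(diffs))

# Paired Differences row for quiz1 - quiz2
t, se, df = paired_t_from_summary(-0.514, 1.835, 105)
print(f"t = {t:.3f}, SE = {se:.3f}, df = {df}")
```

The sign of t is negative because quiz1 is subtracted from quiz2's higher mean, matching the SPSS output.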

ANOVA
To show the figures in a bar chart with 2 decimal places: double-click the chart -> Element -> Number Format -> enter 2 in the decimal places field (exact field name not noted).
Analyze -> Compare Means -> One-Way ANOVA -> Options -> select Descriptive, Homogeneity of variance test, and Welch.
Dependent List: overall_sat
Factor: status

Test of Homogeneity of Variances
Overall, I am satisfied with the price performance ratio of Oddjob Airways.
                                        Levene Statistic   df1   df2        Sig.
Based on Mean                           .907               2     1062       .404
Based on Median                         .068               2     1062       .934
Based on Median and with adjusted df    .068               2     1017.925   .934
Based on trimmed mean                   .771               2     1062       .463

Step 1: Levene's test
Since Sig. = 0.404 > 0.05, there is no significant difference between the variances of the groups. Therefore, use the results in the ANOVA table.
ANOVA
Overall, I am satisfied with the price performance ratio of Oddjob Airways.
                 Sum of Squares   df     Mean Square   F       Sig.
Between Groups   51.755           2      25.878        9.963   .000
Within Groups    2758.455         1062   2.597
Total            2810.210         1064

Step 2: ANOVA table
Since Sig. = 0.000 < 0.05, it is concluded that there is a statistically significant difference in the mean value of price satisfaction between at least 2 groups.
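The F in an ANOVA table is the ratio of the two Mean Squares, each being a Sum of Squares divided by its df. A minimal Python sketch (the name `anova_f` is hypothetical) that recomputes F for this table and for the total-by-ethnicity ANOVA later in these notes:

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """F ratio of a one-way ANOVA from the sums of squares and degrees of freedom."""
    ms_between = ss_between / df_between  # Mean Square between groups
    ms_within = ss_within / df_within     # Mean Square within groups
    return ms_between, ms_within, ms_between / ms_within

# Price satisfaction by traveler status
print(anova_f(51.755, 2, 2758.455, 1062))
# total by ethnicity
print(anova_f(1033.572, 4, 23310.142, 100))
```

Sig. is then the upper-tail probability of this F under the F(df_between, df_within) distribution, which SPSS computes for you.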

POST HOC TEST (PART OF ANOVA)

Multiple Comparisons
Dependent Variable: Overall, I am satisfied with the price performance ratio of Oddjob Airways.
Tukey HSD
(I) Traveler   (J) Traveler   Mean Difference   Std.    Sig.    95% CI Lower   95% CI Upper
Status         Status         (I-J)             Error           Bound          Bound
Blue           Silver          .440*            .120    .001     .16            .72
               Gold            .487*            .148    .003     .14            .83
Silver         Blue           -.440*            .120    .001    -.72           -.16
               Gold            .047             .170    .959    -.35            .44
Gold           Blue           -.487*            .148    .003    -.83           -.14
               Silver         -.047             .170    .959    -.44            .35
*. The mean difference is significant at the 0.05 level.

Since Sig. = 0.001 < 0.05, it is concluded that there is a statistically significant difference in the mean value of price satisfaction between the Blue and Silver groups.
Since Sig. = 0.003 < 0.05, it is concluded that there is a statistically significant difference in the mean value of price satisfaction between the Blue and Gold groups.
Since Sig. = 0.959 > 0.05, it is concluded that there is no statistically significant difference in the mean value of price satisfaction between the Gold and Silver groups.

30/10/2024

Test of Homogeneity of Variances
total
                                        Levene Statistic   df1   df2      Sig.
Based on Mean                           .930               4     100      .450
Based on Median                         .449               4     100      .773
Based on Median and with adjusted df    .449               4     78.532   .773
Based on trimmed mean                   .825               4     100      .512
Since Sig. (0.450) > 0.05, there is no significant difference between the variances of the groups. Therefore, we use the results in the ANOVA table.

ANOVA
total
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   1033.572         4     258.393       1.109   .357
Within Groups    23310.142        100   233.101
Total            24343.714        104

Since Sig. (0.357) > 0.05, it can be concluded that there is no statistically significant difference between ethnic groups in the mean value of the total variable.

CORRELATION
What is correlation: a common measure of how strongly two variables are related to each other.
How to measure correlation: a common method is the Pearson correlation coefficient, which measures the linear correlation between two sets of numeric data.
The correlation coefficient ranges from −1 to 1, where −1 indicates a perfect negative linear relationship and 1 indicates a perfect positive linear relationship.
r = 0: no linear relationship
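Pearson's r is the covariance of the two variables divided by the product of their standard deviations. A minimal Python sketch of that formula (the name `pearson_r` is hypothetical). The raw S1 to S4 responses are not in these notes, so the usage lines use small made-up sequences to show the −1 and 1 endpoints.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive linear relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative linear relationship
print(pearson_r([1, 2, 3], [1, 3, 2]))        # moderate positive relationship
```

SPSS computes each cell of the correlation matrix this way on the cases available for that pair, which is why N differs between cells below.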
Correlations
                               S1        S2        S3        S4
S1   Pearson Correlation       1         .739**    .619**    .717**
     Sig. (2-tailed)                     .000      .000      .000
     N                         1038      1037      952       1033
S2   Pearson Correlation       .739**    1         .694**    .766**
     Sig. (2-tailed)           .000                .000      .000
     N                         1037      1040      952       1034
S3   Pearson Correlation       .619**    .694**    1         .645**
     Sig. (2-tailed)           .000      .000                .000
     N                         952       952       954       951
S4   Pearson Correlation       .717**    .766**    .645**    1
     Sig. (2-tailed)           .000      .000      .000
     N                         1033      1034      951       1035
**. Correlation is significant at the 0.01 level (2-tailed).

When a variable is correlated with itself, the Pearson correlation coefficient = 1 (the diagonal of the matrix).
Relationships among several variables: the correlation matrix shows the Pearson correlation for each pairwise combination of variables.
The correlation between S1 ("… with Oddjob Airways you will arrive on time.") and S2 ("… the entire journey with Oddjob Airways will occur as booked.") is 0.739, which indicates a strong positive relationship.
The correlation between S1 ("… with Oddjob Airways you will arrive on time.") and S3 ("… in case something does not work out as planned, Oddjob Airways will find a good solution.") is 0.619, which indicates a fairly strong positive relationship.
Overall, S1, S2, S3 and S4 are strongly and positively related to one another (r between 0.619 and 0.766, all significant at the 0.01 level).

REGRESSION ANALYSIS
Slide
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .700a   .490       .456                5.496
a. Predictors: (Constant), Top 10% HS, Median SAT, Acceptance Rate

Interpreting R-square:
53% of the variation in the dependent variable (Graduation %) is explained by the independent variables in the model (specifically, Top 10% HS, Median SAT, Expenditures/Student, Acceptance Rate).
Note: this interpretation refers to the four-predictor model that includes Expenditures/Student. The Model Summary above comes from the three-predictor model, where R Square = .490, i.e. 49% of the variation in Graduation % is explained.
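R-square has 2 limitations:
1. It depends on sample size: when the sample is small relative to the number of independent variables, R-square tends to be inflated.
2. When more independent variables are added to the model, R-square increases (it never decreases), even if the new variables add little.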
Adjusted R Square
Adjusted R Square = 0.492
49% of the variation in the dependent variable (Graduation %) is explained by the independent variables in the model (specifically, Top 10% HS, Median SAT, Expenditures/Student, Acceptance Rate), adjusting for sample size and the number of independent variables in the model. (For the three-predictor model in the Model Summary above, Adjusted R Square = .456.)
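The adjustment uses the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables. A minimal Python sketch (hypothetical name `adjusted_r2`) that reproduces the Model Summary value. It assumes n = 49, which follows from the ANOVA table's Total df of 48 and matches the N = 49 in the correlation table.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Three-predictor Model Summary: R Square = .490, n = 49, k = 3
print(round(adjusted_r2(0.490, 49, 3), 3))
```

Because k appears in the denominator, adding a weak predictor can lower adjusted R² even though plain R² goes up, which addresses limitation 2 above.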

ANOVA(a)
Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    1303.928         3    434.643       14.391   .000b
   Residual      1359.133         45   30.203
   Total         2663.061         48
a. Dependent Variable: Graduation %
b. Predictors: (Constant), Top 10% HS, Median SAT, Acceptance Rate

Compare Sig. with 0.05:
If Sig. < 0.05 → the overall model is statistically significant at the 5% significance level.
If Sig. > 0.05 → the overall model is not statistically significant at the 5% significance level.
Here Sig. = .000 < 0.05, so the overall model is statistically significant.
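The regression ANOVA table and the Model Summary are linked: R Square = Regression SS / Total SS, and F = Regression Mean Square / Residual Mean Square. A minimal Python sketch (hypothetical name `regression_summary`) using the numbers above:

```python
def regression_summary(ss_regression, df_regression, ss_residual, df_residual):
    """R-square and F statistic from a regression ANOVA table."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total  # share of variation explained by the model
    f = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r2, f

r2, f = regression_summary(1303.928, 3, 1359.133, 45)
print(f"R Square = {r2:.3f}, F = {f:.3f}")
```

The recomputed R Square agrees with the .490 reported in the three-predictor Model Summary.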

6/11/2024
COEFFICIENTS:
How to read regression results (regression model):
Step 1: Check whether the regression coefficient is statistically significant.
Sig. = significance = p-value (statistical significance); the common significance level is 0.05.
If Sig. < 0.05 → statistically significant → go to Step 2.
If Sig. > 0.05 → not statistically significant → X has no significant impact on Y.
Step 2: Is the impact positive or negative?
If the sign of the coefficient is (+): X has a positive impact on Y. For example, Median SAT scores have a positive impact on the graduation rate.
If the sign of the coefficient is (-): X has a negative (inverse) impact on Y.
Notes:
- The signs of the Unstandardized and Standardized coefficients are always the same.
- Unstandardized Coefficients: coefficients that keep the original units of the variables.
- Standardized Coefficients: coefficients that have been standardized and no longer keep the original units of the variables.
- To explain the economic meaning, use the Unstandardized Coefficients.
For continuous variables:
If X increases by 1 unit, by how many units does Y change, holding all other factors constant?
For example, if the median SAT score increases by 1 point, the graduation rate increases by 0.072 percentage points, holding other factors constant.
If the acceptance rate increases by 1 percentage point, the graduation rate decreases by 0.249 percentage points, holding other factors constant.
Multicollinearity: a problem that occurs when there is a very strong correlation between independent variables in the model.
When |r| > 0.7: a sign of multicollinearity.
Two consequences of multicollinearity in a model:
1. The sign of a coefficient may change (e.g., the true sign of the coefficient is positive, but because of multicollinearity it becomes negative).
2. The Sig. value of a coefficient increases, which can make the variable lose its statistical significance.
How to check: Analyze -> Correlate -> Bivariate

Correlations
                                            Median SAT   Acceptance Rate   Expenditures/Student   Top 10% HS
Median SAT             Pearson Correlation  1            -.602**           .573**                 .503**
                       Sig. (2-tailed)                   .000              .000                   .000
                       N                    49           49                49                     49
Acceptance Rate        Pearson Correlation  -.602**      1                 -.284*                 -.610**
                       Sig. (2-tailed)      .000                           .048                   .000
                       N                    49           49                49                     49
Expenditures/Student   Pearson Correlation  .573**       -.284*            1                      .506**
                       Sig. (2-tailed)      .000         .048                                     .000
                       N                    49           49                49                     49
Top 10% HS             Pearson Correlation  .503**       -.610**           .506**                 1
                       Sig. (2-tailed)      .000         .000              .000
                       N                    49           49                49                     49
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Conclusion: since no pair of independent variables has |r| > 0.7, there is no sign of multicollinearity.
Y = x1 + x2 + x3 + x4
Model 1: x1 + x2 + x3 + x4
If many pairs are strongly correlated, re-run the analysis or collect the data again.
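The |r| > 0.7 rule of thumb can be applied mechanically to the off-diagonal cells of the correlation table. A minimal Python sketch using the coefficients above; the name `flag_multicollinearity` is hypothetical, and 0.7 is the threshold from these notes rather than a fixed standard.

```python
# Pearson correlations among the independent variables (from the table above)
CORR = {
    ("Median SAT", "Acceptance Rate"): -0.602,
    ("Median SAT", "Expenditures/Student"): 0.573,
    ("Median SAT", "Top 10% HS"): 0.503,
    ("Acceptance Rate", "Expenditures/Student"): -0.284,
    ("Acceptance Rate", "Top 10% HS"): -0.610,
    ("Expenditures/Student", "Top 10% HS"): 0.506,
}

def flag_multicollinearity(corr, threshold=0.7):
    """Return variable pairs whose absolute correlation exceeds the threshold."""
    return [pair for pair, r in corr.items() if abs(r) > threshold]

flagged = flag_multicollinearity(CORR)
print(flagged if flagged else "No pair exceeds |r| > 0.7: no sign of multicollinearity")
```

The absolute value matters: the strongest pairs here (−0.602 and −0.610) are negative.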

REGRESSION WITH CATEGORICAL VARIABLES

Categorical data can be included as independent variables, but must be coded numerically using dummy variables.
For variables with 2 categories, code as 0 and 1.
How to do it: Analyze -> Regression -> Linear -> select Salary as the dependent variable -> select Age and MBA as the independent variables.
Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B            Std. Error       Beta                        t        Sig.
1  (Constant)     893.588      1824.575                                     .490     .628
   Age            1044.146     42.141           .975                        24.777   .000
   MBA            14767.232    1351.802         .430                        10.924   .000
a. Dependent Variable: Salary
Salary = 893.59 + 1044.15 × Age + 14767.23 × MBA
◦ If MBA = 0: Salary = 893.59 + 1044.15 × Age
◦ If MBA = 1: Salary = 893.59 + 1044.15 × Age + 14767.23 = 15660.82 + 1044.15 × Age

- The coefficient of MBA is statistically significant (Sig. = .000) and positive. This indicates that having an MBA degree has a positive association with (influence on) salary.
- An employee with an MBA degree is predicted to earn 14767.23 USD more than an employee without an MBA degree, keeping all other independent variables (here, Age) constant.
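The fitted equation can be evaluated directly to show that, at any fixed age, the MBA dummy shifts predicted salary by exactly its coefficient. A minimal Python sketch using the Unstandardized B values above; the function name `predicted_salary` and the example age of 40 are illustrative.

```python
# Unstandardized coefficients (B) from the Coefficients table
B0, B_AGE, B_MBA = 893.588, 1044.146, 14767.232

def predicted_salary(age, mba):
    """Predicted salary; mba is a dummy variable (1 = has an MBA, 0 = does not)."""
    if mba not in (0, 1):
        raise ValueError("mba must be coded 0 or 1")
    return B0 + B_AGE * age + B_MBA * mba

# Same age, with vs. without an MBA: the gap equals the MBA coefficient
gap = predicted_salary(40, 1) - predicted_salary(40, 0)
print(round(gap, 3))
```

Since the model has no interaction term, the gap is the same at every age; the moderation model below relaxes that.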

MBA (MODERATING IMPACT)

Transform -> Compute Variable -> name the target variable age_MBA -> move the variables across to form (Age * MBA) => a new column named age_MBA appears in Data View.
The moderating impact of MBA on the effect of Age on salary has 3 possible outcomes:
-> 1. positive moderating impact
-> 2. negative moderating impact
-> 3. no moderating impact
Run the regression: Analyze => Regression => Linear
Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B            Std. Error       Beta                        t        Sig.
1  (Constant)     3902.509     1336.398                                     2.920    .006
   Age            971.309      31.069           .907                        31.263   .000
   MBA            -2971.080    3026.242         -.086                       -.982    .334
   age_MBA        501.848      81.552           .531                        6.154    .000
a. Dependent Variable: Salary
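With the interaction term, the effect of Age on salary depends on MBA: the Age slope is B_Age + B_age_MBA × MBA. A minimal Python sketch using the B values above (function names are hypothetical). Read with the two-step rule in these notes, the positive and significant age_MBA coefficient (Sig. = .000) would point to a positive moderating impact. That is an interpretation, not a statement made in the original output.

```python
# Unstandardized coefficients (B) from the moderation model
B0, B_AGE, B_MBA, B_AGE_MBA = 3902.509, 971.309, -2971.080, 501.848

def predicted_salary(age, mba):
    """Salary = B0 + B_AGE*Age + B_MBA*MBA + B_AGE_MBA*(Age*MBA)."""
    return B0 + B_AGE * age + B_MBA * mba + B_AGE_MBA * age * mba

def age_slope(mba):
    """Change in predicted salary per extra year of age, for a given MBA value."""
    return B_AGE + B_AGE_MBA * mba

print("Age slope without MBA:", age_slope(0))
print("Age slope with MBA:   ", round(age_slope(1), 3))
```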
CATEGORICAL VARIABLES WITH MORE THAN 2 LEVELS
A variable with k levels needs k - 1 dummy variables.
Example: there are 4 cutting tools, A, B, C and D => k = 4 => 4 - 1 = 3 dummy variables.
