Analysis and comparison
Case 1: The table shows that females make up the majority of the sample at 61%, while
males constitute 39%.
Case 2: This table indicates that females represent 61% of the sample, making them
the majority, while males account for 39%.
Case 3: The data indicate that women comprise the majority of the sample,
accounting for 61% of participants, while men represent the remaining 39%.
ethnicity
                  Frequency  Percent  Valid Percent  Cumulative Percent
Valid  Native             5      4.8            4.8                 4.8
       Asian             20     19.0           19.0                23.8
       Black             24     22.9           22.9                46.7
       White             45     42.9           42.9                89.5
       Hispanic          11     10.5           10.5               100.0
       Total            105    100.0          100.0
Analysis:
The table gives a snapshot of the ethnic composition of the sample. White
respondents are clearly predominant: they form the largest group, accounting for
nearly half of all individuals (42.9%), followed by Black and Asian respondents,
who account for 22.9% and 19.0% respectively.
The table shows that the largest group in the sample is White, comprising 42.9%,
followed by Black individuals at 22.9%. Asians make up 19.0% of the sample,
Hispanics account for 10.5%, and Natives represent the smallest group at 4.8%.
The demographic breakdown (a technical term) reveals that White
individuals constitute the largest group in the sample at 42.9%. Black participants
form the second largest group, representing 22.9% of the sample. Native
participants represent the smallest group, making up 4.8% of the sample.
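The Percent and Cumulative Percent columns of the ethnicity table can be reproduced from the counts alone. The sketch below is only an illustration in Python, not SPSS output; the variable names are assumptions made for this example.

```python
# Counts copied from the ethnicity frequency table above.
counts = {"Native": 5, "Asian": 20, "Black": 24, "White": 45, "Hispanic": 11}

total = sum(counts.values())

rows = []        # (group, count, percent, cumulative percent)
cumulative = 0
for group, n in counts.items():
    cumulative += n
    percent = 100 * n / total
    cumulative_percent = 100 * cumulative / total
    rows.append((group, n, percent, cumulative_percent))

for group, n, percent, cumulative_percent in rows:
    print(f"{group:<9}{n:>5}{percent:>8.1f}{cumulative_percent:>8.1f}")
```

Each percent is the group count divided by the total of 105, and each cumulative percent adds up the counts of all groups listed so far.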
3.
To show percentages, go to Analyze -> Descriptive Statistics -> Crosstabs -> Cells ->
select Row under Percentages.
Row:
gender * ethnicity Crosstabulation
                                          ethnicity
gender                     Native  Asian  Black  White  Hispanic   Total
Female  Count                   4     13     14     26         7      64
        % within gender      6.3%  20.3%  21.9%  40.6%     10.9%  100.0%
Male    Count                   1      7     10     19         4      41
        % within gender      2.4%  17.1%  24.4%  46.3%      9.8%  100.0%
Total   Count                   5     20     24     45        11     105
        % within gender      4.8%  19.0%  22.9%  42.9%     10.5%  100.0%
Females make up the majority of the sample, and their distribution across ethnic
groups shows that the largest proportion are White (40.6%), followed by Black
(21.9%), Asian (20.3%), Hispanic (10.9%), and Native (6.3%).
Males also have the highest representation in the White ethnic group (46.3%),
followed by Black (24.4%), Asian (17.1%), Hispanic (9.8%), and Native (2.4%).
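The "% within gender" rows are row percentages: each count divided by its own row total. A minimal Python sketch of that arithmetic follows, using the counts from the crosstab; the function and variable names are made up for this illustration.

```python
# Counts copied from the gender * ethnicity crosstab above.
ethnicities = ["Native", "Asian", "Black", "White", "Hispanic"]
counts = {
    "Female": [4, 13, 14, 26, 7],
    "Male": [1, 7, 10, 19, 4],
}

def row_percentages(row):
    """Return each cell as a percentage of its row total."""
    row_total = sum(row)
    return [100 * n / row_total for n in row]

row_pct = {gender: row_percentages(row) for gender, row in counts.items()}

for gender, percentages in row_pct.items():
    cells = ", ".join(f"{e} {p:.2f}%" for e, p in zip(ethnicities, percentages))
    print(f"{gender}: {cells}")
```

Two decimals are printed here so that values such as 6.25% are shown exactly; SPSS rounds them to one decimal in the table.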
Column:
To show the %, select
Bar 2: 1 categorical variable
T1: continuous variable (summarized by its mean)
Graphs -> Chart Builder -> Bar -> gender on the horizontal axis, total on the vertical
axis -> OK
The bar chart shows some variability in the mean total across ethnic groups. The
overall average is around 100. The Asian group has the highest mean total, at
approximately 103, which suggests that individuals in this group tend to have
higher total scores than those in the other groups. The Hispanic group has the
lowest mean total, at around 93, which suggests that students in this group tend
to have the lowest scores. The remaining groups (Native, Black, and White) have
relatively similar mean totals, ranging from 95 to 102, indicating that there is
less variation in scores among these groups.
LINE GRAPH
PIE CHART
SCATTER PLOT
Relationship between 2 variables: positive, negative, no relationship
T-TEST:
Sig. = significance = p-value (statistical significance); the common significance level
is 0.05.
The case Sig. = 0.05 exactly does not occur.
Group Statistics
       gender    N    Mean    Std. Deviation  Std. Error Mean
total  Female   64  102.03          13.896            1.737
       Male     41   98.29          17.196            2.686
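The "Std. Error Mean" column follows from the other two columns: it is the standard deviation divided by the square root of N. A short Python check of this, using the values from the table above (the names are illustrative only):

```python
import math

# (N, Std. Deviation) for the variable "total", from the Group Statistics table.
groups = {"Female": (64, 13.896), "Male": (41, 17.196)}

# Standard error of the mean = standard deviation / sqrt(N)
std_error = {name: sd / math.sqrt(n) for name, (n, sd) in groups.items()}

for name, se in std_error.items():
    print(f"{name}: Std. Error Mean = {se:.3f}")
```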
EXERCISE:
Group Statistics
       gender    N    Mean    Std. Deviation  Std. Error Mean
gpa    Female   64  2.8967          .74622           .09328
       Male     41  2.5949          .76346           .11923
Since Sig. of the test for equality of variances = 0.556 > 0.05, the variances of the
two populations are not significantly different. Therefore, use the result of the
t-test for Equality of Means in the line “Equal variances assumed”.
Sig. = 0.048 < 0.05, so it is concluded that there is a statistically significant
difference in the mean value of the variable “gpa” between the 2 gender groups
(female and male).
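As a cross-check, the equal-variances t statistic can be recomputed from the gpa Group Statistics table alone. The sketch below is not how SPSS computes it internally; it uses only the Python standard library and approximates the two-tailed p-value by integrating the t density with Simpson's rule. All function names are assumptions made for this example.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and degrees of freedom, equal variances assumed."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value: Simpson's rule on [0, |t|], steps even."""
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area

# Means, standard deviations and N for gpa (Female, then Male) from the table.
t_stat, df = pooled_t(2.8967, 0.74622, 64, 2.5949, 0.76346, 41)
p_value = two_tailed_p(t_stat, df)
print(f"t({df}) = {t_stat:.3f}, two-tailed p = {p_value:.3f}")
```

The p-value obtained this way should land close to the Sig. = 0.048 quoted above, which is below 0.05.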
This paired-samples t-test indicates that, for the 105 subjects, there is a
statistically significant difference in the means of the 2 quizzes
(Sig. = 0.005 < 0.05). In other words, the mean score on the second quiz
(M = 7.98) was significantly greater than the mean score on the first quiz
(M = 7.47).
ANOVA
Create a bar chart; to show the percentage figures: double-click -> Element -> Number
Format -> enter 2 in the decimal number/places field (exact name forgotten).
Analyze -> Compare Means -> One-Way ANOVA -> Options -> select Descriptive,
Homogeneity of variance test, Welch
Dependent List: overall_sat
Factor: status
Step 2
Since Sig. = 0.000 < 0.05, it is concluded that there is a statistically significant
difference between at least 2 groups in the mean value of price satisfaction.
Multiple Comparisons
Dependent Variable: Overall, I am satisfied with the price performance ratio of Oddjob
Airways.
Tukey HSD
(I) Traveler  (J) Traveler  Mean Difference  Std.          95% Confidence Interval
Status        Status        (I-J)            Error  Sig.   Lower Bound  Upper Bound
Blue          Silver         .440*           .120   .001    .16          .72
              Gold           .487*           .148   .003    .14          .83
Silver        Blue          -.440*           .120   .001   -.72         -.16
              Gold           .047            .170   .959   -.35          .44
Gold          Blue          -.487*           .148   .003   -.83         -.14
              Silver        -.047            .170   .959   -.44          .35
*. The mean difference is significant at the 0.05 level.
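Reading the Tukey HSD table comes down to comparing each pair's Sig. with 0.05. A small Python sketch of that reading, with the three distinct pairs copied from the table (the list layout is an assumption for this example):

```python
ALPHA = 0.05

# (group I, group J, mean difference I-J, Sig.) from the Tukey HSD table above.
comparisons = [
    ("Blue", "Silver", 0.440, 0.001),
    ("Blue", "Gold", 0.487, 0.003),
    ("Silver", "Gold", 0.047, 0.959),
]

# Pairs whose mean difference is significant at the 0.05 level.
significant = [(i, j) for i, j, diff, sig in comparisons if sig < ALPHA]

for i, j, diff, sig in comparisons:
    verdict = "significant" if sig < ALPHA else "not significant"
    print(f"{i} vs {j}: difference = {diff:.3f}, Sig. = {sig:.3f} -> {verdict}")
```

Blue differs from both Silver and Gold, while Silver and Gold do not differ significantly from each other.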
30/10/2024
ANOVA
total
                Sum of Squares   df  Mean Square      F  Sig.
Between Groups        1033.572    4      258.393  1.109  .357
Within Groups        23310.142  100      233.101
Since Sig. (0.357) > 0.05, it can be concluded that there is no statistically
significant difference between ethnic groups in the mean value of the total
variable.
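The Mean Square and F columns of the ANOVA table follow directly from the sums of squares and degrees of freedom. A minimal Python sketch of that arithmetic, using the table values (variable names are illustrative):

```python
# Values copied from the ANOVA table for "total" by ethnicity.
ss_between, df_between = 1033.572, 4
ss_within, df_within = 23310.142, 100

# Mean square = sum of squares / degrees of freedom
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F = between-groups mean square / within-groups mean square
f_stat = ms_between / ms_within

print(f"MS between = {ms_between:.3f}")
print(f"MS within  = {ms_within:.3f}")
print(f"F          = {f_stat:.3f}")
```

An F value close to 1 means the variation between groups is about the same size as the variation within groups, which is consistent with the non-significant result.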
CORRELATION
What is correlation: a common measure of how strongly two variables are related to
each other.
How to measure correlation: a common statistical method is the Pearson correlation,
a measure of the linear correlation between two sets of numeric data.
The correlation coefficient ranges from −1 to 1, where −1 indicates a perfect
negative relationship and 1 indicates a perfect positive relationship.
r = 0: no linear relationship
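To make the −1 to 1 range concrete, here is a small Python sketch of the Pearson coefficient computed by hand. The data are hypothetical and chosen only to produce a perfect positive and a perfect negative relationship; they are not from the course dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases exactly in step with x
falling = [10, 8, 6, 4, 2]   # decreases exactly in step with x

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
print(f"rising:  r = {r_rising:.2f}")
print(f"falling: r = {r_falling:.2f}")
```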
Correlations
S1 S2 S3 S4
When a variable is correlated with itself, the Pearson correlation coefficient = 1.
Relationships among several variables: the correlation matrix shows the Pearson
correlation for each pairwise combination of variables.
The correlation between s1 (“… with Oddjob Airways you will arrive on time.”)
and s2 (“… the entire journey with Oddjob Airways will occur as booked.”) is
0.739, which indicates a strong relationship.
The correlation between s1 (“… with Oddjob Airways you will arrive on time.”)
and s3 (“… in case something does not work out as planned, Oddjob Airways
will find a good solution.”) is 0.619, which indicates a fairly strong relationship.
Overall, S1, S2, S3 and S4 are strongly related to one another.
REGRESSION ANALYSIS
Slide
Model Summary
Model  R  R Square  Adjusted R Square  Std. Error of the Estimate
ANOVA
Model  Sum of Squares  df  Mean Square  F  Sig.
Total        2663.061  48
6/11/2024
COEFFICIENTS:
How to read regression results (regression model):
Step 1: Check whether the regression coefficient is statistically significant.
Sig. = significance = p-value (statistical significance); the common significance
level is 0.05.
If Sig. < 0.05 → statistically significant → go to step 2
If Sig. > 0.05 → not statistically significant → X has no statistically significant
impact on Y.
Step 2: Is the impact positive or negative?
If the coefficient sign is (+): X has a positive impact on Y. For example, Median
SAT scores have a positive impact on graduation rates.
If the coefficient sign is (-): X has a negative (inverse) impact on Y.
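The two steps can be written as a small decision rule. The Python sketch below is only an illustration: the function name is invented, and the coefficient and Sig. values passed to it are hypothetical, not taken from any SPSS output in these notes.

```python
ALPHA = 0.05  # common significance level

def interpret(name, coefficient, sig):
    """Step 1: check significance. Step 2: read the sign of the coefficient."""
    if sig >= ALPHA:
        return f"{name}: not statistically significant"
    direction = "positive" if coefficient > 0 else "negative"
    return f"{name}: statistically significant, {direction} impact on Y"

# Hypothetical coefficients and Sig. values, for illustration only.
print(interpret("X1", 0.5, 0.010))
print(interpret("X2", -1.2, 0.020))
print(interpret("X3", 0.8, 0.300))
```

The sign is only interpreted after the coefficient has passed the significance check in step 1.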
Notes:
The signs of the Unstandardized and Standardized coefficients are always the
same.
Unstandardized Coefficients: coefficients that retain the original units of the
variable.
Standardized Coefficients: coefficients that have been standardized and no longer
retain the original units of the variable.
To explain the economic meaning, use the Unstandardized Coefficients.
These are continuous variables.
If X increases by 1 unit, by how many units does Y change, keeping other factors
unchanged?
If the median SAT score increases by 1 point, the graduation rate increases by
0.072%, keeping other factors unchanged.
If the acceptance rate increases by 1%, the graduation rate decreases by 0.249%,
keeping other factors unchanged.
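The "1 unit of X" reading scales linearly: a change of several units in X multiplies the unstandardized coefficient by that change, holding the other variables fixed. A minimal Python sketch using the two coefficients quoted in these notes (0.072 for median SAT, -0.249 for acceptance rate); the dictionary keys and function name are assumptions for this example.

```python
# Unstandardized coefficients quoted in the notes above.
coefficients = {"median_sat": 0.072, "acceptance_rate": -0.249}

def predicted_change(changes):
    """Predicted change in graduation rate for the given changes in X,
    keeping every variable not listed unchanged."""
    return sum(coefficients[name] * delta for name, delta in changes.items())

sat_effect = predicted_change({"median_sat": 10})            # SAT up 10 points
acceptance_effect = predicted_change({"acceptance_rate": 5})  # acceptance up 5%

print(f"Median SAT +10 points: {sat_effect:+.3f}")
print(f"Acceptance rate +5%:   {acceptance_effect:+.3f}")
```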
Multicollinearity: a problem that arises when the independent variables in the
model are very strongly correlated with each other.
|r| > 0.7: a sign of multicollinearity
2 consequences of a model with multicollinearity:
1. The sign of a coefficient can change (e.g. the true sign of the coefficient is
positive, but because of multicollinearity it turns negative).
2. The Sig. value of a coefficient is inflated, so the variable loses statistical
significance.
How to check: Analyze -> Correlate -> Bivariate
Correlations
                                            Median  Acceptance  Expenditures/  Top 10%
                                            SAT     Rate        Student        HS
Median SAT            Pearson Correlation   1       -.602**     .573**         .503**
                      Sig. (2-tailed)               .000        .000           .000
                      N                     49      49          49             49
Acceptance Rate       Pearson Correlation   -.602** 1           -.284*         -.610**
                      Sig. (2-tailed)       .000                .048           .000
                      N                     49      49          49             49
Expenditures/Student  Pearson Correlation   .573**  -.284*      1              .506**
                      Sig. (2-tailed)       .000    .048                       .000
                      N                     49      49          49             49
Top 10% HS            Pearson Correlation   .503**  -.610**     .506**         1
                      Sig. (2-tailed)       .000    .000        .000
                      N                     49      49          49             49
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Conclusion: since no pair of variables has |r| > 0.7, there is no multicollinearity.
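The |r| > 0.7 rule of thumb can be applied mechanically to the six distinct pairs in the Correlations table. A short Python sketch of that check, with the coefficients copied from the table (the dictionary layout is an assumption for this example):

```python
THRESHOLD = 0.7  # rule of thumb from these notes

# Pearson correlations for each distinct pair, from the Correlations table.
correlations = {
    ("Median SAT", "Acceptance Rate"): -0.602,
    ("Median SAT", "Expenditures/Student"): 0.573,
    ("Median SAT", "Top 10% HS"): 0.503,
    ("Acceptance Rate", "Expenditures/Student"): -0.284,
    ("Acceptance Rate", "Top 10% HS"): -0.610,
    ("Expenditures/Student", "Top 10% HS"): 0.506,
}

# Pairs whose absolute correlation exceeds the threshold.
flagged = {pair: r for pair, r in correlations.items() if abs(r) > THRESHOLD}

if flagged:
    for (a, b), r in flagged.items():
        print(f"Possible multicollinearity: {a} and {b} (r = {r})")
else:
    print("No pair exceeds |r| > 0.7")
```

The absolute value matters here because a strong negative correlation is as much a sign of multicollinearity as a strong positive one.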
Y = x1 + x2 +x3 +x4
Model 1: x1+x2 +x3+x4
If there are many strongly correlated pairs, the model should be rerun or the data
collected again.