AFM_Module 6
Decision making process
Statistical methods
Descriptive statistics
▪ It is the discipline of quantitatively describing the main features of a
collection of data
▪ Collecting, summarizing, and processing data into useful information
Inferential statistics
▪ Provides the basis for predictions and estimates that turn
information about a sample into inferences about a population
Descriptive vs. inferential
Statistical analysis
▪ Data/information (what)
▪ Scientific reasoning (Who? where? how? what happens?)
▪ Finding (what results?)
▪ Lesson/conclusion (so what? so how?)
Key definitions
Descriptive statistics
The Mean:
• Mean is another word for average.
• Most commonly used statistic that tells us about the centre of a data
set.
• The mean is important because it is involved in many statistical tests.
• E.g., t-test, ANOVA, etc.
• The sum of all values divided by the number of observations.
Example: n = 4 observations whose values sum to 307
Mean = 307 ÷ 4 = 76.75
• Mean = (3 + 1 + 4 + 2.5 + 50) / 5 = 12.1
• Median: 1, 2.5, 3, 4, 50 → 3
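The mean and median example above can be checked with a short sketch. It uses only Python's standard `statistics` module; the variable names, and the extra comparison without the value 50, are illustrative assumptions rather than part of the slides.

```python
from statistics import mean, median

# Data from the example above; 50 is far from the other values.
values = [3, 1, 4, 2.5, 50]

print(mean(values))    # 12.1
print(median(values))  # 3

# Illustrative comparison (not in the slides): drop the extreme value.
without_extreme = [v for v in values if v != 50]
print(mean(without_extreme))    # 2.625
print(median(without_extreme))  # 2.75
```

The single extreme value pulls the mean from about 2.6 up to 12.1, while the median barely moves.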
Measures of spread
The Variance (s²)
• A measure to show how spread out our observations are
• Based on the distance of each observation from the mean
• Expressed in squared units (e.g., kg²)
• The average of the squared differences from the mean (for a sample, the sum is divided by n − 1)
Dog 1: 10kg
Dog 2: 12kg
Dog 3: 13kg
Dog 4: 17kg
Dog 5: 20kg
Dog 6: 24kg
Mean = 16kg
1. Work out the difference from the mean and square
each value
2. Add all the squared values together
Dog      Difference from mean (kg)   Squared difference (kg²)
Dog 1:   -6                          36
Dog 2:   -4                          16
Dog 3:   -3                          9
Dog 4:   1                           1
Dog 5:   4                           16
Dog 6:   8                           64
Mean = 16kg                          Sum of squared differences = 142
3. Divide the sum of the squared values by the number of
observations minus one (n − 1).
n = 6
142 / (6 − 1) = 142 / 5
Variance value:
28.4kg²
Standard deviation: s = √28.4 ≈ 5.3kg
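The variance steps for the six dogs can be retraced as a minimal sketch. It assumes the sample formula (dividing by n − 1), which is what the worked numbers use; variable names are illustrative.

```python
from statistics import mean, variance, stdev

weights_kg = [10, 12, 13, 17, 20, 24]  # Dog 1 .. Dog 6

m = mean(weights_kg)                              # 16
squared_diffs = [(w - m) ** 2 for w in weights_kg]
sum_sq = sum(squared_diffs)                       # 142

sample_var = sum_sq / (len(weights_kg) - 1)       # divide by n - 1
sample_sd = sample_var ** 0.5                     # square root of the variance

print(squared_diffs)        # [36, 16, 9, 1, 16, 64]
print(f"{sample_var:.1f}")  # 28.4
print(f"{sample_sd:.2f}")   # 5.33

# The library functions use the same n - 1 definition.
print(f"{variance(weights_kg):.1f}", f"{stdev(weights_kg):.2f}")  # 28.4 5.33
```

The standard deviation is back in the original unit (kg), which is why it is usually easier to interpret than the variance (kg²).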
• A large value indicates the data are spread far from the mean
Know the data and explore it
Types of data
Types of data based on number
of attributes
▪ Univariate Data
▪ Bivariate Data
▪ Multivariate Data
Normal Distribution (Bell-curve)
Bivariate measures
Contingency Table
Covariance and correlation
Diagrammatic Representations of Data
▪ Easy to understand:
▪ Numbers do not tell all the story.
▪ Diagrammatic representation of data makes it easier to understand
▪ Simplified Presentation:
▪ Large volumes of complex data can be represented in a simplified, diagrammatic form
▪ Reveals hidden facts:
▪ Diagrams help in bringing out the facts and relationships between data not
noticeable in raw/tabular form
▪ Easy to compare:
▪ Diagrams make it easier to compare data
Diagrammatic Representations of Data
▪ Bar Charts
▪ Histogram
▪ Box Plot
▪ Scatter Plot
▪ Heat map
▪ Line Graph
Parametric vs. Non-parametric tests
Field 1   Field 2
15.2      15.9
15.3      15.9
16        15.2
15.3      14.9
15        15.9
▪ Null hypothesis:
There is no statistically significant difference between the samples.
▪ Dof = n1 + n2 - 2
▪ t-value > t-critical
▪ 2.3 > 2.04
▪ That means there is a statistically significant difference between
the samples, so the null hypothesis is rejected
Analysis of variance (ANOVA)
Rubber supplier 1   Rubber supplier 2   Rubber supplier 3
1                   2                   2
2                   4                   3
5                   2                   4
Step 1:
Null hypothesis:
There is no difference between means
µ1 = µ2 = µ3
Alternative:
At least one of the means differs from the others
Alpha = 0.05
Step 2:
Find critical F-value
- Dof (between, numerator) = k – 1 = 3 – 1 = 2,
  where k is the number of conditions (groups)
- Dof (within, denominator) = N – k = 9 – 3 = 6,
  where N is the total number of scores in the sample
- Dof (total) = N – 1 = 8
- F-critical = 5.14 (from table)
Step 3:
Analysis of sum of squares (total variability)
Mean for each condition/group:
- Mean x1 = 2.67
- Mean x2 = 2.67
- Mean x3 = 3.00
Grand mean (G) = sum of all scores / N
= (1+2+5+2+4+2+2+3+4)/9
G = 2.78
SS (total) = Sum of squares (total)
= Sum (x - G)^2
= (1-2.78)^2 + (2-2.78)^2 + (5-2.78)^2
+ (2-2.78)^2 + (4-2.78)^2 + (2-2.78)^2
+ (2-2.78)^2 + (3-2.78)^2 + (4-2.78)^2
SS (total) = 13.56
SS (within) = Sum of squares (within)
= Sum (x1 – mean (x1))^2
+ Sum (x2 – mean (x2))^2
+ Sum (x3 – mean (x3))^2
= (1-2.67)^2 + (2-2.67)^2 + (5-2.67)^2
+ (2-2.67)^2 + (4-2.67)^2 + (2-2.67)^2
+ (2-3)^2 + (3-3)^2 + (4-3)^2
SS (within) = 13.33
SS (between) = SS (total) – SS (within)
= 13.56 – 13.33
= 0.22
Step 4:
Variance (between) = Mean square = MS (between)
= SS (between)/Dof (between), with Dof = k – 1
= 0.22/2
= 0.11
Variance (within) = Mean square = MS (within)
= SS (within)/Dof (within), with Dof = N – k
= 13.33/6
= 2.22
Step 5:
F-value = MS (between)/MS (within)
= 0.11/2.22
= 0.05
Remember: F-critical = 5.14
F-value < F-critical
0.05 < 5.14
Conclusion:
We fail to reject the null hypothesis (µ1 = µ2 = µ3)
▪Decision:
▪If F-value < F-critical >>>>Don’t reject Null
▪If F-value > F-critical >>>>Reject Null
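The five ANOVA steps for the rubber supplier data can be retraced with the sketch below. It is a plain-Python illustration rather than a library routine; the names are assumptions, and the critical value 5.14 is taken from the table lookup in Step 2. Because it works with unrounded means, intermediate values can differ slightly from hand-rounded figures.

```python
from statistics import mean

groups = {
    "supplier 1": [1, 2, 5],
    "supplier 2": [2, 4, 2],
    "supplier 3": [2, 3, 4],
}
F_CRITICAL = 5.14  # alpha = 0.05, Dof = (2, 6), from the F table

all_scores = [x for scores in groups.values() for x in scores]
k = len(groups)      # number of groups
N = len(all_scores)  # total number of scores

grand_mean = mean(all_scores)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
ss_within = sum((x - mean(scores)) ** 2
                for scores in groups.values() for x in scores)
ss_between = ss_total - ss_within

ms_between = ss_between / (k - 1)  # variance between groups
ms_within = ss_within / (N - k)    # variance within groups
f_value = ms_between / ms_within

print(f"SS total = {ss_total:.2f}, SS within = {ss_within:.2f}")  # 13.56, 13.33
print(f"F = {f_value:.3f}")                                       # F = 0.050
print("Reject null" if f_value > F_CRITICAL else "Don't reject null")
```

With F far below the critical value, the sketch reaches the same decision as the worked example: the null hypothesis is not rejected.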
Practicing ANOVA
A company is evaluating three different goods suppliers based on their delivery
performance. The goal is to determine whether there is a significant difference
in performance among the three suppliers.
▪ When the two sample variances are tested and found not to be
equal:
• we cannot pool the sample variances,
• so we cannot use the t-test for independent samples. Instead, we use
the Wilcoxon Rank Sum Test
Wilcoxon Rank Sum Test
The Z test and the t test are “parametric tests” – that is, they answer a
question about the difference between populations by comparing
sample statistics (e.g., X1 and X2) and making an inference to the
population parameters (μ1 and μ2).
- These are small samples, and they are independent (“random samples
of Cajun and Creole dishes”)
- Therefore, we must begin with the test of equality of variances
Cajun Creole
3500 3100
4200 4700
4100 2700
4700 3500
4200 2000
3705 3100
4100 1550
Test of hypothesis of equal variances
H0: σ1² = σ2²
HA: σ1² ≠ σ2²
F = larger sample variance (numerator) / smaller sample variance (denominator)
Reject H0 – variances are not equal, so we do the Wilcoxon rank sum test
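As a rough check of the variance-ratio statistic, the sketch below computes F for the Cajun and Creole samples with the standard library. The variable names are illustrative, and the critical value still has to come from an F table, so none is hard-coded here.

```python
from statistics import variance

cajun = [3500, 4200, 4100, 4700, 4200, 3705, 4100]
creole = [3100, 4700, 2700, 3500, 2000, 3100, 1550]

var_cajun = variance(cajun)    # sample variance (n - 1 in the denominator)
var_creole = variance(creole)

# Larger sample variance goes in the numerator.
f_ratio = max(var_cajun, var_creole) / min(var_cajun, var_creole)

print(f"F = {f_ratio:.2f}")  # F = 7.11
```

Here the Creole sample has the larger variance, so it supplies the numerator.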
Example 1 – Wilcoxon Rank Sum Test
H0 : The median cost of preparing Cajun dishes is the same as Creole dishes
H1: The median cost of preparing Cajun dishes is different from Creole dishes
Statistical test: T (rank sum)
Rejection region:
- T_A ≥ T_U or T_A ≤ T_L
- Reject H0 if T_Cajun > 66 (or if T_Creole < 39)
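The rank sums for Example 1 can be worked out with the sketch below. It pools both samples, gives tied values their average rank, and sums the ranks per group. The helper `average_ranks` is a hypothetical name written for this illustration; the limits 39 and 66 are the ones quoted in the rejection region above.

```python
def average_ranks(pooled):
    """Map each distinct value to its 1-based rank, averaging over ties."""
    ordered = sorted(pooled)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Sorted positions i+1 .. j hold the same value and share one rank.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

cajun = [3500, 4200, 4100, 4700, 4200, 3705, 4100]
creole = [3100, 4700, 2700, 3500, 2000, 3100, 1550]
T_LOWER, T_UPPER = 39, 66  # limits from the rejection region above

ranks = average_ranks(cajun + creole)
t_cajun = sum(ranks[x] for x in cajun)
t_creole = sum(ranks[x] for x in creole)

print(t_cajun, t_creole)  # 70.0 35.0
print("Reject H0" if t_cajun > T_UPPER or t_cajun < T_LOWER
      else "Fail to reject H0")  # Reject H0
```

Under these limits the Cajun rank sum of 70 exceeds 66, which points to a difference in median cost between the two kinds of dishes.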
Example 2 – Wilcoxon Rank Sum Test
A company is analyzing the performance of employees based on gender to determine
whether there is a significant difference in their output (e.g., sales numbers, efficiency
scores, or productivity ratings). The goal is to assess whether males and females perform
differently in this specific metric.
H0: There is no significant difference in performance between male and female employees
H1: There is a significant difference in performance between male and female employees
Test of hypothesis of equal variances
H0: σ1² = σ2²
HA: σ1² ≠ σ2²
Test statistic: F = larger sample variance / smaller sample variance
Rej. region: F > Fα/2 = F(7,8,.025) = 4.53
or F < (1/4.90) = .204
Fobt = 4.316/.46 = 9.38
Reject H0 – do Wilcoxon
Male   Rank   Female   Rank
6.4    16     2.7      3
1.7    1      3.9      10
3.2    5      4.6      12
5.9    15     3.0      4
2.0    2      3.4      6.5
3.6    8      4.1      11
5.4    14     3.4      6.5
7.2    17     4.7      13
              3.8      9
Σ      78              75
Statistical test: T (rank sum of the male sample)
Rejection region:
T > TU = 90 (or T < TL = 54)
TL = 54 < T = 78 < TU = 90
Fail to reject H0. Hence, there is no significant difference in performance between male and female
employees
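Both stages of Example 2 (the variance-ratio test, then the rank sums) can be reproduced with the sketch below. As before, `average_ranks` is a hypothetical helper written for this illustration, and the limits 54 and 90 are the ones quoted in the example.

```python
from statistics import variance

def average_ranks(pooled):
    """Map each distinct value to its 1-based rank, averaging over ties."""
    ordered = sorted(pooled)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # shared rank for positions i+1 .. j
        i = j
    return ranks

male = [6.4, 1.7, 3.2, 5.9, 2.0, 3.6, 5.4, 7.2]
female = [2.7, 3.9, 4.6, 3.0, 3.4, 4.1, 3.4, 4.7, 3.8]
T_LOWER, T_UPPER = 54, 90  # limits quoted in the example

# Stage 1: larger sample variance over smaller sample variance.
f_obt = variance(male) / variance(female)
print(f"F = {f_obt:.2f}")  # F = 9.38

# Stage 2: rank the pooled data and sum the ranks per group.
ranks = average_ranks(male + female)
t_male = sum(ranks[x] for x in male)
t_female = sum(ranks[x] for x in female)
print(t_male, t_female)  # 78.0 75.0

print("Reject H0" if t_male > T_UPPER or t_male < T_LOWER
      else "Fail to reject H0")  # Fail to reject H0
```

The two rank sums add up to 17 × 18 / 2 = 153, a quick sanity check that every observation was ranked exactly once.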
To compare the delivery times of Supplier A and Supplier B to determine if one