Handout
Examples
Consider the grades obtained by five high school students: 76, 84, 90, 85, and 74. Thus $X_1 = 76$, $X_2 = 84$, $X_3 = 90$, $X_4 = 85$, and $X_5 = 74$.
Find
1. $\sum_{i=1}^{5} X_i$
2. $\sum_{i=2}^{4} X_i$
3. $\sum_{i=3}^{5} X_i$
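The three sums can be checked with a short Python sketch. The helper name `summation` is an assumption made for illustration; it only translates the handout's 1-based limits into Python's 0-based slicing.

```python
# Grades X_1 ... X_5 from the example (Python stores them at indexes 0 ... 4).
X = [76, 84, 90, 85, 74]

def summation(values, start, stop):
    """Add values[start] through values[stop], using 1-based inclusive limits."""
    return sum(values[start - 1:stop])

total_1_to_5 = summation(X, 1, 5)   # item 1
total_2_to_4 = summation(X, 2, 4)   # item 2
total_3_to_5 = summation(X, 3, 5)   # item 3
print(total_1_to_5, total_2_to_4, total_3_to_5)
```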
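The summation of n number of observations is represented as
$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \dots + X_n$
Present: the symbol $\sum_{i=1}^{n} c$, where c is a constant, is equal to the product of the constant and n.
Proof:
$\sum_{i=1}^{n} c = c_1 + c_2 + c_3 + \dots + c_n = nc$, where each term $c_1, c_2, \dots, c_n$ equals c
Present: the symbol $\sum_{i=1}^{n} (X_i + Y_i)$, where X and Y are two variables, is equal to the sum of their summations.
Proof:
$\sum_{i=1}^{n} (X_i + Y_i) = (X_1 + Y_1) + (X_2 + Y_2) + \dots + (X_n + Y_n) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$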
Examples
Find the summation of the following:
1. $\sum_{i=1}^{4} 6$
2. Given the table below, find $\sum_{i=1}^{4} (X_i + Y_i)$.
xi   yi   xi+yi
1    3    4
2    8    10
5    7    12
4    6    10
The sum of n number of constants is $\sum_{i=1}^{n} c = c_1 + c_2 + c_3 + \dots + c_n = nc$
The summation of the sum of several variables is equal to the sum of the terms taken separately.
$\sum_{i=1}^{n} (X_i + Y_i) = (X_1 + Y_1) + (X_2 + Y_2) + (X_3 + Y_3) + \dots + (X_n + Y_n)$
or
$\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$
Another rule used in summation notation is that the summation of the sum of a variable and a constant is equal to the summation of the variable plus the product of n and the constant.
$\sum_{i=1}^{n} (X_i + c) = \sum_{i=1}^{n} X_i + nc$
Proof:
$\sum_{i=1}^{n} (X_i + c) = (X_1 + c) + (X_2 + c) + (X_3 + c) + \dots + (X_n + c)$
$\sum_{i=1}^{n} (X_i + c) = (X_1 + X_2 + X_3 + \dots + X_n) + nc$
or
$\sum_{i=1}^{n} (X_i + c) = \sum_{i=1}^{n} X_i + nc$
Example
Present the table below
Xi c Xi + c
2 4 6
5 4 9
6 4 10
7 4 11
10 4 14
Total 30 20 50
$\sum_{i=1}^{5} (X_i + 4)$
The sum of n number of variables added to the constant c is
$\sum_{i=1}^{n} (X_i + c) = (X_1 + X_2 + X_3 + \dots + X_n) + nc$
or
$\sum_{i=1}^{n} (X_i + c) = \sum_{i=1}^{n} X_i + nc$
To obtain the sum of the squares of variables, we first take the square of all observations, and then
get the sum.
Then,
$\sum_{i=1}^{n} (X_i)^2 = (X_1)^2 + (X_2)^2 + (X_3)^2 + \dots + (X_n)^2$
$\sum_{i=1}^{5} (X_i)^2$
In contrast to the sum of the squares of variables, to evaluate the square of the sum of variables, take the sum of the variable first before getting the square.
Then,
$\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + X_3 + \dots + X_n)^2$
$\left(\sum_{i=1}^{5} X_i\right)^2$
Generalization
1. The sum of the squares of n number of observations is
$\sum_{i=1}^{n} (X_i)^2 = (X_1)^2 + (X_2)^2 + (X_3)^2 + \dots + (X_n)^2$
2. The square of the sum of n number of observations is
$\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + X_3 + \dots + X_n)^2$
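The difference between the two expressions is easy to see numerically. The sketch below assumes the data are the five $X_i$ values 2, 5, 6, 7, 10 from the earlier table; the handout does not say which data its five-term expressions refer to.

```python
# X_i values taken from the earlier "Xi, c, Xi + c" table (an assumption).
X = [2, 5, 6, 7, 10]

# Sum of the squares: square every observation first, then add.
sum_of_squares = sum(x ** 2 for x in X)

# Square of the sum: add the observations first, then square the total.
square_of_sum = sum(X) ** 2

print(sum_of_squares, square_of_sum)
```

The two results differ, which is why the order of squaring and summing matters.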
To obtain the sum of the products of n pairs of observations, we first take the product of each pair of observations, and then get the sum.
Xi    Yi    (Xi)(Yi)
3     5     15
4     6     24
6     2     12
7     4     28
10    7     70
Total $\sum_{i=1}^{5} X_i Y_i = 149$
Solution:
$\sum_{i=1}^{5} X_i Y_i = (3)(5) + (4)(6) + (6)(2) + (7)(4) + (10)(7)$
$\sum_{i=1}^{5} X_i Y_i = 149$
Generalization
The sum of the product of n pairs of observations is
$\sum_{i=1}^{n} X_i Y_i = (X_1)(Y_1) + (X_2)(Y_2) + (X_3)(Y_3) + \dots + (X_n)(Y_n)$
Present: $\sum_{i=1}^{n} c x_i = c x_1 + c x_2 + c x_3 + \dots + c x_n$
By factoring, $\sum_{i=1}^{n} c x_i = c (x_1 + x_2 + x_3 + \dots + x_n)$ or
$\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$
PRESENTATION OF DATA:
Suppose a statistics class with 30 students is given an examination and the raw scores are shown:
48 70 60 35 59
79 59 71 59 47
30 49 68 78 59
32 36 50 38 68
32 57 65 65 58
73 45 55 66 50
How can we make our own frequency distribution based on the given data?
B. Development of the Lesson:
1. Compute the range of the data. Look for the lowest score (LS) and the highest score (HS), then get the difference plus one, that is:
Range = HS – LS + 1
2. Get the interval size (i) by dividing the range by 12.5, the average of 10 and 15.
$i = \frac{\text{Range}}{12.5}$
The divisor (12.5) is the average of 10 and 15, which is the ideal number of classes to be made. Round off your i to the nearest whole number.
3. Construct your class intervals. Start with a score that is divisible by your interval size (i), taking note of your lowest score. If your i is 6 and your lowest score is 19, then start at 18 because 19 is not divisible by 6. Twenty-four is divisible by 6, but if you start at 24, 19 will be excluded. So the class intervals are 18-23, then 24-29, then 30-35, and so on.
4. Afterwards, get the frequency (f) of each class interval, that is, how many got a score from 18-23, 24-29, and so on.
Example:
f
30-35 2
24-29 5
18-23 3
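The four steps can be sketched in Python for the 30 raw scores listed earlier. The function name `frequency_distribution` is hypothetical, and the sketch assumes whole-number scores.

```python
scores = [48, 70, 60, 35, 59, 79, 59, 71, 59, 47,
          30, 49, 68, 78, 59, 32, 36, 50, 38, 68,
          32, 57, 65, 65, 58, 73, 45, 55, 66, 50]

def frequency_distribution(data):
    low, high = min(data), max(data)
    value_range = high - low + 1                  # step 1: HS - LS + 1
    # Step 2: divide by 12.5 and round. Python's round() sends exact .5 ties
    # to the even neighbour, which may differ from rounding by hand.
    size = max(1, round(value_range / 12.5))
    start = (low // size) * size                  # step 3: multiple of i at or below LS
    table = []
    lower = start
    while lower <= high:
        upper = lower + size - 1
        f = sum(1 for s in data if lower <= s <= upper)   # step 4: tally
        table.append((lower, upper, f))
        lower += size
    return value_range, size, table

value_range, size, table = frequency_distribution(scores)
for lower, upper, f in reversed(table):           # highest class first, as in the handout
    print(f"{lower}-{upper}  {f}")
print("N =", sum(f for _, _, f in table))
```

The final line is the check described in the text: the frequencies should add up to the number of cases.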
After you have finished, get the sum of the frequencies. It should be equal to N (number of cases), if
Mean for Grouped data
$\bar{x} = \frac{\sum fM}{n}$ or $\bar{x} = A_m + \left(\frac{\sum fd}{n}\right) i$
Ungrouped data:
1. What is the mean of the ages of 9 children in a slum area given below:
9, 8, 1, 3, 4, 5, 6, 7, 2
Grouped data:
2. A frequency distribution of the scores in Statistics III of 34 BSN students.
x f M fM
35-39 3 37 111
30-34 5 32 160
25-29 8 27 216
20-24 10 22 220
15-19 4 17 68
10-14 2 12 24
5-9 2 7 14
i = 5   n = 34   $\sum fM = 813$
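A minimal sketch of the midpoint method for this table, assuming each class is stored as a hypothetical (lower limit, upper limit, frequency) tuple:

```python
# (lower limit, upper limit, frequency) for each class in the table above.
classes = [(35, 39, 3), (30, 34, 5), (25, 29, 8), (20, 24, 10),
           (15, 19, 4), (10, 14, 2), (5, 9, 2)]

def grouped_mean(rows):
    n = sum(f for _, _, f in rows)
    # M is the class midpoint; fM is frequency times midpoint.
    sum_fM = sum(f * (lo + hi) / 2 for lo, hi, f in rows)
    return n, sum_fM, sum_fM / n

n, sum_fM, mean = grouped_mean(classes)
print(n, sum_fM, round(mean, 2))
```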
Class Interval   f   M   fM
20-24 2
25-29 6
30-34 9
35-39 10
40-44 12
45-49 7
50-54 4
i=5 n = 50
The following example illustrates the second formula, but instead of using M, we will use w to represent the weight of the data and x for the given data.
This mean is called the weighted mean.
Example:
Here are the grades obtained by a student in the different criteria for grading. The weight for each criterion is given.
Criteria                Grades (x)   Weight (w)   xw
Long Tests              80           3.0          240
Quizzes                 85           2.0          170
Departmental Tests      82           2.5          205
Class Participation     88           1.5          132
Homework and Projects   85           1.0          85
Total                                10.0         $\sum xw = 832$
Applying the formula, with n equal to the total weight: $\bar{x} = \frac{\sum xw}{n} = \frac{832}{10.0} = 83.20$
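The same computation as a short sketch; the dictionary layout and the name `weighted_mean` are choices made here for illustration only.

```python
# criterion: (grade x, weight w), copied from the table above
criteria = {
    "Long Tests": (80, 3.0),
    "Quizzes": (85, 2.0),
    "Departmental Tests": (82, 2.5),
    "Class Participation": (88, 1.5),
    "Homework and Projects": (85, 1.0),
}

def weighted_mean(items):
    sum_xw = sum(x * w for x, w in items.values())   # numerator: sum of xw
    sum_w = sum(w for _, w in items.values())        # denominator: total weight
    return sum_xw, sum_w, sum_xw / sum_w

sum_xw, sum_w, mean = weighted_mean(criteria)
print(sum_xw, sum_w, mean)
```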
Determine the weighted mean:
1. Grades of a student
Subjects Grades No. of Units
Science 88 5
English 81 3
Social Studies 79 3
Mathematics 82 3
Physical Education 85 1
2. A questionnaire elicited some answers on attitudes of parents toward the 3-day-a-week classes. The responses range from strongly agree to strongly disagree, with the assigned weights.
Response Weights Frequency
Strongly Agree 19
Agree 23
Undecided 5
Disagree 2
Strongly Disagree 1
The MEDIAN of ungrouped data is the value found at the middle when the data are arranged in an array from the lowest to the highest or from the highest to the lowest.
If there are two middle values, the average is taken.
FORMULA for the Median of grouped data:
$\tilde{x} = LCB + \left[\frac{\frac{n}{2} - cf}{f}\right] i$
where:
Md or $\tilde{x}$ = median
LCB = lower class boundary of the class that contains the median
n/2 = half sum
cf = cumulative frequency of the class just below the median class
f = frequency of the class where the median is located
i = interval
Examples
Solve for the value of the Median.
Ungrouped data
121, 108, 120, 98, 132, 100, 92, 140, 102, 98
solution:
Arranging the values in ascending order, we obtain
92, 98, 98, 100, 102, 108, 120, 121, 132, 140
Since there are two middle values, 102 and 108, then:
$Md = \frac{102 + 108}{2}$
Md = 105
Grouped data
Frequency distribution of the scores in statistics of 34 students.
Scores (X) f cf
35-39 3 34
30-34 5 31
25-29 8 26
20-24 10 18
15-19 4 8
10-14 2 4
5-9 2 2
n = 34
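The grouped-median formula can be applied to this table with a sketch like the one below. It assumes whole-number class limits, so the lower class boundary is taken as the lower limit minus 0.5; `grouped_median` is a hypothetical helper name.

```python
# (lower limit, upper limit, f), listed from the lowest class to the highest.
classes = [(5, 9, 2), (10, 14, 2), (15, 19, 4), (20, 24, 10),
           (25, 29, 8), (30, 34, 5), (35, 39, 3)]

def grouped_median(rows, width):
    n = sum(f for _, _, f in rows)
    half = n / 2
    cf = 0                                   # cumulative frequency below the current class
    for lower, _, f in rows:
        if cf + f >= half:                   # first class whose cf reaches n/2
            lcb = lower - 0.5                # lower class boundary (integer limits assumed)
            return lcb + (half - cf) / f * width
        cf += f
    raise ValueError("empty distribution")

median = grouped_median(classes, width=5)
print(median)
```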
Solve for the value of the median.
Distribution of the lives of 60 batteries.
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
The MODE is the value that occurs most often or with the greatest frequency.
For grouped data, we consider two methods:
by inspection: get the midpoint of the class with the highest frequency (and decide whether the distribution is unimodal, with one mode; bimodal, with two modes; multimodal, with three or more modes; or has no mode)
by formula (use this only if the data is unimodal; for any other kind of data the formula does not apply and the final answer is the inspected value)
$Mo = 3Md - 2\bar{x}$
where:
Mo = Mode
3Md = three times the median
$2\bar{x}$ = two times the mean
Examples
Solve for the mode.
Ungrouped data
The following are the scores of 11 students in a spelling test of 20 items.
4, 5, 8, 8, 8, 9, 12, 12, 15, 19, 20
Grouped Data
Scores f cf
35-39 3 34
30-34 5 31
25-29 8 26
20-24 10 18
15-19 4 8
10-14 2 4
5-9 2 2
n = 34
The values of the mean and the median of this example were already computed in the previous discussion; the values are 23.91 and 24 respectively.
Substitute these values into the formula.
solution:
by inspection: 22 (unimodal, so we can use the formula)
$Mo = 3(24) - 2(23.91)$
$Mo = 72 - 47.82$
$Mo = 24.18$
Solve for the mode.
Distribution of the lives of 60 batteries.
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
Comparison of mean, median and mode:
1. mean – stable, dependent and reliable measure of central tendency. It determines the normality
(Peakedness) of the normal curve.
where:
Q k = quartile where k is from 1, 2, 3
LCB = lower class boundary
n = sample size
cf = cumulative frequency
f = the frequency where the lower limit is located
Examples
1. Ungrouped data
The following is a list of scores resulting from an English examination administered to 40 students,
Find the first , second , and third quartiles:
91 61 46 62 54
62 93 90 99 76
48 83 59 96 66
94 52 51 59 62
89 100 92 70 59
91 73 68 49 54
85 43 78 50 45
98 69 77 42 46
solution:
Arrange the following scores from the lowest to the highest.
Scores(x): n = 40
42 43 45 46 46 48 49 50 51 52 54 54 59
59 59 61 62 62 62 66 68 69 70 73 76 77
78 83 85 89 90 91 91 92 93 94 96 98 99
100
For
$Q_1 = \frac{40}{4} = 10$
$Q_1$ = the 10th observation is 52
$Q_2 = \frac{40}{2} = 20$
$Q_2$ = the 20th observation is 66
$Q_3 = \frac{3(40)}{4} = 30$
$Q_3$ = the 30th observation is 89
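The position method used above can be sketched as follows. The helper `quartile` is hypothetical and handles only the case shown here, where kn/4 is a whole number.

```python
scores = [91, 61, 46, 62, 54, 62, 93, 90, 99, 76,
          48, 83, 59, 96, 66, 94, 52, 51, 59, 62,
          89, 100, 92, 70, 59, 91, 73, 68, 49, 54,
          85, 43, 78, 50, 45, 98, 69, 77, 42, 46]

def quartile(data, k):
    ordered = sorted(data)                   # arrange from lowest to highest
    position = k * len(ordered) / 4          # kn/4, counted from 1
    if position != int(position):
        raise ValueError("this sketch only covers whole-number positions")
    return ordered[int(position) - 1]        # convert to a 0-based index

q1, q2, q3 = (quartile(scores, k) for k in (1, 2, 3))
print(q1, q2, q3)
```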
2. Grouped data
Find the first quartiles of the frequency distribution of the scores of fifty students in a History class.
scores f cf
45-49 2 50
40-44 6 48
35-39 11 42
30-34 10 31
25-29 12 21
20-24 5 9
15-19 4 4
n=50
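$Q_1 = L + \left(\frac{\frac{1n}{4} - F}{f}\right) i$
Solve for $\frac{1n}{4} = \frac{50}{4} = 12.5$
$= 24.5 + \left(\frac{12.5 - 9}{12}\right) 5$
$= 24.5 + \left(\frac{3.5}{12}\right) 5$
$= 24.5 + \frac{17.5}{12}$
$= 24.5 + 1.46$
$Q_1 = 25.96$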
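The grouped formula above can be sketched as one function. `grouped_fractile` and its `parts` argument are hypothetical names; `parts=4` reproduces the quartile formula, and the sketch assumes whole-number class limits so that L is the lower limit minus 0.5.

```python
# History class scores: (lower limit, upper limit, f), lowest class first.
history = [(15, 19, 4), (20, 24, 5), (25, 29, 12), (30, 34, 10),
           (35, 39, 11), (40, 44, 6), (45, 49, 2)]

def grouped_fractile(rows, k, parts, width):
    n = sum(f for _, _, f in rows)
    target = k * n / parts                   # kn/4 for quartiles
    F = 0                                    # cumulative frequency below the current class
    for lower, _, f in rows:
        if F + f >= target:                  # class that contains the target position
            L = lower - 0.5                  # lower class boundary
            return L + (target - F) / f * width
        F += f
    raise ValueError("k is out of range")

q1 = grouped_fractile(history, k=1, parts=4, width=5)
q3 = grouped_fractile(history, k=3, parts=4, width=5)
print(round(q1, 2), round(q3, 2))
```

Because only the divisor changes, calling the same sketch with `parts=10` or `parts=100` would give the corresponding grouped formula with kn/10 or kn/100.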
Solve for the quartiles of the following.
1. Ungrouped data: Q 1 and Q 3
36 38 37 28 21 23 42 22
28 27 38 39 55 37 34 28
29 25 27 42 48 28 31 33
40 23 49 45 48 31 45 27
43 34 51 28 26 42 28 24
2. Grouped data: Q 2
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
The DECILES are the score-points which divide a distribution into ten equal parts. These are denoted by $D_1, D_2, D_3, \dots, D_9$.
1. The DECILES for ungrouped data
The formula for decile is:
$D_k = \frac{kn}{10}$
where:
D = the decile
k = from 1, 2, ... 9
n = the sample size
Example
The following is a list of scores resulting from an English examination administered to 40 students (arranged in an array form from lowest to highest). Solve for $D_3$, $D_5$, and $D_8$.
Scores
42 54 68 90
43 54 D3 69 91 D8
45 59 70 91
46 59 73 92
46 59 76 93
48 61 77 94
49 62 78 96
50 62 83 98
51 62 85 99
52 66 D5 89 100
Solution:
for $D_3$: $D_3 = \frac{3n}{10} = \frac{3(40)}{10} = \frac{120}{10} = 12$, so $D_3$ = 12th observation = 54
for $D_5$: $D_5 = \frac{5n}{10} = \frac{5(40)}{10} = \frac{200}{10} = 20$, so $D_5$ = 20th observation = 66
for $D_8$: $D_8 = \frac{8n}{10} = \frac{8(40)}{10} = \frac{320}{10} = 32$, so $D_8$ = 32nd observation = 91
2. The DECILES for grouped data
Formula:
$D_k = LCB + \left[\frac{\frac{kn}{10} - cf}{f_k}\right] i$ or, in the notation used in the solutions, $D_k = L + \left(\frac{\frac{kn}{10} - F}{f}\right) i$
where:
$D_k$ = the deciles where k is from 1, 2, 3, ..., 9
LCB (or L) = lower class boundary
n = sample size
cf (or F) = cumulative frequency
f = frequency where the lower limit is located
i = the interval
Example
Find the value of $D_1$ from the given frequency distribution of the scores in a History class of fifty students.
Scores f cf
45 – 49 2 50
D9 L = 39.5 40 – 44 6 48
35 – 39 11 42
D5 L = 29.5 30 – 34 10 31
25 – 29 12 21
D1 L = 19.5 20 – 24 5 9
15 – 19 4 4
n = 50
Solution:
$D_1 = L + \left(\frac{\frac{1n}{10} - F}{f}\right) i$
Solve for $\frac{1n}{10} = \frac{1(50)}{10} = \frac{50}{10} = 5$
$= 19.5 + \left(\frac{5 - 4}{5}\right) 5$
$= 19.5 + \left(\frac{1}{5}\right) 5$
$= 19.5 + \frac{5}{5}$
= 19.5 + 1
$D_1 = 20.5$
Solve for the deciles of the following.
1. Ungrouped data: D3 and D7
36 38 37 28 21 23 42 22
28 27 38 39 55 37 34 28
29 25 27 42 48 28 31 33
40 23 49 45 48 31 45 27
43 34 51 28 26 42 28 24
2. Grouped data: D2
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
Percentiles are ninety-nine score points which divide a distribution into one hundred equal parts.
1. The PERCENTILES for ungrouped data
The formula for percentile is:
$P_k = \frac{kn}{100}$
where:
P = the percentile
k = from 1, 2, ... 99
n = the sample size
Example
Below are the scores of 40 students in an English examination (arranged in an array form). Solve for $P_{50}$, $P_{66}$, and $P_{98}$.
Scores
42 54 68 90
43 54 69 91
45 59 70 91
46 59 73 92
46 59 76 93
48 61 77 P66 94
49 62 78 96
50 62 83 98
51 62 85 99 P98
52 66 P50 89 100
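Solution:
for $P_{50}$: $P_{50} = \frac{50n}{100} = \frac{50(40)}{100} = \frac{2000}{100} = 20$, so $P_{50}$ = 20th observation = 66
for $P_{66}$: $P_{66} = \frac{66n}{100} = \frac{66(40)}{100} = \frac{2640}{100} = 26.4$, so $P_{66}$ = 26th observation = 77
for $P_{98}$: $P_{98} = \frac{98n}{100} = \frac{98(40)}{100} = \frac{3920}{100} = 39.2$, so $P_{98}$ = 39th observation = 99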
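2. The PERCENTILES for grouped data
Formula:
$P_k = L + \left(\frac{\frac{kn}{100} - F}{f}\right) i$
where:
$P_k$ = the percentiles where k is from 1, 2, 3, ..., 99
L (or LCB) = lower class boundary
n = sample size
F (or cf) = cumulative frequency
f = frequency where the lower limit is located
i = the interval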
Example
Scores f cf
45 – 49 2 50
40 – 44 6 48
P80 L = 34.5 35 – 39 11 42
P50 L = 29.5 30 – 34 10 31
P20 L = 24.5 25 – 29 12 21
20 – 24 5 9
15 – 19 4 4
n = 50
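Compute percentile $P_{20}$ from the data above.
Solution:
$P_{20} = L + \left(\frac{\frac{20n}{100} - F}{f}\right) i$
Solve for $\frac{20n}{100} = \frac{20(50)}{100} = \frac{1000}{100} = 10$
$= 24.5 + \left(\frac{10 - 9}{12}\right) 5$
$= 24.5 + \left(\frac{1}{12}\right) 5$
$= 24.5 + \frac{5}{12}$
= 24.5 + 0.42
$P_{20} = 24.92$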
Solve for the percentiles of the following.
1. Ungrouped data: P25 and P38
36 38 37 28 21 23 42 22
28 27 38 39 55 37 34 28
29 25 27 42 48 28 31 33
40 23 49 45 48 31 45 27
43 34 51 28 26 42 28 24
2. Grouped data: P50
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
Measures of Dispersion/Variability:
1. Range for ungrouped data
The range is the difference between the highest score (HS) and the lowest score (LS) in the given distribution, plus one.
$Range = HS - LS + 1$
Example
Scores of 10 students in a spelling test.
11 8 12 7 5 3 4 15 9 6
$Range = HS - LS + 1$
$Range = 15 - 3 + 1$
Range = 13 (the plain difference HS - LS, without the added one, is 12)
2. Range for grouped data
To find the range for a frequency distribution, get the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval, plus one.
Example:
Find the range for the frequency distribution shown below.
Score f
45 – 49 2
40 – 44 6
35 – 39 11
30 – 34 10
25 – 29 12
20 – 24 5
15 - 19 4
n = 50
solution:
Range = 49 – 15 + 1
= 35
Standard Deviation:
1. Standard Deviation of Ungrouped data
The formula is:
$SD = \sqrt{s^2}$
where:
SD = standard deviation
$s^2$ = sample variance
OR
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
where:
SD = standard deviation
$\sum (X - \bar{X})^2$ = sum of squares of X minus the mean, $\bar{X}$
n = sample size
Example
Scores of ten students in a spelling test. Find the standard deviation.
x    $X - \bar{X}$    $(X - \bar{X})^2$
11 3 9
8 0 0
12 4 16
7 -1 1
5 -3 9
3 -5 25
4 -4 16
15 7 49
9 1 1
6 -2 4
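$\sum x = 80$    $\sum |X - \bar{X}| = 30$    $\sum (X - \bar{X})^2 = 130$
n = 10, so $\bar{X} = \frac{80}{10} = 8$
Solution:
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{10 - 1}} = \sqrt{\frac{130}{9}} = \sqrt{14.44}$
OR $SD = \sqrt{s^2} = \sqrt{14.44}$
SD = 3.8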
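The table and the solution can be reproduced with a few lines of Python. The cross-check uses `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
import statistics

scores = [11, 8, 12, 7, 5, 3, 4, 15, 9, 6]

n = len(scores)
mean = sum(scores) / n                                  # 80 / 10
sum_of_squares = sum((x - mean) ** 2 for x in scores)   # sum of (X - mean)^2
variance = sum_of_squares / (n - 1)                     # sample variance s^2
sd = math.sqrt(variance)

print(mean, sum_of_squares, round(variance, 2), round(sd, 1))
print(round(statistics.stdev(scores), 1))               # same value from the library
```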
2. Standard Deviation of Grouped data
The formula is:
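$SD = \sqrt{\frac{\sum fM^2}{n} - (\bar{x})^2}$ or $SD = i\sqrt{\frac{\sum fd^2 - \frac{(\sum fd)^2}{n}}{n - 1}}$
where:
SD = standard deviation
$\sum fM^2$ (or $\sum fd^2$ in the second formula) = sum of the products between frequency and square of the midpoint (or of the deviation d)
n = sample size
$\bar{x}$ = mean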
Example:
In a given frequency distribution below, solve for the SD.
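Scores    f    M    fM    $fM^2$
45 – 49   2    47   94    4418
40 – 44   6    42   252   10584
35 – 39   11   37   407   15059
30 – 34   10   32   320   10240
25 – 29   12   27   324   8748
20 – 24   5    22   110   2420
15 – 19   4    17   68    1156
n = 50    $\sum fM = 1575$    $\sum fM^2 = 52625$
Solution:
Using the first formula:
$SD = \sqrt{\frac{\sum fM^2}{n} - (\bar{x})^2} = \sqrt{\frac{52625}{50} - \left(\frac{1575}{50}\right)^2}$
SD = 7.76
Using the second formula, with $\sum fd^2 = 121$ and $\sum fd = -5$:
$SD = i\sqrt{\frac{\sum fd^2 - \frac{(\sum fd)^2}{n}}{n - 1}} = 5\sqrt{\frac{121 - \frac{(-5)^2}{50}}{50 - 1}}$
SD = 7.84 (slightly larger, because this formula divides by n - 1 instead of n)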
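A minimal sketch of the midpoint formula $SD = \sqrt{\sum fM^2 / n - \bar{x}^2}$ for this table. The tuple layout (lower limit, upper limit, frequency) is an assumption made for the example.

```python
import math

# (lower limit, upper limit, f) for each class in the table above.
classes = [(45, 49, 2), (40, 44, 6), (35, 39, 11), (30, 34, 10),
           (25, 29, 12), (20, 24, 5), (15, 19, 4)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

sum_fM = sum(f * m for f, m in zip(freqs, midpoints))         # column fM
sum_fM2 = sum(f * m ** 2 for f, m in zip(freqs, midpoints))   # column fM^2
mean = sum_fM / n
sd = math.sqrt(sum_fM2 / n - mean ** 2)

print(sum_fM, sum_fM2, mean, round(sd, 2))
```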
Solve for the Standard deviation of the following data.
1. Ungrouped data:
8, 9, 10, 12, 17, 18, 18, 19, 20, 21
2. Grouped data:
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
Quartile Deviation:
1. Quartile Deviation of Ungrouped data
The formula is:
$QD = \frac{Q_3 - Q_1}{2}$
where:
QD = Quartile Deviation
Q3= Quartile sub 3
Q1= Quartile sub 1
2 = constant
Example
Below are the scores of ten students in spelling. Solve for Quartile deviation.
Scores:
11 8 12 7 5 3 4 15 9 6
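Solution:
Arrange the data in an array form from the lowest to the highest score.
3 4 5 6 7 8 9 11 12 15
Solve for $Q_1$
$Q_1 = \frac{n}{4} = \frac{10}{4} = 2.5$th observation
$Q_1$ is between the 2nd and 3rd scores, 4 and 5, thus $Q_1 = \frac{4 + 5}{2} = \frac{9}{2} = 4.5$
Solve for $Q_3$
$Q_3 = \frac{3n}{4} = \frac{3(10)}{4} = \frac{30}{4} = 7.5$th observation
$Q_3$ is between the 7th and 8th scores, 9 and 11, thus $Q_3 = \frac{9 + 11}{2} = \frac{20}{2} = 10$
Then $QD = \frac{Q_3 - Q_1}{2} = \frac{10 - 4.5}{2} = 2.75$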
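The sketch below follows the same procedure, including the averaging of the two neighbouring scores when the position is not a whole number. `observation_at` is a hypothetical helper, and the averaging rule is the one used in this example rather than a universal definition of quartiles.

```python
scores = [11, 8, 12, 7, 5, 3, 4, 15, 9, 6]

def observation_at(ordered, position):
    """Return the score at a 1-based position; average the neighbours if fractional."""
    lower = int(position)
    if position == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2

ordered = sorted(scores)
n = len(ordered)
q1 = observation_at(ordered, n / 4)        # 2.5th position
q3 = observation_at(ordered, 3 * n / 4)    # 7.5th position
qd = (q3 - q1) / 2                         # quartile deviation

print(q1, q3, qd)
```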
2. Quartile Deviation of Grouped data
The formula is:
$QD = \frac{Q_3 - Q_1}{2}$
where:
QD = Quartile Deviation
Q3= Quartile sub 3
Q1= Quartile sub 1
2 = constant
Example
Solve for the Quartile deviation for the given distribution.
scores f cf
45-49 2 50
40-44 6 48
35-39 11 42
30-34 10 31
25-29 12 21
20-24 5 9
15-19 4 4
n=50
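$Q_3 = L + \left(\frac{\frac{3n}{4} - F}{f}\right) i$
Solve for $\frac{3n}{4} = \frac{3(50)}{4} = \frac{150}{4} = 37.5$
$= 34.5 + \left(\frac{37.5 - 31}{11}\right) 5$
$= 34.5 + \left(\frac{6.5}{11}\right) 5$
$= 34.5 + \frac{32.5}{11}$
$= 34.5 + 2.95$
$Q_3 = 37.45$
$Q_1 = L + \left(\frac{\frac{1n}{4} - F}{f}\right) i$
Solve for $\frac{1n}{4} = \frac{50}{4} = 12.5$
$= 24.5 + \left(\frac{12.5 - 9}{12}\right) 5$
$= 24.5 + \left(\frac{3.5}{12}\right) 5$
$= 24.5 + \frac{17.5}{12}$
$= 24.5 + 1.46$
$Q_1 = 25.96$
Then $QD = \frac{Q_3 - Q_1}{2} = \frac{37.45 - 25.96}{2} = \frac{11.49}{2} = 5.745$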
Determine the value of z such that the proportion of the area of the normal curve.
In determining z, we consider the upper value before the exact proportion of Area.
Example:
1. between the mean and z is 0.30
2. to the left of z is 0.70
4. between ± z is 0.30
2. Between 85 and 95
$z = \frac{x - \bar{x}}{SD} = \frac{85 - 100}{15} = -1$    $z = \frac{x - \bar{x}}{SD} = \frac{95 - 100}{15} = -0.33$
Area of Proportion = 0.212
3. Below 97
$z = \frac{x - \bar{x}}{SD} = \frac{97 - 100}{15} = -0.2$
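The z computations above can be sketched in Python, with the mean of 100 and SD of 15 taken from the worked lines. Instead of a printed z-table, the sketch uses the standard normal cumulative area written with `math.erf`; because it keeps z unrounded, the area comes out near 0.211 rather than the table-based 0.212.

```python
import math

def z_score(x, mean, sd):
    return (x - mean) / sd

def area_left_of(z):
    """Proportion of the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_low = z_score(85, 100, 15)
z_high = z_score(95, 100, 15)
area_between = area_left_of(z_high) - area_left_of(z_low)

z_97 = z_score(97, 100, 15)
area_below_97 = area_left_of(z_97)

print(z_low, round(z_high, 2), round(area_between, 3))
print(round(z_97, 2), round(area_below_97, 3))
```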
Nonparametric tests do not require normality of the distribution. Under these tests, the levels of measurement are the nominal and ordinal data.
Nominal data are data such as female and male, yes and no responses, political affiliations like LP, Lakas, and LDP, religious groupings such as Christian and non-Christian, and other organizations.
Ordinal data are data such as Strongly agree, Agree, No opinion, Disagree, and Strongly disagree, and also other data which employ rankings.
Statistical test depends on 3 things:
1. Type of Question
Difference – test of difference
Relationship – test of relationship
Expected and Observe Data – goodness-of –fit
Average – mean
Variable – Variability
2. Bell Shaped Graph
normally distributed
abnormally distributed
3. Level and Scale of Measurement
Nominal – proportion or percentages
Ordinal – median and other rank-order correlation test
Interval /Ratio – mean, SD, t – test, Pearson r, F – test, etc.
Hypothesis Testing:
1. Problem: Determine if you are looking for relationship, difference, goodness of fit, etc.
2. Hypotheses:
Null Hypothesis (H0) – means non existence
there is no difference…
there is no correlation…
there is no effect…
Alternate Hypothesis ( H1) – means there is an existence
there is a difference…
there is a correlation…
there is an effect…
3. Level of Significance ( α )
Significant --- Reject H0 --- 0.05
Not Significant --- Accept H0 --- 0.01
if applicable, degree of freedom (df)
-- is the cardinality of the numbers to get the specific mean, such that,
5, 3, 4, 6, 5, ___ average = 5, df = 6
-- we always use a formula/s for particular test to be use
4. Decision Rule: If the computed value (absolute value) is greater than the
critical value (positive value), reject the null hypothesis.
5. Apply the Appropriate Test
6. Decision: Whether to accept or reject the null hypothesis. We do not accept alternate
hypothesis and null hypothesis at the same time.
7. Interpretation: Rewrite your accepted hypothesis (include the word “significant” in your
statement).
Directional and Non-Directional Test:
Directional Test (One-Tailed Test)
We consider one null hypothesis and one alternate hypothesis that states the direction of the difference. If the computed value (absolute value) is greater than the critical value (positive value; the values can be seen in the specific table for the particular test), reject the null hypothesis. Otherwise, accept the null hypothesis.
$H_0: P_1 = P_2$
$H_1: P_1 > P_2$ or $H_1: P_1 < P_2$
Non-Directional Test (Two-Tailed Test)
We consider one null hypothesis and one alternate hypothesis that states only that a difference exists, in either direction.
$H_0: P_1 = P_2$
$H_1: P_1 \neq P_2$
Pearson r
The Pearson Product Moment Coefficient of Correlation, r, is an index of relationship between two variables of interval/ratio type. The independent variable is represented by x while the dependent variable is represented by y. It can be said that x influences y or y depends on x.
The formula is:
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$
where:
r = the Pearson Product Moment Coefficient of Correlation
n = sample size
$\sum xy$ = the sum of the product of x and y
$\sum x \sum y$ = the product of the sum of x and the sum of y
$\sum x^2$ = sum of the squares of x
$\sum y^2$ = sum of the squares of y
For positive/negative value:
0.80 – above high correlation
0.60 – 0.79 moderate high correlation
0.40 – 0.59 average/moderate correlation
0.30 – 0.39 low correlation
0.29 – below negligible correlation
Below are the midterm (x) and final (y) grades.
x 75 70 65 90 85 85 80 70 65 90
y 80 75 65 95 90 85 90 75 70 90
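The raw-score formula for r can be applied to the midterm and final grades with a sketch like this one. The function name `pearson_r` is chosen here for illustration.

```python
import math

x = [75, 70, 65, 90, 85, 85, 80, 70, 65, 90]   # midterm grades
y = [80, 75, 65, 95, 90, 85, 90, 75, 70, 90]   # final grades

def pearson_r(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(round(r, 2))
```

On the scale given above, a value of 0.80 or higher is read as a high correlation.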
Spearman Rank-Order Correlation Rho
Rank the following data from higher value to lower value and number each data (rank 1 will be the
highest value).
4, 6, 7, 5, 2
How about this one:
4, 4, 7, 8, 8, 8, 3, 2, 1, 1
In ranking data, number the data from highest value to lowest value. In case of tie, get the average
of the same data and continue the ranking to the next number.
Example:
Set 1:
Data: 7 6 5 4 2
Rank: 1 2 3 4 5
Set 2:
Data: 8 8 8 7 4 4 3 2 1 1
Rank: 2 2 2 4 5.5 5.5 7 8 9.5 9.5
We have three eights, occupying ranks 1, 2, and 3, so each gets the average, which is 2. We continue with rank 4 for the 7. The two 4s occupy ranks 5 and 6, and the average is 5.5; then come rank 7 and rank 8. For the last two data, we have two ones, occupying ranks 9 and 10; the average of those numbers is 9.5.
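The ranking rule, including the averaging of tied ranks, can be sketched as follows. `rank_descending` is a hypothetical helper; it returns the ranks in the original order of the data.

```python
def rank_descending(data):
    """Rank 1 goes to the highest value; tied values share the average of their ranks."""
    ordered = sorted(data, reverse=True)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1       # first rank occupied by this value
        count = ordered.count(value)           # how many ranks the ties occupy
        ranks[value] = first + (count - 1) / 2
    return [ranks[value] for value in data]

set_1 = [4, 6, 7, 5, 2]
set_2 = [4, 4, 7, 8, 8, 8, 3, 2, 1, 1]
ranks_1 = rank_descending(set_1)
ranks_2 = rank_descending(set_2)
print(ranks_1)
print(ranks_2)
```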
2. The following is the ranking of two judges given to the work of 8 artists. Use ρ at 0.05 level to
test the null hypothesis that the two judges differ most in their opinions about these artist
Judge A Judge B
5 8
8 5
4 6
2 4
1 2
7 1
3 3
6 7
z-test
The z-test is another test under parametric statistics which requires the normality of the distribution. It utilizes the two population parameters μ and σ. It is used to compare two means: the sample mean and the perceived population mean.
The tabular values of the z-test at the .01 and .05 levels of significance:
Test          Level of Significance
              .01        .05
one-tailed    ± 2.33     ± 1.645
two-tailed    ± 2.575    ± 1.96
The One-sample Mean Test
The one-sample mean test is used when the sample mean is being compared to the perceived population mean. It is used when your data is interval/ratio. The formula is
$z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$
where:
$\bar{x}$ = sample mean
μ = hypothesized value of the population mean
σ = population standard deviation
n = sample size
Example:
1. ABC company claims that the average lifetime of a certain tire is at least 28 000 km. To check the claim, a taxi company puts 40 of these tires on its taxis and gets a mean lifetime of 25 560 km. With a standard deviation of 1 350 km, is the claim true? Use the z-test at .05.
2. A school principal in a laboratory school claimed that the reading comprehension test of grade six pupils should have an average of 72.3 with a standard deviation of 7.8. Fifty randomly selected grade six pupils have an average of 76.7. Use the z-test to test the null hypothesis that μ = 72.3 against the alternative hypothesis that μ ≠ 72.3 at .05 level of significance.
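The one-sample formula can be sketched for both examples. The sketch assumes the stated standard deviations (1 350 km and 7.8) are used as σ; the function name `one_sample_z` is hypothetical.

```python
import math

def one_sample_z(sample_mean, mu, sigma, n):
    """z = (sample mean - hypothesized mean) * sqrt(n) / sigma"""
    return (sample_mean - mu) * math.sqrt(n) / sigma

z_tires = one_sample_z(25560, 28000, 1350, 40)
z_reading = one_sample_z(76.7, 72.3, 7.8, 50)

print(round(z_tires, 2), round(z_reading, 2))

# Decision rule from the handout: reject H0 when |z| exceeds the tabular value.
reject_tires = abs(z_tires) > 1.645      # one-tailed, .05
reject_reading = abs(z_reading) > 1.96   # two-tailed, .05
print(reject_tires, reject_reading)
```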
z – test of Independent Proportion and Dependent Proportion
Used when the data is nominal.
z – test of Independent Proportion
This is used to determine if there is a significant difference between two different/independent groups in situations that call for two types of responses of nominal data.
table:
             n1    n2
response 1   A     B
response 2   C     D
$z = \frac{P_1 - P_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where:
$P_1 = \frac{A}{n_1}$, $P_2 = \frac{B}{n_2}$
p = population proportion estimate
$p = \frac{A + B}{n_1 + n_2}$, $q = 1 - p$
Example:
Fifty teachers and 50 students are asked if they are in favor of or against the RH bill.
Distribution of students and teachers who are in favor of and against the RH bill:
            Students   Teachers
In Favor    A 25       B 30
Against     C 25       D 20
Total       50         50
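Applying the formula to this table can be sketched as follows; the function name `z_independent_proportions` is hypothetical, and the argument names mirror the handout's symbols.

```python
import math

def z_independent_proportions(A, B, n1, n2):
    P1 = A / n1                      # proportion of "response 1" in group 1
    P2 = B / n2                      # proportion of "response 1" in group 2
    p = (A + B) / (n1 + n2)          # pooled proportion estimate
    q = 1 - p
    return (P1 - P2) / math.sqrt(p * q * (1 / n1 + 1 / n2))

# Students in favor: 25 of 50; teachers in favor: 30 of 50.
z = z_independent_proportions(A=25, B=30, n1=50, n2=50)
print(round(z, 3))
```

Since |z| is below the two-tailed tabular value of 1.96 at .05, the decision rule in this handout would not reject the null hypothesis for these figures.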
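z – test of Dependent Proportion
Formula:
$z = \frac{P_1 - P_2}{\sqrt{\frac{a + d}{n}}}$
where:
$P_1 = \frac{A + B}{n}$, $P_2 = \frac{B + D}{n}$
$a = \frac{A}{n}$, $d = \frac{D}{n}$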
Example:
Fifty teachers were asked if they are in favor of the RH bill. Thirty answered in favor and 20 answered against. Then all the teachers were invited to attend a house hearing on the RH bill. The result is written below:
Before         Against   In favor
30 in favor    A 22      B 8
20 against     C 16      D 4
50
t-test
The t-test is used to compare two means. Ideally the t-test is used when there are fewer than 30 samples, but some researchers use the t-test even if there are more than 30 samples. It has the same description as the z-test, except that it is used for interval data.
z – test t – test
- Independent Proportion - Independent Means
| |
2 groups 2 groups
| |
nominal – frequencies interval/ratio (scores, grades,
weight, heights)
z – test t – test
-Dependent Proportion -Dependent Means
| |
1 group 1 group
| |
Before and After Responses Pretest and Posttest
| weight before and weight after
nominal |
interval/ratio
t - test of Independent Means/Uncorrelated Means
Used to determine if there is significant difference between two different groups/independent
groups in terms of means.
Experimental Variable (Independent Variable) – can be manipulated. Commonly called
treatment variable
Control Variable (Dependent Variable) – result e.g. age, height, IQ.
Formula:
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}$
where:
t = t-ratio
$\bar{x}$ = mean
SD = standard deviation
n = size of the sample
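A minimal sketch of this t-ratio, assuming the denominator uses the two variances ($SD^2$) divided by the sample sizes. The figures are the ones from the first example that follows: means of 90 and 85, variances of 40 and 35, and 15 students per course.

```python
import math

def t_independent(mean1, mean2, var1, var2, n1, n2):
    """t-ratio for two independent means; var1 and var2 are variances (SD squared)."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

t = t_independent(mean1=90, mean2=85, var1=40, var2=35, n1=15, n2=15)
print(round(t, 3))
```

The computed t would then be compared with the tabular t value at the chosen level of significance, which this sketch does not look up.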
Example:
1. An admission test was administered to incoming freshmen in College of Nursing and
Veterinary Medicine with 15 students each course. Each was randomly selected. The
mean scores of the given samples were 90 and 85, and the variances of the test score
were 40 and 35, respectively. Is there a significant difference between the two groups?
Use 0.01 level of significance.
2. Is there a significant difference between the average heights of males born from the two
different countries? A random samples yielded the following results:
t - test of Dependent Means/Correlated Means
$t = \frac{\sum D}{\sqrt{\frac{n\sum D^2 - (\sum D)^2}{n - 1}}}$
where:
t = t-ratio
D = difference of pretest and posttest
n = sample size
Example:
An experimental study was conducted on the effect of programmed materials in English on the
performance of 20 selected college students. Before the program was implemented the pre-test was
administered and after 5 months the same instrument was used to get the post test result. The
following is the result of the experiment.
Pre-test Post test
X1 X2 D D2
20 25 -5 25
30 35 -5 25
10 25 -15 225
15 25 -10 100
20 20 0 0
10 20 -10 100
18 22 -4 16
14 20 -6 36
15 20 -5 25
20 15 5 25
18 30 -12 144
15 10 5 25
15 16 -1 1
20 25 -5 25
18 10 8 64
40 45 -5 25
10 15 -5 25
10 10 0 0
12 18 -6 36
20 25 -5 25
$\sum D = -81$    $\sum D^2 = 947$
$\bar{D} = \frac{-81}{20} = -4.05$
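The D and D² columns and the t-ratio $t = \sum D / \sqrt{(n\sum D^2 - (\sum D)^2)/(n - 1)}$ can be computed from the pre-test and post-test scores with a sketch like this:

```python
import math

pre = [20, 30, 10, 15, 20, 10, 18, 14, 15, 20,
       18, 15, 15, 20, 18, 40, 10, 10, 12, 20]
post = [25, 35, 25, 25, 20, 20, 22, 20, 20, 15,
        30, 10, 16, 25, 10, 45, 15, 10, 18, 25]

D = [a - b for a, b in zip(pre, post)]     # pre-test minus post-test
n = len(D)
sum_D = sum(D)
sum_D2 = sum(d * d for d in D)
mean_D = sum_D / n

t = sum_D / math.sqrt((n * sum_D2 - sum_D ** 2) / (n - 1))
print(sum_D, sum_D2, mean_D, round(t, 2))
```

The negative sign only reflects that post-test scores are higher on average; the decision rule uses the absolute value.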
Summary of t-test
Chi-Square Test
The Chi-Square is considered a unique test due to its 3 functions which are as follows:
The test of goodness-of-fit
The test of independence
The test of goodness-of-fit
This is a test of difference between the observed frequencies and expected frequencies. The
formula for Chi-square test is:
$X^2 = \sum \frac{(O - E)^2}{E}$
where:
$X^2$ = the chi-square test
O = the observed frequencies
E = the expected frequencies
Example:
The theory of Mendel regarding the crossing of peas is in the ratio of 9:3:3:1, meaning 9 parts are smooth yellow, 3 parts wrinkled yellow, 3 parts smooth green, and 1 part wrinkled green. The researcher conducted an experiment and the result was that out of 560 peas, 310 were smooth yellow, 100 were wrinkled yellow, 110 were smooth green, and 40 were wrinkled green. Is there a significant difference between the observed and the expected? Use the $X^2$-test at .05 level of significance.
Solving by the stepwise method:
I. Problem:
Is there significant difference between the observed (actual experiment) and the expected (theory)
frequencies?
II. Hypotheses:
H0: There is no significant difference between the observed and the expected frequencies.
H1: There is significant difference between the observed and the expected frequencies.
III. Level of Significance:
α = .05
df = h − 1
= 4 − 1
= 3
$X^2_{.05} = 7.815$
IV. Statistics:
X 2 - The test of goodness-of-fit
Computation: Add the ratio 9:3:3:1 = 16
Attributes Ratio (Actual result) (Theory)
Observed Expected
Smooth yellow :9 310 315
Wrinkled yellow :3 100 105
Smooth green :3 110 105
Wrinkled green :1 40 35
Total 16 560 560
Then divide 560 by 16: 560 ÷ 16 = 35.
For expected frequencies multiply
35 x 9 = 315
35 x 3 = 105
35 x 3 = 105
35 x 1 = 35
$X^2 = \sum \frac{(O - E)^2}{E}$
$= \frac{(310 - 315)^2}{315} + \frac{(100 - 105)^2}{105} + \frac{(110 - 105)^2}{105} + \frac{(40 - 35)^2}{35}$
= .079 + .238 + .238 + .714
$X^2 = 1.269$
V. Decision:
If the chi-square computed value is greater than chi-square tabular value reject H 0.
VI. Conclusion:
The Chi-square computed value of 1.269 is lesser than the chi-square tabular value of 7.815 at .05
level of significance with 3 degrees of freedom, so the null hypothesis is accepted.
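The goodness-of-fit computation can be sketched as below. The tabular value 7.815 is the one quoted in the handout for 3 degrees of freedom; the sketch does not look it up itself.

```python
observed = [310, 100, 110, 40]   # smooth yellow, wrinkled yellow, smooth green, wrinkled green
ratio = [9, 3, 3, 1]             # Mendel's theoretical ratio

total = sum(observed)
unit = total / sum(ratio)                         # 560 / 16
expected = [unit * part for part in ratio]        # expected frequencies

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 7.815                                  # from the handout, df = 3, alpha = .05

print(expected, round(chi_square, 2))
print("reject H0" if chi_square > critical else "accept H0")
```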
The Chi-Square Test of Independence
The one-sample test of independence is used to determine if there is a significant relationship
between variables of the nominal type. The sample used in this test consists of members randomly
drawn from the sample population.
$X^2 = \sum \frac{(O - E)^2}{E}$
where:
$X^2$ = the chi-square test
O = the observed frequencies
E = the expected frequencies
Σ = summation
Example:
1. Ninety individuals, male and female, were given a test in psychomotor skills and their scores were classified into high and low. Use the $X^2$-test of independence at .05 level of significance. The table is shown as follows:
Scores
Sex High Low Total
O E O E
Male 18 28 46
Female 32 12 44
Total 50 40 90
Solving by the stepwise method:
I. Problem:
Is there significant relationship between sex and scores in psychomotor skill?
II. Hypotheses:
H0: There is no significant relationship between sex and scores in psychomotor skill
H1: There is significant relationship between sex and scores in psychomotor skill.
III. Level of Significance:
α = .05
df = (c − 1)(r − 1)
= (2 – 1)(2 – 1)
= (1)(1)
= 1
$X^2_{.05} = 3.841$ tabular value
IV. Statistics:
x 2-test of independence
Score
Sex High Low Total
O E O E
Male 18 (25.56) 28 (20.44) 46
Female 32 (24.44) 12 (19.56) 44
Total 50 40 90
For expected values: Multiply the column total to the row total and divide the product by the grand
total.
(50 x 46)/90 = 25.56
(50 x 44)/90 = 24.44
(40 x 46)/90 = 20.44
(40 x 44)/90 = 19.56
$X^2 = \sum \frac{(O - E)^2}{E}$
$= \frac{(18 - 25.56)^2}{25.56} + \frac{(32 - 24.44)^2}{24.44} + \frac{(28 - 20.44)^2}{20.44} + \frac{(12 - 19.56)^2}{19.56}$
= 2.236 + 2.338 + 2.796 + 2.922
$X^2 = 10.292$
V. Decision:
If the $X^2$ computed value is greater than the $X^2$ tabular value, reject H0.
VI. Conclusion:
The $X^2$ computed value of 10.292 is greater than the $X^2$ tabular value of 3.841 at .05 level of significance with one degree of freedom, so the null hypothesis is rejected.
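The expected values and the chi-square statistic can be computed from the observed table with a sketch like the following. It keeps the expected frequencies unrounded, so the statistic comes out near 10.28 rather than the 10.292 obtained from two-decimal expected values; the decision is the same.

```python
# Observed frequencies: rows are Male, Female; columns are High, Low.
observed = [[18, 28],
            [32, 12]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected value of a cell = (row total x column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

df = (len(col_totals) - 1) * (len(row_totals) - 1)
critical = 3.841                      # tabular value quoted in the handout for df = 1
print([[round(e, 2) for e in row] for row in expected])
print(df, round(chi_square, 2), chi_square > critical)
```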
2. One hundred teachers were asked if they are in favour of k2-12 program implementation. The
result are written below:
O
Strongly Agreed 30
Agreed 35
Disagreed 15
Strongly Disagreed 15
Express No Opinion 5
Total: 100
F-test
The F-test is the analysis of variance (ANOVA). This is used in comparing the means of two or more independent groups. One-way ANOVA is used when there is only one variable involved. The two-way ANOVA is used when two variables are involved: the column and the row variables. The researcher is interested to know if there are significant differences between and among columns and rows. This is also used in looking at the interaction effect between the variables being analyzed.
Like the t-test, the F-test is also a parametric test, which has to meet some conditions: the data to be analyzed must be normal and expressed in interval or ratio data. This test is more efficient than other tests of difference.
I. Problem: Is there a significant difference in the average sales of the four brands of
shampoo?
II. Hypotheses:
H0: There is no significant difference in the average sales of the four brands of shampoo.
H1: There is a significant difference in the average sales of the four brands of shampoo.
III. Level of Significance:
α = 0.05
$df_b = K - 1 = 4 - 1 = 3$, where K = no. of groups
$df_w = N - K = 28 - 4 = 24$
IV. Decision Rule: If the computed value is greater than or equal to critical value, reject null
hypothesis.
V. Statistics:
F-test: one-way analysis of variance
Computation:
         A               B               C               D
    X1     X1²      X2     X2²      X3     X3²      X4     X4²
     7      49       9      81       2       4       4      16
     3       9       8      64       3       9       5      25
     5      25       8      64       4      16       7      49
     6      36       7      49       5      25       8      64
     9      81       6      36       6      36       3       9
     4      16       9      81       4      16       4      16
     3       9      10     100       2       4       5      25
ΣX1 = 37    ΣX1² = 225    n1 = 7    X̄1 = 5.29
ΣX2 = 57    ΣX2² = 475    n2 = 7    X̄2 = 8.14
ΣX3 = 26    ΣX3² = 110    n3 = 7    X̄3 = 3.71
ΣX4 = 36    ΣX4² = 204    n4 = 7    X̄4 = 5.14
ΣX = 156    ΣX² = 1014
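The column sums, sums of squares, and means in the table can be verified with a few lines of Python. This is a minimal sketch under the assumption that the four columns are stored as plain lists; the names `groups` and `summarize` are illustrative, not from the handout.

```python
# Sales scores for the four shampoo brands, copied from the table above.
groups = {
    "A": [7, 3, 5, 6, 9, 4, 3],
    "B": [9, 8, 8, 7, 6, 9, 10],
    "C": [2, 3, 4, 5, 6, 4, 2],
    "D": [4, 5, 7, 8, 3, 4, 5],
}


def summarize(scores):
    """Return n, the sum, the sum of squares, and the mean of one group."""
    n = len(scores)
    total = sum(scores)
    total_sq = sum(x * x for x in scores)
    return {"n": n, "sum": total, "sum_sq": total_sq, "mean": total / n}


summary = {name: summarize(scores) for name, scores in groups.items()}
grand_sum = sum(s["sum"] for s in summary.values())        # sum of X
grand_sum_sq = sum(s["sum_sq"] for s in summary.values())  # sum of X squared

for name, s in summary.items():
    print(name, s["n"], s["sum"], s["sum_sq"], round(s["mean"], 2))
print("Overall:", grand_sum, grand_sum_sq)
```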
V.1. Compute the total sum of squares.
\[
SS_t = \sum X^2 - \frac{\left(\sum X\right)^2}{N}
\]
\[
SS_t = 1014 - \frac{(156)^2}{28}
\]
\[
SS_t = 144.86
\]
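V.2. Compute the sum of squares between groups.
\[
SS_b = \frac{\left(\sum X_1\right)^2}{n_1} + \frac{\left(\sum X_2\right)^2}{n_2} + \frac{\left(\sum X_3\right)^2}{n_3} + \frac{\left(\sum X_4\right)^2}{n_4} - \frac{\left(\sum X\right)^2}{N}
\]
\[
SS_b = \frac{(37)^2}{7} + \frac{(57)^2}{7} + \frac{(26)^2}{7} + \frac{(36)^2}{7} - \frac{(156)^2}{28}
\]
\[
SS_b = 72.29
\]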
V.3. Compute the sum of squares within groups.
\[
SS_w = SS_t - SS_b
\]
\[
SS_w = 144.86 - 72.29
\]
\[
SS_w = 72.57
\]
V.4. Compute the mean squares (variance estimates).
- for between groups
\[
MS_b = \frac{SS_b}{K-1} = \frac{72.29}{4-1} = \frac{72.29}{3} = 24.10
\]
- for within groups
\[
MS_w = \frac{SS_w}{N-K} = \frac{72.57}{28-4} = \frac{72.57}{24} = 3.02
\]
V.5. Compute the F-ratio.
\[
F = \frac{MS_b}{MS_w} = \frac{24.10}{3.02} = 7.98
\]
V.6. Summary
Source of     Sum of      df     Mean       F-Ratio
Variance      Squares            Squares
Between        72.29       3     24.10       7.98
Within         72.57      24      3.02
Total         144.86      27
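VI. Decision: Reject the null hypothesis, since the computed F-ratio exceeds the tabular value of 3.01 at
the .05 level of significance with df = 3 and 24.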
VII. Interpretation:
There is a significant difference in the average sales of the four brands of shampoo.
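The one-way ANOVA steps (V.1 to V.5) can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation; the function name `one_way_anova` and the dictionary keys are illustrative assumptions. Computed directly from the raw scores without intermediate rounding, it gives a total sum of squares of about 144.86, a between-groups sum of squares of about 72.29, a within-groups sum of squares of about 72.57, and an F-ratio of about 7.97.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)                      # N
    k = len(groups)                                # K, number of groups
    correction = sum(all_scores) ** 2 / n_total    # (sum of X)^2 / N

    ss_total = sum(x * x for x in all_scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": k - 1,
        "df_within": n_total - k,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


# Sales scores for the four shampoo brands, copied from the computation table.
brands = [
    [7, 3, 5, 6, 9, 4, 3],     # A
    [9, 8, 8, 7, 6, 9, 10],    # B
    [2, 3, 4, 5, 6, 4, 2],     # C
    [4, 5, 7, 8, 3, 4, 5],     # D
]

result = one_way_anova(brands)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The within-groups sum of squares can also be cross-checked independently as the sum of squared scores minus the sum of each squared group total divided by its group size, which avoids relying on the subtraction in V.3.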
                            Teacher Factor
                           A       B       C
Method of Teaching 1      40      50      40
                          41      50      41
                          40      48      40
                          39      48      38
                          38      45      38
Method of Teaching 2      40      45      50
                          41      42      46
                          39      42      43
                          38      41      43
                          38      40      42
Method of Teaching 3      40      40      40
                          43      45      41
                          41      44      41
                          39      44      39
                          38      43      38
II. Hypotheses:
1. H0: There is no significant difference in the performance of the three groups
of students under three different instructors.
H1: There is a significant difference in the performance of the three groups of
students under three different instructors.
2. H0: There is no significant difference in the performance of the three groups
of students under three different methods of teaching.
H1: There is a significant difference in the performance of the three groups of
students under three different methods of teaching.
3. H0: Interaction effects are not present.
H1: Interaction effects are present.
III. Level of Significance:
\[ \alpha = 0.05 \]
\[ df_t = N - 1 = 45 - 1 = 44 \]
\[ df_w = k(n - 1) = 9(5 - 1) = 9(4) = 36 \]
\[ df_c = c - 1 = 3 - 1 = 2 \]
\[ df_r = r - 1 = 3 - 1 = 2 \]
\[ df_{c \cdot r} = (c - 1)(r - 1) = (3 - 1)(3 - 1) = (2)(2) = 4 \]
Critical values (numerator df / denominator df):
columns: df = 2/36, F = 3.26
rows: df = 2/36, F = 3.26
interaction: df = 4/36, F = 2.63
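The degrees of freedom listed above follow from the layout of the design: 3 rows, 3 columns, and 5 scores in each of the 9 cells. The sketch below reproduces them in Python; the function name `two_factor_df` and its parameter names are illustrative assumptions, and it assumes every cell holds the same number of scores.

```python
def two_factor_df(rows, cols, per_cell):
    """Degrees of freedom for a two-factor ANOVA with equal cell sizes."""
    n_total = rows * cols * per_cell
    return {
        "total": n_total - 1,
        "within": rows * cols * (per_cell - 1),
        "columns": cols - 1,
        "rows": rows - 1,
        "interaction": (cols - 1) * (rows - 1),
    }


df = two_factor_df(rows=3, cols=3, per_cell=5)
print(df)

# The component degrees of freedom add up to the total.
parts = df["within"] + df["columns"] + df["rows"] + df["interaction"]
print(parts == df["total"])
```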
IV. Decision Rule: If the computed value is greater than or equal to the critical value, reject the null
hypothesis.
V. Statistics:
F-test: two-factor ANOVA
Computation:
                           A       B       C     Total
Method of Teaching 1      40      50      40
                          41      50      41
                          40      48      40
                          39      48      38
                          38      45      38
Total                    198     241     197       636
Method of Teaching 2      40      45      50
                          41      42      46
                          39      42      43
                          38      41      43
                          38      40      42
Total                    196     210     224       630
Method of Teaching 3      40      40      40
                          43      45      41
                          41      44      41
                          39      44      39
                          38      43      38
Total                    201     216     199       616
Column Total             595     667     620      1882
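The cell, row, and column totals in the table can be recomputed from the raw scores. The following Python sketch is one way to do it, assuming the data are stored in a nested dictionary keyed by method and teacher; the variable names are illustrative and not from the handout. It also evaluates the correction factor, the squared grand total divided by N.

```python
# Scores by method of teaching (rows) and teacher (columns), from the table.
scores = {
    "Method 1": {"A": [40, 41, 40, 39, 38], "B": [50, 50, 48, 48, 45], "C": [40, 41, 40, 38, 38]},
    "Method 2": {"A": [40, 41, 39, 38, 38], "B": [45, 42, 42, 41, 40], "C": [50, 46, 43, 43, 42]},
    "Method 3": {"A": [40, 43, 41, 39, 38], "B": [40, 45, 44, 44, 43], "C": [40, 41, 41, 39, 38]},
}

# Total of each cell, each row (method), and each column (teacher).
cell_totals = {m: {t: sum(v) for t, v in cells.items()} for m, cells in scores.items()}
row_totals = {m: sum(cells.values()) for m, cells in cell_totals.items()}
col_totals = {t: sum(cell_totals[m][t] for m in cell_totals) for t in ("A", "B", "C")}

grand_total = sum(row_totals.values())
n_total = sum(len(v) for cells in scores.values() for v in cells.values())
correction_factor = grand_total ** 2 / n_total

print(cell_totals)
print(row_totals)
print(col_totals)
print(grand_total, n_total, round(correction_factor, 2))
```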
\[
CF = \frac{(GT)^2}{N} = \frac{(1882)^2}{45} = 78709.42
\]
\[
SS_t = 40^2 + 41^2 + \dots + 39^2 + 38^2 - CF
\]
\[
SS_t = 79218 - 78709.42 = 508.58
\]
\[
SS_w = 79218 - \left( \frac{198^2}{5} + \frac{196^2}{5} + \frac{201^2}{5} + \frac{241^2}{5} + \frac{210^2}{5} + \frac{216^2}{5} + \frac{197^2}{5} + \frac{224^2}{5} + \frac{199^2}{5} \right)
\]
\[
SS_w = 79218 - 79088.8 = 129.20
\]
\[
SS_c = \frac{595^2 + 667^2 + 620^2}{15} - CF
\]
\[
SS_c = \frac{1183314}{15} - 78709.42 = 78887.6 - 78709.42 = 178.18
\]
\[
SS_r = \frac{636^2 + 630^2 + 616^2}{15} - CF = \frac{1180852}{15} - 78709.42 = 78723.47 - 78709.42 = 14.05
\]
\[
SS_{c \cdot r} = SS_t - SS_w - SS_c - SS_r = 508.58 - 129.20 - 178.18 - 14.05 = 187.15
\]
Source of          SS       df     MS       F-Value     F-Value    Interpretation
Variation                                   Computed    Tabular
Between
  Columns        178.18      2    89.09      24.82       3.26      S
  Rows            14.05      2     7.02       1.96       3.26      NS
  Interaction    187.15      4    46.79      13.03       2.63      S
Within           129.20     36     3.59
Total:           508.58     44
S = significant; NS = not significant
F-Ratio:
\[
F_{\text{columns}} = \frac{MS_c}{MS_w} = \frac{89.09}{3.59} = 24.82
\]
\[
F_{\text{rows}} = \frac{MS_r}{MS_w} = \frac{7.02}{3.59} = 1.96
\]
\[
F_{\text{interaction}} = \frac{MS_i}{MS_w} = \frac{46.79}{3.59} = 13.03
\]
VI. Decision:
columns – reject the null hypothesis
rows – do not reject the null hypothesis
interaction – reject the null hypothesis
VII. Interpretation:
1. There is a significant difference in the performance of the three groups of students
under three different instructors.
2. There is no significant difference in the performance of the three groups of students
under three different methods of teaching.
3. Interaction effects are present.
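The whole two-factor computation can be sketched in Python as a check on the hand calculation. This is a minimal sketch that assumes equal cell sizes; the function name `two_way_anova` and the dictionary keys are illustrative assumptions, and the critical values are the tabular values quoted in the text. Without intermediate rounding the F-ratios come out at roughly 24.82 for columns, 1.96 for rows, and 13.04 for interaction.

```python
def two_way_anova(cells):
    """cells[r][c] is the list of scores in row r, column c (equal cell sizes)."""
    n_rows, n_cols = len(cells), len(cells[0])
    per_cell = len(cells[0][0])
    n_total = n_rows * n_cols * per_cell

    all_scores = [x for row in cells for cell in row for x in cell]
    cf = sum(all_scores) ** 2 / n_total            # correction factor
    sum_sq = sum(x * x for x in all_scores)

    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(cells[r][c]) for r in range(n_rows)) for c in range(n_cols)]

    ss_total = sum_sq - cf
    ss_within = sum_sq - sum(sum(cell) ** 2 / per_cell for row in cells for cell in row)
    ss_cols = sum(t ** 2 for t in col_totals) / (n_rows * per_cell) - cf
    ss_rows = sum(t ** 2 for t in row_totals) / (n_cols * per_cell) - cf
    ss_inter = ss_total - ss_within - ss_cols - ss_rows

    ms_within = ss_within / (n_rows * n_cols * (per_cell - 1))
    return {
        "ss_total": ss_total, "ss_within": ss_within,
        "ss_cols": ss_cols, "ss_rows": ss_rows, "ss_inter": ss_inter,
        "f_cols": ss_cols / (n_cols - 1) / ms_within,
        "f_rows": ss_rows / (n_rows - 1) / ms_within,
        "f_inter": ss_inter / ((n_cols - 1) * (n_rows - 1)) / ms_within,
    }


# Rows are methods of teaching 1 to 3; columns are teachers A, B, C.
data = [
    [[40, 41, 40, 39, 38], [50, 50, 48, 48, 45], [40, 41, 40, 38, 38]],
    [[40, 41, 39, 38, 38], [45, 42, 42, 41, 40], [50, 46, 43, 43, 42]],
    [[40, 43, 41, 39, 38], [40, 45, 44, 44, 43], [40, 41, 41, 39, 38]],
]

res = two_way_anova(data)
critical = {"f_cols": 3.26, "f_rows": 3.26, "f_inter": 2.63}  # from the text
for key, crit in critical.items():
    verdict = "S" if res[key] >= crit else "NS"
    print(f"{key}: {res[key]:.2f} vs {crit} -> {verdict}")
```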
Generalization:
Tests of Relationship:
1. Pearson r – used to determine if there is a correlation between two variables of the interval/ratio
type. (interval/ratio)
2. Spearman Rank Order Correlation – used to determine if there is a correlation between two
variables of the ordinal type. (ordinal)
3. Chi-Square Test of Independence/Association – used to determine if there is a significant
relationship between variables of the nominal type. (nominal)
Tests of Difference:
1. z-test of Independent Proportions – used to determine if there is a significant difference
between two different/independent groups in situations that call for two types of responses.
(nominal)
2. z-test of Dependent Proportions – used to determine if there is a significant difference
between pairs of observations from a single group, or to determine if the responses of the
members of a group in two situations are correlated. (nominal)
3. t-test of Independent/Uncorrelated Means – used to determine if there is a significant
difference between two independent groups in terms of means. (interval/ratio)
4. t-test of Dependent/Correlated Means – used to determine if there is a significant difference
between two correlated sets of scores, weights, heights, etc. (interval/ratio)
5. F-test/ANOVA – an extension of the t-test, used to compare the means of two or more independent
groups. (interval/ratio)
Other Tests:
1. One-Population z-test – used to determine if a sample mean was drawn from the given population.
(interval/ratio)
2. Chi-Square Goodness-of-Fit Test – used to determine if there is a significant difference between
the observed distribution and the expected distribution. (nominal)
Determine which test should be used for each of the given problems.
1. The following are the vocabulary and spelling scores of 10 students, which are ranked. Is there a
significant relationship between the two subjects?
2. A special education teacher wishes to determine the preferences of mentally gifted children –
boys and girls – for three game activities.
3. Is there a significant relationship between the Academic Achievement and Motivation of the
students in AGS?
4. Creativity and Personality were ranked. Is there a significant relationship between the two
variables?
5. Fifty male and 50 female students were asked about their color preferences.
6. Is there a significant difference between stressful and unstressful situations in terms of
short-term memory?
7. Do the five groups of special children differ in terms of error scores on the psychomotor ability
test?
8. Is there a significant difference between teachers and school administrators in terms of
attitude toward mentally impaired children?
9. Are the proportions of children passing the two items significantly different from each other?
10. Is there a significant difference between the proportions in each group that responded
“in favor”?
11. Does this differ from what the pupil would expect to obtain if he or she had guessed all the
items?
12. A soft drink manufacturing company claims that its one-liter bottle contains, on
average, 0.97 liter with a standard deviation of 0.02. A certain sample was taken from the said
company. Is the company deceiving its customers?
Summary:
Statistical Test                      Type of Data
Tests of Relationship:
1. Pearson r                          Interval/Ratio
2. Spearman Rank Order                Ordinal
3. Chi-Square of Independence         Nominal
Tests of Difference:
1. z-test                             Nominal
   - independent
   - dependent
2. t-test                             Interval/Ratio
   - independent
   - dependent
3. F-test                             Interval/Ratio
   - ANOVA I
   - ANOVA II
Other Tests:
1. z-test, one-population mean        Interval/Ratio
2. chi-square test                    Nominal
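The summary table can also be read as a lookup from the purpose of the analysis and the type of data to the candidate tests. The Python sketch below encodes the table that way; the names `TESTS` and `candidate_tests` are hypothetical, and the lookup only narrows the choice, since picking between the independent and dependent versions still depends on how the groups were formed.

```python
# Candidate tests keyed by (purpose, type of data), following the summary table.
TESTS = {
    ("relationship", "interval/ratio"): ["Pearson r"],
    ("relationship", "ordinal"): ["Spearman rank-order correlation"],
    ("relationship", "nominal"): ["Chi-square test of independence"],
    ("difference", "nominal"): [
        "z-test of independent proportions",
        "z-test of dependent proportions",
    ],
    ("difference", "interval/ratio"): [
        "t-test of independent means",
        "t-test of dependent means",
        "F-test (ANOVA)",
    ],
    ("other", "interval/ratio"): ["One-population z-test"],
    ("other", "nominal"): ["Chi-square goodness-of-fit test"],
}


def candidate_tests(purpose, data_type):
    """Return the tests listed for this purpose and data type, or an empty list."""
    return TESTS.get((purpose.lower(), data_type.lower()), [])


print(candidate_tests("relationship", "ordinal"))
print(candidate_tests("difference", "nominal"))
```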
Below is the list of theses to be used (SY 2010-2011):
1. Positive Effects of Reading the School Newspaper to the Academic Performance and
Behaviour of Selected High School Students of AGS
2. Disturbances Affecting the Performance of AGS High School Students in Mathematics
3. Perceptions Toward College Education of Selected Senior Students of AGS
4. The Perceptions of Fourth Year High School Students Towards the NCAE Review
5. Effects of Numerous New Teachers to the Relational Aspect of High School Students
of AGS
6. Techniques Approaches of Teachers that Contribute to the Comprehension of
Selected Grade Six Students of AGS Towards the Introduction to Algebra
7. The Effect of Weekly Mastery Test to the Academic Performance of the High School
Students of AGS
8. Effects of Cumulative Grading System to the High School Students of AGS
9. Factors Affecting Students’ Failure in Science Subject as Perceived by Selected High
School Students
10. The Effectiveness of the Techniques Used by Mathematics Teachers in Introducing
and Discussing the Lessons to the High School Students of AGS
11. The Effects of Social Networking Sites to the Academic Performance of the 3rd Year
and 4th Year High School Students of AGS
Summary of Statistics: