
Handout

Statistics is the branch of mathematics dealing with collecting, organizing, analyzing, and interpreting data. There are different types of data including qualitative data representing attributes and quantitative numerical data. Variables can be classified based on their continuity, functional relationships, and scale of measurement. Common scales include nominal for categorical variables, ordinal for ordered variables, interval for differences that can be calculated, and ratio where all arithmetic is possible. Summarizing and describing data is descriptive statistics, while drawing conclusions is inferential statistics. Population refers to all individuals under study, while a sample is a subset. Parameters describe populations and statistics describe samples. Sample size is determined based on population size and margin of error. Summation notation uses the Greek letter sigma to represent adding

Uploaded by Rylle
Copyright © All Rights Reserved

INTRODUCTION TO STATISTICS:

Statistics – the branch of mathematics that deals with the following aspects of the scientific method:

• Collection
• Presentation or Organization
• Analysis
• Interpretation
Statistics is a habit: we often do it without being aware of it.
Data – statistical facts, principles, opinions, and variations of items gathered from different sources.
We categorize data according to different characteristics.
1. Data – facts, a set of information, or observations under study
2. Qualitative Data – data whose values describe attributes or qualities rather than numerical amounts
3. Quantitative Data – data which are numerical in nature
4. Variable – a characteristic or property of a sample or population that makes the members different from each other
5. Constant – a characteristic or property of a population or sample that makes the members of the group similar to each other
Variables According to Continuity of Values:
1. Discrete Variable – one that can assume only a countable number of separate values (e.g., number of children)
2. Continuous Variable – one that can assume infinitely many values within a specified interval (e.g., height)
Group reporting: divide the class into 4 groups. Each group is given a newspaper from which to find examples.
Variables According to Functional Relationship:
1. Dependent Variable – a variable which is affected or influenced by another variable
2. Independent Variable – variable which affects or influences the dependent variable
Variables According to Scale of Measurement:
1. Nominal Variables – variables whose values can only be classified into two or more categories
a. Real Nominal Variables are those classified based on naturally occurring attributes.
b. Artificial Nominal Variables are those classified based on “man-made” attributes following certain rules.
2. Ordinal Variables – variables whose categories can be ranked or ordered
3. Interval Variables – variables that can be ranked and for which the difference between two values can be calculated and interpreted
4. Ratio Variables – variables that can be ranked and for which all arithmetic operations can be done
Explanation:
1. With Nominal Variables, we can only say that one object is different from another. The amount of difference between them cannot be determined, and we cannot say that one is better or worse than the other.
e.g. gender, nationality, and civil status
2. With Ordinal Variables, we can say that one is better or greater than the other, but we cannot tell the amount of difference.
e.g. rankings in a beauty contest, birth order of siblings in a family
3. With Interval Variables and Ratio Variables, we can say not only that one object is greater or less than the other, but we can also specify the amount of difference.
4. The difference between Interval Variables and Ratio Variables is that ratio variables have an absolute (true) zero point. The zero point of an interval variable is arbitrary and does not reflect an absence of the attribute.
e.g. Suppose a student got zero in a Statistics test. Does it mean that the student has absolutely no knowledge of statistics? But if a person has no money, we can say that he has zero pesos.

Classify the given variables according to scale of measurement.


1. Family Income (in peso)
2. Candidate voted for in 2002 barangay elections
3. Tax identification
4. Gender
5. Average number of glasses of water consumed per day
6. Blood Pressure
7. Height of students
8. Number of clients
9. Number of cases won in court
10. Academic ranks in high school
Two Main Divisions of Statistics
1. Descriptive statistics
- to summarize and describe data
2. Inferential statistics
- to draw conclusions from the data
Examples
Determine whether the given statement is Descriptive or Inferential statistics.
1. An instructor ranks his students according to their average grade. (descriptive)
2. A psychologist predicts the effect of music on students' academic performance. (Inferential)
In statistics, every topic we analyze and interpret concerns a group of people, things, or objects. Consider the figure below:

Define the following:


1. Population – the entire collection of objects, persons, places, or things under study
2. Sample – a smaller portion or part of the population
3. Parameter – a value or measure obtained from a population
4. Statistic – a value or measure obtained from a sample
Note:
The population size is usually denoted by N.
The sample size is usually denoted by n.
A problem commonly encountered is determining the sample size. It is not advisable to simply take a fixed percentage of the population. Instead, choose a margin of error, usually from 1% to 10% in social science research (we will use 1% or 5%). The sample size relative to the population size is then computed with the formula:
$$n = \frac{N}{1 + Ne^2}$$
where:
N = the population size
e = the margin of error
n = the sample size
Example:
Find the sample size if the population size is 2500 at 95% accuracy.
Solution: At 95% accuracy, the corresponding margin of error is 5%, or 0.05. Using the formula,
$$n = \frac{N}{1 + Ne^2} = \frac{2500}{1 + 2500(0.05)^2} = \frac{2500}{1 + 6.25} = \frac{2500}{7.25} = 344.83 \approx 345$$
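The formula above is commonly called Slovin's formula. A minimal Python sketch of it follows. The function name `slovin_sample_size` is our own. Rounding up to the next whole respondent is an assumption that matches the 344.83 → 345 step in the example.

```python
import math


def slovin_sample_size(N: int, e: float) -> int:
    """Return n = N / (1 + N * e**2), rounded up to a whole respondent."""
    n = N / (1 + N * e ** 2)
    # Round to 6 decimals first so floating-point noise (e.g. 100.0000000001)
    # does not push an exact whole number up by one.
    return math.ceil(round(n, 6))


# Worked example from the handout: N = 2500 at a 5% margin of error.
print(slovin_sample_size(2500, 0.05))  # 345
```

The same call can be used to check the practice items that follow, for example `slovin_sample_size(500, 0.05)`.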
What is the sample size of each of the following given population size with the corresponding margin
of error?
a. N = 500; e = 5%
b. N = 20 000; e = 10%
c. N = 3 600; e = 7%
d. N = 10 200; e = 8%
e. N = 100 000; e = 5%
Suppose my weight went up by 3 kg last month (January), went up by another 2 kg the next month, and by another 3 kg the month after that. I then consulted my doctor about my diet, and the doctor suggested that I take slimming pills. The gain or loss in my weight is summarized in the table below.
January February March April May
3 kg 2 kg 3 kg −2 kg −4 kg
What is the gain or loss in my weight from January to March? From March to May?
What is the total (sum) of the gains and losses in my weight?
Whenever we talk about a sum or total, we use a mathematical symbol: the capital Greek letter sigma, Σ.
The calculation of many statistical measures requires taking the sum of a number of variates, so Σ is introduced as the standard symbol for a sum.
The symbol $\sum_{i=1}^{n} X_i$ means that we add all values of X from $X_1$ to $X_n$. The notation is read “the summation of X sub i, for i from 1 to n.”

Examples
Consider the grades obtained by five high school students: 76, 84, 90, 85, and 74. Thus $X_1 = 76$, $X_2 = 84$, $X_3 = 90$, $X_4 = 85$, and $X_5 = 74$.
Find
1. $\sum_{i=1}^{5} X_i$
2. $\sum_{i=2}^{4} X_i$
3. $\sum_{i=3}^{5} X_i$
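The three sums can be checked with a short Python sketch. `summation` is a hypothetical helper name. The one thing to watch is that the notation counts from 1, while Python lists count from 0.

```python
# Grades of the five students, X1..X5.
X = [76, 84, 90, 85, 74]


def summation(values, start, end):
    """Sum of X_i for i = start..end, using the 1-based indices of the notation."""
    # Python lists are 0-based, so X_start sits at index start - 1.
    return sum(values[start - 1:end])


print(summation(X, 1, 5))  # 409
print(summation(X, 2, 4))  # 259
print(summation(X, 3, 5))  # 249
```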
The summation of n observations is represented as
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \dots + X_n$$
The sum $\sum_{i=1}^{n} c$, where c is a constant, is equal to the product of the constant and n.

Proof:
$$\sum_{i=1}^{n} c = \underbrace{c + c + c + \dots + c}_{n \text{ terms}} = nc$$
The sum $\sum_{i=1}^{n} (X_i + Y_i)$, where X and Y are two variables, is equal to the sum of their summations.
Proof:
$$\sum_{i=1}^{n} (X_i + Y_i) = (X_1 + Y_1) + (X_2 + Y_2) + (X_3 + Y_3) + \dots + (X_n + Y_n)$$
$$= (X_1 + X_2 + X_3 + \dots + X_n) + (Y_1 + Y_2 + Y_3 + \dots + Y_n) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$$

Examples
Find the summation of the following:
1. $\sum_{i=1}^{4} 6$
2. $\sum_{i=1}^{4} (X_i + Y_i)$, given:
xi yi xi + yi
1 3 4
2 8 10
5 7 12
4 6 10
The sum of n constants is $\sum_{i=1}^{n} c = c + c + c + \dots + c = nc$.

The summation of the sum of several variables is equal to the sum of the summations of the terms taken separately:
$$\sum_{i=1}^{n} (X_i + Y_i) = (X_1 + Y_1) + (X_2 + Y_2) + (X_3 + Y_3) + \dots + (X_n + Y_n) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$$

Another rule used in summation notation: the summation of the sum of a variable and a constant is equal to the summation of the variable plus the product of n and the constant.
$$\sum_{i=1}^{n} (X_i + c) = \sum_{i=1}^{n} X_i + nc$$

Proof:
$$\sum_{i=1}^{n} (X_i + c) = (X_1 + c) + (X_2 + c) + (X_3 + c) + \dots + (X_n + c) = (X_1 + X_2 + X_3 + \dots + X_n) + nc = \sum_{i=1}^{n} X_i + nc$$
Example
Consider the table below.
Xi c Xi + c
2 4 6
5 4 9
6 4 10
7 4 11
10 4 14
Total 30 20 50
$$\sum_{i=1}^{5} (X_i + 4) = 30 + 5(4) = 50$$
The sum of n values of a variable, each added to the constant c, is
$$\sum_{i=1}^{n} (X_i + c) = (X_1 + X_2 + X_3 + \dots + X_n) + nc = \sum_{i=1}^{n} X_i + nc$$
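The three summation rules stated so far can be checked numerically with the tables from this lesson. This Python sketch only illustrates the rules on these particular numbers; it does not prove them.

```python
c = 4
X = [2, 5, 6, 7, 10]   # X column of the Xi + c table
n = len(X)

# Rule 1: the sum of n constants is n*c.
sum_of_constants = sum(c for _ in range(n))      # 4 + 4 + 4 + 4 + 4

# Rule 2: sum of (X_i + Y_i) equals sum of X_i plus sum of Y_i
# (xi and yi columns of the example table).
x = [1, 2, 5, 4]
y = [3, 8, 7, 6]
sum_of_pairs = sum(a + b for a, b in zip(x, y))

# Rule 3: sum of (X_i + c) equals sum of X_i plus n*c.
sum_plus_constant = sum(value + c for value in X)

print(sum_of_constants, n * c)              # 20 20
print(sum_of_pairs, sum(x) + sum(y))        # 36 36
print(sum_plus_constant, sum(X) + n * c)    # 50 50
```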
To obtain the sum of the squares of a variable, we first square every observation and then add the squares:
$$\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + X_3^2 + \dots + X_n^2$$
$$\sum_{i=1}^{5} X_i^2$$
In contrast, to evaluate the square of the sum of a variable, we add the observations first and then square the total:
$$\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + X_3 + \dots + X_n)^2$$
$$\left(\sum_{i=1}^{5} X_i\right)^2$$
Generalization
1. The sum of the squares of n observations is
$$\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + X_3^2 + \dots + X_n^2$$
2. The square of the sum of n observations is
$$\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + X_3 + \dots + X_n)^2$$
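The handout gives no data for the $\sum X_i^2$ and $(\sum X_i)^2$ expressions. This sketch therefore borrows the X column (2, 5, 6, 7, 10) from the Xi + c table, purely as an illustration. The point is that the two quantities are generally not equal.

```python
X = [2, 5, 6, 7, 10]  # illustrative values (X column of the Xi + c table)

sum_of_squares = sum(value ** 2 for value in X)   # square first, then add
square_of_sum = sum(X) ** 2                       # add first, then square

print(sum_of_squares)  # 214
print(square_of_sum)   # 900
```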

To obtain the sum of the products of n pairs of observations, we first multiply the observations in each pair and then add the products.
Xi Yi (Xi)(Yi)
3 5 15
4 6 24
6 2 12
7 4 28
10 7 70
Total $\sum_{i=1}^{5} X_i Y_i = 149$

Solution:
$$\sum_{i=1}^{5} X_i Y_i = (3)(5) + (4)(6) + (6)(2) + (7)(4) + (10)(7) = 149$$

Generalization
The sum of the products of n pairs of observations is
$$\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + X_3 Y_3 + \dots + X_n Y_n$$

Consider $\sum_{i=1}^{n} c x_i = c x_1 + c x_2 + c x_3 + \dots + c x_n$.
By factoring out c,
$$\sum_{i=1}^{n} c x_i = c(x_1 + x_2 + x_3 + \dots + x_n) = c \sum_{i=1}^{n} x_i$$

Consider the values $x_i = \{1, 2, 3, 4, 5\}$. Get the summation of the product of the variable and the constant 3, $\sum_{i=1}^{5} 3x_i$.
The sum of the products of a constant and a variable is
$$\sum_{i=1}^{n} c x_i = c x_1 + c x_2 + c x_3 + \dots + c x_n = c \sum_{i=1}^{n} x_i$$
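The product rules can be verified the same way. This sketch uses the Xi and Yi columns of the products table and the values $x_i = \{1, 2, 3, 4, 5\}$ with c = 3 from the exercise. It shows that multiplying term by term and factoring out c give the same total.

```python
# Sum of products: the Xi and Yi columns of the products table.
X = [3, 4, 6, 7, 10]
Y = [5, 6, 2, 4, 7]
sum_of_products = sum(a * b for a, b in zip(X, Y))   # multiply pairs, then add

# Constant times a variable: x_i = 1..5 and c = 3.
x = [1, 2, 3, 4, 5]
c = 3
term_by_term = sum(c * value for value in x)   # 3(1) + 3(2) + ... + 3(5)
factored = c * sum(x)                          # 3 * (1 + 2 + ... + 5)

print(sum_of_products)          # 149
print(term_by_term, factored)   # 45 45
```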
COLLECTING AND GATHERING DATA:
Characteristics of a Good Question:
1. A good question is unbiased.
Ex. Do you like classical music?
2. A good question must be clear and simply stated.
Ex. What was your average grade last school year?
3. Questions must be precise.
Ex. In terms of mathematical ability, do you think males and females are equal?
4. Good questionnaires lend themselves to easy analysis.
Types of Questions Asked in a Survey Questionnaire
A. According to form:
1. Free-Answer Type (Essay)
2. Guided Response Type
• Recall Type
Please supply the information asked for:
age____ sex_____ Date of Birth______ Place of Birth ______
• Recognition Type
a. Dichotomous (two options and one is selected)
Are you taking the school service? Yes____ No_____
b. Multiple Choice
Example:
What is your highest educational attainment?
Please put a check mark before your reply:
___ Elementary Graduate
___ High School Graduate
___ College Graduate
c. Multiple Response (two or more options may be chosen)
Why do you use toothpaste in brushing your teeth?
Please put check marks before your choices:
___ It prevents tooth decay
___ It freshens the breath
___ It is soothing to the mouth
___ It is cheap
___ It is imported
___ Others: _________
B. According to the Kind of Data Asked for:
1. Descriptive (Verbal) Data
What kind of house do you live in?
____ Concrete ____ Semi-Concrete ____ Wooden
____ Bamboo ____ Others: _____________
2. Quantified (Numerical) Data:
How old are you? ____
What is your average monthly income? P_______
3. Intensity of Feeling, Emotion or Attitude
Do you agree to holding local and national elections on a single election day?
___ Strongly Agree
___ Agree
___ Fairly Agree
___ Disagree
___ Strongly Disagree
4. Degree of Judgment
How serious is the problem (drug addiction, drinking, stealing, etc.)?
___ Very Serious
___ Fairly Serious
___ Serious
___ Not Serious
___ Not a Problem
5. Understanding
Explain what democracy is.
6. Reasoning
Why do you prefer democracy to dictatorship?

PRESENTATION OF DATA:
Suppose a statistics class with 30 students is given an examination and the raw scores are shown:
48 70 60 35 59
79 59 71 59 47
30 49 68 78 59
32 36 50 38 68
32 57 65 65 58
73 45 55 66 50
How can we make our own frequency distribution based on the given data?
B. Development of the Lesson:
1. Compute the range of the data. Look for the lowest score (LS) and the highest score (HS), then take their difference plus one:
Range = HS − LS + 1
2. Get the interval size (i) by dividing the range by 12.5:
$$i = \frac{\text{Range}}{12.5}$$
The divisor 12.5 is the average of 10 and 15, since 10 to 15 classes is considered the ideal number of classes. Round off i to the nearest whole number.
3. Construct your class intervals, starting with a score that is divisible by your interval size (i) and taking note of your lowest score. If i is 6 and your lowest score is 19, start at 18, because 19 is not divisible by 6. Twenty-four is divisible by 6, but if you start at 24, the score 19 will be excluded. So the class intervals are 18–23, then 24–29, then 30–35, and so on.
4. Afterwards, get the frequency (f) of each class interval, that is, how many got a score from 18–23, 24–29, and so on.
Example:
Class f
30–35 2
24–29 5
18–23 3
After you have finished, get the sum of the frequencies. It should be equal to N (the number of cases): if your N is 40, your Σf should also be 40.

5. Then supply the midpoint and the cumulative frequency of each class interval.
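Steps 1 to 5 can be sketched in Python using the 30 raw scores from the statistics class above. The sketch makes three assumptions: `frequency_distribution` is a name of our own, scores are whole numbers, and "round off" means rounding half up. For these scores it gives i = 4 and classes 28–31 through 76–79.

```python
scores = [48, 70, 60, 35, 59, 79, 59, 71, 59, 47,
          30, 49, 68, 78, 59, 32, 36, 50, 38, 68,
          32, 57, 65, 65, 58, 73, 45, 55, 66, 50]


def frequency_distribution(data):
    """Build class intervals following steps 1-5; returns (i, rows)."""
    hs, ls = max(data), min(data)
    rng = hs - ls + 1                   # Step 1: Range = HS - LS + 1
    i = max(1, int(rng / 12.5 + 0.5))   # Step 2: i = Range / 12.5, rounded half up
    lower = (ls // i) * i               # Step 3: largest multiple of i not above LS
    rows, cf = [], 0
    while lower <= hs:
        upper = lower + i - 1
        f = sum(lower <= x <= upper for x in data)    # Step 4: frequency
        cf += f                                       # Step 5: cumulative frequency
        rows.append((lower, upper, f, (lower + upper) / 2, cf))  # midpoint
        lower += i
    return i, rows


i, rows = frequency_distribution(scores)
print("i =", i)
for lower, upper, f, midpoint, cf in reversed(rows):  # highest class first
    print(f"{lower}-{upper}  f={f}  M={midpoint}  cf={cf}")
print("sum of f =", sum(r[2] for r in rows))  # should equal N = 30
```

Listing the highest class first matches the layout of the handout's tables. The cumulative frequency is accumulated from the lowest class upward.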
Suppose a mathematics class with 30 students is given an examination, and the raw scores are given below.
48 73 57 50 78 47
79 70 45 65 38 59
30 59 60 55 65 68
32 49 71 35 66 58
32 36 68 59 59 50
Determine the frequency distribution of the data.
Statistical data should be arranged in a manner that allows a reader to distinguish their essential features.
Data may be presented in the following forms:
Textual form is used when the data to be presented are purely qualitative or when very few numbers are involved.
Tabular form is a more effective device for presenting data; such a table is sometimes referred to as a frequency distribution table.
Graphical or pictorial form is the most effective device for attracting people’s attention.
From Day 2 to Day 5, the students will create their own graphs based on the given data.
ANALYZING DATA:
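Any measure indicating the center of a set of data arranged in increasing or decreasing order of magnitude is called a measure of central tendency.
Among the measures of central tendency, the mean is considered the most popular and the most widely used.
Formula
Mean for ungrouped data:
$$\bar{x} = \frac{\sum x}{n}$$
Mean for grouped data:
$$\bar{x} = \frac{\sum fM}{n} \qquad \text{or} \qquad \bar{x} = AM + \left(\frac{\sum fd}{n}\right) i$$
where M is the class midpoint, AM is an assumed mean (the midpoint of a chosen class), d is the deviation of each class from that class in class steps, and i is the interval size.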

Ungrouped data:
1. What is the mean of the ages of the 9 children in a slum area given below?
9, 8, 1, 3, 4, 5, 6, 7, 2
Grouped data:
2. A frequency distribution of the scores in Statistics III of 34 BSN students.
x f M fM
35-39 3 37 111
30-34 5 32 160
25-29 8 27 216
20-24 10 22 220
15-19 4 17 68
10-14 2 12 24
5-9 2 7 14
i = 5 n = 34 ΣfM = 813
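Both grouped-data formulas give the same mean, as this Python sketch shows for the table above. The assumed mean AM = 22 (the midpoint of the 20–24 class) is an arbitrary choice of ours; any class midpoint would work.

```python
# Ungrouped data: ages of the 9 children.
ages = [9, 8, 1, 3, 4, 5, 6, 7, 2]
mean_ungrouped = sum(ages) / len(ages)

# Grouped data: (lower limit, upper limit, f) for the 34 BSN students.
classes = [(35, 39, 3), (30, 34, 5), (25, 29, 8), (20, 24, 10),
           (15, 19, 4), (10, 14, 2), (5, 9, 2)]
i = 5
n = sum(f for _, _, f in classes)

# Midpoint formula: x-bar = sum(fM) / n
sum_fM = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
mean_midpoint = sum_fM / n

# Coded formula: x-bar = AM + (sum(fd) / n) * i, with AM = 22 (midpoint of 20-24)
AM = 22
sum_fd = sum(f * ((lo + hi) / 2 - AM) / i for lo, hi, f in classes)
mean_coded = AM + (sum_fd / n) * i

print(mean_ungrouped)            # 5.0
print(n, sum_fM)                 # 34 813.0
print(round(mean_midpoint, 2))   # 23.91
print(round(mean_coded, 2))      # 23.91
```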

Class Interval f M fM
20-24 2
25-29 6
30-34 9
35-39 10
40-44 12
45-49 7
50-54 4
i=5 n = 50

The same idea as the grouped-data formula applies when each value carries a weight. Instead of f and M, we use w for the weight of each value and x for the value itself. This mean is called the weighted mean:
$$\bar{x} = \frac{\sum xw}{\sum w}$$
Example:
Here are the grades obtained by a student in the different criteria for grading. The weight for each criterion is given.
Criteria Grades (x) Weight (w) xw
Long Tests 80 3.0 240
Quizzes 85 2.0 170
Departmental Tests 82 2.5 205
Class Participation 88 1.5 132
Homework and Projects 85 1.0 85
Total 10.0 Σxw = 832

Applying the formula,
$$\bar{x} = \frac{\sum xw}{\sum w} = \frac{832}{10.0} = 83.20$$
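The weighted mean reduces to two sums, as in this short Python sketch of the grading example. The data layout as (criterion, grade, weight) tuples is just a convenient choice.

```python
# (criterion, grade x, weight w) from the table above
criteria = [("Long Tests", 80, 3.0), ("Quizzes", 85, 2.0),
            ("Departmental Tests", 82, 2.5), ("Class Participation", 88, 1.5),
            ("Homework and Projects", 85, 1.0)]

sum_xw = sum(x * w for _, x, w in criteria)
sum_w = sum(w for _, _, w in criteria)
weighted_mean = sum_xw / sum_w

print(sum_xw, sum_w)             # 832.0 10.0
print(round(weighted_mean, 2))   # 83.2
```

For the subject-grades exercise below, the number of units plays the role of w.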
Determine the weighted mean:
1. Grades of a student
Subjects Grades No. of Units
Science 88 5
English 81 3
Social Studies 79 3
Mathematics 82 3
Physical Education 85 1
2. A questionnaire elicited answers on the attitudes of parents toward 3-day-a-week classes. The responses range from strongly agree to strongly disagree, with assigned weights.
Response Weights Frequency
Strongly Agree 19
Agree 23
Undecided 5
Disagree 2
Strongly Disagree 1
The MEDIAN of ungrouped data is the value found in the middle when the data are arranged in an array from lowest to highest or from highest to lowest.
If there are two middle values, their average is taken.
FORMULA for the median of grouped data:
$$\tilde{x} = LCB + \left(\frac{\frac{n}{2} - cf}{f}\right) i$$
where:
Md or $\tilde{x}$ = median
LCB (L) = lower class boundary of the median class (the class containing the (n/2)th score)
n/2 = half the total frequency
cf = cumulative frequency of the class just below the median class
f = frequency of the median class
i = interval size
Examples
Solve for the value of the Median.
Ungrouped data
121, 108, 120, 98, 132, 100, 92, 140, 102, 98
Solution:
Arranging the values in ascending order, we obtain
92, 98, 98, 100, 102, 108, 120, 121, 132, 140
Since there are two middle values, 102 and 108,
$$Md = \frac{102 + 108}{2} = 105$$
Grouped data
Frequency distribution of the scores in statistics of 34 students.

Scores (X) f cf
35-39 3 34
30-34 5 31
25-29 8 26
20-24 10 18
15-19 4 8
10-14 2 4
5-9 2 2
n = 34
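Here is a Python sketch of both median procedures. It assumes whole-number class limits, so LCB = lower limit − 0.5. The `unit` parameter is our own addition for data recorded to one decimal, such as the battery lives, where unit = 0.1. For the 34-student table above it gives 24, the median value used later in the discussion of the mode.

```python
def median_ungrouped(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(data)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def median_grouped(classes, i, unit=1):
    """classes: (lower limit, f) from lowest to highest class.
    LCB is taken as lower limit - unit/2 (unit = 1 for whole-number scores)."""
    n = sum(f for _, f in classes)
    half = n / 2
    cf = 0  # cumulative frequency of the classes below the current one
    for lower, f in classes:
        if cf + f >= half:          # this is the median class
            lcb = lower - unit / 2
            return lcb + (half - cf) / f * i
        cf += f
    raise ValueError("empty distribution")


print(median_ungrouped([121, 108, 120, 98, 132, 100, 92, 140, 102, 98]))  # 105.0

scores_34 = [(5, 2), (10, 2), (15, 4), (20, 10), (25, 8), (30, 5), (35, 3)]
print(median_grouped(scores_34, i=5))  # 24.0
```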
Solve for the value of the median.
Distribution of the lives of 60 batteries.
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
The MODE is the value that occurs most often, that is, with the greatest frequency.
For grouped data, we consider two methods:
By inspection: get the midpoint of the class with the highest frequency, and decide whether the distribution is unimodal (one mode), bimodal (two modes), multimodal (three or more modes), or has no mode.
By formula: this can be used only if the data are unimodal. For other kinds of data, the final answer is the inspected value.
$$Mo = 3Md - 2\bar{x}$$
where:
Mo = mode
3Md = three times the median
$2\bar{x}$ = two times the mean
Examples
Solve for the mode.
Ungrouped data
The following are the scores of 11 students in a 20-item spelling test.
4, 5, 8, 8, 8, 9, 12, 12, 15, 19, 20
Grouped Data
Scores f cf
35-39 3 34
30-34 5 31
25-29 8 26
20-24 10 18
15-19 4 8
10-14 2 4
5-9 2 2
n = 34
The values of the mean and the median for this example were computed in the previous discussion: 23.91 and 24, respectively.
Substitute these values into the formula.
Solution:
By inspection: 22, the midpoint of 20–24, the class with the highest frequency (the distribution is unimodal, so we can use the formula)
$$Mo = 3(24) - 2(23.91) = 72 - 47.82 = 24.18$$
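The mode procedures can be sketched with Python's standard `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count. The inspection step and the unimodal check for the formula are our own small helpers, written to follow the rules above.

```python
from statistics import multimode

# Ungrouped: spelling scores of 11 students.
spelling = [4, 5, 8, 8, 8, 9, 12, 12, 15, 19, 20]
modes = multimode(spelling)

# Grouped, by inspection: midpoint(s) of the class(es) with the highest frequency.
classes = [(35, 39, 3), (30, 34, 5), (25, 29, 8), (20, 24, 10),
           (15, 19, 4), (10, 14, 2), (5, 9, 2)]
top = max(f for _, _, f in classes)
modal_classes = [(lo, hi) for lo, hi, f in classes if f == top]
mode_inspection = [(lo + hi) / 2 for lo, hi in modal_classes]

# Grouped, by formula (only when unimodal): Mo = 3Md - 2 x-bar
mean, median = 23.91, 24
mode_formula = 3 * median - 2 * mean if len(modal_classes) == 1 else None

print(modes)                    # [8]
print(mode_inspection)          # [22.0]
print(round(mode_formula, 2))   # 24.18
```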
Solve for the mode.
Distribution of the lives of 60 batteries.
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
Comparison of mean, median and mode:
1. Mean – a stable, dependable, and reliable measure of central tendency. It determines the normality (peakedness) of the curve.

2. Median – it determines the skewness of the given distribution.

3. Mode – an unstable, undependable, and unreliable measure of central tendency.


Measures of location are important measures that divide a distribution into parts or subgroups.
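The QUARTILES are the score points that divide a distribution into four equal parts.
The following formulas give the positions of Q1, Q2, and Q3 in ungrouped data arranged from lowest to highest:
$$Q_1 = \frac{n}{4}, \qquad Q_2 = \frac{2n}{4} = \frac{n}{2}, \qquad Q_3 = \frac{3n}{4}$$
Formula for grouped data:
$$Q_k = LCB + \left(\frac{\frac{kn}{4} - cf}{f_k}\right) i \qquad \text{or} \qquad Q_k = L + \left(\frac{\frac{kn}{4} - F}{f}\right) i$$
where:
$Q_k$ = the kth quartile, where k = 1, 2, 3
LCB (L) = lower class boundary of the quartile class
n = sample size
cf (F) = cumulative frequency of the class just below the quartile class
$f_k$ (f) = frequency of the quartile class
i = interval size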

Examples
1. Ungrouped data
The following is a list of scores resulting from an English examination administered to 40 students. Find the first, second, and third quartiles:
91 61 46 62 54
62 93 90 99 76
48 83 59 96 66
94 52 51 59 62
89 100 92 70 59
91 73 68 49 54
85 43 78 50 45
98 69 77 42 46
Solution:
Arrange the scores from the lowest to the highest.
Scores (x): n = 40
42 43 45 46 46 48 49 50 51 52 54 54 59
59 59 61 62 62 62 66 68 69 70 73 76 77
78 83 85 89 90 91 91 92 93 94 96 98 99
100
For Q1: $Q_1 = \frac{40}{4} = 10$, so Q1 is the 10th observation, 52.
For Q2: $Q_2 = \frac{40}{2} = 20$, so Q2 is the 20th observation, 66.
For Q3: $Q_3 = \frac{3(40)}{4} = 30$, so Q3 is the 30th observation, 89.
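The positional rule for ungrouped quartiles is easy to code. In this sketch, `quartile_ungrouped` is a hypothetical name. When k·n/4 is not a whole number, it averages the two neighbouring observations. That is the rule the handout applies in its quartile-deviation example, and it is only one of several conventions in use.

```python
scores = [91, 61, 46, 62, 54, 62, 93, 90, 99, 76,
          48, 83, 59, 96, 66, 94, 52, 51, 59, 62,
          89, 100, 92, 70, 59, 91, 73, 68, 49, 54,
          85, 43, 78, 50, 45, 98, 69, 77, 42, 46]


def quartile_ungrouped(data, k):
    """Q_k at position k*n/4 of the sorted data (1-based)."""
    s = sorted(data)
    pos = k * len(s) / 4
    if pos <= 1:
        return s[0]
    whole = int(pos)
    if pos == whole:
        return s[whole - 1]
    # Fractional position: average the two neighbouring observations.
    return (s[whole - 1] + s[whole]) / 2


print([quartile_ungrouped(scores, k) for k in (1, 2, 3)])  # [52, 66, 89]
```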

2. Grouped data
Find the first quartile of the frequency distribution of the scores of fifty students in a History class.
scores f cf
45-49 2 50
40-44 6 48
35-39 11 42
30-34 10 31
25-29 12 21
20-24 5 9
15-19 4 4
n=50
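$$Q_1 = L + \left(\frac{\frac{1n}{4} - F}{f}\right) i$$
Solve for $\frac{1n}{4} = \frac{50}{4} = 12.5$, which falls in the class 25–29 (L = 24.5, F = 9, f = 12, i = 5):
$$Q_1 = 24.5 + \left(\frac{12.5 - 9}{12}\right) 5 = 24.5 + \left(\frac{3.5}{12}\right) 5 = 24.5 + \frac{17.5}{12} = 24.5 + 1.46 = 25.96$$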
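The grouped-data quartile formula interpolates inside the class that contains the kn/4th score. The Python sketch below uses a helper of our own, `grouped_fractile`, with a `parts` argument so the same interpolation also covers quartiles (4). It assumes whole-number class limits, so L = lower limit − 0.5.

```python
def grouped_fractile(classes, k, parts, i, unit=1):
    """k-th fractile of grouped data: L + ((k*n/parts - F) / f) * i.
    classes: (lower limit, f) pairs ordered from the lowest class up."""
    n = sum(f for _, f in classes)
    target = k * n / parts            # kn/4 for quartiles
    F = 0                             # cumulative frequency below the current class
    for lower, f in classes:
        if F + f >= target:           # the class containing the target position
            L = lower - unit / 2      # lower class boundary
            return L + (target - F) / f * i
        F += f
    raise ValueError("target position exceeds the total frequency")


history = [(15, 4), (20, 5), (25, 12), (30, 10), (35, 11), (40, 6), (45, 2)]
print(round(grouped_fractile(history, 1, 4, i=5), 2))  # 25.96
print(round(grouped_fractile(history, 3, 4, i=5), 2))  # 37.45
```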
Solve for the quartiles of the following.
1. Ungrouped data: Q1 and Q3
36 38 37 28 21 23 42 22
28 27 38 39 55 37 34 28
29 25 27 42 48 28 31 33
40 23 49 45 48 31 45 27
43 34 51 28 26 42 28 24

2. Grouped data: Q2
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
The DECILES are the score points that divide a distribution into ten equal parts. These are denoted by $D_1, D_2, D_3, \dots, D_9$.
1. The DECILES for ungrouped data
The formula for the position of a decile is:
$$D_k = \frac{kn}{10}$$
where:
D = the decile
k = 1, 2, ..., 9
n = the sample size
Example
The following is a list of scores resulting from an English examination administered to 40 students (arranged in an array from lowest to highest). Solve for $D_3$, $D_5$, and $D_8$.
Scores
42 54 68 90
43 54 (D3) 69 91 (D8)
45 59 70 91
46 59 73 92
46 59 76 93
48 61 77 94
49 62 78 96
50 62 83 98
51 62 85 99
52 66 (D5) 89 100
Solution:
For D3: $D_3 = \frac{3n}{10} = \frac{3(40)}{10} = \frac{120}{10} = 12$, so D3 is the 12th observation, 54.
For D5: $D_5 = \frac{5n}{10} = \frac{5(40)}{10} = \frac{200}{10} = 20$, so D5 is the 20th observation, 66.
For D8: $D_8 = \frac{8n}{10} = \frac{8(40)}{10} = \frac{320}{10} = 32$, so D8 is the 32nd observation, 91.
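Deciles of ungrouped data follow the same positional rule with 10 in place of 4. This sketch uses the raw English scores. The `decile_ungrouped` name is ours, and averaging neighbours for a fractional position is our assumption, since the handout's decile examples only use whole-number positions.

```python
scores = [91, 61, 46, 62, 54, 62, 93, 90, 99, 76,
          48, 83, 59, 96, 66, 94, 52, 51, 59, 62,
          89, 100, 92, 70, 59, 91, 73, 68, 49, 54,
          85, 43, 78, 50, 45, 98, 69, 77, 42, 46]


def decile_ungrouped(data, k):
    """D_k: the observation at position k*n/10 of the sorted data (1-based)."""
    s = sorted(data)
    pos = k * len(s) / 10
    if pos <= 1:
        return s[0]
    whole = int(pos)
    if pos == whole:
        return s[whole - 1]
    return (s[whole - 1] + s[whole]) / 2   # assumption for fractional positions


print([decile_ungrouped(scores, k) for k in (3, 5, 8)])  # [54, 66, 91]
```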
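2. The DECILES for grouped data
Formula:
$$D_k = LCB + \left(\frac{\frac{kn}{10} - cf}{f_k}\right) i \qquad \text{or} \qquad D_k = L + \left(\frac{\frac{kn}{10} - F}{f}\right) i$$
where:
$D_k$ = the kth decile, where k = 1, 2, 3, ..., 9
LCB (L) = lower class boundary of the decile class
n = sample size
cf (F) = cumulative frequency of the class just below the decile class
f = frequency of the decile class
i = the interval size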
Example
Find the value of $D_1$ from the frequency distribution of the scores of fifty students in a History class.
Scores f cf Decile class
45 – 49 2 50
40 – 44 6 48 D9 (L = 39.5)
35 – 39 11 42
30 – 34 10 31 D5 (L = 29.5)
25 – 29 12 21
20 – 24 5 9 D1 (L = 19.5)
15 – 19 4 4
n = 50
Solution:
$$D_1 = L + \left(\frac{\frac{1n}{10} - F}{f}\right) i$$
Solve for $\frac{1n}{10} = \frac{1(50)}{10} = \frac{50}{10} = 5$, which falls in the class 20–24 (L = 19.5, F = 4, f = 5, i = 5):
$$D_1 = 19.5 + \left(\frac{5 - 4}{5}\right) 5 = 19.5 + \frac{5}{5} = 19.5 + 1 = 20.5$$
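The same interpolation gives grouped deciles when the position is kn/10. In this sketch, `grouped_fractile` is a helper of our own, and whole-number class limits are assumed (L = lower limit − 0.5). It reproduces D1 and also evaluates D5 and D9, whose classes are marked in the table.

```python
def grouped_fractile(classes, k, parts, i, unit=1):
    """k-th fractile of grouped data: L + ((k*n/parts - F) / f) * i.
    classes: (lower limit, f) pairs ordered from the lowest class up."""
    n = sum(f for _, f in classes)
    target = k * n / parts            # kn/10 for deciles
    F = 0                             # cumulative frequency below the current class
    for lower, f in classes:
        if F + f >= target:
            L = lower - unit / 2      # lower class boundary
            return L + (target - F) / f * i
        F += f
    raise ValueError("target position exceeds the total frequency")


history = [(15, 4), (20, 5), (25, 12), (30, 10), (35, 11), (40, 6), (45, 2)]
for k in (1, 5, 9):
    print(f"D{k} =", grouped_fractile(history, k, 10, i=5))
# D1 = 20.5, D5 = 31.5, D9 = 42.0
```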
Solve for the deciles of the following.
1. Ungrouped data: D3 and D7
36 38 37 28 21 23 42 22
28 27 38 39 55 37 34 28
29 25 27 42 48 28 31 33
40 23 49 45 48 31 45 27
43 34 51 28 26 42 28 24

2. Grouped data: D2
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
PERCENTILES are the ninety-nine score points that divide a distribution into one hundred equal parts.
1. The PERCENTILES for ungrouped data
The formula for the position of a percentile is:
$$P_k = \frac{kn}{100}$$
where:
P = the percentile
k = 1, 2, ..., 99
n = the sample size
Example
Below are the scores of 40 students in an English examination (arranged in an array). Solve for $P_{50}$, $P_{66}$, and $P_{98}$.
Scores
42 54 68 90
43 54 69 91
45 59 70 91
46 59 73 92
46 59 76 93
48 61 77 (P66) 94
49 62 78 96
50 62 83 98
51 62 85 99 (P98)
52 66 (P50) 89 100
Solution:
For P50: $P_{50} = \frac{50n}{100} = \frac{50(40)}{100} = \frac{2000}{100} = 20$, so P50 is the 20th observation, 66.
For P66: $P_{66} = \frac{66n}{100} = \frac{66(40)}{100} = \frac{2640}{100} = 26.4$, so P66 is taken as the 26th observation, 77.
For P98: $P_{98} = \frac{98n}{100} = \frac{98(40)}{100} = \frac{3920}{100} = 39.2$, so P98 is taken as the 39th observation, 99.
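In the percentile example, the fractional positions 26.4 and 39.2 are rounded to the nearest whole position rather than averaged. This Python sketch follows that rounding rule. `percentile_ungrouped` is a hypothetical name, and rounding half up is our assumption for positions ending in .5.

```python
scores = [91, 61, 46, 62, 54, 62, 93, 90, 99, 76,
          48, 83, 59, 96, 66, 94, 52, 51, 59, 62,
          89, 100, 92, 70, 59, 91, 73, 68, 49, 54,
          85, 43, 78, 50, 45, 98, 69, 77, 42, 46]


def percentile_ungrouped(data, k):
    """P_k: the observation at position k*n/100 of the sorted data,
    with the position rounded to the nearest whole number (half up)."""
    s = sorted(data)
    pos = int(k * len(s) / 100 + 0.5)
    pos = min(max(pos, 1), len(s))   # keep the position inside 1..n
    return s[pos - 1]


print([percentile_ungrouped(scores, k) for k in (50, 66, 98)])  # [66, 77, 99]
```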

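2. The PERCENTILES for grouped data

Formula:
$$P_k = LCB + \left(\frac{\frac{kn}{100} - cf}{f_k}\right) i \qquad \text{or} \qquad P_k = L + \left(\frac{\frac{kn}{100} - F}{f}\right) i$$
where:
$P_k$ = the kth percentile, where k = 1, 2, 3, ..., 99
LCB (L) = lower class boundary of the percentile class
n = sample size
cf (F) = cumulative frequency of the class just below the percentile class
f = frequency of the percentile class
i = the interval size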
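Example
Scores f cf Percentile class
45 – 49 2 50
40 – 44 6 48
35 – 39 11 42 P80 (L = 34.5)
30 – 34 10 31 P50 (L = 29.5)
25 – 29 12 21 P20 (L = 24.5)
20 – 24 5 9
15 – 19 4 4
n = 50
Compute the percentile $P_{20}$ from the data above.
Solution:
$$P_{20} = L + \left(\frac{\frac{20n}{100} - F}{f}\right) i$$
Solve for $\frac{20n}{100} = \frac{20(50)}{100} = \frac{1000}{100} = 10$, which falls in the class 25–29 (L = 24.5, F = 9, f = 12, i = 5):
$$P_{20} = 24.5 + \left(\frac{10 - 9}{12}\right) 5 = 24.5 + \frac{5}{12} = 24.5 + 0.42 = 24.92$$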
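For grouped percentiles the target position is kn/100. This Python sketch uses a helper of our own, `grouped_fractile`, and assumes whole-number class limits (L = lower limit − 0.5). It computes P20 and also the P50 and P80 values whose classes are marked in the table.

```python
def grouped_fractile(classes, k, parts, i, unit=1):
    """k-th fractile of grouped data: L + ((k*n/parts - F) / f) * i.
    classes: (lower limit, f) pairs ordered from the lowest class up."""
    n = sum(f for _, f in classes)
    target = k * n / parts            # kn/100 for percentiles
    F = 0                             # cumulative frequency below the current class
    for lower, f in classes:
        if F + f >= target:
            L = lower - unit / 2      # lower class boundary
            return L + (target - F) / f * i
        F += f
    raise ValueError("target position exceeds the total frequency")


history = [(15, 4), (20, 5), (25, 12), (30, 10), (35, 11), (40, 6), (45, 2)]
for k in (20, 50, 80):
    print(f"P{k} =", round(grouped_fractile(history, k, 100, i=5), 2))
# P20 = 24.92, P50 = 31.5, P80 = 38.59
```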
Solve for the percentiles of the following.
1. Ungrouped data: P25 and P38
36 38 37 28 21 23 42 22
28 27 38 39 55 37 34 28
29 25 27 42 48 28 31 33
40 23 49 45 48 31 45 27
43 34 51 28 26 42 28 24
2. Grouped data: P50
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
Measures of Dispersion/Variability:
1. Range for ungrouped data
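The range is the difference between the highest value (HV) and the lowest value (LV) in the given distribution:
Range = HV − LV
Some texts add one to obtain the inclusive range, Range = HS − LS + 1, which is the version used when constructing a frequency distribution.
Example
Scores of 10 students in a spelling test.
11 8 12 7 5 3 4 15 9 6

Range = 15 − 3 = 12 (inclusive range: 15 − 3 + 1 = 13)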
2. Range for grouped data
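To find the range of a frequency distribution, take the difference between the upper limit of the highest class and the lower limit of the lowest class, plus one (this equals the upper boundary of the highest class minus the lower boundary of the lowest class).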
Example:
Find the range for the frequency distribution shown below.
Score f
45 – 49 2
40 – 44 6
35 – 39 11
30 – 34 10
25 – 29 12
20 – 24 5
15 - 19 4
n = 50
Solution:
Range = 49 − 15 + 1 = 35
Standard Deviation:
1. Standard Deviation of Ungrouped Data
The formula is:
$$SD = \sqrt{s^2}$$
where:
SD = standard deviation
$s^2$ = sample variance
OR
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
where:
SD = standard deviation
$\sum (X - \bar{X})^2$ = sum of the squared deviations of X from the mean $\bar{X}$
n = sample size
Example
Scores of ten students in a spelling test. Find the standard deviation.
x X − X̄ (X − X̄)²
11 3 9
8 0 0
12 4 16
7 -1 1
5 -3 9
3 -5 25
4 -4 16
15 7 49
9 1 1
6 -2 4
Σx = 80 (so X̄ = 8) Σ|X − X̄| = 30 Σ(X − X̄)² = 130
n = 10
Solution:
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{10 - 1}} = \sqrt{\frac{130}{9}} = \sqrt{14.44} = 3.8$$
OR
$$SD = \sqrt{s^2} = \sqrt{14.44} = 3.8$$
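The ungrouped standard deviation can be computed by hand, following the table, and cross-checked with Python's `statistics.stdev`. That function also divides by n − 1, so the two results agree.

```python
import math
import statistics

scores = [11, 8, 12, 7, 5, 3, 4, 15, 9, 6]
n = len(scores)
mean = sum(scores) / n                              # 80 / 10 = 8
sum_sq_dev = sum((x - mean) ** 2 for x in scores)   # sum of (X - mean)^2
variance = sum_sq_dev / (n - 1)                     # sample variance s^2
sd = math.sqrt(variance)

print(sum_sq_dev, round(variance, 2), round(sd, 2))  # 130.0 14.44 3.8
print(round(statistics.stdev(scores), 2))            # 3.8
```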
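2. Standard Deviation of Grouped Data
The formula is:
$$SD = \sqrt{\frac{\sum fM^2}{n} - \bar{x}^2} \qquad \text{or} \qquad SD = i\sqrt{\frac{\sum fd^2 - \frac{(\sum fd)^2}{n}}{n - 1}}$$
where:
SD = standard deviation
$\sum fM^2$ = sum of the products of each frequency and the square of its class midpoint
$\sum fd$, $\sum fd^2$ = sums computed from the deviations d (in class steps) from an assumed mean, as in the coded formula for the mean
n = sample size
$\bar{x}$ = mean
i = interval size
Example:
In the frequency distribution below, solve for the SD.
Scores f M fM fM²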
45 – 49 2 47 94 4418
40 – 44 6 42 252 10584
35 – 39 11 37 407 15059
30 - 34 10 32 320 10240
25 - 29 12 27 324 8748
20 - 24 5 22 110 2420
15 – 19 4 17 68 1156
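n = 50 ΣfM = 1575 ΣfM² = 52625
Solution:
Using the midpoint formula, with $\bar{x} = \frac{1575}{50} = 31.5$:
$$SD = \sqrt{\frac{52625}{50} - \left(\frac{1575}{50}\right)^2} = \sqrt{1052.5 - 992.25} = \sqrt{60.25} = 7.76$$
Using the coded formula, taking d = 0 at the class 30–34 (so $\sum fd = -5$ and $\sum fd^2 = 121$):
$$SD = 5\sqrt{\frac{121 - \frac{(-5)^2}{50}}{50 - 1}} = 5\sqrt{\frac{120.5}{49}} = 7.84$$
The two results differ slightly because the first formula divides by n and the second by n − 1.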
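This Python sketch computes both grouped-data formulas from the table. It makes visible that the midpoint formula divides by n while the coded formula divides by n − 1. The assumed mean AM = 32 (the midpoint of 30–34) is the choice that reproduces the sums Σfd = −5 and Σfd² = 121.

```python
import math

# (lower limit, upper limit, f) for the 50 scores
classes = [(45, 49, 2), (40, 44, 6), (35, 39, 11), (30, 34, 10),
           (25, 29, 12), (20, 24, 5), (15, 19, 4)]
i = 5
n = sum(f for _, _, f in classes)
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]

# Midpoint formula: SD = sqrt(sum(fM^2)/n - mean^2)   (divides by n)
sum_fM = sum(f * M for M, f in mids)
sum_fM2 = sum(f * M ** 2 for M, f in mids)
mean = sum_fM / n
sd_midpoint = math.sqrt(sum_fM2 / n - mean ** 2)

# Coded formula with d = 0 at the 30-34 class (midpoint 32); divides by n - 1
AM = 32
sum_fd = sum(f * (M - AM) / i for M, f in mids)
sum_fd2 = sum(f * ((M - AM) / i) ** 2 for M, f in mids)
sd_coded = i * math.sqrt((sum_fd2 - sum_fd ** 2 / n) / (n - 1))

print(sum_fM, sum_fM2)         # 1575.0 52625.0
print(round(sd_midpoint, 2))   # 7.76
print(round(sd_coded, 2))      # 7.84
```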
Solve for the Standard deviation of the following data.
1. Ungrouped data:
8, 9, 10, 12, 17, 18, 18, 19, 20, 21
2. Grouped data:
x f cf
4.6 – 4.9 5
4.2 – 4.5 4
3.8 – 4.1 5
3.4 – 3.7 14
3.0 – 3.3 15
2.6 – 2.9 9
2.2 – 2.5 5
1.8 – 2.1 3
i = 0.4 n=60
Quartile Deviation:
1. Quartile Deviation of Ungrouped Data
The formula is:
$$QD = \frac{Q_3 - Q_1}{2}$$
where:
QD = quartile deviation
Q3 = third quartile
Q1 = first quartile
2 = constant
Example
Below are the scores of ten students in spelling. Solve for the quartile deviation.
Scores:
11 8 12 7 5 3 4 15 9 6
Solution:
Arrange the data in an array from the lowest to the highest score.
3 4 5 6 7 8 9 11 12 15
Solve for Q1: $Q_1 = \frac{n}{4} = \frac{10}{4} = 2.5$, so Q1 lies halfway between the 2nd and 3rd observations, 4 and 5:
$$Q_1 = \frac{4 + 5}{2} = \frac{9}{2} = 4.5$$
Solve for Q3: $Q_3 = \frac{3n}{4} = \frac{3(10)}{4} = \frac{30}{4} = 7.5$, so Q3 lies halfway between the 7th and 8th observations, 9 and 11:
$$Q_3 = \frac{9 + 11}{2} = \frac{20}{2} = 10$$
Therefore,
$$QD = \frac{10 - 4.5}{2} = 2.75$$
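A short Python sketch of the ungrouped quartile deviation, using the same halfway-averaging rule as the example. `quartile_position_value` is a hypothetical helper name. Averaging is applied to any fractional position, which is a simplification beyond the .5 cases shown.

```python
def quartile_position_value(sorted_data, k):
    """Value at position k*n/4; a fractional position averages the two neighbours."""
    pos = k * len(sorted_data) / 4
    if pos <= 1:
        return sorted_data[0]
    whole = int(pos)
    if pos == whole:
        return sorted_data[whole - 1]
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


scores = sorted([11, 8, 12, 7, 5, 3, 4, 15, 9, 6])
q1 = quartile_position_value(scores, 1)
q3 = quartile_position_value(scores, 3)
qd = (q3 - q1) / 2

print(q1, q3, qd)  # 4.5 10.0 2.75
```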
2. Quartile Deviation of Grouped Data
The formula is:
$$QD = \frac{Q_3 - Q_1}{2}$$
where:
QD = quartile deviation
Q3 = third quartile
Q1 = first quartile
2 = constant
Example
Solve for the quartile deviation of the given distribution.
scores f cf
45-49 2 50
40-44 6 48
35-39 11 42
30-34 10 31
25-29 12 21
20-24 5 9
15-19 4 4
n=50
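$$Q_3 = L + \left(\frac{\frac{3n}{4} - F}{f}\right) i$$
Solve for $\frac{3n}{4} = \frac{3(50)}{4} = \frac{150}{4} = 37.5$, which falls in the class 35–39 (L = 34.5, F = 31, f = 11, i = 5):
$$Q_3 = 34.5 + \left(\frac{37.5 - 31}{11}\right) 5 = 34.5 + \left(\frac{6.5}{11}\right) 5 = 34.5 + \frac{32.5}{11} = 34.5 + 2.95 = 37.45$$

$$Q_1 = L + \left(\frac{\frac{1n}{4} - F}{f}\right) i$$
Solve for $\frac{1n}{4} = \frac{50}{4} = 12.5$, which falls in the class 25–29 (L = 24.5, F = 9, f = 12, i = 5):
$$Q_1 = 24.5 + \left(\frac{12.5 - 9}{12}\right) 5 = 24.5 + \left(\frac{3.5}{12}\right) 5 = 24.5 + \frac{17.5}{12} = 24.5 + 1.46 = 25.96$$

Then
$$QD = \frac{Q_3 - Q_1}{2} = \frac{37.45 - 25.96}{2} = \frac{11.49}{2} = 5.75$$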
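The grouped quartile deviation combines two interpolations. This sketch repeats our own `grouped_fractile` helper (whole-number class limits assumed) so that it runs on its own, then halves the difference Q3 − Q1.

```python
def grouped_fractile(classes, k, parts, i, unit=1):
    """k-th fractile of grouped data: L + ((k*n/parts - F) / f) * i.
    classes: (lower limit, f) pairs ordered from the lowest class up."""
    n = sum(f for _, f in classes)
    target = k * n / parts
    F = 0                             # cumulative frequency below the current class
    for lower, f in classes:
        if F + f >= target:
            L = lower - unit / 2      # lower class boundary
            return L + (target - F) / f * i
        F += f
    raise ValueError("target position exceeds the total frequency")


history = [(15, 4), (20, 5), (25, 12), (30, 10), (35, 11), (40, 6), (45, 2)]
q1 = grouped_fractile(history, 1, 4, i=5)
q3 = grouped_fractile(history, 3, 4, i=5)
qd = (q3 - q1) / 2

print(round(q1, 2), round(q3, 2), round(qd, 2))  # 25.96 37.45 5.75
```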
Solve for the quartile deviation of the following data.
1. Ungrouped data:
8, 9, 10, 12, 17, 18, 18, 19, 20, 21
2. Grouped data:
X f M fM fM²
1.8 – 2.1 5
2.2 – 2.5 4
2.6 – 2.9 5
3.0 – 3.3 14
3.4 – 3.7 15
3.8 – 4.1 9
4.2 – 4.5 5
4.6 – 4.9 3
i = 0.4 n=60
Measures of variability correspond to measures of central tendency as follows:
1. Mean and Standard Deviation are used to determine the normality of the curve.
2. Median and Quartile Deviation are used to determine the skewness of the curve.
3. Mode and Range are seldom used because both measures are unstable, undependable, and unreliable.
Homogeneous – a small difference in SD when two or more data sets with similar means are compared
Heterogeneous – a large difference in SD when two or more data sets with similar means are compared
Probability is a branch of mathematics that deals with chances. It is commonly expressed in the form:
$$\text{Probability} = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}$$
The Normal Probability Curve, or Bell-Shaped Curve, was introduced by de Moivre in the 18th century. It was developed at the request of gamblers who wanted to compute the most probable outcomes of one of their games. The curve is based on probability: the area of the whole region under it is 1, and the other areas are expressed as decimals or percentages.

Characteristics of the Normal Curve


• Symmetrical – it can be folded into 2 equal parts
• Asymptotic – the tails extend in both directions, coming closer and closer to the horizontal axis without ever touching it
Determine the proportion of the area under the normal probability curve. Use the table of Ordinates and Areas of the Normal Curve.
1. between the mean and z = 1.35

2. to the left of z = 2.11

3. to the right of z = 1.46

4. to the left of z = -1.69

5. to the right of z = 2.77

Determine the value of z given the following proportion of the area under the normal curve.
When the exact area is not found in the table, take the z value whose tabulated area is the next higher one.
Example:
1. between the mean and z is 0.30
2. to the left of z is 0.70

3. to the right of z is 0.85

4. between ± z is 0.30

5. to the left of z is 0.30

Application of Normal Curve:


Standard Score:
x− x̄
z=
SD
where:
x = score
x̄ = mean
SD = standard deviation
Example:
The mean of an IQ test is 100 and the SD is 15.
1. Determine the proportion of pupils whose IQ is above 85.
$$ z = \frac{x - \bar{x}}{SD} = \frac{85 - 100}{15} = -1 $$
Area of Proportion = 0.8413
2. Between 85 and 95
$$ z_1 = \frac{85 - 100}{15} = -1, \qquad z_2 = \frac{95 - 100}{15} = -0.33 $$
Area of Proportion = 0.2120
3. Below 97
$$ z = \frac{97 - 100}{15} = -0.2 $$
Area of Proportion = 0.4207
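The standard-score procedure above can be sketched in a few lines, assuming Python 3.8+ and `statistics.NormalDist`; `z_score`, `proportion`, and `expected_count` are hypothetical helper names. Because it uses the unrounded z = −0.333… for case 2, it gives about 0.2108 rather than the table-based 0.2120 (which uses z rounded to −0.33). Multiplying a proportion by the group size gives the expected number of cases, which is what the exercises ask for.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standard score: z = (x - mean) / SD."""
    return (x - mean) / sd


def proportion(mean, sd, low=None, high=None):
    """Proportion of a normal distribution between low and high (None = no bound)."""
    dist = NormalDist(mean, sd)
    lower = dist.cdf(low) if low is not None else 0.0
    upper = dist.cdf(high) if high is not None else 1.0
    return upper - lower


def expected_count(n, mean, sd, low=None, high=None):
    """Expected number of cases = total count x proportion."""
    return n * proportion(mean, sd, low, high)


# IQ example: mean 100, SD 15.
print(z_score(85, 100, 15))                             # z for an IQ of 85
print(round(proportion(100, 15, low=85), 4))            # 1. above 85
print(round(proportion(100, 15, low=85, high=95), 4))   # 2. between 85 and 95
print(round(proportion(100, 15, high=97), 4))           # 3. below 97
```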


In psychology, the normal curve is used in many different ways, such as:

Answer the following exercises:


1. The test scores of 80 students are normally distributed with a mean of 70 and a standard deviation of 6. Find the number of students who are expected to get scores:
a. below 55
b. below 79
c. above 67
d. above 85
e. between 61 and 82
2. The life spans of 150 bulbs are found to be normally distributed with a mean of 700 hours and a standard deviation of 80 hours. Find the number of bulbs that are expected to last:
a. more than 860 hours
b. less than 580 hours
c. more than 514 hours
d. less than 844 hours
e. between 644 and 852 hours
INTERPRETATION OF DATA:
Inferential statistics deals with the analysis and interpretation of data. It consists of different statistical tools/tests used in the analysis of interval, ratio, nominal, and ordinal data. These tests are used to make inferences about, or draw conclusions on, larger groups or populations on the basis of information obtained from one or more samples. How accurately these statistics can be used depends on the quality of the samples, so the sampling technique/procedure is also of great importance in using these statistical tests.
Kinds of Statistical Tests
Statistical tests can be grouped into two kinds: parametric and nonparametric tests.
Parametric tests – to use a parametric test, certain conditions must be met: the data must be normally distributed, and the level of measurement must be either interval or ratio. The data are said to be normal when the skewness equals zero and the percentile coefficient of kurtosis is 0.263 (equivalently, a moment coefficient of kurtosis of 3).
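The normal-curve benchmark for the percentile coefficient of kurtosis can be checked numerically. This is a minimal sketch, assuming Python 3.8+ and `statistics.NormalDist`, computing KU = Q / (P90 − P10), where Q = (Q3 − Q1)/2 is the quartile deviation; the function name is illustrative. For any normal curve the result is about 0.263, whatever the mean and SD.

```python
from statistics import NormalDist


def percentile_kurtosis(dist):
    """Percentile coefficient of kurtosis: KU = Q / (P90 - P10), Q = (Q3 - Q1) / 2."""
    q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)
    p10, p90 = dist.inv_cdf(0.10), dist.inv_cdf(0.90)
    quartile_deviation = (q3 - q1) / 2
    return quartile_deviation / (p90 - p10)


print(round(percentile_kurtosis(NormalDist()), 3))       # standard normal curve
print(round(percentile_kurtosis(NormalDist(70, 6)), 3))  # any other normal curve
```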
Interval – interval data provide numbers that reflect differences among items. On an interval scale, the units of measurement are equal.
Example:
Scores on intelligence tests, and time as reckoned from the calendar. These have no true zero value.
Ratio – the ratio scale is the highest type of scale. The basic difference between the interval and ratio scales is that the interval scale has no true zero value, while the ratio scale has an absolute zero value.

Nonparametric tests – these do not require normality of the distribution. For these tests, the levels of measurement are nominal and ordinal.
Nominal data – data such as female and male, yes and no responses, political affiliations like LP, Lakas, and LDP, religious groupings such as Christian and non-Christian, and other organizations.
Ordinal data – data such as Strongly agree, Agree, No opinion, Disagree, and Strongly disagree, and other data that employ rankings.
The choice of statistical test depends on three things:
1. Type of question
• Difference – test of difference
• Relationship – test of relationship
• Expected and observed data – goodness-of-fit
• Average – mean
• Variation – variability
2. Shape of the distribution (bell-shaped graph)
• normally distributed
• not normally distributed
3. Level and scale of measurement
• Nominal – proportions or percentages
• Ordinal – median and rank-order correlation tests
• Interval/Ratio – mean, SD, t-test, Pearson r, F-test, etc.
Hypothesis Testing:
1. Problem: Determine whether you are looking for a relationship, a difference, goodness of fit, etc.
2. Hypotheses:
• Null Hypothesis (H0) – states non-existence:
there is no difference…
there is no correlation…
there is no effect…
• Alternate Hypothesis (H1) – states existence:
there is a difference…
there is a correlation…
there is an effect…
3. Level of Significance (α) – commonly 0.05 or 0.01.
• Significant – reject H0
• Not significant – accept H0
If applicable, the degrees of freedom (df):
– the number of values that are free to vary once a statistic such as the mean is fixed. For example, in 5, 3, 4, 6, 5, ___ with average = 5, the six values must total 30, so the blank must be 7; only 5 of the values are free to vary, and df = n − 1 = 5.
– each test has its own formula for df.
4. Decision Rule: If the computed value (absolute value) is greater than the critical value (positive value), reject the null hypothesis.
5. Apply the appropriate test.
6. Decision: Accept or reject the null hypothesis. We never accept the null and the alternate hypotheses at the same time.
7. Interpretation: Restate the accepted hypothesis (include the word “significant” in your statement).
Directional and Non-Directional Tests:
Directional Test (One-Tailed Test)
The alternate hypothesis states the direction of the difference, so the whole region of rejection lies in one tail of the curve. If the computed value (absolute value) is greater than the critical value (positive value, found in the table for the particular test) and lies in the direction stated by H1, reject the null hypothesis. Otherwise, accept the null hypothesis.
$$ H_0: P_1 = P_2 \qquad H_1: P_1 > P_2 \;\text{ (or } H_1: P_1 < P_2\text{)} $$
Non-Directional Test (Two-Tailed Test)
The alternate hypothesis states only that a difference exists, without specifying its direction, so the region of rejection is split between both tails.
$$ H_0: P_1 = P_2 \qquad H_1: P_1 \neq P_2 $$
Region of Acceptance/Rejection of the Null Hypothesis:
[Figure: regions of acceptance and rejection of H0 at α = 0.01 and α = 0.05]
• Significant – reject H0 (there is a significant…)
• Not significant – accept H0 (there is no significant…)

Type I and Type II Errors in Decision Making
1. Type I Error (α error) – rejecting the null hypothesis when it should be accepted. Its probability equals the level of significance: at the 0.05 level there are 5 chances in 100 of making this error. Using the 0.01 level reduces the risk to 1 in 100.
2. Type II Error (β error) – accepting the null hypothesis when it should be rejected. This error becomes more likely when a stricter level such as 0.01 is used; using the 0.05 level reduces the risk, at the cost of a higher risk of a Type I error.

Pearson r
The Pearson Product Moment Coefficient of Correlation, r, is an index of the relationship between two variables of the interval/ratio type. The independent variable is represented by x and the dependent variable by y; it can be said that x influences y, or y depends on x.
The formula is:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$
where:
r = the Pearson Product Moment Coefficient of Correlation
n = sample size
Σxy = the sum of the products of x and y
ΣxΣy = the product of the sum of x and the sum of y
Σx² = the sum of the squares of x
Σy² = the sum of the squares of y
Interpretation (for positive or negative values of r):
±0.80 and above – high correlation
±0.60 to ±0.79 – moderately high correlation
±0.40 to ±0.59 – average/moderate correlation
±0.30 to ±0.39 – low correlation
±0.29 and below – negligible correlation
Below are the midterm (x) and final (y) grades.
x: 75 70 65 90 85 85 80 70 65 90
y: 80 75 65 95 90 85 90 75 70 90
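The raw-score formula can be applied directly to the midterm and final grades. A minimal sketch in Python (standard library only); `pearson_r` is a hypothetical helper name that follows the formula term by term. On these grades it gives r ≈ 0.95, which the scale above classifies as a high correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment r using the raw-score formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


midterm = [75, 70, 65, 90, 85, 85, 80, 70, 65, 90]  # x
final = [80, 75, 65, 95, 90, 85, 90, 75, 70, 90]    # y
print(round(pearson_r(midterm, final), 3))
```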
Spearman Rank-Order Correlation Rho
Rank the following data from the highest to the lowest value and number each data point (rank 1 is the highest value).
4, 6, 7, 5, 2
How about this one:
4, 4, 7, 8, 8, 8, 3, 2, 1, 1
In ranking data, number the data from the highest value to the lowest value. In case of a tie, take the average of the ranks the tied values occupy, and continue the ranking with the next number.
Example:
Set 1:
Data: 7 6 5 4 2
Rank: 1 2 3 4 5
Set 2:
Data: 8 8 8 7 4 4 3 2 1 1
Rank: 2 2 2 4 5.5 5.5 7 8 9.5 9.5
We have three 8s, occupying ranks 1, 2, and 3; their average is 2. We continue with rank 4 for the 7. The two 4s occupy ranks 5 and 6, so each gets the average, 5.5; then come ranks 7 and 8. The last two data are both 1s, occupying ranks 9 and 10, so each gets the average, 9.5.
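The ranking rule above (highest value gets rank 1, ties share the average of the ranks they occupy) can be sketched as follows, assuming Python 3; `average_ranks` is a hypothetical helper name. It returns the ranks in the same order as the input values.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; ties share the mean of their ranks."""
    # Indices of the values, ordered from highest to lowest value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over every position holding the same (tied) value.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end hold ranks start+1 .. end+1; they share the average.
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


print(average_ranks([4, 6, 7, 5, 2]))
print(average_ranks([8, 8, 8, 7, 4, 4, 3, 2, 1, 1]))
```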

Rank the following numbers:


Set 1:
130, 135, 129, 115, 119, 120, 123
Set 2:
20, 35, 20, 18, 35, 18, 18, 35, 20, 24, 19
The Spearman Rank Order Coefficient of Correlation (ρ):
This test of correlation is nonparametric and does not require the stringent assumptions of Pearson r. It is used with ordinal data and with smaller samples, ideally 25 to 30 or fewer.
$$ \rho = 1 - \frac{6\sum D^2}{n\left(n^2 - 1\right)} $$
where:
ρ = Spearman Rank Order Coefficient of Correlation
ΣD² = sum of the squares of the differences between the ranks of x and y
n = sample size
6 = constant
Example:
1. The following are the numbers of hours that 12 students studied for a midterm examination and the grades they obtained in English. Calculate ρ at the 0.05 level of significance.
Hours Studied (x)   Midterm Grade (y)
5                   50
6                   60
11                  79
20                  90
19                  85
20                  92
10                  80
12                  82
8                   65
15                  85
18                  94
10                  70

2. The following are the rankings given by two judges to the works of 8 artists. Use ρ at the 0.05 level to test the null hypothesis that the two judges differ most in their opinions about these artists.
Judge A Judge B
5 8
8 5
4 6
2 4
1 2
7 1
3 3
6 7
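Both examples can be worked with a short sketch, assuming Python 3; `average_ranks` and `rho_from_ranks` are hypothetical helper names, and the ranking helper is repeated so the block runs on its own. In Example 1 the raw hours and grades are ranked first (ties averaged); in Example 2 the judges' scores are already ranks and are used directly. The sketch gives ΣD² = 17.5 and ρ ≈ 0.94 for Example 1, and ΣD² = 64 and ρ ≈ 0.24 for Example 2.

```python
def average_ranks(values):
    """Highest value gets rank 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rho_from_ranks(rank_x, rank_y):
    """Spearman rho = 1 - 6*sum(D^2) / (n(n^2 - 1)), with D = rank_x - rank_y."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


# Example 1: hours studied (x) and midterm grades (y).
hours = [5, 6, 11, 20, 19, 20, 10, 12, 8, 15, 18, 10]
grades = [50, 60, 79, 90, 85, 92, 80, 82, 65, 85, 94, 70]
rho, sum_d2 = rho_from_ranks(average_ranks(hours), average_ranks(grades))
print("Example 1: sum D^2 =", sum_d2, " rho =", round(rho, 3))

# Example 2: the judges' scores are already ranks.
judge_a = [5, 8, 4, 2, 1, 7, 3, 6]
judge_b = [8, 5, 6, 4, 2, 1, 3, 7]
rho2, sum_d2_judges = rho_from_ranks(judge_a, judge_b)
print("Example 2: sum D^2 =", sum_d2_judges, " rho =", round(rho2, 3))
```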
z-test
The z-test is another parametric test, which requires the normality of the distribution. It uses the two population parameters μ and σ. It is used to compare two means: the sample mean and the hypothesized population mean.
Critical (tabular) values of z at the .01 and .05 levels of significance:
Test          .01        .05
one-tailed    ±2.33      ±1.645
two-tailed    ±2.575     ±1.96
The One-Sample Mean Test
The one-sample mean test is used when the sample mean is compared with the hypothesized population mean, with interval/ratio data. The formula is:
$$ z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} $$
where:
x̄ = sample mean
μ = hypothesized value of the population mean
σ = population standard deviation
n = sample size
Example:
1. ABC Company claims that the average lifetime of a certain tire is at least 28 000 km. To check the claim, a taxi company puts 40 of these tires on its taxis and gets a mean lifetime of 25 560 km, with a standard deviation of 1 350 km. Is the claim true? Use the z-test at the .05 level.
2. A school principal in a laboratory school claimed that the reading comprehension test scores of grade six pupils should have an average of 72.3, with a standard deviation of 7.8. Fifty randomly selected grade six pupils have an average of 76.7. Use the z-test to test the null hypothesis that μ = 72.3 against the alternative hypothesis that μ ≠ 72.3 at the .05 level of significance.
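Both examples can be checked with a minimal sketch, assuming Python 3 (standard library only) and treating the given standard deviation as σ, as the examples do. `z_one_sample` and `reject_h0` are hypothetical helper names; the critical values 1.645 (one-tailed) and 1.96 (two-tailed) come from the table above. Example 1 gives z ≈ −11.43 and Example 2 gives z ≈ 3.99, so H0 is rejected in both.

```python
from math import sqrt


def z_one_sample(sample_mean, mu, sigma, n):
    """z = (x-bar - mu) * sqrt(n) / sigma."""
    return (sample_mean - mu) * sqrt(n) / sigma


def reject_h0(z, critical):
    """Decision rule: reject H0 when |z| exceeds the critical value.
    For a one-tailed test, z must also fall in the tail named by H1."""
    return abs(z) > critical


# Example 1: claim "at least 28 000 km", so H1: mu < 28 000 (one-tailed, .05 -> 1.645).
z1 = z_one_sample(25560, 28000, 1350, 40)
# Example 2: H1: mu != 72.3 (two-tailed, .05 -> 1.96).
z2 = z_one_sample(76.7, 72.3, 7.8, 50)

print("Example 1: z =", round(z1, 2), "reject H0:", reject_h0(z1, 1.645) and z1 < 0)
print("Example 2: z =", round(z2, 2), "reject H0:", reject_h0(z2, 1.96))
```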
z-test of Independent Proportions and Dependent Proportions
These are used when the data are nominal.
z-test of Independent Proportions
This is used to determine whether there is a significant difference between two different/independent groups in situations that call for two types of responses, with nominal data.
Table:
              n1    n2
response 1    A     B
response 2    C     D
$$ z = \frac{P_1 - P_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
where:
$$ P_1 = \frac{A}{n_1}, \qquad P_2 = \frac{B}{n_2} $$
p = population proportion estimate
$$ p = \frac{A + B}{n_1 + n_2}, \qquad q = 1 - p $$
Example:
Fifty teachers and 50 students were asked whether they are in favor of or against the RH bill.
Distribution of students and teachers who are in favor of and against the RH bill:
            Students    Teachers
In favor    A = 25      B = 30
Against     C = 25      D = 20
Total       50          50
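The RH bill example can be computed with a minimal sketch, assuming Python 3 (standard library only); `z_independent_proportions` is a hypothetical helper name that follows the formula above with the pooled estimate p. It gives z ≈ −1.005; since |z| is below 1.96 (two-tailed, .05), the difference between students and teachers is not significant.

```python
from math import sqrt


def z_independent_proportions(a, b, n1, n2):
    """z-test of independent proportions with a pooled estimate p."""
    p1, p2 = a / n1, b / n2     # proportion giving response 1 in each group
    p = (a + b) / (n1 + n2)     # pooled population proportion estimate
    q = 1 - p
    return (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))


# RH bill example: A = 25 of 50 students and B = 30 of 50 teachers are in favor.
z = z_independent_proportions(25, 30, 50, 50)
print(round(z, 3))
```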

z-test of Dependent Proportions
This is used to determine whether there is a significant difference between pairs of observations from a single group, or whether the responses of the members of a group in two situations are correlated, with nominal data.
Table:
                         Y
                  response 2   response 1
X   response 1        A            B
    response 2        C            D
Formula:
$$ z = \frac{P_1 - P_2}{\sqrt{\dfrac{a + d}{n}}} $$
where:
$$ P_1 = \frac{A + B}{n}, \qquad P_2 = \frac{B + D}{n}, \qquad a = \frac{A}{n}, \qquad d = \frac{D}{n} $$
Example:
Fifty teachers were asked whether they are in favor of the RH bill. Thirty answered in favor and 20 answered against. Then all the teachers were invited to attend a hearing on the RH bill. The results are written below:
                        After
Before            Against     In favor
In favor (30)     A = 22      B = 8
Against (20)      C = 16      D = 4
n = 50
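A minimal sketch of this test, assuming Python 3 and the formula z = (P1 − P2)/√((a + d)/n) as laid out above; `z_dependent_proportions` is a hypothetical helper name. Algebraically this reduces to (A − D)/√(A + D), a McNemar-type statistic built only from the two cells whose response changed. For the RH bill data it gives z ≈ 3.53, which exceeds 1.96 (two-tailed, .05).

```python
from math import sqrt


def z_dependent_proportions(A, B, C, D):
    """z-test of dependent proportions for the 2x2 layout above.

    A and D are the cells whose response changed between the two occasions.
    """
    n = A + B + C + D
    p1 = (A + B) / n          # proportion giving response 1 on X (before)
    p2 = (B + D) / n          # proportion giving response 1 on Y (after)
    a, d = A / n, D / n
    return (p1 - p2) / sqrt((a + d) / n)


# RH bill example: A = 22, B = 8, C = 16, D = 4 (n = 50).
z = z_dependent_proportions(22, 8, 16, 4)
print(round(z, 2))
```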
t-test
The t-test is used to compare two means. Ideally, the t-test is used when there are fewer than 30 cases, but some researchers use it even with more than 30. It parallels the z-test, except that it is used for interval/ratio data.
z-test                              t-test
Independent Proportions             Independent Means
  2 groups                            2 groups
  nominal (frequencies)               interval/ratio (scores, grades, weights, heights)

Dependent Proportions               Dependent Means
  1 group                             1 group
  before and after responses          pretest and posttest (e.g., weight before and weight after)
  nominal                             interval/ratio
t-test of Independent Means/Uncorrelated Means
Used to determine whether there is a significant difference between two different/independent groups in terms of their means.
• Experimental Variable (Independent Variable) – can be manipulated; commonly called the treatment variable.
• Dependent Variable – the result or outcome that is measured, e.g., age, height, IQ.
Formula:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}} $$
where:
t = t-ratio
x̄ = mean
SD = standard deviation
n = size of the sample
Example:
1. An admission test was administered to incoming freshmen in the College of Nursing and the College of Veterinary Medicine, with 15 randomly selected students from each course. The mean scores of the two samples were 90 and 85, and the variances of the test scores were 40 and 35, respectively. Is there a significant difference between the two groups? Use the 0.01 level of significance.
2. Is there a significant difference between the average heights of males born in two different countries? Random samples yielded the following results:
n1 = 10   x̄1 = 63.8   SD1 = 2.58
n2 = 15   x̄2 = 63.1   SD2 = 2.62
Test at the 0.05 level of significance.
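The t-ratio for both examples can be sketched as follows, assuming Python 3 (standard library only); `t_independent` is a hypothetical helper name implementing the unpooled formula as written above. Degrees of freedom and tabular t-values are not covered by this section, so the sketch computes only t: about 2.24 for Example 1 and about 0.66 for Example 2.

```python
from math import sqrt


def t_independent(mean1, sd1, n1, mean2, sd2, n2):
    """t = (mean1 - mean2) / sqrt(SD1^2/n1 + SD2^2/n2)."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Example 1 reports variances (40 and 35), so each SD is the square root of its variance.
t1 = t_independent(90, sqrt(40), 15, 85, sqrt(35), 15)
# Example 2 reports the SDs directly.
t2 = t_independent(63.8, 2.58, 10, 63.1, 2.62, 15)
print(round(t1, 2), round(t2, 2))
```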

t-test of Dependent Means/Correlated Means
The t-test for dependent/correlated means is used to compare the means before and after a treatment. It is also used to compare the means of a pretest and a posttest.
The formula is:
$$ t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}} $$
where:
t = t-ratio
D = difference between the pretest and posttest scores
n = sample size (number of pairs)
Example:
An experimental study was conducted on the effect of programmed materials in English on the performance of 20 selected college students. Before the program was implemented, a pretest was administered, and after 5 months the same instrument was used to get the posttest result. The following is the result of the experiment.
Pretest (X1)   Posttest (X2)   D = X1 − X2   D²
20 25 -5 25
30 35 -5 25
10 25 -15 225
15 25 -10 100
20 20 0 0
10 20 -10 100
18 22 -4 16
14 20 -6 36
15 20 -5 25
20 15 5 25
18 30 -12 144
15 10 5 25
15 16 -1 1
20 25 -5 25
18 10 8 64
40 45 -5 25
10 15 -5 25
10 10 0 0
12 18 -6 36
20 25 -5 25
ΣD = −81   ΣD² = 947
$$ \bar{D} = \frac{\sum D}{n} = \frac{-81}{20} = -4.05 $$
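Substituting the sums into the formula can be checked with a minimal sketch, assuming Python 3 (standard library only); `t_dependent` is a hypothetical helper name, and df = n − 1 is the usual choice for paired data. It reproduces ΣD = −81 and ΣD² = 947 and gives t ≈ −3.17 with df = 19.

```python
from math import sqrt


def t_dependent(pretest, posttest):
    """t = sum(D) / sqrt((n*sum(D^2) - (sum D)^2) / (n - 1)), with D = pretest - posttest."""
    d = [pre - post for pre, post in zip(pretest, posttest)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    t = sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, sum_d, sum_d2, n - 1


pretest = [20, 30, 10, 15, 20, 10, 18, 14, 15, 20, 18, 15, 15, 20, 18, 40, 10, 10, 12, 20]
posttest = [25, 35, 25, 25, 20, 20, 22, 20, 20, 15, 30, 10, 16, 25, 10, 45, 15, 10, 18, 25]
t, sum_d, sum_d2, df = t_dependent(pretest, posttest)
print("sum D =", sum_d, " sum D^2 =", sum_d2, " t =", round(t, 2), " df =", df)
```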

Summary of t-test

Chi-Square Test
The chi-square test is considered unique because of its 3 functions, which are as follows:
The test of goodness-of-fit
The test of independence
The test of goodness-of-fit
This is a test of the difference between the observed frequencies and the expected frequencies. The formula for the chi-square test is:
$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$
where:
χ² = the chi-square test
O = the observed frequencies
E = the expected frequencies
Example:
Mendel's theory of the crossing of peas gives the ratio 9:3:3:1, meaning 9 parts smooth yellow, 3 parts wrinkled yellow, 3 parts smooth green, and 1 part wrinkled green. A researcher conducted an experiment, and out of 560 peas, 310 were smooth yellow, 100 were wrinkled yellow, 110 were smooth green, and 40 were wrinkled green. Is there a significant difference between the observed and the expected frequencies? Use the χ² test at the .05 level of significance.
Solving by the stepwise method:
I. Problem:
Is there a significant difference between the observed (actual experiment) and the expected (theory) frequencies?
II. Hypotheses:
H0: There is no significant difference between the observed and the expected frequencies.
H1: There is significant difference between the observed and the expected frequencies.
III. Level of Significance:
α = .05
df = h − 1 = 4 − 1 = 3 (h = number of categories)
χ².05 = 7.815
IV. Statistics:
χ² – the test of goodness-of-fit
Computation: Add the terms of the ratio 9:3:3:1: 9 + 3 + 3 + 1 = 16.
Attribute          Ratio   Observed (actual result)   Expected (theory)
Smooth yellow        9          310                        315
Wrinkled yellow      3          100                        105
Smooth green         3          110                        105
Wrinkled green       1           40                         35
Total               16          560                        560
Then divide 560 by 16: 560 ÷ 16 = 35.
For the expected frequencies, multiply:
35 × 9 = 315
35 × 3 = 105
35 × 3 = 105
35 × 1 = 35
$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(310 - 315)^2}{315} + \frac{(100 - 105)^2}{105} + \frac{(110 - 105)^2}{105} + \frac{(40 - 35)^2}{35} $$
= .079 + .238 + .238 + .714
χ² = 1.269
V. Decision:
If the computed chi-square value is greater than the tabular chi-square value, reject H0.
VI. Conclusion:
The computed chi-square value of 1.269 is less than the tabular value of 7.815 at the .05 level of significance with 3 degrees of freedom, so the null hypothesis is accepted: there is no significant difference between the observed and the expected frequencies.
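The goodness-of-fit computation can be sketched as follows, assuming Python 3; `expected_from_ratio` and `chi_square` are hypothetical helper names. With unrounded terms the statistic is about 1.270 (the hand computation sums terms already rounded to three decimals), and either way it stays well below the tabular 7.815.

```python
def expected_from_ratio(total, ratio):
    """Split `total` in proportion to `ratio` (e.g. 9:3:3:1 -> 16 parts)."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Smooth yellow, wrinkled yellow, smooth green, wrinkled green.
observed = [310, 100, 110, 40]
expected = expected_from_ratio(sum(observed), [9, 3, 3, 1])
chi2 = chi_square(observed, expected)
df = len(observed) - 1
print("expected:", expected)
print("chi-square =", round(chi2, 3), "df =", df, "reject H0:", chi2 > 7.815)
```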
The Chi-Square Test of Independence
The one-sample chi-square test of independence is used to determine whether there is a significant relationship between variables of the nominal type. The sample used in this test consists of members randomly drawn from the population.
$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$
where:
χ² = the chi-square test
O = the observed frequencies
E = the expected frequencies
Σ = summation
Example:
1. Ninety individuals, male and female, were given a test in psychomotor skills, and their scores were classified as high or low. Use the χ² test of independence at the .05 level of significance. The table is shown as follows:
           Scores
Sex        High (O)   Low (O)   Total
Male         18         28        46
Female       32         12        44
Total        50         40        90
Solving by the stepwise method:
I. Problem:
Is there a significant relationship between sex and scores in psychomotor skill?
II. Hypotheses:
H0: There is no significant relationship between sex and scores in psychomotor skill.
H1: There is a significant relationship between sex and scores in psychomotor skill.
III. Level of Significance:
α = .05
df = (c − 1)(r − 1) = (2 − 1)(2 − 1) = (1)(1) = 1
χ².05 = 3.841 (tabular value)
IV. Statistics:
χ² test of independence
           Score
Sex        High: O (E)     Low: O (E)      Total
Male       18 (25.56)      28 (20.44)        46
Female     32 (24.44)      12 (19.56)        44
Total      50              40                90
For the expected values: multiply the column total by the row total and divide the product by the grand total.
(50 × 46)/90 = 25.56
(50 × 44)/90 = 24.44
(40 × 46)/90 = 20.44
(40 × 44)/90 = 19.56
$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(18 - 25.56)^2}{25.56} + \frac{(32 - 24.44)^2}{24.44} + \frac{(28 - 20.44)^2}{20.44} + \frac{(12 - 19.56)^2}{19.56} $$
= 2.236 + 2.338 + 2.796 + 2.922
χ² = 10.292
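V. Decision:
If the computed χ² value is greater than the tabular χ² value, reject H0.
VI. Conclusion:
The computed χ² value of 10.292 is greater than the tabular value of 3.841 at the .05 level of significance with one degree of freedom, so the null hypothesis is rejected: there is a significant relationship between sex and scores in psychomotor skill.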
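The expected-frequency rule (row total × column total ÷ grand total) and the statistic can be sketched as follows, assuming Python 3; `chi_square_independence` is a hypothetical helper name. With unrounded expected frequencies the statistic is about 10.28, slightly below the 10.292 obtained from rounded values; the decision to reject H0 is the same.

```python
def chi_square_independence(table):
    """Chi-square test of independence for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency = (row total x column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Rows: male, female; columns: high, low.
chi2, df, expected = chi_square_independence([[18, 28], [32, 12]])
print([[round(e, 2) for e in row] for row in expected])
print("chi-square =", round(chi2, 3), "df =", df, "reject H0:", chi2 > 3.841)
```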
2. One hundred teachers were asked whether they are in favor of the implementation of the K-12 program. The results are written below:
Response               O
Strongly Agreed        30
Agreed                 35
Disagreed              15
Strongly Disagreed     15
Expressed No Opinion    5
Total                 100
F-test
The F-test is the analysis of variance (ANOVA). It is used to compare the means of two or more independent groups. One-way ANOVA is used when only one variable is involved. Two-way ANOVA is used when two variables are involved: the column and the row variables. The researcher is interested in whether there are significant differences between and among columns and rows. It is also used to examine the interaction effect between the variables being analyzed.
Like the t-test, the F-test is a parametric test, so certain conditions must be met: the data to be analyzed must be normally distributed and expressed as interval or ratio data. When more than two groups are compared, this test is more efficient than running several separate tests of difference.

The F-Test (One-Way ANOVA)


A sari-sari store sells 4 brands of shampoo. The owner wants to know whether there is a significant difference in the average sales over one week. The following data are recorded.
Brand
A B C D
7 9 2 4
3 8 3 5
5 8 4 7
6 7 5 8
9 6 6 3
4 9 4 4
3 10 2 5
Perform a one-way ANOVA (ANOVA I) and test, at the 0.05 level of significance, the hypothesis that the average sales of the four brands of shampoo are equal.

Solving by the Stepwise Method:

I. Problem: Is there a significant difference in the average sales of the four brands of
shampoo?
II. Hypotheses:
H0: There is no significant difference in the average sales of the four brands of shampoo.
H1: There is a significant difference in the average sales of the four brands of shampoo.
III. Level of Significance:
α = 0.05
df(between) = K − 1 = 4 − 1 = 3, where K = number of groups
df(within) = N − K = 28 − 4 = 24
Critical value: F(3, 24) = 3.01
IV. Decision Rule: If the computed value is greater than or equal to the critical value, reject the null hypothesis.
V. Statistics:
F-test (one-way analysis of variance)
Computation:
        A               B               C               D
X1     X1²      X2     X2²      X3     X3²      X4     X4²
7      49       9      81       2      4        4      16
3      9        8      64       3      9        5      25
5      25       8      64       4      16       7      49
6      36       7      49       5      25       8      64
9      81       6      36       6      36       3      9
4      16       9      81       4      16       4      16
3      9        10     100      2      4        5      25
ΣX1 = 37        ΣX2 = 57        ΣX3 = 26        ΣX4 = 36
ΣX1² = 225      ΣX2² = 475      ΣX3² = 110      ΣX4² = 204
n1 = 7          n2 = 7          n3 = 7          n4 = 7
X̄1 = 5.29       X̄2 = 8.14       X̄3 = 3.71       X̄4 = 5.14

ΣX = 156        ΣX² = 1014
V.1. Compute the total sum of squares.
$$ SS_t = \sum X^2 - \frac{\left(\sum X\right)^2}{N} = 1014 - \frac{(156)^2}{28} = 144.86 $$
V.2. Compute the sum of squares between groups.
$$ SS_b = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - \frac{(\sum X)^2}{N} $$
$$ SS_b = \frac{(37)^2}{7} + \frac{(57)^2}{7} + \frac{(26)^2}{7} + \frac{(36)^2}{7} - \frac{(156)^2}{28} = 72.29 $$
V.3. Compute the sum of squares within groups.
$$ SS_w = SS_t - SS_b = 144.86 - 72.29 = 72.57 $$
V.4. Compute the mean squares (variance estimates).
- for between groups:
$$ MS_b = \frac{SS_b}{K - 1} = \frac{72.29}{4 - 1} = \frac{72.29}{3} = 24.10 $$
- for within groups:
$$ MS_w = \frac{SS_w}{N - K} = \frac{72.57}{28 - 4} = \frac{72.57}{24} = 3.024 $$
V.5. Compute the F-ratio.
$$ F = \frac{MS_b}{MS_w} = \frac{24.10}{3.024} = 7.97 $$
V.6. Summary
Source of Variance   Sum of Squares   df   Mean Squares   F-Ratio
Between                   72.29         3       24.10         7.97
Within                    72.57        24       3.024
Total                    144.86        27
VI. Decision: Reject null hypothesis
VII. Interpretation:
There is a significant difference in the average sales of the four brands of shampoo.
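The one-way ANOVA steps V.1 to V.5 can be reproduced from the raw brand data with a minimal sketch, assuming Python 3 (standard library only); `one_way_anova` is a hypothetical helper name using the same computational formulas. On these data it gives SSt ≈ 144.86, SSb ≈ 72.29, SSw ≈ 72.57, and F ≈ 7.97 with df = (3, 24); F exceeds the tabular F(3, 24) = 3.01 at the 0.05 level, so H0 is rejected.

```python
def one_way_anova(groups):
    """One-way ANOVA using the computational (raw-score) formulas."""
    values = [x for g in groups for x in g]
    N, k = len(values), len(groups)
    correction = sum(values) ** 2 / N  # (sum of X)^2 / N
    ss_t = sum(x * x for x in values) - correction
    ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_w = ss_t - ss_b
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (N - k)
    return {"SSt": ss_t, "SSb": ss_b, "SSw": ss_w,
            "MSb": ms_b, "MSw": ms_w, "F": ms_b / ms_w, "df": (k - 1, N - k)}


sales = {
    "A": [7, 3, 5, 6, 9, 4, 3],
    "B": [9, 8, 8, 7, 6, 9, 10],
    "C": [2, 3, 4, 5, 6, 4, 2],
    "D": [4, 5, 7, 8, 3, 4, 5],
}
result = one_way_anova(list(sales.values()))
for key, value in result.items():
    print(key, value if key == "df" else round(value, 2))
```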

The F-Test (Two-Way ANOVA with Interaction Effect)


Forty-five language students were randomly assigned to one of three instructors and to one of three methods of teaching. Achievement was measured by a test administered at the end of the term. Use two-way ANOVA with interaction effect at the 0.05 level of significance to test the following hypotheses:

1.
H0: There is no significant difference in the performance of the three groups of students under three different instructors.
H1: There is a significant difference in the performance of the three groups of students under three different instructors.

2.
H0: There is no significant difference in the performance of the three groups of students under three different methods of teaching.
H1: There is a significant difference in the performance of the three groups of students under three different methods of teaching.

3.
H0: Interaction effects are not present.
H1: Interaction effects are present.
                         Teacher Factor
                          A      B      C
Method of Teaching 1     40     50     40
                         41     50     41
                         40     48     40
                         39     48     38
                         38     45     38
Total
Method of Teaching 2     40     45     50
                         41     42     46
                         39     42     43
                         38     41     43
                         38     40     42
Total
Method of Teaching 3     40     40     40
                         43     45     41
                         41     44     41
                         39     44     39
                         38     43     38
Total
Total

Solving by Stepwise Method:


I. Problem:
1. Is there a significant difference in the performance of students under three different
teachers?
2. Is there a significant difference in the performance of students under the three different
methods of teaching?
3. Is there an interaction effect between teachers and method of teaching factors?
II. Hypotheses:

1.
H0: There is no significant difference in the performance of the three groups of students under three different instructors.
H1: There is a significant difference in the performance of the three groups of students under three different instructors.

2.
H0: There is no significant difference in the performance of the three groups of students under three different methods of teaching.
H1: There is a significant difference in the performance of the three groups of students under three different methods of teaching.

3.
H0: Interaction effects are not present.
H1: Interaction effects are present.
III. Level of Significance:
α = 0.05
df(total) = N − 1 = 45 − 1 = 44
df(within) = k(n − 1) = 9(5 − 1) = 9(4) = 36, where k = number of cells and n = number of scores per cell
df(columns) = c − 1 = 3 − 1 = 2
df(rows) = r − 1 = 3 − 1 = 2
df(interaction) = (c − 1)(r − 1) = (3 − 1)(3 − 1) = (2)(2) = 4
Critical values:
columns: F(2, 36) = 3.26
rows: F(2, 36) = 3.26
interaction: F(4, 36) = 2.63
IV. Decision Rule: If the computed value is greater than or equal to the critical value, reject the null hypothesis.
V. Statistics:
F-test two-factor ANOVA
Computation:
                          A      B      C     Total
Method of Teaching 1     40     50     40
                         41     50     41
                         40     48     40
                         39     48     38
                         38     45     38
Total                   198    241    197     636
Method of Teaching 2     40     45     50
                         41     42     46
                         39     42     43
                         38     41     43
                         38     40     42
Total                   196    210    224     630
Method of Teaching 3     40     40     40
                         43     45     41
                         41     44     41
                         39     44     39
                         38     43     38
Total                   201    216    199     616
Total                   595    667    620    1882
$$ CF = \frac{(GT)^2}{N} = \frac{(1882)^2}{45} = 78709.42 $$
$$ SS_t = 40^2 + 41^2 + \dots + 39^2 + 38^2 - CF = 79218 - 78709.42 = 508.58 $$
$$ SS_w = 79218 - \left(\frac{198^2}{5} + \frac{196^2}{5} + \frac{201^2}{5} + \frac{241^2}{5} + \frac{210^2}{5} + \frac{216^2}{5} + \frac{197^2}{5} + \frac{224^2}{5} + \frac{199^2}{5}\right) = 79218 - 79088.8 = 129.20 $$
$$ SS_c = \frac{595^2 + 667^2 + 620^2}{15} - CF = \frac{1183314}{15} - 78709.42 = 78887.6 - 78709.42 = 178.18 $$
$$ SS_r = \frac{636^2 + 630^2 + 616^2}{15} - CF = \frac{1180852}{15} - 78709.42 = 78723.47 - 78709.42 = 14.05 $$
$$ SS_{c \cdot r} = SS_t - SS_w - SS_c - SS_r = 508.58 - 129.20 - 178.18 - 14.05 = 187.15 $$

Source of Variation    SS       df    MS      F computed   F tabular   Interpretation
Between columns       178.18     2   89.09      24.82         3.26          S
Rows                   14.05     2    7.02       1.96         3.26          NS
Interaction           187.15     4   46.79      13.03         2.63          S
Within                129.20    36    3.59
Total                 508.58    44
(S = significant; NS = not significant)

F-Ratio:
$$ F_{columns} = \frac{MS_c}{MS_w} = \frac{89.09}{3.59} = 24.82 $$
$$ F_{rows} = \frac{MS_r}{MS_w} = \frac{7.02}{3.59} = 1.96 $$
$$ F_{interaction} = \frac{MS_i}{MS_w} = \frac{46.79}{3.59} = 13.03 $$
VI. Decision:
columns – reject null hypothesis
rows – accept null hypothesis
interaction – reject null hypothesis
VII. Interpretation:
1. There is a significant difference in the performance of the three groups of students
under three different instructors.
2. There is no significant difference in the performance of the three groups of students
under three different methods of teaching.
3. Interaction effects are present.
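The two-way computation can be reproduced from the 45 scores with a minimal sketch, assuming Python 3 (standard library only) and a balanced design (equal scores per cell); `two_way_anova` is a hypothetical helper name. Rows are the methods of teaching and columns are the instructors, as in the table above. The results match the hand computation up to rounding of intermediate values (e.g., SSr ≈ 14.04 versus 14.05).

```python
def two_way_anova(cells):
    """Two-way ANOVA with interaction for a balanced design.

    cells[r][c] holds the scores for row level r and column level c;
    every cell is assumed to contain the same number of scores.
    """
    n_rows, n_cols = len(cells), len(cells[0])
    n = len(cells[0][0])  # scores per cell
    scores = [x for row in cells for cell in row for x in cell]
    N = len(scores)
    sum_sq = sum(x * x for x in scores)
    cf = sum(scores) ** 2 / N  # correction factor (GT)^2 / N

    cell_totals = [[sum(cell) for cell in row] for row in cells]
    row_totals = [sum(row) for row in cell_totals]
    col_totals = [sum(col) for col in zip(*cell_totals)]

    ss_total = sum_sq - cf
    ss_within = sum_sq - sum(t ** 2 for row in cell_totals for t in row) / n
    ss_rows = sum(t ** 2 for t in row_totals) / (n_cols * n) - cf
    ss_cols = sum(t ** 2 for t in col_totals) / (n_rows * n) - cf
    ss_inter = ss_total - ss_within - ss_rows - ss_cols

    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_inter = df_rows * df_cols
    df_within = n_rows * n_cols * (n - 1)
    ms_within = ss_within / df_within
    return {
        "SS": {"columns": ss_cols, "rows": ss_rows, "interaction": ss_inter,
               "within": ss_within, "total": ss_total},
        "F": {"columns": ss_cols / df_cols / ms_within,
              "rows": ss_rows / df_rows / ms_within,
              "interaction": ss_inter / df_inter / ms_within},
        "df": {"columns": df_cols, "rows": df_rows,
               "interaction": df_inter, "within": df_within},
    }


# Rows = methods of teaching 1-3; columns = instructors A, B, C.
data = [
    [[40, 41, 40, 39, 38], [50, 50, 48, 48, 45], [40, 41, 40, 38, 38]],
    [[40, 41, 39, 38, 38], [45, 42, 42, 41, 40], [50, 46, 43, 43, 42]],
    [[40, 43, 41, 39, 38], [40, 45, 44, 44, 43], [40, 41, 41, 39, 38]],
]
result = two_way_anova(data)
for source in ("columns", "rows", "interaction"):
    print(source, round(result["SS"][source], 2), round(result["F"][source], 2))
```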
Generalization:

Test of Relationship:
1. Pearson r – used to determine whether there is a correlation between two variables of the interval/ratio type. (interval/ratio)
2. Spearman Rank Order Correlation – used to determine whether there is a correlation between two variables of the ordinal type. (ordinal)
3. Chi-Square Test of Independence/Association – used to determine whether there is a significant relationship between variables of the nominal type. (nominal)
Tests of Difference
1. z-test of Independent Proportions – used to determine whether there is a significant difference between two different/independent groups in situations that call for two types of responses. (nominal)
2. z-test of Dependent Proportions – used to determine whether there is a significant difference between pairs of observations from a single group, or whether the responses of the members of a group in two situations are correlated. (nominal)
3. t-test of Independent/Uncorrelated Means – used to determine whether there is a significant difference between two independent groups in terms of their means. (interval/ratio)
4. t-test of Dependent/Correlated Means – used to determine whether there is a significant difference between two sets of scores, weights, heights, etc. from the same group (e.g., pretest and posttest). (interval/ratio)
5. F-test/ANOVA – an extension of the t-test, used to compare the means of two or more groups; two-way ANOVA handles two independent variables at once. (interval/ratio)
Other Tests:
1. One-Population z-test – used to determine whether a sample mean was drawn from the given population. (interval/ratio)
2. Chi-Square Test of Goodness-of-Fit – used to determine whether there is a significant difference between an observed distribution and an expected distribution. (nominal)
Determine which test should be used for each of the following problems.
1. The following are the vocabulary and spelling scores of 10 students, which are ranked. Is there a significant relationship between the two subjects?
2. A special education teacher wishes to determine the preferences of mentally gifted children (boys and girls) for three game activities.
3. Is there a significant relationship between the academic achievement and the motivation of the students in AGS?
4. Creativity and personality were ranked. Is there a significant relationship between the two variables?
5. Fifty male and 50 female students were asked about their color preferences.
6. Is there a significant difference between stressful and unstressful situations in terms of short-term memory?
7. Do the five groups of special children differ in terms of error scores on the psychomotor ability test?
8. Is there a significant difference between teachers and school administrators in terms of attitude toward mentally impaired children?
9. Are the proportions of children passing the two items significantly different from each other?
10. Is there a significant difference between the proportions in each group that responded “in favor”?
11. Does this differ from what the pupil would expect to obtain if he or she had guessed all the items?
12. A soft drinks manufacturing company claims that its one-liter bottle contains, on average, 0.97 liter, with a standard deviation of 0.02. A sample was taken from the company's products. Is the company deceiving its customers?
Summary:
Statistical Test                        Type of Data
Test of Relationship:
1. Pearson r                            Interval/Ratio
2. Spearman Rank Order                  Ordinal
3. Chi-Square of Independence           Nominal
Test of Difference:
1. z-test                               Nominal
   • independent
   • dependent
2. t-test                               Interval/Ratio
   • independent
   • dependent
3. F-test                               Interval/Ratio
   • ANOVA I
   • ANOVA II
Other Tests:
1. z-test (one-population mean)         Interval/Ratio
2. Chi-square test (goodness-of-fit)    Nominal
Below is the list of theses to be used (SY 2010-2011):
1. Positive Effects of Reading the School Newspaper to the Academic Performance and
Behaviour of Selected High School Students of AGS
2. Disturbances Affecting the Performance of AGS High School Students in Mathematics
3. Perceptions Toward College Education of Selected Senior Students of AGS
4. The Perceptions of Fourth Year High School Students Towards the NCAE Review
5. Effects of Numerous New Teachers to the Relational Aspect of High School Students
of AGS
6. Techniques Approaches of Teachers that Contribute to the Comprehension of
Selected Grade Six Students of AGS Towards the Introduction to Algebra
7. The Effect of Weekly Mastery Test to the Academic Performance of the High School
Students of AGS
8. Effects of Cumulative Grading System to the High School Students of AGS
9. Factors Affecting Students’ Failure in Science Subject as Perceived by Selected High
School Students
10. The Effectiveness of the Techniques Used by Mathematics Teachers in Introducing
and Discussing the Lessons to the High School Students of AGS
11. The Effects of Social Networking Sites to the Academic Performance of the 3rd Year
and 4th Year High School Students of AGS
Summary of Statistics:
