STATISTICS
Dr. Emellie G. Palomo
Definition
Statistics is a branch of mathematics that deals with the
scientific collection, organization, presentation,
analysis, and interpretation of numerical data in order
to obtain useful and meaningful information.
Collection of data – refers to the process of obtaining
information.
Organization of data – refers to arranging the data
into tables, graphs, or charts so
that logical and statistical conclusions can be drawn
from the collected measurements.
Presentation of data – refers to the method of displaying
data for viewing.
Analysis of data – refers to the process of extracting
relevant information from the given data, from which
numerical descriptions can be formulated.
Interpretation of data – refers to the task of drawing
conclusions from the analyzed data.
Areas of Statistics
Descriptive statistics – is a statistical method
concerned with describing the properties and
characteristics of a set of data. It involves
gathering, organizing, describing, and presenting
data.
Inferential statistics – is a statistical method
concerned with the analysis of sample data leading
to predictions, inferences, interpretations, or conclusions
about the entire population.
Exercises
Tell whether the given situation will make use of
descriptive or inferential statistics.
1. A teacher computes the mean score of her class in a
mathematics test to determine if the class mean is
significantly related to their scores in a science test.
2. A sari-sari store owner records the frequency of sales
of the 5 leading detergent soaps.
Uses of Statistics
Statistics is essential in education, government,
business, economics, medicine, psychology,
sociology, sports, and others.
In education, statistical tools are used to get
information on enrollment, finance, facilities, grading
system, and so on.
In government, statistics is used to ascertain the
manpower and material strength of the nation. The data
are needed for military and fiscal reasons. Likewise,
for intelligent policy-making and administration,
large amounts of concrete and organized records on
movement of population, taxes, cost of living, wages, and
resources are necessary.
STATISTICAL TERMS
Data is any quantitative or qualitative information.
a) Quantitative data – refers to numerical information
obtained from counting or measuring that can be manipulated
by any fundamental operation.
Ex. Age, IQ scores, height, weight, etc.
b) Qualitative data – refers to descriptive attributes that cannot
be subjected to mathematical operations.
Ex. Gender, citizenship, religion, educational attainment
Population, N – refers to the totality of all the
elements or persons in which one has an interest at a
particular time.
Sample, n – is a part of a population determined by
sampling procedures.
Parameter – is any statistical information or attribute
taken from a population. It is a true value, since its
source is the population itself.
Statistic – is any estimate of a statistical attribute
taken from a sample.
Variable – is a specific factor, property,
or characteristic of a population or
sample that differentiates one sample or
group of samples from another.
a) Discrete – a variable that can be obtained by
counting.
b) Continuous – a variable that can be obtained by
measuring objects or attributes.
Scales of Measurements
1. Nominal measurement – this type of statistical data
depicts the presence or absence of a certain attribute
and usually involves the arbitrary assignment of
numbers to represent the attribute.
Ex. Race, gender, civil status, etc.
2. Ordinal measurement – this provides the degree of
the presence of an attribute.
Ex. Academic ranking, degree of illness
3. Interval measurement – the measurement where data
are arranged in some order and the differences
between data are meaningful. Data at this level
may lack an inherent zero starting point.
Ex. Test results
4. Ratio measurement – this measurement is an interval level
modified to include the inherent zero starting point.
Ex. Physical quantities, allowance, etc.
Sigma Notation
Summation or sigma notation, Σ - a statistical symbol
used to abbreviate the sum of the quantities in a given
range.
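As a brief illustration of the notation (the index i and the limits 1 and n are conventional choices, not taken from the slides), the sum of n quantities can be written as:

```latex
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n \]
```

Here i is the index of summation, 1 is the lower limit, and n is the upper limit.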
Exercises. Write in sigma notation.
1. $x_1 + x_2 + x_3 + x_4 + x_5 + \cdots + x_{10}$
2. $y_1^3 + y_2^3 + y_3^3 + \cdots$
3. $(a_1 + b_1) + (a_2 + b_2) + (a_3 + b_3)$
Collection of Data
1. Interview method – direct method, requires face-to-face
inquiry with the respondent
2. Questionnaire method – indirect method, makes use of
written questions
3. Observation – makes use of the different
human senses
4. Registration or Census – requires the enactment of law to
take effect because it needs the participation of a large, if not
the entire, population
5. Experimentation – specimens are subjected to some aspects
of control to find out cause and effect relationships.
Data Classification
Primary data – information gathered directly from the
source.
Secondary data – information gathered from secondary sources such
as books, journals, magazines, or theses of other
researchers.
Slovin’s Formula
\[ n = \frac{N}{1 + N e^2} \]
where n = sample size
N = population size
e = margin of error (expressed as a decimal)
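The formula can be sketched in a few lines of Python. The function name `slovin_sample_size` and the choice to round up are assumptions made for illustration; the slides do not say how to handle a fractional result.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Return the sample size n = N / (1 + N * e^2), rounded up."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a decimal between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    # Rounding up is an assumption: a fraction of a respondent is not possible,
    # and rounding up never makes the sample smaller than the formula requires.
    return math.ceil(n)


# Hypothetical example: a population of 1000 with a 5% margin of error.
print(slovin_sample_size(1000, 0.05))  # 1000 / 3.5 = 285.71..., rounded up to 286
```

Note that the margin of error is passed as a decimal, so 5% is written as 0.05.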
Exercises
What is the sample size if the population is 3000 and
the margin of error is set at:
a.) 5%
b.) 3%
Sampling Techniques
1.) Probability sampling – a procedure where every
element of a population is given an equal chance of
being selected as a member of the sample.
a.) Random sampling – a sampling procedure that is
done by lottery or with the aid of a Table of Random
Numbers, or the random function of a scientific
calculator.
b.) Systematic sampling – an alternative to simple
random sampling, especially when the population is so
big that random sampling becomes tedious. A random
starting point is selected from the list of the population.
The samples are determined by choosing every nth
element on the list until the desired number of samples
is drawn.
c.) Stratified random sampling – this is done by
creating different classes or strata within the
population and drawing samples at random from each stratum.
d.) Cluster sampling – if the population is too big,
sampling may be applied to smaller areas.
The population may be divided geographically into
regions, divisions, or districts.
2.) Non-probability sampling – a sampling procedure in
which not every element of the population is given an
equal chance of being selected as a sample. The drawing
of samples is based purely on the researcher’s
objectives.
a.) Convenience sampling – the researcher’s
convenience is the primary concern in using this
method.
b.) Quota sampling – this is similar to stratified
sampling, but the drawing of samples in quota sampling
is not done randomly. Once the desired quota is reached,
the drawing of samples is terminated.
c.) Purposive sampling – this is used when the specific
objective under study requires a particular sample
which may not cover the entire population.
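Three of the probability sampling procedures above can be sketched with Python's standard `random` module. The function names, the `seed` parameter, and the use of a fixed sampling fraction per stratum are illustrative assumptions, not part of the slides.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n elements so that every element has the same chance (a lottery)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick a random starting point, then take every k-th element of the list."""
    k = len(population) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start within the first interval
    return population[start::k][:n]


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each stratum (assumed design)."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population: ID numbers 1 to 100.
population = list(range(1, 101))
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))

# Hypothetical strata of unequal size; 10% is drawn from each.
strata = {"first year": list(range(60)), "second year": list(range(100, 140))}
print(len(stratified_sample(strata, 0.10, seed=1)))  # 6 + 4 = 10
```

The non-probability methods are not sketched here because their selection rule depends on the researcher's judgment rather than on a random draw.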
Presenting Data
1.) Textual presentation – the data are presented in
paragraph form.
2.) Graphical presentation – the data are presented in
visual form. It is a picture that displays numerical
information.
3.) Tabular presentation – the data are presented in
tables to show the relation between the column and row
quantities.
Types of Graphs
Bar graph – is used to show relative sizes of data. Bars
drawn proportional to the data may be horizontal or
vertical. Bar graphs are used to show the comparison of
nominal data, such as nationality, sex, religion, month,
etc., and numerical data, whether discrete or continuous, such
as population and other frequency information.
Line graph – shows the relationship between two or
more sets of continuous data.
Circle graph – is best used to compare parts to a whole.
The size of each sector of the circle is proportional to
the size of the category that it represents.
Pictograph or pictogram – is a picture graph used to
show numerical data through symbols. In constructing
a pictograph, the picture to be drawn must symbolize
the data being represented. The legend is also a very
important part of a pictograph, for it tells the
reader how much of the actual data each symbol
represents.
Statistical Measures
A. Measures of central tendency
It is a quantitative representation of the set of data
under investigation. It serves as a representative of the
data.
B. Measures of dispersion/variability
It indicates how close or widespread the data are
from the average.
Measures of Central Tendency
The Mean or Arithmetic Average
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \]
Characteristics of the Mean
1. It is a calculated average.
2. It is easily affected by an increase or decrease in the
number of data
3. It is the most widely used average and subject to
further mathematical computation
4. It is the measure for interval or ratio scales such as
scores, grades, temperature and population
The Median
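It is the middle value in a set of quantities arranged in order of magnitude.
When the number of terms n is odd, the median, Md, is the value in position
\[ \frac{n + 1}{2} \]
When the number of terms is even, the median is the mean of the two middle values.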
Characteristics of the Median
1. It is a rank or positional average.
2. It may or may not be affected by extreme values.
3. It is less widely used than the mean but can be
subjected to a few mathematical computations.
4. It is a measure for ordinal scales such as test scores
and salary.
The Mode
It is the value that occurs with the highest frequency.
a) Unimodal – one mode
b) Bimodal – two modes
c) Trimodal – three modes
d) Polymodal – more than three modes
Characteristics of the Mode
1. It is an inspection average.
2. It may or may not be affected by an introduction of
other data.
3. It is rarely used and cannot be mathematically
manipulated.
4. It is a measure for nominal scales, such as the number of
items of a certain brand of commodity.
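The three averages can be checked with Python's standard `statistics` module. The data set `scores` below is hypothetical and is not one of the exercise sets.

```python
import statistics

# Hypothetical data set (not taken from the exercises).
scores = [4, 7, 7, 9, 13]

mean = statistics.mean(scores)        # (4 + 7 + 7 + 9 + 13) / 5 = 8
median = statistics.median(scores)    # middle value of the ordered list = 7
modes = statistics.multimode(scores)  # most frequent value(s) = [7]

print(mean, median, modes)

# With an even number of terms, the median is the mean of the two middle values.
print(statistics.median([8, 9, 10, 11]))  # (9 + 10) / 2 = 9.5

# A bimodal set returns both modes.
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```

`statistics.multimode` returns a list, so unimodal, bimodal, and trimodal sets can be told apart by the length of the result.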
Exercises
Compute the mean, median, and mode of each data set.
1. 89, 92, 83, 88, 80
2. 11, 8, 13, 9, 14, 10
3. 18, 25, 23, 17, 25, 18, 20
4. 5, 8, 9, 5, 8, 6, 7
5. 12, 23, 10, 12, 23, 10
Measures of Dispersion
A smaller dispersion of scores arising from the
comparison often indicates more consistency and
more reliability.
1. The Range
2. The Mean Deviation
3. The Variance
4. The Standard Deviation
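The Range = H – L, where H is the highest value and L is the lowest value
Not a reliable measure of dispersion, because it uses only the two extreme values
H = 96, L = 90, Range = 6
H = 94, L = 88, Range = 6
H = 90, L = 84, Range = 6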
The Mean Deviation – the average of the absolute deviations from the mean
\[ MD = \frac{\sum |x - \bar{x}|}{n} \]
The Standard Deviation – the most important measure
of dispersion
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Variance, $s^2$
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
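The four measures of dispersion can be computed directly from their formulas. The sketch below is illustrative: the function name `dispersion_measures` and the data set are hypothetical, and the variance uses the n – 1 divisor shown in the formula for s.

```python
import math


def dispersion_measures(data):
    """Return the range, mean deviation, variance, and standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(data) / n
    value_range = max(data) - min(data)                       # H - L
    mean_deviation = sum(abs(x - mean) for x in data) / n     # divides by n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # divides by n - 1
    standard_deviation = math.sqrt(variance)
    return {
        "range": value_range,
        "mean_deviation": mean_deviation,
        "variance": variance,
        "standard_deviation": standard_deviation,
    }


# Hypothetical data set (not taken from the exercises); its mean is 6.
result = dispersion_measures([2, 4, 6, 8, 10])
print(result)
```

For this data set the range is 8, the mean deviation is 2.4, the variance is 10, and the standard deviation is about 3.16.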
Exercises
Compute the range, mean deviation, standard
deviation, and variance of each data set.
1. 7, 12, 11, 8, 13, 9
2. 2, 7, 5, 3, 6, 8, 3, 4, 5
3. 10, 14, 11, 12, 13
4. 3, 5, 7, 9
5. 8, 8, 4, 4
Parametric and Non-parametric Statistical Tools
Parametric ST – a test of significance appropriate when
the data represent an interval or ratio scale of
measurement and other specific assumptions have been
met.
Non-parametric ST – a test not involving the estimation
of parameters of a statistical function.
When to Use
Parametric ST                  Non-parametric ST
1. n ≥ 30                      1. n < 30
2. For interval or ratio data  2. For ordinal or nominal data
Comparison of Parametric and Non-parametric ST
Test for difference            Parametric  Non-parametric
Independent (2 samples)        t-test      Mann-Whitney U test
Dependent/paired (2 samples)   t-test      Wilcoxon signed-rank test
3 or more samples              ANOVA       Kruskal-Wallis H test
Inferential Statistics
Definition
It is a statistical method concerned with the analysis of
sample data leading to predictions, inferences,
interpretations, or conclusions about the entire
population.
Deciding the Nature of a Variable
1. Are there more than 2 categories?
   No: Dichotomous
   Yes: go to question 2
2. Are the distances between the categories equal?
   Yes: Interval/ratio
   No: go to question 3
3. Can the categories be rank-ordered?
   Yes: Ordinal
   No: Nominal
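The decision steps can also be read as a short function. This is only a sketch of the flowchart's logic; the function name `nature_of_variable` and its parameter names are hypothetical.

```python
def nature_of_variable(more_than_two_categories,
                       equal_distances=False,
                       rank_ordered=False):
    """Classify a variable by answering the three questions in order."""
    if not more_than_two_categories:
        return "dichotomous"
    if equal_distances:
        return "interval/ratio"
    if rank_ordered:
        return "ordinal"
    return "nominal"


# Hypothetical examples of how the questions might be answered.
print(nature_of_variable(False))                     # dichotomous
print(nature_of_variable(True, equal_distances=True))  # interval/ratio
print(nature_of_variable(True, rank_ordered=True))   # ordinal
print(nature_of_variable(True))                      # nominal
```

The order of the checks matters: the question about rank order is reached only when the distances between categories are not equal.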
Tests of Differences
Nature of criterion variable: Non-categorical
Type of test: Non-parametric
Type of data           Number of comparison groups  Name of test
Unrelated              1–2                          Kolmogorov-Smirnov
                       2                            Mann-Whitney U
                       2+                           Median
                       3+                           Kruskal-Wallis H
Related                2                            Sign
                       2                            Wilcoxon
                       3+                           Friedman
Nature of criterion variable: Categorical (nominal or frequency)
Type of test: Non-parametric
Type of data           Number of comparison groups  Name of test
Unrelated              1                            Binomial
                       1                            Chi-square
                       2+                           Chi-square
Related                2                            McNemar
                       3+                           Cochran Q
Nature of criterion variable: Non-categorical
Type of test: Parametric (means)
Type of data           Number of comparison groups  Name of test
Unrelated              1–2                          t
                       2+                           One-way and two-way ANOVA
Related                2                            t
                       2+                           Single-factor repeated measures
Related and unrelated  2+                           2-way analysis of covariance
Nature of criterion variable: Non-categorical
Type of test: Parametric (variances)
Type of data           Number of comparison groups  Name of test
Unrelated              2+                           Levene’s test
Related                2                            t
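t-test
One sample:
\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
Two samples:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}, \qquad df = n_1 + n_2 - 2 \]
where $s_{\bar{x}_1 - \bar{x}_2}$ is the standard error of the difference between the two sample means.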
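Both forms of the t statistic can be sketched in Python. The function names are hypothetical, and the two-sample version assumes a pooled variance, which is an assumption chosen because it matches the stated degrees of freedom n1 + n2 – 2; the slides do not spell out how the denominator is computed.

```python
import math


def one_sample_t(sample_mean, population_mean, sample_sd, n):
    """t = (sample mean - mu) / (s / sqrt(n))."""
    return (sample_mean - population_mean) / (sample_sd / math.sqrt(n))


def pooled_two_sample_t(group1, group2):
    """Return (t, df) for two independent groups, assuming a pooled variance."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    # Sum of squared deviations within each group.
    ss1 = sum((x - mean1) ** 2 for x in group1)
    ss2 = sum((x - mean2) ** 2 for x in group2)
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    # Standard error of the difference between the two means.
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Hypothetical values (not taken from the exercises).
print(one_sample_t(52, 50, 5, 25))  # (52 - 50) / (5 / 5) = 2.0

t, df = pooled_two_sample_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(round(t, 3), df)  # -2.0 with df = 8
```

The computed t is then compared with the tabled critical value for the chosen significance level and degrees of freedom.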
Exercises
1. Given: μ = 15 mg, $\bar{x}$ = 17.1 mg, s = 3.8 mg, n = 25
Is the mean amount of nicotine in the cigarettes significantly
different from μ?
2. Scores of two groups of individuals on the same test
x y
26 23 38 31 32
24 19 26 28 34
18 25 24 27 25
17 26 24 32 29
18 21 30 29 36
20 22 22 33 34
18 33 35 35
Test whether there is a significant difference between the mean
scores of the two groups.
Analysis of Variance (ANOVA)
ANOVA is used to test hypotheses about population
means rather than population variances.
In this test, two or more groups are studied to see if the
groups are affected by various treatments.
Sources of Variation
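The Total Sum-of-Squares (SST)
\[ SS_T = \sum X^2 - \frac{(\sum X)^2}{N} \]
The Between Sum-of-Squares (SSb)
\[ SS_b = \sum \frac{(\sum X_c)^2}{n} - \frac{(\sum X_T)^2}{N} \]
where $\sum X_c$ is the total of one column (group), n is the number of cases in that column, and $\sum X_T$ is the grand total of all N cases.
The Within Sum-of-Squares (SSw)
\[ SS_w = SS_T - SS_b \]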
Degrees of Freedom
df for total = N – 1, where N = total number of cases
df between groups = k – 1, where k = number of columns (groups)
df within groups = (N – 1) – (k – 1) = N – k
Illustration
Test scores of 7 students in three subjects, A, B, and C.
A B C
12 16 6
18 17 4
16 16 14
8 18 4
6 12 6
12 17 12
10 10 14
$H_0: \mu_1 = \mu_2 = \mu_3$
ANOVA Table
Source of Variation  df  Sum of Squares  Mean Square  Fcal
Between Groups
Within Groups
Total
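The entries of the ANOVA table can be computed from the sums-of-squares and degrees-of-freedom formulas given earlier. The sketch below applies them to the illustration's three columns; the function name `one_way_anova` and the dictionary keys are hypothetical.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA table entries for a list of groups."""
    all_values = [x for group in groups for x in group]
    total_n = len(all_values)   # N, total number of cases
    k = len(groups)             # number of columns (groups)

    # (sum of all X)^2 / N, shared by SST and SSb.
    correction = sum(all_values) ** 2 / total_n
    ss_total = sum(x ** 2 for x in all_values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between

    df_between = k - 1
    df_within = total_n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "ss_total": ss_total, "df_total": total_n - 1,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Test scores from the illustration (subjects A, B, and C).
a = [12, 18, 16, 8, 6, 12, 10]
b = [16, 17, 16, 18, 12, 17, 10]
c = [6, 4, 14, 4, 6, 12, 14]

table = one_way_anova([a, b, c])
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

For these scores the between sum of squares is about 151.24 (df = 2), the within sum of squares is 286.00 (df = 18), the total is about 437.24 (df = 20), and the computed F is about 4.76. Whether Ho is rejected depends on the tabled F value at the chosen significance level.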
Tests after the F Test
A. Scheffé Test
B. F ratio
\[ F = \frac{(\bar{X}_1 - \bar{X}_2)^2}{s^2 \, (N_1 + N_2) / (N_1 N_2)} \]
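A minimal sketch of the pairwise F ratio follows. It assumes that the numerator is the squared difference between the two group means and that s² is the within-groups mean square from the ANOVA table; both are the usual reading of the Scheffé test but are not stated explicitly in the slides. The function name and the sample values are hypothetical.

```python
def scheffe_f(mean1, mean2, ms_within, n1, n2):
    """F = (mean1 - mean2)^2 / (s^2 * (N1 + N2) / (N1 * N2))."""
    return (mean1 - mean2) ** 2 / (ms_within * (n1 + n2) / (n1 * n2))


# Hypothetical values: two group means of 10 and 6, a within-groups
# mean square of 4, and 8 cases in each group.
print(scheffe_f(10, 6, 4, 8, 8))  # 16 / (4 * 16 / 64) = 16.0
```

Each pair of groups gets its own F ratio, which is then compared with the critical value used for the Scheffé test.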