SPREAD
Based on the graphs, which of the two sets of scores is more spread out?
The mean score for each quiz is 7. Despite the equality of means, you can see that the distributions are quite
different. Specifically, the scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out. The
differences among students were much greater on Quiz 2 than on Quiz 1.
The terms variability, spread and dispersion are synonymous; they refer to how spread out a distribution is. The most
frequently used measures of variability are the RANGE, VARIANCE and STANDARD DEVIATION. Variability describes how
far apart data points lie from each other and from the center of a distribution. Along with measures of central
tendency, measures of variability give you descriptive statistics that summarize your data. While the central
tendency, or average, tells you where most of your points lie, variability summarizes how far apart they are. This is
important because it tells you whether the points tend to be clustered around the center or more widely spread out.
Low variability is ideal because it means that you can better predict information about the population based on
sample data. High variability means that the values are less consistent, so it’s harder to make predictions. Data sets
can have the same central tendency but different levels of variability or vice versa. If you know only the central
tendency or the variability, you can’t say anything about the other aspect. Both of them together give you a complete
picture of your data.
Example: variability in normal distributions
You are investigating the amounts of time spent on phones daily by different groups of people.
Using simple random samples, you collect data from 3 groups:
1. RANGE is the distance covered by the scores in a distribution (from smallest value to highest value). It is also
defined as the simplest measure of variability.
• The problem with using the range is that it is extremely sensitive to outliers: one number far away from
the rest of the data will greatly alter its value. It is considered an imprecise, unreliable measure
of variability.
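As a quick illustration (the quiz scores below are made up for demonstration), a one-line range function shows how a single outlier distorts the measure:

```python
# Sketch: the range is simply max - min, so one extreme value can inflate it.
def value_range(scores):
    return max(scores) - min(scores)

quiz = [5, 6, 7, 7, 8, 9]
with_outlier = quiz + [25]          # one extreme score added

print(value_range(quiz))            # 4
print(value_range(with_outlier))    # 20 — a single outlier quintuples the range
```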
The standard deviation is derived from variance and tells you, on average, how far each value lies from the
mean. It’s the square root of variance.
Standard deviation is expressed in the same units as the original values (e.g., meters).
Since variance is expressed in squared units, which are much larger than those of a typical value in the data set, it's
harder to interpret the variance number intuitively. That's why standard deviation is often preferred as a main measure of variability.
However, the variance is more informative about variability than the standard deviation, and it’s used in making
statistical inferences.
You and your friends have just measured the heights of your dogs (in millimeters)
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.
Find the Range, the Mean, the Variance, and the Standard Deviation.
Solutions:
Mean: X̄ = (600 + 470 + 170 + 430 + 300)/5 = 1970/5 = 394 mm
Range = 600 − 170 = 430 mm
Variance: σ² = Σ(X − X̄)²/N = 108 520/5 = 21 704
Standard Deviation: σ = √21 704 ≈ 147.32, or about 147 mm
Now we can show which heights are within one Standard Deviation (147 mm) of the Mean.
So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra large or
extra small. Rottweilers are tall dogs. And Dachshunds are a bit short, right?
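A short Python sketch can verify the dog-height example (the heights are the ones given above; the population formulas, dividing by N, are used as in the worked solution):

```python
import math

heights = [600, 470, 170, 430, 300]   # shoulder heights in mm, from the example

mean = sum(heights) / len(heights)                               # 1970 / 5 = 394
rng = max(heights) - min(heights)                                # 600 - 170 = 430
variance = sum((x - mean) ** 2 for x in heights) / len(heights)  # population variance
std_dev = math.sqrt(variance)

print(mean, rng, variance, round(std_dev, 2))   # 394.0 430 21704.0 147.32
```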
Given the data below, find the following:
Data: 9, 8, 4, 2, 6, 1 (n = 6)
a. Range
b. Standard Deviation
c. What does the computed value of the Standard Deviation mean?
Solutions:
a. Range = 9 − 1 = 8
b. X̄ = 30/6 = 5; Σ|X − X̄| = 16; Σ(X − X̄)² = 52
σ = √(52/6) ≈ 2.94 (population); s = √(52/5) ≈ 3.22 (sample)
c. On average, the scores deviate from the mean by about 2.94 (or 3.22 if the sample formula is used).
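A minimal check with the standard library, assuming the data set 9, 8, 4, 2, 6, 1 that is consistent with the example's figures (range 9 − 1, Σ(X − X̄)² = 52):

```python
# Sketch: Python's statistics module distinguishes the two formulas directly.
import statistics

scores = [9, 8, 4, 2, 6, 1]           # reconstructed data set from the example (n = 6)

pop_sd = statistics.pstdev(scores)    # divides by n:   sqrt(52/6) ≈ 2.94
sam_sd = statistics.stdev(scores)     # divides by n-1: sqrt(52/5) ≈ 3.22

print(round(pop_sd, 2), round(sam_sd, 2))   # 2.94 3.22
```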
The coefficient of variation (CV) is a measure of relative variability. It is the ratio of the standard deviation to
the mean (average). For example, the expression “The standard deviation is 15% of the mean” is a CV.
The CV is particularly useful when you want to compare results from two different surveys or tests that have
different measures or values. For example, if you are comparing the results from two tests that have different scoring
mechanisms. If sample A has a CV of 12% and sample B has a CV of 25%, you would say that sample B has more
variation, relative to its mean.
this is used to compare the variability of two or more sets of data even when they are expressed in different units of
measurement
Formula: CV = (s/X̄)(100%)
The higher the coefficient of variation, the greater the level of dispersion around the mean. It is generally
expressed as a percentage. Without units, it allows for comparison between distributions of values whose scales of
measurement are not comparable.
When we are presented with estimated values, the CV relates the standard deviation of the estimate to the
value of this estimate. The lower the value of the coefficient of variation, the more precise the estimate.
Distributions with a CV of less than 1 are considered to be low-variance, whereas those with CV higher than 1
are considered to be high-variance.
A researcher is comparing two multiple-choice tests with different conditions. In the first test, a typical
multiple-choice test is administered. In the second test, alternative choices (i.e. incorrect answers) are randomly
assigned to test takers. The results from the two tests are:
Regular test: Mean = 59.9, SD = 10.2
Randomized answers: Mean = 44.8, SD = 12.7
a. Which test has more variation relative to its mean?
b. If you were asked to take an exam, which is better for you to take?
Solution: CV = (s/X̄)(100%)
CV(regular) = (10.2/59.9)(100%) ≈ 17.03%
CV(randomized) = (12.7/44.8)(100%) ≈ 28.35%
a. The randomized test has a larger variation from the mean since its CV is greater.
b. If I were asked to choose which exam to take, I would choose the regular test since it has a lower CV.
2. The mean score of a Statistics test of Class A is 80 with a standard deviation of 12 while Class B has a mean
score of 88 with a standard deviation of 15.
CVA = (12/80)(100%) = 15%
CVB = (15/88)(100%) = 17.05%
Class B's scores are more spread out relative to its mean since its CV is higher.
3. Workers X and Y are assigned to the same job. The table below shows the results of their work performance over a
long period of time.
Mean time of completing job (in hours) Standard deviation (in hours)
Worker X 7 2
Worker Y 6.5 2
CVX = (2/7)(100%) ≈ 28.57%; CVY = (2/6.5)(100%) ≈ 30.77%
Worker X performs more consistently since his CV is lower, even though the two standard deviations are equal.
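The CV comparisons above can be reproduced with a small helper (the function name cv is just for illustration; the figures are the workers' means and standard deviations from the table):

```python
# Sketch: CV = (s / mean) * 100%, a unit-free measure for comparing spread.
def cv(sd, mean):
    return sd / mean * 100

cv_x = cv(2, 7)      # Worker X: ≈ 28.57%
cv_y = cv(2, 6.5)    # Worker Y: ≈ 30.77%

# Same SD, but Worker X has the lower CV, hence the more consistent performance.
print(round(cv_x, 2), round(cv_y, 2))   # 28.57 30.77
```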
A measure of position is a method by which the position that a particular data value has within a given data set
can be identified. As with other types of measures, there is more than one approach to defining such a measure.
Although quantiles have a fairly simple conceptual interpretation, computation is entirely another matter. Many
different rules and formulas are in use, and none of them have become the overwhelming standard. If every data set
was infinite and every variable continuous, then all of the rules and formulas in use would give the same result.
Finite data sets and discrete variables generate issues usually not found in the infinite continuous case.
• Should a percentile give the percentage of data "at or below" rather than just "below"?
• In small data sets, how should the selection of an intermediate location be made (by rounding, or by
interpolation)?
NOTE: The Q2 = D5 = P50. These measurements are also equal to the value of the median.
Below is the frequency distribution of the results for the entrance examination scores of 60 students (i = 9, n = 60):

CI        f    X    fX    <cf   X − X̄   (X − X̄)²   f(X − X̄)²
18 – 26   6    22   132     6    −21      441        2646
27 – 35   11   31   341    17    −12      144        1584
36 – 44   17   40   680    34     −3        9         153
45 – 53   14   49   686    48      6       36         504
54 – 62   8    58   464    56     15      225        1800
63 – 71   3    67   201    59     24      576        1728
72 – 80   1    76    76    60     33     1089        1089
Total     60        2580                             9504

Mean: X̄ = ΣfX/n = 2580/60 = 43
Sample variance: s² = Σf(X − X̄)²/(n − 1) = 9504/59 ≈ 161.08; s ≈ 12.69
Find P51, D6, Q1, Q3 and the Quartile Deviation (n = 60, i = 9).

a. P51 = LB + ((51n/100 − <cf)/f)(i)
   51n/100 = 51(60)/100 = 30.6 → P51 class: 36 – 44
   P51 = 35.5 + ((30.6 − 17)/17)(9) = 35.5 + 7.2 = 42.7
b. D6 = LB + ((6n/10 − <cf)/f)(i)
   6n/10 = 6(60)/10 = 36 → D6 class: 45 – 53
   D6 = 44.5 + ((36 − 34)/14)(9) ≈ 45.79
c. Q1 = LB + ((n/4 − <cf)/f)(i)
   n/4 = 60/4 = 15 → Q1 class: 27 – 35
   Q1 = 26.5 + ((15 − 6)/11)(9) ≈ 33.86
d. Q3 = LB + ((3n/4 − <cf)/f)(i)
   3n/4 = 3(60)/4 = 45 → Q3 class: 45 – 53
   Q3 = 44.5 + ((45 − 34)/14)(9) ≈ 51.57
e. QD = (Q3 − Q1)/2 = (51.57 − 33.86)/2 ≈ 8.85
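All four position measures use the same interpolation pattern, which can be sketched as one hypothetical helper (the boundaries and frequencies are taken from the 60-score distribution above):

```python
# Sketch of Q = LB + ((target - <cf) / f) * i applied to grouped data.
def grouped_quantile(position, classes, i=9):
    """position: target cumulative count (e.g. n/4 for Q1).
    classes: list of (lower_boundary, frequency, cum_freq_below)."""
    for lb, f, cf in classes:
        if cf + f >= position:               # first class reaching the target count
            return lb + (position - cf) / f * i
    raise ValueError("position beyond data")

# (lower boundary, f, <cf of the classes below), from 18-26 up to 72-80
table = [(17.5, 6, 0), (26.5, 11, 6), (35.5, 17, 17), (44.5, 14, 34),
         (53.5, 8, 48), (62.5, 3, 56), (71.5, 1, 59)]

n = 60
q1 = grouped_quantile(n / 4, table)           # ≈ 33.86
q3 = grouped_quantile(3 * n / 4, table)       # ≈ 51.57
d6 = grouped_quantile(6 * n / 10, table)      # ≈ 45.79
p51 = grouped_quantile(51 * n / 100, table)   # ≈ 42.7
qd = (q3 - q1) / 2                            # ≈ 8.85
print(round(q1, 2), round(q3, 2), round(d6, 2), round(p51, 2), round(qd, 2))
```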
The same distribution, set up for the coding (unit deviation) method:

CI        f    <cf   d'   fd'   d'²   fd'²
18 – 26   6     6    0     0     0      0
27 – 35   11   17    1    11     1     11
36 – 44   17   34    2    34     4     68
45 – 53   14   48    3    42     9    126
54 – 62   8    56    4    32    16    128
63 – 71   3    59    5    15    25     75
72 – 80   1    60    6     6    36     36
Total     60             140          444
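Under the coding method, the sample variance is s² = i²[nΣfd'² − (Σfd')²]/[n(n − 1)]; a sketch using the column totals above gives the same ≈ 161.08 as the direct computation:

```python
# Sketch: sample variance of grouped data via the coding (unit deviation) method.
freqs = [6, 11, 17, 14, 8, 3, 1]    # f per class, from 18-26 up to 72-80
codes = [0, 1, 2, 3, 4, 5, 6]       # d', measured from the 18-26 class
i, n = 9, sum(freqs)                # class size and total frequency (n = 60)

sum_fd = sum(f * d for f, d in zip(freqs, codes))        # Σfd'  = 140
sum_fd2 = sum(f * d * d for f, d in zip(freqs, codes))   # Σfd'² = 444
s2 = i**2 * (n * sum_fd2 - sum_fd**2) / (n * (n - 1))

print(sum_fd, sum_fd2, round(s2, 2))   # 140 444 161.08
```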
The table below shows the frequency distribution of the wages per day of the laborers in a certain eatery.
CI (Wages)   f    d'   fd'   d'²   fd'²   <cf
180 – 184    8    2    16     4    32     50
175 – 179    12   1    12     1    12     42
170 – 174    10   0     0     0     0     30
STATISTICS is a scientific body of knowledge/branch of Mathematics that deals with the theory and method of
collecting, tabulating or presenting (summarizing or organizing), analyzing and interpreting numerical data
Collection of Data is the process of gathering and obtaining numerical data
Tabulation or Presentation of Data involves summarizing data or information in textual, graphical or tabular
form
Analysis of Data involves describing the data by using statistical methods & procedures
Analysis of data is also the process of extracting from the given data relevant information from which numerical
descriptions can be formulated
Interpretation of Data refers to the process of making or drawing conclusions based on the analyzed data
DESCRIPTIVE STATISTICS
is a statistical procedure concerned with describing the characteristics and properties of a group of persons,
places, or things
it involves gathering, organizing, presenting and describing the data
is used to summarize data from a sample using measures such as the mean or standard deviation
INFERENTIAL STATISTICS
it uses sample data to make inferences about a population
it is calculated with the purpose of generalizing the findings from samples to populations it represents,
performing hypothesis testing, determining relationships among variables and making decisions
this kind of statistics uses the concept of probability --- the chance of an event to happen
PARAMETRIC TESTS assume a normal distribution of values. Parametric statistics are the most common type of
inferential statistics
PARAMETRIC TESTS make assumptions about a population or a data set.
NONPARAMETRIC TESTS make fewer assumptions about the data set.
NON-PARAMETRIC TESTS are used in cases where parametric tests are not appropriate and are used when the
distribution is skewed, the distribution is not known or the sample size is too small (n < 30)
Parametric Test for Independent Measures between Two Groups: A T-test is used to compare between the means of
two data sets, when the data is normally distributed.
Parametric Correlation Test: Pearson Product-Moment Correlation is a common parametric method of measuring
correlation between two variables.
The Spearman Rank Correlation is similar to the Pearson coefficient, but it is used when data are ordinal (i.e., data
set into some kind of order or ranking).
The Mann-Whitney Test is used to compare differences between two groups of ordinal data.
POPULATION {N} refers to a large collection of objects, persons, places or things or a group of phenomena that have
something in common
The mean income of the subscribers of ABS-CBN TV Plus
The daily maximum temperatures in February for major Isabela towns
The number of registered voters in Region 02
SAMPLE {n} is a small portion or a subset of a population. A SAMPLE is a representative group drawn from a population.
The mean income of 250 subscribers of ABS-CBN TV Plus
The GPA of a freshmen class
PARAMETER is any numerical or nominal measurement describing some characteristics of a population. PARAMETER is
any summary number, like an average or percentage, that describes the entire population
The average weight of all middle-aged female Filipinos
The proportion of likely Filipino voters approving the president’s job
DATA are facts or a set of information or observation under study. DATA are individual pieces of factual information
recorded and used for the purpose of analysis
VARIABLE
is a characteristic or property of a population or sample which makes the numbers different from each other
it also refers to the observable phenomena of a person or object whereby the members of the group or set vary
from one another
this is considered to be the raw data or materials gathered by a researcher or investigator for statistical analysis
EXPERIMENTAL DATA are collected thru active intervention by the researcher to produce and measure change or to
create a difference when the variable is altered. These data are often reproducible, but reproducing them can be
expensive.
SIMULATION DATA are generated by imitating the operation of a real-world process or system over-time using computer
test models.
For example, to predict weather conditions, economic models, chemical reactions, or seismic activity.
This method is used to try to determine what would or could happen under certain conditions.
DERIVED/COMPILED DATA involves using existing data points, often from different data sources, to create new data thru
some sort of transformation such as an arithmetic formula or aggregation.
For example, combining the twin cities metro area to create population density data.
QUALITATIVE DATA assume values that manifest the concept of attributes or categories, thus, they are sometimes
called categorical data
The notes taken during a focus group on the quality of a certain fast food chain
Responses from an open-ended questionnaire
Strands/courses of senior high school students
QUANTITATIVE DATA are data that are numerical in nature which are obtained through measuring or counting.
QUANTITATIVE DATA are used when a researcher is trying to address the “WHAT” or “HOW MANY” aspects of a
research question.
DISCRETE DATA take on whole-number values only, assume exact values, and can be obtained
through counting
CONTINUOUS DATA can take any value on an interval of real numbers and can assume all values between any two
specific values through measuring
An INDEPENDENT VARIABLE, sometimes called an EXPERIMENTAL or PREDICTOR variable, is a variable that is being
manipulated in an experiment in order to observe the effect on a DEPENDENT VARIABLE, sometimes called an
OUTCOME variable.
DEPENDENT VARIABLE
Test Mark (measured from 0-100)
INDEPENDENT VARIABLE
Revision Time (measured in hours)
Intelligence (measured using IQ score)
Treatment factors: brief vs. long-term treatment, in-patient vs. out-patient treatment
Physiological variables: measures of physiological responses such as heart rate, blood pressure and brain wave activity
PRIMARY SOURCE
– a source of data from which firsthand information is obtained usually by means of personal interview and actual
observation
SECONDARY SOURCE
SECONDARY DATA may be gathered from magazines, newspapers, television, radios, internet, etc.
INTERVIEW METHOD
– is one of the most effective methods of collecting original data or accurate responses
• Gathering of data may be done through a personal encounter between the interviewer and the interviewee.
QUESTIONNAIRE METHOD
• Data gathering by means of getting information with the use of written questionnaires.
REGISTRATION METHOD
• Data gathering may be done by asking for compiled files from different offices or organizations.
Example: Getting the number of LET passers in CoEd for the past 2 years
OBSERVATION METHOD
– is a specific method of investigation that makes possible use of all senses to measure or obtain outcomes/responses
from the object of study
Example: Determining the stimuli that cause a mentally-ill patient to go wild all of a sudden.
EXPERIMENTATION METHOD
– used when the objective is to determine the cause and effect of a certain phenomenon under some controlled
conditions
TABULAR FORM – it is a more effective way of presenting relationships or comparisons of numerical data
it provides a more precise, systematic and orderly presentation of data in rows and columns through the use
of tables
It provides the reader a good grasp of the meaning of the quantitative relationship indicated in the report.
It tells the whole story without the necessity of mixing textual matter with figures.
The systematic arrangement of columns and rows makes them easily read and readily understood.
GRAPHICAL FORM – is the most effective form of presenting data for it uses visual form where important
relationships are brought out more clearly in pictorial form
Types: Line Graph, Bar Graph, Pie Chart, Scatter-Point Diagram, Pictogram, and Map Graph
LINE GRAPH – it shows relationships between two sets of quantities using a straight line
BAR GRAPH – it consists of rectangles of equal widths, either drawn vertically or horizontally, segmented or non –
segmented
PICTURE GRAPH or PICTOGRAM – it is a visual presentation of statistical quantities by means of drawings or
symbols related to the subject under study
MAP GRAPH or CARTOGRAPH – it is one of the best ways to present geographical data
-this kind of graph is always accompanied by a legend which tells us the meaning of the lines, colors or other symbols
used and positioned in a map
SCATTER POINT DIAGRAM – it is a graphical device to show the degree of relationship between two quantitative
variables
CIRCLE GRAPH or PIE CHART – it represents relationships of the different components of a single total as revealed in the
sectors of a circle where the angles or size of the sectors should be proportional to the percentage components of the
data which gives a total of 100%
Scale of Measurement
It is a classification that describes the nature of information within the values assigned to variables.
NOMINAL SCALE
this is the most primitive level of measurement; the nominal level is used when we want to
distinguish one object from another for identification purposes
Religion of Teachers
ORDINAL SCALE
it does not only classify items but also gives the order or ranks of classes, items or objects, however it does not
say anything about the differences in two positions with ranking
INTERVAL SCALE
it is the same as the ordinal level, with an additional property that we can determine meaningful amounts of
differences between the data
Salary of an Employee
Temperature measured in ⁰C
RATIO SCALE
it can differentiate between any two classes but it always starts from an absolute or true zero point or a true
zero exists
Weights of students
SAMPLING TECHNIQUES
are utilized to test the validity of conclusions or inferences from the sample to the population
A. RANDOM SAMPLING
in this method, every member of the population is given an equal chance of being selected as part of the sample
STRATIFIED RANDOM SAMPLING
is done by dividing the population into categories or strata and getting the members at random proportionate
to each stratum or sub–group
Slovin's formula, n = N/(1 + Ne²), is used to calculate an appropriate sample size n from a population N given a margin of error e.
1. A professor of a certain college institution was commissioned by his dean to conduct an inquiry with regard to the
efficiency of all faculty members. If there are 350 faculty members, what sample size should he
consider if he wants a margin of error of only 1%?
2. Mr. Andrews is conducting an inquiry regarding the Study Habits of Nursing students. If there are 975 students and
he wants to allow a margin of error of 5%, what sample size should he take?
N = 975, e = 5%
n = 975/(1 + 975(0.05)²) = 975/3.4375 = 283.64 or 284
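Slovin's computation can be sketched as follows (rounding up to the next whole respondent is a common convention and an assumption here, not something the handout states):

```python
# Sketch: Slovin's formula n = N / (1 + N * e^2).
import math

def slovin(N, e):
    return math.ceil(N / (1 + N * e * e))   # round up to a whole respondent

print(slovin(975, 0.05))   # 284  (975 / 3.4375 = 283.64)
print(slovin(350, 0.01))   # 339  (350 / 1.035 = 338.16)
```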
3. The table below shows the population of 4 different courses of a particular college.
Psychology 150
BSE 370
BTVTED 500
BTLED 380
c. What is the total number of respondents that must be used for BTVTED?
For proportional allocation: ni = (Ni/N)(n), where Ni is the course population, N = 1400 is the total population, and n is the overall sample size.
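A sketch of proportional allocation; the overall sample size n = 311 is a hypothetical value (it would follow, e.g., from Slovin's formula with e = 5% on N = 1400), not one stated in the problem:

```python
# Sketch: proportional allocation n_i = (N_i / N) * n across the four courses.
courses = {"Psychology": 150, "BSE": 370, "BTVTED": 500, "BTLED": 380}
N = sum(courses.values())    # 1400 in total
n = 311                      # assumed overall sample size (hypothetical)

shares = {c: round(Ni / N * n) for c, Ni in courses.items()}
print(shares["BTVTED"])      # ≈ 111 respondents from BTVTED under this assumption
```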
LOTTERY/FISHBOWL SAMPLING
this is done by simply writing the names or numbers of all the members of the population on small rolled
pieces of paper which are later placed in a container
CLUSTER SAMPLING
advantageous procedures if the population, with common characteristics, is spread out over a wide
geographical area
it also means as a practical sampling technique used if the complete list of the members of the population is
not available
MULTI-STAGE SAMPLING
the sample is selected in two or more successive stages (e.g., districts first, then schools, then students)
B. NON-RANDOM SAMPLING
in this method, not all elements are given equal opportunities to be selected as sample
it plays a major role in the selection of a particular item and/or in making decisions in cases of incomplete
responses or observations
CONVENIENCE SAMPLING
is a non–probability sampling technique where the subjects are selected because of their convenient
accessibility and proximity to the researcher
this method has been widely used in television and radio programs to find out the opinions of TV viewers and
listeners regarding controversial issues
QUOTA SAMPLING
this is a relatively quick and inexpensive method to operate since the choice of the number of persons or
elements to be included in a sample is done at the researcher’s own convenience or preference and is not
predetermined by some carefully operated randomizing plan
INCIDENTAL SAMPLING
this design is applied to those samples which are taken because they are the most available
the investigator simply takes the nearest individuals as subjects of the study until the desired sample size is reached
Example: The researcher can simply choose to ask those people around him or in a coffee shop where he is taking a
break.
FREQUENCY DISTRIBUTION
is a tabular arrangement of data by classes, showing the number of observations falling under a class–interval
1. CLASS INTERVAL (CI) – a grouping or category defined by a lower limit and an upper limit
2. CLASS SIZE (i) – the class width of the distribution, which gives the distance between the lower and the upper limit of a class interval
3. CLASS LIMITS – the end numbers of a class interval
4. CLASS BOUNDARIES – are more precise expressions of the class limits, extending them by 0.5 on each side
they are considered to be the true class limits for they leave no gaps
these are obtained by getting the average of the upper limit of one class and the lower limit of the next class
CUMULATIVE FREQUENCY DISTRIBUTION – is a tabular arrangement of data by class interval whose frequency is
cumulated
Cumulative frequency is a table which shows the number of cases falling below or above a particular value
Relative Frequency – is obtained by dividing the frequency of each interval by the total frequency; it is usually
expressed in percent form
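As a closing sketch, the relative frequencies of the entrance-examination table from earlier can be computed directly from its frequency column:

```python
# Sketch: relative frequency = f / n, expressed in percent.
freqs = {"18-26": 6, "27-35": 11, "36-44": 17, "45-53": 14,
         "54-62": 8, "63-71": 3, "72-80": 1}
n = sum(freqs.values())   # 60 students in total

rel = {ci: round(f / n * 100, 2) for ci, f in freqs.items()}
print(rel["36-44"])       # 28.33 (% of students scored 36-44)
```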